mirror of https://github.com/vlang/v.git synced 2023-08-10 21:13:21 +03:00

V HTML

A HTML parser made in V

Usage

If the description below isn't enough, see the test files.

Parser

Reads HTML from a full string or from split string chunks, and returns either all Tag objects in that HTML or a DocumentObjectModel, which tries to reconstruct the HTML tree.

split_parse(data string)

This is the main function called by the parse method to parse your HTML in fragments.

parse_html(data string, is_file bool)

Call this function with either a filename or a complete HTML data string; is_file indicates which of the two data is.

add_code_tag(name string)

This function registers a tag whose content the parser should skip. For example, if your HTML or XML contains a tag such as <script>, calling add_code_tag('script') makes the parser jump over the content of every script tag. The content is still kept, but its > and < characters will not confuse the parser.
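As a rough illustration of the functions described so far, the sketch below registers `script` as a code tag and then parses a complete HTML string. It relies only on the signatures listed in this README. The `import net.html` path, the `html.Parser{}` struct-literal construction, the `mut` declaration, and the sample markup are assumptions, and other V releases may expose a different API.

```v
import net.html

fn main() {
	// Assumption: the parser can be created as a plain struct literal.
	mut parser := html.Parser{}
	// Ask the parser to skip the content of <script> tags,
	// so the '<' inside the script body is not read as markup.
	parser.add_code_tag('script')
	// The second argument is false because the first one is HTML data, not a filename.
	parser.parse_html('<html><body><script>if (1 < 2) {}</script><p>Hello</p></body></html>', false)
	tags := parser.get_tags()
	println(tags.len)
}
```

Passing `true` as the second argument would presumably treat the first argument as a path to a local file instead.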

finalize()

When using the split_parse method, you must call this function to complete the parse.
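A minimal sketch of fragment parsing, assuming the parser can be built as `html.Parser{}` and that `split_parse` may be called on it without further setup (neither is confirmed here). The chunk contents are made up for illustration and are split on tag boundaries to keep the example simple.

```v
import net.html

fn main() {
	// Assumption: a struct literal is enough to get a usable parser.
	mut parser := html.Parser{}
	// Hypothetical input that arrives in pieces, e.g. read from a stream.
	chunks := ['<html><body>', '<p>first</p>', '<p>second</p>', '</body></html>']
	for chunk in chunks {
		// Feed each fragment to the parser as it becomes available.
		parser.split_parse(chunk)
	}
	// Required after split_parse so the parser can finish its work.
	parser.finalize()
	println(parser.get_tags().len)
}
```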

get_tags() []Tag_ptr

This function returns an array with all tags and their content.

get_dom() DocumentObjectModel

Returns the DocumentObjectModel for the currently parsed tags.

WARNING

If you want to reuse a parser object to parse another HTML document, call the initialize_all() function first.

DocumentObjectModel

A DOM object that makes it easier to access and search tags.

get_by_attribute_value(name string, value string) []Tag_ptr

This function returns a Tag array with all tags in the document that have an attribute with the given name and the given value.

get_by_tag(name string) []Tag_ptr

This function returns a Tag array with all tags in the document that have the given name.

get_by_attribute(name string) []Tag_ptr

This function returns a Tag array with all tags in the document that have an attribute with the given name.

get_root() Tag_ptr

This function returns the root Tag

get_all_tags() []Tag_ptr

This function returns all important tags, leaving out close tags.
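The search functions above could be combined roughly as follows. This is a sketch built from the signatures in this README only. The parser construction, the `mut dom` declaration (used in case the query methods take a mutable receiver), and the sample markup are assumptions.

```v
import net.html

fn main() {
	mut parser := html.Parser{}
	parser.parse_html('<html><body><div id="main" class="box"><p class="box">Hi</p></div></body></html>', false)
	// Assumption: declared mut in case the query methods need a mutable receiver.
	mut dom := parser.get_dom()
	// All tags named div.
	divs := dom.get_by_tag('div')
	// All tags that carry a class attribute, whatever its value.
	with_class := dom.get_by_attribute('class')
	// All tags whose class attribute has the value box.
	boxes := dom.get_by_attribute_value('class', 'box')
	// Root of the reconstructed tree.
	root := dom.get_root()
	println(divs.len)
	println(with_class.len)
	println(boxes.len)
	println(root.get_name())
	// Every tag except close tags.
	println(dom.get_all_tags().len)
}
```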

Tag

An object that holds a tag's information, such as its name, attributes, and children.

get_children() []Tag_ptr

Returns all children as an array

get_parent() &Tag

Returns the parent of the current tag

get_name() string

Returns the tag name

get_content() string

Returns the tag content

get_attributes() map[string]string

Returns all attributes and their values

text() string

Returns the content of the tag and all tags inside it. Also, any <br> tag will be converted into \n
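To show how the Tag accessors fit together, here is a small sketch that looks up a paragraph and reads its name, one attribute, its children, and its text. Only the method names come from this README. The parser construction, the `mut dom` declaration, and the sample markup are assumptions, and no particular output is claimed.

```v
import net.html

fn main() {
	mut parser := html.Parser{}
	parser.parse_html('<html><body><p id="intro">Hello<br>world</p></body></html>', false)
	// Assumption: declared mut in case the query methods need a mutable receiver.
	mut dom := parser.get_dom()
	for tag in dom.get_by_tag('p') {
		println(tag.get_name())
		// get_attributes returns a map[string]string, so values are read by key.
		attrs := tag.get_attributes()
		println(attrs['id'])
		println(tag.get_children().len)
		// text() gathers the content of the tag and of the tags inside it,
		// turning <br> into a newline as described above.
		println(tag.text())
	}
}
```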

Some questions that may come up

Q: Why does the parser have a `builder_str() string` method that returns only the lexeme string?

A: Because in the early stages of the project strings.Builder was used, but a bug somewhere made it necessary to use strings directly. The plan is to switch back to strings.Builder later.

Q: Why is there a `compare_string(a string, b string) bool` method?

A: For some reason, using != and == on strings directly does not work. This method is a workaround.

Q: Will there be something like `XPath`?

A: Something like XPath, yes. Exactly the same as it, no.

Roadmap

Parser
-  detection
- Open Generic tags detection
- Close Generic tags detection
- verify string detection
- tag attributes detection
- attributes values detection
- tag text (declared as content on the tag; may be renamed to text in the future)
- text file parsing support (open local files for parsing)
- open_code verification
DocumentObjectModel
- push elements that have a close tag into stack
- remove elements from stack
- ~~create a new document root if there is a syntax error (deleted)~~
- search tags in DOM by attributes
- search tags in DOM by tag type
- finish dom test

License
