docs_ci: check all md files except thirdparty (#6855)

2023-08-10 21:13:21 +03:00 · 2020-11-18 18:28:28 +01:00
parent d8f64f516b
commit df4165c7ee
20 changed files with 373 additions and 221 deletions
--- a/vlib/regex/README.md
+++ b/vlib/regex/README.md
@@ -8,10 +8,12 @@ Write here the introduction... not today!! -_-

 ## Basic assumption

-In this release, during the writing of the code some assumptions are made and are valid for all the features.
+In this release, during the writing of the code some assumptions are made
+and are valid for all the features.

 1. The matching stops at the end of the string not at the newline chars.
-2. The basic elements of this regex engine are the tokens, in a query string a simple char is a token. The token is the atomic unit of this regex engine.
+2. The basic elements of this regex engine are the tokens;
+    in a query string a simple char is a token. The token is the atomic unit of this regex engine.

 ## Match positional limiter

@@ -37,19 +39,26 @@ The cc matches all the chars specified inside, it is delimited by square bracket

 the sequence of chars in the class is evaluated with an OR operation.

-For example, the following cc `[abc]` matches any char that is `a` or `b` or `c` but doesn't match `C` or `z`.
+For example, the following cc `[abc]` matches any char that is `a` or `b` or `c`
+but doesn't match `C` or `z`.

-Inside a cc is possible to specify a "range" of chars, for example `[ad-f]` is equivalent to write `[adef]`. 
+Inside a cc it is possible to specify a "range" of chars;
+for example, `[ad-f]` is equivalent to writing `[adef]`.

-A cc can have different ranges at the same time like `[a-zA-z0-9]` that matches all the lowercase,uppercase and numeric chars.
+A cc can have several ranges at the same time, like `[a-zA-Z0-9]`, which matches all the
+lowercase, uppercase and numeric chars.

-It is possible negate the cc using the caret char at the start of the cc like: `[^abc]` that matches every char that is not `a` or `b` or `c`.
+It is possible to negate the cc using the caret char at the start of the cc, like `[^abc]`,
+which matches every char that is not `a` or `b` or `c`.

-A cc can contain meta-chars like: `[a-z\d]` that matches all the lowercase latin chars `a-z` and all the digits `\d`.
+A cc can contain meta-chars like: `[a-z\d]` that matches all the lowercase latin chars `a-z`
+and all the digits `\d`.

 It is possible to mix all the properties of the char class together.

-**Note:** In order to match the `-` (minus) char, it must be located at the first position in the cc, for example  `[-_\d\a]` will match `-` minus, `_`underscore, `\d` numeric chars, `\a` lower case chars.
+**Note:** In order to match the `-` (minus) char, it must be located at the first position
+ in the cc; for example, `[-_\d\a]` will match `-` minus, `_` underscore, `\d` numeric chars,
+ `\a` lower case chars.

 ### Meta-chars

@@ -63,7 +72,7 @@ A meta-char can match different type of chars.
 * `\D` matches a non digit
 * `\s` matches a space char, one of `[' ','\t','\n','\r','\v','\f']`
 * `\S` matches a non space char
-* `\a` matches only a lowercase char `[a-z]` 
+* `\a` matches only a lowercase char `[a-z]`
 * `\A` matches only an uppercase char `[A-Z]`

 ### Quantifier
@@ -80,16 +89,21 @@ Each token can have a quantifier that specify how many times the char can or mus

 - `{x}` matches exactly x time, `a{2}` matches `aa` but doesn't match `aaa` or `a`
 - `{min,}` matches at minimum min time, `a{2,}` matches `aaa` or `aa` but doesn't match `a`
- `{,max}` matches at least 0 time and maximum max time, `a{,2}` matches `a` and `aa` but doesn't match `aaa`
- `{min,max}` matches from min times to max times, `a{2,3}` matches `aa` and `aaa` but doesn't match `a` or `aaaa`
+- `{,max}` matches from 0 times to max times,
+    `a{,2}` matches `a` and `aa` but doesn't match `aaa`
+- `{min,max}` matches from min times to max times,
+    `a{2,3}` matches `aa` and `aaa` but doesn't match `a` or `aaaa`

-a long quantifier may have a `greedy off` flag that is the `?` char after the brackets, `{2,4}?` means to match the minimum number possible tokens in this case 2.
+A long quantifier may have a `greedy off` flag, which is the `?` char after the brackets:
+`{2,4}?` means to match the minimum possible number of tokens, in this case 2.

 ### dot char

-the dot is a particular meta char that matches  "any char", is more simple explain it with an example:
+The dot is a particular meta-char that matches "any char";
+it is simpler to explain it with an example:

-suppose to have `abccc ddeef` as source string to parse with regex, the following table show the query strings and the result of parsing source string.
+Suppose we have `abccc ddeef` as the source string to parse with regex;
+the following table shows the query strings and the result of parsing the source string.

 | query string | result |
 | ------------ | ------ |
@@ -102,39 +116,50 @@ the dot char matches any char until the next token match is satisfied.

 ### OR token

-the token `|` is a logic OR operation between two consecutive tokens, `a|b` matches a char that is `a` or `b`.
+The token `|` is a logical OR operation between two consecutive tokens:
+`a|b` matches a char that is `a` or `b`.

-The OR token can work in a "chained way": `a|(b)|cd ` test first `a` if the char is not `a` then test the group `(b)` and if the group doesn't match test the token `c`.
+The OR token can work in a "chained way": `a|(b)|cd` first tests `a`; if the char is not `a`,
+it then tests the group `(b)`, and if the group doesn't match, it tests the token `c`.

 **note: The OR works at token level! It doesn't work at concatenation level!**

-A query string like `abc|bde` is not equal to `(abc)|(bde)`!!  The OR work only on `c|b` not at char concatenation level.
+A query string like `abc|bde` is not equal to `(abc)|(bde)`!!
+The OR works only on `c|b`, not at char concatenation level.
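
To make the token-level behaviour concrete, the sketch below uses Python's `re` module (not this V module) purely as a reference point. Python's `|` works at concatenation level, so `abc|bde` there means "`abc` or `bde`", while the token-level reading described above corresponds to `ab(c|b)de`. The names `CONCAT_LEVEL`, `TOKEN_LEVEL` and `matches` are illustrative assumptions, not part of the V `regex` API.

```python
import re

# Concatenation-level OR, as in Python/PCRE: the whole of "abc" or the whole of "bde".
CONCAT_LEVEL = re.compile(r"abc|bde")

# Token-level OR, as this README describes it: only `c|b` is alternated,
# so the query reads as a, b, (c or b), d, e.
TOKEN_LEVEL = re.compile(r"ab(?:c|b)de")


def matches(pattern, text):
    """Return True when the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for s in ["abc", "bde", "abcde", "abbde"]:
        print(s, matches(CONCAT_LEVEL, s), matches(TOKEN_LEVEL, s))
```

In other words, a query written for a PCRE-style engine needs explicit groups, such as `(abc)|(bde)`, to keep the same meaning here.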

 ### Groups

 Groups are a method to create complex patterns with repetition of blocks of tokens.

-The groups are delimited by round brackets `( )`, groups can be nested and can have a quantifier as all the tokens.
+The groups are delimited by round brackets `( )`,
+groups can be nested and can have a quantifier as all the tokens.

 `c(pa)+z` matches `cpapaz` or `cpaz` or `cpapapaz`.

-`(c(pa)+z ?)+` matches `cpaz cpapaz cpapapaz` or `cpapaz` 
+`(c(pa)+z ?)+` matches `cpaz cpapaz cpapapaz` or `cpapaz`

-let analyze this last case, first we have the group `#0` that are the most outer round brackets `(...)+`, this group has a quantifier that say to match its content at least one time `+`. 
+Let's analyze this last case: first we have the group `#0`,
+which is the outermost pair of round brackets `(...)+`;
+this group has a quantifier `+` that says to match its content at least one time.

-After we have a simple char token `c` and a second group that is the number `#1` :`(pa)+`, this group try to match the sequence `pa` at least one time as specified by the `+` quantifier.
+After that we have a simple char token `c` and a second group, number `#1`: `(pa)+`;
+this group tries to match the sequence `pa` at least one time, as specified by the `+` quantifier.

-After, we have another simple token `z` and another simple token ` ?` that is the space char (ascii code 32) followed by the `?` quantifier that say to capture the space char 0 or 1 time.
+After that, we have another simple token `z` and another simple token ` ?`,
+which is the space char (ascii code 32) followed by the `?` quantifier
+that says to capture the space char 0 or 1 time.

 This explains why the `(c(pa)+z ?)+` query string can match `cpaz cpapaz cpapapaz`.

-In this implementation the groups are "capture groups", it means that the last temporal result for each group can be retrieved from the `RE` struct.
+In this implementation the groups are "capture groups":
+the last temporal result for each group can be retrieved from the `RE` struct.

-The "capture groups" are store as couple of index in the field `groups` that is an `[]int` inside the `RE` struct. 
+The "capture groups" are stored as pairs of indexes in the field `groups`,
+which is an `[]int` inside the `RE` struct.

 **example:**

-```v
+```v oksyntax
 text := "cpaz cpapaz cpapapaz"
 query:= r"(c(pa)+z ?)+"
 mut re := regex.regex_opt(query) or { panic(err) }
@@ -157,16 +182,19 @@ for gi < re.groups.len {
 // 1 :[pa]
 ```
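
The pair layout of `groups` can be sketched independently of the V module. The Python helper below is a minimal illustration: `group_slices` is a hypothetical name, the sample index values were worked out by hand for this text rather than produced by the regex engine, and treating a negative start index as "group not captured" is an assumption.

```python
def group_slices(text, groups):
    """Return the captured substring for each group, or None when the group is empty."""
    out = []
    # Entries come in (start, end) pairs: group 0 at [0], [1]; group 1 at [2], [3]; ...
    for gi in range(0, len(groups) - 1, 2):
        start, end = groups[gi], groups[gi + 1]
        # Assumption: a negative start marks a group that captured nothing.
        out.append(text[start:end] if start >= 0 else None)
    return out


if __name__ == "__main__":
    text = "cpaz cpapaz cpapapaz"
    groups = [12, 20, 17, 19]  # illustrative values, not output of the V module
    for gid, captured in enumerate(group_slices(text, groups)):
        print(f"{gid} :[{captured}]")
```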

-**note:** *to show the `group id number` in the result of the `get_query()` the flag `debug` of the RE object must be `1` or `2`*
+**note:** *to show the `group id number` in the result of the `get_query()`*
+*the flag `debug` of the RE object must be `1` or `2`*

 ### Groups Continuous saving

-In particular situations it is useful have a continuous save of the groups, this is possible initializing the saving array field in `RE` struct: `group_csave`.
+In particular situations it is useful to have a continuous save of the groups;
+this is possible by initializing the saving array field in the `RE` struct: `group_csave`.

 This feature allows data to be collected in a continuous way.

-In the example we pass a text followed by a integer list that we want collect. 
-To achieve this task we can use the continuous saving of the group that save each captured group in a array that we set with: `re.group_csave = [-1].repeat(3*20+1)`.
+In the example we pass a text followed by an integer list that we want to collect.
+To achieve this task we can use the continuous saving of the group, which saves each
+captured group in an array that we set with: `re.group_csave = [-1].repeat(3*20+1)`.

 The array will be filled with the following logic:

@@ -176,9 +204,10 @@ The array will be filled with the following logic:
 `re.group_csave[2+n*3]` start index in the source string of the saved group
 `re.group_csave[3+n*3]` end index in the source string of the saved group

-The regex save until finish or found that the array have no space. If the space ends no error is raised, further records will not be saved.
+The regex saves records until it finishes or finds that the array has no space left.
+If the space runs out no error is raised; further records will not be saved.

-```v
+```v oksyntax
 fn example2() {
 	test_regex()

@@ -234,7 +263,7 @@ cg id: 0 [4, 8] => [ 01,]
 cg id: 0 [8, 11] => [23,]
 cg id: 0 [11, 15] => [45 ,]
 cg id: 0 [15, 19] => [56, ]
-cg id: 0 [19, 21] => [78] 
+cg id: 0 [19, 21] => [78]
 ```
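
As a language-neutral illustration of the `group_csave` record layout, the Python sketch below decodes a raw array into `(group id, start, end)` records. It assumes slot 0 holds the record count and that record `n` occupies slots `1+n*3` (id), `2+n*3` (start) and `3+n*3` (end), which is the layout suggested by the raw array printed in the named capturing groups example output. `decode_csave` is a hypothetical helper, not a function of the V module.

```python
def decode_csave(raw):
    """Split a group_csave-style list into (group_id, start, end) records."""
    count = raw[0]  # slot 0: number of saved records
    records = []
    for n in range(count):
        base = 1 + n * 3  # assumed layout: id, start, end in consecutive slots
        records.append((raw[base], raw[base + 1], raw[base + 2]))
    return records


if __name__ == "__main__":
    # Raw array copied from the named capturing groups example output.
    raw = [8, 0, 0, 4, 1, 7, 11, 1, 11, 16, 1, 16, 22, 1, 22, 28,
           1, 28, 37, 1, 37, 42, 1, 42, 46]
    for gid, start, end in decode_csave(raw):
        print(f"cg id: {gid} [{start}, {end}]")
```

Slots past the last record keep their initial `-1` filler and are never read, because the loop is bounded by the count in slot 0.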

 ### Named capturing groups
@@ -245,13 +274,14 @@ This regex module support partially the question mark `?` PCRE syntax for groups

 `(?P<mygroup>abcdef)` **named group:** the group content is saved and labeled as `mygroup`

-The label of the groups is saved in the `group_map` of the `RE` struct, this is a map from `string` to `int` where the value is the index in `group_csave` list of index.
+The labels of the groups are saved in the `group_map` of the `RE` struct: a map from `string`
+to `int`, where the value is the index in the `group_csave` list of indexes.

 Have a look at the example for the use of them.

 example:

-```v
+```v oksyntax
 import regex
 fn main() {
 	test_regex()
@@ -270,8 +300,8 @@ fn main() {
    q_str := re.get_query()
    println("O.Query: $query")
    println("Query  : $q_str")
-    
-    re.debug = 0	
+
+    re.debug = 0
    start, end := re.match_string(text)
    if start < 0 {
        err_str := re.get_parse_error_string(start)
@@ -331,7 +361,7 @@ cg id: 1 [22, 28] => [hello/]
 cg id: 1 [28, 37] => [pippo12_/]
 cg id: 1 [37, 42] => [pera.]
 cg id: 1 [42, 46] => [html]
-raw array: [8, 0, 0, 4, 1, 7, 11, 1, 11, 16, 1, 16, 22, 1, 22, 28, 1, 28, 37, 1, 37, 42, 1, 42, 46] 
+raw array: [8, 0, 0, 4, 1, 7, 11, 1, 11, 16, 1, 16, 22, 1, 22, 28, 1, 28, 37, 1, 37, 42, 1, 42, 46]
 named capturing groups:
 'format':[0, 4] => 'http'
 'token':[42, 46] => 'html'
@@ -341,25 +371,27 @@ named capturing groups:

 It is possible to set some flags in the regex parser that change the behavior of the parser itself.

-```v
+```v oksyntax
 // example of flag settings
 mut re := regex.new()
-re.flag = regex.F_BIN 
-
+re.flag = regex.F_BIN
 ```

 - `F_BIN`: parse a string as bytes, utf-8 management disabled.

 - `F_EFM`: exit on the first char matches in the query, used by the find function.
- `F_MS`: matches only if the index of the start match is 0, same as `^` at the start of the query string.
- `F_ME`: matches only if the end index of the match is the last char of the input string, same as `$` end of query string.
+- `F_MS`: matches only if the index of the start match is 0,
+    same as `^` at the start of the query string.
+- `F_ME`: matches only if the end index of the match is the last char of the input string,
+    same as `$` end of query string.
 - `F_NL`: stops the matching if a new line char `\n` or `\r` is found

 ## Functions

 ### Initializer

-These functions are helper that create the `RE` struct, a `RE` struct can be created manually if you needed.
+These functions are helpers that create the `RE` struct;
+a `RE` struct can be created manually if needed.

 #### **Simplified initializer**

@@ -378,7 +410,7 @@ pub fn new() RE
 pub fn new_by_size(mult int) RE
 ```
 After a base initializer is used, the regex expression must be compiled with:
-```v
+```v oksyntax
 // compile compiles the REgex returning an error if the compilation fails
 pub fn (re mut RE) compile_opt(in_txt string) ?
 ```
@@ -387,7 +419,7 @@ pub fn (re mut RE) compile_opt(in_txt string) ?

 These are the operative functions

-```v
+```v oksyntax
 // match_string try to match the input string, return start and end index if found else start is -1
 pub fn (re mut RE) match_string(in_txt string) (int,int)

@@ -409,7 +441,7 @@ This module has few small utilities to help the writing of regex expressions.

 The following example code shows how to visualize the syntax errors in the compilation phase:

-```v
+```v oksyntax
 query:= r"ciao da ab[ab-]"  // there is an error, a range not closed!!
 mut re := new()

@@ -425,7 +457,8 @@ re.compile_opt(query) or { println(err) }

 ### **Compiled code**

-It is possible view the compiled code calling the function `get_query()` the result will be something like this:
+It is possible to view the compiled code by calling the function `get_query()`.
+The result will be something like this:

 ```
 ========================================
@@ -495,21 +528,24 @@ the columns have the following meaning:

 `PC:   1` program counter of the step

-`=>7fffffff ` hex code of the instruction 
+`=>7fffffff ` hex code of the instruction

-`i,ch,len:[  0,'a',1]` `i` index in the source string, `ch` the char parsed, `len` the length in byte of the char parsed
+`i,ch,len:[  0,'a',1]` `i` index in the source string, `ch` the char parsed,
+`len` the length in bytes of the char parsed

 `f.m:[  0,  1]` `f` index of the first match in the source string, `m` index that is actual matching

 `query_ch: [b]` token in use and its char

-`{2,3}:1?` quantifier `{min,max}`, `:1` is the actual counter of repetition, `?` is the greedy off flag if present
+`{2,3}:1?` quantifier `{min,max}`, `:1` is the actual counter of repetition,
+`?` is the greedy off flag if present.

 ### **Custom Logger output**

-The debug functions output uses the `stdout` as default, it is possible to  provide an alternative output setting a custom output function:
+The debug functions output to `stdout` by default;
+it is possible to provide an alternative output by setting a custom output function:

-```v
+```v oksyntax
 // custom print function, the input will be the regex debug string
 fn custom_print(txt string) {
 	println("my log: $txt")
@@ -524,7 +560,7 @@ re.log_func = custom_print  // every debug output from now will call this functi

 Here is some simple code that performs basic string matching:

-```v
+```v oksyntax
 struct TestObj {
 	source string // source string to parse
 	query  string // regex query string
@@ -545,18 +581,18 @@ fn example() {
 	for c,tst in tests {
 		mut re := regex.new()
 		re.compile_opt(tst.query) or { println(err) continue }
-			
+
        // print the query parsed with the groups ids
        re.debug = 1 // set debug on at minimum level
        println("#${c:2d} query parsed: ${re.get_query()}")
        re.debug = 0
-        
+
        // do the match
        start, end := re.match_string(tst.source)
        if start >= 0 && end > start {
            println("#${c:2d} found in: [$start, $end] => [${tst.source[start..end]}]")
-        }	
-        
+        }
+
        // print the groups
        mut gi := 0
        for gi < re.groups.len {
@@ -564,7 +600,7 @@ fn example() {
                println("group ${gi/2:2d} :[${tst.source[re.groups[gi]..re.groups[gi+1]]}]")
            }
            gi += 2
-        }		
+        }
        println("")
 	}
 }
@@ -575,4 +611,3 @@ fn main() {
 ```

 More example code is available in the test code for the `regex` module: `vlib\regex\regex_test.v`.
-