A value of type 'a t is a regex that parses values of type 'a. The matching is implemented using Re2.
UTF-8 is supported by Re2 but not by this module. This is because we want to use char as the character type, and a single-byte char is the wrong unit for characters in a multibyte encoding.
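To see why char is the wrong unit for a multibyte encoding, the following stdlib-only OCaml snippet compares one character in UTF-8 and in Latin-1. It does not use Re2 at all. It only illustrates the byte-level mismatch, which is consistent with this module compiling of_re2 patterns with `Encoding_latin1 true.

```ocaml
(* U+00E9 (e with acute accent) in UTF-8 is the two bytes 0xC3 0xA9. *)
let e_acute_utf8 = "\xc3\xa9"

(* The same character in Latin-1 is the single byte 0xE9. *)
let e_acute_latin1 = "\xe9"

(* OCaml's char is one byte, so a char-by-char view of the UTF-8 string
   sees two chars where a UTF-8 reader sees one code point. *)
let utf8_as_chars =
  List.init (String.length e_acute_utf8) (String.get e_acute_utf8)

let () =
  Printf.printf "UTF-8 bytes: %d, Latin-1 bytes: %d\n"
    (String.length e_acute_utf8) (String.length e_acute_latin1);
  List.iter (fun c -> Printf.printf "char 0x%02X\n" (Char.code c)) utf8_as_chars
```

A char-based matcher such as Char.one_of would therefore treat each of the two UTF-8 bytes as a separate character.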
The case_sensitive argument defaults to true.
compile, run, and matches suffer from Re2's limitations with regard to null bytes in the input: a null byte is treated as the end of the string.
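The effect of that limitation can be modelled in plain OCaml. The sketch below is not Re2 code. The names visible_to_matcher and has_null_byte are hypothetical helpers that mimic the documented behaviour and show a possible up-front check a caller might add before calling run or matches.

```ocaml
(* Hypothetical model of the documented limitation (not Re2's code):
   the matcher behaves as if the input ended at the first null byte. *)
let visible_to_matcher (input : string) : string =
  match String.index_opt input '\000' with
  | None -> input
  | Some i -> String.sub input 0 i

(* Hypothetical guard: a caller that cannot rule out null bytes can
   detect them instead of silently matching a truncated string. *)
let has_null_byte (input : string) : bool = String.contains input '\000'

let () =
  print_endline (visible_to_matcher "abc\000def");
  Printf.printf "%b\n" (has_null_byte "abc\000def")
```

Under this model, a pattern anchored at the end of the text would see "abc" as the whole input, so anything after the null byte can neither match nor cause a mismatch.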
to_regex_string and to_re2 both forget what a value of type 'a t knows about turning the matched strings into values of type 'a.
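One way to picture what is forgotten is a toy model in which a parser pairs the regex text with a decoding function over its captures. The type toy_parser and the functions below are hypothetical and are not the real representation of 'a t. They only show why converting to a plain regex string cannot keep the 'a-specific part.

```ocaml
(* Hypothetical toy model, NOT the real Re2.Parser.t: a parser pairs the
   regex text handed to Re2 with a function that turns the captured
   substrings into a value of type 'a. *)
type 'a toy_parser = {
  regex : string;
  decode : string option list -> 'a;
}

(* One or more decimal digits, captured and converted to an int. *)
let toy_int : int toy_parser = {
  regex = "([0-9]+)";
  decode = (function
    | Some digits :: _ -> int_of_string digits
    | _ -> failwith "toy_int: missing capture");
}

(* Analogue of to_regex_string: only the regex text is kept. The result
   is a plain string, so decode (the 'a-specific part) is lost. *)
let toy_to_regex_string (p : _ toy_parser) : string = p.regex

let () =
  print_endline (toy_to_regex_string toy_int);
  Printf.printf "%d\n" (toy_int.decode [ Some "42" ])
```

The string "([0-9]+)" still matches the same text, but nothing in it says that the capture should become an int.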
of_re2 r forgets the options that r was compiled with, instead using `Encoding_latin1 true, `Dot_nl true, and the case-sensitivity setting of the overall pattern. You can still use Re2's '(?flags:re)' syntax to set options for the scope of this regex.
The values returned by of_re2 r are precisely the captures of the underlying regex, in order: note that unlike (say) Re2.Match.get_all, the whole match is *not* included (if you want that, just use capture). Named captures are not accessible by name.
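The relationship between the two layouts can be sketched as an array operation. The sketch assumes, following Re2's usual convention, that a get_all-style array holds the whole match at index 0 followed by each capture group in order. captures_only is a hypothetical helper and the array contents are illustrative data, not output from a real Re2 call.

```ocaml
(* Hypothetical model: assuming a get_all-style array holds the whole match
   at index 0 followed by the capture groups in order, the values of_re2
   produces correspond to everything after index 0. *)
let captures_only (all : string option array) : string option array =
  let n = Array.length all in
  if n = 0 then [||] else Array.sub all 1 (n - 1)

let () =
  (* Illustrative data, as if for a regex like (\d+)-(\d+) on 10-20. *)
  let all = [| Some "10-20"; Some "10"; Some "20" |] in
  captures_only all
  |> Array.iter (function
       | Some s -> print_endline s
       | None -> print_endline "<no match>")
```

Each element stays a string option because an optional group that did not participate in the match has no captured text.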
A regex that matches nothing.
repeat ~min ~max t constructs the regex t{min,max}. min defaults to 0 and max defaults to None (unbounded), so plain repeat t is equivalent to t*.
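The mapping from the optional bounds to Re2 quantifier syntax can be sketched on regex text. repeat_regex is a hypothetical function that works on strings rather than on real parsers, and the exact regex text the real module generates may differ. It mirrors the documented defaults: min = 0 and max = None, with None meaning unbounded.

```ocaml
(* Hypothetical sketch of how repeat's bounds map onto Re2 quantifier
   syntax, working on regex text rather than on real parsers. The real
   module's generated regex may differ. *)
let repeat_regex ?(min = 0) ?(max = None) (t : string) : string =
  (* Non-capturing group so the quantifier applies to all of t. *)
  let group = "(?:" ^ t ^ ")" in
  match max with
  | None -> Printf.sprintf "%s{%d,}" group min (* unbounded *)
  | Some max -> Printf.sprintf "%s{%d,%d}" group min max

let () =
  (* With the defaults this is t{0,}, the same language as t*. *)
  print_endline (repeat_regex "ab");
  print_endline (repeat_regex ~min:2 ~max:(Some 3) "ab")
```

Taking max as an int option (rather than a plain int) lets the caller pass None explicitly to mean unbounded, which matches the documented default.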
It would be better for repeat to have type 'a t -> 'a list t, but the re2 library doesn't give you access to repeated submatches like that. Hence, repeat ignores all submatches of its argument and does not call any callbacks that may have been attached to them, as if ignore had been applied to that argument.
string, Char.one_of, and Char.not_one_of raise exceptions in the presence of null bytes.
Matches the empty string at the beginning of the text.
Matches the empty string at the end of the text.