module Regex : sig .. end
> RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.
>
> Unlike most automata-based engines, RE2 implements almost all the common Perl and PCRE features and syntactic sugars. It also finds the leftmost-first match, the same match that Perl would, and can return submatch information. The one significant exception is that RE2 drops support for backreferences¹ and generalized zero-width assertions, because they cannot be implemented efficiently. The syntax page gives full details.

Syntax reference: http://code.google.com/p/re2/wiki/Syntax
Although OCaml strings may legally contain internal null bytes, it is expensive to check for them, so this library simply assumes that it will never see such a string. The failure mode is that the search stops early, which is acceptable considering how rare internal null bytes are in practice.
type t
type regex = t
type id_t = [ `Index of int | `Name of string ]
val index_of_id_exn : t -> id_t -> int
index_of_id_exn t id resolves subpattern names and indices into indices.

The ~sub keyword argument means: omit location information for subpatterns with index greater than sub.

Subpatterns are indexed by the number of opening parentheses preceding them:
~sub:(`Index 0) : only the whole match
~sub:(`Index 1) : the whole match and the first submatch, etc.

If you only care whether the pattern matches, you can request no location information at all by passing ~sub:(`Index -1).
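The indexing rules above can be sketched without the library. This is a hypothetical stand-in, not Re2's implementation: the `names` association list and the `num_submatches` argument are assumptions introduced only to make the resolution of `` `Name `` and `` `Index `` ids, and the meaning of a given `~sub` value, concrete.

```ocaml
(* Hypothetical stand-in for Re2's id_t; not the library's implementation. *)
type id_t = [ `Index of int | `Name of string ]

(* [names] lists (name, index) pairs for named groups; [num_submatches]
   counts subpattern 0 (the whole match) plus every capturing group.
   `Index -1 passes through unchanged: it means "no location info". *)
let index_of_id_exn ~names ~num_submatches (id : id_t) =
  match id with
  | `Index i when i < num_submatches -> i
  | `Index i -> invalid_arg (Printf.sprintf "no subpattern %d" i)
  | `Name n ->
    (match List.assoc_opt n names with
     | Some i -> i
     | None -> invalid_arg ("no subpattern named " ^ n))

(* Subpatterns whose locations are kept for a given ~sub index:
   -1 keeps none, 0 keeps only the whole match, 1 adds group 1, ... *)
let kept_subpatterns sub =
  if sub < 0 then [] else List.init (sub + 1) Fun.id
```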
With one exception, I quote from re2.h:443:

> Don't ask for more match information than you will use: runs much faster with nmatch == 1 than nmatch > 1, and runs even faster if nmatch == 0.

Here nmatch is the number of subpatterns whose locations are requested, i.e. sub + 1 in this library's indexing. For nmatch > 1, re2 executes in three steps:
1. run a DFA over the entire input to get the end of the whole match
2. run a DFA backward from the end position to get the start position
3. run an NFA from the match start to match end to extract submatches

nmatch == 1 (~sub:(`Index 0)) lets it stop after (2), and nmatch == 0 (~sub:(`Index -1)) lets it stop after (1). (See re2.cc:692 or so.)
The one exception is for the functions get_matches, replace, and Iterator.next: since they must iterate correctly through the whole string, they need at least the whole match (subpattern 0). These functions will silently rewrite ~sub to be non-negative.
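The relationship between ~sub, RE2's nmatch, and the three steps quoted above can be modeled as a small function. This is a simplified sketch of the reasoning, not Re2 code: the `phase` type and the helper names are hypothetical, and real RE2 may pick other engines (e.g. one-pass or bit-state) for the submatch step.

```ocaml
(* Simplified model of the three steps described above; not Re2 code. *)
type phase = Forward_dfa | Backward_dfa | Nfa_submatches

(* A ~sub index maps to RE2's nmatch = sub + 1 (never below 0). *)
let nmatch_of_sub sub = max 0 (sub + 1)

(* nmatch decides how many phases have to run. *)
let phases_for_nmatch nmatch =
  if nmatch = 0 then [ Forward_dfa ]
  else if nmatch = 1 then [ Forward_dfa; Backward_dfa ]
  else [ Forward_dfa; Backward_dfa; Nfa_submatches ]

(* get_matches, replace and Iterator.next need at least subpattern 0,
   so a negative index is clamped up to 0 before searching. *)
let iterating_sub sub = max 0 sub
```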
module Options:Options.S
val create : ?options:Options.t list -> string -> t Core.Std.Or_error.t
val create_exn : ?options:Options.t list -> string -> t
include Stringable
val num_submatches : t -> int
num_submatches t returns 1 + the number of open-parens in the pattern.

N.B. num_submatches t == 1 + RE2::NumberOfCapturingGroups() because RE2::NumberOfCapturingGroups() ignores the whole match ("subpattern zero").
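To make the "1 + capturing groups" arithmetic concrete, here is a rough, hypothetical scanner. It is not RE2's parser: it assumes that only `(` outside character classes counts, that escaped characters are skipped, and that `(?` groups are non-capturing unless they are named `(?P<` groups, which is how NumberOfCapturingGroups treats them. It does not handle every corner of the syntax (e.g. a literal `]` right after `[`).

```ocaml
(* Rough sketch, not RE2's parser: counts capturing groups by scanning for
   '(' outside character classes, skipping escaped characters and
   non-capturing "(?" groups except named "(?P<" ones, then adds 1 for
   the whole match. *)
let num_submatches pattern =
  let n = String.length pattern in
  let rec go i in_class count =
    if i >= n then count
    else
      match pattern.[i] with
      | '\\' -> go (i + 2) in_class count (* skip the escaped character *)
      | '[' when not in_class -> go (i + 1) true count
      | ']' when in_class -> go (i + 1) false count
      | '(' when not in_class ->
        let is_named = i + 3 < n && String.sub pattern (i + 1) 3 = "?P<" in
        let capturing = is_named || not (i + 1 < n && pattern.[i + 1] = '?') in
        go (i + 1) in_class (if capturing then count + 1 else count)
      | _ -> go (i + 1) in_class count
  in
  1 + go 0 false 0
```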
val pattern : t -> string
pattern t returns the pattern from which the regex was constructed.
val find_all : ?sub:id_t -> t -> string -> string list Core.Std.Or_error.t
find_all t input is a convenience function that returns all non-overlapping matches of t against input, in left-to-right order.

If sub is given and the requested subpattern did not capture, then no match is returned at that position, even if other parts of the regex did match.
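"Non-overlapping, left-to-right" means the scan resumes after the end of each match rather than one character later. The sketch below is illustration only: a plain substring (`needle`, a hypothetical parameter) stands in for the regex so that the scanning rule is easy to see.

```ocaml
(* Illustration only: a plain substring stands in for the regex so the
   non-overlapping, left-to-right scan is easy to see. After a match,
   scanning resumes at the end of that match. *)
let find_all_literal ~needle input =
  let n = String.length input and m = String.length needle in
  if m = 0 then invalid_arg "find_all_literal: empty needle";
  let rec go i acc =
    if i + m > n then List.rev acc
    else if String.sub input i m = needle then go (i + m) (needle :: acc)
    else go (i + 1) acc
  in
  go 0 []
```

On "aaaaa" with needle "aa" this yields two matches, not the four an overlapping scan would report.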
val find_all_exn : ?sub:id_t -> t -> string -> string list
val find_first : ?sub:id_t -> t -> string -> string Core.Std.Or_error.t
find_first ?sub pattern input is a convenience function around find_all that returns the first match only.
val find_first_exn : ?sub:id_t -> t -> string -> string
val find_submatches : t -> string -> string option array Core.Std.Or_error.t
find_submatches t input finds the first match and returns all submatches. Element 0 is the whole match and element 1 is the first parenthesized submatch, etc.
val find_submatches_exn : t -> string -> string option array
val matches : t -> string -> bool
matches pattern input is true iff pattern matches input.
val split : ?max:int -> ?include_matches:bool -> t -> string -> string list
split pattern input returns input broken into pieces where pattern matches. Subpatterns are ignored.

max : (default: unlimited) split only at the leftmost max matches
include_matches : (default: false) include the matched substrings in the returned list (e.g., the regex /[,()]/ on "foo(bar,baz)" gives ["foo"; "("; "bar"; ","; "baz"; ")"] instead of ["foo"; "bar"; "baz"])

If pattern never matches, the returned list has input as its one element.
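The max and include_matches options can be modeled with a hypothetical character-set splitter standing in for a regex like /[,()]/. This is a sketch, not Re2 code. One assumption is loud here: empty pieces are dropped, which is what reproduces the "foo(bar,baz)" example above (the split after the final ")" would otherwise leave a trailing empty string); the real function may treat empty pieces differently.

```ocaml
(* Hypothetical sketch: split [input] wherever a character from [seps]
   occurs. Stands in for a regex like /[,()]/; Re2 itself is not used.
   ASSUMPTION: empty pieces are dropped, to match the documented example. *)
let split_on_chars ?max ?(include_matches = false) seps input =
  let n = String.length input in
  let pieces = ref [] in
  let push s = if s <> "" then pieces := s :: !pieces in
  let splits = ref 0 in
  let start = ref 0 in
  (* Only the leftmost [max] matches are used as split points. *)
  let can_split () =
    match max with None -> true | Some m -> !splits < m
  in
  for i = 0 to n - 1 do
    if can_split () && String.contains seps input.[i] then begin
      push (String.sub input !start (i - !start));
      if include_matches then push (String.make 1 input.[i]);
      incr splits;
      start := i + 1
    end
  done;
  push (String.sub input !start (n - !start));
  List.rev !pieces
```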
val rewrite : t -> template:string -> string -> string Core.Std.Or_error.t
rewrite pattern ~template input is a convenience function for replace: instead of requiring an arbitrary transformation as a function, it accepts a template string with zero or more substrings of the form "\\n", each of which will be replaced by submatch n. For every match of pattern against input, the template will be specialized and then substituted for the matched substring.
val rewrite_exn : t -> template:string -> string -> string
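Specializing a template for one match can be sketched as follows. This is a hypothetical helper, not Re2's implementation: `submatches` plays the role of a find_submatches result (element 0 is the whole match), a non-capturing group is assumed to contribute the empty string, and a doubled backslash is assumed to produce one literal backslash.

```ocaml
(* Hypothetical sketch, not Re2's implementation. A backslash followed by
   digit n is replaced by submatch n ("" if it did not capture); any other
   backslash-escaped character (e.g. a second backslash) is copied without
   the backslash. An out-of-range n raises Invalid_argument, which is what
   a template validity check would rule out in advance. *)
let specialize_template ~submatches template =
  let n = String.length template in
  let buf = Buffer.create n in
  let rec go i =
    if i >= n then ()
    else if template.[i] = '\\' && i + 1 < n then begin
      (match template.[i + 1] with
       | '0' .. '9' as d ->
         let k = Char.code d - Char.code '0' in
         (match submatches.(k) with
          | Some s -> Buffer.add_string buf s
          | None -> ())
       | c -> Buffer.add_char buf c);
      go (i + 2)
    end
    else begin
      Buffer.add_char buf template.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf
```

For a match "k=v" of a pattern like (\w+)=(\w+), the template "\\2=\\1" would be specialized to "v=k".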
val valid_rewrite_template : t -> template:string -> bool
valid_rewrite_template pattern ~template is true iff template is a valid rewrite template for pattern.
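One plausible validity rule can be sketched as below. It is an assumption modeled on RE2's CheckRewriteString, not a statement of what Re2 checks: every "\\n" must name an existing subpattern (n < num_submatches), and a backslash may only be followed by a digit or another backslash. The `num_submatches` argument stands in for the result of num_submatches pattern.

```ocaml
(* Sketch of one plausible rule (an assumption, modeled on RE2's
   CheckRewriteString): each backslash-digit must refer to an existing
   subpattern, and a backslash may only precede a digit or a backslash. *)
let valid_rewrite_template ~num_submatches template =
  let n = String.length template in
  let rec go i =
    if i >= n then true
    else if template.[i] <> '\\' then go (i + 1)
    else if i + 1 >= n then false (* dangling backslash at the end *)
    else
      match template.[i + 1] with
      | '\\' -> go (i + 2)
      | '0' .. '9' as d ->
        Char.code d - Char.code '0' < num_submatches && go (i + 2)
      | _ -> false
  in
  go 0
```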
val escape : string -> string
escape nonregex returns nonregex with everything escaped (i.e., if the return value were compiled as a regex, it would match exactly the original input).
module Infix : sig .. end
module Match : sig .. end
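For escape, a minimal sketch modeled on RE2's QuoteMeta is shown below. That escape wraps QuoteMeta is an assumption. The rule sketched: ASCII letters, digits and '_' pass through, bytes at or above 0x80 (UTF-8) pass through, and every other byte gets a leading backslash. QuoteMeta's special handling of NUL is omitted.

```ocaml
(* Sketch modeled on RE2's QuoteMeta (assumed to be what escape wraps).
   Word characters and non-ASCII bytes are copied as-is; every other byte
   is prefixed with a backslash. NUL handling is omitted. *)
let escape nonregex =
  let buf = Buffer.create (2 * String.length nonregex) in
  String.iter
    (fun c ->
      match c with
      | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' -> Buffer.add_char buf c
      | c when Char.code c >= 0x80 -> Buffer.add_char buf c
      | c ->
        Buffer.add_char buf '\\';
        Buffer.add_char buf c)
    nonregex;
  Buffer.contents buf
```

For example, "(1+1)" becomes "\\(1\\+1\\)" (written as an OCaml string literal), which as a regex matches only the literal text "(1+1)".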
val get_matches : ?sub:id_t ->
?max:int -> t -> string -> Match.t list Core.Std.Or_error.t
get_matches pattern input returns all non-overlapping matches of pattern against input.

sub : (default: all) returned Match.t's will contain location information only for subpatterns up to sub
max : (default: unlimited) return only the leftmost max matches
val get_matches_exn : ?sub:id_t -> ?max:int -> t -> string -> Match.t list
val replace : ?sub:id_t ->
?only:int ->
f:(Match.t -> string) ->
t -> string -> string Core.Std.Or_error.t
replace ?sub ?only ~f pattern input returns input with every substring matched by pattern transformed by f.

only : (default: all) replace only the nth match, where n is the value of only
f : the string returned by f is substituted for the matched substring (so returning "" deletes the matched substring)
val replace_exn : ?sub:id_t -> ?only:int -> f:(Match.t -> string) -> t -> string -> string
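The effect of f can be sketched with a hypothetical literal-substring version. This is illustration only: `needle` stands in for the regex, `f` receives the matched text instead of a Match.t, and the `?only` option is left out.

```ocaml
(* Illustration only: a literal [needle] stands in for the regex, and [f]
   receives the matched text (the real f receives a Match.t). Each
   non-overlapping match is replaced by whatever f returns; text between
   matches is copied unchanged. *)
let replace_literal ~f ~needle input =
  let n = String.length input and m = String.length needle in
  if m = 0 then invalid_arg "replace_literal: empty needle";
  let buf = Buffer.create n in
  let rec go i =
    if i > n - m then Buffer.add_substring buf input i (n - i)
    else if String.sub input i m = needle then begin
      Buffer.add_string buf (f needle);
      go (i + m)
    end
    else begin
      Buffer.add_char buf input.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf
```

Passing `~f:(fun _ -> "")` deletes every match, mirroring the note about returning "".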
module Exceptions : sig .. end
val t_of_sexp : Sexplib.Sexp.t -> t
val sexp_of_t : t -> Sexplib.Sexp.t
val compare : t -> t -> int
val bin_t : t Bin_prot.Type_class.t
val bin_read_t : t Bin_prot.Read_ml.reader
val bin_read_t_ : t Bin_prot.Unsafe_read_c.reader
val bin_read_t__ : (int -> t) Bin_prot.Unsafe_read_c.reader
val bin_reader_t : t Bin_prot.Type_class.reader
val bin_size_t : t Bin_prot.Size.sizer
val bin_write_t : t Bin_prot.Write_ml.writer
val bin_write_t_ : t Bin_prot.Unsafe_write_c.writer
val bin_writer_t : t Bin_prot.Type_class.writer
… is an alias of create_exn.
input =~ pattern is an infix alias of matches.
input //~ pattern is an infix alias of find_first.
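The point of these aliases is argument order: the operators take the input on the left and the pattern on the right, while the named functions take the pattern first. A minimal sketch, where `matches` is a hypothetical plain-substring test rather than Re2's function:

```ocaml
(* Hypothetical stand-in: [matches] here is a plain substring test, not
   Re2's. Only the argument flip in ( =~ ) mirrors the documented alias. *)
let matches pattern input =
  let n = String.length input and m = String.length pattern in
  let rec go i = i + m <= n && (String.sub input i m = pattern || go (i + 1)) in
  go 0

(* Infix alias: input on the left, pattern on the right. *)
let ( =~ ) input pattern = matches pattern input
```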
… ~sub), the error returned is Regex_no_such_subpattern, just as though that subpattern were never defined.

get_all t returns all available matches as strings in an array. For the indexing convention, see the comment above regarding the sub parameter.

get_pos_exn ~sub t returns the start offset and length in bytes. Note that for variable-width encodings (e.g., UTF-8) this may not be the same as the character offset and character length.
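Because get_pos_exn reports byte offsets, converting to a character offset for UTF-8 text requires counting characters in the byte prefix. A minimal sketch, assuming the input is valid UTF-8; the function name is hypothetical and not part of Re2:

```ocaml
(* Sketch: convert a byte offset (as get_pos_exn reports) into a count of
   UTF-8 characters by counting the bytes that are not continuation bytes
   (bit pattern 10xxxxxx). Assumes valid UTF-8. *)
let char_offset_of_byte_offset s byte_off =
  let count = ref 0 in
  for i = 0 to byte_off - 1 do
    if Char.code s.[i] land 0xC0 <> 0x80 then incr count
  done;
  !count
```

In "héllo", the "l" after "é" starts at byte 3 but is the third character (character offset 2), because "é" takes two bytes.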
Regex_no_such_subpattern (n, max) means n was requested but only max subpatterns are defined (so max - 1 is the highest valid index).
Regex_no_such_named_subpattern (name, pattern)
Match_failed pattern
Regex_submatch_did_not_capture (s, i) means the ith subpattern in the regex compiled from s did not capture a substring.
Regex_rewrite_template_invalid (template, error_msg)