include Regex
include Re2__.Regex_intf.S
These are OCaml bindings for Google's re2 library. Quoting from the re2 homepage:
RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library. Unlike most automata-based engines, RE2 implements almost all the common Perl and PCRE features and syntactic sugars. It also finds the leftmost-first match, the same match that Perl would, and can return submatch information. The one significant exception is that RE2 drops support for backreferences and generalized zero-width assertions, because they cannot be implemented efficiently. The syntax page gives full details.
Syntax reference: https://github.com/google/re2/wiki/Syntax
Although OCaml strings and C++ strings may legally contain internal null bytes, this library does not handle them correctly because it converts via C strings. The failure mode is that the search stops early, which is tolerable given how rare internal null bytes are in practice.
Strings are treated as UTF-8 by default, or as ISO 8859-1 if Options.latin1 is used.
include sig ... end
val bin_t : t Bin_prot.Type_class.t
val bin_read_t : t Bin_prot.Read.reader
val __bin_read_t__ : (int ‑> t) Bin_prot.Read.reader
val bin_reader_t : t Bin_prot.Type_class.reader
val bin_size_t : t Bin_prot.Size.sizer
val bin_write_t : t Bin_prot.Write.writer
val bin_writer_t : t Bin_prot.Type_class.writer
val bin_shape_t : Bin_prot.Shape.t
val t_of_sexp : Base.Sexp.t ‑> t
val sexp_of_t : t ‑> Base.Sexp.t
val hash_fold_t : Base.Hash.state ‑> t ‑> Base.Hash.state
val hash : t ‑> Base.Hash.hash_value
Subpatterns are referenced by name if labelled with the /(?P<...>...)/ syntax, or else by counting open-parens, with subpattern zero referring to the whole regex.
The sub keyword argument tells re2 to omit location information for subpatterns with index greater than sub.
Subpatterns are indexed by the number of opening parentheses preceding them:
~sub:(`Index 0): only the whole match
~sub:(`Index 1): the whole match and the first submatch, etc.
If you only care whether the pattern matches at all, you can request no location information by passing ~sub:(`Index -1).
With one exception, I quote from re2.h:443,
Don't ask for more match information than you will use: runs much faster with nmatch == 1 than nmatch > 1, and runs even faster if nmatch == 0.
For sub > 1, re2 executes in three steps:
1. run a DFA over the entire input to get the end of the whole match
2. run a DFA backward from the end position to get the start position
3. run an NFA from the match start to match end to extract submatches
sub == 1 lets it stop after (2) and sub == 0 lets it stop after (1).
(See re2.cc:692 or so.)
The one exception is for the functions get_matches, replace, and Iterator.next: since they must iterate correctly through the whole string, they need at least the whole match (subpattern 0). These functions will silently rewrite ~sub to be non-negative.
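As a minimal sketch of how ~sub selects a subpattern by index or by name, assuming the re2 opam package is linked; the key=value pattern and the input are hypothetical and chosen only for illustration:

```ocaml
(* Hypothetical pattern: one named group followed by one unnamed group. *)
let re = Re2.of_string "(?P<key>[a-z]+)=(\\d+)"

(* With no ~sub, subpattern zero (the whole match) is returned. *)
let whole = Re2.find_first re "port=8080"

(* `Index 2 is the second open-paren; `Name "key" refers to the first. *)
let by_index = Re2.find_first ~sub:(`Index 2) re "port=8080"
let by_name = Re2.find_first ~sub:(`Name "key") re "port=8080"

let () =
  match whole, by_index, by_name with
  | Ok w, Ok i, Ok n -> Printf.printf "%s | %s | %s\n" w i n
  | _ -> print_endline "no match"
```

Named groups still count as open-parens, so the same group can be reached either way.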
val create : ?options:Options.t list ‑> string ‑> t Core_kernel.Or_error.t
include Core_kernel.Stringable with type t := t
val of_string : string ‑> t
val to_string : t ‑> string
val num_submatches : t ‑> int
num_submatches t returns 1 + the number of open-parens in the pattern.
N.B. num_submatches t == 1 + RE2::NumberOfCapturingGroups() because RE2::NumberOfCapturingGroups() ignores the whole match ("subpattern zero").
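For instance, a small sketch (assuming the re2 opam package is linked; the pattern is made up for illustration) with three capturing groups:

```ocaml
(* Three open-parens, one of them nested. *)
let re = Re2.of_string "(a)(b(c))"

(* The whole match counts as subpattern zero, so the result is 3 + 1. *)
let n = Re2.num_submatches re

let () = Printf.printf "%d\n" n
```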
val find_all : ?sub:id_t ‑> t ‑> string ‑> string list Core_kernel.Or_error.t
find_all t input is a convenience function that returns all non-overlapping matches of t against input, in left-to-right order.
If sub is given, and the requested subpattern did not capture, then no match is returned at that position even if other parts of the regex did match.
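A minimal usage sketch, assuming the re2 opam package is linked; the pattern and input are illustrative only:

```ocaml
let digits = Re2.of_string "\\d+"

(* Matches come back in left-to-right order and never overlap. *)
let found = Re2.find_all digits "a1b22c333"

let () =
  match found with
  | Ok matches -> List.iter print_endline matches
  | Error _ -> print_endline "find_all failed"
```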
val find_first : ?sub:id_t ‑> t ‑> string ‑> string Core_kernel.Or_error.t
find_first ?sub pattern input finds the first match of pattern in input, and returns the subpattern specified by sub, or an error if the subpattern didn't capture.
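The error case can be handled by matching on the result. The sketch below assumes the re2 opam package is linked; the pattern with an optional second group and the helper name describe are hypothetical:

```ocaml
(* The second group is optional, so it may not capture even when the
   pattern as a whole matches. *)
let re = Re2.of_string "(\\d+)(px)?"

(* Hypothetical helper: report the unit suffix if group 2 captured. *)
let describe input =
  match Re2.find_first ~sub:(`Index 2) re input with
  | Ok suffix -> "unit: " ^ suffix
  | Error _ -> "no unit captured"

let () =
  print_endline (describe "12px");
  print_endline (describe "12")
```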
val find_submatches : t ‑> string ‑> string option array Core_kernel.Or_error.t
find_submatches t input finds the first match and returns all submatches.
Element 0 is the whole match and element 1 is the first parenthesized submatch, etc.
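A short sketch of reading the returned array, assuming the re2 opam package is linked; the page-range pattern is hypothetical:

```ocaml
let range = Re2.of_string "(\\d+)-(\\d+)"

(* On success the array holds the whole match followed by each group. *)
let subs = Re2.find_submatches range "pages 10-20"

let show = function
  | Some s -> s
  | None -> "<did not capture>"

let () =
  match subs with
  | Ok arr -> Array.iteri (fun i s -> Printf.printf "%d: %s\n" i (show s)) arr
  | Error _ -> print_endline "no match"
```

Each element is an option because a group may fail to capture even when the pattern as a whole matches.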
val find_submatches_exn : t ‑> string ‑> string option array
val split : ?max:int ‑> ?include_matches:bool ‑> t ‑> string ‑> string list
split pattern input returns input broken into pieces where pattern matches. Subpatterns are ignored.
Parameter max: split only at the leftmost max matches.
Parameter include_matches: (default: false) include the matched substrings in the returned list (e.g., the regex /[,()]/ on "foo(bar,baz)" gives ["foo"; "("; "bar"; ","; "baz"; ")"] instead of ["foo"; "bar"; "baz"]).
If pattern never matches, the returned list has input as its one element.
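A minimal sketch of the three behaviours described above, assuming the re2 opam package is linked; the comma separator and inputs are illustrative:

```ocaml
let comma = Re2.of_string ","

(* Default: the separators themselves are dropped. *)
let pieces = Re2.split comma "a,b,c"

(* With ~include_matches:true the matched separators are interleaved. *)
let with_seps = Re2.split ~include_matches:true comma "a,b,c"

(* No match: the input comes back as the single element. *)
let unsplit = Re2.split comma "abc"

let () =
  List.iter
    (fun l -> print_endline (String.concat " | " l))
    [ pieces; with_seps; unsplit ]
```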
val rewrite : t ‑> template:string ‑> string ‑> string Core_kernel.Or_error.t
rewrite pattern ~template input is a convenience function for replace: instead of requiring an arbitrary transformation as a function, it accepts a template string with zero or more substrings of the form "\\n", each of which will be replaced by submatch n. For every match of pattern against input, the template will be specialized and then substituted for the matched substring.
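As an illustration, the following sketch swaps the two sides of each key=value pair. It assumes the re2 opam package is linked; the pattern, template, and input are hypothetical:

```ocaml
let pair = Re2.of_string "(\\w+)=(\\w+)"

(* In OCaml source, "\\2" is a backslash followed by the digit 2, which
   the template interprets as "submatch 2". *)
let swapped = Re2.rewrite_exn pair ~template:"\\2=\\1" "k=v, a=b"

let () = print_endline swapped
```

Text between matches (here the ", " separator) is copied through unchanged.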
val rewrite_exn : t ‑> template:string ‑> string ‑> string
val valid_rewrite_template : t ‑> template:string ‑> bool
valid_rewrite_template pattern ~template returns true iff template is a valid rewrite template for pattern.
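One way to use this check is to guard a rewrite whose template comes from user input. A sketch, assuming the re2 opam package is linked; safe_rewrite is a hypothetical helper, not part of the library:

```ocaml
let pair = Re2.of_string "(\\w+)=(\\w+)"

(* The pattern has two groups, so "\\3" refers to a submatch that cannot exist. *)
let ok = Re2.valid_rewrite_template pair ~template:"\\2=\\1"
let bad = Re2.valid_rewrite_template pair ~template:"\\3"

(* Hypothetical helper: only rewrite when the template is valid. *)
let safe_rewrite ~template input =
  if Re2.valid_rewrite_template pair ~template
  then Some (Re2.rewrite_exn pair ~template input)
  else None

let () = Printf.printf "%b %b\n" ok bad
```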
val escape : string ‑> string
escape nonregex returns a copy of nonregex with everything escaped (i.e., if the return value were compiled into a regex, it would match exactly the original input).
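A small round-trip sketch, assuming the re2 opam package is linked; the literal contains regex metacharacters on purpose:

```ocaml
(* "+" and "?" would otherwise be interpreted as regex operators. *)
let literal = "1+1=2?"

(* Compile the escaped literal and search for it inside a longer string. *)
let re = Re2.of_string (Re2.escape literal)
let found = Re2.find_first re "is 1+1=2? yes"

let () =
  match found with
  | Ok s -> print_endline s
  | Error _ -> print_endline "not found"
```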
module Infix : sig ... end
include sig ... end
val sexp_of_without_trailing_none : ('a ‑> Base.Sexp.t) ‑> 'a without_trailing_none ‑> Base.Sexp.t
val without_trailing_none : 'a ‑> 'a without_trailing_none
This type marks call sites affected by a bugfix that eliminated a trailing None. When you add this wrapper, check that your call site does not still work around the bug by dropping the last element.
module Match : sig ... end
val get_matches : ?sub:id_t ‑> ?max:int ‑> t ‑> string ‑> Match.t list Core_kernel.Or_error.t
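get_matches pattern input returns all non-overlapping matches of pattern against input.
Parameter max: return only up to max matches.
Parameter sub: returned matches will contain only the first sub submatches.
val to_sequence_exn : ?sub:id_t ‑> t ‑> string ‑> Match.t Core_kernel.Sequence.t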
val first_match : t ‑> string ‑> Match.t Core_kernel.Or_error.t
first_match pattern input returns the first match iff pattern matches input.
val replace : ?sub:id_t ‑> ?only:int ‑> f:(Match.t ‑> string) ‑> t ‑> string ‑> string Core_kernel.Or_error.t
replace ?sub ?only ~f pattern input returns an edited copy of input with every substring matched by pattern transformed by f.
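A minimal sketch of replace with a custom transformation, assuming the re2 opam package is linked. The signature of Match is elided above, so the call to Match.get_exn ~sub is an assumption about that module; shout is a hypothetical helper:

```ocaml
let word = Re2.of_string "[a-z]+"

(* Assumption: Match.get_exn ~sub returns the text of the given submatch;
   `Index 0 is the whole match. *)
let shout m = String.uppercase_ascii (Re2.Match.get_exn ~sub:(`Index 0) m)

(* Every lowercase run is replaced by its uppercase form. *)
let result = Re2.replace ~f:shout word "ab 12 cd"

let () =
  match result with
  | Ok s -> print_endline s
  | Error _ -> print_endline "replace failed"
```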
module Exceptions : sig ... end
module Multiple : sig ... end
module Parser : sig ... end
module Std : sig ... end
module Regex : sig ... end