Operations for escaping and unescaping strings, with parameterized escape and escapeworthy characters. Escaping/unescaping using this module is more efficient than using Pcre. Benchmark code can be found in core/benchmarks/string_escaping.ml.
escape_gen_exn escapeworthy_map escape_char returns a function that will escape a string s as follows: if (c1,c2) is in escapeworthy_map, then all occurrences of c1 are replaced by escape_char concatenated to c2.
Raises an exception if escapeworthy_map is not one-to-one. If escape_char is not in escapeworthy_map, then it will be escaped to itself.
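The rule above can be modelled with the standard library alone. The following is a minimal sketch, not the module's implementation: the name escape_gen_sketch, the use of invalid_arg, and the exact one-to-one check are assumptions made for illustration, and it returns a plain function rather than a staged one.

```ocaml
(* Hypothetical stdlib-only model of the documented behaviour. *)
let escape_gen_sketch ~escapeworthy_map ~escape_char =
  (* If escape_char has no mapping of its own, it escapes to itself. *)
  let map =
    if List.mem_assoc escape_char escapeworthy_map then escapeworthy_map
    else (escape_char, escape_char) :: escapeworthy_map
  in
  (* One-to-one check (an assumption of this sketch): no repeated source
     char and no repeated target char. *)
  let distinct l = List.length (List.sort_uniq Char.compare l) = List.length l in
  if not (distinct (List.map fst map) && distinct (List.map snd map)) then
    invalid_arg "escape_gen_sketch: escapeworthy_map is not one-to-one";
  fun s ->
    let buf = Buffer.create (String.length s) in
    String.iter
      (fun c ->
        match List.assoc_opt c map with
        | Some c2 ->
          (* c is escapeworthy: emit escape_char followed by its image. *)
          Buffer.add_char buf escape_char;
          Buffer.add_char buf c2
        | None -> Buffer.add_char buf c)
      s;
    Buffer.contents buf

let () =
  let escape =
    escape_gen_sketch ~escapeworthy_map:[ (',', 'c'); ('\n', 'n') ] ~escape_char:'_'
  in
  assert (escape "a,b_c\n" = "a_cb__c_n")
```

Validation happens once, when the map and escape character are supplied, so the returned function only pays for the per-character lookup.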
escape ~escapeworthy ~escape_char s is
escape_gen_exn ~escapeworthy_map:(List.zip_exn escapeworthy escapeworthy) ~escape_char
Duplicates and escape_char will be removed from escapeworthy, so no exception will be raised.
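As an illustration of why no exception can occur here, the sketch below first removes duplicates and escape_char from the list, then escapes every remaining character to itself. The name escape_simple_sketch is hypothetical and the code uses only the standard library; it models the described behaviour and is not the module's implementation.

```ocaml
(* Hypothetical stdlib-only model: each escapeworthy char escapes to itself. *)
let escape_simple_sketch ~escapeworthy ~escape_char s =
  (* Dedupe and drop escape_char, so the implied identity map is one-to-one. *)
  let chars =
    List.filter (fun c -> c <> escape_char) (List.sort_uniq Char.compare escapeworthy)
  in
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      (* escape_char itself is always escaped, as are the listed chars. *)
      if c = escape_char || List.mem c chars then Buffer.add_char buf escape_char;
      Buffer.add_char buf c)
    s;
  Buffer.contents buf

let () =
  assert (escape_simple_sketch ~escapeworthy:[ ','; ','; '_' ] ~escape_char:'_' "a,b_c" = "a_,b__c")
```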
unescape_gen_exn is the inverse operation of escape_gen_exn. That is,
let escape = Staged.unstage (escape_gen_exn ~escapeworthy_map ~escape_char) in
let unescape = Staged.unstage (unescape_gen_exn ~escapeworthy_map ~escape_char) in
assert (s = unescape (escape s))
always succeeds when ~escapeworthy_map does not cause an exception.
unescape ~escape_char is defined as unescape_gen_exn ~escapeworthy_map:[] ~escape_char.
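The round-trip property can be checked against a small stdlib-only model. This is a sketch under assumptions: escape_map_sketch and unescape_map_sketch are hypothetical names, map validation is omitted, and a trailing lone escape_char is copied through unchanged, which is a choice of this sketch rather than documented behaviour.

```ocaml
(* Hypothetical stdlib-only model of escaping with a (c1, c2) map. *)
let escape_map_sketch ~escapeworthy_map ~escape_char s =
  let map =
    if List.mem_assoc escape_char escapeworthy_map then escapeworthy_map
    else (escape_char, escape_char) :: escapeworthy_map
  in
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match List.assoc_opt c map with
      | Some c2 -> Buffer.add_char buf escape_char; Buffer.add_char buf c2
      | None -> Buffer.add_char buf c)
    s;
  Buffer.contents buf

let unescape_map_sketch ~escapeworthy_map ~escape_char s =
  (* Invert the map: an escaped c2 turns back into c1. *)
  let inverse = List.map (fun (c1, c2) -> (c2, c1)) escapeworthy_map in
  let n = String.length s in
  let buf = Buffer.create n in
  let rec go i =
    if i < n then
      if s.[i] = escape_char && i + 1 < n then begin
        let c = s.[i + 1] in
        (* Escaped chars without a mapping stand for themselves. *)
        Buffer.add_char buf (Option.value (List.assoc_opt c inverse) ~default:c);
        go (i + 2)
      end else begin
        Buffer.add_char buf s.[i];
        go (i + 1)
      end
  in
  go 0;
  Buffer.contents buf

let () =
  let escapeworthy_map = [ (',', 'c'); ('\n', 'n') ] and escape_char = '_' in
  let s = "a,b_c\n" in
  let escaped = escape_map_sketch ~escapeworthy_map ~escape_char s in
  assert (s = unescape_map_sketch ~escapeworthy_map ~escape_char escaped);
  (* With an empty map, unescaping only strips the escape characters. *)
  assert (unescape_map_sketch ~escapeworthy_map:[] ~escape_char "foo_,bar__" = "foo,bar_")
```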
Any char in an escaped string is either escaping, escaped, or literal. For example, for the escaped string "0_a0__0" with escape_char '_', positions 1 and 4 are escaping, 2 and 5 are escaped, and the rest are literal.
is_char_escaping s ~escape_char pos returns true if the char at pos is escaping, false otherwise.
is_char_escaped s ~escape_char pos returns true if the char at pos is escaped, false otherwise.
is_literal s ~escape_char pos returns true if the char at pos is neither escaped nor escaping.
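The three categories can be computed for a whole string with one left-to-right scan. The sketch below is a stdlib-only illustration; the type kind and the function classify are hypothetical names, and treating a trailing lone escape_char as escaping is an assumption of the sketch. It reproduces the classification of "0_a0__0" given earlier.

```ocaml
type kind = Escaping | Escaped | Literal

(* Hypothetical helper: classify every position of an escaped string. *)
let classify ~escape_char s =
  let n = String.length s in
  let kinds = Array.make n Literal in
  let rec go i =
    if i < n then
      if s.[i] = escape_char then begin
        (* An escape char reached here is escaping; the next char is escaped. *)
        kinds.(i) <- Escaping;
        if i + 1 < n then kinds.(i + 1) <- Escaped;
        go (i + 2)
      end else go (i + 1)
  in
  go 0;
  kinds

let () =
  assert (
    classify ~escape_char:'_' "0_a0__0"
    = [| Literal; Escaping; Escaped; Literal; Escaping; Escaped; Literal |])
```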
index s ~escape_char char finds the first literal (not escaped) instance of char in s, starting from 0.
rindex s ~escape_char char finds the first literal (not escaped) instance of char in s, starting from the end of s and proceeding towards 0.
index_from s ~escape_char pos char finds the first literal (not escaped) instance of char in s, starting from pos and proceeding towards the end of s.
rindex_from s ~escape_char pos char finds the first literal (not escaped) instance of char in s, starting from pos and proceeding towards 0.
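A sketch of the forward and backward searches follows, using only the standard library. The names is_literal_at, index_from_sketch and rindex_from_sketch are hypothetical, and returning an option when nothing is found is an assumption of the sketch. A position is treated as escaped when an odd number of escape characters immediately precedes it.

```ocaml
(* Hypothetical helper: true when s.[pos] is neither escaped nor escaping. *)
let is_literal_at ~escape_char s pos =
  (* Count the run of escape chars that ends just before pos. *)
  let rec preceding i count =
    if i >= 0 && s.[i] = escape_char then preceding (i - 1) (count + 1) else count
  in
  preceding (pos - 1) 0 mod 2 = 0 && s.[pos] <> escape_char

(* Search towards the end of s, starting at pos. *)
let index_from_sketch s ~escape_char pos char =
  let n = String.length s in
  let rec go i =
    if i >= n then None
    else if s.[i] = char && is_literal_at ~escape_char s i then Some i
    else go (i + 1)
  in
  go pos

(* Search towards 0, starting at pos. *)
let rindex_from_sketch s ~escape_char pos char =
  let rec go i =
    if i < 0 then None
    else if s.[i] = char && is_literal_at ~escape_char s i then Some i
    else go (i - 1)
  in
  go pos

let () =
  let s = "a_,b,c,d" in
  (* The comma at position 2 is escaped, so the first literal one is at 4. *)
  assert (index_from_sketch s ~escape_char:'_' 0 ',' = Some 4);
  assert (rindex_from_sketch s ~escape_char:'_' 7 ',' = Some 6)
```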
split s ~escape_char ~on returns a list of substrings of s that are separated by literal versions of on. Consecutive on characters will cause multiple empty strings in the result. Splitting the empty string returns a list of the empty string, not the empty list.
For example, split ~escape_char:'_' ~on:',' "foo,bar_,baz" = ["foo"; "bar_,baz"].
split_on_chars s ~on returns a list of all substrings of s that are separated by one of the literal chars from on. The chars in on are not grouped, so a run of on chars in the source string will produce multiple empty strings in the result.
For example, split_on_chars ~escape_char:'_' ~on:[',';'|'] "foo_|bar,baz|0" -> ["foo_|bar"; "baz"; "0"].
lsplit2 s ~on ~escape_char splits s into a pair on the first literal instance of on (meaning the first unescaped instance), starting from the left.
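Splitting on several characters and splitting into a pair can be modelled in the same way. In the stdlib-only sketch below, split_on_chars_sketch and lsplit2_sketch are hypothetical names, and returning None from lsplit2_sketch when no literal separator exists is an assumption of the sketch.

```ocaml
(* Hypothetical model: split on any literal char from the list on. *)
let split_on_chars_sketch s ~escape_char ~on =
  let n = String.length s in
  let rec go i start acc =
    if i >= n then List.rev (String.sub s start (n - start) :: acc)
    else if s.[i] = escape_char then go (i + 2) start acc (* skip the escaped char *)
    else if List.mem s.[i] on then
      go (i + 1) (i + 1) (String.sub s start (i - start) :: acc)
    else go (i + 1) start acc
  in
  go 0 0 []

(* Hypothetical model: split once, at the leftmost literal occurrence of on. *)
let lsplit2_sketch s ~on ~escape_char =
  let n = String.length s in
  let rec find i =
    if i >= n then None
    else if s.[i] = escape_char then find (i + 2)
    else if s.[i] = on then Some i
    else find (i + 1)
  in
  Option.map
    (fun i -> (String.sub s 0 i, String.sub s (i + 1) (n - i - 1)))
    (find 0)

let () =
  assert (
    split_on_chars_sketch "foo_|bar,baz|0" ~escape_char:'_' ~on:[ ','; '|' ]
    = [ "foo_|bar"; "baz"; "0" ]);
  (* The first '=' is escaped, so the split happens at the second one. *)
  assert (lsplit2_sketch "k_=1=v=2" ~on:'=' ~escape_char:'_' = Some ("k_=1", "v=2"))
```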