Operations for escaping and unescaping strings, with parameterized escape and escapeworthy characters. Escaping/unescaping using this module is more efficient than using Pcre. Benchmark code can be found in core/benchmarks/string_escaping.ml.
escape_gen_exn ~escapeworthy_map ~escape_char returns a function that escapes a
string s as follows: if (c1, c2) is in escapeworthy_map, then every occurrence of
c1 is replaced by escape_char followed by c2.
Raises an exception if escapeworthy_map is not one-to-one. If escape_char is
not in escapeworthy_map, it is escaped to itself.
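A minimal sketch of escape_gen_exn, assuming only the OCaml standard library. It is not Core's implementation: it returns a plain string -> string closure rather than a Staged.t, and raising Invalid_argument for a map that is not one-to-one is an assumption of this sketch. It shows the two documented rules: the implicit (escape_char, escape_char) entry, and the one-to-one check performed when the escaper is built, before any string is escaped. The helper build_escape_table is hypothetical.

```ocaml
(* Stdlib-only sketch of escape_gen_exn; not Core's implementation.
   Core returns the escaper wrapped in Staged.t; this returns a plain closure. *)

(* Hypothetical helper: build a 256-entry table from each escapeworthy char to
   its replacement. (escape_char, escape_char) is added when escape_char is not
   already a key. Raises Invalid_argument (an assumption) if a key or a value
   appears twice, i.e. if the map is not one-to-one. *)
let build_escape_table escapeworthy_map escape_char =
  let map =
    if List.mem_assoc escape_char escapeworthy_map then escapeworthy_map
    else (escape_char, escape_char) :: escapeworthy_map
  in
  let table = Array.make 256 None in
  let value_used = Array.make 256 false in
  List.iter
    (fun (c_from, c_to) ->
      if table.(Char.code c_from) <> None || value_used.(Char.code c_to) then
        invalid_arg "escape_gen_exn: escapeworthy_map is not one-to-one";
      table.(Char.code c_from) <- Some c_to;
      value_used.(Char.code c_to) <- true)
    map;
  table

let escape_gen_exn ~escapeworthy_map ~escape_char =
  (* Validation happens here, once, before any string is escaped. *)
  let table = build_escape_table escapeworthy_map escape_char in
  fun s ->
    let buf = Buffer.create (String.length s) in
    String.iter
      (fun c ->
        match table.(Char.code c) with
        | Some c2 ->
            (* Escapeworthy char c becomes escape_char followed by c2. *)
            Buffer.add_char buf escape_char;
            Buffer.add_char buf c2
        | None -> Buffer.add_char buf c)
      s;
    Buffer.contents buf
```

For example, with escape_char '\\' and the map [('\n', 'n'); ('"', '"')], a newline becomes the two characters \n, a double quote becomes \", and a backslash becomes \\ through the implicit entry.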
escape ~escapeworthy ~escape_char is defined as
    escape_gen_exn ~escapeworthy_map:(List.zip_exn escapeworthy escapeworthy) ~escape_char
Duplicates and escape_char are removed from escapeworthy first, so no
exception is raised.
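A minimal Stdlib-only sketch of escape (not Core's implementation), assuming a plain closure instead of a Staged.t. Because the map is the identity, escaping reduces to prefixing every escapeworthy char, and escape_char itself, with escape_char. The normalization step mirrors the documented removal of duplicates and escape_char; with the List.mem lookup used here it does not change the output, but it shows why no exception can arise.

```ocaml
(* Stdlib-only sketch of escape; not Core's implementation.
   Equivalent to escape_gen_exn with the identity map (c, c): each char in
   escapeworthy, and escape_char itself, is prefixed with escape_char. *)
let escape ~escapeworthy ~escape_char =
  (* Normalize as documented: drop duplicates and escape_char. escape_char is
     always escaped by the check below, so nothing here can fail. *)
  let escapeworthy =
    List.filter (fun c -> c <> escape_char) (List.sort_uniq Char.compare escapeworthy)
  in
  fun s ->
    let buf = Buffer.create (String.length s) in
    String.iter
      (fun c ->
        if c = escape_char || List.mem c escapeworthy then
          Buffer.add_char buf escape_char;
        Buffer.add_char buf c)
      s;
    Buffer.contents buf
```

For example, with escapeworthy [','; ','; '_'] and escape_char '_', the string a,b_c becomes a_,b__c even though ',' is listed twice and '_' is listed explicitly.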
unescape_gen_exn is the inverse operation of escape_gen_exn. That is,
    let escape = Staged.unstage (escape_gen_exn ~escapeworthy_map ~escape_char) in
    let unescape = Staged.unstage (unescape_gen_exn ~escapeworthy_map ~escape_char) in
    assert (s = unescape (escape s))
always succeeds as long as escapeworthy_map does not cause an exception.
unescape ~escape_char is defined as unescape_gen_exn ~escapeworthy_map:[] ~escape_char.
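A minimal Stdlib-only sketch of unescape_gen_exn and unescape (not Core's implementation), assuming plain closures instead of Staged.t values. It inverts the escape table, mapping each escaped char c2 back to c1, and applies the same one-to-one check. Two behaviours are assumptions of this sketch rather than documented facts: an escaped char missing from the table is kept unchanged (the escape_char is dropped), and a trailing lone escape_char is kept as-is.

```ocaml
(* Stdlib-only sketch of unescape_gen_exn and unescape; not Core's code.
   Assumptions: an escaped char absent from the table is kept unchanged, and a
   trailing lone escape_char is copied through. *)
let unescape_gen_exn ~escapeworthy_map ~escape_char =
  let map =
    if List.mem_assoc escape_char escapeworthy_map then escapeworthy_map
    else (escape_char, escape_char) :: escapeworthy_map
  in
  (* Inverse table: escaped char c2 -> original char c1. *)
  let inverse = Array.make 256 None in
  let original_used = Array.make 256 false in
  List.iter
    (fun (c_from, c_to) ->
      if inverse.(Char.code c_to) <> None || original_used.(Char.code c_from) then
        invalid_arg "unescape_gen_exn: escapeworthy_map is not one-to-one";
      inverse.(Char.code c_to) <- Some c_from;
      original_used.(Char.code c_from) <- true)
    map;
  fun s ->
    let n = String.length s in
    let buf = Buffer.create n in
    let rec loop i =
      if i < n then
        if s.[i] = escape_char && i + 1 < n then begin
          (* s.[i] is escaping; s.[i + 1] is the escaped char. *)
          let c = s.[i + 1] in
          (match inverse.(Char.code c) with
           | Some orig -> Buffer.add_char buf orig
           | None -> Buffer.add_char buf c);
          loop (i + 2)
        end else begin
          Buffer.add_char buf s.[i];
          loop (i + 1)
        end
    in
    loop 0;
    Buffer.contents buf

(* unescape relies only on the implicit (escape_char, escape_char) entry. *)
let unescape ~escape_char = unescape_gen_exn ~escapeworthy_map:[] ~escape_char
```

With the map [('\n', 'n')] and escape_char '\\', the characters \n turn back into a newline and \\ into a single backslash, which is the round trip the assertion above describes.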
Every char in an escaped string is either escaping, escaped, or literal. For example, in the escaped string "0_a0__0" with escape_char '_', positions 1 and 4 are escaping, positions 2 and 5 are escaped, and the rest are literal.
is_char_escaping s ~escape_char pos returns true if the char at pos is escaping,
false otherwise.
is_char_escaped s ~escape_char pos returns true if the char at pos is escaped,
false otherwise.
is_literal s ~escape_char pos returns true if the char at pos is neither escaped
nor escaping.
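One way to compute this classification, sketched with the OCaml standard library only (not Core's implementation): count the consecutive escape_chars immediately before pos. An odd count means the char is escaped; otherwise it is escaping if it is escape_char and literal if not. The status type and the helpers preceding_escape_chars and escape_status are hypothetical names introduced for this sketch.

```ocaml
(* Stdlib-only sketch of the escaping/escaped/literal classification.
   status, preceding_escape_chars and escape_status are hypothetical names. *)
type status = Escaping | Escaped | Literal

(* Number of consecutive escape_chars immediately before [pos]. The loop only
   walks back over that run, not over the whole prefix. *)
let preceding_escape_chars s ~escape_char pos =
  let rec count i acc =
    if i >= 0 && s.[i] = escape_char then count (i - 1) (acc + 1) else acc
  in
  count (pos - 1) 0

let escape_status s ~escape_char pos =
  (* An odd run means the char right before pos is escaping, so pos is escaped. *)
  if preceding_escape_chars s ~escape_char pos mod 2 = 1 then Escaped
  else if s.[pos] = escape_char then Escaping
  else Literal

let is_char_escaping s ~escape_char pos = escape_status s ~escape_char pos = Escaping
let is_char_escaped s ~escape_char pos = escape_status s ~escape_char pos = Escaped
let is_literal s ~escape_char pos = escape_status s ~escape_char pos = Literal
```

Applied to every position of "0_a0__0" with escape_char '_', this yields Literal, Escaping, Escaped, Literal, Escaping, Escaped, Literal, matching the example.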
index s ~escape_char char finds the first literal (not escaped) instance of
char in s, starting from 0.
rindex s ~escape_char char finds the first literal (not escaped) instance of
char in s, starting from the end of s and proceeding towards 0.
index_from s ~escape_char pos char finds the first literal (not escaped)
instance of char in s, starting from pos and proceeding towards the end of s.
rindex_from s ~escape_char pos char finds the first literal (not escaped)
instance of char in s, starting from pos and proceeding towards 0.
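A minimal Stdlib-only sketch of the four search functions (not Core's implementation). Returning int option and omitting bounds checks on pos are assumptions of this sketch. A position matches only if it holds the target char and is literal, so an escaping escape_char is never reported; literal_match and preceding_escape_chars are hypothetical helpers.

```ocaml
(* Stdlib-only sketch of index, rindex, index_from and rindex_from; not Core's
   implementation. Results are int option; bounds checks on pos are omitted. *)

(* Hypothetical helper: consecutive escape_chars immediately before [pos]. *)
let preceding_escape_chars s ~escape_char pos =
  let rec count i acc =
    if i >= 0 && s.[i] = escape_char then count (i - 1) (acc + 1) else acc
  in
  count (pos - 1) 0

(* Hypothetical helper: the char at [pos] equals [c] and is literal, i.e. it is
   not escape_char and is not preceded by an odd run of escape_chars. *)
let literal_match s ~escape_char pos c =
  s.[pos] = c
  && s.[pos] <> escape_char
  && preceding_escape_chars s ~escape_char pos mod 2 = 0

let index_from s ~escape_char pos c =
  let rec go i =
    if i >= String.length s then None
    else if literal_match s ~escape_char i c then Some i
    else go (i + 1)
  in
  go pos

let rindex_from s ~escape_char pos c =
  let rec go i =
    if i < 0 then None
    else if literal_match s ~escape_char i c then Some i
    else go (i - 1)
  in
  go pos

let index s ~escape_char c = index_from s ~escape_char 0 c
let rindex s ~escape_char c = rindex_from s ~escape_char (String.length s - 1) c
```

In "a_,b,c" with escape_char '_', the comma at position 2 is escaped, so both index and rindex for ',' report position 4.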
split s ~escape_char ~on returns a list of substrings of s that are separated by
literal versions of on. Consecutive on characters cause multiple empty
strings in the result. Splitting the empty string returns a list of the empty
string, not the empty list.
For example, split ~escape_char:'_' ~on:',' "foo,bar_,baz" = ["foo"; "bar_,baz"].
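A minimal Stdlib-only sketch of split (not Core's implementation). It makes a single left-to-right pass, carrying a flag that is true when the previous char was an escaping escape_char, so the current char cannot act as a separator. Emitting the final piece unconditionally is what makes splitting "" return [""].

```ocaml
(* Stdlib-only sketch of split; not Core's implementation.
   [start] is the first index of the current piece; [escaped] is true when the
   previous char was an escaping escape_char. *)
let split s ~escape_char ~on =
  let n = String.length s in
  let rec go i start escaped acc =
    if i = n then List.rev (String.sub s start (n - start) :: acc)
    else if escaped then go (i + 1) start false acc
    else if s.[i] = escape_char then go (i + 1) start true acc
    else if s.[i] = on then
      (* Literal separator: close the current piece, start a new one. *)
      go (i + 1) (i + 1) false (String.sub s start (i - start) :: acc)
    else go (i + 1) start false acc
  in
  go 0 0 false []
```

Pieces keep their escape sequences; unescaping them is a separate step.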
split_on_chars s ~escape_char ~on returns a list of all substrings of s that are
separated by one of the literal chars from on. The chars in on are not grouped, so
a run of on chars in the source string produces multiple empty strings in the result.
For example, split_on_chars ~escape_char:'_' ~on:[','; '|'] "foo_|bar,baz|0" returns
["foo_|bar"; "baz"; "0"].
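A minimal Stdlib-only sketch of split_on_chars (not Core's implementation). It is the same single pass as a single-separator split, except that any literal char in on closes the current piece. Because each separator closes a piece on its own, adjacent separators are not grouped and produce empty strings.

```ocaml
(* Stdlib-only sketch of split_on_chars; not Core's implementation.
   Any literal (unescaped) char that belongs to [on] ends the current piece. *)
let split_on_chars s ~escape_char ~on =
  let n = String.length s in
  let rec go i start escaped acc =
    if i = n then List.rev (String.sub s start (n - start) :: acc)
    else if escaped then go (i + 1) start false acc
    else if s.[i] = escape_char then go (i + 1) start true acc
    else if List.mem s.[i] on then
      go (i + 1) (i + 1) false (String.sub s start (i - start) :: acc)
    else go (i + 1) start false acc
  in
  go 0 0 false []
```

In "foo_|bar,baz|0" the first '|' is escaped, so the string splits only at the ',' and the second '|'.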
lsplit2 s ~on ~escape_char splits s into a pair on the first literal instance of
on (meaning the first unescaped instance), starting from the left.
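A minimal Stdlib-only sketch of lsplit2 (not Core's implementation). The labels ~on and ~escape_char, the (string * string) option return type, and dropping the separator from both halves are assumptions of this sketch. It scans from the left with the same escaped flag used for splitting and stops at the first literal on.

```ocaml
(* Stdlib-only sketch of lsplit2; not Core's implementation.
   Assumed: returns Some (before, after) around the first literal [on], with
   the separator itself dropped, or None if there is no literal [on]. *)
let lsplit2 s ~on ~escape_char =
  let n = String.length s in
  let rec go i escaped =
    if i >= n then None
    else if escaped then go (i + 1) false
    else if s.[i] = escape_char then go (i + 1) true
    else if s.[i] = on then
      Some (String.sub s 0 i, String.sub s (i + 1) (n - i - 1))
    else go (i + 1) false
  in
  go 0 false
```

For "a_=b=c=d" with on '=' and escape_char '_', the first '=' is escaped, so the split happens at the second one, giving "a_=b" and "c=d".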