module Extended_string:Extensions tosig
..end
Core.Core_String
.val collate : string -> string -> int
collate s1 s2
sorts string in an order that's is usaully more suited
for human consumption by treating ints specificaly:
(e.g. it will output: ["rfc1.txt";"rfc822.txt";"rfc2086.txt"]
).
It works by splitting the strings in numerical and non numerical chunks and comparing chunks two by two from left to right (and starting on a non numerical chunks):
val unescaped : ?strict:bool -> string -> string
unescaped s
is the inverse operation of escaped
: it takes a string where
all the special characters are escaped following the lexical convention of
OCaml and returns an unescaped copy.
The strict
switch is on by default and makes the function treat illegal
backslashes as errors.
When strict
is false
every illegal backslash except escaped numeral
greater than 255
is copied literally. The aforementioned numerals still
raise errors. This mimics the behaviour of the ocaml lexer.val unescaped_res : ?strict:bool -> string -> (string, int * string) Core.Result.t
unescaped
but instead of raising Failure _
returns an error
message with the position in the string in case of failure.val squeeze : string -> string
squeeze str
reduces all sequences of spaces, newlines, tables, and
* carriage returns to single spaces.val is_substring : substring:string -> string -> bool
is_substring ~substring t
returns true
if substring is a substring
* of t.val pad_left : ?char:char -> string -> int -> string
pad_left ~char s len
Returns s
padded to the length len
by adding characters char
to the
left of the string. If s is already longer than len
it is returned unchanged.val pad_right : ?char:char -> string -> int -> string
val line_break : len:int -> string -> string list
val word_wrap : ?trailing_nl:bool ->
?soft_limit:int -> ?hard_limit:int -> ?nl:string -> string -> string
word_wrap ~soft_limit s
Wraps the string so that it fits the length soft_limit
. It doesn't break
words unless we go over hard_limit
.
if nl
is passed it is inserted instead of the normal newline character.
val consolidate_strings : ?int_sets:[ `Asterisk | `Bounds | `Exact ] -> string list -> string
^
) losslessly. E.g.:
abc-def-1-ghijk abc-def-2-ghijk abc-def-5-ghijk abc-xyz-2 abc-xyz-3
becomes:
abc-
The algorithm is conceptually as follows:
1) if all strings are sequences of digits, return contiguous subranges
,
otherwise:
2) split all strings into groups of consecutive letters or digits (but not both)
3) break the list into sublists by the 1st and last token (1st token has precedence)
4) for each sublist, find the longest prefix and suffix common for all entries
5) replace each sublist with:
"common_prefix-<result_of_the_recursive_call>-common_suffix"
6) return String.concat ~sep:"," <compressed_sublists>
In the implementation, we only tokenize the strings once, and then recursively work with lists of tokens.
^
Repeated entries and the original ordering of the list are not preserved.
Otherwise, this transformation is lossless.
Additionally, one may choose to represent sets of integers as the lower..upper bound
(so 1-2,5
becomes 1..5
), or just by an asterisk ("*"), when brevity is preferred
over accuracy.
^
Repeated entries and the original ordering of the list are not preserved.
Otherwise, this transformation is lossless.
val consolidate_strings' : max_len:int -> string list -> string
1) plain consolidate_strings
2) consolidate_strings ~int_sets:`Bounds
3) consolidate_strings ~int_sets:`Asterisk
4) prefix of 3 ^ "..."
val edit_distance : ?transpose:unit -> string -> string -> int
transpose
argument, it alsos considers transpositions (Damerau-Levenshtein
distance).