# Module `Patience_diff.String`

`val get_matching_blocks : transform:('a -> elt) -> ?big_enough:int -> prev:'a array -> next:'a array -> Matching_block.t list`

`get_matching_blocks` not only aggregates the data from `matches a b` but also attempts to remove random, semantically meaningless matches ("semantic cleanup"). The value of `big_enough` governs how aggressively we do so. See `get_hunks` below for more details.

`val matches : elt array -> elt array -> (int * int) list`

`matches a b` returns a list of pairs `(i, j)` such that `a.(i) = b.(j)` and such that the list is strictly increasing in both its first and second coordinates. This is essentially an "unfolded" version of what `get_matching_blocks` returns: instead of grouping consecutive matches into a block with a `length`, this function returns every individual pair `(prev_start, next_start)`.
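The relationship between blocks and unfolded pairs can be sketched with the standard library alone. The record type `block` below is a local stand-in, and its field names (`prev_start`, `next_start`, `length`) are an assumption based on the names used in the text above; `unfold_block` and `unfold` are hypothetical helpers, not part of this module.

```ocaml
(* Local stand-in for a matching block; the field names are an assumption. *)
type block = { prev_start : int; next_start : int; length : int }

(* Expand one block into its individual (i, j) index pairs. *)
let unfold_block { prev_start; next_start; length } =
  List.init length (fun k -> (prev_start + k, next_start + k))

(* Expand a list of blocks, preserving their order. *)
let unfold blocks = List.concat_map unfold_block blocks

let example =
  unfold
    [ { prev_start = 0; next_start = 0; length = 2 };
      { prev_start = 3; next_start = 2; length = 1 } ]
(* example = [(0, 0); (1, 1); (3, 2)] *)
```

A block of length 2 starting at `(0, 0)` contributes two pairs, so the unfolded list is strictly increasing in both coordinates, as described for `matches`.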

`val match_ratio : elt array -> elt array -> float`

`match_ratio a b` computes the ratio defined as: `2 * len (matches a b) / (len a + len b)`

It is an indication of how much alike `a` and `b` are. A ratio close to 1.0 means the number of matches is close to the number of elements that could potentially match, which is a sign that `a` and `b` are very much alike. On the other hand, a low ratio means very few matches.
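As a worked instance of the formula, the sketch below computes the ratio from a match list supplied by hand. `ratio_of_matches` is a hypothetical helper, and the match list is an assumption about what `matches` might return for these inputs, not output captured from the library.

```ocaml
(* The formula above, taking an already-computed match list as input. *)
let ratio_of_matches ~matches ~len_a ~len_b =
  2. *. float_of_int (List.length matches) /. float_of_int (len_a + len_b)

(* Hand-picked matches for a = [|"a"; "b"; "c"; "d"|] and
   b = [|"a"; "c"; "d"; "e"|]: "a", "c" and "d" line up at
   (0, 0), (2, 1) and (3, 2). *)
let r = ratio_of_matches ~matches:[ (0, 0); (2, 1); (3, 2) ] ~len_a:4 ~len_b:4
(* r = 2 * 3 / (4 + 4) = 0.75 *)
```

Three of the four elements on each side match, so the ratio lands at 0.75; two identical arrays would give 1.0.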

`val get_hunks : transform:('a -> elt) -> context:int -> ?big_enough:int -> prev:'a array -> next:'a array -> 'a Hunk.t list`

`get_hunks ~transform ~context ~prev ~next` will compare the arrays `prev` and `next` and produce a list of hunks. (The hunks will contain Same ranges of at most `context` elements.) Negative `context` is equivalent to infinity (producing a singleton hunk list). The value of `big_enough` governs how aggressively we try to clean up spurious matches, by restricting our attention to only matches of length less than `big_enough`. Thus, setting `big_enough` to a higher value results in more aggressive cleanup, and the default value of 1 results in no cleanup at all. When this function is called by `Patdiff_core`, the value of `big_enough` is 3 at the line level, and 7 at the word level.
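The effect of `big_enough` on which matches are even considered can be sketched as follows. This does not reproduce the library's cleanup heuristic; it only illustrates the candidacy rule stated above (a match is looked at only if its length is less than `big_enough`). The `block` record and the `cleanup_candidates` and `count` helpers are hypothetical.

```ocaml
(* Local stand-in for a matching block; the field names are an assumption. *)
type block = { prev_start : int; next_start : int; length : int }

(* Split blocks into (cleanup candidates, blocks left alone). *)
let cleanup_candidates ~big_enough blocks =
  List.partition (fun b -> b.length < big_enough) blocks

let blocks =
  [ { prev_start = 0; next_start = 0; length = 1 };
    { prev_start = 4; next_start = 5; length = 2 };
    { prev_start = 9; next_start = 10; length = 6 } ]

(* Number of blocks that would be examined for a given threshold. *)
let count ~big_enough =
  List.length (fst (cleanup_candidates ~big_enough blocks))
```

With the default of 1, no block qualifies because every block has length at least 1, which matches the statement that the default performs no cleanup. With 3, the two shortest blocks qualify; with 7, all three do.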

`type 'a segment = | Same of 'a array | Different of 'a array array`

`type 'a merged_array = 'a segment list`

`val merge : elt array array -> elt merged_array`
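This section does not describe what `merge` computes, so the sketch below only shows how a value of type `'a merged_array` might be consumed. The two types are copied locally from the declarations above; `count_different` is a hypothetical helper, `sample` is built by hand rather than produced by `merge`, and reading each inner array of `Different` as one input's version is an assumption.

```ocaml
type 'a segment =
  | Same of 'a array
  | Different of 'a array array

type 'a merged_array = 'a segment list

(* Count the segments in which the inputs disagree. *)
let count_different (merged : 'a merged_array) =
  List.length
    (List.filter (function Different _ -> true | Same _ -> false) merged)

(* Hand-built value; not the output of an actual `merge` call. *)
let sample : string merged_array =
  [ Same [| "a"; "b" |];
    Different [| [| "x" |]; [| "y"; "z" |] |];
    Same [| "c" |] ]

let n = count_different sample
(* n = 1 *)
```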