Module Sexplib0__Raw_grammar
Representation of S-expression grammars
Goals and non-goals
Functionality goals: With post-processing, sexp grammars can be pretty-printed in a human-readable format and provide enough information to implement completion and validation tools.
Performance goals: [@@deriving sexp_grammar] adds minimal overhead and introduces no toplevel side effects. The compiler can lift the vast majority of ASTs generated by [@@deriving sexp_grammar] to global constants. Common sub-grammars are usually shared, particularly when they derive from multiple applications of the same functor.
Non-goals: Stability, although we will make changes backwards-compatible or at least provide a reasonable upgrade path.
In what follows, we describe how this is achieved.
Encoding of generated grammars to maximize sharing
A group contains the grammars for all types of a mutually recursive group of OCaml type declarations.
To ensure maximum sharing, a group is split into two parts:
- The generic_group depends only on the textual type declarations. Where the type declaration refers to an existing concrete type, the generic group takes a variable to represent the grammar of that type. This means that the compiler can lift each type declaration in the source code to a shared global constant.
- The group binds the type variables of the generic_group, either to concrete grammars where the type declaration refers to a concrete type, or to another variable where the type declaration itself was polymorphic.
To understand this point better, imagine that the following type declaration

    type t = X of u

were explicitly split into its generic_group and group parts:

    type 'u t_generic = X of 'u
    type t = u t_generic

If u came from a functor argument, t_generic would be exactly the same in all applications of the functor, and only t would vary. The grammar of t_generic, which is the biggest part, would therefore be shared between all applications of the functor.
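To make the sharing concrete, here is a minimal OCaml sketch. It uses a hypothetical, much-simplified mirror of the grammar types (only Atom_any, Implicit_var and a list-based Variant, with the records reduced to implicit_vars, types and apply_implicit) rather than the real Sexplib0 definitions, and a toy functor Make standing in for the functor in the text. Each application builds its own group, but all of them point at the same t_generic value.

```ocaml
(* Hypothetical, much-simplified mirror of the grammar types: only what
   this illustration needs. These are not the real Sexplib0 definitions. *)
type type_ =
  | Atom_any                              (* stands in for some concrete grammar *)
  | Implicit_var of int                   (* index into [implicit_vars] *)
  | Variant of (string * type_ list) list (* constructor name, argument grammars *)

type generic_group = {
  implicit_vars : string list;
  types : (string * type_) list;
}

type group = {
  generic_group : generic_group;
  apply_implicit : type_ list; (* grammars bound to [implicit_vars], in order *)
}

(* Generic part of [type t = X of u]. It mentions [u] only through
   [Implicit_var 0], so it depends on nothing but the declaration text and
   can be one shared constant. *)
let t_generic = {
  implicit_vars = [ "u" ];
  types = [ ("t", Variant [ ("X", [ Implicit_var 0 ]) ]) ];
}

(* A toy functor in the spirit of the text: each application supplies its
   own grammar for [u] but reuses [t_generic] unchanged. *)
module Make (U : sig val grammar : type_ end) = struct
  let group = { generic_group = t_generic; apply_implicit = [ U.grammar ] }
end

module A = Make (struct let grammar = Atom_any end)
module B = Make (struct let grammar = Variant [ ("Y", []) ] end)

(* Both groups point at the physically same generic group; only the
   bindings differ. *)
let shared = A.group.generic_group == B.group.generic_group
```

Here shared is true, while A.group.apply_implicit and B.group.apply_implicit differ.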
Processing of grammars
The Raw_grammar.t type optimizes for performance over ease of use. To help users process the raw grammars into a more usable form, we keep two identifiers in the generated grammars:
- The generic_group_id uniquely identifies a generic_group. It is a hash of the generic group itself. (It is okay that this scheme would conflate identical type declarations, because the resulting generic groups would be identical as well.)
- The group_id uniquely identifies a group. It is a unique integer, generated lazily so that no side effect occurs at module creation time.
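The lazily generated group_id can be pictured as a suspension that draws from a counter only when it is first forced. The sketch below is an illustration under assumptions: Lazy_id, create, force and allocated are hypothetical names, not the interface of Sexplib0__.Lazy_group_id. Creating ids at module initialization leaves the counter untouched; numbers are assigned in order of first use and then memoized.

```ocaml
(* Hypothetical sketch of a lazily numbered id; not the real
   Sexplib0__.Lazy_group_id interface. *)
module Lazy_id : sig
  type t
  val create : unit -> t      (* builds a suspension; the counter is untouched *)
  val force : t -> int        (* assigns a number on first force, then memoizes *)
  val allocated : unit -> int (* how many numbers have been handed out so far *)
end = struct
  type t = int Lazy.t
  let counter = ref 0
  let create () = lazy (incr counter; !counter)
  let force = Lazy.force
  let allocated () = !counter
end

(* Two ids created "at module initialization". *)
let a = Lazy_id.create ()
let b = Lazy_id.create ()
let before = Lazy_id.allocated () (* 0: nothing has been numbered yet *)
let id_b = Lazy_id.force b        (* 1: first id to be used *)
let id_a = Lazy_id.force a        (* 2 *)
let id_a_again = Lazy_id.force a  (* still 2: forcing is memoized *)
```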
The exact processing depends on the final application. We expect that a typical consumer of sexp grammars would define less indirect equivalents of the t and group types, possibly reusing the type_ and Atom.t types.
type generic_group_id = string
type group_id = Sexplib0__.Lazy_group_id.t
type var_name = string
Variable names. These are used to improve the readability of printed grammars. Internally, we use numerical indices to represent variables; see Implicit_var below.
module Atom : sig ... end
A grammatical type which classifies atoms.
type 't type_ =
  | Any
      Any list or atom.
  | Apply of 't type_ * 't type_ list
      Assign types to (explicit) type variables.
  | Atom of Atom.t
      An atom, specifically one matching the given Atom.t.
  | Explicit_bind of var_name list * 't type_
      In Explicit_bind ([ "a"; "b" ], Explicit_var 0), Explicit_var 0 is "a". One must bind all available type variables: free variables are not permitted.
  | Explicit_var of int
      Index of a type variable, e.g. 'a, introduced by a polymorphic definition. Unlike de Bruijn indices, these are always bound by the nearest ancestral Explicit_bind.
  | Grammar of 't
      Embeds other types in a grammar.
  | Implicit_var of int
      Index of a type constructor in scope, e.g. int. Unlike de Bruijn indices, these are always bound by the implicit_vars of the nearest enclosing generic_group.
  | List of 't sequence_type
      A list of a certain form. Depending on the sequence_type, this might correspond to an OCaml tuple, list, or embedded record.
  | Option of 't type_
      An optional value. Either syntax recognized by option_of_sexp is supported: (Some 42) or (42) for a value, and None or () for no value.
  | Record of 't record_type
      A list of lists, representing a record of the given record_type. For validation, Record recty is equivalent to List [ Fields recty ].
  | Recursive of type_name
      A type in the same mutually recursive group, possibly the current one.
  | Union of 't type_ list
      Any sexp matching any of the given types. Variant should be preferred when possible, especially for complex types, since validation and other algorithms may take exponential time on unions. One useful special case is Union [], the empty type. This is occasionally generated for things such as abstract types.
  | Variant of 't variant_type
      A sexp which matches the given variant_type.
A grammatical type which classifies sexps. Corresponds to a non-terminal in a context-free grammar.
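A sketch of how a consumer might interpret Explicit_bind, Explicit_var and Apply, assuming that Apply instantiates an Explicit_bind with its arguments in order (the documentation above does not spell out the exact semantics). It works on a hypothetical subset of type_ with the 't parameter dropped and Atom_string standing in for Atom Atom.t. Substitution stops at nested binders because every variable is bound by its nearest Explicit_bind and free variables are not permitted.

```ocaml
(* Hypothetical subset of [type_] with the ['t] parameter dropped;
   [Atom_string] stands in for [Atom Atom.t]. Not the real Sexplib0 type. *)
type type_ =
  | Any
  | Atom_string
  | Apply of type_ * type_ list
  | Explicit_bind of string list * type_
  | Explicit_var of int
  | Implicit_var of int
  | Option of type_
  | Union of type_ list
  | Recursive of string

(* Replace [Explicit_var i] by the i-th argument. Because every variable is
   bound by its nearest [Explicit_bind] and free variables are not allowed,
   a nested binder cannot mention our variables, so we stop there. *)
let rec subst args = function
  | Explicit_var i -> List.nth args i
  | Explicit_bind _ as inner -> inner
  | Apply (f, xs) -> Apply (subst args f, List.map (subst args) xs)
  | Option t -> Option (subst args t)
  | Union ts -> Union (List.map (subst args) ts)
  | (Any | Atom_string | Implicit_var _ | Recursive _) as t -> t

(* One reduction step, assuming [Apply] instantiates an [Explicit_bind]. *)
let beta = function
  | Apply (Explicit_bind (names, body), args)
    when List.length names = List.length args ->
    subst args body
  | t -> t

(* The documented example: in Explicit_bind (["a"; "b"], Explicit_var 0),
   Explicit_var 0 is "a", so applying it to [string; any] yields string. *)
let example =
  beta (Apply (Explicit_bind ([ "a"; "b" ], Explicit_var 0), [ Atom_string; Any ]))
```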
and 't sequence_type = 't component list
A grammatical type which classifies sequences of sexps. Here, a "sequence" may mean either a list on its own or, say, the sexps following a constructor in a list matching a variant_type.
Certain operations may greatly favor simple sequence types. For example, matching List [ Many type_ ] is easy for any type type_ (assuming type_ itself is easy), but List [ Many type1; Many type2 ] may require backtracking. Grammars derived from OCaml types will only have "nice" sequence types.
and 't component =
  | One of 't type_
      Exactly one sexp of the given type.
  | Optional of 't type_
      One sexp of the given type, or nothing at all.
  | Many of 't type_
      Any number of sexps, each of the given type.
  | Fields of 't record_type
      A succession of lists, collectively defining a record of the given record_type. The fields may appear in any order. The number of lists is not necessarily fixed, as some fields may be optional. In particular, if all fields are optional, there may be zero lists.
Part of a sequence of sexps.
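The remark about backtracking can be illustrated with a small matcher for a sequence of components. This is a sketch under simplifying assumptions: element grammars are plain predicates on a local sexp type, Fields is left out, and the search strategy (first try consuming nothing, then one more element) is just one possible choice. Every Optional or Many introduces a branch, so a failure late in the sequence can force the matcher to revisit many possible splits.

```ocaml
type sexp = Atom of string | List of sexp list

(* Hypothetical simplification: element grammars are predicates, and the
   [Fields] component is omitted. *)
type component =
  | One of (sexp -> bool)
  | Optional of (sexp -> bool)
  | Many of (sexp -> bool)

(* [matches comps xs] holds iff [xs] can be split to fit [comps] in order.
   [Optional] and [Many] each try two branches, so a failure makes the
   matcher backtrack and try another split. *)
let rec matches comps xs =
  match comps, xs with
  | [], [] -> true
  | [], _ :: _ -> false
  | One p :: rest, x :: xs' -> p x && matches rest xs'
  | One _ :: _, [] -> false
  | Optional p :: rest, _ ->
    (* skip the optional element, or consume exactly one *)
    matches rest xs
    || (match xs with x :: xs' -> p x && matches rest xs' | [] -> false)
  | Many p :: rest, _ ->
    (* stop here, or consume one more element and stay in [Many] *)
    matches rest xs
    || (match xs with x :: xs' -> p x && matches comps xs' | [] -> false)

let is_int = function Atom s -> int_of_string_opt s <> None | List _ -> false
let is_atom = function Atom _ -> true | List _ -> false
```

For instance, matching [ Many is_atom; One is_int ] against the atoms a and 1 first gives Many nothing, fails because a is not an integer, and then backtracks to let Many consume a.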
and 't variant_type = {
  ignore_capitalization : bool;
      If true, the grammar is insensitive to the case of the first letter of the label. This matches the behavior of derived t_of_sexp functions.
  alts : (label * 't sequence_type) list;
      An association list of labels (constructors) to sequence types. A matching sexp is a list whose head is the label as an atom and whose tail matches the given sequence type. As a special case, an alternative whose sequence is empty matches an atom rather than a list (i.e., label rather than (label)). This is in keeping with generated t_of_sexp functions.
      As a workaround, to match (label) one could use ("label", [ Optional (Union []) ]).
}
A tagged union of grammatical types. Grammars derived from OCaml variants will have variant types.
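The two matching rules of variant_type (a list headed by the label, or a bare atom for an alternative whose sequence is empty) and the first-letter case rule can be sketched as follows. The record here is a hypothetical simplification in which each alternative's sequence type is reduced to a fixed number of atom arguments, so the (label) workaround with Optional (Union []) is not modeled.

```ocaml
type sexp = Atom of string | List of sexp list

(* Hypothetical simplification of [variant_type]: each alternative's
   sequence type is reduced to a fixed number of atom arguments. *)
type variant_type = {
  ignore_capitalization : bool;
  alts : (string * int) list; (* label, number of atom arguments *)
}

(* Only the first letter is compared case-insensitively. *)
let label_equal ~ignore_capitalization a b =
  if ignore_capitalization then
    String.equal (String.uncapitalize_ascii a) (String.uncapitalize_ascii b)
  else String.equal a b

let find_alt vt label =
  List.find_opt
    (fun (l, _) -> label_equal ~ignore_capitalization:vt.ignore_capitalization l label)
    vt.alts

let is_atom = function Atom _ -> true | List _ -> false

let matches_variant vt = function
  | Atom label ->
    (* an alternative with an empty sequence matches a bare atom *)
    (match find_alt vt label with Some (_, 0) -> true | _ -> false)
  | List (Atom label :: args) ->
    (* every other alternative matches (label arg ...); in this
       simplification "(label)" never matches a nullary alternative *)
    (match find_alt vt label with
     | Some (_, n) when n > 0 -> List.length args = n && List.for_all is_atom args
     | _ -> false)
  | List _ -> false
```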
and 't record_type = {
  allow_extra_fields : bool;
  fields : (label * 't field) list;
}
A collection of field definitions specifying a record type. Consists of an association list from labels to fields, together with a flag stating whether fields outside that list are allowed.
and 't field = {
  optional : bool;
      If true, the field is optional.
  args : 't sequence_type;
      A sequence type which the arguments to the field must match. An empty sequence is permissible but would not be generated for any OCaml type.
}
A field in a record.
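Validation of a Fields component, and of Record through its equivalence with List [ Fields recty ], might look like the sketch below. It assumes a local sexp type, known fields that each take exactly one atom argument, and rejection of duplicate fields; none of these details are prescribed by the types above. Unknown fields are accepted only when allow_extra_fields is true, and an empty list matches when every field is optional.

```ocaml
type sexp = Atom of string | List of sexp list

(* Hypothetical simplification of [field] and [record_type]: every known
   field takes exactly one atom argument. *)
type field = { optional : bool }
type record_type = { allow_extra_fields : bool; fields : (string * field) list }

(* Checks a succession of (label args...) lists against [rt]: any order,
   each required field present, unknown labels only if allowed. Rejecting
   duplicate labels is an assumption of this sketch. *)
let matches_fields rt items =
  let rec go seen = function
    | [] ->
      List.for_all (fun (label, f) -> f.optional || List.mem label seen) rt.fields
    | List (Atom label :: args) :: rest when not (List.mem label seen) ->
      (match List.assoc_opt label rt.fields, args with
       | Some _, [ Atom _ ] -> go (label :: seen) rest
       | None, _ when rt.allow_extra_fields -> go (label :: seen) rest
       | _ -> false)
    | _ :: _ -> false
  in
  go [] items

(* For validation, [Record rt] is equivalent to [List [ Fields rt ]]. *)
let matches_record rt = function
  | List items -> matches_fields rt items
  | Atom _ -> false
```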
type t =
  | Ref of type_name * group
  | Inline of t type_
and group = {
  gid : group_id;
  generic_group : generic_group;
  origin : string;
      origin provides a human-readable hint as to where the type was defined. For a globally unique identifier, use gid instead. See ppx/ppx_sexp_conv/test/expect/test_origin.ml for examples.
  apply_implicit : t list;
}
and generic_group = {
  implicit_vars : var_name list;
  ggid : generic_group_id;
  types : (type_name * t type_) list;
}
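A consumer building the less indirect representation mentioned earlier has to interpret a Ref inside its group: look the name up in the generic group's types, replace each Implicit_var i by the i-th element of apply_implicit, and turn each Recursive name back into a Ref into the same group. The sketch below performs one such unfolding step over a hypothetical mirror of t, group and generic_group that keeps only the fields it needs (gid, origin and ggid are dropped) and uses List_of as a stand-in for List [ Many _ ]. It also assumes that an Inline grammar contains no group-relative constructors.

```ocaml
(* Hypothetical mirror of the types above, keeping only what unfolding
   needs. [List_of] stands in for [List [ Many _ ]]. *)
type 't type_ =
  | Atom_any
  | Implicit_var of int
  | Recursive of string
  | List_of of 't type_
  | Grammar of 't

type t =
  | Ref of string * group
  | Inline of t type_

and group = {
  generic_group : generic_group;
  apply_implicit : t list;
}

and generic_group = {
  implicit_vars : string list;
  types : (string * t type_) list;
}

(* Rewrite the group-relative constructors of [body] into self-contained
   [t] values embedded with [Grammar], so the result no longer needs [g]. *)
let rec close g body =
  match body with
  | Implicit_var i -> Grammar (List.nth g.apply_implicit i)
  | Recursive name -> Grammar (Ref (name, g))
  | List_of x -> List_of (close g x)
  | (Atom_any | Grammar _) as x -> x

(* One unfolding step; an [Inline] grammar is assumed to need no group. *)
let unfold = function
  | Inline body -> body
  | Ref (name, g) -> close g (List.assoc name g.generic_group.types)

(* A group whose only type is a list of [u], with [u] bound to "any atom". *)
let g =
  { generic_group = { implicit_vars = [ "u" ]; types = [ ("t", List_of (Implicit_var 0)) ] };
    apply_implicit = [ Inline Atom_any ] }

let unfolded = unfold (Ref ("t", g))
```

Here unfolded is List_of (Grammar (Inline Atom_any)): the implicit variable has been replaced by the grammar the group binds it to, while the shared generic group itself was left untouched.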