Module Sexplib0__Raw_grammar
Representation of S-expression grammars
Goals and non-goals
Functionality goals: With post-processing, sexp grammars can be pretty-printed in a human-readable format and provides enough information to implement completion and validation tools.
Performance goals: @@deriving sexp_grammar
adds minimal overhead and introduces no toplevel side effect. The compiler can lift the vast majority of ASTs generated by @@deriving sexp_grammar
as global constants. Common sub-grammars are usually shared, particularly when they derive from multiple applications of the same functor.
Non-goals: Stability, although we will make changes backwards-compatible or at least provide a reasonable upgrade path.
In what follows, we describe how this is achieved.
Encoding of generated grammars to maximize sharing
A group
contains the grammars for all types of a mutually recursive group of OCaml type declarations.
To ensure maximum sharing, a group is split into two parts:
- The
generic_group
depends only on the textual type declarations. Where the type declaration refers to an existing concrete type, the generic group takes a variable to represent the grammar of that type. This means that the compiler can lift each type declaration in the source code to a shared global constant.
- The
group
binds the type variables of thegeneric_group
, either to concrete grammars where the type declaration refers to a concrete type, or to another variable where the type declaration itself was polymorphic.
To understand this point better, imagine the following type declaration
type t = X of u
were explicitly split into its generic_group
and group
parts:
type 'u t_generic = X of 'u
type t = u t_generic
If u
came from a functor argument, it's easy to see that t_generic
would be exactly the same in all applications of the functor and only t
would vary. The grammar of t_generic
, which is the biggest part, would be shared between all applications of the functor.
Processing of grammars
The Raw_grammar.t
type optimizes for performance over ease of use. To help users process the raw grammars into a more usable form, we keep two identifiers in the generated grammars:
- The
generic_group_id
uniquely identifies ageneric_group
. It is a hash of the generic group itself. (It is okay that this scheme would conflate identical type declarations, because the resulting generic groups would be identical as well.)
- The
group_id
uniquely identifies agroup
. It is a unique integer, generated lazily so that we don't create a side effect at module creation time.
The exact processing would depend on the final application. We expect that a typical consumer of sexp grammars would define less-indirected equivalents of the t
and group
types, possibly re-using the _ type_
and Atom.t
types.
type generic_group_id
= string
type group_id
= Sexplib0__.Lazy_group_id.t
type var_name
= string
Variable names. These are used to improve readability of the printed grammars. Internally, we use numerical indices to represent variables; see
Implicit_var
below.
module Atom : sig ... end
A grammatical type which classifies atoms.
type 't type_
=
|
Any
Any list or atom.
|
Apply of 't type_ * 't type_ list
Assign types to (explicit) type variables.
|
Atom of Atom.t
An atom, in particular one of the given
Atom.t
.|
Explicit_bind of var_name list * 't type_
In
Bind ([ "a"; "b" ], Explicit_var 0)
,Explicit_var 0
is"a"
. One must bind all available type variables: free variables are not permitted.|
Explicit_var of int
Indices for type variables, e.g.
'a
, introduced by polymorphic definitions.Unlike de Bruijn indices, these are always bound by the nearest ancestral
Explicit_bind
.|
Grammar of 't
Embeds other types in a grammar.
|
Implicit_var of int
Indices for type constructors, e.g.
int
, in scope. Unlike de Bruijn indices, these are always bound by theimplicit_vars
of the nearest enclosinggeneric_groups
.|
List of 't sequence_type
A list of a certain form. Depending on the
sequence_type
, this might correspond to an OCaml tuple, list, or embedded record.|
Option of 't type_
An optional value. Either syntax recognized by
option_of_sexp
is supported:(Some 42)
or(42)
for a value andNone
or()
for no value.|
Record of 't record_type
A list of lists, representing a record of the given
record_type
. For validation,Record recty
is equivalent toList [Fields recty]
.|
Recursive of type_name
A type in the same mutually recursive group, possibly the current one.
|
Union of 't type_ list
Any sexp matching any of the given types.
Variant
should be preferred when possible, especially for complex types, since validation and other algorithms may behave exponentially.One useful special case is
Union []
, the empty type. This is occasionally generated for things such as abstract types.|
Variant of 't variant_type
A sexp which matches the given
variant_type
.A grammatical type which classifies sexps. Corresponds to a non-terminal in a context-free grammar.
and 't sequence_type
= 't component list
A grammatical type which classifies sequences of sexps. Here, a "sequence" may mean either a list on its own or, say, the sexps following a constructor in a list matching a
variant_type
.Certain operations may greatly favor simple sequence types. For example, matching
List [ Many type_ ]
is easy for any typetype_
(assumingtype_
itself is easy), butList [ Many type1; Many type2 ]
may require backtracking. Grammars derived from OCaml types will only have "nice" sequence types.
and 't component
=
|
One of 't type_
Exactly one sexp of the given type.
|
Optional of 't type_
One sexp of the given type, or nothing at all.
|
Many of 't type_
Any number of sexps, each of the given type.
|
Fields of 't record_type
A succession of lists, collectively defining a record of the given
record_type
. The fields may appear in any order. The number of lists is not necessarily fixed, as some fields may be optional. In particular, if all fields are optional, there may be zero lists.Part of a sequence of sexps.
and 't variant_type
=
{
ignore_capitalization : bool;
If true, the grammar is insensitive to the case of the first letter of the label. This matches the behavior of derived
sexp_of_t
functions.alts : (label * 't sequence_type) list;
An association list of labels (constructors) to sequence types. A matching sexp is a list whose head is the label as an atom and whose tail matches the given sequence type. As a special case, an alternative whose sequence is empty matches an atom rather than a list (i.e.,
label
rather than(label)
). This is in keeping with generatedt_of_sexp
functions.As a workaround, to match
(label)
one could use("label", [ Optional (Union []) ])
.}
A tagged union of grammatical types. Grammars derived from OCaml variants will have variant types.
and 't record_type
=
{
allow_extra_fields : bool;
fields : (label * 't field) list;
}
A collection of field definitions specifying a record type. Consists only of an association list from labels to fields.
and 't field
=
{
optional : bool;
If true, the field is optional.
args : 't sequence_type;
A sequence type which the arguments to the field must match. An empty sequence is permissible but would not be generated for any OCaml type.
}
A field in a record.
type t
=
|
Ref of type_name * group
|
Inline of t type_
and group
=
{
gid : group_id;
generic_group : generic_group;
origin : string;
origin
provides a human-readable hint as to where the type was defined.For a globally unique identifier, use
gid
instead.See
ppx/ppx_sexp_conv/test/expect/test_origin.ml
for examples.apply_implicit : t list;
}
and generic_group
=
{
implicit_vars : var_name list;
ggid : generic_group_id;
types : (type_name * t type_) list;
}