At Jane Street we believe in code as a means of expression, not just a way of getting things done. If you feel the same way, then join us.
Parametric polymorphism is a basic mechanism in ML for writing code that is generic, i.e., that can be used on multiple different types. To get the basic idea of how what parametric polymorphism is, think about the following simple example.
module M : sig (* Takes a list and returns a stuttered version, e.g., [1;2;3] is mapped to [1;1;2;2;3;3] *) val double : 'a list -> 'a list end = struct let rec double = function | [] -> [] | hd :: tl -> hd :: hd :: double tl end
In the type signature for double, the expression 'a is a type
variable, meaning that this function can be used with an arbitrary
type plugged in for 'a. The reason that the type variable shows up
is that the code of double doesn't depend in any way on the
properties of the elements of the list.
At first glance, parametric polymorphism doesn't seem all that powerful. After all, how useful is it to write functions that can only be generalized over all possible types? More often than not you want to write functions that can be used over some narrow universe of types with particular properties. Object oriented languages provide this functionality with subtyping, and Haskell lets you get at this with type-classes. What do you do in ML?
It turns out that ML does allow you to write functions like this in a quite straightforward and ordinary way. A simple example can be found be examining the signature of a standard sort function:
val sort : cmp : ('a -> 'a -> int) -> 'a list -> 'a list
The signature above can be used on a value of any type, provided that you can also provide a comparison function for that type. In other words, a polymorphic function can take advantage of the idiosyncratic capabilities of the types it deals with, but the ability to take advantage of those capabilities must be passed in along with the values in question.
This is used in small ways throughout any non-trivial ML codebase.
But we can use this in a more structured way by creating what are
sometimes called type-indexed values. A type-indexed value is a
value used to represent a set of capabilities associated with a type.
Here's an example of a simple type-indexed value for capturing
number-ness. In what follows, the type-indexed value is Num.Type.t,
and the rest of the Num module is just utility functions to make the
interface pretty.
module Num : sig module Type : sig type 'a t val int : int t val float : float t end val (+) : 'a Type.t -> 'a -> 'a -> 'a val (-) : 'a Type.t -> 'a -> 'a -> 'a val ( * ) : 'a Type.t -> 'a -> 'a -> 'a val neg : 'a Type.t -> 'a -> 'a val zero : 'a Type.t -> 'a val sum : 'a Type.t -> 'a list -> 'a val sum_product 'a Type.t -> 'a list -> 'a list -> 'a end = struct module Type = struct module T = struct type 'a t = { plus : 'a -> 'a -> 'a; mul : 'a -> 'a -> 'a; neg : 'a -> 'a; zero : 'a; } end open T let int = { plus = Int.(+); neg = Int.(-); zero = Int.zero; mul = Int.mul; } let float = { plus = Float.(+); neg = Float.(-); zero = Float.zero; mul = Float.mul; } end open Type.T let (+) typ x y = typ.plus x y let neg typ x = typ.neg x let zero typ = typ.zero let ( * ) typ x y = typ.mul x y (* Some derived operations *) let (-) typ x y = typ.plus x (typ.neg y) let sum typ l = List.fold_left ~init:typ.zero ~f:typ.plus l let sum_product typ l1 l2 = sum typ (List.map2 ~f:typ.mul l1 l2) end
You'll note that the definition above of Type.int and Type.float
are basically boilerplate. Because the modules in question themselves
have a fairly standardized interface, we could instead use a functor to
create these type-indexed values without the boilerplate:
module type Arith = sig type t val (+) : t -> t -> t val neg : t -> t val zero : t end module Build_type(M:Arith) = struct let typ x = { Type. plus = M.(+); neg = M.(-); zero = M.zero; } end let int = let module Z = Build_type(Int) in Z.typ let int64 = let module Z = Build_type(Int64) in Z.typ let int32 = let module Z = Build_type(Int32) in Z.typ let native = let module Z = Build_type(Native_int) in Z.typ let float = let module Z = Build_type(Float) in Z.typ let complex = let module Z = Build_type(Complex) in Z.typ
This is yet another advantage one gets from having standardized interfaces.
If type indexed-values look similar to Haskell's type-classes, it's
because they are. In my limited understanding of Haskell, the
implementation is similar as well, in that under the cover, Haskell
passes around dictionaries of functions which play the same role that
the Type.ts play here.
The number typeclass described above is just an example, and not something I've felt the need for in practice. But here are some places where we've used type-indexed values to good effect:
The latest (unreleased) version of the bin_prot macros that we use
for binary serialization and deserialization now come with a
type-indexed value that ties together all the little bits that you
need to use the library. Before we did that, one could only
instantiate useful bin-prot functionality using the module language.
Now, we can do it using ordinary polymorphic functions.
Sometimes we design domain-specific languages embedded in the type system. It is often useful to have values representing the different types that can be generated in the language. For example, we use this as part of a set of SQL bindings to represent types that we know how to convert to and from SQL.
We've started experimenting with type-indexed values representing the container-hood of a given object. This is a little trickier than the previous examples, since the type-indexed value has two type parameters, one for the type of the container, and one for the type of the elements of the container. In the end, this let's you write functions with signatures like
max: ('a,'b) Container.t -> cmp:('a -> 'a -> int) -> 'b -> 'a
and use it to find the maximum element of a list (where the
type-indexed value has type ('a, 'a list) Container.t) or an array
(('a, 'a array) Container.t) or a string ((char,string)
Container.t).
Type-indexed values obviously have their downsides: they can be somewhat inconvenient syntactically, since you need to explicitly pass them along; and they sacrifice some performance because it leads you to call closures where you could otherwise call static functions that could be inlined instead. But overall, they are a flexible and elegant way of writing generic code in ML.
Comments
Batteries Included
For information, we have something quite similar to your Arith in Batteries Included -- the documentation for these modules is not on-line yet, but that will e part of release 0.1.
As for the containers, well, we don't have that yet, but I like very much the idea of [('a, 'b) Container.t].
Modules and type classes
I would be curious to learn more about the motivation behind this approach. What is the advantage over simply instantiating
Arithwith modulesInt,Floatetc.?The main benefit of Haskell type classes seems to be that the dictionary is inserted automatically. As you point out, this isn't done here (and is probably impossible to achieve through programming tricks in OCaml).
Functors vs. Parametric polymorphism
If you want to write generic code with modules, you use functors. If you want to do the same with type-indexed values (or really any ordinary kind of value) then you rely on parametric polymorphism. The big obvious advantage of parametric polymorphism is that it is considerably lighter-weight. Imagine having to instantiate a functor every time you wanted to use a new kind of list. Even worse, you'd have to start pushing functors everywhere in your code to be able to write your list code generically.
I think the key advantage of type-indexed values is that they provide a structured approach for writing code that is polymorphic over a complex types.
We actually have a version of this issue in our own codebase. We use functorized versions of hashtables throughout our codebase, both for performance and to get more control over the equality and hash functions used. But it's become clear to us that the functor approach is a mistake, precisely because it doesn't allow one to write polymorphic code. Our intent is to switch over to using hashtables that are instantiated by passing in a type-indexed value that has what you need to construct the hashtable, including pretty-printers for the types (so you can have decent error messages) and equality and hashing functions.
mixed use of modules and classes
It seems your approach is similar to the "mixed" approach using classes and modules in Xavier Leroy's ICFP99 slides.
Also, can you please give an example where functors make it impossible to write polymorphic code?
Modules vs Type classes
There's a lot of meat to chew on here.
First, any idea as to how much performance is given up using this record of functions approach versus functorized code? In an ideal world, there's be an OCamlton whole world type compiler and both approaches would have negligible (zero?) overhead, but we don't live there, and neither functors nor records of functions are compiled away.
Second, if I read you right, you think this approach is generally superior to functorized libraries. If that's true, then ML is not really at a sweet spot in language design at all. A better ML would be based on type classes, since your type indexed values are just the translation of type classes to ML without the convenience they provide. You could dispense with functors altogether, and just provide a Modula style Module system for scoping/namespace control. Is that a correct assessment of your opinions?
Modules vs Type classes
I don't think that type-indexed values are generally superior to functorized libraries. I think both approaches have their uses, and sometimes one is the right thing, and sometimes the other (although I must admit I'm having a hard time coming up with clear guidelines as to when one should use one or the other. Perhaps Steven would have something better developed to say about it.)
As for ML being a sweet spot in language design, I stand my ground, but I don't want to overstate the point. I never meant to suggest that ML is the best of all possible languages; merely that it's close to a local optimum. There are various small tweaks and improvements one can make, but to make a really big leap forward (like the leap from Java to ML) appears to be very difficult.
That said, it's quite possible that a better language could be designed that featured type-classes more prominently. I'm probably too ignorant about Haskell to say this, but it is nonetheless my belief that Haskell is farther from the sweet spot because of laziness (which makes it too hard to reason about performance, particularly space performance) and the requirement to explicitly reflect all side-effects in the type system, which in my mind is too strict of a constraint in practice.
Modules vs Type classes
I wasn't suggesting that Haskell is near a sweet spot. I happen to share your opinion about laziness and imperative features; I might even go a step further and say that I wish OCaml were more like SML and had a defined evaluation order. I was suggesting that the module system of ML could be simplified by removing functors, and that type classes could be added, and that something like that would be a better language. I haven't used Haskell enough to feel sure about that. I do know that I've long wanted some kind of overloading in ML.
If there are examples where you think functors are better, that would make the discussion more interesting. I've used functors, every OCaml programmer has because the standard library makes use of them, but I have to admit, almost all of the code I write is functor free. I may even have used classes more than functors, but I'd be way happier if OCaml had a decent record system (like SML#) and did away with classes.
Anyways, this is yet another motivation for me to try and use Haskell in earnest, for some real projects, so that I can really learn the weaknesses of type classes. Unfortunately, I always end up having to use union find, or corner stitched tiles, or some other data structure that cries out for mutability in the implementation :-).
AKA ad-hoc polymorphism
On the relation between Haskell type-classes and your Ocaml module solution, Oleg Kiselyov sums it up nicely by saying, "The only difference is in the shape of the arrow". Except that it's much less convenient to have to manually perform the dictionary-passing transform (described in Phil Wadler's first paper on this topic).
The upside comes especially in cases like you described, where you want to apply one generic function to a container, and another to its elements. You can address these independently:
And thankfully, after the "proof obligation" is settled once you can forget about it.
This kind of overloading is tremendously valuable. Somebody must have already ported type-classes to Ocaml. If not, I think I'll give it a shot -- it seems like a worthwhile project.