Last modified 8 years ago Last modified on 02/26/10 08:55:08

PML's design discussion

Compiler design


First version : LLVM + Boehm's conservative GC

Using the constraints

PML language is small, it has only 4 kind of objects:

  • cartesian product : record / projection
  • variants : constructor / matching
  • functions : abstraction / application
  • external types : int, int32, float64, char, ...

For each of this class of objects, the typing can produce a (large) matrice relating each constructor to each destructor. For instance, for structure, it is a matrice whose rows are indexed by the position in the source of structure constructors and whose column are the position of projections.

From this matrice, one can optimise the compiled code:

  • For structure, this should allow contant time access to structure field and one needs to minimize the unused allocated space in structure. In a first version, only this optimisation is really necessary
  • For function, this means optimizing register allocation through function calls
  • For variant, it means trying to choose their representation to optimize case analysis

Here is an example for structure. Consider the following code:

Error: Failed to load processor CodeExample
No macro or processor named 'CodeExample' found

This access matrix is (taking position in the code order to index lines and columns): 1;1;0];[1;0;1];[0;1;1?

If we have a modulo mod(s) and an offset off(s) stored in the header of each structure s and if each label l is tranlated in an interger int(l). Then, s.l can be compiled in s[int(l) % mod(s) + off(s) + hsize]. (the "+ hsize" is just here to avoid the header word).

In this case, an optimal choice for the above program is int(a) = 1, int(b) = 2, int(c) = 3, sab(mod) = sbc(mod) = 2, sac(mod) = 3, sab(off) = sbc(off) = sac(off) = 0. Here the offset being not used at all, the compilation can opimize it away ...

Some problems still remaining

Do not dream, PML is not making compilation trivial !

  • Compilation of polymorphic functions ... if we want unboxed values in structure and in the stack, and if we do not want full monomorphisation (like MLTon) some (most?) polymorphic functions needs extra boolean arguments telling what are pointers and what are not ... As long as we use conservative Boehm GC, we can avoid the problem and unbox everything ...
  • PML is using bindlib ... this should make it easy to add inlining/partial evaluation ... But still one needs a criteria to know what to do ...
  • PML supports deep pattern matching ... this is always hard to compile efficiently (no problem for a first compiler, we can do a trivial compilation scheme for pattern matching)
  • Separate compilation : the constraints matrice are global to the whole program (no separate compilation). However, we can still compile librairies that could be dynamically loaded , but we need to remember the represetation of the structure used in the library and two librairies could be incompatible ... This only happens if you use the internal representation of the data-structure provided by the library ... So it should be managable.
  • FFI (foreing function interface) ... trivial as long as we use Boehm GC again ... But after, we could take a simple solution : no way to call PML from C or to examine PML data in C, we can just call C from PML ... We need an exception for this rule, just for callbacks to use existing GUI library.

Uniqueness typing

Probably the best way (better than Haskell Monad) to have the imperative features that are needed in a realistic programming language. But, it is not clear yet how to adapt this to PML ...

Other subjects / To do

  • Better simplification of the typing map (in the present version, the complexity of the type checking is too high)
  • We need to implement subsumption for the automtic proof and other optimisation of proof search (with proof-depth > 0)
  • Other optimisations (similar to sat solver optimisation) can be implemented for proof search
  • Extensionality
  • Quotient