Camlp4: Major changes

New bootstrap system

Camlp4 no longer keeps a generated directory of pure OCaml sources in order to start the build. It starts from two OCaml files (boot/camlp4boot.ml and boot/Camlp4.ml). From these two files we build a single bytecode camlp4 program called camlp4boot. This camlp4boot is a full-featured Camlp4 engine able to parse all Camlp4 sources. With this new system, developers who have to change the Camlp4 side of the OCaml distribution should no longer be hurt by the bootstrap process. More details about the bootstrap can be found here:

New module organization

The new Camlp4 version is much more modular and abstract than the previous one. As a first benefit, Camlp4 no longer pollutes the global namespace: all Camlp4 modules are packed inside modules whose names begin with Camlp4.

Camlp4 makes heavy use of functors to factor and abstract different parts of the system, but don't be afraid: a pre-cast module is waiting for you as a default Camlp4 structure.

The (partial) module tree can be found below.

Locations handling

Locations are now abstract, and can therefore be replaced by any representation that fits the module signature, in particular regular OCaml locations or fake locations (for those who don't need them). Furthermore, locations are now generally handled in a functional way instead of imperatively.
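To make the idea of abstract, functionally handled locations concrete, here is a minimal sketch in plain OCaml. The LOC signature and the modules Real_loc, Ghost_loc and Report are hypothetical names chosen for illustration; they are not Camlp4's actual location signature, which is richer.

```ocaml
(* Hypothetical signature: a toy stand-in, not Camlp4's real Loc signature. *)
module type LOC = sig
  type t
  val mk : string -> t            (* initial location in a named file *)
  val move_line : int -> t -> t   (* functional update: returns a new value *)
  val to_string : t -> string
end

(* A representation carrying a file name and a line number. *)
module Real_loc : LOC = struct
  type t = { file : string; line : int }
  let mk file = { file; line = 1 }
  let move_line n loc = { loc with line = loc.line + n }
  let to_string loc = Printf.sprintf "%s:%d" loc.file loc.line
end

(* Fake locations, for clients that do not need them. *)
module Ghost_loc : LOC = struct
  type t = unit
  let mk _ = ()
  let move_line _ () = ()
  let to_string () = "<ghost>"
end

(* Client code is written once, against the signature only. *)
module Report (L : LOC) = struct
  let after_two_lines file =
    let start = L.mk file in
    let later = L.move_line 2 start in   (* [start] itself is left unchanged *)
    (L.to_string start, L.to_string later)
end

module R = Report (Real_loc)
module G = Report (Ghost_loc)
```

Because move_line returns a new value, the original location stays valid after the update, which is the practical difference from the imperative style.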

Lexer

The lexer now uses ocamllex and gains the following features:

AST filters: filtering Abstract Syntax Trees

This new version makes it easier to write and apply transformations over an AST.

Camlp4 can be used as a translator between two languages. For example, you can translate your OCaml sources (foo.ml) into the revised syntax with this command:

$ camlp4 -parser OCaml -printer OCamlr foo.ml
   # equivalent to:
   #   * camlp4 pa_o.cmo pr_r.cmo foo.ml
   #   * camlp4 Camlp4Parsers/OCaml.cmo Camlp4Printers/OCamlr.cmo foo.ml

The parser reads the source and the printer writes it, but nothing can be inserted between them. Camlp4Filters fills this gap by providing a way to transform one AST into another by registering transformation functions.

$ camlp4 -parser OCaml -filter ExceptionTracer -printer OCamlr foo.ml
   # equivalent to:
   #   * camlp4 Camlp4Parsers/OCaml.cmo Camlp4Filters/ExceptionTracer.cmo Camlp4Printers/OCamlr.cmo foo.ml
   #   * camlp4o -filter ExceptionTracer pr_r.cmo foo.ml

The Camlp4 standard distribution contains some AST transformations, described in the module tree below.
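The parser, filter and printer chain described above can be modelled with a toy example in plain OCaml. This is only a sketch of the idea of registering transformation functions: the expr type and the names register_filter, apply_filters and fold_constants are hypothetical and are not part of the Camlp4 API.

```ocaml
(* Toy AST: a stand-in for Camlp4's real Ast type. *)
type expr =
  | Int of int
  | Add of expr * expr

(* A registry of AST transformations, applied in registration order. *)
let filters : (expr -> expr) list ref = ref []
let register_filter f = filters := !filters @ [f]
let apply_filters e = List.fold_left (fun acc f -> f acc) e !filters

(* Example filter: fold additions of constants. *)
let rec fold_constants = function
  | Int n -> Int n
  | Add (a, b) ->
    (match fold_constants a, fold_constants b with
     | Int x, Int y -> Int (x + y)
     | a', b' -> Add (a', b'))

(* Toy printer: the last stage of the pipeline. *)
let rec print = function
  | Int n -> string_of_int n
  | Add (a, b) -> Printf.sprintf "(%s + %s)" (print a) (print b)

let () = register_filter fold_constants
```

With the filter registered, print (apply_filters ast) plays the role of the printer receiving the transformed AST rather than the parser's raw output.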

Grammars as functors

The former version of Camlp4 offered two different ways of writing grammars. Now grammars are always modules (created by a functor application), with two flavors.

Syntax extensions can be parameterized

For that purpose you write a functor that takes a module Syntax as argument and returns an extended module Syntax.

To support dynamic loading and integration into the camlp4 binary, one can register an extension to make it usable from the command line.

Printers

The Camlp4 pretty-printing technology is being redesigned, falling back on more classical tools (mutually recursive functions and the Format module).
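As an illustration of this classical style, here is a small sketch of a printer built from mutually recursive functions and the standard Format module. The expr type and the function names are hypothetical; this is not the Camlp4 printer itself, only the general technique applied to a toy AST.

```ocaml
(* Toy expression type, used only for this illustration. *)
type expr =
  | Int of int
  | Add of expr * expr
  | Mul of expr * expr

(* One printing function per precedence level, mutually recursive. *)
let rec pp_sum ppf = function
  | Add (a, b) -> Format.fprintf ppf "@[%a +@ %a@]" pp_sum a pp_product b
  | e -> pp_product ppf e

and pp_product ppf = function
  | Mul (a, b) -> Format.fprintf ppf "@[%a *@ %a@]" pp_product a pp_atom b
  | e -> pp_atom ppf e

and pp_atom ppf = function
  | Int n -> Format.pp_print_int ppf n
  | e -> Format.fprintf ppf "(@[%a@])" pp_sum e   (* parenthesize anything else *)

let to_string e = Format.asprintf "%a" pp_sum e
```

Each level delegates to the next tighter one, so parentheses appear only where the precedence requires them.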

Command line changes:

Modules naming/organization:

Syntax extensions changes:

Before

Consider two grammar extensions written in the old syntax, dynamic_old_syntax.ml and static_old_syntax.ml.

dynamic_old_syntax.ml
type t1 = A | B
type t2 = Foo of string * t1
open Pcaml
let foo = Entry.mk gram "foo"
let bar = Entry.mk gram "bar"
EXTEND
  GLOBAL: foo bar;
  foo: [ [ "foo"; i = LIDENT; b = bar -> Foo(i, b) ] ];
  bar: [ [ "?" -> A | "." -> B ] ];
END;;
Entry.parse foo (Stream.of_string "foo x?") = Foo("x", A)
DELETE_RULE foo: "foo"; LIDENT; bar END

static_old_syntax.ml
type t1 = A | B
type t2 = Foo of string * t1
module Gram = Grammar.GMake(...)
let foo = Gram.Entry.mk "foo"
let bar = Gram.Entry.mk "bar"
GEXTEND Gram
  GLOBAL: foo bar;
  foo: [ [ "foo"; i = LIDENT; b = bar -> Foo(i, b) ] ];
  bar: [ [ "?" -> A | "." -> B ] ];
END;;
Gram.Entry.parse foo (Stream.of_string "foo x?") = Foo("x", A)
GDELETE_RULE Gram foo: "foo"; LIDENT; bar END

The quick and non extensible way: the Camlp4.PreCast module

quick_non_extensible_example.ml
(* This scheme only works when the grammar value is not really
   used for other things than entry creation. In fact grammars
   are now static by default. *)
type t1 = A | B
type t2 = Foo of string * t1
open Camlp4.PreCast
open Syntax
let foo = Gram.Entry.mk "foo"
let bar = Gram.Entry.mk "bar"
EXTEND Gram
  GLOBAL: foo bar;
  foo: [ [ "foo"; i = LIDENT; b = bar -> Foo(i, b) ] ];
  bar: [ [ "?" -> A | "." -> B ] ];
END;;
Gram.parse_string foo (Loc.mk "<string>") "foo x?" = Foo("x", A)
DELETE_RULE Gram foo: "foo"; LIDENT; bar END

The functorial way

dynamic_functor_example.ml
type t1 = A | B
type t2 = Foo of string * t1
open Camlp4

module Id = struct (* Information for dynamic loading *)
  let name = "My_extension"
  let version = "$Id$"
end

(* An extension is just a functor: Syntax -> Syntax or Camlp4Syntax -> Camlp4Syntax *)
module Make (Syntax : Sig.Camlp4Syntax) = struct
  include Syntax
  let foo = Gram.Entry.mk "foo"
  let bar = Gram.Entry.mk "bar"
  open Camlp4.Sig
  EXTEND Gram
    GLOBAL: foo bar;
    foo: [ [ "foo"; i = LIDENT; b = bar -> Foo(i, b) ] ];
    bar: [ [ "?" -> A | "." -> B ] ];
  END;;
  Gram.parse_string foo (Loc.mk "<string>") "foo x?" = Foo("x", A)
  DELETE_RULE Gram foo: "foo"; LIDENT; bar END
end

(* Register it to make it usable via the camlp4 binary. *)
module M = Register.OCamlSyntaxExtension(Id)(Make)

static_functor_example.ml
type t1 = A | B
type t2 = Foo of string * t1
open Camlp4.PreCast
module Lexer = struct
  ... if you need a different lexer ...
end
module Gram = MakeGram(Lexer)
let foo = Gram.Entry.mk "foo"
let bar = Gram.Entry.mk "bar"
EXTEND Gram
  GLOBAL: foo bar;
  foo: [ [ "foo"; i = LIDENT; b = bar -> Foo(i, b) ] ];
  bar: [ [ "?" -> A | "." -> B ] ];
END;;
Gram.parse_string foo (Loc.mk "<string>") "foo x?" = Foo("x", A)
DELETE_RULE Gram foo: "foo"; LIDENT; bar END

Grammar syntax additions

Full ocaml patterns for tokens

Token patterns are preceded by a backquote ("`"), as in the revised stream syntax. You can match and capture any part of the token you want; you are no longer constrained to getting just a string. Example in normal syntax:

without_patterns_example.ml
expr:
  [ [ ...
    | "-"; s = INT -> Ast.Int (- int_of_string s)
    | ... (* or worse: here `s' is a quotation
            token serialized as one string *)
    | s = QUOTATION ->
      let name = String.index ':' ... in
      let loc_name = ... in
      let code = ... in
      let quot = { name = name; loc_name = loc_name; code = code }
      in expand_quotation quot
  ] ]
This one can now be written as:

with_patterns_example.ml
expr:
  [ [ ...
    | "-"; `INT(i, _) -> Ast.Int (-i)
    | ...
    | `QUOTATION quot -> expand_quotation quot
  ] ]
But one can also be more specific:

specific_patterns_example.ml
[ [ `INT ((0 | 2) as i, _); ... -> ...
  | `INT (42, _); ... -> ...
  | `INT (i, s); ... -> ... ] ]

In order to support nice pretty-printing, the Camlp4 token type holds both versions of the value. For instance, in INT(i, s), i is the integer value and s is the unprocessed source string.
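The benefit of keeping both versions can be seen in a small sketch written in plain OCaml. The token type below is a hypothetical, reduced version with only two cases, and describe is an illustrative function; Camlp4's real token type has more constructors.

```ocaml
(* Hypothetical, reduced token type: each INT carries the parsed value
   and the original source text. *)
type token =
  | INT of int * string
  | LIDENT of string

(* Ordinary OCaml patterns select on the value and can still reach the text. *)
let describe = function
  | INT ((0 | 2) as i, _) -> Printf.sprintf "small constant %d" i
  | INT (42, _) -> "the answer"
  | INT (i, s) -> Printf.sprintf "int %d written as %s" i s
  | LIDENT x -> "identifier " ^ x
```

A token lexed from the text 0x10 can thus be matched on the value 16 while a printer still has access to the original spelling.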

Quotation/antiquotation syntax additions

Duality with the `PATTERN, the $`int:...$ antiquotation

In a quotation such as <:expr< $int:"42"$ >>, the content of the antiquotation has type string. With pattern support in grammars one can now get the integer value of an INT token directly, but one is then sometimes forced to write <:expr< $int:string_of_int i$ >>. To avoid this, one can write <:expr< $`int:i$ >> (thanks to Alain Frisch for this idea).

Here is a simple equivalence table:

$`int:i$       -> $int:string_of_int i$
$`int32:i$     -> $int32:Int32.to_string i$
$`int64:i$     -> $int64:Int64.to_string i$
$`nativeint:i$ -> $nativeint:Nativeint.to_string i$
$`flo:f$       -> $flo:string_of_float f$
$`str:s$       -> $str:String.escaped s$
$`chr:c$       -> $chr:Char.escaped c$
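The right-hand sides of the table use ordinary standard library conversions. As a quick check of what strings they produce, here is a plain OCaml snippet that needs no Camlp4; the binding names are chosen for illustration only.

```ocaml
(* Each binding applies the conversion named in the table to a sample value. *)
let int_text       = string_of_int 42
let int32_text     = Int32.to_string 42l
let int64_text     = Int64.to_string 42L
let nativeint_text = Nativeint.to_string 42n
let float_text     = string_of_float 1.5
let string_text    = String.escaped "a\"b\n"   (* quote and newline get escaped *)
let char_text      = Char.escaped '\n'         (* two characters: backslash, n *)
```

The escaping steps for strings and characters matter because the antiquotation content is expected in source form, not as the raw value.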

Pitfalls

Here is a list of common errors encountered during the migration to this new version of Camlp4.

More details

Locations

The new OCaml pretty-printer

A new OCaml pretty-printer is under design and has the following features: