[cwn] Attn: Development Editor, Latest Caml Weekly News

Alan Schmitt alan.schmitt at polytechnique.org
Tue Mar 24 02:56:07 PDT 2009


Hello,

Here is the latest Caml Weekly News, for the week of March 17 to 24,  
2009.

1) XML output
2) ocaml-http is looking for a new maintainer
3) Google summer of Code proposal

========================================================================
1) XML output
Archive: <http://groups.google.com/group/fa.caml/browse_thread/thread/165323537f036e34# 
 >
------------------------------------------------------------------------
** Rémi Dewitte asked and Gerd Stolpmann answered:

 > I have used pxp to parse xml and I am happy with it. I'd like now to
 > produce xml and wonder what are the options to do so (possibly the
 > simpliest).

Maybe not the simplest: Use the PXP preprocessor to create the output
tree, and print the tree:

<http://projects.camlcity.org/projects/dl/pxp-1.2.1/doc/manual/html/ref/Intro_preprocessor.html 
 >
<http://projects.camlcity.org/projects/dl/pxp-1.2.1/doc/manual/html/ref/Pxp_document.document.html#2_WritingdocumentsasXMLtext 
 >


 >
 > I think I am going to start with the Printf module. I wonder how well
 > it handles utf8 for example.

UTF-8 are just bytes for printf.

 > And I'll have to write a kind of xml_encode function. I am pretty  
sure
 > it has already be done somewhere !

let xml_encode =
   Netencoding.Html.encode
     ~in_enc:`Enc_utf8
     ~out_enc:`Enc_usascii
     ~prefer_names:false
     ()

That would assume the input is UTF-8 encoded, and the output is
ASCII-encoded. You can control which ASCII characters get the special
XML representation &...; with the unsafe_chars optional argument.
Docs are at
<http://projects.camlcity.org/projects/dl/ocamlnet-2.2.9/doc/html-main/Netencoding.Html.html 
 >
			
** Sylvain Le Gall also replied:

Maybe it is a bit overkilling, but there is also ocamlduce.

See there:
<http://www.cduce.org/ocaml>
(dev for ocaml 3.11:)
<http://ocamlduce.forge.ocamlcore.org/>
<http://git.ocamlcore.org/cgi-bin/gitweb.cgi?p=ocamlduce/ocamlduce.git;a=summary 
 >

OCamlduce can also be used with Eliom/OCsigen.

AFAIK, using ocamlduce can help you to type check your output tree
directly within OCaml compiler...
			
** Matthieu Wipliez also replied:

Yet another solution is Xmlm by Daniel Bünzli.

<http://erratique.ch/software/xmlm>

This is probably the easiest and lightweight solution: Xmlm comes as a  
single
module and its interface, and it's BSD so you can just copy/paste it  
into your
project.
			
** Michael Ekstrand then added:

I second the xmlm suggestion.  Polling event-based parsing is very slick
and maps well into the functional paradigm, and its XML writing support
(generating a stream of events identical to those you read) makes
generation quite intuitive and reliable.
			
========================================================================
2) ocaml-http is looking for a new maintainer
Archive: <http://groups.google.com/group/fa.caml/browse_thread/thread/3e99820fd9b65b57# 
 >
------------------------------------------------------------------------
** Stefano Zacchiroli announced:

Hi all, ocaml-http [1] is looking for a new maintainer.
More details have been written on my blog [2].

If you are interested in taking over the maintenance, please mail me
in private.

Cheers.

[1] <http://upsilon.cc/~zack/hacking/software/ocaml-http/>
[2] <http://upsilon.cc/~zack/blog/posts/2009/03/ocaml-http_is_looking_for_a_new_maintainer/ 
 >
			
========================================================================
3) Google summer of Code proposal
Archive: <http://groups.google.com/group/fa.caml/browse_thread/thread/3249f410fe061579# 
 >
------------------------------------------------------------------------
** In this long thread full of technical posts, Andrey Riabushenko  
said and Xavier Leroy replied:

 > I would like to develop LLVM frontend to Ocaml language. LLVM does  
participate
 > in GSoC. LLVM do not mind to developed a ocaml frontend as LLVM  
GSoC project.
 > I want to discuss details with you before I will make an official  
proposal to
 > LLVM. [...]

Do authors of ocaml has something to say about the idea?

Da. A number of things, actually.

1- I know of at least 3, maybe 4 other projects that want to do
something with OCaml and LLVM.  Clearly, some coordination between
these efforts is needed.

2- If you're applying for funding through a summer of code project,
you need to articulate good reasons why you want to combine OCaml and
LLVM.  "Ocaml is cool, LLVM is cool, so OCaml+LLVM would be extra
cool" is not enough.  "It will generate PIC-16 code" isn't either.
Run-time code generation could be a good enough reason, but you still
need to weigh the development cost of the LLVM approach against, for
example, hacking the current OCaml code generator so that it emits
machine code in memory instead of assembly code.

3- A language implementation like OCaml breaks down in four big parts:
       1- Front-end compiler
       2- Back-end compiler and code emitter
       3- Run-time system
       4- OS interface
Of these four, the back-end is not the biggest part nor the most
complicated part.  LLVM gives you part 2, but you still need to
consider the other 3 parts.  Saying "I'll do 1, 3 and 4 from scratch",
Harrop-style, means a 5-year project.  To get somewhere within a
reasonable amount of time, you really need to reuse large parts of the
existing OCaml code base.

4- From a quick look at LLVM specs, the two aspects that appear most
problematic w.r.t. Caml are exception handling and GC interface.
LLVM seems to implement C++/Java "zero-cost" exceptions, which are
known to be quite costly for Caml.  (Been there, done that, in the
early 1990s.)  I regret that LLVM does not provide support for
alternate implementations of exceptions, like the C-- intermediate
language of Ramsey et al does:
  <http://portal.acm.org/citation.cfm?id=349337>

The big issue, however, is GC interface.  There are GC techniques that
need no support from the back-end: stack maps and conservative
collection.  Stack maps are very costly in execution time.
Conservative GC like the Boehm-Weiser GC is already much better.  But
for full efficiency, back-end support is required.  LLVM documents a
minimal interface to produce stack maps and make them available to the
GC at run-time:
  <http://llvm.org/docs/GarbageCollection.html>

The first thing you want to investigate is whether this interface is
enough to support an exact, relocating collector such as
OCaml's. According to
  <http://www.nabble.com/Garbage-collection-td22219430.html>
Gordon Henriksen did succeed in interfacing OCaml's GC with LLVM.
Maybe there is still some more work to do here, in which case I
recommend you look into this first.

6- Here is a schematic of the Caml compiler.  (Use a fixed-width font.)

            |
            | parsing and preprocessing
            v
         Parsetree (untyped AST)
            |
            | type inference and checking
            v
         Typedtree (type-annotated AST)
            |
            | pattern-matching compilation, elimination of modules,  
classes
            v
         Lambda
          /  \
         /    \ closure conversion, inlining, uncurrying,
        v      \  data representation strategy
     Bytecode   \
                 \
                Cmm
                 |
                 | code generation
                 v
              Assembly code

In my opinion, the simplest approach is to start at Cmm and generate
LLVM code from there.  Starting at one of the higher-level
intermediate forms would entail reimplementing large chunks of the
OCaml compiler.  Note that Cmm is quite close to the C-- intermediate
language mentioned earlier, so there is a lot to learn from Fermin
Reig's experience with an OCaml/C-- back-end.

7- To finish, I'll preventively deflect some likely reactions by Jon
Harrop:

"But you'll be tied to OCaml's data representation strategy!"
  Right, but 1- implementing you own data representation strategy is
  a lot of work, especially if it is type-based, and 2- OCaml's
  strategy is close to optimal for symbolic computing.

"But LLVM assembly is typed, so you must have types!"
  Just use int32 or int64 as your universal type and cast to
  appropriate pointer types in loads or stores.

"But your code will be tainted by OCaml's evil license!"
  It is trivial to make ocamlopt dump Cmm code in a file or pipe.
  (The -dcmm debug option already does this.)  Then, you can have a
  separate, untainted program that reads the Cmm code and transforms it.

"But shadow stacks are the only way to go for GC interface!"
  No, it's probably the worst approach performance-wise; even a
  conservative GC should work better.
			
** Jon Harrop replied:

 > 3- A language implementation like OCaml breaks down in four big  
parts:
 >        1- Front-end compiler
 >        2- Back-end compiler and code emitter
 >        3- Run-time system
 >        4- OS interface
 > Of these four, the back-end is not the biggest part nor the most
 > complicated part.  LLVM gives you part 2, but you still need to
 > consider the other 3 parts.  Saying "I'll do 1, 3 and 4 from  
scratch",
 > Harrop-style, means a 5-year project.

On the contrary, my "style" was to provide the features that I value
(multicore & FFI) in a usable form (stop-the-world) with the shortest
possible development time (i.e. <<6 months to create something useful).
Specifically:

1- Front-end compiler: use camlp4 to provide an embedded DSL for
high-performance parallel numerics and/or reuse front-ends from existing
compilers like OCaml, PolyML, MosML, NekoML to build completely new  
language
implementations.

2- Back-end compiler and code emitter: reuse LLVM.

3- Run-time system: write the simplest possible precise GC and use
stop-the-world to apply it to threads that may then run in parallel.

4- OS interface: make it as easy as possible to call C directly.

HLVM had solved (2), (3) and (4) after only 3 months of part-time  
work. I
vindicated my style!

 > 7- To finish, I'll preventively deflect some likely reactions by Jon
 > Harrop:
 >
 > "But you'll be tied to OCaml's data representation strategy!"
 >   Right, but 1- implementing you own data representation strategy is
 >   a lot of work, especially if it is type-based, and

Actually I found that easy, not least because I wanted a user-friendly  
FFI so
I just used C's data representation whenever possible.

 >   2- OCaml's strategy is close to optimal for symbolic computing.

Is MLton not several times faster than OCaml for symbolic computing?

 > "But LLVM assembly is typed, so you must have types!"
 >   Just use int32 or int64 as your universal type and cast to
 >   appropriate pointer types in loads or stores.

That is entirely possible and could be useful as an incremental  
improvement to
OCaml's existing bytecode interpreter but it is not a step toward my  
goals.

 > "But your code will be tainted by OCaml's evil license!"
 >   It is trivial to make ocamlopt dump Cmm code in a file or pipe.
 >   (The -dcmm debug option already does this.)  Then, you can have a
 >   separate, untainted program that reads the Cmm code and  
transforms it.

Again, that is another technically-feasible step away from my goals  
because
OCaml's CMM has already been mangled for its data representation (e.g.  
31-bit
ints, boxed floats).

 > "But shadow stacks are the only way to go for GC interface!"
 >   No, it's probably the worst approach performance-wise; even a
 >   conservative GC should work better.

Building a state-of-the-art optimized concurrent GC Leroy-style means an
infinity-year project. =:-p

Seriously though, I think it is essential to get a first working  
version of a
GC that permits parallel threads. HLVM will be useful to a lot of  
people long
before its GC gets optimized.
			
========================================================================
Using folding to read the cwn in vim 6+
------------------------------------------------------------------------
Here is a quick trick to help you read this CWN if you are viewing it  
using
vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'<1':1
zM
If you know of a better way, please let me know.

========================================================================
Old cwn
------------------------------------------------------------------------

If you happen to miss a CWN, you can send me a message
(alan.schmitt at polytechnique.org) and I'll mail it to you, or go take a  
look at
the archive (<http://alan.petitepomme.net/cwn/>) or the RSS feed of the
archives (<http://alan.petitepomme.net/cwn/cwn.rss>). If you also wish
to receive it every week by mail, you may subscribe online at
<http://lists.idyll.org/listinfo/caml-news-weekly/> .

========================================================================

-- 
Alan Schmitt <http://alan.petitepomme.net/>

The hacker: someone who figured things out and made something cool  
happen.
  .O.
  ..O
  OOO


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.idyll.org/pipermail/caml-news-weekly/attachments/20090324/8045f2b7/attachment-0001.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 195 bytes
Desc: This is a digitally signed message part
Url : http://lists.idyll.org/pipermail/caml-news-weekly/attachments/20090324/8045f2b7/attachment-0001.pgp 


More information about the caml-news-weekly mailing list