[cwn] Attn: Development Editor, Latest Caml Weekly News

Alan Schmitt alan.schmitt at polytechnique.org
Tue Jul 26 00:33:36 PDT 2005


Hello,

Here is the latest Caml Weekly News, for the week of 19 to 26 July,  
2005.

1) parsing OCaml code
2) (Mostly) Functional Design?
3) Pattern Matching Papers
4) pftdbns 0.2.8
5) How to do this properly with OCaml?
6) ocamldap 2.1.4
7) Ocamlnet 1.1

========================================================================
1) parsing OCaml code
Archive: <http://thread.gmane.org/gmane.comp.lang.caml.general/29668>
------------------------------------------------------------------------
** Nate Gaylinn asked and Sylvain Le Gall answered:

 > I'm trying to figure out a nice way to parse OCaml code in a C++  
program.
 > All I need is a very basic parse tree for use in syntax  
highlighting and
 > indenting OCaml code.
 >
 > I would love to make use of some of the programs that already come  
with
 > OCaml. For instance, ocamlc and camlp4 can both output OCaml parse  
trees
 > already! The only problem is the formats that these programs  
output are
 > (to my knowledge) completely undocumented.
 >
 > If I don't use these programs, I could always try using Lex and  
Yacc to do
 > the job, but I'm completely unfamiliar with these utilities and it  
would
 > take me quite some time to write input for parsing OCaml.
 >
 > Does anyone know where I could find reference to the various OCaml  
parse
 > tree formats? Does anyone know where I could find Yacc grammar
 > descriptions of OCaml, or would I have to write them myself? Does  
anyone
 > have any other suggestions of how to tackle the parsing problem?

You should have a look to ocaml-ast-analyze
(<http://www.carva.org/sylvain.le-gall/ocaml-ast-analyze.html>). It is a
module build around camlp4 which can overridde the output of camlp4 (ie
it can replace pr_dump.cmo for example).

I use it to analyze ocaml code in order to find functions (which are
"s_", "f_"... ) followed by a string (s_ 'Coucou') and output only the
string ("msgstrid 'Coucou').

I think it is very useful to do analysis on Ocaml AST.

There is no documentation, but i am planning to write some.

** Eric Cooper also answered:

OCaml uses ocamlyacc and ocamllex for its front-end: see
.../parsing/parser.mly and lexer.mll.

ocamlyacc is very similar to yacc/bison; reuse of the grammar should
be straightforward.

ocamllex is similar to lex/flex, but it supports mutually recursive
rules instead of lex's explicit "start conditions", so you might have
a bit more work if you want to translate it.

========================================================================
2) (Mostly) Functional Design?
Archive: <http://thread.gmane.org/gmane.comp.lang.caml.general/29595>
------------------------------------------------------------------------
** In the middle of this huge thread, Doug Kirk said and James  
Woodyatt added:

 > First, since this thread was started by somebody asking for ideas on
 > learning FP, I can site a couple of printed resources that have  
helped
 > me, but none are written using Ocaml:
 >
 > "Haskell School of Expression", Hudak, ISBN 0-52164-4089
 > "Algorithms: A Functional Programming Approach", Rabhi, Lapalme, ISBN
 > 0-20159-6040
 > "Structure & Interpretation of Computer Programs", Sussman, Abelson,
 > Sussman, ISBN 0-26269-2201

I would add that a good tutorial on monads is a resource that every
functional programmer should read for comprehension.  I haven't found
one that I can recommend without hesitation, but here's a candidate:

         <http://www.nomaware.com/monads/html/>

========================================================================
3) Pattern Matching Papers
Archive: <http://thread.gmane.org/gmane.comp.lang.caml.general/29583>
------------------------------------------------------------------------
** Continuing the thread from last week, Norman Ramsey said:

In addition to other papers mentioned on this list, there are two
unpublished papers that may have some value:

    Baudinet, Marianne and David MacQueen. 1985 (December). Tree pattern
    matching for ML (extended abstract). Unpublished manuscript, AT&T  
Bell
    Laboratories.

    Scott, Kevin and Norman Ramsey. 2000 (May). When do match- 
compilation
    heuristics matter? Technical Report CS-2000-13, Department of  
Computer
    Science, University of Virginia.

My paper with Kevin Scott has some tutorial stuff in it.  If you like,
I can try to dig out source code, but it's in SML and may not be so  
useful.

========================================================================
4) pftdbns 0.2.8
Archive: <http://thread.gmane.org/gmane.comp.lang.caml.general/29724>
------------------------------------------------------------------------
** Oliver Bandel announced:

I have to announce version 0.2.8 of pftdbns,
a tool which is useful in sorting/listing/moving files.

It's name "pftdbns" is a short hand for "put files to directories  
(sorted) by name structure".

It takes filenames, maps each char of the filename into a char,  
representing
the charclass of it (a..z and A..Z -> "l" (letter), 0...9 ->  
"i" (integer" and so on)).

This yields to an easy way of sorting files by names, based upon file- 
naming
with certain filenaming-conventions.

So, for example  "hello.txt" and "ballo.txt" are part of the same  
name structure,
as well as "1001.txt" and "8251.txt" but also "8251.jpg" are of the  
same name
structure. For example "foobar.tex" and "foobar.txt" are equally  
structured too.

The default behaviour is to move files into directories. The names of  
the directories
are choosen from the string, which represents the name structure by  
default.

Since version 0.2.2 the default-behaviour of moving files can be changed
to only list the lilenames/namestructrues (and not moving them).

Changes in 0.2.8:
=================

Bug-Fix:
--------

   -inv option corrected!
          It didn't worked properly in version 0.2.6,
          because the filter function was stupid!

Source restructuring:
---------------------

Some changes in datastructures to have them more clear
and have a more efficient program (multiple/unnecessary
calls of the function "scan_string" removed).
This was done in the parts of the program, where the
file/template lists are build.

I hope you enjoy this program, and I think if you have to handle a lot
of files, this will be very helpful.

You can find the tool here:
         <ftp://www.belug.org/new/user/ob/Programs/Tools/pftdbns/>

There also is a README in this directory, so that you can read more  
details.

A description can also be found here:
    <http://www.belug.org/~ob/ftp-area.html>

pftdbns uses the GPL-license.

========================================================================
5) How to do this properly with OCaml?
Archive: <http://thread.gmane.org/gmane.comp.lang.caml.general/29708>
------------------------------------------------------------------------
** Thomas Fischbacher started a huge (sometimes flaming) thread asking:

I repeatedly stumble across problems where I would like to use a
programming style which seems quite out of line with how one is supposed
to use OCaml (to say it otherwise, it looks outright wrong and
horrendously ugly), but which nevertheless might be just appropriate for
the situation in question. So, I would like to hear the opinion of the
OCaml community about this.

Imagine constructing a binary heap of tuples. It is somewhat neat to be
able to just use an array to store the entries of the heap which is
pre-allocated to some fixed size and replaced by a larger array whenever
more space is needed. The only thing known about the heap's entries at
compile time is that they are bound to be tuples, hence boxed objects,
but nothing else is known about their size or even type.

As one does not have a prototype of such a tuple at the time the  
array is
created, it seems to me as if the best thing one could do in OCaml would
be:

(1) Make an array of <tuple> option and initially fill it with None.

(2) Make an optional array of tuples which is None until the first entry
is made.

One drawback of approach (2) is that when we "remove" entries from the
heap, the underlying array will keep unnecessary references to values
which by chance might be quite big, and which we might want to be
reclaimed by GC, so that's not too beautiful for a general-purpose
application.

The major drawback of (1), however, is that there is a further level of
indirection for array entries, which means some unnecessary consing, as
well as more work for the machine to get at a given entry.

If I just define a function

let whatever () = Obj.magic (1,2);

then

let a = Array.make 10 (whatever());;

would give me more or less just what I want. An array of boxed things  
that
I then would like to use as in:

# let () = a.(2) <- (1,2,3,4) in a.(2);;
- : int * int * int * int = (1, 2, 3, 4)
# let () = a.(3) <- (5,8,2,1) in a.(2);;
- : int * int * int * int = (1, 2, 3, 4)
# a.(3);;
- : int * int * int * int = (5, 8, 2, 1)

Mind you - in this case, I will only assign int*int*int*int tuples to  
that
array, or the result of the evaluation of whatever() when I want to kill
an entry. Plus, I guarantee never to look at any entry that is set to
whatever().

In some other situation, I might be inclined to use the same code, but
with string*bool instead of int*int*int*int. But again, I will promise
never to put anything else but string*bool into the underlying array,  
and
never look at entries which are not set properly. (Which, of course,
includes never printing such an array at the toplevel unless it is fully
occupied.)

Please don't tell me that "well, OCaml is not Lisp, after all".
This I already know. But is there no proper way to get around that
(technically speaking) unnecessary extra level of indirection that is
forced upon me due to static type checking?

The very same problem appears in a different guise when one tries to  
write
a function that flattens 'a array array -> 'a array. The best I could
come up with here is:

let join_arrays aa =
   let len_total = Array.fold_left (fun sf a -> sf+Array.length a) 0 aa
   and nr_arrays = Array.length aa
   in
   if len_total=0
   then
     if nr_arrays = 0 then [||] else aa.(0)
       (* ^ Take a close look! *)
   else
     (* Here, we face the problem that we have to get some value
        of the proper type to init the joined array with.
      *)
     let first_entry =
       (let rec seek pos =
         let array_here = aa.(pos) in
         if Array.length array_here = 0
         then seek (pos+1)
         else array_here.(0)
             (* This is guaranteed to terminate, as len_total>0! *)
       in seek 0)
     in
     let result = Array.make len_total first_entry in
     let transfer_to_result arr offset =
       let nr = Array.length arr in
       for i = 0 to nr-1 do result.(i+offset) <- arr.(i) done
     in
     let rec populate nr_array_now offset =
       if nr_array_now = nr_arrays
       then result
       else
         let array_now = aa.(nr_array_now) in
         let () = transfer_to_result array_now offset in
         populate (1+nr_array_now) (offset+Array.length array_now)
     in
     populate 0 0
;;

Of course, it is pretty awkward having to scan for the first element in
the first nonempty array in an array of arrays, especially as that  
element
really does not play a that special role.

Could anyone please tell me how to do all this in a more appropriate  
way?

** Deep into the thread (please read the read online), Michael  
Alexander Hamburg said and Xavier Leroy replied:

 > I was constructing a binary heap of tuples the other day.  After  
pondering
 > these options, I just used Obj.magic 0 as the null value in the  
array.
 > The heap was in a module, so nothing else could see the array, and  
I could
 > prove that the code never accessed the null elements, so the use of
 > Obj.magic seemed justified.

In other terms:

" I was walking in the city the other day.  I saw a syringe lying on
   the sidewalk.  I stuck the needle in my forearm.  That was a classy
   neighborhood, so the use of the syringe seemed justified. "

Sorry for being sarcastic, but I strongly feel that any suggestion
to use Obj functions should be avoided on this list.  The OCaml
compiler performs some type-dependent optimizations that can result in
incorrect code (w.r.t. GC invariants) if wrong types are given using
Obj.magic.

For instance, the following implementation of "magic" arrays will
eventually cause the GC to crash:

type 'a t = int array
let get (a: 'a t) i = (Obj.magic a.(i) : 'a)
let set (a: 'a t) i (x: 'a) = a.(i) <- (Obj.magic x : int)

while the same code with "string" instead of "int" will not.  You
don't understand why?  Then, don't use Obj.magic.

A few years ago, I spent one full day debugging a mysterious crash
in code provided by a user, then realized that the problem was exactly
the use of Obj.magic outlined above.  I then swore to throw away all
bug reports whose repro case uses Obj.  So, you can break the type
system with Obj, but you get to keep the pieces afterwards.

Coming back to the initial question, I would first warn against
premature optimization: quite possibly the overhead of the "option"
solution is negligible.  If not, just ask the user to pass an initial
value of the heap element type to the "create heap" function.

========================================================================
6) ocamldap 2.1.4
Archive: <http://thread.gmane.org/gmane.comp.lang.caml.general/29781>
------------------------------------------------------------------------
** Eric Stokes announced:

Announcing the release of ocamldap 2.1.4, which includes bugfixes,
and performance enhancements.

      The BER decoder sees probably its last significant performance
increase in this release, however thats ok, because the increase is a
quite large 2.5x speedup. To put this in perspective the bytecode
decoder is now 10% faster than the first release of the native code
decoder was, and is 14 times faster than perl's Net::LDAP.

========================================================================
7) Ocamlnet 1.1
Archive: <http://thread.gmane.org/gmane.comp.lang.caml.general/29782>
------------------------------------------------------------------------
** Gerd Stolpmann announced:

The Ocamlnet project announces version 1.1 of this library collection.
This release focuses on important additions:

- Nethttpd is a library with HTTP 1.1 daemon functionality. The core
   of this daemon is quite complete; almost all HTTP 1.1 core features
   are supported (more or less only multipart/byteranges are omitted).
   The daemon can serve static and dynamic pages. There are two
   operating modes: In reactive mode, the server works sequentially,
   and in event-based mode, the server can do multiplexing. What is
   still missing are ready-to-use MPM containers (but it is already very
   easy to create a multi-threaded server with very good performance).

   Nethttpd was sponsored by Alex Baretta's company Baretta s.r.l.
   and implemented by me. Thank you, Alex. Nethttpd is released under
   the GPL (unlike the rest of Ocamlnet).

- Smtp is a simple SMTP client library, i.e. it can transfer mail
   messages to a mail server.

   Smtp is a contribution by Pierre Habouzit.

Furthermore, there are a few bugfixes:

- Netmime.write_mime_message: A bug in CR/LF handling has been resolved.

- The FastCGI library can now handle POST requests larger than 4K

- Newer versions of the PCRE library are now supported.

The addition of Nethttpd also made some refactoring necessary:

- There is a new Nethttp module with common functions for all components
   dealing with HTTP. Nethttp includes an almost complete set of parsers
   and printers for the individual HTTP headers.

- The cgi_environment class type has been extended. Especially,
   one can now set an error logging function.

------------------------------------------------------------------------ 
----
What is Ocamlnet?
------------------------------------------------------------------------ 
----

A collection of modules for the Objective Caml language which focus on
application-level Internet protocols and conventions.

The current distribution contains:

- a mature implementation of the CGI protocol

- an implementation of the JSERV protocol (AJP-1.2), can be used with
   mod_jserv (Apache JServ) and mod_jk (Jakarta connector) to connect
   application servers written in O'Caml with web servers

- an implementation of the FastCGI protocol, for the same purpose

- an HTTP daemon, for substituting the web server completely

- a POP3 client

- an SMTP client

- a library of string processing functions related to Internet
   protocols (formerly known as "netstring" and distributed separately):
   MIME encoding/decoding, Date/time parsing, Character encoding
   conversion, HTML parsing and printing, URL parsing and printing,
   OO-representation of channels, and a lot more.

Ocamlnet is developed as a SourceForge project:

   <http://sourceforge.net/projects/ocamlnet>

Developers and code contributions are welcome.

Ocamlnet is licensed under the zlib/libpng license, except the HTTP
daemon, which is released under the terms of the GPL.

========================================================================
Using folding to read the cwn in vim 6+
------------------------------------------------------------------------
Here is a quick trick to help you read this CWN if you are viewing it  
using
vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'<1':1
zM
If you know of a better way, please let me know.

========================================================================
Old cwn
------------------------------------------------------------------------

If you happen to miss a CWN, you can send me a message
(alan.schmitt at polytechnique.org) and I'll mail it to you, or go take  
a look at
the archive (<http://sardes.inrialpes.fr/~aschmitt/cwn/>) or the RSS  
feed of the
archives (<http://sardes.inrialpes.fr/~aschmitt/cwn/cwn.rss>). If you  
also wish
to receive it every week by mail, you may subscribe online at
<http://lists.idyll.org/listinfo/caml-news-weekly/> .

========================================================================

Alan Schmitt, <http://sardes.inrialpes.fr/~aschmitt/>


-- 
The hacker: someone who figured this out and made something cool happen.
.O.
..O
OOO


-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 186 bytes
Desc: This is a digitally signed message part
Url : http://lists.idyll.org/pipermail/caml-news-weekly/attachments/20050726/1919a83e/PGP.bin


More information about the caml-news-weekly mailing list