[cwn] Attn: Development Editor, Latest Caml Weekly News

Alan Schmitt alan.schmitt at polytechnique.org
Tue Jan 30 02:27:57 PST 2007


Here is the latest Caml Weekly News, for the week of January 23 to
30, 2007.

1) Job opportunity in Paris, around the CDuce project
2) Interfacing with C question...
3) The OCaml Summer Project
4) cmigrep
5) TestSimple - A simple unit testing framework
6) lablpcre-1.0 - a PCRE binding for Objective Caml

1) Job opportunity in Paris, around the CDuce project
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
** Alain Frisch announced:

The announce below might be of interest for people looking for jobs
involving OCaml.

   -- Alain Frisch

Position available at Paris 7

The laboratory PPS of University Paris 7 is looking for candidates for a
one year position, available immediately, around the CDuce project.
CDuce is a programming language for XML (see <http://www.cduce.org>),
close in spirit to OCaml, and whose compiler is implemented in OCaml.

According to the profile of the recruited person, the position will focus
more on the development environment (e.g. libraries for web development
or web services, Eclipse plugins, a Windows port) or on the research
aspects (e.g. concurrency, typing, distribution, verification) around CDuce.

Candidates should be fluent in OCaml or in another functional language.
Experience in Web development, environments for software development
and/or XML would be useful as well.

The annual gross salary will be around 28.000 Euros. If interested,
please send a mail to staff at cduce.org as soon as possible.

Giuseppe Castagna
2) Interfacing with C question...
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
** David Allsopp asked and Remi Vanicat answered:

 > Sorry if this is an RADBOTFM case. Rule 2 in Chapter 18 of the manual
 > states that all local variables of type value must be declared using
 > CAMLlocal macros. However, later on when demonstrating caml_callback we get the
 > statements:
 > value* format_result_closure = caml_named_value("format_result");
 > return strdup(String_val(caml_callback(*format_result_closure, Val_int(n))));

 > (I've "simplified" the opening lines for clarity here - naturally it should
 > be static and once only!).

 > Two questions arise:

 > 1. Presumably it's OK to cache values returned by caml_named_value without
 > declaring them in a CAMLlocal "call" or by using register_global_root?

No, any C pointer to a caml value must be known to the caml GC,
because the caml GC might move the caml value. But you may skip
declaring such a pointer if you are sure that the GC won't be triggered
between the assignment of the value to the C variable and the use
of the C variable.
In the given example, this is the case: nothing is done between the
assignment and the use.

By the way, when in doubt, or if you are a beginner who does not yet
know well how the GC works, you should always use the CAMLlocal call.

 > 2. The result of caml_callback is passed straight to String_val. If
 > I expand the line to:
 > value result = caml_callback(*format_result_closure, Val_int(n));
 > return strdup(String_val(result));

 > then does that work OK without using CAMLlocal1(result)?

Yes, for the same reason: you do nothing between assigning the
function's result to the variable and using the variable.
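
For context, the quoted C code can only find the closure if the OCaml
side has registered it under the name "format_result";
caml_named_value returns NULL otherwise. A minimal sketch of that
OCaml side, loosely modeled on the manual's embedding example (the
exact output format of format_result is an assumption):

```ocaml
(* OCaml side of the embedding example: a closure that C code can
   later retrieve with caml_named_value("format_result").
   The output format is an assumption for illustration. *)
let format_result (n : int) : string = Printf.sprintf "Result is: %d" n

(* Register the closure with the runtime under a name visible from C. *)
let () = Callback.register "format_result" format_result
```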
** Hendrik Tews also answered:

 > Sorry if this is an RADBOTFM case. Rule 2 in Chapter 18 of the manual
 > states that all local variables of type value must be declared using
 > CAMLlocal macros. However, later on when demonstrating caml_callback we get the
 > statements:
 > value* format_result_closure = caml_named_value("format_result");

Note the type! format_result_closure is not of type value, so Rule 2
does not apply!

 > 1. Presumably it's OK to cache values returned by caml_named_value without
 > declaring them in a CAMLlocal "call" or by using register_global_root?

Yes it is OK. And you cannot use CAMLlocal or
register_global_root, because they only deal with values and not
pointers to values. The manual guarantees that the value pointed
to by the result of caml_named_value doesn't move. Probably
caml_named_value allocates a value outside the heap, registers it
as a global root and gives you back its address.

 > value result = caml_callback(*format_result_closure, Val_int(n));
 > return strdup(String_val(result));
 > then does that work OK without using CAMLlocal1(result)?

Yes. You should think of values as pointers that point to data
that is moved around by the garbage collector. If there is any
chance that the garbage collector is called, then you must make
sure that it updates your pointer when it moves the data. Hence
you have to register the value.

If the garbage collector is not called under any circumstances
then the data will not move, the pointer doesn't need to get
updated and you don't have to register the value.

Furthermore, if you are sure Is_long(your_variable) is true
under all circumstances, you don't have to register
your_variable, because it is not a pointer.

Of course all that is strongly discouraged.
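
Hendrik's point about Is_long can be observed from the OCaml side with
the unsafe Obj module, where Obj.is_int plays the role of Is_long. A
small illustration (the helper name is_immediate is ours, and Obj
should not be used like this in real code): integers and constant
constructors are immediate values, while strings and non-constant
constructors are heap blocks that the GC may move.

```ocaml
(* Obj.is_int mirrors Is_long on the C side: true for immediate
   (unboxed) values, which are not pointers into the heap. *)
let is_immediate (x : 'a) : bool = Obj.is_int (Obj.repr x)

let () =
  assert (is_immediate 42);             (* int: immediate *)
  assert (is_immediate None);           (* constant constructor: immediate *)
  assert (is_immediate true);           (* bool: immediate *)
  assert (is_immediate ());             (* unit: immediate *)
  assert (not (is_immediate "abc"));    (* string: heap block *)
  assert (not (is_immediate (Some 1)))  (* non-constant constructor: block *)
```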
3) The OCaml Summer Project
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
** Yaron M. Minsky announced:

I am pleased to announce the OCaml Summer Project. The project is aimed
at encouraging growth in the OCaml community by funding students over
the summer to work on open-source projects in OCaml. At the end of the
summer, we will fly all of the students who have completed their
projects successfully out for a meeting in New York, where people will
present their projects and get a chance to schmooze with other members of
the OCaml community.

The project is being funded and run by Jane Street Capital. As people on
the list likely know at this point, we make extensive use of OCaml here
at Jane Street, and are excited about the idea of encouraging and
growing the OCaml community.

If you'd like to learn more about the project, you can look at our
website here:


We'd love to have professors tell their students about the project,
since we hope it will do some real good in terms of increasing interest
in functional programming.

Please direct any questions or suggestions you have to
osp at janestcapital.com.
** Gabriel Kerneis asked and Yaron M. Minsky answered:

 > And I'm an interested student looking for ideas (the binary trees
 > project on the OCSP page seems fine but I'd be glad to have more
 > ideas) ;-)
 > Other questions:
 > - is it possible to propose several projects (per student) and let
 > the OCSP team decide which one is better?

Just one proposal per student, I'm afraid. We'd rather you chose a project
you liked and believed in, and came up with the best proposal you can for it.

 > (if I were to be selected and finish the job) what kind of
 > visa/passport/etc. does one need to come to the USA (I'm living in
 > France) - but I guess I can find this on the Internet - and to what
 > extent will you pay the flight/food/housing/etc. fees?

We will pay for the travel and your stay. I don't expect we'll pay for
all of your meals, but we'll have a few dinners for the group. As for
the visa issue, my expectation is that students will figure this out on
their own. I suspect that for the most part a tourist visa should do,
but I don't really know the details, and it will no doubt vary from
country to country.
** Jon Harrop said:

Sounds like an excellent idea and the projects all look fascinating.
I do have some comments on the "Binary tree library" project:

OCaml currently has two separate implementations of AVL trees in the Map
and Set functors. Set already has fast union and split operations.

Having two separate implementations is wasteful but more efficient. The
underlying tree code could be factored out into another functor, but
this is costly in terms of performance. Also, the OCaml stdlib has made
an odd choice of optimisations: inlining height calculation (which is
quite a small [...] in the context of functors and polymorphism) but
not amortising [...] trees into a separate constructor (which can
remove up to 50% of the effort). So the code can be made shorter and
faster.

I've already implemented my own AVL set using the node-specialisation
[...]. It is ~30% faster, IIRC. I've also wanted to write a functional
array based on AVL trees (O(log n) lookup but fast sub, append, insert,
delete etc.) and a camlp4 extension to support pattern matching over
[...] type. Lists and arrays are rather privileged containers in OCaml,
[...] pattern matching and literals, but trees are better in many
respects and would make an excellent general-purpose container.
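
As a rough illustration of what node specialisation can look like, here
is a hypothetical set representation with an extra constructor for
one-element nodes, so that leaves do not carry two empty subtrees. This
is an assumption about the technique, not Jon's code, and balancing is
deliberately left out: it is a plain binary search tree, not an AVL tree.

```ocaml
(* Hypothetical representation: Leaf is a specialised one-element node,
   avoiding a Node with two Empty children. No rebalancing is done. *)
type 'a tree =
  | Empty
  | Leaf of 'a
  | Node of 'a tree * 'a * 'a tree

(* Membership test using polymorphic comparison. *)
let rec mem x = function
  | Empty -> false
  | Leaf y -> x = y
  | Node (l, y, r) ->
    let c = compare x y in
    c = 0 || mem x (if c < 0 then l else r)

(* Insertion; a Leaf grows into a Node only when a second element
   arrives. Duplicates leave the tree unchanged. *)
let rec add x t =
  match t with
  | Empty -> Leaf x
  | Leaf y ->
    let c = compare x y in
    if c = 0 then t
    else if c < 0 then Node (Leaf x, y, Empty)
    else Node (Empty, y, Leaf x)
  | Node (l, y, r) ->
    let c = compare x y in
    if c = 0 then t
    else if c < 0 then Node (add x l, y, r)
    else Node (l, y, add x r)

(* In-order traversal, giving the elements in increasing order. *)
let rec elements = function
  | Empty -> []
  | Leaf y -> [y]
  | Node (l, y, r) -> elements l @ (y :: elements r)
```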

Finally, having to use functors does obfuscate OCaml code that deals
with Sets and Maps in many cases, particularly because there are no
built-in Int and Float modules, so you must write your own. I often find
that this code is as long as all of the code using the Sets/Maps.
Although it may be "dangerous", Sets and Maps implemented without
functors are much easier to use. After all, Hashtbl is typically used
in that way.
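
To make the functor overhead concrete, a minimal sketch (the module
names IntOrd and IntSet are illustrative): without a built-in Int
module, a set of integers needs its own argument module for Set.Make,
while the polymorphic Hashtbl needs no setup at all.

```ocaml
(* Argument module required by Set.Make: a type plus a total order. *)
module IntOrd = struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end

module IntSet = Set.Make (IntOrd)

(* Functor-based set: every operation goes through IntSet. *)
let s = IntSet.union (IntSet.of_list [3; 1; 2]) (IntSet.of_list [2; 5])

(* Polymorphic Hashtbl: usable directly, no functor application. *)
let h : (int, string) Hashtbl.t = Hashtbl.create 16
let () = Hashtbl.replace h 1 "one"
```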

It is also worth noting that several people (Diego, Jean-Christophe)
have written other tree libraries using various data structures (RB,
trie, etc.). As far as I can tell, AVL trees are a good all-rounder.

Best of luck with the projects!
** Xavier Leroy said:

I just wanted to say a big "thank you" to you and the Jane Street
Capital people for donating the time and money to organize such an
event.  It will be very interesting to see what comes out of it.

Here is my attempt to put the visa discussion to rest.  I believe Yaron and
Markus are right: a tourist visa (or visa waiver program) is
probably enough; whether you need an actual visa or not is a
complicated function of your country of citizenship and of the date
your passport was issued, but for citizens of "old Europe", this
function returns "no visa needed" with high probability.

For more details, see the Web sites of the ministry of foreign affairs
or of the US consulate in your country (URLs for France included below
for your convenience).

Let me add that if you never visited Manhattan before, it's well worth
the trip.  One more reason to participate in this project!


4) cmigrep
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
** Eric Stokes announced:

I am happy to announce the immediate availability of cmigrep, a small
utility to mine cmi files for interesting bits of data. cmigrep is
available in godi, or at <http://homepage.mac.com/letaris>.

A short description of its features:

cmigrep: <args> <module>

cmigrep has two modes. The first, and most common, is searching for
various kinds of objects inside a module. Objects that you can search
for include:

switch         purpose
-t             (regexp) print types with matching names
-r             (regexp) print record field labels with matching names
-c             (regexp) print constructors with matching names
-e             (regexp) print exceptions with matching constructors
-v             (regexp) print values with matching names
-o             (regexp) print all classes with matching names
-a             (regexp) print all names which match the given expression

These are all very useful for finding specific things inside a given
module. Here are a few examples:

find some constructors in the Unix module

itsg106:~ eric$ cmigrep -c SO_ Unix
SO_DEBUG (* socket_bool_option *)
SO_BROADCAST (* socket_bool_option *)
SO_REUSEADDR (* socket_bool_option *)
SO_KEEPALIVE (* socket_bool_option *)
SO_DONTROUTE (* socket_bool_option *)
SO_OOBINLINE (* socket_bool_option *)
SO_ACCEPTCONN (* socket_bool_option *)
SO_SNDBUF (* socket_int_option *)
SO_RCVBUF (* socket_int_option *)
SO_ERROR (* socket_int_option *)
SO_TYPE (* socket_int_option *)
SO_RCVLOWAT (* socket_int_option *)
SO_SNDLOWAT (* socket_int_option *)
SO_LINGER (* socket_optint_option *)
SO_RCVTIMEO (* socket_float_option *)
SO_SNDTIMEO (* socket_float_option *)

Full types get printed when the constructors have arguments. Notice
that adding to the include path (-I) is modeled after the compiler.
Findlib is also supported.

itsg106:~ eric$ cmigrep -c "^Tsig_.*" -I /opt/godi/lib/ocaml/compiler-lib Types
Tsig_value of Ident.t * value_description (* signature_item *)
Tsig_type of Ident.t * type_declaration * rec_status (* signature_item *)
Tsig_exception of Ident.t * exception_declaration (* signature_item *)
Tsig_module of Ident.t * module_type * rec_status (* signature_item *)
Tsig_modtype of Ident.t * modtype_declaration (* signature_item *)
Tsig_class of Ident.t * class_declaration * rec_status (* signature_item *)
Tsig_cltype of Ident.t * cltype_declaration * rec_status (* signature_item *)

record field labels

itsg106:~ eric$ cmigrep -r "^st_" Unix
st_dev: int (* stats *)
st_ino: int (* stats *)
st_kind: file_kind (* stats *)
st_perm: file_perm (* stats *)
st_nlink: int (* stats *)
st_uid: int (* stats *)
st_gid: int (* stats *)
st_rdev: int (* stats *)
st_size: int (* stats *)
st_atime: float (* stats *)
st_mtime: float (* stats *)
st_ctime: float (* stats *)

findlib support, matching value names

itsg106:~ eric$ cmigrep -package pcre -v for Pcre
val foreach_line : ?ic:in_channel -> (string -> unit) -> unit
val foreach_file : string list -> (string -> in_channel -> unit) -> unit

nested modules

itsg106:~ eric$ cmigrep -v ".*" Unix.LargeFile
val lseek : file_descr -> int64 -> seek_command -> int64
val truncate : string -> int64 -> unit
val ftruncate : file_descr -> int64 -> unit
val stat : string -> stats
val lstat : string -> stats
val fstat : file_descr -> stats


itsg106:~ eric$ cmigrep -t ".*" Unix.LargeFile
type stats = {
    st_dev : int;
    st_ino : int;
    st_kind : file_kind;
    st_perm : file_perm;
    st_nlink : int;
    st_uid : int;
    st_gid : int;
    st_rdev : int;
    st_size : int64;
    st_atime : float;
    st_mtime : float;
    st_ctime : float;
}


itsg106:~ eric$ cmigrep -a ".*" Unix.LargeFile
val lseek : file_descr -> int64 -> seek_command -> int64
val truncate : string -> int64 -> unit
val ftruncate : file_descr -> int64 -> unit
type stats = {
    st_dev : int;
    st_ino : int;
    st_kind : file_kind;
    st_perm : file_perm;
    st_nlink : int;
    st_uid : int;
    st_gid : int;
    st_rdev : int;
    st_size : int64;
    st_atime : float;
    st_mtime : float;
    st_ctime : float;
}


val stat : string -> stats
val lstat : string -> stats
val fstat : file_descr -> stats

exception declarations

itsg106:~/cmigrep eric$ ./cmigrep -e ".*" Unix
exception Unix_error of error * string * string

The second mode of cmigrep is for searching for modules in its path.
This lets you do a regular expression match on the full module path,
including submodules. For example,

itsg106:~/cmigrep eric$ cmigrep -m Net -package netstring

here are all modules starting with Net that contain a sub module
itsg106:~/cmigrep eric$ cmigrep -m "^Net.*\." -package netstring

modules exactly one level deep
itsg106:~/cmigrep eric$ cmigrep -m "^[^.]*\.[^.]*$"
5) TestSimple - A simple unit testing framework
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
** Stevan Little announced:

I would like to announce the first release of our new unit testing
framework for OCaml. It can be downloaded from our website here:


It is based heavily on the Perl unit testing framework of the same
name, and produces TAP output (<http://en.wikipedia.org/wiki/ 
which can be read and analyzed by a wide
range of existing Perl tools. The goal of this framework is to make
writing unit tests as simple and easy as possible (hence the name).
Here is a basic example taken from the TestSimple test suite:

      #use "topfind";;
      #require "testSimple";;

      open TestSimple;;

      plan 9;;

      diag "... testing O'Caml TestSimple v0.01 ";;
      ok true "... ok passed";;
      is 2 2 "... is <int> <int> passed";;
      is 2. 2. "... is <float> <float> passed";;
      is "foo" "foo" "... is <string> <string> passed";;
      is [] [] "... is <'a list> <'a list> passed";;
      is [1;2;3] [1;2;3] "... is <int list> <int list> passed";;
      is ["foo";"bar"] ["foo";"bar"] "... is <string list> <string list> passed";;
      is (1,"foo") (1,"foo") "... is <int * string> <int * string> passed";;
      is TAPDocument.Ok TAPDocument.Ok "... is <type> <type> passed";;

As this is our first released OCaml library, any and all feedback is
very much appreciated.
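
For readers unfamiliar with TAP, the format is line-oriented: a plan
line "1..N", then one "ok K - description" or "not ok K - description"
line per test, with "#" lines for diagnostics. A hypothetical sketch of
a producer for that format (this is not TestSimple's implementation,
and the function names are made up):

```ocaml
(* Hypothetical minimal TAP producer; not TestSimple's API. *)

(* Plan line announcing how many tests will run. *)
let plan_line (n : int) : string = Printf.sprintf "1..%d" n

(* Diagnostic lines start with "#" and are not counted as results. *)
let diag_line (msg : string) : string = "# " ^ msg

(* One result line per test, numbered from 1. *)
let result_line (k : int) (passed : bool) (desc : string) : string =
  Printf.sprintf "%s %d - %s" (if passed then "ok" else "not ok") k desc

(* Turn (description, passed) pairs into a complete TAP stream. *)
let run (tests : (string * bool) list) : string list =
  plan_line (List.length tests)
  :: List.mapi (fun i (desc, passed) -> result_line (i + 1) passed desc) tests

let () =
  print_endline (diag_line "demo run");
  List.iter print_endline
    (run [ ("truth holds", true);
           (* structural equality, like a polymorphic "is" check *)
           ("lists are equal", [1; 2] = [1; 2]) ])
```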
6) lablpcre-1.0 - a PCRE binding for Objective Caml
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
** Robert Roessler announced:

The "1.0" release of the LablPCRE OCaml binding for PCRE is now
available, fully supporting Linux and Windows builds with PCRE versions
6.1 to 7.0 (current).

LablPCRE provides simple, easy-to-use access to regular expression
matching, offering a rich module-based interface built on PCRE's POSIX
wrapper functions.

This release has been built and tested using OCaml 3.09.3 on Fedora
Core 6 and Windows XP, supports findlib and "hands-off" building and
installing (no "configure" script or manual file editing required),
and has pre-built binaries for [native] Windows XP.  The full package
is licensed under the "new" BSD license, and may be downloaded here:

Using folding to read the cwn in vim 6+
Here is a quick trick to help you read this CWN if you are viewing it in
vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'<1':1
If you know of a better way, please let me know.

Old cwn

If you happen to miss a CWN, you can send me a message
(alan.schmitt at polytechnique.org) and I'll mail it to you, or go take a
look at the archive (<http://alan.petitepomme.net/cwn/>) or the RSS feed
of the archives (<http://alan.petitepomme.net/cwn/cwn.rss>). If you also
wish to receive it every week by mail, you may subscribe online at
<http://lists.idyll.org/listinfo/caml-news-weekly/> .


Alan Schmitt <http://alan.petitepomme.net/>

The hacker: someone who figured things out and made something cool  
