[cwn] Attn: Development Editor, Latest Caml Weekly News

Alan Schmitt alan.schmitt at polytechnique.org
Tue Oct 10 00:31:35 PDT 2006


Hello,

Here is the latest Caml Weekly News, for the week of October 03 to  
10, 2006.

1) Bindlib 3.0
2) float rounding
3) Memoization
4) Ancient module
5) ocamlopt under win32

========================================================================
1) Bindlib 3.0
Archive: <http://groups.google.com/group/fa.caml/browse_thread/thread/ 
fdafe2effc212fc0/50084ad4cfb63287#50084ad4cfb63287>
------------------------------------------------------------------------
** Christophe Raffalli announced:

I am pleased to announce the 3.0 version of Bindlib:
<http://www.lama.univ-savoie.fr/~raffalli/bindlib.html>

bindlib is a library and a camlp4 syntax extension for the
OCaml language. It proposes a set of tools to manage data structures
with bound and free variables. It includes fast substitution and  
management
of variables names including renaming.

As an exemple here is a piece of code to print and normalize  
(efficiently)
lambda-terms:

Warning: it does not work with ocaml 3.10(cvs) due to the camlp4  
revision. A
new version is planned
when 3.10 is officially out.

----------------8<-----------------------
type term =
    App of term * term
| Abs of (term,term) binder
| FVar of term variable

let fVar (x:term variable) = FVar(x)

(* simple printing of term *)

let rec print_term = function
    App(t1,t2) ->
      print_string "(";
      print_term t1;
      print_string " ";
      print_term t2;
      print_string ")"
| Abs f ->
      match f with bind fVar x in t ->
        print_string "fun ";
        print_string (name_of x);
        print_string " ";
        print_term t
| FVar(v) ->
      print_string (name_of v)

(* weak head normal form *)

let rec whnf = function
    App(t1,t2) as t0 -> (
      match (whnf t1) with
        Abs f -> whnf (subst f t2)
      | t1' -> if t1' != t1 then App(t1', t2) else t0)
| t -> t

(* call by name normalisation *)
let norm t = let rec fn t =
    match whnf t with
      Abs f ->
        match f with bind fVar x in u ->
        Abs(^ bindvar x in fn u ^)
    | t ->
        let rec unwind = function
           FVar(x) -> bindbox_of x
         | App(t1,t2) -> App(^unwind t1,fn t2^)
         | t -> assert false
        in unwind t
in unbox (fn t)
----------------8<-----------------------
			
========================================================================
2) float rounding
Archive: <http://groups.google.com/group/fa.caml/browse_thread/thread/ 
8cced5418a89870a/b50ade009a60d66f#b50ade009a60d66f>
------------------------------------------------------------------------
** Sean McLaughlin asked and Christophe Raffalli answered:

 > I'm using Ocaml for an interval arithmetic application.  I"m  curious
 > about
 > what the Ocaml parser/compiler does to float constants.  May I assume
 > that for any constant I enter, eg. 3.1415... (for N digits of pi),  
that
 > the compiler will give me a closest machine representable number?
 > i.e., if I bound a constant by the previous and next floating point
 > value to
 > that given me by the compiler,
 > will it always be the case that my original (mathematical) constant
 > lies in that interval?


This is not really an answer to your question but I wrote an OCaml  
binding to
the function to change the rounding mode of the proc. You can  
download it from
(the fifth item)
<http://www.lama.univ-savoie.fr/~raffalli/?page=soft&=en>
(four rounding modes are availaible: closest, toward +inf, -inf and  
zero. The
mode far from zero is not available)

And to answer your question, the default rounding mode is "closest"  
in OCaml
like in C (I think).
			
** Olivier Andrieu then added:

float_of_string (which uses strtod) already knows how to read a
hexadecimal float and there's a hack to have the C printf do the  
writing:
,----
| # external format_float: string -> float -> string =  
"caml_format_float" ;;
| external format_float : string -> float -> string =  
"caml_format_float"
| # let hex_float = format_float "%a" ;;
| val hex_float : float -> string = <fun>
| # hex_float 1.25 ;;
| - : string = "0x1.4p+0"
| # float_of_string "0x1.4p+0" ;;
| - : float = 1.25
| # hex_float 0.1 ;;
| - : string = "0x1.999999999999ap-4"
`----
			
========================================================================
3) Memoization
Archive: <http://groups.google.com/group/fa.caml/browse_thread/thread/ 
c2b0a94059b929e8/3b2620739bf5bd9e#3b2620739bf5bd9e>
------------------------------------------------------------------------
** Erik de Castro Lopo asked, William Neumann answered, and Walid  
Taha added:

 > > Unfortunately, the URL is dead. Does anybody have another link for
 > > that code or some other polymorphic memoizer?
 >
 > You may want to take a look at this paper by Bruce McAdam that uses a
 > fix-point
 > combinator to create all sorts of wrappers for functions, including
 > memoization.  The examples ore in SML, but translate pretty easily  
to OCaml.
 >
 > <http://www.lfcs.inf.ed.ac.uk/reports/97/ECS-LFCS-97-375/>
 >
We recently had to put together a generic account of memoization in a
functional language (in our case OCaml) so that we can address a staging
problem in a generic manner.  Section 3 of

         <http://www.cs.rice.edu/~taha/publications/conference/ 
pepm06.pdf>

is a low-impact buildup to memoization as a monadic library.
			
========================================================================
4) Ancient module
Archive: <http://groups.google.com/group/fa.caml/browse_thread/thread/ 
7be1968ef5582f43/de55fd10cff69756#de55fd10cff69756>
------------------------------------------------------------------------
** Richard Jones announced:

Merjis is pleased to announce general availability of the 'Ancient'
module, discussed in this group previously.
<http://merjis.com/developers/ancient>

Rich.

----------------------------------------------------------------------
'Ancient' module for OCaml
----------------------------------------------------------------------
$Id: README.txt,v 1.1 2006/10/06 15:03:47 rich Exp $

What does this module do?
----------------------------------------------------------------------

This module allows you to use in-memory data structures which are
larger than available memory and so are kept in swap.  If you try this
in normal OCaml code, you'll find that the machine quickly descends
into thrashing as the garbage collector repeatedly iterates over
swapped memory structures.  This module lets you break that
limitation.  Of course the module doesn't work by magic :-) If your
program tries to access these large structures, they still need to be
swapped back in, but it is suitable for large, sparsely accessed
structures.

Secondly, this module allows you to share those structures between
processes.  In this mode, the structures are backed by a disk file,
and any process that has read/write access that disk file can map that
file in and see the structures.

To understand what this module really does, you need to know a little
bit of background about the OCaml garbage collector (GC).  OCaml's GC
has two heaps, called the minor and major heaps.  The minor heap is
used for short-term storage of small objects which are usually created
and then quickly become unreachable.  Any objects which persist longer
(or objects which are very big to start with) get moved into the major
heap.  Objects in the major heap are assumed to be around for some
time, and the major heap is GC'd more slowly.

This module adds a third heap, called the "ancient heap", which is
never checked by the GC.  Objects must be moved into ancient manually,
using a process called "marking".  Once an object is in the ancient
heap, memory allocation is handled manually.  In particular objects in
the ancient heap may need to be manually deallocated.  The ancient
heap may either exist as ordinary memory, or may be backed by a file,
which is how shared structures are possible.

Structures which are moved into ancient must be treated as STRICTLY
NON-MUTABLE.  If an ancient structure is changed in any way then it
may cause a crash.

There are some limitations which apply to ancient data structures.
See the section "Shortcomings & bugs" below.

This module is most useful on 64 bit architectures where large address
spaces are the norm.  We have successfully used it with a 38 GB
address space backed by a file and shared between processes.

API
----------------------------------------------------------------------

Please see file ancient.mli .

Compiling
----------------------------------------------------------------------

   cd mmalloc && ./configure
   make

Make sure you run this command before running any program which
uses the Ancient module:

   ulimit -s unlimited

Example
----------------------------------------------------------------------

Run:

   ulimit -s unlimited
   wordsfile=/usr/share/dict/words
   baseaddr=0x440000000000               # System specific - see below.
   ./test_ancient_dict_write.opt $wordsfile dictionary.data $baseaddr
   ./test_ancient_dict_verify.opt $wordsfile dictionary.data
   ./test_ancient_dict_read.opt dictionary.data

(You can run several instances of test_ancient_dict_read.opt on the
same machine to demonstrate sharing).

Shortcomings & bugs
----------------------------------------------------------------------

(0) Stack overflows are common when marking/sharing large structures
because we use a recursive algorithm to visit the structures.  If you
get random segfaults during marking/sharing, then try this before
running your program:

   ulimit -s unlimited

(1) Ad-hoc polymorphic primitives (structural equality, marshalling
and hashing) do not work on ancient data structures, meaning that you
will need to provide your own comparison and hashing functions.  For
more details see Xavier Leroy's response here:

<http://caml.inria.fr/pub/ml-archives/caml-list/ 
2006/09/977818689f4ceb...>

(2) Ancient.attach suggests setting a baseaddr parameter for newly
created files (it has no effect when attaching existing files).  We
strongly recommend this because in our tests we found that mmap would
locate the memory segment inappropriately -- the basic problem is that
because the file starts off with zero length, mmap thinks it can place
it anywhere in memory and often does not leave it room to grow upwards
without overwriting later memory mappings.  Unfortunately this
introduces an unwanted architecture dependency in all programs which
use the Ancient module with shared files, and it also requires
programmers to guess at a good base address which will be valid in the
future.  There are no other good solutions we have found --
preallocating the file is tricky with the current mmalloc code.

(3) The current code requires you to first of all create the large
data structures on the regular OCaml heap, then mark them as ancient,
effectively copying them out of the OCaml heap, then garbage collect
the (hopefully unreferenced) structures on the OCaml heap.  In other
words, you need to have all the memory available as physical memory.
The way to avoid this is to mark structures as ancient incrementally
as they are created, or in chunks, whatever works for you.

We typically use Ancient to deal with web server logfiles, and in this
case loading one file of data into memory and marking it as ancient
before moving on to the next file works for us.

(4) Why do ancient structures need to be read-only / not mutated?  The
reason is that you might create a new OCaml heap structure and point
the ancient structure at this heap structure.  The heap structure has
no apparent incoming pointers (the GC will not by its very nature
check the ancient structure for pointers), and so the heap structure
gets garbage collected.  At this point the ancient structure has a
dangling pointer, which will usually result in some form of crash.
Note that the restriction here is on creating pointers from ancient
data to OCaml heap data.  In theory it should be possible to modify
ancient data to point to other ancient data, but we have not tried
this.

(5) You can store more than just one compound object per backing file
by supplying a key to Ancient.share and Ancient.get.  The keys are
integers in the range [0..1023].  The upper limit is hard coded into
the mmalloc library.  This hard coded upper limit is a bug which
should be fixed.

(6) [Advanced topic] The _mark function in ancient_c.c makes no
attempt to arrange the data structures in memory / on disk in a way
which optimises them for access.  The worst example is when you have
an array of large structures, where only a few fields in the structure
will be accessed.  Typically these will end up on disk as:

   array of N pointers
   structure 1
   field A
   field B
     ...
   field Z
   structure 2
   field A
   field B
     ...
   field Z
   structure 3
   field A
   field B
     ...
   field Z
    ...
    ...
    ...
   structure N
   field A
   field B
     ...
   field Z

If you then iterate accessing only fields A, you end up swapping the
whole lot back into memory.  A better arrangement would have been:

   array of N pointers
   structure 1
   structure 2
   structure 3
     ...
   structure N
   field A from structure 1
   field A from structure 2
   field A from structure 3
     ...
   field A from structure N
   field B from structure 1
   field B from structure 2
     etc.

which avoids loading unused fields at all.  In some circumstances we
have shown that this could make a huge difference to performance, but
we are not sure how to implement this cleanly in the current library.

Authors
----------------------------------------------------------------------

Primary code was written by Richard W.M. Jones <rich at annexia.org>
with help from Markus Mottl, Martin Jambon, and invaluable advice from
Xavier Leroy and Damien Doligez.

mmalloc was written by Mike Haertel and Fred Fish.

License
----------------------------------------------------------------------

The module is licensed under the LGPL + OCaml linking exception.  This
module includes mmalloc which was originally distributed with gdb
(although it has since been removed), and that code was distributed
under the plain LGPL.

Latest version
----------------------------------------------------------------------

The latest version can be found on the website:
<http://merjis.com/developers/ancient>

----------------------------------------------------------------------
(** Mark objects as 'ancient' so they are taken out of the OCaml heap.
   * $Id: ancient.mli,v 1.5 2006/10/06 15:03:47 rich Exp $
   *)

type 'a ancient

val mark : 'a -> 'a ancient
   (** [mark obj] copies [obj] and all objects referenced
     * by [obj] out of the OCaml heap.  It returns the proxy
     * for [obj].
     *
     * The copy of [obj] accessed through the proxy MUST NOT be mutated.
     *
     * If [obj] represents a large object, then it is a good
     * idea to call {!Gc.compact} after marking to recover the
     * OCaml heap memory.
     *)

val follow : 'a ancient -> 'a
   (** Follow proxy link to out of heap object.
     *
     * @raise [Invalid_argument "deleted"] if the object has been  
deleted.
     *)

val delete : 'a ancient -> unit
   (** [delete obj] deletes ancient object [obj].
     *
     * @raise [Invalid_argument "deleted"] if the object has been  
deleted.
     *
     * Forgetting to delete an ancient object results in a memory leak.
     *)

(** {6 Shared memory mappings} *)

type md
   (** Memory descriptor handle. *)

val attach : Unix.file_descr -> nativeint -> md
   (** [attach fd baseaddr] attaches to a new or existing file
     * which may contain shared objects.
     *
     * Initially [fd] should be a read/writable, zero-length file
     * (for example you could create this using {!Unix.openfile} and
     * passing the flags [O_RDWR], [O_TRUNC], [O_CREAT]).
     * One or more objects can then be shared in this file
     * using {!Unix.share}.
     *
     * For new files, [baseaddr] specifies the virtual address to
     * map the file.  Specifying [Nativeint.zero] ([0n]) here lets  
[mmap(2)]
     * choose this, but on some platforms (notably Linux/AMD64)
     * [mmap] chooses very unwisely, tending to map the memory
     * just before [libc] with hardly any headroom to grow.  If
     * you encounter this sort of problem (usually a segfault or
     * illegal instruction inside libc), then look at [/proc/PID/maps]
     * and choose a more suitable address.
     *
     * If the file was created previously, then the [baseaddr] is
     * ignored.  The underlying [mmalloc] library will map the
     * file in at the same place as before.
     *)

val detach : md -> unit
   (** [detach md] detaches from an existing file, and closes it.
     *)

val share : md -> int -> 'a -> 'a ancient
   (** [share md key obj] does the same as {!Ancient.mark} except
     * that instead of copying the object into local memory, it
     * writes it into memory which is backed by the attached file.
     *
     * Shared mappings created this way may be shared between
     * other OCaml processes which can access the underlying
     * file.  See {!Ancient.attach}, {!Ancient.detach}.
     *
     * More than one object can be stored in a file.  They are
     * indexed using integers in the range [0..1023] (the limit
     * is hard-coded in [mmalloc/mmprivate.h]).  The [key] parameter
     * controls which object is written/overwritten by [share].
     * If you do not wish to use this feature, just pass [0]
     * as the key.
     *
     * Do not call {!Ancient.delete} on a mapping created like this.
     * Instead, call {!Ancient.detach} and, if necessary, delete the
     * underlying file.
     *
     * Caution when sharing files/objects between processes:
     * The underlying [mmalloc] library does not do any sort of
     * locking, so all calls to [share] must ensure that they have
     * exclusive access to the underlying file while in progress.
     * (Other processes should not even call {!Ancient.get} while
     * this is happening, but it seems safe to be just reading an
     * ancient object from the file).
     *)

val get : md -> int -> 'a ancient
   (** [get md key] returns the object indexed by [key] in the
     * attached file.
     *
     * The key is in the range [0..1023] (the limit is hard-coded in
     * [mmalloc/mmprivate.h]).  If you do not wish to use this feature,
     * just pass [0] as the key when sharing / getting.
     *
     * You need to annotate the returned object with the correct
     * type.  As with the Marshal module, there is no type checking,
     * and setting the wrong type will likely cause a segfault
     * or undefined behaviour.  Note that the returned object has
     * type [sometype ancient], not just [sometype].
     *
     * @raises [Not_found] if no object is associated with the key.
     *)
			
========================================================================
5) ocamlopt under win32
Archive: <http://groups.google.com/group/fa.caml/browse_thread/thread/ 
42ab6a1c67bda727/2f0bc128e0f968c3#2f0bc128e0f968c3>
------------------------------------------------------------------------
** Philip A. Viton asked and Xavier Leroy answered:

 > Can anyone tell me what's going wrong here?
 > Computer: running win xp/sp2 on amd 64
 > ocamlopt: 3.09.0
 > ms files: from visual studio 6
 > [...]
 > libasmrun.lib(compact.obj) : error LNK2001: unresolved external  
symbol
 > __ftol2

The MSVC binary distribution for OCaml 3.09 was built with Visual C++  
2003,
which is binary-incompatible with Visual Studio 6.  There is also
Visual C++ 2005 and the Platform SDK compilers, all from Microsoft,
all subtly incompatible with each other.  Complain with Microsoft if
you feel like it.
If you just want to generate .exe files for Windows, may I suggest you
use the MinGW version of OCaml and avoid the MSVC one?  MinGW, besides
being free, is also more stable than Microsoft's dev. tools.
			
========================================================================
Using folding to read the cwn in vim 6+
------------------------------------------------------------------------
Here is a quick trick to help you read this CWN if you are viewing it  
using
vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'<1':1
zM
If you know of a better way, please let me know.

========================================================================
Old cwn
------------------------------------------------------------------------

If you happen to miss a CWN, you can send me a message
(alan.schmitt at polytechnique.org) and I'll mail it to you, or go take  
a look at
the archive (<http://alan.petitepomme.net/cwn/>) or the RSS feed of the
archives (<http://alan.petitepomme.net/cwn/cwn.rss>). If you also wish
to receive it every week by mail, you may subscribe online at
<http://lists.idyll.org/listinfo/caml-news-weekly/> .

========================================================================

-- 
Alan Schmitt <http://alan.petitepomme.net/>

The hacker: someone who figured things out and made something cool  
happen.
  .O.
  ..O
  OOO


-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 186 bytes
Desc: This is a digitally signed message part
Url : http://lists.idyll.org/pipermail/caml-news-weekly/attachments/20061010/6ddac9e0/attachment.pgp 


More information about the caml-news-weekly mailing list