[cwn] Attn: Development Editor, Latest Caml Weekly News

Alan Schmitt alan.schmitt at polytechnique.org
Tue Jul 31 03:40:50 PDT 2007


Hello,

Here is the latest Caml Weekly News, for the week of July 24 to 31,  
2007.

The CWN will go on hiatus for two weeks as I'll be in
			vacations with very limited internet access.

1) Knuth-Morris-Pratt
2) OCamlPCSC
3) Ropes and rope-like functional extensible vectors with O(1)  
prepend/append.
4) XmlRpc-Light 0.4 - Now a server too
5) 3-yr post for language person

========================================================================
1) Knuth-Morris-Pratt
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
59b214d45c40e2ce/718c284cc231c5e4#718c284cc231c5e4>
------------------------------------------------------------------------
** Jean-Christophe Filliatre announced:

As a byproduct of the  last ICFP programming contest, I'm releasing an
implementation of Knuth-Morris-Pratt string searching algorithm that I
wrote years ago. You can find it here, in my ocaml bazar:

   <http://www.lri.fr/~filliatr/software.en.html>

Just in case it may be useful, as it finally happened to be for myself.
			
========================================================================
2) OCamlPCSC
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
dcb74e1ce128ea06/0e021a82b0be1235#0e021a82b0be1235>
------------------------------------------------------------------------
** Manuel Preliteiro announced:

I am pleased to announce OCamlPCSC, an OCaml binding to PC/SC library.
OCamlPCSC is a wrapper to libraries that follow the PC/SC standard for
Smartcards. In Linux
it uses PCSC-Lite from M.U.S.C.L.E. (<http://www.linuxnet.com/>) and in
windows it uses
winscard.dll.

OCamlPCSC is available at <http://www.di.ubi.pt/~desousa/ocamlpcsc>.

As any other OCaml library it's free for use and open-source (LGPL  
license).
			
========================================================================
3) Ropes and rope-like functional extensible vectors with O(1)  
prepend/append.
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
4e70beff0f714229/7c60860fa63809c9#7c60860fa63809c9>
------------------------------------------------------------------------
** Mauricio Fernandez announced:

In the aftermath of the ICFP contest, during which I used Luca de  
Alfaro's
Vec, I felt like implementing ropes, based on Boehm's paper and the  
well-known
code included in his conservative garbage collector.

I later realized that the basic implementation strategies ("dense"  
leaves,
bounded tree height and amortized constant time concatenation of small
strings) could be generalized to build general extensible vectors  
similar to
Vec.

Such vectors (tentatively named "Vect" until I find a better name)  
have some
interesting properties:
* lower space overhead than other structures based on balanced trees  
such as
   Vec.  The overhead can be adjusted, allowing to make "get" faster  
at the
   expense of "set" and viceversa.
* appending or prepending a small vector to an arbitrarily large one in
   amortized constant time
* concat, subarray, insert, remove operations in amortized  
logarithmic time
* access and modification (get, set) in logarithmic time

The first one is probably the most compelling. Currently, Vec imposes  
a 6-word
overhead per element. Even after the obvious modification consisting  
in adding
a new constructor for leaves, the overhead would still be 350%...  
Vect uses
compact leaves with a configurable number of elements (32-64 seem good
choices, leading to worst-case overheads of 100% and 50%  
respectively), which
also helps with "get" due to the improved spatial locality.

You can find the code for both Rope and Vect at
<http://eigenclass.org/repos/oropes/head/>
It is still young and experimental, but it's maybe worth a look. Any  
feedback
is very welcome.

The documentation can be found under
<http://eigenclass.org/repos/oropes/head/doc/>

I've spent some time benchmarking it against Vec; you can also find the
code I used and the resulting graphs at the above address.

To summarise how it compares to Vec:
* Vec can be used when persistence is required, but Vect would  
probably be a
   poor choice in this case (until that is fixed using lazy  
rebuilding, which
   doesn't seem too hard), unless rebalancing explicitly before  
"taking the
   snapshot" is an option
* Vect can append/prepend single elements or small vectors very  
quickly, in
   amortized constant time. See
   <http://eigenclass.org/repos/oropes/head/append.png>
* as expected, Vec.set is faster than Vect's in general
   <http://eigenclass.org/repos/oropes/head/set.png>
   However, if the vector is balanced explicitly shortly before an  
update
   burst, Vect is somewhat surprisingly faster
   <http://eigenclass.org/repos/oropes/head/set-balanced.png>
   This might be attributed to Vect's smaller memory profile and the  
fact that
   it allows better use of the L2 cache, but there seems to be  
another factor
   that I have yet to discover.
* Vect.get is considerably faster than Vec.get
   <http://eigenclass.org/repos/oropes/head/get.png>

The above URL is a darcs repository, so if anybody shoots me a patch  
I'll be
happy to apply it :)
			
** Jon Harrop asked and Mauricio Fernandez answered:

 > Looks awesome!

 > It seems to use mutation internally. Is it not thread safe?

The structure is persistent and all operations are non-destructive.
Mutation is used only in the rebalancing operation and in set, but  
they affect
an ephemeral forest of ropes/vects and a new string/array  
respectively, so the
original structure is never modified and in principle all operations  
should be
thread-safe.

Ropes/vects being functional doesn't mean that they will perform well  
in a
persistent setting however, see the clarification at
<http://eigenclass.org/repos/oropes/head/doc/Vect.html>

 > I have some suggestions:
 >
 > I'd like metadata in every node, so I can provide a constructor that
 > combines
 > the metadata of child nodes and a cull function to accelerate  
searches.


If I understand it correctly, that scheme could in the limit turn  
some O(n)
searches into O(log n)?), right?

Unlike Vec, Vect uses "compact" leaves (Leaf of 'a array) of bounded  
size
(leaf_size, typically 16-64), which might not fit very well.

Vect would need to know how to combine the metadata, wouldn't it? I was
thinking about something like
   Leaf of ('meta -> 'meta -> 'meta) * 'meta * 'a array
but I've realized that this wouldn't suffice. So, given that Vect.t is
currently

     type 'a t =
         Empty
       | Concat of 'a t * int * 'a t * int * int
       | Leaf of 'a array

something like this maybe?

   type ('a, 'meta) t =
         Empty of ('meta -> 'meta -> 'meta)
       | Concat of ('meta -> 'meta -> 'meta) * 'meta *
                   ('a, 'meta) t * int *
                   ('a, 'meta) t * int *
                   int
       | Leaf of ('meta -> 'meta -> meta) * 'meta array * 'a array
                 (* maybe also  * 'meta  to cache the last  
computation? *)

or even without the ('meta -> 'meta -> 'meta) part, forcing the user  
to pass
the function on each modification?  Just thinking out loud.

At any rate, it'd be better to provide it as a separate structure, any
suggestions for the name?.

 > The usual HOFs, like map.

I just pushed a patch with filter and map. The former is trivially  
implemented
with fold + append (thanks to the O(1) append). I was going to code  
map the
same way but I ended up making one that returns an isomorphic vect  
and is
faster (since there's no need to rebalance).

So Vect currently has iter, iteri, rangeiter, fold, map and filter. I'm
considering renaming fold to fold_left and providing fold_right too.
			
========================================================================
4) XmlRpc-Light 0.4 - Now a server too
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
07d658e57ff7d1c1/ec53e2f3fc7cfa3f#ec53e2f3fc7cfa3f>
------------------------------------------------------------------------
** Dave Benjamin announced:

I have released XmlRpc-Light 0.4, an XML-RPC library for OCaml based on
Xml-Light and Ocamlnet 2. Source, downloads, and documentation are
available here:

      <http://code.google.com/p/xmlrpc-light/>

This version adds support for writing servers. Two methods are currently
provided: CGI (based on Netcgi2) and Netplex. The obligatory hello-world
example looks like this:

      let server = new XmlRpcServer.cgi () in
      server#register "demo.sayHello"
        (fun _ -> `String "Hello!");
      server#run ()

To build a Netplex server, just change "cgi" to "netplex". An example of
a Netplex server including the required configuration file is in the
examples/adder directory.

All servers support the "system.getCapabilities" and
"system.listMethods" introspection functions, as well as the
"system.multicall" protocol. These can be disabled if desired by calling
the "unregister" method.

Other changes and improvements:

    - The default date-time functions use the format
"20070729T10:42:00-07:00". This seems to be the most common
interpretation of ISO 8601 used in XML-RPC servers. You can override
this behavior by calling the "set_datetime_encode" or
"set_datetime_decode" methods on the client or server.

    - Date-time parsing errors are now wrapped in XmlRpc.Error so that
they will be relayed to clients as faults.

    - Error handling adheres much closer to the XML-RPC specification  
and
its list of suggested fault codes and strings.

    - The client now sends a "text/xml" Content-Type header in requests.

Thanks to Gerd Stolpmann for the help with Ocamlnet!
			
** Richard Jones suggested and Dave Benjamin answered:

 > You might want to take a look at Julien Signoles' Calendar library  
for
 > date/time types and handling:
 >
 > <http://www.lri.fr/~signoles/prog/calendar/>

I have this library installed, and indeed considered using it when I
began writing the date-time support. I would likely have used it, if
only it had the ability to parse strings.

I really wish Winer had considered alternatives to ISO 8601--say, UTC
epoch seconds--in the design of XML-RPC, because it's barely a standard
at all! There are so many variations and options that writing a parser
for it borders on natural language processing. Even the W3C suggestion,
which restricts ISO 8601 to a very small subset, doesn't help here since
it still conflicts with the common usage in XML-RPC, with hyphens
omitted between the date values. I decided to err on the side of
oversimplification, and support only the most common format, leaving in
hooks for users to customize the behavior as required.

There is still benefit, of course, in using a standard date-time type. I
only wonder if it is worth adding another library dependency; I am
trying hard to keep the list small (currently only Xml-Light and
Ocamlnet, which in turn requires PCRE). I think it would be great if a
date-time type were made part of the official OCaml distribution.

My only qualm with the Calendar library is that I feel a bit
uncomfortable with a top-level module called "Printer" that is for the
specific purpose of date formatting. I would assume that a module by
that name were for communicating with "lpt", if anything. But hey,
what's in a name, anyway... =)

Thanks for the advice. I will consider it.
			
========================================================================
5) 3-yr post for language person
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
1792bc907e171834/91afa28b79a0b2e4#91afa28b79a0b2e4>
------------------------------------------------------------------------
** Simon Peyton-Jones announced:

Tim Griffin is advertising a 3-year research associate position at the
Cambridge Computer Lab, working on a project that seeks to design and
implement a meta-language for the specification and implementation of  
correct
Internet routing protocols.

He says "A PL person would be perfect".

Details here:
<http://www.admin.cam.ac.uk/offices/personnel/jobs/vacancies.cgi? 
job=2114>
			
========================================================================
Using folding to read the cwn in vim 6+
------------------------------------------------------------------------
Here is a quick trick to help you read this CWN if you are viewing it  
using
vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'<1':1
zM
If you know of a better way, please let me know.

========================================================================
Old cwn
------------------------------------------------------------------------

If you happen to miss a CWN, you can send me a message
(alan.schmitt at polytechnique.org) and I'll mail it to you, or go take  
a look at
the archive (<http://alan.petitepomme.net/cwn/>) or the RSS feed of the
archives (<http://alan.petitepomme.net/cwn/cwn.rss>). If you also wish
to receive it every week by mail, you may subscribe online at
<http://lists.idyll.org/listinfo/caml-news-weekly/> .

========================================================================

-- 
Alan Schmitt <http://alan.petitepomme.net/>

The hacker: someone who figured things out and made something cool  
happen.
  .O.
  ..O
  OOO


-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 186 bytes
Desc: This is a digitally signed message part
Url : http://lists.idyll.org/pipermail/caml-news-weekly/attachments/20070731/f3e1586d/attachment.pgp 


More information about the caml-news-weekly mailing list