[cwn] Attn: Development Editor, Latest Caml Weekly News

Tue Jul 24 02:19:57 PDT 2007

Hello,

Here is the latest Caml Weekly News, for the week of July 17 to 24,  
2007.

1) Vec: a functional implementation of extensible arrays
2) Fast XML parser
3) fiblib 0.1, a fibonacci heap library
4) Custom camlp4 lexers

========================================================================
1) Vec: a functional implementation of extensible arrays
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
a965ef148e7a7201/5c5a35b15dffdf7b#5c5a35b15dffdf7b>
------------------------------------------------------------------------
** Luca de Alfaro announced:

I would like to share with you an Ocaml implementation of extensible
arrays.  The implementation is functional, based on balanced trees  
(and on
the code for Set and Map); I called the module Vec (for vector - I like
short names).  You can find it at <http://www.dealfaro.com/home/ 
vec.html>
Module Vec provides, in log-time:

    -  Access and modification to arbitrary elements (Vec.put n el v  
puts
    element el in position n of vector v, for instance).
    - Concatenation
    - Insertion and removal of elements from arbitrary positions
    (auto-enlarging and auto-shrinking the vector).

as well as:

    - All kind of iterators and some visitor functions.
    - Efficient translation to/from lists and arrays.

An advantage of Vec over List, for very large data structures, is that
iterating over a Vec of size n requires always stack depth bounded by  
log n:
with lists, non-tail-recursive functions can cause stack overflows.

I have been using this data structure for some months, and it has  
been very
handy in a large number of occasions.  I hope it can be as useful to  
you.

I would appreciate all advice and feedback.  Also, is there a repository
where I should upload it?  Do you think it is worth it?

** Hugo Ferreira said:

For those of you interested in functional array consider Sylvain Conchon
and Jean-Christophe Filliâtre paper in [1]. The Union-Find (UF) uses
persistent arrays as its base data structure. I have made tests with the
UF using the code provided, an implementation of k-BUF data structure
(delayed backtracking) and altered version of the persistent array (fat
nodes + delayed backtracking). The tests I did show that this version of
persistent arrays is very efficient (especially for single threaded
backtracking).

Maybe Luca could pit his implementation against that of the article and
report on how they fare?

[1] <http://www.lri.fr/~filliatr/ftp/publis/puf-wml07.ps>

** Luca de Alfaro answered:

Thanks for the pointer to the excellent paper.  First, let me say  
that my
Vec data structure was born to fill a need I had while working on a  
project:
while it has been useful to me, I certainly do not claim it is the  
best that
can be done, so I am very grateful for these suggestions!

My Vec data structure is different from persistent arrays.  It is  
likely to
be less efficient for get/set use.
However, it offers at logarithmic cost insertion/removal operations  
that are
not present in the standard persistent arrays.
Consider a Vec a of size 10.

    - Vec.insert 3 d a inserts value d in position 3 of a, shifting
    elements 3..9 of a to positions 4..10.
    - Vec.remove 3 a removes the element in position 3 of a, shifting
    elements 4..9 to positions 3..8.  Vec.pop is similar and returns the
    removed element as well.
    - Vec.concat works in log-time.

These operations are necessary if you want to use a Vec as a FIFO, for
example (you append elements at the end, and you get the first  
element via
Vec.pop 0 a).  In many algorithms, it is often handy to be able to
remove/insert elements in the middle of a list.

In summary, I don't think the Vec data structure is a replacement for  
arrays
or persistent arrays in numerically-intensive work.  But if you want a
flexible data structure for the 90% of the code that is not peformance
critical, they can be useful.
Now the question is: can one get better get/set efficiency while  
retaining
the ability to insert/remove elements?  (I am sure that there is  
something
better to be done...).

========================================================================
2) Fast XML parser
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
5136b4032ed0f6c2/1c07ca1f78d20b45#1c07ca1f78d20b45>
------------------------------------------------------------------------
** Luca de Alfaro asked and Richard Jones answered:

 > I am interested in parsing Wiki markup language that has a few  
tags, like
 > <pre>...</pre>, <math>...,</math>.
 > These tags are sparse, meaning that the ratio of number of tags /  
number of
 > bytes is low.
 > I would like, given a string (or a stream) with such tags, to  
parse it as
 > fast as possible.  Efficiency is a primary consideration, and so is
 > simplicity of the implementation.
 > Do you have any advice about the library I should be using?

There's some code in COCANWIKI which does exactly this:

<http://sandbox.merjis.com/release>

Look at the file scripts/lib/wikilib.ml.

It's not a particularly clever implementation, but it has a great deal
of testing in the real world.

As well as <xml>-like syntax it also does a lot of standard wiki
syntax like '* ' for bullet points, paragraphs, indents for
preformatted sections and so on.  And it outputs pure unadulterated
XHTML.

========================================================================
3) fiblib 0.1, a fibonacci heap library
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
eeb75e7d4f7eeb52/8532f5ca530e88d2#8532f5ca530e88d2>
------------------------------------------------------------------------
** Denis Bueno announced:

I'm pleased to announce the first release of fiblib, an efficient,
imperative Fibonacci heap library, based on the pseudocode of Cormen,
Leiserson, Rivest, and Stein:

<http://code.google.com/p/fiblib/>

This implementation provides a base of operations you'd expect from a
heap (with more to come):

   - insert
   - extract min
   - delete
   - decrease key

Other operations, such as union, are slated for another release.

Though a relatively obscure data structure, they are the best known
solution for certain algorithms requiring a heap. In particular, a
fibonacci heap is a win if the number of extract-min and delete
operations is small compared to the number of other operations.

Fiblib is released under a BSD-style license, so, if you need a heap,
try it out, and let me know how it goes!

========================================================================
4) Custom camlp4 lexers
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/ 
5221659647adbeee/649206d2f9826ef1#649206d2f9826ef1>
------------------------------------------------------------------------
** Jon Harrop asked and Daniel de Rauglaudre answered:

 > IIRC, there was mention before that camlp4 even allows you to  
specify lexers
 > inline. How is this done and are there any examples?

Perhaps you think of the specific syntax of parsers of character  
streams ?

I implemented that in camlp5 (previously named camlp4s). See doc at:
    <http://pauillac.inria.fr/~ddr/camlp5/doc/html/lexers.html>

If interested, camlp5 is downloadable at:
    <http://pauillac.inria.fr/~ddr/camlp5/>

========================================================================
Using folding to read the cwn in vim 6+
------------------------------------------------------------------------
Here is a quick trick to help you read this CWN if you are viewing it  
using
vim (version 6 or greater).

:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'<1':1
zM
If you know of a better way, please let me know.

========================================================================
Old cwn
------------------------------------------------------------------------

If you happen to miss a CWN, you can send me a message
(alan.schmitt at polytechnique.org) and I'll mail it to you, or go take  
a look at
the archive (<http://alan.petitepomme.net/cwn/>) or the RSS feed of the
archives (<http://alan.petitepomme.net/cwn/cwn.rss>). If you also wish
to receive it every week by mail, you may subscribe online at
<http://lists.idyll.org/listinfo/caml-news-weekly/> .

========================================================================

-- 
Alan Schmitt <http://alan.petitepomme.net/>

The hacker: someone who figured things out and made something cool  
happen.
  .O.
  ..O
  OOO

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 186 bytes
Desc: This is a digitally signed message part
Url : http://lists.idyll.org/pipermail/caml-news-weekly/attachments/20070724/fc10db3d/attachment.pgp