[cwn] Attn: Development Editor, Latest Caml Weekly News
Alan Schmitt
alan.schmitt at polytechnique.org
Tue Jul 24 02:19:57 PDT 2007
Hello,
Here is the latest Caml Weekly News, for the week of July 17 to 24,
2007.
1) Vec: a functional implementation of extensible arrays
2) Fast XML parser
3) fiblib 0.1, a fibonacci heap library
4) Custom camlp4 lexers
========================================================================
1) Vec: a functional implementation of extensible arrays
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/
a965ef148e7a7201/5c5a35b15dffdf7b#5c5a35b15dffdf7b>
------------------------------------------------------------------------
** Luca de Alfaro announced:
I would like to share with you an Ocaml implementation of extensible
arrays. The implementation is functional, based on balanced trees
(and on
the code for Set and Map); I called the module Vec (for vector - I like
short names). You can find it at <http://www.dealfaro.com/home/
vec.html>
Module Vec provides, in log-time:
- Access and modification to arbitrary elements (Vec.put n el v
puts
element el in position n of vector v, for instance).
- Concatenation
- Insertion and removal of elements from arbitrary positions
(auto-enlarging and auto-shrinking the vector).
as well as:
- All kind of iterators and some visitor functions.
- Efficient translation to/from lists and arrays.
An advantage of Vec over List, for very large data structures, is that
iterating over a Vec of size n requires always stack depth bounded by
log n:
with lists, non-tail-recursive functions can cause stack overflows.
I have been using this data structure for some months, and it has
been very
handy in a large number of occasions. I hope it can be as useful to
you.
I would appreciate all advice and feedback. Also, is there a repository
where I should upload it? Do you think it is worth it?
** Hugo Ferreira said:
For those of you interested in functional array consider Sylvain Conchon
and Jean-Christophe Filliâtre paper in [1]. The Union-Find (UF) uses
persistent arrays as its base data structure. I have made tests with the
UF using the code provided, an implementation of k-BUF data structure
(delayed backtracking) and altered version of the persistent array (fat
nodes + delayed backtracking). The tests I did show that this version of
persistent arrays is very efficient (especially for single threaded
backtracking).
Maybe Luca could pit his implementation against that of the article and
report on how they fare?
[1] <http://www.lri.fr/~filliatr/ftp/publis/puf-wml07.ps>
** Luca de Alfaro answered:
Thanks for the pointer to the excellent paper. First, let me say
that my
Vec data structure was born to fill a need I had while working on a
project:
while it has been useful to me, I certainly do not claim it is the
best that
can be done, so I am very grateful for these suggestions!
My Vec data structure is different from persistent arrays. It is
likely to
be less efficient for get/set use.
However, it offers at logarithmic cost insertion/removal operations
that are
not present in the standard persistent arrays.
Consider a Vec a of size 10.
- Vec.insert 3 d a inserts value d in position 3 of a, shifting
elements 3..9 of a to positions 4..10.
- Vec.remove 3 a removes the element in position 3 of a, shifting
elements 4..9 to positions 3..8. Vec.pop is similar and returns the
removed element as well.
- Vec.concat works in log-time.
These operations are necessary if you want to use a Vec as a FIFO, for
example (you append elements at the end, and you get the first
element via
Vec.pop 0 a). In many algorithms, it is often handy to be able to
remove/insert elements in the middle of a list.
In summary, I don't think the Vec data structure is a replacement for
arrays
or persistent arrays in numerically-intensive work. But if you want a
flexible data structure for the 90% of the code that is not peformance
critical, they can be useful.
Now the question is: can one get better get/set efficiency while
retaining
the ability to insert/remove elements? (I am sure that there is
something
better to be done...).
========================================================================
2) Fast XML parser
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/
5136b4032ed0f6c2/1c07ca1f78d20b45#1c07ca1f78d20b45>
------------------------------------------------------------------------
** Luca de Alfaro asked and Richard Jones answered:
> I am interested in parsing Wiki markup language that has a few
tags, like
> <pre>...</pre>, <math>...,</math>.
> These tags are sparse, meaning that the ratio of number of tags /
number of
> bytes is low.
> I would like, given a string (or a stream) with such tags, to
parse it as
> fast as possible. Efficiency is a primary consideration, and so is
> simplicity of the implementation.
> Do you have any advice about the library I should be using?
There's some code in COCANWIKI which does exactly this:
<http://sandbox.merjis.com/release>
Look at the file scripts/lib/wikilib.ml.
It's not a particularly clever implementation, but it has a great deal
of testing in the real world.
As well as <xml>-like syntax it also does a lot of standard wiki
syntax like '* ' for bullet points, paragraphs, indents for
preformatted sections and so on. And it outputs pure unadulterated
XHTML.
========================================================================
3) fiblib 0.1, a fibonacci heap library
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/
eeb75e7d4f7eeb52/8532f5ca530e88d2#8532f5ca530e88d2>
------------------------------------------------------------------------
** Denis Bueno announced:
I'm pleased to announce the first release of fiblib, an efficient,
imperative Fibonacci heap library, based on the pseudocode of Cormen,
Leiserson, Rivest, and Stein:
<http://code.google.com/p/fiblib/>
This implementation provides a base of operations you'd expect from a
heap (with more to come):
- insert
- extract min
- delete
- decrease key
Other operations, such as union, are slated for another release.
Though a relatively obscure data structure, they are the best known
solution for certain algorithms requiring a heap. In particular, a
fibonacci heap is a win if the number of extract-min and delete
operations is small compared to the number of other operations.
Fiblib is released under a BSD-style license, so, if you need a heap,
try it out, and let me know how it goes!
========================================================================
4) Custom camlp4 lexers
Archive: <http://groups.google.com/group/fa.caml/browse_frm/thread/
5221659647adbeee/649206d2f9826ef1#649206d2f9826ef1>
------------------------------------------------------------------------
** Jon Harrop asked and Daniel de Rauglaudre answered:
> IIRC, there was mention before that camlp4 even allows you to
specify lexers
> inline. How is this done and are there any examples?
Perhaps you think of the specific syntax of parsers of character
streams ?
I implemented that in camlp5 (previously named camlp4s). See doc at:
<http://pauillac.inria.fr/~ddr/camlp5/doc/html/lexers.html>
If interested, camlp5 is downloadable at:
<http://pauillac.inria.fr/~ddr/camlp5/>
========================================================================
Using folding to read the cwn in vim 6+
------------------------------------------------------------------------
Here is a quick trick to help you read this CWN if you are viewing it
using
vim (version 6 or greater).
:set foldmethod=expr
:set foldexpr=getline(v:lnum)=~'^=\\{78}$'?'<1':1
zM
If you know of a better way, please let me know.
========================================================================
Old cwn
------------------------------------------------------------------------
If you happen to miss a CWN, you can send me a message
(alan.schmitt at polytechnique.org) and I'll mail it to you, or go take
a look at
the archive (<http://alan.petitepomme.net/cwn/>) or the RSS feed of the
archives (<http://alan.petitepomme.net/cwn/cwn.rss>). If you also wish
to receive it every week by mail, you may subscribe online at
<http://lists.idyll.org/listinfo/caml-news-weekly/> .
========================================================================
--
Alan Schmitt <http://alan.petitepomme.net/>
The hacker: someone who figured things out and made something cool
happen.
.O.
..O
OOO
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 186 bytes
Desc: This is a digitally signed message part
Url : http://lists.idyll.org/pipermail/caml-news-weekly/attachments/20070724/fc10db3d/attachment.pgp
More information about the caml-news-weekly
mailing list