[cwn] Attn: Development Editor, Latest Caml Weekly News
Alan Schmitt
alan.schmitt at polytechnique.org
Tue Nov 2 02:24:42 PDT 2010
Hello,
Here is the latest Caml Weekly News, for the week of October 26 to November 02, 2010.
1) Plasma MapReduce, PlasmaFS, version 0.2
2) fundata1 -- Karmic Social Capital Benchmark and Shootout
3) Camlp5 major release 6.00
4) Camllight packages
5) Other Caml News
========================================================================
1) Plasma MapReduce, PlasmaFS, version 0.2
Archive: <http://groups.google.com/group/fa.caml/browse_thread/thread/6dbe28c090d747bb#>
------------------------------------------------------------------------
** Gerd Stolpmann announced:
I've just released Plasma-0.2. Plasma consists of two parts (for now),
namely Plasma MapReduce, a map/reduce compute framework, and PlasmaFS,
the underlying distributed filesystem.
Major changes in version 0.2 :
* Enhanced filesystem operations, with enhanced locking
* Full support for symbolic links
* Extended filesystem API, both in the ocaml module and the
command-line utility
* One can now mount filesystems read-write via NFS
* Map/reduce processes files in larger chunks than before
* Streaming (as in Hadoop)
* Support for per-task logging
Of course, there are also numerous bug fixes and performance
improvements.
Plasma MapReduce is a distributed implementation of the map/reduce
algorithm scheme. In a sentence, map/reduce performs a parallel List.map
on an input file, sorts and splits the output by some criterion into
partitions, and runs a List.fold_left on each partition. Only that it
does not do that sequentially, but in a distributed way, and chunk by
chunk. Because of this Plasma MapReduce can process very large files,
and if run on enough computers, this also will work in reasonable time.
Of course, map and reduce are Ocaml functions here.
This all works on top of a distributed filesystem, PlasmaFS. This is a
user-space filesystem that is primarily accessed over RPC (but it is
also mountable as NFS volume). Actually, most of the effort went here.
PlasmaFS focuses on reliability and speed for big blocksizes. To get
this, it implements ACID transactions, replicates data and metadata with
two-phase commit, uses a shared memory data channel if possible, and
monitors itself. Unlike other filesystems for map/reduce, PlasmaFS
implements the complete set of usual file operations, including random
reads and writes. It can also be used as unspecialized global
filesystem.
Both pieces of software are bundled together in one download. The
project page with further links is
<http://projects.camlcity.org/projects/plasma.html>
This is an early alpha release (0.2). A lot of things work already, and
you can already run distributed map/reduce jobs. However, it is in no
way complete.
Plasma is installable via GODI for Ocaml 3.12.
For discussions on specifics of Plasma there is a separate mailing list:
<https://godirepo.camlcity.org/mailman/listinfo/plasma-list>
========================================================================
2) fundata1 -- Karmic Social Capital Benchmark and Shootout
Archive: <http://groups.google.com/group/fa.caml/browse_thread/thread/921f8f6680218d5e#>
------------------------------------------------------------------------
** Alexy Khrabrov announced:
I am happy to announce fundata1 -- the largest-ever program per RAM allocation
in Haskell, originally implemented in Clojure and then OCaml and Haskell for
social network modeling.
<http://github.com/alexy/fundata1>
It has now become the first large-scale social networking benchmark with a
real dynamic social graph built from the actual Twitter gardenhose, with the
data OK'd by Twitter and supplied along with the benchmark.
I wrote three reference implementations, all on github as well. Clojure and
OCaml are quite basic, while Haskell community had a chance to optimize its
data structures and in fact fix a GC integer overflow while working on it.
You're welcome to fork and improve all of these implementations, and supply
others!
There's a Google Group,
<http://groups.google.com/group/fundata/>
to discuss the shootout. There's also a blog about it and other functional
things at
<http://functional.tv/>
** bluestorm then added:
I was mildly curious and directly went for performance results.
A few bits of information for those that dont want to explore the website
themselves :
- it's a really big data set; on the website you're advised to
export OCAMLRUNPARAM='h=5G;s=1G'
- the Haskell implementation was developped first, tuned, and now takes 17
minutes to run. The OCaml implementation is a simple port of the Haskell
implementation (with the data structures adapted), and it takes 15 minute to
run. A younger Clojure implementation is at 30 minutes for now.
My hasty conclusion : the OCaml GC and the Hasthbl implementation scale
well.
========================================================================
3) Camlp5 major release 6.00
Archive: <http://groups.google.com/group/fa.caml/browse_thread/thread/6b57dbe22fde3477#>
------------------------------------------------------------------------
** Daniel de Rauglaudre announced:
New release 6.00 of Camlp5!
Camlp5 is a preprocessor-pretty-printer for OCaml.
Changes :
- added missing OCaml statements (objects, 1st class modules, etc.),
- added option '-flag O' to add locations comments (pr_o, pr_r),
- changed some syntaxes in 'revised syntax' (mainly labels),
- locations now include file name and comment (module Ploc),
- can now be compiled with 'make -j' (parallel compilation),
- made Camlp5 compatible with old OCaml versions,
- many improvements in source code and numerous bugs fixing.
Details in file CHANGES.
Web page, documentation and download at:
<http://pauillac.inria.fr/~ddr/camlp5/>
========================================================================
4) Camllight packages
Archive: <http://groups.google.com/group/fa.caml/browse_thread/thread/3971951fcb1d224c#>
------------------------------------------------------------------------
** François Boisson announced:
I'm just have done ubuntu package for camllight for maverick.
I add a caml_all interactive mode to camllight and backup it for lucid,
maverick and debian squeeze. With this mode, when doing
$ camllight caml_all
you can use libgraph, libnum and libunix in interactive mode.
debian packages:
deb <http://boisson.homeip.net/debian> squeeze divers
(or etch lenny sarge sid squeeze woody)
ubuntu packages:
deb <http://boisson.homeip.net/ubuntu/> maverick divers
(or breezy dapper edgy feisty gutsy hardy intrepid jaunty
karmic lucid maverick)
apt-get install camllight
source:
deb-src <http://boisson.homeip.net/source/> ./
apt-get source camllight
========================================================================
5) Other Caml News
------------------------------------------------------------------------
** From the ocamlcore planet blog:
Thanks to Alp Mestan, we now include in the Caml Weekly News the links to the
recent posts from the ocamlcore planet blog at <http://planet.ocamlcore.org/>.
Gravitational experiments in OCaml:
<http://www.sairyx.org/2010/10/gravitational-experiments-in-ocaml/>
ANSITerminal:
<https://forge.ocamlcore.org/projects/ansiterminal/>
========================================================================
Old cwn
------------------------------------------------------------------------
If you happen to miss a CWN, you can send me a message
(alan.schmitt at polytechnique.org) and I'll mail it to you, or go take a look at
the archive (<http://alan.petitepomme.net/cwn/>) or the RSS feed of the
archives (<http://alan.petitepomme.net/cwn/cwn.rss>). If you also wish
to receive it every week by mail, you may subscribe online at
<http://lists.idyll.org/listinfo/caml-news-weekly/> .
========================================================================
More information about the caml-news-weekly
mailing list