[cwn] Attn: Development Editor, Latest OCaml Weekly News

Alan Schmitt alan.schmitt at polytechnique.org
Tue Jan 21 06:11:56 PST 2014


Hello,

Here is the latest OCaml Weekly News, for the week of January 14 to 21, 2014.

1) release of Logtk 0.2
2) Who was working on ocaml bindings for zeromq?
3) yypkg 1.7.0
4) ocaml considered dangerous
5) OCaml on the Arduino (or similar)
6) unboxed-arrays-0.1: Unboxed arrays for OCaml
7) batteries 2.2.0
8) Core Suite 109.60.00
9) Other OCaml News

========================================================================
1) release of Logtk 0.2
Archive: <https://sympa.inria.fr/sympa/arc/caml-list/2014-01/msg00093.html>
------------------------------------------------------------------------
** Simon Cruanes announced:

I'm happy to announce the release of Logtk 0.2, a pure OCaml library for
(mostly) first-order logic. It focuses on data structures to represent
terms, formulas, types, and provides various algorithms. It is released
under the BSD license.

- main page: <https://www.rocq.inria.fr/deducteam/Logtk/index.html>
- github: <https://github.com/c-cube/logtk>
- api doc: <http://cedeela.fr/~simon/software/logtk/>

The library is currently in a working (*) but unstable state, the API
may change. I'd be very happy to have feedback about bugs, or about the
general usability of the code.

Some algorithms and data structures (extract from README):
- terms
- formulas
- types (simple polymorphism)
- substitutions (for free variables, bound variables use De Bruijn indices)
- first-order unification and matching
- simplification term ordering (RPO, KBO)
- indexing structures for unification/matching (discrimination trees...)
- term rewriting
- congruence closure
- reduction of formulas to CNF (clausal normal form)
- parser for TPTP

To give more context, I use this library for small-ish tools that
process TPTP files, and for an experimental theorem prover for
first-order logic. Yes, it's PhD code ;)

Regards,

-- 
Simon

(*) bugs excepted...
      
========================================================================
2) Who was working on ocaml bindings for zeromq?
Archive: <https://sympa.inria.fr/sympa/arc/caml-list/2014-01/msg00098.html>
------------------------------------------------------------------------
** Continuing the thread from last week, Anders Peter Fugmann said:

At Issuu we are activly maintaing a fork of pdhborges ocaml-zmq
bindings. We have updated it to support version 3.2 and added new
features such as socket event listening.

I do not know how much work it would require to update that to version
4.0, but I would expect it to be rather strait forward.

You can find the github fork here:
<https://github.com/issuu/ocaml-zmq>
      
========================================================================
3) yypkg 1.7.0
Archive: <https://sympa.inria.fr/sympa/arc/caml-list/2014-01/msg00143.html>
------------------------------------------------------------------------
** Adrien Nader announced:

I am happy to release yypkg 1.7.0. This is the release that goes into
win-builds.org 1.3.

Yypkg is a very simple package manager that is also very fast and
portable. Its main selling point is the support of native windows.

Rough changelog:
- symlink emulation on Windows through hardlinks and directory
  junctions; none of them is the same as a symlink but in yypkg's and
  win-builds' context, it's fine.
  (and if you think junctions are good, they're not; they're nothing
  more than hooks on directories at the filesystem/kernel level)
- relevant sexplib sources have been copied inside yypkg and
  (de)serializers are written by hand, avoiding the need for camlp4 and
  type_conv and reducing binary size.
- OCaml 3.11 compat dropped.
- Instead of spawning bsdtar and xz independently with logic to pipe
  between the two, use bsdtar and its bindings the xz' library.
- Remove the GUI; it was probably one of the best example on how to
  create horrible code by first writing a functional CLI core, strapping
  GUI code on it and adding some lwt as a work-around for a blocking
  design.
- Bundle patches against fileutils for symlink handling; I lack time to
  make them mergeable now and I needed this 1.7.0 release and the
  symlink fixes (among others) in time for FOSDEM. Details can be found
  in the corresponding commit message:
  <http://git.ocamlcore.org/cgi-bin/gitweb.cgi?p=yypkg/yypkg.git;a=commit;h=f99d37987a7eef3b94fd6b3d2319deced9eebc2e>

Homepage: <https://forge.ocamlcore.org/projects/yypkg/>
Download:
  <https://forge.ocamlcore.org/frs/download.php/1364/yypkg-1.7.0.tar.gz>
Git repository:
  <http://git.ocamlcore.org/cgi-bin/gitweb.cgi?p=yypkg/yypkg.git>
Bug tracker is now hosted on win-builds' flyspray instance:
  <http://win-builds.org/bugs/index.php?project=2>
      
========================================================================
4) ocaml considered dangerous
Archive: <https://sympa.inria.fr/sympa/arc/caml-list/2014-01/msg00145.html>
------------------------------------------------------------------------
** Deep in this thread, Jacques Garrigue said and oleg added:

> By the way, there is now an implementation of Printf that avoids
> most of the Obj.magic by using GADTs. It should be merged soon.

Perhaps it may be worth mentioning a (quite old actually) paper
        <http://okmij.org/ftp/typed-formatting/index.html#derivation>

that attempts to _derive_ various typed scanf and printf
implementation, including a couple of new ones. All the code is in
OCaml. Perhaps also relevant is
        <http://okmij.org/ftp/typed-formatting/index.html#C-like>
which uses Template Haskell to convert a C-like format string into the
proper combinator format. The conversion is syntax-directed rather
than type directed (that is, a simple preprocessor, like ppx). 

In the above developments, the types seem to have fewer type
parameters than what I've seen presented at the OCaml workshop. I am
still not clear about the source of that difference.
      
========================================================================
5) OCaml on the Arduino (or similar)
Archive: <https://sympa.inria.fr/sympa/arc/caml-list/2014-01/msg00144.html>
------------------------------------------------------------------------
** Jon Harrop asked and Adrien Nader replied:

> I don't suppose anyone has retargeted OCaml to run on an Arduino or similar?
> 
> I'm just getting into Arduino programming and writing async code in C++ is
> just horrible. L

There is OCaPIC fro PIC18 (or above iirc):
  <http://www.algo-prog.info/ocapic/web/index.php?id=OCAPIC>
      
** Oliver also replied:

 Arch-Arm-Linux (e.g. for Raspberry Pi) also has OCaml-packages.
      
========================================================================
6) unboxed-arrays-0.1: Unboxed arrays for OCaml
Archive: <https://sympa.inria.fr/sympa/arc/caml-list/2014-01/msg00153.html>
------------------------------------------------------------------------
** Bob Atkey announced:

Apropos the recent discussion about optimizing values of type 'a
option, I'd like to announce my little prototype library for the
related problem of defining unboxed array types in OCaml. The library
uses GADTs to describe the memory layout of unboxed elements, and the
module system to hide the unsafe operations used to implement the
arrays. Arrays of monomorphic and polymorphic values are supported.

The library is available on GitHub:
   <https://github.com/bobatkey/unboxed-arrays>

For your convenience, I have reproduced the README below (in markdown
format), which includes simple example of usage:


# Unboxed Arrays

An implementation of unboxed arrays for OCaml by Robert Atkey
<bob.atkey at gmail.com>.

The runtime representation of OCaml arrays usually stores the elements
using a 'boxed' representation, meaning that if the value of a
particular element cannot be stored within a single machine word
(i.e., it is not an argument-less constructor or an `int` value), then
the array actually contains a pointer to the representation of the
element elsewhere in the heap. (Note that `float` arrays are special
cased.) This boxing representation leads to an extra indirection when
accessing elements of the array, means that memory is wasted, and
decreases data locality, to the detriment of efficent cache usage.

The `unboxed-arrays` library implements functors that generate new
abstract types representing arrays, whose run-time representation is
'unboxed', so that each of the elements in stored inline in the
array. The library uses unsafe operations from the `Obj` module in its
implementation, but this unsafety is completely hidden from clients
behind abstract types.

Note that this implementation has not yet been benchmarked, and so can
only be regarded as a proof-of-concept for implementing unboxed
arrays. In particular, it is not clear to me that the overhead of
allocating `Data` constructor (see the `Element_Descriptor` signature
below) is not too expensive.

In the future, it would be nice to explore alternative representations
for more specialised representations of unboxed arrays. For example,
using bitmaps to representation occupancy information for unboxed `'a
option array`s.

## To build

````
$ ocaml setup.ml -build
````

There are no dependencies beyond plain OCaml (version 4.01.0 tested).

## Interface

New unboxed array implementations are generated by using the following
functor:

````ocaml
module Make (Elt : Element_Descriptor) : S with type elt = Elt.t
````

where the signature `S` describes a basic imperative array interface:

````ocaml
module type S = sig
  type t
  type elt
  val create : int -> elt -> t
  val init   : int -> (int -> elt) -> t
  val get    : t -> int -> elt
  val set    : t -> int -> elt -> unit
  val length : t -> int
end
````

The element type is described using modules matching the following
signature. Modules implementing this signature set out (a) the
allowable constructors for the elements; (b) the types of the data
associated with each constructor; (c) the maximum size of an element;
and (d) the specific width associated with each constructor.

````ocaml
module type Element_Descriptor = sig
  type 'a constructor
  type size
  val size : size is_a_natural
  val width_of : 'a constructor -> ('a, size) width
  type t = Data : 'a constructor * 'a -> t
end
````

The type `'a constructor` is usually a GADT (Generalised Algebraic
Datatype) that uses the type parameter to describe the types of the
data associated with that constructor. The *type* `size` is expected
to be of the form `zero succ ... succ`, and represents in unary
notation the maximum size of any element. The *value* `size` is a
runtime representation of the number represented by `size`. The
function `width_of` describes the necessary width to store the data
associated with each constructor, with a static check that all the
widths are less than `size`. Finally, the type `t` defines a GADT that
represents 'boxed' representations used to represent element values
outwith the array.

The types `is_a_natural` and `width` are defined by the `UnboxedArray`
library.

There is also another functor `Make1` that allows for the generation
of arrays with polymorphic element types.

## A Simple Example

The following implementation of `Element_Descriptor` describes a
datatype with three constructors:

````ocaml
module Elt = struct
  open UnboxedArray

  type 'a constructor =
    | Empty   : unit constructor
    | Deleted : unit constructor
    | Value   : (int * string) constructor

  type size = zero succ succ
  let size = Succ (Succ Zero)

  let width_of (type d) (c : d constructor) : (d,size) width =
      match c with
        | Empty   -> Width0
        | Deleted -> Width0
        | Value   -> Width2

  type t = Data : 'd constructor * 'd -> t
end

module A = UnboxedArray.Make (Elt)

open Elt
````

Values of type `A.t` now represent boxed values of type `Elt.t`, and
can be manipulated through the interface `S with type elt = Elt.t`, in
a fairly natural way:

````ocaml
# let arr = A.create 10 (Data (Empty, ()));;
val arr : A.t = <abstr>
# A.set arr 0 (Data (Value, (1, "foo")));;
- : unit = ()
# A.set arr 5 (Data (Deleted, ()));;
- : unit = ()
# let elt_to_string arr i =
    match A.get arr i with
      | Data (Empty, ()) -> "Empty"
      | Data (Deleted, ()) -> "Deleted"
      | Data (Value, (k,v)) -> Printf.sprintf "Value (%d,%S)" k v;;
val elt_to_string : A.t -> int -> string = <fun>
# elt_to_string arr 0 |> print_endline;;
Value (1,"foo")
- : unit = ()
# elt_to_string arr 1 |> print_endline;;
Empty
- : unit = ()
# elt_to_string arr 5 |> print_endline;;
Deleted
- : unit = ()
#
````

There is a more substantial example using unboxed arrays to implement
a hashtable with linear probing in the file `hashtable.ml`. This
example also demonstrates the use of the `UnboxedArray.Make1` functor
for generated polymorphic unboxed array types.
      
** Mark Shinwell then replied:

Thanks for sharing your code.  I also did an experiment
with this recently (although I would argue that it isn't always
clear that flat arrays are desirable).  My flat array module
is designed to hold a sequence of values of record type.  Pointers
to the records may be retrieved and then normal mutable field
updates may be used to change them.  (In case you are wondering,
if you put some other variety of value in the array, then it's
probably safe but won't be useful---since in particular there
is no "set" method provided.)

module Flat_array : sig
  type 'a t

  val create : num_elements:int -> default:'a -> 'a t
  val element_at : 'a t -> index:int -> 'a
end

Here's a simple example of its use.  The code of the
implementation is left as an interesting exercise for the
reader ;)

type t = {
  mutable foo : int;
  mutable bar : string;
}

let default = { foo = 42; bar = "hello world"; }

let test () =
  let arr = Flat_array.create ~num_elements:10 ~default in
  let r = Flat_array.element_at arr ~index:4 in
  Printf.printf "foo=%d, bar=%s\n%!" r.foo r.bar;
  Gc.compact (); (* make sure we haven't corrupted the heap *)
  r.foo <- 10;
  r.bar <- "bar";
  Gc.compact ();
  Printf.printf "foo=%d, bar=%s\n%!" r.foo r.bar

--

module Flat_array : sig
  type 'a t

  val create : num_elements:int -> default:'a -> 'a t
  val element_at : 'a t -> index:int -> 'a
end = struct
  type _ t = Obj.t

  let create ~num_elements ~default =
    let fields_per_elt = Obj.size (Obj.repr default) in
    let size = 1 + num_elements * (1 + fields_per_elt) - 1 in
    let t = Obj.new_block Obj.closure_tag size in
    let stride = Sys.word_size/8 * (fields_per_elt + 1) in
    Obj.set_field t (size - 1) (Obj.repr stride);
    let default = Obj.repr default in
    for elt = 0 to num_elements - 1 do
      let header_index_for_elt = (1 + fields_per_elt) * elt - 1 in
      for field = 0 to fields_per_elt - 1 do
        Obj.set_field t (header_index_for_elt + 1 + field)
          (Obj.field default field)
      done;
      if elt > 0 then begin
        let infix_header =
          let infix_offset = elt * (fields_per_elt + 1) in
          let actual_value =
            (infix_offset lsl 10) lor Obj.infix_tag
          in
          assert (actual_value land 1 = 1);
          actual_value lsr 1
        in
        Obj.set_field t header_index_for_elt (Obj.repr infix_header)
      end
    done;
    t

  let element_at t ~index =
    let offset =
      if index = 0 then 0
      else
        let t = ((Obj.magic t) : int array) in
        let stride = t.(Array.length t - 1) in
        stride * index
    in
    let record_is_at = Obj.add_offset t (Int32.of_int offset) in
    Obj.obj record_is_at
end
      
** Bob Atkey then replied:

> Thanks for sharing your code.  I also did an experiment
> with this recently (although I would argue that it isn't always
> clear that flat arrays are desirable).  My flat array module
> is designed to hold a sequence of values of record type.  Pointers
> to the records may be retrieved and then normal mutable field
> updates may be used to change them.

This looks really useful. Especially the fact that your implementation
has no allocation overhead when accessing an element in the array.
unboxed-arrays definitely has allocation overhead, and also an
interpretation overhead when reading and writing. Maybe these could be
mitigated with enough inlining though.

> (In case you are wondering,
> if you put some other variety of value in the array, then it's
> probably safe but won't be useful---since in particular there
> is no "set" method provided.)

Unfortunately, Flat_array seems to go a bit awry when you put strings
or arrays in as the default element:

(OCaml 4.01.0)

# let s = Flat_array.create 10 "hello";;
val s : string Flat_array.t = <abstr>
# Flat_array.element_at s ~index:0;;
- : string =
"hello\000\000\002\249\b\000\000\000\000\000\000hello\000\000\002\249\016\000\000\000\000\000\000hello\000\000\002\249\024\000\000\000\000\000\000hello\000\000\002\249
\000\000\000\000\000\000hello\000\000\002\249(\000\000\000\000\000\000hello\000\000\002\2490\000\000\000\000\000\000hello\000\000\002\2498\000\000\000\000\000\000hello\000\000\002\249@\000\000\000\000\000\000hello\000\000\002\249H\000\000\000\000\000\000hello\000\000\002!\000\000\000\000\000\000"

# let t = Flat_array.create 10 [|1;2;3|];;
val t : int array Flat_array.t = <abstr>
# Flat_array.element_at t ~index:0;;
- : int array =
[|1; 2; 3; 2172; 1; 2; 3; 4220; 1; 2; 3; 6268; 1; 2; 3; 8316; 1; 2; 3; 10364;
  1; 2; 3; 12412; 1; 2; 3; 14460; 1; 2; 3; 16508; 1; 2; 3; 18556; 1; 2; 3;
  32|]

Even worse, because it is accessing memory off the end of the block
allocated by Flat_array:

# Flat_array.element_at t ~index:9;;
- : int array =
[|1; 2; 3; 32; 1920; 1; 2; 3; 1019; 19742592; 2043; 70323357739924;
  70323353168284; 70323352779268; 2555; 70323357739208; 70323353168344;
  70323353168448; 0; 1019; 70323357739294; 1408; 70323358792352;
  70323358792376; 1408; 5224; 70323358798112; 1408; 5225; 70323352741540;
  1408; 70323358792388; 0; 1408; 5223; 70323358796124|]


I can see how one might detect at runtime that a string is being used
as the default element, but I'm not sure if there is a way to tell the
difference between arrays, tuples and records at runtime. I thought
that it might be possible to get a segfault during GC by putting a
string full of 0 bytes in, but I wasn't able to actually make this
happen.
      
** Mark Shinwell finally said:

> Unfortunately, Flat_array seems to go a bit awry when you put strings
> or arrays in as the default element

Ah, yeah, so thinking about this: any value whose accessors
depend on the size of the value [that is stored within the
value itself] isn't going to work. The "sub-values" (e.g.
the individual records) inside a single [Flat_array] value
don't have the usual size field in the header. Strings are
an example that will hit this problematic case.
      
========================================================================
7) batteries 2.2.0
Archive: <https://sympa.inria.fr/sympa/arc/caml-list/2014-01/msg00154.html>
------------------------------------------------------------------------
** Gabriel Scherer announced:

Batteries (OCaml Batteries Included) is a community-developed overlay
over the "standard" library distributed with the compiler, that aims
to provide general-purpose data-structures and convenient functions.

The project follows a semantic versioning scheme; the new version is
backward-compatible with the previous releases 2.1.0 (July 2013) and
2.0.0 (January 2013). The lowest OCaml version certainly supported is
3.12.

The new release is available in OPAM, or as a tarball
  <https://forge.ocamlcore.org/frs/download.php/1363/batteries-2.2.tar.gz>
  <https://github.com/ocaml-batteries-team/batteries-included/releases/tag/v2.2.0>
or from the sources
  <https://github.com/ocaml-batteries-team/batteries-included>
The online API documentation is at:
  <http://ocaml-batteries-team.github.io/batteries-included/hdoc2/>

This new version saw increased commits from new contributors, which
did most of the work that is included in this release: François
Berenger (which also did a good share of work for the
previous release), Simon Cruanes, and Jacques-Pascal Deplaix. As you
can see, it is the right time for people with a name in the beginning
of the alphabet to contribute -- and others.

As always, the work in this release is split between some code
improvements and some new functions. Some highlights include:

- val split_opt: elt -> t -> t * elt option * t
  on sets, as a generalization of stdlib's
  split : elt -> t -> t * bool * t
  (François Berenger)

- various cartesian_product functions, including one for Enum
  accommodating infinite enumerations
  (Simon Cruanes)

- kahan_sum functions for slower but numerically-accurate summation of
  floats ( <http://en.wikipedia.org/wiki/Kahan_summation_algorithm> )
  (Gabriel Scherer)

- val filteri : (int -> 'a -> bool) -> 'a list -> 'a list
  val filteri_map : (int -> 'a -> 'b option) -> 'a list -> 'b list
  (Jacques-Pascal Deplaix)

- val bsearch :
    'a BatOrd.ord -> 'a array -> 'a ->
      [ `At of int | `Just_after of int
      | `All_lower | `All_bigger | `Empty ]
  (Simon Cruanes)

Remarkably, there was only one bug fixed (by its reporter
Jonas Jensen) during this release circle.

With many thanks to Francois Berenger, Cedric Cellier, Simon Cruanes,
Jacques-Pascal Deplaix, David Fourchaux, Rudi Grinberg, Jonas Jensen,
Kensuke Matsuzaki, and Eric Norige.
      
========================================================================
8) Core Suite 109.60.00
Archive: <https://sympa.inria.fr/sympa/arc/caml-list/2014-01/msg00157.html>
------------------------------------------------------------------------
** Ben Millwood announced:

I am excited to announce the 109.60.00 release of the Core suite.

The following packages were upgraded:

- async
- async_extra
- async_kernel
- async_unix
- comparelib
- core
- core_kernel
- custom_printf
- faillib
- pipebang
- sexplib
- type_conv

Files and documentation for this release are available on our website
and all packages are in opam:

<https://ocaml.janestreet.com/ocaml-core/109.60.00/individual/>
<https://ocaml.janestreet.com/ocaml-core/109.60.00/doc/>

Here is the list of changes for this version:

# 109.60.00

## async_extra

- Replaced `Tcp_file.serve`'s `~port:int` argument with
`Tcp.Where_to_listen.inet`.

## async_kernel

- Changed the scheduler to clear a job from its queue when it runs the
job, eliminating a performance regression from 109.57.

Clearing avoids spurious promotion of what would otherwise be dead
data associated with already-executed jobs.

## async_unix

- Fixed a bug in detection of the thread pool being stuck that could
overstate the amount of time the pool was stuck.

It had been incorrectly reporting the duration of the thread pool
being stuck if the pool had no work in it and then got enough jobs
to be stuck. In that situation, the duration included the time span
before the pool got stuck. If the pool had been idle long enough,
this could even spuriously abort the program.

## comparelib

- Fixed a type error in `with compare` of polymorphic variant
inclusions.

## core

- Added `Iobuf.unsafe_advance`.

This can be used to benchmark inner loops that have redundant bounds
checking, to see if the checks matter. For example, see the
following two `advance` calls:

let rec process_buffer buf ~f =
let len = Iobuf.length buf in
if header_len <= len then
let msg_len = header_len + Iobuf.Unsafe.Peek.uint16_be buf ~pos:0 in
if msg_len <= len then begin
let len = msg_len - header_len in
Iobuf.advance buf header_len;
f (Protocol.packed_of_iobuf buf);
Iobuf.advance buf len;
process_buffer buf ~f
end

- Added `Weak_hashtbl.add_exn` and `sexp_of_t`.
- Fixed `Lock_file.create` to behave correctly if the target mountpoint
is out of space.

Previously in this situation, `Lock_file.create` would create an
empty lock and exit with exception trying to write pid/message
there. Subsequent runs would not able to read pid out of empty pid
file and `blocking_create` would block instead of removing defective
lock.

- Dropped the `-principal` flag from corebuild

## core_kernel

- Added `Gc.keep_alive`, which ensures its argument is live at the point
of the call.
- Added `Sexp.With_text` module, which keeps a value and the a sexp it
was generated from, preserving the original formatting.

## custom_printf, faillib, pipebang, type_conv

- Compatibility with warning 7 (method override)

## sexplib

- Separated out an exception printer that depends on unix into a
separate `sexplib_unix` item.

I hope you like it!
      
========================================================================
9) Other OCaml News
------------------------------------------------------------------------
** From the ocamlcore planet blog:

Thanks to Alp Mestan, we now include in the OCaml Weekly News the links to the
recent posts from the ocamlcore planet blog at <http://planet.ocaml.org/>.

A toy type language (3) using Fix to compute variance:
  <http://gallium.inria.fr/blog/a-toy-type-language-3>

A toy type language (2) variance 101:
  <http://gallium.inria.fr/blog/a-toy-type-language-2>
      
========================================================================
Old cwn
------------------------------------------------------------------------

If you happen to miss a CWN, you can send me a message
(alan.schmitt at polytechnique.org) and I'll mail it to you, or go take a look at
the archive (<http://alan.petitepomme.net/cwn/>) or the RSS feed of the
archives (<http://alan.petitepomme.net/cwn/cwn.rss>). If you also wish
to receive it every week by mail, you may subscribe online at
<http://lists.idyll.org/listinfo/caml-news-weekly/> .

========================================================================



More information about the caml-news-weekly mailing list