[cwn] Attn: Development Editor, Latest OCaml Weekly News

Alan Schmitt alan.schmitt at polytechnique.org
Tue Apr 23 00:58:17 PDT 2019


Hello

Here is the latest OCaml Weekly News, for the week of April 16 to 23,
2019.

Table of Contents
─────────────────

Wrapping C++ std::shared_ptr and similar smart pointers
OCaml 4.08.0+beta3
Menhir and preserving comments from source
ppx_protocol_conv 5.0.0
Orsetto: structured data interchange languages (preview release)
Searching for functions
Other OCaml News
Old CWN


Wrapping C++ std::shared_ptr and similar smart pointers
═══════════════════════════════════════════════════════

  Archive:
  <https://discuss.ocaml.org/t/wrapping-c-std-shared-ptr-and-similar-smart-pointers/3582/1>


Manuel Hornung asked
────────────────────

  I'm trying to create Reason/OCaml bindings for the Skia 2D graphics
  library. The library makes heavy use of smart pointers similar to
  `std::shared_ptr`s, but they called them `sk_sp`.

  Now my first idea for wrapping these was using a regular pointer that
  points to a shared pointer. That would trigger the release of the
  memory behind the shared pointer as soon as the local variable
  containing the shared pointer goes out of scope though.

  I found a solution that looked promising to me in
  <https://github.com/ygrek/scraps/blob/master/cxx_wrapped.h> but now I
  heard that reducing the refcount in the finalizer is also not a good
  idea. Unfortunately I don't know why that is not a good idea and I
  also don't have a better one.

  Can anyone help me understand this better and point me towards a
  better approach?


Guillaume Munch-Maccagnoni replied
──────────────────────────────────

  (Sorry for the delay as I have been busy.)

  It all comes down to the fact that tracing and reference counting have
  different advantages and drawbacks, and the main difference for this
  question is that RC reclaims promptly, whereas tracing does not
  reclaim predictably; in addition OCaml is currently poor in terms of
  predictable resource management.

  Smart pointers can be used to manage resources other than memory. (I
  mean smart pointers that implement deterministic reclamation of
  resources such as unique or reference-counted pointers; in principle
  smart pointers are not restricted in what they implement: delayed
  evaluation, [roots for tracing GCs]… such exotic pointers are out of
  the scope of my answer.)

  First, you need to determine whether the pointer manages non-memory
  resources (the destruction closes a file, releases a lock, rolls back
  some state…). If so, using finalizers is a no-go, because you cannot
  predict when and in which order finalizers run, and in practice it can
  be way too late. When that is the case, skip 1). For instance I see
  that your library has some functions that return RAII guards; quite
  obviously these cannot be handled with finalizers.


[roots for tracing GCs]
<http://manishearth.github.io/blog/2015/09/01/designing-a-gc-in-rust/>

1) Custom blocks with finalizer
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  If the smart pointer only manages memory, then it is possible to
  represent it with a custom block with a finalizer attached to it. The
  GC needs to know the size of what it manages, otherwise it will not
  work hard enough to reclaim memory and you can end up with a memory
  leak. This has occasionally been called “[the familiar "allocation of
  custom objects mess up the speed of the major GC" problem]”.

  The situation is supposed to improve in OCaml 4.08, which introduces
  [a new function `caml_alloc_custom_mem'] that lets you specify the
  size of the memory managed by the custom block, which the GC's
  heuristics will take into account. (`caml_alloc_custom' also has
  parameters to tweak the GC speed but presumably this was not good
  enough as witnessed by the multiple bug reports referenced in that
  PR.)

  So you can use as a source of inspiration @ygrek's `wrapped' pointer
  you have linked to above, but you must adapt it to tell the OCaml GC
  the size of the data your custom block contains.

  Pros:

  • Expressive: the foreign data is abstracted as an OCaml value that
    can be passed around, inserted into data structures, etc.

  Cons:

  • No-go for non-memory resources.

  • You need to know the size of what you are managing: there is no
    universal smart pointer wrapper!

  • Not so good for performance/scale or interoperability. Mixing
    tracing and RC cumulates the drawbacks of both; in particular you
    inherit the possible unbounded latency due to the upfront
    deallocation cost of RC (depending on your use-case), and you are
    even at risk of creating cycles that are never collected if you
    mix this method with [that one] to store OCaml values on the foreign
    side.

  These are some guaranteed theoretical drawbacks, but I imagine that
  there can be more practical implementation-specific issues (as
  witnessed by `caml_alloc_custom' vs `caml_alloc_custom_mem'). I do not
  have hands-on experience with custom blocks, and while researching for
  this answer, I found this usage not very well documented, so I hope
  that experts can fill in the gaps and/or correct the above if needed.
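
  As a rough, OCaml-only illustration of the finalizer idea in 1), the
  sketch below models a reference-counted foreign object with a plain
  record and uses `Gc.finalise' where a real binding would install a
  finalizer in a C custom block. The names `foreign', `retain',
  `release' and `wrap' are hypothetical, and the sketch does not cover
  the size hint given to `caml_alloc_custom_mem'.

```ocaml
(* Hypothetical stand-in for a foreign reference-counted object. *)
type foreign = { mutable refcount : int; mutable destroyed : bool }

let retain f = f.refcount <- f.refcount + 1

let release f =
  f.refcount <- f.refcount - 1;
  if f.refcount = 0 then f.destroyed <- true

(* OCaml-side wrapper: a heap-allocated record, as Gc.finalise requires. *)
type handle = { target : foreign }

let wrap f =
  retain f;
  let h = { target = f } in
  (* The count is dropped only when the GC finalizes [h], at a time and
     in an order that the program cannot predict. *)
  Gc.finalise (fun h -> release h.target) h;
  h
```

  The foreign object stays alive at least as long as the handle is
  reachable, but nothing bounds how long after that the release
  happens, which is why this is only acceptable for memory.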


[the familiar "allocation of custom objects mess up the speed of the
major GC" problem] <https://github.com/ocaml/ocaml/issues/7676>

[a new function `caml_alloc_custom_mem']
<https://github.com/ocaml/ocaml/pull/1738>

[that one]
<https://discuss.ocaml.org/t/storing-an-ocaml-value-in-a-c-structure/3521>


2) Deterministic resource management
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  To avoid the impedance mismatch between smart pointers and the GC, you
  can rely on deterministic resource management. In OCaml, the idiomatic
  expression of it is to use “`with_'” wrappers based on
  `unwind-protect' (see the [example of files]). OCaml 4.08 introduces
  `Fun.protect', an implementation of `unwind-protect' suitable for
  OCaml.

  Pros:

  • Predictable: can be used for non-memory resources.

  Cons:

  • Lacks expressiveness: resources live for the exact duration of their
    defining scope, and are reclaimed in LIFO order.

  • Allows “use after free”: the resource can be referenced outside of
    its scope, if not careful.

  • Currently incompatible with asynchronous exceptions: OCaml does not
    currently allow an implementation of unwind-protect that protects
    from asynchronous exceptions being raised inside the finally clause.
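
  A minimal sketch of such a wrapper, assuming OCaml 4.08 or later for
  `Fun.protect'. The names `with_resource', `with_in_file' and
  `first_line' are hypothetical and only serve the illustration.

```ocaml
(* Acquire a resource, run [f] on it, and release it whether [f]
   returns normally or raises. *)
let with_resource ~acquire ~release f =
  let r = acquire () in
  Fun.protect ~finally:(fun () -> release r) (fun () -> f r)

(* The same pattern specialized to input channels. *)
let with_in_file path f =
  with_resource ~acquire:(fun () -> open_in path) ~release:close_in_noerr f

(* The channel is closed before [first_line] returns or raises. *)
let first_line path = with_in_file path input_line
```

  Nothing stops `f' from returning the resource itself, which is the
  “use after free” drawback listed above.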


[example of files] 
<https://dev.realworldocaml.org/error-handling.html>


3) Manual resource management
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  If neither 1) nor 2) fits the bill, you have to resort to manual
  resource management, in which the user has to call some `free'
  function explicitly (and gets an exception if they use it after
  `free'). It is “hard” to program correctly with manual resource
  management, more so in the presence of exceptions. For this reason,
  people mix it with 1) and/or 2); for instance they use unwind-protect
  in a non-systematic manner, or they attach finalizers to act as a
  fallback, or both. While with 1) and 2) you are still within the realm
  of structured programming, with manual resource management you enter
  the realm of debugging-oriented programming: think programming in a
  weird dialect of old C++.

  Pros:

  • Last resort solution

  Cons:

  • Non-idiomatic code

  • Hard to program

  • Hard to reason about the code
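
  To make the manual approach concrete, here is a small sketch of a
  handle with an explicit `free' that raises on use after free. The
  names `handle', `create', `get', `free' and `Use_after_free' are
  hypothetical, and `release' stands for whatever the foreign library
  needs to be called.

```ocaml
exception Use_after_free

(* [None] once the underlying resource has been released. *)
type 'a handle = { mutable state : 'a option }

let create v = { state = Some v }

(* Every access checks that the handle is still live. *)
let get h =
  match h.state with
  | Some v -> v
  | None -> raise Use_after_free

(* Explicit release; a second call is a no-op in this sketch. *)
let free ~release h =
  match h.state with
  | Some v ->
    h.state <- None;
    release v
  | None -> ()
```

  The check turns a use after free into an exception rather than
  memory corruption, but forgetting to call `free' (for instance when
  an exception escapes) still leaks the resource.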

  Discussions with Serious Industrial OCaml Users a while ago (starting
  around POPL 2017 in Paris) have brought to light OCaml's current
  issues with resource management. These discussions prompted
  [a proposal for a resource management model for OCaml], inspired by
  RAII and move semantics from modern C++/Rust. In a nutshell, it aims
  to lift the expressiveness limitations of 2). Interoperability is
  probably its most important application.


[a proposal for a resource management model for OCaml]
<https://discuss.ocaml.org/t/a-proposal-for-a-resource-management-model-for-ocaml/1680>


OCaml 4.08.0+beta3
══════════════════

  Archive:
  <https://sympa.inria.fr/sympa/arc/caml-list/2019-04/msg00048.html>


Damien Doligez announced
────────────────────────

  Dear OCaml users,

  The release of OCaml 4.08.0 is approaching. We have created a third
  beta version to help you adapt your software to the new features ahead
  of the release.

  The source code is available at these addresses:

  <https://github.com/ocaml/ocaml/archive/4.08.0+beta3.tar.gz>
  <https://caml.inria.fr/pub/distrib/ocaml-4.08/ocaml-4.08.0+beta3.tar.gz>

  The compiler is (or will soon be) also available in OPAM with one of
  the following commands.

  opam switch create ocaml-variants.4.08.0+beta3 --repositories=default,beta=git+https://github.com/ocaml/ocaml-beta-repository.git

  or

  opam switch create ocaml-variants.4.08.0+beta3+<VARIANT> --repositories=default,beta=git+https://github.com/ocaml/ocaml-beta-repository.git

  where you replace <VARIANT> with one of these:
    afl
    default_unsafe_string
    flambda
    fp
    fp+flambda

  We want to know about all bugs. Please report them here:
   <https://github.com/ocaml/ocaml/issues>

  Happy hacking,

  – Damien Doligez for the OCaml team.


  The changes from beta2 are the following:

  • GPR#1942, GPR#2244: simplification of the static check for recursive
    definitions (Alban Reynaud and Gabriel Scherer, review by Jeremy
    Yallop, Armaël Guéneau and Damien Doligez)

  • GPR#1354, GPR#2177: Add fma support to Float module. (Laurent
    Thévenoux, review by Alain Frisch, Jacques-Henri Jourdan, Xavier
    Leroy)

  • GPR#2202: Correct Hashtbl.MakeSeeded.{add_seq,replace_seq,of_seq} to
    use functor hash function instead of default hash
    function. Hashtbl.Make.of_seq shouldn't create randomized hash
    tables. (David Allsopp, review by Alain Frisch)

  • * PR#4208, PR#4229, PR#4839, PR#6462, PR#6957, PR#6950, GPR#1063,
      GPR#2176, GPR#2297: Make (nat)dynlink sound. (Mark Shinwell, Leo
      White, Nicolás Ojeda Bär, Pierre Chambart)

  • GPR#2317: type_let: be more careful generalizing parts of the
    pattern (Thomas Refis and Leo White, review by Jacques Garrigue)

  • MPR#6242, GPR#2143, MPR#8558, GPR#8559: optimize some local
    functions (Alain Frisch, review by Gabriel Scherer)

  • #7829, #8585: Fix pointer comparisons in freelist.c (for 32-bit
    platforms) (David Allsopp and Damien Doligez)

  • #8567, #8569: on ARM64, use 32-bit loads to access
    caml_backtrace_active (Xavier Leroy, review by Mark Shinwell and
    Greta Yorsh)

  • #8568: Fix a memory leak in mmapped bigarrays (Damien Doligez,
    review by Xavier Leroy and Jérémie Dimino)

  • MPR#7548: printf example in the tutorial part of the manual
    (Kostikova Oxana, review by Gabriel Scherer, Florian Angeletti,
    Marcello Seri and Armaël Guéneau)

  • MPR#7547, GPR#2273: Tutorial on Lazy expressions and patterns in
    OCaml Manual (Ulugbek Abdullaev, review by Florian Angeletti and
    Gabriel Scherer)

  • GPR#8508: refresh \moduleref macro (Florian Angeletti, review by
    Gabriel Scherer)

  • MPR#7919, GPR#2311: Fix assembler detection in configure (Sébastien
    Hinderer, review by David Allsopp)

  • GPR#2295: Restore support for bytecode target XLC/AIX/Power
    (Konstantin Romanov, review by Sébastien Hinderer and David Allsopp)

  • GPR#8528: get rid of the direct call to the C preprocessor in the
    testsuite (Sébastien Hinderer, review by David Allsopp)

  • Issue #7938, GPR #8532: Fix alignment detection for ints on 32-bits
    platforms (Sébastien Hinderer, review by Xavier Leroy)

  • * GPR#8533: Remove some unused configure tests (Stephen Dolan,
      review by David Allsopp and Sébastien Hinderer)

  • GPR#2207,#8604: Add opam files to allow pinning (Leo White, Greta
    Yorsh, review by Gabriel Radanne)

  • MPR#7835, GPR#1980, GPR#8548, GPR#8586: separate scope from stamp in
    idents and explicitly rescope idents when substituting
    signatures. (Thomas Refis, review by Jacques Garrigue and Leo White)

  • #8550, #8552: Soundness issue with class generalization (Jacques
    Garrigue, review by Leo White and Thomas Refis, report by Jeremy
    Yallop)


Menhir and preserving comments from source
══════════════════════════════════════════

  Archive:
  <https://discuss.ocaml.org/t/menhir-and-preserving-comments-from-source/3686/1>


Chet Murthy asked
─────────────────

  I've used ocamlyacc over the years a lot, and menhir in a couple of
  projects (including a big one I'm working on right now).  I've also
  used camlp4/camlp5's stream-parsers in a *ton* of projects.  And of
  course, with ocamllex and sedlexing.  I find that with stream-parsers,
  it's easy to arrange for preserving lexical positions in tokens, and
  then carrying that across to the parse-tree.  To wit,
  ┌────
  │ ...
  │ type basic_token = ...... ;;
  │ type token = basic_token * lexical_position_info_t ;;
  │ ...
  └────

  and then in your stream parser, you pattern-match on the first
  component, e.g.
  ┌────
  │ ...
  │ parser [< .... ; '(Tstring s, _) ; ... >] -> yadda yadda
  │ ...
  └────

  But with menhir (and ocamlyacc) it seems like you need to embed the
  lexical position info in the token, e.g.
  ┌────
  │ ...
  │ type basic_token =
  │ | Tstring of lexical_position_info_t * string
  │ | Tsemi of lexical_position_info_t
  │ etc
  │ ...
  └────

  Is there some trick I'm missing, for how to use ocamlyacc/menhir in a
  manner that allows preserving this positional information during the
  parse?


gasche replied
──────────────

  To have location/position information in the AST: the standard
  approach I'm familiar with is not to embed position information in the
  tokens, but to query it from the lexer or parser at the place where
  you build your AST values in the parser actions. When using ocamlyacc,
  I use the `Lexing' module for this (`Lexing.lexeme_{start,end}_p'),
  when using Menhir I use its special symbols `${start,end}pos',
  `${start,end}pos(n)', `$loc', `$loc(n)'.

  To preserve comments, an approach we use in the OCaml compiler (where
  comments that are docstrings are kept in the AST) is to have a global
  table of comments, that is filled by the Lexer, and accessed from
  parsing actions (there is a function that says basically "collect all
  the comments from the last time you were called to <this position>").
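
  The comment-table idea can be sketched in plain OCaml as follows.
  This is not the OCaml compiler's actual code: the names
  `pending_comments', `record_comment' and `take_comments_before' are
  hypothetical, and the sketch assumes the lexer records comments in
  increasing source order.

```ocaml
(* Global table of comments seen so far, newest first:
   (start offset in the input, comment text). *)
let pending_comments : (int * string) list ref = ref []

(* To be called from the lexer rule that recognizes a comment. *)
let record_comment (pos : Lexing.position) text =
  pending_comments := (pos.Lexing.pos_cnum, text) :: !pending_comments

(* To be called from a parser action: returns, in source order, the
   comments that start before [pos] and removes them from the table,
   so that the next call only sees later comments. *)
let take_comments_before (pos : Lexing.position) =
  let before, after =
    List.partition
      (fun (offset, _) -> offset < pos.Lexing.pos_cnum)
      !pending_comments
  in
  pending_comments := after;
  List.rev_map snd before
```

  In a Menhir action the position argument would typically be
  something like `$startpos'; with ocamlyacc it would come from the
  `Lexing' functions mentioned above.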


ppx_protocol_conv 5.0.0
═══════════════════════

  Archive:
  <https://discuss.ocaml.org/t/ann-ppx-protocol-conv-5-0-0/3692/1>


Anders Fugmann announced
────────────────────────

  It is my pleasure to announce the release of [Ppx_protocol_conv]
  version 5.0.0.

  Ppx_protocol_conv is a syntax extension to generate functions to
  serialize and de-serialize OCaml types. The ppx itself does not
  contain any protocol specific code, but relies on user defined
  'drivers' to define serialization and de-serialization of basic types
  and structures.

  The library comes with multiple pre-defined drivers:
  • ppx_protocol_conv_json (Yojson.Safe.json)
  • ppx_protocol_conv_jsonm (Ezjson.value)
  • ppx_protocol_conv_msgpack (Msgpck.t)
  • ppx_protocol_conv_xml-light (Xml.xml)
  • ppx_protocol_conv_yaml (Yaml.value)

  The library is based on ppxlib and is compatible with base v0.12.
  Release 5.0.0 is available through opam.

  The project homepage is:
  <https://github.com/andersfugmann/ppx_protocol_conv>

  The project's [wiki pages] contain some more information on how to
  use the library and existing drivers and on how to write your own
  drivers.

  *Noteworthy Change* This release includes a major rewrite of the core
  of the library to allow more control by user supplied drivers over the
  serialization and de-serialization of types. These changes break
  backward compatibility.

  The json driver (`Ppx_protocol_conv_json') has been updated to be
  compatible with the serialization format of ppx_deriving_yojson,
  supporting the `[@key]', `[@name]' and `[@default]' attributes, and
  can be used as a replacement for `ppx_deriving_yojson' with few
  modifications.

  Deserialization functions now return a `result' type. Old support for
  exception type errors is available in functions with the `_exn'
  suffix.  For a complete list of changes, see the [Changelog].

  As always, comments, suggestions and PRs are more than welcome.


[Ppx_protocol_conv]
<https://github.com/andersfugmann/ppx_protocol_conv>

[wiki pages] 
<https://github.com/andersfugmann/ppx_protocol_conv/wiki>

[Changelog]
<https://github.com/andersfugmann/ppx_protocol_conv/blob/master/Changelog>


Orsetto: structured data interchange languages (preview release)
════════════════════════════════════════════════════════════════

  Archive:
  <https://discuss.ocaml.org/t/ann-orsetto-structured-data-interchange-languages-preview-release/3304/6>


james woodyatt announced
────────────────────────

  I have now released `~preview4' which resolves Issue [#8] /OCaml 4.07:
  the new Stdlib.Seq.t is functionally equivalent to Cf_seq.t/. For
  OCaml 4.06, this introduces an external dependency on the *seq*
  compatibility package. I've also checked that documentary comments are
  available with *odig*, so this might be the last preview release
  before 1.0. (It depends on whether I decide to remove the support for
  the *ppx_let* syntax extension.)


[#8]
<https://bitbucket.org/jhw/orsetto/issues/8/ocaml-407-the-new-stdlibseqt-is>


james woodyatt then added
─────────────────────────

  > It depends on whether I decide to remove the support for the
    *ppx_let* syntax extension.

  I've thought about this, and I will not be removing support for the
  *ppx_let* syntax extension. I plan to /deprecate/ it when OCaml 4.08
  is released, but it will be retained while I continue supporting OCaml
  4.06 and 4.07.


Searching for functions
═══════════════════════

  Archive: 
  <https://discuss.ocaml.org/t/searching-for-functions/3698/1>


Jordan Mackie asked
───────────────────

  OCaml newbie here - coming from Haskell land out of curiosity.

  I'm curious how you guys find your way around stdlib/packages etc?

  Example: I'm writing a script and I want to look up an environment
  variable. I know there's probably some function along the lines of
  `get_env' somewhere, so I'd like to know where it is and what type it
  has. In Haskell I'd do a hoogle search along the lines of
  <https://www.stackage.org/lts-13.18/hoogle?q=getenv> - what would be
  my process in OCaml?

  I tried googling "get env var in Ocaml" - first hit is a link to
  stdlib, but I'm using base. It did at least give me the hint that
  `Sys' is a relevant namespace, so I go and look at the docs for
  `Base.Sys' (many clicks later -
  <https://ocaml.janestreet.com/ocaml-core/latest/doc/base/Base/Sys/index.html>)
  but `getenv' isn't listed. But it is apparently there…

  There must be a better way?


Yawar Amin replied
──────────────────

  The Hoogle equivalent for OCaml is called odig:
  <https://erratique.ch/software/odig> . You can install it locally and
  have it generate documentation for all installed packages. However,
  generated documentation is not globally searchable (see last
  point). Besides that, there are a few other strategies:

  • Familiarize yourself with the standard library that ships with every
    OCaml distribution:
    <https://caml.inria.fr/pub/docs/manual-ocaml/libref/> . This is the
    equivalent of Haskell's `base' package. The `Prelude' equivalent
    module is called `Pervasives'. You will find the `Sys' module here,
    and `getenv' in there.
  • Keep <http://opam.ocaml.org/packages/> handy for when you're given a
    package name to look up. Package documentation is mostly not
    uploaded to a central location like Haddock. (But people have been
    talking about setting that up at docs.ocaml.org.) You'll probably
    need to open up and search through `.mli' files once in a while.
  • The old-style ocamldoc documentation pages (like the standard
    library I linked above) have very handy pages indexing types,
    values, and modules. However, the newer odoc documentation pages
    which are becoming the de facto standard do not, as of yet. There
    are a couple of issues tracking this.
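
  For the concrete question above, a short sketch using only the
  standard library's `Sys' module: `Sys.getenv' has type
  `string -> string' and raises `Not_found' when the variable is
  unset, while `Sys.getenv_opt' (available since OCaml 4.05) returns
  an option. The helper name `getenv_or' is hypothetical, and this
  says nothing about what `Base.Sys' exposes.

```ocaml
(* Look up an environment variable, falling back to [default] when it
   is not set. *)
let getenv_or ~default name =
  match Sys.getenv_opt name with
  | Some value -> value
  | None -> default

(* The exception-raising variant, wrapped for comparison. *)
let getenv_result name =
  match Sys.getenv name with
  | value -> Ok value
  | exception Not_found -> Error ("unset variable: " ^ name)
```

  In the toplevel, typing `Sys.getenv;;' prints the function's type,
  which is a quick way to check a signature once the module is known.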


Other OCaml News
════════════════

From the ocamlcore planet blog
──────────────────────────────

  Here are links from many OCaml blogs aggregated at [OCaml Planet].

  • [OCaml Developer at Ahrefs (Full-time)]
  • [Learning ML Depth-First]


[OCaml Planet] <http://ocaml.org/community/planet/>

[OCaml Developer at Ahrefs (Full-time)]
<https://functionaljobs.com/jobs/9165-ocaml-developer-at-ahrefs>

[Learning ML Depth-First]
<https://blog.janestreet.com/learning-ml-depth-first/>


Old CWN
═══════

  If you happen to miss a CWN, you can [send me a message] and I'll mail
  it to you, or go take a look at [the archive] or the [RSS feed of the
  archives].

  If you also wish to receive it every week by mail, you may subscribe
  [online].

  [Alan Schmitt]


[send me a message] <mailto:alan.schmitt at polytechnique.org>

[the archive] <http://alan.petitepomme.net/cwn/>

[RSS feed of the archives] 
<http://alan.petitepomme.net/cwn/cwn.rss>

[online] <http://lists.idyll.org/listinfo/caml-news-weekly/>

[Alan Schmitt] <http://alan.petitepomme.net/>
