Outline of a trainable, streaming tokenizer for NLP with Elixir

Patrick Tschorn wrote “Virtually all NLP tasks require some form of tokenization, and in many cases the tokenizers provided by popular NLP libraries are adequate. If, however, the input material strays sufficiently from the norm, the available tokenizers may not be satisfactory and it may turn out that it is nearly impossible or far too costly to adapt…”

Reading ARFF files with Elixir

Patrick Tschorn wrote “If you are implementing a machine learning approach, you are likely to want to test it on publicly available datasets. A large number of these datasets use the ARFF file format established by Weka. I am not aware of any Elixir ARFF readers, so I am going to explore writing one (‘Arfficionado’) in this blog.…”

Building rule-based machine learning systems from scratch

Patrick Tschorn wrote “Sometimes, it is obvious that a project needs machine learning, but you can tell that simply pumping the data through all the algorithms in a popular library (and picking the one algorithm that performs least badly) is not the answer. Machine learning libraries cannot cover all algorithms, trade-offs and heuristics specific to arbitrary problem domains.…”

Dockerizing Sybase and connecting to it from Elixir

Patrick Tschorn wrote “On a recent project, we were required to connect our software to a Sybase DB on Linux, which we managed through FreeTDS and ODBC. I will summarize the relevant details in this post. Please be aware that a number of alternative solutions are conceivable. I built Docker images for our software, so that I would…”

Visitor7 [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], from Wikimedia Commons

Fast Elixir Porter2 Stemmer

Patrick Tschorn wrote “Motivation: understand the Porter2 stemming algorithm and learn some Elixir. On a recent project, my mission was to refine and substantially extend a prototype document classification system originally written by somebody else in Python. In order to keep the extended system small and understandable (i.e. maintainable by the original author), I implemented all but one of…”

A basic recipe for an Elixir SSL server

Patrick Tschorn wrote “In this post, we’ll first try out Erlang’s SSL application interactively and then put together a simple Elixir SSL server OTP application using the Supervisor and GenServer behaviours. Preparation: First of all, we’ll create a self-signed certificate: mkdir foo; cd foo; openssl genrsa -out key.pem 1024; openssl req -new -key key.pem -out request.pem # (using…”

Santeri Viinamäki [CC-BY-3.0 (http://creativecommons.org/licenses/by/3.0)], via Wikimedia Commons

CodeMesh 2014 Day 1

James Uther wrote “I was at day 1 of CodeMesh this year (you can see Tim’s report on day 2 here). A quick recap. QOTD: There are 3 fire exits as marked, but we’re confident that Erlang programmers who die will be restarted. Keynote: complexity is outside the code (Jessica Kerr & Dan North). A good, entertaining talk that…”

SpringSource / VMware Acquire Rabbit Technologies

Mike Rowlands wrote “SpringSource, a division of VMware, Inc. today announced the acquisition by VMware of Rabbit Technologies, Ltd, a company set up by LShift and partners Monadic and CohesiveFT. Read the full story”

On the limits of concurrency: Worker Pools in Erlang

Matthew Sackman wrote “Worker pools are a very common pattern, and they exist in the standard libraries of many languages. The idea is simple: submit some sort of closure to a service which commits to running the closure in the future in some thread. Normally the work is shared out among many different threads and in the…”

The fine art of holding a file descriptor

Matthew Sackman wrote “People tend to like certain software packages to be scalable. This can have a number of different meanings but mostly it means that as you throw more work at the program, it may require some more resources, in terms of memory or CPU, but it nevertheless just keeps on working. Strangely enough, it’s fairly difficult…”

By Tangopaso (Self-photographed) [Public domain], via Wikimedia Commons

Memory matters – even in Erlang

Marek Majkowski wrote “Some time ago we got an interesting bug report for RabbitMQ. Surprisingly, unlike other complex bugs, this one is easy to describe: At some point basic.get suddenly starts being very slow – about 9 times slower!”

RabbitMQ-shovel: Message Relocation Equipment

Matthew Sackman wrote “In several applications, it’s very useful to be able to take messages out of one RabbitMQ broker and insert them into another. Many people on our mailing list have been asking for such a shovel, and we’ve recently been able to devote some time to writing one. This takes the form of a plugin for…”