Outline of a trainable, streaming tokenizer for NLP with Elixir

Patrick Tschorn wrote “Virtually all NLP tasks require some form of tokenization, and in many cases the tokenizers provided by popular NLP libraries are adequate. If, however, the input material strays sufficiently from the norm, the available tokenizers may not be satisfactory and it may turn out that it is nearly impossible or far too costly to adapt…”

Visitor7 [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], from Wikimedia Commons

Fast Elixir Porter2 Stemmer

Patrick Tschorn wrote “Motivation: understand the Porter2 stemming algorithm and learn some Elixir. On a recent project, my mission was to refine and substantially extend a prototype document classification system originally written by somebody else in Python. In order to keep the extended system small and understandable (i.e. maintainable by the original author), I implemented all but one of…”

Carpe Diem Word Scrabble

Programming a word at a time

Irene Papakonstantinou wrote “There is an improv game called one-word-at-a-time. It goes like this: two (or more) people collaboratively compose a story, each adding just one word at a time. It sounds easy, but actually humans can be pretty bad at this game! Some of the rules of improv are ‘be obvious’ and ‘accept offers’. In the context…”