
Outline of a trainable, streaming tokenizer for NLP with Elixir
Patrick Tschorn wrote: “Virtually all NLP tasks require some form of tokenization, and in many cases the tokenizers provided by popular NLP libraries are adequate. If, however, the input material strays sufficiently from the norm, the available tokenizers may not be satisfactory and it may turn out that it is nearly impossible or far too costly to adapt…”
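To make the starting point concrete, here is a minimal sketch of what a streaming tokenizer in Elixir might look like before any trainability is added. The module name `NaiveTokenizer`, the function `stream_tokens/1`, and the file name `corpus.txt` are illustrative assumptions, not part of the article; it simply splits each line of a lazy stream on whitespace.

```elixir
# Illustrative sketch only: a naive whitespace tokenizer over a stream of lines.
# Names here (NaiveTokenizer, stream_tokens/1) are hypothetical, not from the article.
defmodule NaiveTokenizer do
  @doc "Lazily turns a stream of lines into a stream of tokens."
  def stream_tokens(lines) do
    # Stream.flat_map keeps everything lazy: tokens are produced on demand.
    Stream.flat_map(lines, &String.split(&1, ~r/\s+/, trim: true))
  end
end

# Usage: tokenize a (hypothetical) file lazily, line by line.
"corpus.txt"
|> File.stream!()
|> NaiveTokenizer.stream_tokens()
|> Enum.take(5)
```

A rule-based splitter like this is exactly the kind of tokenizer the quote warns about: it works for well-behaved text but is hard to adapt when the input strays from the norm, which motivates making the tokenizer trainable instead.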