Trainable Tokenizer can tokenize and segment text in most languages, based on a supplied configuration and sample data. It is not intended for languages without explicit word delimitation, such as Chinese.
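As a rough illustration of the idea of training a segmenter from sample data (this is a toy sketch, not the actual Trainable Tokenizer API; the function names `train` and `segment` and the abbreviation-based heuristic are assumptions for the example), one could learn from sample sentences which period-terminated tokens are abbreviations and therefore should not trigger a sentence boundary:

```python
def train(sample_sentences):
    """Learn abbreviations from sample data: any token ending in '.'
    that occurs sentence-internally is treated as an abbreviation."""
    abbrevs = set()
    for sent in sample_sentences:
        words = sent.split()
        for w in words[:-1]:          # skip the sentence-final token
            if w.endswith("."):
                abbrevs.add(w.lower())
    return abbrevs

def segment(text, abbrevs):
    """Split text into sentences at '.' unless the token is a
    learned abbreviation."""
    sentences, current = [], []
    for tok in text.split():
        current.append(tok)
        if tok.endswith(".") and tok.lower() not in abbrevs:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

abbrevs = train(["Dr. Smith arrived late.",
                 "Mr. Brown left early.",
                 "See p. 5 for details."])
print(segment("Dr. Smith met Mr. Jones. They left.", abbrevs))
# → ['Dr. Smith met Mr. Jones.', 'They left.']
```

Note that this heuristic fails exactly where the original text warns: for a script with no whitespace between words (e.g. Chinese), `text.split()` yields no usable token boundaries to train on.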