SentencePiece is an unsupervised text tokenizer and detokenizer intended mainly for neural network-based text generation systems in which the vocabulary size is fixed before the neural model is trained. It implements subword segmentation algorithms such as byte-pair encoding (BPE) and the unigram language model, extended so that models can be trained directly from raw sentences. This makes it possible to build purely end-to-end systems that do not depend on language-specific pre- or post-processing.
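As a minimal sketch of that workflow (not part of this package's own documentation), upstream also provides Python bindings, which are not necessarily installed by this package; the file corpus.txt, the model prefix m, and the vocabulary size below are illustrative placeholders:

    import sentencepiece as spm

    # Train a unigram model directly from raw sentences (one per line).
    # "corpus.txt", the prefix "m" and vocab_size are hypothetical values.
    spm.SentencePieceTrainer.train(
        input='corpus.txt', model_prefix='m',
        vocab_size=8000, model_type='unigram')

    # Load the trained model, then segment and restore text without any
    # language-specific pre/postprocessing.
    sp = spm.SentencePieceProcessor(model_file='m.model')
    pieces = sp.encode('This is a test.', out_type=str)
    print(pieces)             # subword pieces
    print(sp.decode(pieces))  # detokenize back to the raw sentence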
Binary packages can be installed with pkgin, a high-level package management tool (itself installable with pkg_add), or with pkg_add(1), which is installed by default. The NetBSD packages collection is also designed to permit easy installation from source.
The pkg_admin audit command locates any installed packages that have been mentioned in security advisories as having vulnerabilities. Please note that the vulnerabilities database might not be fully accurate, and that not every bug is exploitable in every configuration.
Problem reports, updates, or suggestions for this package should be submitted with send-pr.