spaCy is neat
Tried spaCy, and it’s as nice as I thought it’d be.
Interesting bits and a general dump of first impressions:
- NER @ CLI (see “Custom-named entity recognition with spaCy in four lines”): spaCy can convert NER datasets from CoNLL format while printing nice status info
- Has a `debug data` command that validates training data (and checks other things too): Command Line Interface · spaCy API Documentation
- Can do rule-based matching and exposes lots of linguistic features: Rule-based matching · spaCy Usage Documentation
- Some support for Transformers, allegedly including all HuggingFace ones!
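The `spacy convert` command does the CoNLL conversion for you. For a rough idea of what it produces, here is a hand-rolled sketch. It parses a toy two-column (“token TAG”) CoNLL-style string into `Doc` objects with IOB entity tags, then stores them in a `DocBin`, which is the binary `.spacy` training format. This assumes spaCy v3. The sample data and the `conll_to_docs` helper are hypothetical and much simpler than the real converter (no `-DOCSTART-` handling, no multi-column files).

```python
import spacy
from spacy.tokens import Doc, DocBin

# Hypothetical toy input: one "token TAG" pair per line, blank line between sentences.
CONLL_SAMPLE = """\
Angela B-PER
Merkel I-PER
visited O
Paris B-LOC
. O

Hello O
world O
"""


def conll_to_docs(text, vocab):
    """Hypothetical helper: turn each blank-line-separated block into a Doc."""
    docs = []
    for block in text.strip().split("\n\n"):
        words, tags = [], []
        for line in block.splitlines():
            word, tag = line.split()
            words.append(word)
            tags.append(tag)
        # Doc(..., ents=...) takes one IOB tag per word (spaCy v3).
        docs.append(Doc(vocab, words=words, ents=tags))
    return docs


nlp = spacy.blank("en")
docs = conll_to_docs(CONLL_SAMPLE, nlp.vocab)

# Serialize to the binary DocBin format and read it back.
data = DocBin(docs=docs).to_bytes()
restored = list(DocBin().from_bytes(data).get_docs(nlp.vocab))

for doc in restored:
    print([(ent.text, ent.label_) for ent in doc.ents])
```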
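Here is a minimal rule-based matching sketch using spaCy’s `Matcher`. It runs on a blank English pipeline, so no trained model is needed: the pattern only looks at lowercase text and a punctuation flag. The pattern, the `HelloWorld` label and the `find_greetings` helper are illustrative, not from the original notes.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# "hello", optionally followed by one punctuation token, then "world".
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True, "OP": "?"}, {"LOWER": "world"}]
matcher.add("HelloWorld", [pattern])


def find_greetings(text):
    doc = nlp(text)
    # Each match is a (match_id, start_token, end_token) tuple.
    return [doc[start:end].text for _, start, end in matcher(doc)]


print(find_greetings("Hello, world! Hello world!"))
```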
`Span`s are heavily token-based, including for NER: you can’t set an entity on just part of a token, for example.
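This sketch illustrates the token-based nature of entities. It assumes spaCy v3 and a blank English tokenizer, and the sentence and labels are made up. Entities are `Span`s over whole tokens. A character range that ends inside a token (here “London” inside “Londoners”) cannot become a span under the default strict alignment.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("I like Londoners.")
print([t.text for t in doc])  # tokens: I, like, Londoners, .

# Entities are Spans defined by token indices, not character offsets.
doc.ents = [Span(doc, 2, 3, label="NORP")]
print([(e.text, e.label_) for e in doc.ents])

# "London" covers characters 7-13, which ends inside the token "Londoners",
# so the default (strict) char_span cannot build a Span for it.
print(doc.char_span(7, 13, label="GPE"))  # None
```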
`Doc.char_span()` can create a `Span` from character offsets, with several alignment modes! Doc · spaCy API Documentation
- And of course we can get the character offsets back from the span itself (`start_char`, `end_char`)
- You can merge/split tokens: Linguistic Features · spaCy Usage Documentation
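This sketch compares the three `alignment_mode` values of `Doc.char_span()` and then reads the offsets back from the span. It assumes spaCy v3 and a blank English tokenizer, and the example sentence is made up. The character range 2-12 (“like Londo”) starts on a token boundary but ends mid-token.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("I like Londoners.")

# Characters 2-12 are "like Londo": they start on a token boundary
# but end in the middle of "Londoners".
strict = doc.char_span(2, 12)  # default alignment_mode="strict"
contract = doc.char_span(2, 12, alignment_mode="contract")  # keep only tokens fully inside
expand = doc.char_span(2, 12, alignment_mode="expand")  # grow to cover partial tokens

print(strict)          # None
print(contract.text)   # like
print(expand.text)     # like Londoners

# A Span knows its own character offsets in the Doc text.
print(expand.start_char, expand.end_char)  # 2 16
```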
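Here is a sketch of merging and splitting tokens with `doc.retokenize()`, assuming spaCy v3 and a blank English pipeline; the sentences are made up. The split is one possible workaround for the sub-token entity limitation. After splitting “Londoners” into “London” + “ers”, a character span for “London” lines up with token boundaries. The `heads` passed to `split` are placeholders, since a blank pipeline has no parser.

```python
import spacy

nlp = spacy.blank("en")

# Merging: collapse "New York" into a single token.
doc = nlp("I moved to New York")
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[3:5])
print([t.text for t in doc])  # ['I', 'moved', 'to', 'New York']

# Splitting: break "Londoners" into "London" + "ers".
doc = nlp("I like Londoners.")
with doc.retokenize() as retokenizer:
    token = doc[2]
    # Placeholder heads: "London" attaches to the 2nd new subtoken ("ers"),
    # and "ers" attaches to "like".
    retokenizer.split(token, ["London", "ers"], heads=[(token, 1), doc[1]])
print([t.text for t in doc])  # ['I', 'like', 'London', 'ers', '.']

# Now characters 7-13 ("London") match a whole token, so strict char_span works.
doc.ents = [doc.char_span(7, 13, label="GPE")]
print([(e.text, e.label_) for e in doc.ents])  # [('London', 'GPE')]
```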
The `Example` class (one instance per training example) can do neat stuff with BIO mapping, aligning NER tags to tokens, etc.: Example · spaCy API Documentation
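This sketch shows the `Example` class projecting gold entities onto tokens. It assumes spaCy v3 and a blank English tokenizer, and the text and label are made up. Gold entities are given as character offsets. `get_aligned_ner()` returns them as BILUO tags on the predicted tokens, and `biluo_to_iob` from `spacy.training` turns those into plain BIO tags.

```python
import spacy
from spacy.training import Example, biluo_to_iob

nlp = spacy.blank("en")

# The predicted side is just the tokenizer's output; gold entities are given
# as (start_char, end_char, label) offsets into the text.
text = "New York is big."
predicted = nlp.make_doc(text)
example = Example.from_dict(predicted, {"entities": [(0, 8, "GPE")]})

# Gold entities projected onto the predicted tokens, as BILUO tags...
biluo = example.get_aligned_ner()
print(biluo)  # ['B-GPE', 'L-GPE', 'O', 'O', 'O']

# ...and converted to plain BIO (IOB) tags.
print(biluo_to_iob(biluo))  # ['B-GPE', 'I-GPE', 'O', 'O', 'O']
```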
In the middle of the desert I can say whatever I want.