Day 843
Python dataclasses
This module provides a decorator and functions for automatically adding generated special methods such as __init__() and __repr__() to user-defined classes. [...]
- dataclasses — Data Classes — Python 3.9.4 documentation
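A minimal sketch of what the decorator generates. The class and its fields are made up for illustration and are not from the documentation:

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    # Hypothetical fields, only here to show the generated methods.
    name: str
    epochs: int = 3
    # Mutable defaults have to go through default_factory.
    tags: list = field(default_factory=list)


# __init__ is generated from the annotated fields.
run = Experiment("bert-ner", epochs=5)

# __repr__ is generated as well.
print(run)  # Experiment(name='bert-ner', epochs=5, tags=[])

# So is __eq__, which compares field by field.
print(Experiment("a") == Experiment("a"))  # True
```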
HuggingFace
“Token classification” includes but is not limited to NER: Hugging Face – The AI community building the future. A really nice, more accurate term that I’ll be using!
Installing (after TensorFlow and/or PyTorch):
pip install transformers
Caches by default in the user folder (~/.cache/huggingface), but this can be overridden:
export HF_HOME="/data/sh/experiments/bert/cache"
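The same override can probably be done from inside Python. This is a sketch under the assumption that the library reads HF_HOME when it is imported, so the variable has to be set first; set_hf_cache is a hypothetical helper name:

```python
import os


def set_hf_cache(path: str) -> str:
    """Point the Hugging Face cache at `path` for the current process."""
    os.environ["HF_HOME"] = path
    return os.environ["HF_HOME"]


# Assumption: this must run before `import transformers`,
# otherwise the default cache location may already be in use.
set_hf_cache("/data/sh/experiments/bert/cache")
```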
The “hosted inference API” on the website is really cool! dslim/bert-base-NER · Hugging Face
Example of converting conll dataset to what BERT expects: Fine Tuning BERT for NER on CoNLL 2003 dataset with TF 2.0 | by Bhuvana Kundumani | Analytics Vidhya | Medium
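A rough sketch of what such a conversion involves, in plain Python without any HF imports. It is not the code from the linked article: parse_conll and align_labels are hypothetical helpers, the column layout (word first, NER tag last) is the CoNLL 2003 one, and the -100 convention follows the PyTorch token-classification example rather than the TF one. The word_ids list is assumed to come from a fast tokenizer's word_ids().

```python
from typing import List, Optional, Tuple

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def parse_conll(text: str) -> List[Tuple[List[str], List[str]]]:
    """Split CoNLL-style text into (tokens, ner_tags) pairs, one per sentence."""
    sentences, tokens, tags = [], [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            # A blank line (or a document marker) ends the current sentence.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        columns = line.split()
        tokens.append(columns[0])  # first column: the word
        tags.append(columns[-1])   # last column: the NER tag
    if tokens:
        sentences.append((tokens, tags))
    return sentences


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Label only the first subword of each word; special tokens (None)
    and the remaining subwords get IGNORE_INDEX."""
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(word_labels[word_id])
        previous = word_id
    return aligned


sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
"""
sentences = parse_conll(sample)

# Made-up word_ids: [CLS], word 0, word 1 split into two pieces, word 2, [SEP].
aligned = align_labels([None, 0, 1, 1, 2, None], [3, 0, 7])
```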
The BERT model documentation shows the tokenizers etc etc etc. - BERT — transformers 4.5.0.dev0 documentation
Training and fine-tuning — transformers 4.5.0.dev0 documentation - same model can be trained/imported from TF to pytorch and back! Wow!
Documentation of a sample model: transformers/examples/research_projects/distillation at master · huggingface/transformers
- It has examples of preparing data for finetuning
- In general, HF’s examples are wonderful
Another example of fine-tuning BERT in Pytorch for NER: transformers/examples/pytorch/token-classification at master · huggingface/transformers
- Needs transformers installed from source (git/master): https://huggingface.co/transformers/installation.html#installing-from-source / pip install git+https://github.com/huggingface/transformers
- Trained in 37 minutes and wrote everything (checkpoints, eval data) to /tmp/test-ner/. Wow.
- Command used was:
CUDA_VISIBLE_DEVICES=1 python run_ner.py --model_name_or_path bert-base-uncased --dataset_name conll2003 --output_dir /tmp/test-ner --do_train --do_eval
- Needs the Python datasets package. Here datasets is listed as a requirement: transformers/requirements.txt at master · huggingface/transformers
TODO - what is this and where can I learn more? Is this HF specific? What else is there?
HuggingFace datasets
It has a really nice interface for searching datasets! Filter by task, language, etc.
German NER datasets: Hugging Face – The AI community building the future.
Some German NER models, sometimes based on BERT: Hugging Face – The AI community building the future.
- I could try to reproduce them, for example this one using BERT-base-german-cased and finetuned on legal entity recognition: mrm8488/bert-base-german-finetuned-ler · Hugging Face
HuggingFace converting between TF and PyTorch
Converting Tensorflow Checkpoints — transformers 4.5.0.dev0 documentation
Is this real?
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
transformers-cli convert --model_type bert \
--tf_checkpoint $BERT_BASE_DIR/bert_model.ckpt \
--config $BERT_BASE_DIR/bert_config.json \
--pytorch_dump_output $BERT_BASE_DIR/pytorch_model.bin
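If I ever want to run this from a script, the same command could be assembled in Python. A sketch only: build_convert_command is a hypothetical helper, and the subcommand and flags are copied verbatim from the command above, not checked against the installed transformers version.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def build_convert_command(bert_base_dir: str) -> List[str]:
    """Build the argument list for the conversion command shown above."""
    base = PurePosixPath(bert_base_dir)
    return [
        "transformers-cli", "convert",
        "--model_type", "bert",
        "--tf_checkpoint", str(base / "bert_model.ckpt"),
        "--config", str(base / "bert_config.json"),
        "--pytorch_dump_output", str(base / "pytorch_model.bin"),
    ]


command = build_convert_command("/path/to/bert/uncased_L-12_H-768_A-12")
print(shlex.join(command))

# To actually run it (needs transformers installed and a real checkpoint):
# import subprocess
# subprocess.run(command, check=True)
```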
Random / recipes / cooking
Tatar von geräuchertem Forellenfilet mit Avocado - Annemarie Wildeisens KOCHEN
Cut the trout fillets into small cubes. Peel the shallot and chop it very finely. Cut each cherry tomato into 6 or 8 pieces. Put all of these ingredients into a small bowl and mix carefully with the mayonnaise.
Trout + tomatoes + mayonnaise is literally the only recipe I’ve liked with mayonnaise in it