Untitled
Previously: 221119-2306 LM paper garden has more context about such metrics, 221204-2349 Interesting block with explanations of ML stuff has the compression angle for it.
Dumping these here for now.
The GPT-2 paper puts it like this:
“Results on language modeling datasets are commonly reported in a quantity which is a scaled or exponentiated version of the average negative log probability per canonical prediction unit - usually a character, a byte, or a word.”
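A minimal sketch of how those quantities relate, assuming the average negative log probability is measured in nats per token (the function name and the `chars_per_token` parameter are my own; the metrics are just unit conversions of the same number):

```python
import math

def metrics_from_nll(avg_nll_nats: float, chars_per_token: float = 1.0):
    """Convert mean negative log prob per token (in nats) into common LM metrics."""
    ppl = math.exp(avg_nll_nats)          # perplexity: exponentiated average NLL
    bpt = avg_nll_nats / math.log(2)      # bits per token: NLL rescaled to log base 2
    bpc = bpt / chars_per_token           # bits per character (per byte gives BPB)
    return ppl, bpt, bpc

# An average NLL of ln(2) nats per token is exactly 1 bit per token,
# so with 1 char per token: PPL = 2, BPT = 1, BPC = 1.
ppl, bpt, bpc = metrics_from_nll(math.log(2))
```

So “scaled or exponentiated” in the quote literally means: exponentiate for perplexity, rescale (change log base and normalize by unit count) for bits-per-X.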
GPT-2 (Metrics: PPL, BPB, BPC) led me to:
- python - How to calculate bits per character of a string? (bpc) - Stack Overflow
Evaluation Metrics for Language Modeling is really detailed.