Untitled
Previously: [[garden/it/221119-2306 LM paper garden]] has more context about such metrics, [[garden/it/221204-2349 Interesting block with explanations of ML stuff]] has the compression angle for it.
Dumping these here for now.
The GPT-2 paper puts it like this:
“Results on language modeling datasets are commonly reported in a quantity which is a scaled or exponentiated version of the average negative log probability per canonical prediction unit - usually a character, a byte, or a word.”
GPT-2 (Metrics: PPL, BPB, BPC) led me to:
- python - How to calculate bits per character of a string? (bpc) - Stack Overflow
Evaluation Metrics for Language Modeling is really detailed.
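A minimal sketch of how these quantities relate, following the GPT-2 phrasing above: given a total negative log probability in nats, dividing by the number of units and by ln 2 gives bits per unit (BPC when the unit is a character, BPB for bytes), and exponentiating the per-unit average gives perplexity. Function names here are mine, for illustration only.

```python
import math

def bits_per_unit(total_nll_nats: float, n_units: int) -> float:
    """Average negative log probability (in nats) -> bits per unit.

    With characters as the unit this is BPC; with bytes, BPB.
    """
    return total_nll_nats / (n_units * math.log(2))

def perplexity(total_nll_nats: float, n_units: int) -> float:
    """Exponentiated average negative log probability per unit."""
    return math.exp(total_nll_nats / n_units)

# Toy example: a model assigning 1 nat of loss per character on average.
text = "hello"
total_nll = 1.0 * len(text)
bpc = bits_per_unit(total_nll, len(text))   # 1 / ln(2) ~= 1.4427 bits/char
ppl = perplexity(total_nll, len(text))      # e^1 ~= 2.718
```

Note that perplexity is just `2 ** bpc` when the loss is measured per character, so the metrics carry the same information in different units.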
In the middle of the desert I can say whatever I want.