05 Dec 2022

Untitled

Previously: [[garden/it/221119-2306 LM paper garden]] has more context about such metrics, [[garden/it/221204-2349 Interesting block with explanations of ML stuff]] has the compression angle for it.

Dumping these here for now.

The GPT-2¹ paper puts it like this:

“Results on language modeling datasets are commonly reported in a quantity which is a scaled or exponentiated version of the average negative log probability per canonical prediction unit - usually a character, a byte, or a word.”
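To make the quote concrete for myself, here is a rough sketch of how the three metrics fall out of the same total negative log probability. The function name `lm_metrics` and its parameters are my own, not from the paper, and I'm assuming the model's per-token log probabilities are natural logs (nats). Perplexity is the exponentiated per-token average; bits per character and bits per byte are the same total rescaled to base 2 and divided by the number of characters or bytes.

```python
import math


def lm_metrics(token_log_probs, n_chars, n_bytes):
    """Hypothetical helper: PPL, BPC, and BPB from per-token log probabilities.

    Assumes token_log_probs are natural-log probabilities (nats) that the
    model assigned to each token of a text with n_chars characters and
    n_bytes bytes.
    """
    total_nll = -sum(token_log_probs)  # total negative log probability, in nats
    n_tokens = len(token_log_probs)

    ppl = math.exp(total_nll / n_tokens)        # exponentiated average per token
    bpc = total_nll / (n_chars * math.log(2))   # scaled: bits per character
    bpb = total_nll / (n_bytes * math.log(2))   # scaled: bits per byte
    return ppl, bpc, bpb


# Toy case: 4 tokens, each given probability 0.25, covering 8 characters / 8 bytes.
ppl, bpc, bpb = lm_metrics([math.log(0.25)] * 4, n_chars=8, n_bytes=8)
print(ppl, bpc, bpb)
```

In the toy case each token costs 2 bits and covers 2 characters, so it comes out to roughly 1 bit per character, while the per-token perplexity is about 4. The point is that the "canonical prediction unit" only changes the denominator.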

GPT-2 (Metrics : PPL, BPB, BPC) led me to: