# serhii.net

In the middle of the desert you can say anything you want

For the paper I’m writing, I’ll actually try to do a real garden thing. With leaves etc that get updated with new info, not chronologically like my current DTB notes.

## Basics

### Perplexity and intrinsic eval

• Resources:
• The above cites http://web.stanford.edu/~jurafsky/slp3/3.pdf that’s longer and so much better!
• ![[Screenshot_20221119-233022.png]]
• P 37 about test set needing to have enough statistical power to measure improvements
• Sampling
• Chapter 3 about Shakespeare vs WSJ and genre
• 42: Smoothing
• Handles unknown words so we don’t multiply by zero probabilities
• 7 / 130 really nice basics of ml
• https://surge-ai.medium.com/evaluating-language-models-an-introduction-to-perplexity-in-nlp-f6019f7fb914
• Another take on the same, but love it
• Links the RoBERTa paper on the connection between perplexity and downstream performance!
• [[Screenshot_20221120-000131_Fennec.png]]
• ![[Screenshot_20221119-235918_Fennec.png]]
• If surprisal lets us quantify how unlikely a single outcome of a possible event is, entropy does the same thing for the event as a whole. It’s the expected value of the surprisal across every possible outcome — the sum of the surprisal of every outcome multiplied by the probability it happens
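A minimal sketch of the quoted definitions, assuming a discrete distribution given as a plain list of probabilities and logs in base 2 (bits). The function names `surprisal`, `entropy` and `perplexity` are my own, not from the article:

```python
import math

def surprisal(p):
    """Surprisal in bits of a single outcome with probability p."""
    return -math.log2(p)

def entropy(probs):
    """Expected surprisal: sum of surprisal times probability over all outcomes."""
    # Outcomes with probability 0 never happen, so they contribute nothing.
    return sum(p * surprisal(p) for p in probs if p > 0)

def perplexity(probs):
    """Perplexity of a distribution: 2 raised to its entropy in bits."""
    return 2 ** entropy(probs)

fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]

print(entropy(fair_coin))      # 1.0 bit: both outcomes are equally surprising
print(entropy(biased_coin))    # roughly 0.47 bits: the outcome is mostly predictable
print(perplexity(fair_coin))   # 2.0: like choosing between 2 equally likely options
```

The biased coin has lower entropy because the likely outcome carries little surprisal and dominates the weighted sum.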

• Excellent about the drawbacks of perplexity:
• First, as we saw in the calculation section, a model’s worst-case perplexity is fixed by the language’s vocabulary size. This means you can greatly lower your model’s perplexity just by, for example, switching from a word-level model (which might easily have a vocabulary size of 50,000+ words) to a character-level model (with a vocabulary size of around 26), regardless of whether the character-level model is really more accurate.
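A rough illustration of that point, reading "worst case" as the uniform-guess baseline (a model that gives every vocabulary item the same probability). The helper names and the 100-token test length are assumptions made for the sketch:

```python
import math

def sequence_perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to each observed token."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

def uniform_perplexity(vocab_size, n_tokens=100):
    """Perplexity of a model that gives every token probability 1 / vocab_size."""
    return sequence_perplexity([1.0 / vocab_size] * n_tokens)

print(round(uniform_perplexity(50_000)))  # word-level baseline, about 50000
print(round(uniform_perplexity(26)))      # character-level baseline, about 26
```

The uniform baseline always lands at the vocabulary size, so the two numbers live on different scales: one is per word, the other per character, and comparing them directly says nothing about which model is better.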

• Two more
• https://arxiv.org/pdf/2110.12609.pdf about perplexity and news cycle - TODO
• The problem is that news publications cycle through viral buzzwords quickly — just think about how often the Harlem Shake was mentioned 2013 compared to now.

• https://arxiv.org/pdf/2110.12609.pdf - about one million DS news benchmark

## Benchmarks

### SuperGLUE

• https://super.gluebenchmark.com/
• Much more detailed paper than the GLUE one!
• More complex tasks, since models are now better than people at the easy ones
• Goldmine of sources
• At the end they list the excluded tasks + instructions from the tasks for humans!

## Finance

• FinBERT / https://github.com/ProsusAI/finBERT
• has other eng lang dataset
• Eval on sentiment analysis, accuracy regression
• Redundant content
• NFinBERT knows numbers, and there are a lot of them in finance
• “Context, language modeling and multimodal data on finance”
• Models trained on a mix do better than those trained on financial data alone
• Really nice and involved and financial and I can’t go through it now
• Almost exclusively sentiment analysis
• https://link.springer.com/article/10.1007/s41060-021-00285-x NER on German financial text for anonymisation
• BERT

## Multilingual

In the middle of the desert I can say anything I want.