In the middle of the desert you can say anything you want

19 Nov 2022

LM paper notes

For the paper I’m writing, I’ll actually try to do a real garden thing: leaves etc. that get updated with new info, rather than chronological entries like my current DTB notes.


Perplexity and intrinsic eval

  • Resources:
  • The above cites that’s longer and so much better!
  • Full link:
    • ![[Screenshot_20221119-233022.png]]
    • P. 37: the test set needs enough statistical power to measure improvements
    • Sampling
    • Chapter 3 about Shakespeare vs WSJ and genre
    • 42: Smoothing
      • Unknown words so we don’t multiply 0 probs
      • 7 / 130 really nice basics of ml
    • Another take on the same, but love it
    • Links the RoBERTa paper about the connection between perplexity and downstream performance!
    • [[Screenshot_20221120-000131_Fennec.png]]
    • ![[Screenshot_20221119-235918_Fennec.png]]
    • If surprisal lets us quantify how unlikely a single outcome of a possible event is, entropy does the same thing for the event as a whole. It’s the expected value of the surprisal across every possible outcome — the sum of the surprisal of every outcome multiplied by the probability it happens
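To make the surprisal/entropy quote above concrete for myself, here is a minimal sketch. The function names and the two coin distributions are my own illustration, not from the linked posts, and perplexity is computed here for a known distribution (2 to the power of its entropy in bits), not for a model scored on a test set.

```python
import math


def surprisal(p: float) -> float:
    """Surprisal in bits of a single outcome with probability p."""
    return -math.log2(p)


def entropy(dist: dict[str, float]) -> float:
    """Expected surprisal: sum of p * surprisal(p) over all outcomes."""
    # Outcomes with probability 0 contribute nothing and are skipped.
    return sum(p * surprisal(p) for p in dist.values() if p > 0)


def perplexity(dist: dict[str, float]) -> float:
    """Perplexity of a distribution: 2 raised to its entropy in bits."""
    return 2 ** entropy(dist)


# Hypothetical toy distributions, chosen only for illustration.
fair_coin = {"heads": 0.5, "tails": 0.5}
biased_coin = {"heads": 0.9, "tails": 0.1}

print(entropy(fair_coin))     # 1.0 bit
print(perplexity(fair_coin))  # 2.0, like choosing between 2 equally likely options
print(entropy(biased_coin))   # about 0.469 bits, less uncertain than the fair coin
```

The rare outcome of the biased coin has a high surprisal, but it happens so seldom that the expected surprisal (the entropy) still ends up lower than for the fair coin.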

  • Excellent about the drawbacks of perplexity:
    • First, as we saw in the calculation section, a model’s worst-case perplexity is fixed by the language’s vocabulary size. This means you can greatly lower your model’s perplexity just by, for example, switching from a word-level model (which might easily have a vocabulary size of 50,000+ words) to a character-level model (with a vocabulary size of around 26), regardless of whether the character-level model is really more accurate.

    • Two more
    • about perplexity and news cycle 6- TODO
    • The problem is that news publications cycle through viral buzzwords quickly — just think about how often the Harlem Shake was mentioned 2013 compared to now.

  • - about one million DS news benchmark
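The vocabulary-size drawback quoted above can be checked with a small sketch. It assumes a model that guesses uniformly, assigning probability 1/V to every token, which is the "worst case" the quote refers to; the function name and the token count are hypothetical.

```python
import math


def uniform_perplexity(vocab_size: int, n_tokens: int = 1000) -> float:
    """Perplexity of a model that assigns 1/vocab_size to every token."""
    log_prob_per_token = math.log(1.0 / vocab_size)
    # Perplexity is exp of the average negative log-probability per token.
    avg_neg_log_prob = -(n_tokens * log_prob_per_token) / n_tokens
    return math.exp(avg_neg_log_prob)


# Vocabulary sizes taken from the quote: word-level vs character-level.
print(round(uniform_perplexity(50_000)))  # 50000
print(round(uniform_perplexity(26)))      # 26
```

Both models here know nothing at all, yet the character-level one looks about 2000 times "better". So perplexities are only comparable between models that share the same vocabulary and tokenization.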


Interesting intrinsic eval




  • Much more detailed paper than the GLUE one!
  • More complex tasks, since models are already better than people at the easy ones
  • Goldmine of sources
  • At the end they list the excluded tasks + instructions from the tasks for humans!



  • FinBERT /
    • Has another English-language dataset
    • Discussion about cased etc
    • Eval on sentiment analysis, accuracy regression
    • Redundant content
  • NFinBERT knows numbers; there are a lot of them in finance
  • “Context, language modeling and multimodal data on finance”
    • Models trained on a mix do better than those trained on financial data alone
    • Really nice and involved and financial and I can’t go through it now
    • Almost exclusively sentiment analysis
  • NER on German financial text for anonymisation
    • BERT


