In the middle of the desert you can say anything you want
Vaclav Kosar’s Software & Machine Learning Blog; sample post: OpenAI’s DALL-E 2 and DALL-E 1 Explained. Found it originally through Bits-Per-Byte and Bits-Per-Character.
Categories: software engineering, ML, Thinkpad P52 Disassembly. Often with nice graphics.
Close in spirit, randomness and citing-your-sources to this/my DTB, but way more in depth. The most brilliant part, though, is the big “Ask or report a mistake” button.
I should do in-depth stuff more often.
…And resurrect my link wiki, and go back to the pre-war tradition of reading RSS feeds :(
The GPT-3 paper mentioned that it’s 10x bigger than any previous non-sparse LM.
So: sparse LMs are LMs with a LOT of params where only a subset is used for each incoming example.
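As a toy illustration of that idea (my own sketch, loosely mixture-of-experts style, not from the paper; all names and sizes are made up): the model holds many parameter blocks, but each input only touches a few of them.

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, d = 8, 16
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # a LOT of params in total
router = rng.normal(size=(d, n_experts))                       # decides which experts an input uses

def forward(x, top_k=2):
    scores = x @ router
    chosen = np.argsort(scores)[-top_k:]              # only top_k expert blocks are active
    return sum(x @ experts[i] for i in chosen) / top_k

y = forward(rng.normal(size=d))  # only 2 of the 8 weight matrices were used for this example
```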
To pass a custom dockerfile, add -f custom_filename:
docker build . -f custom.Dockerfile -t tag:latest ...
Dockerfile naming conventions exist: Dockerfile Naming Convention and Organization – mohitgoyal.co, quoting options from there:
myapp.azure.dev.Dockerfile
myapp.gcp.dev.Dockerfile
myapp.aws.dev.Dockerfile
or:
Dockerfile.myapp.azure.dev
Dockerfile.myapp.i386.azure.dev
Dockerfile.myapp.amd.azure.Dev
From that article I learned that Dockerfiles don’t have to be inside build context anymore! Link: Allow Dockerfile from outside build-context by thaJeztah · Pull Request #886 · docker/cli · GitHub
TL;DR from there
$ docker build --no-cache -f $PWD/dockerfiles/Dockerfile $PWD/context
redis-cli set test 1
etc. immediately work - did it start a server in the background?
systemctl disable redis-server
etc.!
redis-cli starts in interactive mode! In a fish shell:
> r
127.0.0.1:6379> multi
OK
127.0.0.1:6379> get google
QUEUED
127.0.0.1:6379> incr google_accesses
QUEUED
127.0.0.1:6379> exec
1) "http://google.com"
2) (integer) 1
127.0.0.1:6379>
help <Tab> autocompletes.
help @hash shows the hash commands.
# Create a hash that has field f1 with value v1 etc.:
127.0.0.1:6379> hmset myhash f1 v1 f2 v2
OK
127.0.0.1:6379> hgetall myhash
1) "f1"
2) "v1"
3) "f2"
4) "v2"
127.0.0.1:6379> hget myhash f1
"v1"
Operations on hashes:
# We create a hash s_google that has a url and an accesses counter
127.0.0.1:6379> hset s_google url url_google accesses 0
(integer) 2
127.0.0.1:6379> hmget s_google url accesses
1) "url_google"
2) "0"
# Increase accesses by 1
127.0.0.1:6379> HINCRBY s_google accesses 1
(integer) 1
127.0.0.1:6379> hmget s_google url accesses
1) "url_google"
2) "1"
DEL key
FLUSHALL to delete everything.
cat file.txt | redis-cli --pipe
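A guess at what such a file.txt could look like (my own toy example; plain one-command-per-line works for simple cases like this, for big imports the Redis docs recommend generating the raw protocol instead):

```
SET A "http://www.openstreetmap.org"
SET B "https://www.wikipedia.org"
HSET s_google url url_google accesses 0
```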
127.0.0.1:6379> zadd myss 1 'one' 2 'two'
(integer) 2
127.0.0.1:6379> ZSCORE myss 'one'
"1"
127.0.0.1:6379> get B
"https://www.wikipedia.org"
127.0.0.1:6379> get A
"http://www.openstreetmap.org"
127.0.0.1:6379> ZCARD accesses
(integer) 2
127.0.0.1:6379> ZRANGE accesses 0 40
1) "A"
2) "B"
127.0.0.1:6379> ZRANGE accesses 0 40 withscores
1) "A"
2) "1"
3) "B"
4) "1"
127.0.0.1:6379>
You can comment on commits, but those comments are limited; comments on a merge request give much more functionality, incl. closing threads etc.!
Google Scholar, in the default search interface, showed only papers written after 2016 - I can’t reproduce it anymore, but it’s important to keep in mind when looking for 2011 papers.
For the paper I’m writing, I’ll actually try to do a real garden thing. With leaves etc that get updated with new info, not chronologically like my current DTB notes.
https://thegradient.pub/understanding-evaluation-metrics-for-language-models/
Closer to the end it has a discussion of LM metrics vs. performance on downstream tasks:
Perplexity is the multiplicative inverse of the probability assigned to the test set by the language model, normalized by the number of words in the test set.
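In code, per that definition (toy numbers, my own sketch, not from the article):

```python
import math

# toy "test set" of N=4 words with the probabilities some LM assigned to them
word_probs = [0.1, 0.25, 0.05, 0.2]
N = len(word_probs)

# PP = P(test set) ** (-1/N), computed in log space for numerical stability
log_prob = sum(math.log(p) for p in word_probs)
perplexity = math.exp(-log_prob / N)
print(perplexity)  # ~7.95
```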
Perplexity limitations and ways to get around them / smoothing:
As a result, the bigram probability values of those unseen bigrams would be equal to zero making the overall probability of the sentence equal to zero and in turn perplexity to infinity. This is a limitation which can be solved using smoothing techniques.
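For example, add-one (Laplace) smoothing - one of the techniques the quote alludes to - keeps unseen bigrams from zeroing out the whole sentence probability (toy corpus, my own sketch):

```python
from collections import Counter

corpus = "the cat sat on the mat".split()
V = len(set(corpus))                      # vocabulary size

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(w1, w2):
    # unsmoothed: bigrams[(w1, w2)] / unigrams[w1] -> 0 for unseen bigrams
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(bigram_prob("the", "cat"))  # seen bigram:   2/7
print(bigram_prob("cat", "mat"))  # unseen bigram: 1/6, but no longer zero
```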
If surprisal lets us quantify how unlikely a single outcome of a possible event is, entropy does the same thing for the event as a whole. It’s the expected value of the surprisal across every possible outcome — the sum of the surprisal of every outcome multiplied by the probability it happens
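In other words (toy distribution, my own numbers):

```python
import math

probs = [0.5, 0.25, 0.125, 0.125]                        # some event's outcome probabilities
surprisals = [-math.log2(p) for p in probs]              # surprisal of each outcome, in bits
entropy = sum(p * s for p, s in zip(probs, surprisals))  # expected value of the surprisal
print(entropy)  # 1.75 bits
```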
First, as we saw in the calculation section, a model’s worst-case perplexity is fixed by the language’s vocabulary size. This means you can greatly lower your model’s perplexity just by, for example, switching from a word-level model (which might easily have a vocabulary size of 50,000+ words) to a character-level model (with a vocabulary size of around 26), regardless of whether the character-level model is really more accurate.
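The worst case is a model that knows nothing and assigns uniform probability 1/|V| to every word: its perplexity is exactly |V|, which is why the vocabulary sizes are comparable like this (my own illustration):

```python
import math

for V in (26, 50_000):                  # character-level-ish vs word-level-ish vocab
    per_word_prob = 1 / V
    perplexity = math.exp(-math.log(per_word_prob))
    print(V, round(perplexity))         # 26 26, then 50000 50000
```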
The problem is that news publications cycle through viral buzzwords quickly — just think about how often the Harlem Shake was mentioned in 2013 compared to now.
https://ruder.io/tag/natural-language-processing/index.html - multilingual / non-English NLP seems to be an interest of his; might be interesting in the “why” context.
Best post ever: The #BenderRule: On Naming the Languages We Study and Why It Matters
Bits:
https://ml-cheatsheet.readthedocs.io/en/latest/calculus.html#chain-rule
Basics, with math but not too much: https://cs231n.github.io/neural-networks-1/
Activation functions:
God, I need to read documentation, all of it, including the unimportant-sounding first sentences.
Previously: 220810-1201 Huggingface utils ExplicitEnum python bits, showing me how to do str enums
.. you can get a member using both (by value and by name).
enum — Support for enumerations — Python 3.11.0 documentation:
from enum import Enum

class MyEnum(str, Enum):
    IG2 = "val1"
    IG3 = "val2"

MyEnum("val1") == MyEnum["IG2"]  # lookup by value and lookup by name give the same member
Pipelines: in the predictions, p['word'] is not the exact string from the input text! It’s the one recovered from the subtokens and might have extra spaces etc. For the exact string, the offsets should be used.
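A minimal sketch of what that looks like (the model name is just an example; I’m assuming the prediction dicts carry 'start'/'end' offsets, as they do for the token classification pipeline):

```python
from transformers import pipeline

ner = pipeline("token-classification", model="dslim/bert-base-NER")

text = "Angela Merkel visited Kyiv."
for p in ner(text):
    recovered = p["word"]               # re-assembled from subtokens, may not match the input exactly
    exact = text[p["start"]:p["end"]]   # exact substring of the original text
    print(recovered, "|", exact)
```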
EDIT - I did another good deed today: Fix error/typo in docstring of TokenClassificationPipeline by pchr8 · Pull Request #19798 · huggingface/transformers