In the middle of the desert you can say anything you want
python - Split / Explode a column of dictionaries into separate columns with pandas - Stack Overflow taught me about pandas.json_normalize — pandas 2.1.1 documentation:
In: something JSON-like (dict, list, ..); out: a pandas DataFrame!
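A minimal sketch of what that looks like, with toy data and a made-up dict-valued column:

```python
import pandas as pd

# A column whose cells are dicts (hypothetical data)
df = pd.DataFrame({
    "id": [1, 2],
    "meta": [{"a": 10, "b": 20}, {"a": 30, "b": 40}],
})

# Explode the dict column into separate columns, then re-join
flat = pd.json_normalize(df["meta"].tolist())
out = df.drop(columns="meta").join(flat)
```

The index alignment works here because both frames keep the default RangeIndex.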
eval-UA-tion
/ eval_UA_tion
-
if they end up in the name
eval-UA-tion
since it works both in color and as plain monospace text!
Some drafts I did in Inkscape:
And just for fun:
ChatGPT generated this:
Its internal prompt for the picture, found via inspect element, was:
alt="Logo design for 'eval-UA-tion', a benchmark for Ukrainian language models. Incorporate the word 'eval-UA-tion' in a stylish font, with a sunflower replacing the letter 'o'. Add elements that give a Ukrainian touch, such as traditional Ukrainian patterns or colors (blue and yellow). The design should be modern, clear, and professional, suitable for a technical and academic setting."
# 2 digits after the decimal point
pd.set_option("display.precision", 2)
# Suppress scientific notation
pd.options.display.float_format = "{:.0f}".format
# for more natural 100,233.23-like output
pd.options.display.float_format = "{:,.3f}".format
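A quick sanity check on toy data (the last option set wins):

```python
import pandas as pd

# Thousands separators and 3 digits after the decimal point
pd.options.display.float_format = "{:,.3f}".format

s = pd.Series([100233.23])
formatted = repr(s)  # the Series now prints the value as 100,233.230
```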
Setting it as a context:
with pd.option_context('display.float_format', lambda x: f'{x:,.3f}'):
display(df.describe())
Also: I can format a float column (‘temporarily’) not just the way I always did, but also in a much simpler way:
# before
ds["percent"].apply(lambda x: f"{x:.2%}")
# after
ds["percent"].apply("{:.2%}".format)
I forgot you can do "string".format(variable)!
Also TIL about display() for Jupyter notebooks, for when the value isn’t the cell’s return value (e.g. inside a context manager, df.describe() alone would not have shown the description).
One way to do it, if the same aggregations apply to all columns:
df.groupby("collection")[
["num_pages", "num_chars", "num_tokens", "num_sentences"]
].agg(
[
# "count",
"sum",
"mean",
# "std",
]
)
An even better way:
# ...
].agg(
num_documents=("num_pages", "count"),
num_pages=("num_pages", "sum"),
mean_pages=("num_pages", "mean"),
mean_tokens=("num_tokens", "mean"),
)
They are literally named tuples! Yay for Named Aggregation!
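A self-contained sketch of named aggregation on toy data (column names made up to mirror the snippet above):

```python
import pandas as pd

df = pd.DataFrame({
    "collection": ["a", "a", "b"],
    "num_pages": [10, 20, 5],
})

# Each keyword is the output column name; the value is a
# (source_column, aggregation) named tuple
agg = df.groupby("collection").agg(
    num_documents=("num_pages", "count"),
    num_pages=("num_pages", "sum"),
    mean_pages=("num_pages", "mean"),
)
```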
Draft.
Context: 230529-2208 Seaborn matplotlib labeling data points
Given: I need to make the axis limits larger to fit the label text; the last lines here:
data = df_pages.reset_index().sort_values("num_pages")
ax = sns.barplot(data, y="collection", x="num_pages")

# label points
for i in ax.axes.containers:
    ax.bar_label(
        i,
    )

# make the labels fit the limits
xlim = ax.axes.get_xlim()[1]
new_xlim = xlim + 14600
ax.axes.set_xlim(0, new_xlim)
Question: by how much?
Answer:
for i in ax.axes.containers:
    an = ax.bar_label(
        i,
    )

# `an` is a list of all Annotations
an[0].get_window_extent()
>>> Bbox([[88.66956472198585, 388.99999999999994], [123.66956472198585, 402.99999999999994]])

def get_text_size(anno):  # anno: Annotation
    """TODO: take an array of annos, find the leftmost one etc."""
    bbox = anno.get_window_extent()
    ext = bbox.bounds
    # > (91.43835300441604, 336.19999999999993, 35.0, 14.0)
    x = ext[2]  # width
    y = ext[3]  # height
    return x, y

"""
ano = an[1]
bbox = ano.get_window_extent()
bbox.bounds
> (91.43835300441604, 336.19999999999993, 35.0, 14.0)
"""

get_text_size(an[6])
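One hedged way to answer “by how much”: convert the labels’ pixel extents back into data coordinates with ax.transData.inverted() and set the limit just past the rightmost label. A sketch with synthetic bars (not the original df_pages):

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so extents can be computed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
container = ax.barh(["a", "b"], [10, 14600])
labels = ax.bar_label(container)  # list of Annotations

fig.canvas.draw()  # realize the layout so window extents exist

# Rightmost label edge, converted from display pixels to data coordinates
inv = ax.transData.inverted()
right = max(inv.transform((a.get_window_extent().x1, 0))[0] for a in labels)
ax.set_xlim(0, right * 1.05)  # 5% padding instead of a magic +14600
```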
GitLab introduced tasks, and they get shown by default in the issue list. Type != task in the search leaves only the issues.
Can one save search templates?..
Is this needed, or can I just use one of the existing ones? I’ll use one of the existing ones!
Then this becomes notes about choosing one and adapting my own tasks to it.
First of all, I’d like the generator scripts to be runnable through Docker, especially the pravda crawler!
Related:
General:
Other / useful libraries:
exact match: true/false
multiple choice
lmentry/lmentry/predict.py at main · aviaefrat/lmentry contains the prediction code used to evaluate it with different kinds of models - I’ll need this.
SWAG seems the closest of the modern benchmarks to UA-CBT (one-word completions etc.); I should look into what exactly they do.
NarrativeQA!
Literature Review For Academic Outsiders: What, How, and Why — LessWrong
‘Literature review’ the process is a way to become familiar with what work has already been done in a particular field or subject by searching for and studying previous work
Every time I do research I perform a simple thought experiment: assuming somewhere in the world exists evidence that would prove or disprove my hypothesis, where is it?
Citations are a hierarchy of ideas
My old note about tenses in a bachelor thesis: Day 155 - serhii.net linking to the excellent Effective Writing | Learn Science at Scitable
The Leipzig Glossing Rules seem to be the key for me:
Markdown and python and stuff
Markdown
<span style="font-variant:small-caps;">Hello World</span>
Python
The online version has cool tests at the end!
Generally: a lot of it is about languages/power, indigenous languages etc. Might be interesting for me wrt. UA/RU and colonialism
https://peps.python.org/pep-0673/
from typing import Self

class Shape:
    def set_scale(self, scale: float) -> Self:
        self.scale = scale
        return self
Related: 220726-1638 Python typing classmethods return type
I remember writing about the typevar approach but cannot find it…
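For reference, the pre-PEP-673 TypeVar approach (which also works on Python < 3.11) looks roughly like this:

```python
from typing import TypeVar

TShape = TypeVar("TShape", bound="Shape")

class Shape:
    # Annotating `self` with a bound TypeVar makes the return type
    # follow the subclass: Circle().set_scale(2.0) checks as Circle
    def set_scale(self: TShape, scale: float) -> TShape:
        self.scale = scale
        return self

class Circle(Shape):
    pass

c = Circle().set_scale(2.0)
```

PEP 673’s Self is essentially shorthand for exactly this pattern.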