In the middle of the desert you can say anything you want
In the context of 240127-2101 Checklist for backing up an Android phone, I wanted to back up my TrackAndGraph data, for which I a) manually created a file export, and b) just in case, created a backup through Google Drive/Sync/One/whatever.
I then forgot to move the backup file :( but fear not: instead of a clean start I could use the Google Drive backup of all apps, and of that app specifically. Except it was missing.
It was present in the Google backup info as seen in the Google account / devices / backups interface, but absent in the phone recovery thing during setup.
Installed the app through Google Play again, still nothing; did another factory reset, still nothing.
Googled how to access the information from device backups through Google Drive without a device: you can't.
Was sad about losing 6 months of quantified-self data, thought about how to do it better (sell my soul to Google and let it sync things from the beginning?) and gave up.
Then I installed the excellent Sentien Launcher through F-Droid (it wasn't part of the backup either, but I didn't care) and noticed it had my old favourites.
Aha. Aha.
Okay. I see.
Android 13, Samsung phone.
I’ll be more minimalistic though
```
> adb shell pm list packages | ag "(lazada|faceb|zalo)"
package:com.facebook.appmanager
package:com.facebook.system
package:com.lazada.android
package:com.facebook.services
package:com.facebook.katana
```
```
adb shell pm uninstall -k --user 0 com.facebook.appmanager
adb shell pm uninstall -k --user 0 com.facebook.system
adb shell pm uninstall -k --user 0 com.lazada.android
adb shell pm uninstall -k --user 0 com.facebook.services
adb shell pm uninstall -k --user 0 com.facebook.katana
adb shell pm uninstall -k --user 0 com.samsung.android.bixby.agent
adb shell pm uninstall -k --user 0 com.samsung.android.bixby.wakeup
adb shell pm uninstall -k --user 0 com.samsung.android.bixbyvision.framework
```
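(If I ever need one of these back: since `-k` keeps the app data and the uninstall only applies to user 0, `adb shell cmd package install-existing <package>` should restore a package without a reflash. My assumption, not tested here.)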
First heard about them here: Samsung’s privacy policy for Oct 1st is crazy. : Android
```
adb shell sh /sdcard/Android/data/moe.shizuku.privileged.api/start.sh
```
Related:
220904-1605 Setting up again Nextcloud, dav, freshRSS sync etc. for Android phone
from old DTB: Day 051: Phone ADB full backup - serhii.net:
```
adb backup -apk -shared -all -f backup-file.adb
adb restore backup-file.adb
adb devices -l
adb shell
```
(after enabling USB debugging in Settings -> Dev.)
Elsewhere: Cool apps I love and want to remember about (see 220124-0054 List of good things):
- NewPipeData-xxx.zip
- Export_xxx.osf
- PodcastAddict_xxx.backup file; OPML exists
- bookmarks.html
- mappings_xxx.zip
- Internal storage/Books
(because “more secure options are available”) ↩︎
Relevant: Samsung’s privacy policy for Oct 1st is crazy. : Android ↩︎
Instead of doing graphs for 231024-1704 Master thesis task CBT, went with generating prompts for stories.
If you want a story involving a fox and a raven, the story will contain cheese — so I need to inject randomness.
Prompt used:
Write an interesting story in the style of Terry Pratchett, with at least 2 recurring major characters and 3 minor, of different genders. Don’t start the story by listing the characters. None of the characters should have names. All should be animals. The story should be graduate student reading level. Please make it rather detailed than generic - don’t mention that X had to Y, but show/tell him doing that. It should be about an apprentice’s revenge over his master. The main plot point should revolve around corruption. The story should have a bad ending. Write the story in the Ukrainian language. Start with the words: Одного разу,
..is really good if you tell it what you want. In Ukrainian as well.
The instruction that the story should be in Ukrainian should be at the end, or in the “start the story with” bit.
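Re: injecting randomness, a minimal sketch of what I mean by randomized prompt assembly; the option lists here are made up, the real ones live in the template:

```python
import random

# hypothetical option lists for the changing parts of the prompt
PLOTS = ["an apprentice's revenge over his master", "a cunning camel not learning anything"]
ENDINGS = ["a bad ending", "an unhappy ending"]

def make_prompt() -> str:
    return (
        "Write an interesting story in the style of Terry Pratchett. "
        f"It should be about {random.choice(PLOTS)}. "
        f"The story should have {random.choice(ENDINGS)}. "
        "Write the story in the Ukrainian language. Start with the words: Одного разу,"
    )
```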
Одного разу, у містечку Анімалопіліс, де жили різні тварини, трапилася ця історія. В місті панувала глибока корупція, але це не виглядало так просто, як здається. У містечку жили два головні персонажі: Майстер Сова і Лицар Заєць. Також були три менш важливі персонажі: Лисичка, Ведмідь та Жаба. … Але план зійшов на нет, коли виявилось, що Майстер Сова дізнався про зраду. Він впіймав Лицаря Зайця, коли він намагався передати докази Лисичці, Ведмедю та Жабі. Майстер Сова вирішив застосувати свою владу та покарати Лицаря Зайця.
Зараз Лицар Заєць перебуває у в’язниці, а корупція в Анімалопілісі продовжує рости. Лисичка, Ведмідь та Жаба вже не намагаються боротися з нею, а мешканці міста стали байдужими до цього проблеми. Так завершується ця історія, де корупція перемогла, а справедливість поступилася владі та грошам.
meister Fuchs (German: “master Fox”)
Codellama-34b-instruct:
primitive but in Ukrainian
here’s a list of friends of the fox Lisa (= the Russian word for fox written in Ukrainian letters) that will help her with her revenge on the wolf Wolf:
- a rabbit named Rabbit
- an otter named Otter
- a frog named Frog
mixtral-7b-instruct
Fazit
It would be interesting to parametrize/analyze:
Looking for Ukrainian tales as controls, I found this:
When folklore began to be intensively collected, eastern Ukraine was part of the Russian Empire and western Ukraine was part of Austro-Hungary. In eastern Ukraine, activities that might encourage Ukrainian nationalist feelings were banned, but folklore was not. Russians saw Ukraine as a backward, border place: Little Russia, as Ukraine was so often called. They also saw folklore as ignorant, country literature, appropriate to their perception of Ukraine. Russians felt that the collection of Ukrainian folklore, by perpetuating the image of Ukrainian backwardness, would foster the subjugation of Ukraine. Therefore, they permitted the extensive scholarly activity from which we draw so much of our information today. Ironically, when Ukrainian folklore was published, it was often published not as Ukrainian material, but as a subdivision of Russian folklore. Thus Aleksandr Afanas’ev’s famous collection, Russian Folk Tales, is not strictly a collection of Russian tales at all, but one that includes Ukrainian and Belarusian tales alongside the Russian ones. Because Ukraine was labeled Little Russia and its language was considered a distant dialect of Russian, its folklore was seen as subsumable under Russian folklore. Russia supposedly consisted of three parts: Great Russia, what we call Russia today; Little Russia, or Ukraine; and White Russia, what we now call Belarus. The latter two could be, and often were, included under Great Russia. Some of the material drawn on here comes from books that nominally contain Russian folktales or Russian legends. We know that they are actually Ukrainian because we can easily distinguish the Ukrainian language from Russian. Sometimes Ukrainian tales appear in Russian translation to make them more accessible to a Russian reading public. In these instances we can discern their Ukrainian origin if the place where a tale or legend was collected is given in the index or the notes. 1
This feels relevant as well: The Politics of innocence: Soviet and Post-Soviet Animation on Folklore topics | Journal of American Folklore | Scholarly Publishing Collective
```
Tokens Used: 3349
	Prompt Tokens: 300
	Completion Tokens: 3049
Successful Requests: 2
Total Cost (USD): $0.09447
```
So it’s about $0.05 per generated story? Somehow way more than I expected.
~300 stories (3 instances from each) would be around 15€.
I mean, I can happily generate around 100 manually per day from the ChatGPT interface. And I can immediately proofread each one as I go, while a different story is being generated. (I can also manually fix GPT-3 stories generated for 1/10th of the price.)
I guess not that much more of a workload. And most importantly, it would give me better insight into possible issues with the stories, so I can change the prompts quickly, instead of generating 300 ‘bad’ ones.
I need to think of a workflow to (grammatically) correct these stories. I assume: write each story to a file named after its row, correct it manually, then automatically add it back as a new column? A sketch below.
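A minimal sketch of that workflow, assuming the stories live in a pandas DataFrame (file and column names are made up):

```python
from pathlib import Path
import pandas as pd

df = pd.read_csv("stories.csv")  # hypothetical file with a "story" column
out = Path("to_correct")
out.mkdir(exist_ok=True)

# 1) dump each story into a file named after its row
for i, story in df["story"].items():
    (out / f"{i}.txt").write_text(story, encoding="utf-8")

# 2) ...correct the files manually...

# 3) read the corrected versions back into a new column
df["story_corrected"] = [(out / f"{i}.txt").read_text(encoding="utf-8") for i in df.index]
df.to_csv("stories_corrected.csv", index=False)
```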
(Either way, having generated 10 stories for 40 cents, I’ll analyze them at home and think about it all.)
It boils down to how many training instances I can get from a story — tomorrow I’ll experiment with it and we’ll see.
The stories contain errors, but ChatGPT can fix them! Manual checking is still heavily needed though, and, well, this will also be part of the Masterarbeit.
The fixes sometimes are really good and sometimes not:
I tried to experiment with telling it to avoid errors and Russian, with inconclusive results. I won’t add this to the prompt.
Колись давним-давно, у лісі, де дерева шепотіли таємницями, а квіти вигравали у вічному танці з вітром, жила духмяна метелик. (“…there lived a fragrant butterfly”, with the feminine духмяна/жила for the masculine метелик)
(and then goes on to use the feminine gender for it throughout the entire tale)
On second thought, this could be better:
All should be animals. None of the characters should have names, but should be referred to by pronouns and the capitalized name of their species.
I can use the capitalized nouns as keys, and then “до мудрого Сови” (“to the wise Owl”) doesn’t feel awkward?..
This might be even better:
None of the characters should have names: they should be referred to by the capitalized name of their species (and pronouns), and their gender should be the same as that name of their species.
The story should be about an owl helping their mentor, a frog, with an embarrassing problem. The story should be in the Ukrainian language.
And also remove the bit about different genders, or same gender, just let it be.
Yes, let this be prompt v3. Fixed the genders in the options, removed the gender limit in the prompt.
None of the characters should have names: they should be referred to by the name of their species, and their gender should be the same as that name of their species. {ALL_SHOULD_BE_ANIMAL
Takes about 4 cents and 140 seconds per story:
```
1%|▌          | 4/300 [11:39<14:23:02, 174.94s/it]
INFO:__main__:Total price for the session: 0.22959999999999997 (5 stories).
```
“Кішка обіцяла довести, що вона гідний син” (“the Cat promised to prove she was a worthy son”): that one’s on me.
Removed the gendered “son” from the prompt.
Через деякий час до верблюда прийшла газель, яка просила допомоги. Її стадо зазнало нападу від лева, і вона шукала поради, як уникнути подібних інцидентів у майбутньому. Верблюд порадив газелі знайти нові пасовища, де леви не полюють, і навчити стадо бути більш обережними та уважними.
метелик відчула непереборне бажання знайти найсолодший квітка в лісі (the butterfly felt — feminine відчула for the masculine метелик — an irresistible desire to find the sweetest flower: masculine найсолодший for the feminine квітка)
Ця історія відбулась у місті, де вулиці були вимощені старою бруківкою, а будинки були ніби з казок. Миша, зі своїм сірим хутром та блискучими очима, була відома серед мешканців міста своєю ненаситною жадібністю. Вона не просто збирала зернятка та крихти, що їй вдавалося знайти, але й намагалася відібрати їжу у інших.
This has 264 tokens for bert-base-uncased, 106 for bert-base-multilingual-uncased. Counted roughly like this:
```python
from transformers import AutoTokenizer

# compare token counts for the same Ukrainian text across tokenizers
text = "Ця історія відбулась у місті..."  # the story paragraph above
for name in ("bert-base-uncased", "bert-base-multilingual-uncased"):
    tokenizer = AutoTokenizer.from_pretrained(name)
    print(name, len(tokenizer.tokenize(text)))
```
A ’typical’ longer story is 901 multilingual tokens. Damn. (Or 709 for KoichiYasuoka/roberta-base-ukrainian · Hugging Face)
Practically this means shorter stories? Cutting bits at the end? Separating them? Using tokens instead of sentences when creating tasks?
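A sketch of the token-budget check this implies (model from above; the 512-token limit is my assumption):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("KoichiYasuoka/roberta-base-ukrainian")
story_text = "..."  # one generated story
n_tokens = len(tokenizer(story_text)["input_ids"])
if n_tokens > 512:  # assumed context limit: shorten, cut, or split the story
    print(f"too long: {n_tokens} tokens")
```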
… The story should have an unhappy ending. It should be no more than 300 words long.
The story should be about a cunning camel not learning anything.
Колись давним-давно, в піщаних просторах пустелі, жив хитрий верблюд. Він був відомий своєю вмінням уникати праці, перекладаючи свої обов'язки на менш кмітливих сусідів - невеликого єнота та серйозного орла. Вони терпеливо виконували важку роботу, в той час як верблюд ласував найсолодшими пагонами.
Одного дня, коли вода в оазі на межі висихання, верблюд вирішив, що єнот і орел повинні відправитись у небезпечну подорож за новим джерелом. "Тільки ви маєте кмітливість і силу знайти воду," - лукаво мовив верблюд.
Єнот і орел, виснажені його маніпуляціями, нарешті усвідомили хитрість верблюда і вирішили діяти спільно. Вони пішли, обіцяючи верблюду привести воду, але насправді вони планували знайти нову оазу лише для себе.
Залишившись на самоті, верблюд швидко зрозумів, що його власна лінь і хитрість привели до катастрофи. Орел і єнот знайшли нове місце, а верблюд, не здатний самостійно вижити, був змушений мандрувати пустелею у пошуках води і допомоги.
Але пустеля була невблаганною, і верблюд, нарешті, зрозумів, що хитрість без мудрості і співпраці - це шлях до самотності та відчаю. Саме ця думка була його останньою, перш ніж пустеля поглинула його.
175 words, 298 tokens for roberta-base-ukrainian, 416 for bert-base-multilingual-uncased. 10 sentences. I think I’ll add this to the template v4.
Problem: animacy detection is shaky at best:
```
(Pdb++) for a in matches: print(a, a[0].morph.get("Animacy"))
верблюду ['Inan']
воду ['Inan']
оазу ['Inan']
самоті ['Inan']
верблюд ['Anim']
```
```
(Pdb++) for w in doc: print(w, w.morph.get("Animacy")[0]) if w.morph.get("Animacy")==["Anim"] else None
верблюд Anim
кмітливих Anim
сусідів Anim
невеликого Anim
єнота Anim
серйозного Anim
орла Anim
верблюд Anim
верблюд Anim
орел Anim
ви Anim
верблюд Anim
Єнот Anim
орел Anim
верблюда Anim
верблюд Anim
Орел Anim
верблюд Anim
верблюд Anim
```
OK, so Anim has higher precision than recall. And adjectives can also be animate (they agree with their noun), which is logical!
I think I can handle the errors.
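A minimal sketch of the filtering this implies, assuming a Ukrainian spaCy pipeline (the model name is an assumption):

```python
import spacy

nlp = spacy.load("uk_core_news_sm")  # any Ukrainian pipeline with morph features
doc = nlp("Колись давним-давно, в піщаних просторах пустелі, жив хитрий верблюд.")

# Animacy=Anim is also set on adjectives (agreement), so restrict to nouns;
# given the precision/recall above, expect missed animate nouns, few false ones.
animate_nouns = [t for t in doc if t.pos_ == "NOUN" and t.morph.get("Animacy") == ["Anim"]]
print(animate_nouns)
```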
More issues:
```
(Pdb++) t.morph
Animacy=Anim|Case=Nom|Gender=Fem|NameType=Giv|Number=Sing
(Pdb++) tt = [t for t in doc if t.pos_ == "PROPN"]
(Pdb++) tt
[Миша, Собака, Миша, Кіт, Мишею, Ластівка, Ластівка, Мишу, Миша, Ластівка, Миша, Кіт, Миша, Миша, Миша, Миші, Мишу, Ластівка, Миші, Миші, Миша]
```
Damn. OK, so PROPN gets assigned based on capitalization alone? Wow.
Next up:
```
ERROR:ua_cbt:Верблюд же, відчуваючи полегшення, що зміг уникнути конфлікту, повернувся до своєї тіні під пальмою, де продовжив роздумувати про важливість рівноваги та справедливості у світі, де кожен шукає своє місце під сонцем.
пальмою -> ['пустелею', 'водити', 'стороною', 'історією']
```
Fixed.
Fixed вОди
```
Верблюдиця та шакал опинилися наодинці у безкрайній пустелі, позбавлені підтримки та провізії.
Верблюдиця -> ['Верблюдиця', 'Люда', 'Люди']
```
Fixed Люда 1 and 2.
cbt · Datasets at Hugging Face
Is quite good at generating stories if given a Ukrainian prompt!
Has trouble following the bits about the number of characters, but the grammar is much better. Though it can stop randomly.
https://g.co/bard/share/b410fb1181be
The Magic Egg and Other Tales from Ukraine. Retold by Barbara J. Suwyn; drawings by author; edited and with an introduction by Natalie O. Kononenko., found in Ukrainian fairy tale - Wikipedia ↩︎
I like to do

```python
what = some_dict()
for k, v in what.items():
    # ...
```

But

```python
for k in what:
    # do_sth(k, what[k])
```

is sometimes much more readable, and it's one less variable to name. I should do it more often.
Goal: find identical words with diff embeddings in RU and UA, use that to generate examples.
The link is broken, but I think I found the download page for the vectors.
Their blog is also down, but the howto is linked from the archive: Aligning vector representations – Sam’s ML Blog
Download: fastText/docs/crawl-vectors.md at master · facebookresearch/fastText
```
axel https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.uk.300.bin.gz
axel https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.ru.300.bin.gz
```
It’s taking a while.
EDIT: Ah damn, it had to be the text ones, not bin. :( Starting again.
EDIT2: THIS is the place: fastText/docs/pretrained-vectors.md at master · facebookresearch/fastText
```
https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.uk.vec
https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ru.vec
```
UKR has 900k lines, RUS has 1.8M — damn, it’s not going to be easy.
What do I do next, assuming this works?
Assuming I find out that RU-кит is far in the embedding space from UKR-кіт, what do I do next?
How do I test for false friends?
Maybe these papers about Surzhyk might come in handy now, especially <_(@Sira2019) “Towards an automatic recognition of mixed languages: The Ukrainian-Russian hybrid language Surzhyk” (2019) / Nataliya Sira, Giorgio Maria Di Nunzio, Viviana Nosilia: z / http://arxiv.org/abs/1912.08582 / _>.
Took forever and then got killed by Linux (OOM, presumably).
```python
from fasttext import FastVector

# ru_dictionary = FastVector(vector_file='wiki.ru.vec')
ru_dictionary = FastVector(vector_file='/home/sh/uuni/master/code/ru_interference/DATA/wiki.ru.vec')
uk_dictionary = FastVector(vector_file='/home/sh/uuni/master/code/ru_interference/DATA/wiki.uk.vec')

uk_dictionary.apply_transform('alignment_matrices/uk.txt')
ru_dictionary.apply_transform('alignment_matrices/ru.txt')

print(FastVector.cosine_similarity(uk_dictionary["кіт"], ru_dictionary["кот"]))
```
Gensim it is.
To load:
```python
from gensim.models import KeyedVectors
from gensim.test.utils import datapath

ru_dictionary = 'DATA/small/wiki.ru.vec'
uk_dictionary = 'DATA/small/wiki.uk.vec'

model_ru = KeyedVectors.load_word2vec_format(datapath(ru_dictionary))
model_uk = KeyedVectors.load_word2vec_format(datapath(uk_dictionary))
```
Did `model_ru.save(...)`, and then I can load it with `KeyedVectors.load("ru_interference/src/ru-model-save")`, which is faster. Shouldn’t have used the text format to begin with, but that’s on me.
```python
from gensim.models import TranslationMatrix

# word_pairs: a list of (ru_word, uk_word) training pairs
tm = TranslationMatrix(model_ru, model_uk, word_pairs)
```
```
(Pdb++) r = tm2.translate(ukrainian_words, topn=3)
(Pdb++) pp(r)
OrderedDict([('сонце', ['завишня', 'скорбна', 'вишня']),
             ('квітка', ['вишня', 'груша', 'вишнях']),
             ('місяць', ['любить…»', 'гадаю…»', 'помилуй']),
             ('дерево', ['яблуко', '„яблуко', 'яблуку']),
             ('вода', ['вода', 'риба', 'каламутна']),
             ('птах', ['короваю', 'коровай', 'корова']),
             ('книга', ['читати', 'читати»', 'їсти']),
             ('синій', ['вишнях', 'зморшках', 'плакуча'])])
```
OK, so definitely more word pairs would be needed for the translation.
Either way I don’t need the translation itself, I need the mapped space, roughly described here: mapping - How do i get a vector from gensim’s translation_matrix - Stack Overflow
Next time:

- get more words, e.g. from a dictionary
- get a space
- play with translations
python - Combining/adding vectors from different word2vec models - Stack Overflow mentions transvec · PyPI that allows accessing the vectors
Anyway, my only reason for fastText was the multilingual alignment; I can use other embeddings now.
`word in model.key_to_index` (which is a dict) works.

```
*** RuntimeError: scikit-learn estimators should always specify their parameters in the signature of their __init__ (no varargs). <class 'transvec.transformers.TranslationWordVectorizer'> with constructor (self, target: 'gensim.models.keyedvectors.KeyedVectors', *sources: 'gensim.models.keyedvectors.KeyedVectors', alpha: float = 1.0, max_iter: Optional[int] = None, tol: float = 0.001, solver: str = 'auto', missing: str = 'raise', random_state: Union[int, numpy.random.mtrand.RandomState, NoneType] = None) doesn't follow this convention.
```
ah damn. Wasn’t an issue with the older one, though the only thing that changed is https://github.com/big-o/transvec/compare/master...Jpsaris:transvec:master
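Judging by the constructor signature in the error (target space first, then `*sources`), the usage I was attempting looks roughly like this; treat it as a sketch, not the library’s documented API:

```python
from transvec.transformers import TranslationWordVectorizer

# word_pairs: (ru, uk) training pairs, as for TranslationMatrix above
tv = TranslationWordVectorizer(model_uk, model_ru).fit(word_pairs)
```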
Decided to leave this till better times, but I’ll play with it for one more hour today.
Coming back to mapping - How do i get a vector from gensim’s translation_matrix - Stack Overflow: I need `mapped_source_space`.
I should have used PyCharm at a much earlier stage in the process.
`mapped_source_space` contains a matrix with the 4 vectors mapped to the target space. Why does `source_space` have 1.8k words, while the source embedding space has 200k?
Ah, `tm.translate()` can translate words not found in the source space. Interesting!
AHA: the source/target space gets built only from the words provided for training (1.8k in my case), and the translation matrix is built based on that.
BUT in `translate()` the target matrix gets built from the entire vocabulary!
Which means:
Results!
```
real
картошка/картопля -> 0.28
дом/дім -> 1.16
чай/чай -> 1.17
паспорт/паспорт -> 0.40
зерно/зерно -> 0.46
нос/ніс -> 0.94

false
неделя/неділя -> 0.34
город/город -> 0.35
он/он -> 0.77
речь/річ -> 0.89
родина/родина -> 0.32
сыр/сир -> 0.99
папа/папа -> 0.63
мать/мати -> 0.52
```
Let’s normalize:
```
real
картошка/картопля -> 0.64
дом/дім -> 0.64
чай/чай -> 0.70
паспорт/паспорт -> 0.72
зерно/зерно -> 0.60

false
неделя/неділя -> 0.55
город/город -> 0.44
он/он -> 0.33
речь/річ -> 0.54
родина/родина -> 0.50
сыр/сир -> 0.66
папа/папа -> 0.51
мать/мати -> 0.56
```
OK, so it mostly works! With good enough thresholds it can work. Words that are totally different aren’t similar (он); words that share some meanings (мать/мати) are closer.
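For reference, a minimal sketch of the comparison; that the normalization above is plain unit-length scaling is my assumption, and `mapped_ru` is a hypothetical dict of RU vectors mapped into the UK space:

```python
import numpy as np

def sim(u, v):
    # cosine similarity between a mapped RU vector and a UK vector;
    # normalizing to unit length makes the scores comparable across words
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return float(u @ v)

# e.g. sim(mapped_ru["неделя"], model_uk["неділя"]) should come out low(ish): false friend
```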
Ways to improve this:
https://github.com/frekwencja/most-common-words-multilingual
Created pairs out of the words in the two dictionaries that are spelled identically (so not кот/кіт/кит); will look at the similarity between the Russian word and the Ukrainian word of each pair.
422 such words in common
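A sketch of how such pairs can be collected, assuming the `model_ru`/`model_uk` KeyedVectors from above:

```python
# identically-spelled words present in both vocabularies are false-friend candidates
shared = [w for w in model_uk.key_to_index if w in model_ru.key_to_index]
# intersecting with the most-common-words lists linked above gives the 422 words
```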
Sorted by similarity (lower values = more false-friend-y). Nope, mostly doesn’t make sense. But rare words seem to be the most ‘different’ ones:
{'поза': 0.3139531, 'iphone': 0.36648884, 'галактика': 0.39758587, 'Роман': 0.40571105, 'дюйм': 0.43442175, 'араб': 0.47358453, 'друг': 0.4818558, 'альфа': 0.48779228, 'гора': 0.5069237, 'папа': 0.50889325, 'проспект': 0.5117553, 'бейсбол': 0.51532406, 'губа': 0.51682216, 'ранчо': 0.52178365, 'голова': 0.527564, 'сука': 0.5336818, 'назад': 0.53545296, 'кулак': 0.5378426, 'стейк': 0.54102343, 'шериф': 0.5427336, 'палка': 0.5516712, 'ставка': 0.5519752, 'соло': 0.5522958, 'акула': 0.5531602, 'поле': 0.55333376, 'астроном': 0.5556448, 'шина': 0.55686104, 'агентство': 0.561674, 'сосна': 0.56177, 'бургер': 0.56337166, 'франшиза': 0.5638794, 'фунт': 0.56592, 'молекула': 0.5712515, 'браузер': 0.57368404, 'полковник': 0.5739758, 'горе': 0.5740198, 'шапка': 0.57745415, 'кампус': 0.5792211, 'дрейф': 0.5800869, 'онлайн': 0.58176875, 'замок': 0.582287, 'файл': 0.58236635, 'трон': 0.5824338, 'ураган': 0.5841942, 'диван': 0.584252, 'фургон': 0.58459675, 'трейлер': 0.5846335, 'приходить': 0.58562565, 'сотня': 0.585832, 'депозит': 0.58704704, 'демон': 0.58801174, 'будка': 0.5882363, 'царство': 0.5885376, 'миля': 0.58867997, 'головоломка': 0.5903712, 'цент': 0.59163713, 'казино': 0.59246653, 'баскетбол': 0.59255254, 'марихуана': 0.59257627, 'пастор': 0.5928912, 'предок': 0.5933549, 'район': 0.5940658, 'статистика': 0.59584284, 'стартер': 0.5987516, 'сайт': 0.5988183, 'демократ': 0.5999011, 'оплата': 0.60060596, 'тендер': 0.6014088, 'орел': 0.60169894, 'гормон': 0.6021177, 'метр': 0.6023728, 'меню': 0.60291564, 'гавань': 0.6029945, 'рукав': 0.60406476, 'статуя': 0.6047057, 'скульптура': 0.60497975, 'вагон': 0.60551536, 'доза': 0.60576916, 'синдром': 0.6064756, 'тигр': 0.60673815, 'сержант': 0.6070389, 'опера': 0.60711193, 'таблетка': 0.60712767, 'фокус': 0.6080196, 'петля': 0.60817575, 'драма': 0.60842395, 'шнур': 0.6091568, 'член': 0.6092182, 'сервер': 0.6094157, 'вилка': 0.6102615, 'мода': 0.6106603, 'лейтенант': 0.6111004, 'радар': 0.6117528, 'галерея': 0.61191505, 'ворота': 0.6125873, 'чашка': 0.6132187, 'крем': 0.6133907, 'бюро': 0.61342597, 'черепаха': 0.6146957, 'секс': 0.6151523, 'носок': 0.6156026, 'подушка': 0.6160687, 'бочка': 0.61691606, 'гольф': 0.6172053, 'факультет': 0.6178817, 'резюме': 0.61848575, 'нерв': 0.6186257, 'король': 0.61903644, 'трубка': 0.6194198, 'ангел': 0.6196466, 'маска': 0.61996806, 'ферма': 0.62029755, 'резидент': 0.6205579, 'футбол': 0.6209573, 'квест': 0.62117445, 'рулон': 0.62152386, 'сарай': 0.62211347, 'слава': 0.6222329, 'блог': 0.6223742, 'ванна': 0.6224452, 'пророк': 0.6224489, 'дерево': 0.62274456, 'горло': 0.62325376, 'порт': 0.6240524, 'лосось': 0.6243047, 'альтернатива': 0.62446254, 'кровоточить': 0.62455964, 'сенатор': 0.6246379, 'спортзал': 0.6246594, 'протокол': 0.6247676, 'ракета': 0.6254694, 'салат': 0.62662274, 'супер': 0.6277698, 'патент': 0.6280118, 'авто': 0.62803495, 'монета': 0.628338, 'консенсус': 0.62834597, 'резерв': 0.62838227, 'кабель': 0.6293858, 'могила': 0.62939847, 'небо': 0.62995523, 'поправка': 0.63010347, 'кислота': 0.6313528, 'озеро': 0.6314377, 'телескоп': 0.6323617, 'чудо': 0.6325846, 'пластик': 0.6329929, 'процент': 0.63322043, 'маркер': 0.63358307, 'датчик': 0.6337889, 'кластер': 0.633797, 'детектив': 0.6341895, 'валюта': 0.63469064, 'банан': 0.6358283, 'фабрика': 0.6360865, 'сумка': 0.63627976, 'газета': 0.6364525, 'математика': 0.63761103, 'плюс': 0.63765526, 'урожай': 0.6377103, 'контраст': 0.6385834, 'аборт': 0.63913494, 'парад': 0.63918126, 'формула': 0.63957334, 'арена': 0.6396606, 'парк': 0.6401386, 'посадка': 0.6401986, 
'марш': 0.6403458, 'концерт': 0.64061844, 'перспектива': 0.6413666, 'статут': 0.6419941, 'транзит': 0.64289963, 'параметр': 0.6430252, 'рука': 0.64307654, 'голод': 0.64329326, 'медаль': 0.643804, 'фестиваль': 0.6438755, 'небеса': 0.64397913, 'барабан': 0.64438117, 'картина': 0.6444177, 'вентилятор': 0.6454438, 'ресторан': 0.64582723, 'лист': 0.64694726, 'частота': 0.64801234, 'ручка': 0.6481528, 'ноутбук': 0.64842474, 'пара': 0.6486577, 'коробка': 0.64910173, 'сенат': 0.64915174, 'номер': 0.64946175, 'ремесло': 0.6498537, 'слон': 0.6499266, 'губернатор': 0.64999187, 'раковина': 0.6502305, 'трава': 0.6505385, 'мандат': 0.6511373, 'великий': 0.6511585, 'ящик': 0.65194154, 'череп': 0.6522753, 'ковбой': 0.65260696, 'корова': 0.65319675, 'честь': 0.65348136, 'легенда': 0.6538656, 'душа': 0.65390354, 'автобус': 0.6544202, 'метафора': 0.65446657, 'магазин': 0.65467703, 'удача': 0.65482104, 'волонтер': 0.65544796, 'сексуально': 0.6555309, 'ордер': 0.6557747, 'точка': 0.65612084, 'через': 0.6563236, 'глина': 0.65652716, 'значок': 0.65661323, 'плакат': 0.6568083, 'слух': 0.65709555, 'нога': 0.6572164, 'фотограф': 0.65756184, 'ненависть': 0.6578564, 'пункт': 0.65826315, 'берег': 0.65849876, 'альбом': 0.65849936, 'кролик': 0.6587049, 'масло': 0.6589803, 'бензин': 0.6590406, 'покупка': 0.65911734, 'параграф': 0.6596477, 'вакцина': 0.6603271, 'континент': 0.6609991, 'расизм': 0.6614046, 'правило': 0.661452, 'симптом': 0.661881, 'романтика': 0.6626457, 'атрибут': 0.66298646, 'олень': 0.66298693, 'кафе': 0.6635062, 'слово': 0.6636568, 'машина': 0.66397023, 'джаз': 0.663977, 'пиво': 0.6649644, 'слуга': 0.665489, 'температура': 0.66552, 'море': 0.666358, 'чувак': 0.6663854, 'комфорт': 0.66651237, 'театр': 0.66665906, 'ключ': 0.6670032, 'храм': 0.6673037, 'золото': 0.6678767, 'робот': 0.66861665, 'джентльмен': 0.66861814, 'рейтинг': 0.6686267, 'талант': 0.66881114, 'флот': 0.6701237, 'бонус': 0.67013747, 'величина': 0.67042017, 'конкурент': 0.6704642, 'конкурс': 0.6709986, 'доступ': 0.6712131, 'жанр': 0.67121863, 'пакет': 0.67209935, 'твердо': 0.6724718, 'клуб': 0.6724739, 'координатор': 0.6727365, 'глобус': 0.67277336, 'карта': 0.6731522, 'зима': 0.67379165, 'вино': 0.6737963, 'туалет': 0.6744124, 'середина': 0.6748006, 'тротуар': 0.67507124, 'законопроект': 0.6753582, 'земля': 0.6756074, 'контейнер': 0.6759613, 'посольство': 0.67680794, 'солдат': 0.6771952, 'канал': 0.677311, 'норма': 0.67757475, 'штраф': 0.67796284, 'маркетинг': 0.67837185, 'приз': 0.6790007, 'дилер': 0.6801595, 'молитва': 0.6806114, 'зона': 0.6806243, 'пояс': 0.6807122, 'автор': 0.68088144, 'рабство': 0.6815858, 'коридор': 0.68208706, 'пропаганда': 0.6826943, 'журнал': 0.6828874, 'портрет': 0.68304217, 'фермер': 0.6831401, 'порошок': 0.6831531, 'сюрприз': 0.68327177, 'камера': 0.6840434, 'фаза': 0.6842661, 'природа': 0.6843757, 'лимон': 0.68452585, 'гараж': 0.68465877, 'рецепт': 0.6848821, 'свинина': 0.6863143, 'атмосфера': 0.6865022, 'режим': 0.6870908, 'характеристика': 0.6878463, 'спонсор': 0.6879278, 'товар': 0.6880773, 'контакт': 0.6888988, 'актриса': 0.6891222, 'диск': 0.68916976, 'шоколад': 0.6892894, 'банда': 0.68934155, 'панель': 0.68947715, 'запуск': 0.6899455, 'травма': 0.690045, 'телефон': 0.69024855, 'список': 0.69054323, 'кредит': 0.69054526, 'актив': 0.69087565, 'партнерство': 0.6909646, 'спорт': 0.6914842, 'маршрут': 0.6915196, 'репортер': 0.6920864, 'сегмент': 0.6920909, 'бунт': 0.69279015, 'риторика': 0.69331145, 'школа': 0.6933826, 'оператор': 0.69384277, 'ветеран': 0.6941337, 'членство': 0.69435036, 'схема': 
0.69441277, 'манера': 0.69451445, 'командир': 0.69467854, 'формат': 0.69501007, 'сцена': 0.69557995, 'секрет': 0.6961215, 'курс': 0.6964162, 'компонент': 0.69664925, 'патруль': 0.69678336, 'конверт': 0.6968681, 'символ': 0.6973544, 'насос': 0.6974678, 'океан': 0.69814134, 'критик': 0.6988366, 'доброта': 0.6989736, 'абсолютно': 0.6992678, 'акцент': 0.6998319, 'ремонт': 0.70108724, 'мама': 0.7022723, 'тихо': 0.70254886, 'правда': 0.7040037, 'транспорт': 0.704239, 'книга': 0.7051158, 'вода': 0.7064695, 'кухня': 0.7070433, 'костюм': 0.7073295, 'дикий': 0.70741034, 'прокурор': 0.70768344, 'консультант': 0.707697, 'квартира': 0.7078515, 'шанс': 0.70874536, 'сила': 0.70880103, 'хаос': 0.7089504, 'дебют': 0.7092187, 'завтра': 0.7092679, 'горизонт': 0.7093906, 'модель': 0.7097884, 'запах': 0.710207, 'сама': 0.71082854, 'весна': 0.7109366, 'орган': 0.7114152, 'далекий': 0.7118393, 'смерть': 0.71213734, 'медсестра': 0.71224624, 'молоко': 0.7123647, 'союз': 0.71299064, 'звук': 0.71361446, 'метод': 0.7138604, 'корпус': 0.7141677, 'приятель': 0.71538115, 'центр': 0.716277, 'максимум': 0.7162813, 'страх': 0.7166886, 'велосипед': 0.7168154, 'контроль': 0.7171681, 'ритуал': 0.71721196, 'команда': 0.7175366, 'молоток': 0.71759546, 'цикл': 0.71968937, 'жертва': 0.7198437, 'статус': 0.7203152, 'пульс': 0.7206338, 'тренер': 0.72116625, 'сектор': 0.7221448, 'музей': 0.72323525, 'сфера': 0.7245963, 'пейзаж': 0.7246053, 'вниз': 0.72528857, 'редактор': 0.7254647, 'тема': 0.7256167, 'агент': 0.7256874, 'дизайнер': 0.72618955, 'деталь': 0.72680634, 'банк': 0.7270782, 'союзник': 0.72750694, 'жест': 0.7279984, 'наставник': 0.7282404, 'тактика': 0.72968495, 'спектр': 0.7299538, 'проект': 0.7302779, 'художник': 0.7304505, 'далеко': 0.7306006, 'ресурс': 0.73075294, 'половина': 0.7318293, 'явно': 0.7323554, 'день': 0.7337892, 'юрист': 0.73461473, 'широко': 0.73490566, 'закон': 0.7372453, 'психолог': 0.7373602, 'сигарета': 0.73835427, 'проблема': 0.7388488, 'аргумент': 0.7389784, 'старший': 0.7395191, 'продукт': 0.7395814, 'ритм': 0.7406945, 'широкий': 0.7409786, 'голос': 0.7423325, 'урок': 0.74272805, 'масштаб': 0.74474066, 'критика': 0.74535364, 'правильно': 0.74695253, 'авторитет': 0.74697924, 'активно': 0.74720675, 'причина': 0.7479735, 'сестра': 0.74925977, 'сигнал': 0.749686, 'алкоголь': 0.7517742, 'регулярно': 0.7521055, 'мотив': 0.7527843, 'бюджет': 0.7531772, 'плоский': 0.754082, 'посол': 0.75505507, 'скандал': 0.75518423, 'дизайн': 0.75567746, 'персонал': 0.7561288, 'адвокат': 0.7561835, 'принцип': 0.75786924, 'фонд': 0.7583069, 'структура': 0.75888604, 'дискурс': 0.7596848, 'вперед': 0.76067656, 'контур': 0.7607424, 'спортсмен': 0.7616756, 'стимул': 0.7622434, 'партнер': 0.76245433, 'стиль': 0.76301545, 'сильно': 0.7661394, 'текст': 0.7662303, 'фактор': 0.76729685, 'герой': 0.7697237, 'предмет': 0.775718, 'часто': 0.7780384, 'план': 0.77855974, 'рано': 0.78059715, 'факт': 0.782439, 'конкретно': 0.78783923, 'сорок': 0.79080343, 'аспект': 0.79219675, 'контекст': 0.7926827, 'роль': 0.796745, 'президент': 0.8007479, 'результат': 0.80227, 'десять': 0.8071967, 'скоро': 0.80976427, 'тонкий': 0.8100516, 'момент': 0.8120169, 'нести': 0.81280494, 'документ': 0.8216758, 'просто': 0.8222313, 'очевидно': 0.8242744, 'точно': 0.83183587, 'один': 0.83644223, 'пройти': 0.84026355}
Ways to improve:

- remove potentially bad words from the training set
- expand the search for candidate words by doing predictable changes a la <_(@Sira2019) “Towards an automatic recognition of mixed languages: The Ukrainian-Russian hybrid language Surzhyk” (2019) / Nataliya Sira, Giorgio Maria Di Nunzio, Viviana Nosilia: z / http://arxiv.org/abs/1912.08582 / _>
- add weighting based on frequency: rarer words will have less stable embeddings
- look at other trained vectors, ideally something more processed
And actually thinking about it — is there anything I can solve through this that I can’t solve by parsing one or more dictionaries, maybe even making embeddings of the definitions of the various words?
Fazit: leaving this alone till after the Masterarbeit, as a side project. It’s incredibly interesting but probably not directly practical. Sad.
By default it’s `<Esc>`, a bad idea for the same reason it’s a bad idea in vim.
AND my xkeymap-level keyboard mapping for Esc doesn’t seem to work here.
Default #2 is `<C-]>`, which is impossible because of my custom keyboard layout.
It will be `<C-=>`:
```json
{
    "command": "vim:leave-insert-mode",
    "selector": ".jp-NotebookPanel[data-jp-vim-mode='true'] .jp-Notebook.jp-mod-editMode",
    "keys": ["Ctrl ="]
}
```
(I can’t figure out why `,l` etc. don’t work in jupyterlab for this purpose; `<leader>` is `,`.)
"Insert mode mappings
" Leave insert mode
imap <leader>l <Esc>
imap qj <Esc>
" Write, write and close
imap ,, <Esc>:x<CR>
map ,. :w<CR>
… I will have a unified set of bindings for this someday, I promise.
EDIT: this is becoming a more generic thingy for everything I’d ever need to refer to when writing a paper; I’ll clean up this mess later.
Resources – DREAM Lab links to https://dream.cs.umass.edu/wp-content/uploads/2020/04/Tips-and-Best-Practices.pdf. Until I set up a system to save PDF info, I’ll paste it as screenshots here:
ChatGPT summarized the relevant pages of the PDF file, but didn’t do it well, so I’m mostly rewriting it myself:
- `multi-discipli\-nary`: discretionary hyphens for words LaTeX can’t hyphenate itself
- `\begin{sloppypar}...` for paragraphs where LaTeX goes over the margin
- `\begin{figure}[t]` + `\centering` for aligning tables and figures
- `sth~\cite{whatever}`
- `\emph` over bold or `\textit`
- `\newcommand{\system}{SQuID\xspace}`: `\xspace` here adds a space unless at the end of a sentence (package `\usepackage{xspace}`)
- `\smallskip`, `\medskip`, and `\bigskip` instead of `\vspace`
- `\linewidth` or `\textwidth`; `\resizebox` with appropriate dimensions
- Compression hacks (see pics)
Paper writing hacks:
best practices - When should I use non-breaking space? - TeX - LaTeX Stack Exchange lists ALL the places where Knuth wanted people to put nonbreaking spaces, incl:
- `1)~one 2)~two`
- `Donald~E. Knuth`
- `1,~2`
- `Chapter~12`

Less obvious and not from him:

- `I~am`
ALSO
ChatGPT says that citations should come before footnotes, to prioritize the scholarly source over unimportant info. So this: “[32]³”, and not this: “³[32]”. Basically, footnotes go after all punctuation and after citations. OK
EDIT: NOT PARENTHESES, THEY SHOULD BE WITHIN PARENTHESES.⁴ DAMN
ALSO
I sometimes write “and around ~50%”, forgetting that `~` is a nbsp. Hard to catch when reading the text.
ALSO
As when writing code I like to add some `assert False` (or a failing test) so that I know where I stopped the last time, a `\latexstopcompiling here` is a neat way to make sure I REALLY finish a certain line I started but didn’t finish (the undefined command makes compilation stop right there).
Rounding.
Previously: 211018-1510 Python rounding behaviour, with the TL;DR that Python uses banker’s rounding: .5 rounds towards the even number.
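Quick sanity check of that tie-breaking in Python:

```python
# Python 3's round() rounds ties to the even neighbour ("banker's rounding")
assert round(0.5) == 0
assert round(1.5) == 2
assert round(2.5) == 2
assert round(3.5) == 4
```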
Floor/ceil have their usual LaTeX notation via `\lfloor x \rfloor` and `\lceil x \rceil` (see LaTeX/Mathematics - Wikibooks, open books for an open world, at ‘delimiters’).
“Normal” rounding (towards nearest integer) has no standard notation: ceiling and floor functions - What is the mathematical notation for rounding a given number to the nearest integer? - Mathematics Stack Exchange
let XXX denote the standard rounding function
Banker’s rounding (which Python and everyone else uses for tie-breaking .5 in “normal” rounding) has no standard notation either.
Let $\lfloor x \rceil$ denote "round half to even" rounding (a.k.a. "Banker's rounding"), consistent with Python's built-in round() and NumPy's np.round() functions.
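A minimal LaTeX sketch of that convention (the macro name is my own):

```latex
% floor, ceiling, and "round half to even" (banker's rounding)
\newcommand{\round}[1]{\left\lfloor #1 \right\rceil}
% usage: $\lfloor 2.7 \rfloor = 2$, $\lceil 2.3 \rceil = 3$, $\round{2.5} = 2$
```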