Diensttagebuch

Day 1931 (15 Apr 2024)

Seaborn barplot ordering gotcha

seaborn.barplot — seaborn 0.13.2 documentation:

Passing order=[list,of,cats,in,order] sets the bar order explicitly.

Otherwise “it will be inferred”, except that it’s not always trivial to understand how exactly (or I’m too sleep-deprived).

And if I’m drawing horizontal lines on top of the bars in the barplot based on indexes, the inferred bar order may be slightly different from the order of those indexes, so it’s safer to pass order explicitly.

Latex automated title case in titles

With the help of ChatGPT

```
\documentclass{article}
\usepackage{titlecaps}
\usepackage{etoolbox}

% Specify words to remain in lowercase unless they are the first word
\Addlcwords{the and but or nor for a an at by to in on with of}

% NB: these simple wrappers don't handle starred forms (\section*) or optional short titles.
% \chapter exists only in report/book, not in article; redefining it unguarded raises an error there.
\ifdefined\chapter
  \let\oldchapter\chapter
  \renewcommand{\chapter}[1]{\oldchapter{\titlecap{#1}}}
\fi

\let\oldsection\section
\renewcommand{\section}[1]{\oldsection{\titlecap{#1}}}

\let\oldsubsection\subsection
\renewcommand{\subsection}[1]{\oldsubsection{\titlecap{#1}}}

\begin{document}

\section{an example of a section with and without uppercasing specific words}
This is some text.

\subsection{exploring the integration of tools in the workplace}
More text here.

\end{document}
```

Day 1928 (12 Apr 2024)

In LaTeX you can attach multiple labels to the same object
- zc
- zc/it
- latex
The same thing can have multiple names and that’s alright!
```
\label{old-subsection-name-maybe-linked-to-elsewhere}
\label{sec:eval-task-2}
```
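A minimal sketch of how this looks in a full document, mostly to convince myself that both names resolve to the same number. The section titles are made up for illustration; only the two label names come from the snippet above.

```latex
\documentclass{article}
\begin{document}

\section{Evaluation}      % hypothetical title
\subsection{Task 2}       % hypothetical title
% Both labels are attached to the same subsection.
\label{old-subsection-name-maybe-linked-to-elsewhere}
\label{sec:eval-task-2}

Old name: \ref{old-subsection-name-maybe-linked-to-elsewhere}.
New name: \ref{sec:eval-task-2}.
% Both \ref calls should print the same subsection number.

\end{document}
```

As usual with references, this needs two compile runs before the numbers show up.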
Day 1927 (11 Apr 2024)

Latex margin notes
- zc
- zc/it
- latex
Margin notes - Overleaf, Online-LaTeX-Editor:

\marginpar{text} is the vanilla option, but \marginnote (marginnote package) works in all cases ever, including places where \marginpar fails, such as floats and footnotes:
```
\usepackage{marginnote}
\marginnote{text}
```
EXCEPT I couldn’t find a way to add footnote markers to have numbered margin notes separate from the real footnotes.

But this solves everything, quoting directly¹:
```
\newcounter{mgncount}
\renewcommand\themgncount{\arabic{mgncount} }
\newcommand\marginfootnote[1]{\refstepcounter{mgncount}\marginpar{{$^\themgncount$}#1}\footnotemark}

\begin{document}
Can we put a footnote with number in the margin and a number in the text?\marginfootnote{There's a number here!}

Another test\marginfootnote{Working!}

One more try\marginfootnote{Successful!}
\end{document}
```
EDIT: actually it doesn’t: the marker in the text itself still uses the number from the regular footnotes. :(

Ah, the sidenotes package exists: https://ctan.math.utah.edu/ctan/tex-archive/macros/latex/contrib/sidenotes/sidenotes.pdf

But it only supports 1, 2, 3-style numbers.

Yes this is it! CTAN: Package sidenotesplus
```
%\usepackage{sidenotes}
\usepackage[mark=Alph]{sidenotesplus}
...

\sidenote{does basically what footnote does}
```
It has a lot of options and can do a lot of things, yes, this is it, it’s perfect. The example page has everything: https://ctan.math.illinois.edu/macros/latex/contrib/sidenotesplus/tests-sidenoteplus.pdf

See also CTAN: Marginal topic.

1. marginpar - Footnote and number in margin - TeX - LaTeX Stack Exchange
Latex quotations
- zc
- zc/it
- latex
- research
symbols - What is the best way to use quotation mark glyphs? - TeX - LaTeX Stack Exchange:
```
``this'' / `this' is the proper way
"this"/'this' produces two closing quotes and 'is annoying to readers'
```
There’s also CTAN: Package csquotes that ‘is fantastic’, including smartly doing nested quotes, correct quotes for different languages, and ‘generally always doing what you want it to’:
```
\usepackage[autostyle]{csquotes}

...
\enquote{My quoted text}
```
Another answer suggests
```
\newcommand{\q}[1]{``#1''}

...

\q{whatever}
```
I’ve been using more custom latex commands lately and this goes in that direction.

I guess creating a \q that does autoquotes w/ csquotes is the way to go?
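A minimal sketch of that idea, assuming csquotes is loaded with autostyle as above. The name \q is just the shorthand from the other answer; it is not something csquotes provides.

```latex
\documentclass{article}
\usepackage[autostyle]{csquotes}

% \q is a hypothetical shorthand; it simply forwards to csquotes' \enquote,
% so nesting and language-dependent quote styles are handled by csquotes.
\newcommand{\q}[1]{\enquote{#1}}

\begin{document}
She said \q{this is an outer quote with \q{a nested one} inside}.
\end{document}
```

The upside of wrapping \enquote rather than hardcoding the quote glyphs is that the quote style can later be changed in one place, through the csquotes options.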
Day 1924 (08 Apr 2024)

Masterarbeit final checklist
- zc
- zc/it
- master
- masterarbeit
- Punctuation
  - all citations to word~\cite{xxx}.
  - all footnotes to sentence.\footnote{}¹
    
    either full sentence or lowercase part
    
    but within parentheses!
  - for both, it’s sentence~\cite{}.\footnote{}
  - all numbers to 132,32.99
  - Consistent quoting (using the correct latex quotes or \enquote{} with italics for longer sentences.)
  - all refs to autorefs
    
    Autoref fails with appendix subsections, do it manually.
  - tightlists everywhere
  - Overleaf ‘stop on first error’ to fix the errors
  - ~~Title Case in all Titles~~
- Bits
  - CBT-UA -> UA-CBT
  - LMentry-static-UA shorten to LMES once and keep using LMES.
  - Eval-UA-tion should be capitalized
  - Thesis always capitalized
  - gpt2/GPT2 -> GPT-2.
  - check for stray ‘we’s in the paper
    
    “our”/“we” “paper”
  - look for sticking out over-the-line bits
  - Python is capitalized
  - all Grammarly suggestions
- Not bits
  - go through all latex comments
  - go through all latex warnings
  - go through all todos in home.md + taskwarrior
Open research questions:
- Research
  - look into whether translated datasets are worse at stuff
  - monolingual VS multilingual models incl Ukrainian performance
  - Whether prompt language makes a difference on Ukrainian tasks
- Datasets:
SH, [10 Apr 2024 14:58:39] LMES: investigate the robustness of the models, and e.g. look at how the accuracy of humans and AI depends on, hmm, the difference in word lengths or on the word number (“what is the one hundred and thirteenth word in the sentence …”). CBT-UA: evaluate it properly, and also, for both humans and machines, look at the scores when only the challenge segment is given. I tested this with neural networks (it didn’t make it into the paper), and very often the results were better with the fragment than with the whole fairy tale

SH, [10 Apr 2024 14:59:57] Make a dataset on biases and feminitives; I have code written for generating a version zero, basically sentences like “my wife does computer systems programming, so by profession she is a ….”

SH, [10 Apr 2024 15:00:20] The dream of a lifetime is to finally make a Russian-Ukrainian interference dataset covering Russianisms and Russian mistakes

SH, [10 Apr 2024 15:02:57] UA-CBT: take fairy tales from Project Gutenberg, take foreign fairy tales translated into Ukrainian, and compare the models’ scores on tasks built from tales from these different sources. One could skip the filtering and just do a human baseline on a part of the generated dataset. This way one can make an unrealistically large dataset and know that the maximum there is, say, 80%, because 20% of the tasks are garbage

Also:
- CATSMC and friends — much larger datasets can be generated from the given data, a lot of combinations are possible.
1. Should Footnote Markers Go After the Punctuation? | Proofed’s Writing Tips
Day 1913 (27 Mar 2024)

More latex tricks for spacing and references
- zc
- zc/it
- latex
\autoref (from hyperref) is like \ref, but it adds the word as well, not just the number: 3.2 -> Figure 3.2 (cross referencing - What’s the difference between \ref and \autoref? - TeX - LaTeX Stack Exchange)
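A small side-by-side sketch of the two commands, assuming hyperref is loaded. The label names and the placeholder figure are made up for illustration.

```latex
\documentclass{article}
\usepackage{hyperref} % provides \autoref

\begin{document}
\section{Results}\label{sec:results}      % hypothetical section

\begin{figure}
  \centering
  \fbox{placeholder}                      % stands in for a real image
  \caption{A placeholder figure.}\label{fig:placeholder}
\end{figure}

% \ref prints only the number, so the word has to be typed by hand:
See Figure~\ref{fig:placeholder} in Section~\ref{sec:results}.

% \autoref prints the type name together with the number:
See \autoref{fig:placeholder} in \autoref{sec:results}.
\end{document}
```

Like any cross-reference, it needs two compile runs before the numbers resolve.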

Day 1902 (16 Mar 2024)

Latex trivial TODO command
- zc
- zc/it
- latex
Wrapping stuff in this command makes it stand out; also greppable by TODO which removes the need to remember commands
```
\usepackage{xcolor} % provides \color
\newcommand{\TODO}[1]{{\color{magenta}#1}}
```
Day 1900 (14 Mar 2024)

Locally debugging Huggingface Dataset scripts
- zc
- zc/it
- py/datasets
- hf
- ml
Previously:
- 240218-2049 Huggingface dataset build configs
Is there a suggested way of debugging dataset generators? - 🤗Datasets - Hugging Face Forums

Instead of committing etc. every time, one can clone the dataset repo locally through git and then point load_dataset() to that local folder containing the dataset script file!
Day 1899 (13 Mar 2024)

Huggingface Hub prefers zip archives because they support streaming
- zc
- zc/it
- python
- hf
- ml
Random nugget from Document to compress data files before uploading · Issue #5687 · huggingface/datasets:
- gz, to compress individual files
- zip, to compress and archive multiple files; zip is preferred rather than tar because it supports streaming out of the box
(Streaming: https://huggingface.co/docs/datasets/v2.4.0/en/stream TL;DR for very large datasets, don’t download the entire dataset; pass streaming=True to the load_dataset() fn)
