serhii.net

In the middle of the desert you can say anything you want

25 Dec 2023

Glosses markdown magic

Interlinear Glosses

… are a way to annotate grammar bits of a language together with translation: Interlinear gloss - Wikipedia

The Leipzig Glossing Rules are a set of rules to standardize interlinear glosses. They are focused less on understandability and more on consistency.

Using interlinear glosses

I’m writing my thesis in Obsidian/Markdown, synced to Hugo. Later I’ll use something like pandoc to turn it into a PDF, with or without an intermediate LaTeX step.

EDIT: the newer technical part now lives here: 231226-1702 Ideas for annotating glosses in my Masterarbeit

Pandoc-ling

cysouw/pandoc-ling: Pandoc Lua filter for linguistic examples

> pandoc --lua-filter=pandoc_ling.lua 231225-2240\ Glosses\ markdown\ magic.pandoc.md -o test.pdf
Error running filter pandoc_ling.lua:
pandoc_ling.lua:21: attempt to call a nil value (method 'must_be_at_least')
stack traceback:
	pandoc_ling.lua:21: in main chunk

:::ex
| Dutch (Germanic)
| Deze zin is in het nederlands.
| DEM sentence AUX in DET dutch.
| This sentence is dutch.
:::

:::ex | Dutch (Germanic) | Deze zin is in het nederlands. | DEM sentence AUX in DET dutch. | This sentence is dutch. :::

… it was the pandoc version. I updated it: no error now, but no luck either.

Digging into the examples I think this is happening:

A code block stays a code block. Writing the same markup outside a code block turns it into a plain paragraph collapsed into a single line, and that doesn’t survive the conversion to pandoc markdown done by Obsidian’s pandoc extension.

The original documentation generation used this script:

function addRealCopy (code)
  return { code, pandoc.RawBlock("markdown", code.text) }
end

return {
  { CodeBlock = addRealCopy }
}

It replaces each code block with the code block itself followed by its content as raw markdown. The ::: block therefore appears after the code as normal markdown text, and the pandoc-ling filter converts it correctly.

> pandoc 231225-2240\ Glosses\ markdown\ magic.pandoc.md -t markdown -L processVerbatim.lua -s --wrap=preserve | pandoc -L pandoc_ling.lua -o my.html

I can drop exporting extensions and just manually convert bits?..

This works:

> pandoc "garden/it/231225-2240 Glosses markdown magic.md" -t markdown -L pandoc_ling.lua  -s
> pandoc "garden/it/231225-2240 Glosses markdown magic.md" -L pandoc_ling.lua -o my.html

and is OK as long as my Masterarbeit file has no complexities at all.

(Can I add this as a parameter to the existing bits?)

YES!

Adding -L /home/sh/t/pandoc/pandoc_ling.lua as an option to the pandoc plugin, together with the “from markdown” (not HTML) option, gets this parsed right!


(Except that it’s ugly in the HTML view but I can live with that)

And Hugo. Exporting to Hugo through obyde is ugly as well.

I could write something like this: A Pandoc Lua filter to convert Callout Blocks to Hugo admonitions (shortcode).

We’ll o…

Mijyuoon/obsidian-ling-gloss: An Obsidian plugin for interlinear glosses used in linguistics texts.

Pandoc export from HTML visualizes them quite well.

\gla Péter-nek van egy macská-ja
\glb pe:tɛrnɛk vɒn ɛɟ mɒt͡ʃka:jɒ
\glc Peter-DAT exist INDEF cat-POSS.3SG
\ft Peter has a cat.

\set glastyle cjk
\ex 牆上掛著一幅畫 / 墙上挂着一幅画
\gl 牆 [墙] [qiáng] [wall] [^[TOP]
	上 [上] [shàng] [on] [^]]
	掛 [挂] [guà] [hang] [V]
	著 [着] [zhe] [CONT] [ASP]
	一 [一] [yì] [one] [^[S]
	幅 [幅] [fú] [picture.CL] []
	畫 [画] [huà] [picture] [^]]
\ft A picture is hanging on the wall.

Maybe a solution

  • do all my glosses in the pandoc-ling format
  • put them into code blocks belonging to a special class
  • write a filter very similar to processVerbatim, but one that operates only on code blocks of this class
  • when outputting to Hugo they’ll stay as preformatted code
  • when exporting with pandoc, run them through this filter first and then through pandoc-ling, giving pretty glosses in the final exported output

-- Unwrap code blocks of class "mygloss" into raw markdown;
-- every other code block is returned unchanged.
function addRealCopy (code)
  -- includes() matches the class at any position, not only the first one
  if code.classes:includes("mygloss") then
    return { pandoc.RawBlock("markdown", code.text) }
  else
    return { code }
  end
end

return {
  { CodeBlock = addRealCopy }
}

Should parse (code block with the class mygloss):

:::ex
| Dutch (Germanic)
| Deze zin is in het nederlands.
| DEM sentence AUX in DET dutch.
| This sentence is dutch.
:::

Should stay as code (code block without that class):

:::ex
| Dutch (Germanic)
| Deze zin is in het nederlands.
| DEM sentence AUX in DET dutch.
| This sentence is dutch.
:::

> pandoc "/... arden/it/231225-2240 Glosses markdown magic.md" -L processVerbatim.lua -t markdown -s | pandoc -L pandoc_ling.lua -o my.html

It works!

But not this:

> pandoc "/home231225-2240 Glosses markdown magic.md" -L processVerbatim.lua -L pandoc_ling.lua -o my.html

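Likely because both filters need parsed markdown: the first one emits raw markdown, which only becomes real markdown structure once the output is written out and read again, so chaining both in a single pandoc call breaks.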
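A possible way around the intermediate step, as a sketch I have not tested against this thesis setup: a RawBlock is never re-parsed within a single pandoc run, so pandoc-ling would only see raw text instead of a Div. Parsing the code block's text with pandoc.read inside the filter should hand real blocks to the next filter. The class name mygloss is the one used in the filter above; the global-function filter style and the choice of the "markdown" reader are my assumptions.

```lua
-- Sketch: replace code blocks of class "mygloss" with their parsed content.
-- pandoc picks up a top-level function named after an element type
-- (here CodeBlock) as a filter when the script returns no filter table.
function CodeBlock(code)
  if code.classes:includes("mygloss") then
    -- pandoc.read returns a Pandoc document; its .blocks are regular AST
    -- blocks (Div, LineBlock, ...) that a later filter can match on.
    return pandoc.read(code.text, "markdown").blocks
  end
  -- Returning nothing leaves all other code blocks unchanged.
end
```

If this behaves as assumed, both filters could run in one call, since filters passed with several -L options are applied in the order given: pandoc input.md -L thisFilter.lua -L pandoc_ling.lua -o my.html (input.md and thisFilter.lua are placeholder names).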

Maybe I’m overcomplicating it and I can just use the UD… I can use superscripts!

Just use superscripts

The inflectional paradigm of Ukrainian admits free word order: in English the Subject-Verb-Object word order in “the man^man-NOM.SG^ saw the dog^dog-NOM.SG^” (vs “the dog^dog-NOM.SG^ saw the man^man-NOM.SG^”) determines who saw whom, while in Ukrainian (“чоловік^man-NOM.SG^ побачив^saw-PST^ собакУ^dog-ACC.SG^”) the last letter of the object (dog) marks it as accusative, and therefore as the object.
