serhii.net

In the middle of the desert you can say anything you want

29 Nov 2021

Attempting to parse Obsidian tags with Templater templates

1503 words, ~6 min read

Obsidian

After more than a month of use, I’m still totally completely absolutely in love with Obsidian.

The standard scenario for any new technology I get excited about is something like

  1. Discover something really cool
  2. Read the entire Internet about it
  3. Try to use it to solve all my problems, to see what sticks
  4. After a week or so, either I stop using it completely, or I keep using it for a narrow use-case. I consider the latter a success.

I think this is the first time the opposite happened: I started to play with Obsidian as a way to quickly sync notes between my computer and phone (better than one-person Telegram groups or Nextcloud or Joplin or…), then started using it to write the Diensttagebuch, then realized I can drag-n-drop pictures/PDFs into it and can use it to keep my vaccination certificate and other stuff I need often on my phone, then oh nice it’s also a markdown editor, and we’re not done yet.

I’d usually avoid closed-source solutions, but it has everything I could ask for except that:

  • a really active community, with blog posts being written, repositories, plugins etc being constantly developed by very different people
  • notes being stored as markdown, which then can be version-controlled and edited using any of my existing tools
    • this is much better than being able to “export” them to markdown, one less step to do (or forget, or easily disable). Fits seamlessly into the backup strategies being used.
  • Downloadable program that works even without an Internet connection, even if Obsidian HQ gets hit by a meteorite
  • Obsidian themselves having paid options1, which means a chance, clear plan and incentive to survive, and clear ways to support them. Better than whatever I’d be able to do with an abandoned open source project I rely on. (That said, I’d love them to open source at least the client at some point.)

This mix of personal and not-so-personal stuff, with both my phone and my laptop as first-class citizens, is something I’ve never had before, and I like it. If I also try to turn it into my published blog, I’m bound to find edge cases, one of them being tags. That said, so far (including this bit) everything I tried to do worked seamlessly.

This post describes my attempt to set up tags in a way that both Obsidian’s native autocompletion/search AND Hugo’s tagging work.

Two types of tags

This blog is written in markdown and converted into a static website with Hugo. Both the Diensttagebuch and the journal take files from specific subfolders in a specific Obsidian vault, and get converted into Hugo-friendly markdown using Obyde.

There are two kinds of tags as a result of that:

  • Obsidian’s #tags in the body of the text. They are the main kind supported, they get autocompleted, various #tag/subtag options can happen, etc. They get ignored by obyde/Hugo during conversion.
  • Front matter tags. They look like this in yaml/obsidian:
    Obsidian metadata
    They are parsed by Obsidian, as in searching by them works, but adding them manually is painful and has no autocompletion. One needs to set the ‘metadata’ to be shown in settings for them to be displayed.
    They are the only ones understood by Obyde/Hugo, though.
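To make the front matter side concrete, here is a minimal sketch of a note that carries both kinds of tags, and of reading only the front matter ones. The sample note, the function names and the hand-rolled parsing are assumptions for illustration; a real script should use a proper YAML parser instead.

```python
# A made-up note: two front matter tags and one #body tag.
SAMPLE_NOTE = """---
title: Example note
tags:
- obsidian
- hugo
---
Body text with a #templater tag that Hugo never sees.
"""


def split_front_matter(text: str) -> tuple[str, str]:
    """Return (front_matter, body); front_matter is '' when there is none."""
    if text.startswith("---\n"):
        end = text.find("\n---", 3)
        if end != -1:
            return text[4:end], text[end + 4:].lstrip("\n")
    return "", text


def front_matter_tags(front_matter: str) -> set[str]:
    """Naively read a block-style `tags:` list (inline lists are not handled)."""
    tags, in_tags = set(), False
    for line in front_matter.splitlines():
        if line.startswith("tags:"):
            in_tags = True
        elif in_tags and line.lstrip().startswith("- "):
            tags.add(line.lstrip()[2:].strip())
        elif line.strip():
            in_tags = False  # any other key ends the tags list
    return tags


front_matter, body = split_front_matter(SAMPLE_NOTE)
print(front_matter_tags(front_matter))
```

The #templater tag in the body is not in the result, which is exactly the gap between the two kinds of tags.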

I decided to write something that takes care of this for me easily.

Obsidian templates and Templater

The logical first step was Templater, which is a plugin for Obsidian I already use for my templating needs, such as generating a YAML frontmatter based only on a filename (see Day 1021).

I wanted:

  • obsidian’s #tags to become part of the yaml frontmatter
  • optionally - adding a line at the end of the file, like “Tags: #one #two #three” with the tags found in the frontmatter but not in the body.

I found a template2 doing pretty much this in the Template Showcase. It uses purely Obsidian’s/Templater’s JS/Templating language and is very close to what I want - but not exactly, and somehow I didn’t want to write more Javascript than needed.

My solution is more of a hack, less portable, and uses Python, because that’s what I know how to write best.

My solution

Python script to parse/add the tags to frontmatter

I wrote a (quick-n-dirty) Python script that:

  • gets the tags from the frontmatter of the input .md file
  • gets the tags found by Obsidian from an environment variable
  • finds tags found by obsidian but not in frontmatter
  • rewrites the .md file with all the tags added to the frontmatter.
from frontmatter import Frontmatter
from pathlib import Path
import os
import argparse
import yaml

ENV_VAR_TAGS = 'tags'
TAGS_LINE = "\nTags: "
IGNORE_TAGS = {"zc"}
TAGS_TO_LOWER = True

def get_args() -> argparse.Namespace:
  parser = argparse.ArgumentParser(description='Add tags found by Obsidian to the YAML front matter')
  parser.add_argument('--input_file', type=Path, help='Input file')
  parser.add_argument('--print', default=False, action="store_true", help='if set, will print to screen instead of rewriting the input file')
  parser.add_argument('--add_new_tagline', default=False, action="store_true", help='if set, will create/edit a "Tags: " line at the end of the file with tags found in front matter but not text body')
  return parser.parse_args()


def str_tags_to_set(tags: str) -> set:
  """
  Converts tags like "#one,#two,#three" into a set
  of string tags without the "#"s:
    {'one', 'two', 'three'}
  """
  def parse_tag(tag: str):
    ret_tag = tag.lower() if TAGS_TO_LOWER else tag
    return ret_tag[1:]

  return_set = set([parse_tag(x) for x in tags.split(",")])
  return return_set

def set_tags_to_str(tags: set, ignore_tags: set = IGNORE_TAGS) -> str:
  """
  The opposite of str_tags_to_set, returns space-separated "#tag1 #tag2 ...".
  Ignores tags that contain even part of any ignore_tags.
  """

  final_tags = ''
  for tag in tags:
    # Skip the tag if any ignored string occurs in it. A `continue` inside
    # a nested loop over ignore_tags would only affect that inner loop.
    if any(i in tag for i in ignore_tags):
      continue
    if TAGS_TO_LOWER:
      tag = tag.lower()
    final_tags+=f"#{tag} "
  return final_tags

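def line_is_tag_line(line: str) -> bool:
  # TAGS_LINE starts with "\n", which a single line never contains,
  # so compare against the stripped prefix ("Tags:").
  return line.startswith(TAGS_LINE.strip())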

def get_tags_in_tagline(tagline: str) -> set:
  words = tagline.split(" ")
  # startswith() is safe for the empty strings that double spaces produce
  return {w for w in words if w.startswith("#")}

def main() -> None:
  args = get_args()
  input_file = args.input_file

  # input tags
  tags_frontmatter = set()
  tags_all = set()
  tags_obs = set()

  # output tags
  missing_obs_tags = set()
  missing_yaml_tags = set()

  parsed_yaml_fm = Frontmatter.read_file(input_file)
  frontmatter_dict = parsed_yaml_fm['attributes']
  post_body = parsed_yaml_fm['body']

  has_tags_in_fm = 'tags' in frontmatter_dict

  # all tags (yaml + #obsidian); an unset or empty variable means no tags
  env_tags_by_obsidian = os.getenv(ENV_VAR_TAGS, "")
  tags_all = str_tags_to_set(env_tags_by_obsidian) if env_tags_by_obsidian else set()

  # tags in yaml frontmatter 
  tags_frontmatter = set()
  if has_tags_in_fm:
    tags_frontmatter = set(frontmatter_dict['tags'])

  # "obsidian" tags (basically #tags in the text)
  tags_obs = tags_all.difference(tags_frontmatter)

  # print(f"{input_file}: \n\
  #     all_tags: {tags_all}\n \
  #     obs_tags: {tags_obs}\n \
  #     fm_tags: {tags_frontmatter}\n"
  #     )

  # tags found in frontmatter but not in #obsidian
  missing_obs_tags = tags_frontmatter.difference(tags_obs)
  # #obsidian tags not found in frontmatter
  missing_fm_tags = tags_obs.difference(tags_frontmatter)

  if missing_fm_tags:
    if not has_tags_in_fm:
      frontmatter_dict['tags'] = list()
    frontmatter_dict['tags'].extend(missing_fm_tags)

  if missing_obs_tags and args.add_new_tagline:
    final_tags = set_tags_to_str(missing_obs_tags)

    # If last line is "Tags: "
    body_lines = post_body.splitlines()
    last_line = body_lines[-1] if body_lines else ""
    if line_is_tag_line(last_line):
      # tags_in_tl = get_tags_in_tagline(last_line)
      # Remove trailing newlines in post body, if there are any
      post_body = post_body.rstrip("\n")
      # Add the missing tags to the last line
      post_body += " " + final_tags + "\n"
    else:
      # If we have no "Tags: " line, we add one
      final_string = TAGS_LINE + final_tags
      post_body+=final_string

  new_fm_as_str = "---\n" + yaml.dump(frontmatter_dict)  + "\n---"
  final_file_content = new_fm_as_str + "\n" + post_body

  # print(final_file_content)
  if args.print:
    print(final_file_content)
  else:
    input_file.write_text(final_file_content)

if __name__ == "__main__":
  main()

Obsidian templates

To use the Python file, I added a “System command user function” 3 named add_tags to Templater, with this code:

python3 _templates/tags/py_add_tags.py --input_file "<% tp.file.path() %>" 

It calls the python file and passes to it the location of the currently edited file.

Then I create the template that calls it:

<% tp.user.add_tags({"tags": tp.file.tags}) %>

tp.file.tags returns all (all) tags found by Obsidian, so #body tags and frontmatter ones.

They get passed to the Python script as an environment variable, which is the canonical way to do this as per the Templater docs4.

Initially I tried to pass them as a parameter, but tp.file.tags passed in the system command user function always returned an empty list.
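On the receiving side, reading that variable could look roughly like the sketch below. It assumes the tags arrive as one comma-separated string such as "#one,#two" (the format the script's docstring describes); the function name tags_from_env and the environ parameter are made up for illustration.

```python
import os

ENV_VAR_TAGS = "tags"  # same name as the key used in the template call


def tags_from_env(environ=os.environ) -> set[str]:
    """Parse '#one,#two' from the environment into {'one', 'two'}.

    An unset or empty variable gives an empty set instead of an error.
    """
    raw = environ.get(ENV_VAR_TAGS, "")
    tags = set()
    for item in raw.split(","):
        tag = item.strip().lstrip("#").lower()
        if tag:
            tags.add(tag)
    return tags


# Passing a plain dict makes it easy to try out without Obsidian.
print(tags_from_env({"tags": "#One,#two,#tag/subtag"}))
```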

Usage

I write the text in Obsidian as usual, adding any #body tags I like, then at the end run that template through the hotkey I bound it to (<C-S-a>), done.

Better ways to do this

Problems with this solution:

  • won’t work on Android
  • too many moving parts

Ways to improve it:

  • It’s possible to do this without Python. The template I found2 uses JS to find the editor window, then the text in it, then the YAML node. Though intuitively I like parsing the YAML as YAML more.
  • I wrote it with further automation in mind, for example running it in deploy.sh so that it parses and edits all the markdown files in one go. This would mean splitting the text into words myself, taking care to group #tags-with/subtags correctly, not looking inside code blocks, etc. Not sure here. I’d still have Obsidian as ultimate source of truth about what it considers a tag and what not.
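For the deploy.sh idea, a rough sketch of finding #tags without Obsidian's help might look like this. The regular expression is only an approximation of what Obsidian treats as a tag (an assumption, not its actual rules), and find_body_tags is a hypothetical name.

```python
import re

# "#" at the start of a word, then a letter or underscore, then word
# characters, "/" or "-". This keeps "#tags-with/subtags" in one piece
# and ignores markdown headings like "# Title".
TAG = re.compile(r"(?<!\S)#([A-Za-z_][\w/-]*)")
INLINE_CODE = re.compile(r"`[^`]*`")


def find_body_tags(markdown: str) -> set[str]:
    """Collect #tags from a note body, skipping fenced and inline code."""
    tags = set()
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # opening or closing fence
            continue
        if in_fence:
            continue
        line = INLINE_CODE.sub(" ", line)
        tags.update(TAG.findall(line))
    return tags
```

Anything this finds would still have to be compared with what Obsidian itself reports, since the two can disagree on edge cases.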

What now?

This is the first long-ish blog post I’ve written in a while. Feels awesome. Let’s see if I can get this blog started again.


  1. Pricing - Obsidian ↩︎

  2. Gather up tags and format for YAML frontmatter · Discussion #140 · SilentVoid13/Templater ↩︎

  3. System Command User Functions | Templater ↩︎

  4. System Command User Functions | Templater ↩︎

26 Mar 2019

Pchr8board - a mirrored left-hand keyboard layout for Dvorak

1055 words, ~4 min read

Pchr8board

Pchr8board is a keyboard layout for Linux that implements Randall Munroe’s Mirrorboard idea for the Dvorak keyboard layout.

It looks like this:

If you are not familiar with Dvorak, it’s basically a better keyboard layout than Qwerty for a variety of reasons, and if you invest 2-3 months it can dramatically improve your typing speed 1. The idea behind Mirrorboard is that you can use your left hand to type the letters that the right hand usually types. It’s much easier for the brain if the location is mirrored.

This is an implementation of that idea for Dvorak, along with a number of small modifications. I use something very similar for my own keyboard layout, with a few further changes (like umlauts in place of some of the mirrored keys).

Keys without a legend don’t have any changes from the usual Dvorak layout.

The labels are to be read like this:

The layout uses left Alt as a Latch key, that, when pressed with left thumb, switches the letters to their corresponding right-hand letters, as per original Mirrorboard.
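As a small illustration of the mirroring, the sketch below spells out which left-hand keys produce a given word. The pairs are copied from the third column of the key definitions in the layout file; the function name and the "Latch+key" notation are made up here, and only letters are handled.

```python
# Left-hand key -> right-hand letter it produces after pressing Latch.
MIRROR = {
    "'": "l", ",": "r", ".": "c", "p": "g", "y": "f",
    "a": "s", "o": "n", "e": "t", "u": "h", "i": "d",
    ";": "z", "q": "v", "j": "w", "k": "m", "x": "b",
}
RIGHT_TO_LEFT = {right: left for left, right in MIRROR.items()}


def left_hand_keys(word: str) -> list[str]:
    """Return the key presses needed to type `word` with the left hand only."""
    keys = []
    for ch in word.lower():
        if ch in RIGHT_TO_LEFT:
            # Right-hand letter: press Latch, then the mirrored key.
            keys.append(f"Latch+{RIGHT_TO_LEFT[ch]}")
        else:
            # Already a left-hand key (or something this sketch ignores).
            keys.append(ch)
    return keys


print(left_hand_keys("the"))  # ['Latch+e', 'Latch+u', 'e']
```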

Other important changed keys are highlighted.

  • Enter and BackSpace are on the Tilde (“~”) key.
  • The Tab key lets you type forward slashes (“/”), mostly for searching, and diaereses (ümläüts) on the letter immediately following Shift+Latch+Tab (awkward at first, but not more so than the typical compose key approach).

The layout is usable as a typical Dvorak one, but I also wanted to add a couple more keys that I miss.

  • On the right-hand side, I’ve added arrow keys and another backspace (light-orange on the picture).

I sorely needed both of those, since they required too much movement in a traditional layout. What’s also nice is that all the keyboard shortcuts still work, that is for the OS it doesn’t make much difference. Selecting words word-by-word using Ctrl+Shift+right_arrow as Ctrl+Shift+Latch+n still works, for example. In practice such chords are much less complicated and easier to get used to than they seem.

At first, I wanted to add the arrow keys to the left hand, but didn’t find a not-awkward way to do this.

Installation instructions:

  • Copy to your key definitions folder (usually /usr/share/X11/xkb/symbols/)
  • Either just setxkbmap left3 or integrate it in whatever you are using (e.g. setxkbmap -option 'grp:rshift_toggle, compose:rctrl' left3,ru,ua)
  • In case you want to edit it, you may have to save it under a new name. Arch used to cache it somehow, and I needed either a reboot or a new name. I don’t have this problem anymore, but someone might. Or, during editing, do xkbcomp mirrorboard.xkb $DISPLAY 2>/dev/null as recommended in the original post, maybe removing the last part to see any errors.

The layout is on Github and copied here:

// Pchr8board, formerly known as Dvorak MirrorBoard (v3), based on MirrorBoard one-hand keymapping

// Original keymap: https://blog.xkcd.com/2007/08/14/mirrorboard-a-one-handed-keyboard-layout-for-the-lazy/
// Changes and details: https://serhii.net/

default  partial alphanumeric_keys modifier_keys
  xkb_symbols   "dvorak-mirrorboard" {

// Using L-Alt as modifier instead of Caps lock.

// Additionally, it's a Latch key, not a Shift one, so pressing it once activates the group. 
    key <LALT> { type[Group1] = "ONE_LEVEL", symbols[Group1] = [ ISO_Level3_Latch ] };

// Mod+Space is return

// Tilde is Backspace by itself, 
// Mod+Tilde is Return 
    key <SPCE> { [ space, space, Return ] };
    key <TLDE> {    [     BackSpace,    asciitilde,    Return,    asciitilde    ]    };

// Mod+Tab gives a slash, which I use often (searching etc.) 
// Mod+Shift+Tab gives an umlaut on the next character

    key  <TAB> {    [ Tab,    ISO_Left_Tab, slash, dead_diaeresis]    };

    key <AD01> { [  apostrophe,    quotedbl, l, L] };
    key <AD02> { [    comma,    less,   r, R] };
    key <AD03> { [      period,    greater, c, C] };
    key <AD04> { [        p,    P, g, G        ]    };
    key <AD05> { [        y,    Y, f, F        ]    };

    key <AC01> { [        a,    A, s, S         ]    };
    key <AC02> { [        o,    O, n, N        ]    };
    key <AC03> { [        e,    E, t, T        ]    };
    key <AC04> { [        u,    U, h, H        ]    };
    key <AC05> { [        i,    I, d, D        ]    };

    key <AB01> { [   semicolon,    colon,z, Z] };
    key <AB02> { [        q,    Q, v, V        ]    };
    key <AB03> { [        j,    J, w, W        ]    };
    key <AB04> { [        k,    K, m, M        ]    };
    key <AB05> { [        x,    X, b, B        ]    };

    key <AE01> {    [      1,    exclam,        0,    parenleft    ]    };
    key <AE02> {    [      2,    at,        9,    parenright    ]    };
    key <AE03> {    [      3,    numbersign,    8,    asterisk    ]    };
    key <AE04> {    [      4,    dollar,        7,    ampersand    ]    };
    key <AE05> {    [      5,    percent,    6,    asciicircum    ]    };

    // Backspace and arrow keys
    key <AD08> { [        c,    C,    Up,     Up    ]    };
    key <AD09> { [        r,    R,    BackSpace,    BackSpace        ]    };
    key <AC07> { [        h,    H,    Left,    Left        ]    };
    key <AC08> { [        t,    T,    Down,    Down   ]    };
    key <AC09> { [        n,    N,    Right,    Right        ]    };

    key <AD06> { [        f,    F          ]    };
    key <AD07> { [        g,    G        ]    };
    key <AD10> { [        l,    L        ]    };
    key <AD11> { [    slash,    question    ]    };
    key <AD12> { [    equal,    plus        ]    };


    key <AC06> { [        d,    D        ]    };
    key <AC10> { [        s,    S        ]    };
    key <AC11> { [    minus,    underscore    ]    };

    key <AB06> { [        b,    B        ]    };
    key <AB07> { [        m,    M        ]    };
    key <AB08> { [        w,    W        ]    };
    key <AB09> { [        v,    V        ]    };
    key <AB10> { [        z,    Z        ]    };

    key <BKSL> { [  backslash,  bar             ]       };


    key <AE06> {    [      6,    asciicircum    ]    };
    key <AE07> {    [      7,    ampersand    ]    };
    key <AE08> {    [      8,    asterisk    ]    };
    key <AE09> {    [      9,    parenleft    ]    };
    key <AE10> {    [      0,    parenright    ]    };
    key <AE11> {    [     bracketleft,    braceleft    ]    };
    key <AE12> {    [     bracketright,    braceright        ]    };

  };

This layout is available on Github.

The following resources helped me:

(Y)

-SH.

 


  1. For typing practice, I recommend aoeu.eu and typingclub.com↩︎

09 Dec 2016

Semantic Mediawiki for personal knowledge management, using templates and a custom userscript

1335 words, ~5 min read

Here I’ll try to document my current setup for links management, which is slowly starting to take form.

How we ended up living like this

Since the social bookmarking site Delicious (old links page) is seriously falling apart (which is very sad, I liked it almost as much as I liked Google Reader), I started looking for alternatives. For some time I used the WordPress LinkLibrary plugin, until I felt the rigid category system lacked flexibility (you can see on the “Links” page of this blog how cluttered and repetitive it is). I needed _tags_ and more ways to organize the links and, possibly, the relationships between them.

Then for a very short time I set up a WordPress installation specifically for links. I was not the first one who attempted this (https://sebastiangreger.net/2014/01/own-your-data-part-1-bookmarks/ as an example), but it did not work out well for me.

As for the existing social bookmarking services, for example http://pinboard.in or http://historio.us/, I did not want to pay and wanted control of my data (thank God the export feature in Delicious worked more often than not, but I don’t want to risk it anymore).

As for the need to “share” it, I want to have access to it from various places and, since there’s nothing private, putting it in the cloud and putting a password on it sounds like an unneeded layer of complication. Lastly — who knows — maybe someone will actually get some use out of it.

Semantic Mediawiki

Mediawiki is the software Wikipedia runs on. Semantic Mediawiki is an open-source extension for it that adds the ability to store and query data on a whole other level.

Semantics means, basically, meaning. The difference between “60”, “60kg”, “My weight is 60kg.”

Traditionally, Mediawiki allows the pages to link to each other, but the exact nature of the connection is not clear, and you can’t use the connections much. Semantic Mediawiki lets you define additional data for every page, and relationships between pages. The data “Benjamin Franklin was born in the USA in 1706” suddenly becomes searchable, for example as “Give me the people born in America before 1800” and “Give me the list of countries where people named Ben were born”. A link “Benjamin Franklin -> Boston” becomes “Benjamin Franklin was (BORN IN) Boston”.
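The difference between a plain link and a typed property can be sketched in a few lines of plain Python. This is not how Semantic Mediawiki works internally, only an illustration of the two example queries; apart from the Franklin entry, the records are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    born_in: str    # typed relation: "was BORN IN"
    born_year: int  # typed value, so it can be compared


PEOPLE = [
    Person("Benjamin Franklin", "USA", 1706),
    Person("Ben Placeholder", "Germany", 1950),  # made-up entry
    Person("Ada Placeholder", "USA", 1815),      # made-up entry
]


def born_in_before(people, country, year):
    """'Give me the people born in <country> before <year>'."""
    return [p.name for p in people
            if p.born_in == country and p.born_year < year]


def countries_of_people_named(people, prefix):
    """'Give me the countries where people named <prefix> were born'."""
    return sorted({p.born_in for p in people if p.name.startswith(prefix)})


print(born_in_before(PEOPLE, "USA", 1800))
print(countries_of_people_named(PEOPLE, "Ben"))
```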

This is awesome.

After looking at it, I understood that I have immense power in my hands, and that I have no idea how to use it. As in, how to create an architecture that was both meaningful and easy to adhere to.

Seeing all this, I thought it would make sense to upgrade my “Link database” to something much more interconnected and useful, a personal knowledge management system.

And here it is.


The system

Take this page.

Every page has 5 values:

  • l: The actual URI
  • t: the title
  • c: the complexity (how easy/hard is it to read; sometimes I just don’t want to think too much), 1 to 10
  • r: the rating, also 1 to 10
  • o: If it’s a page with only one link, around which the content of the page has been built. (As opposed to “Here are 5 links about X”)
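Expressed as a data structure, the five values might look like the sketch below. The class name LinkPage is hypothetical, and the defaults (5, 5, true, title falling back to the URI) are taken from the template shown further down rather than from Semantic Mediawiki itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkPage:
    l: str                   # the actual URI
    t: Optional[str] = None  # the title; falls back to the URI
    c: int = 5               # complexity, 1 to 10
    r: int = 5               # rating, 1 to 10
    o: bool = True           # page built around this single link

    def __post_init__(self):
        if self.t is None:
            self.t = self.l
        for name in ("c", "r"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")


page = LinkPage("https://example.org/article", c=8)
print(page)
```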

Plus, of course, any additional text.

Properties can be set:

1) In the text itself, for example like this:

    [[l::https://plus.maths.org/content/os/issue53/features/hallucinations/index]]
    - [[t::Uncoiling the spiral: Maths and hallucinations.]]

2) Invisibly:

{{#set:
 o=true
 |c=8
}}

3) using the following nice template I’ve written:

http://www.pchr8.net/wiki/index.php?title=Template:B 

    [[l::{{{1}}}]] - [[t::{{{2}}}]]. 
    ----
    Complexity: [[c::{{{3|5}}}]]; Rating: [[r::{{{4|5}}}]]; Is only link: [[o::{{{5|true}}}]]
    
    
    {{#set:
     l={{{1}}}
     |t={{{2|{{{1}}}}}} <!-- If no title given, use URI as name -->
     |c={{{3|5}}} <!-- 5 as default value -->
     |r={{{4|5}}} <!-- 5 as default rating unless something else given -->
     |o={{{5|true}}} <!-- only link by default -->
    }}

which can be used like this:

{{B|
https://www.fastcodesign.com/3043041/evidence/why-our-brains-love-high-ceilings
|Why our brains love high ceilings
|5
|7
}}

My main goal for this was that it should be fast, and fast for me. I can type the above much faster than I can fill in multiple input boxes in a hypothetical GUI.

Then I decided to write some bad javascript to simplify it even more.

The bookmarklet/userscript

An actual bookmarklet is definitely the next thing I'll be doing; until then I'll be adding the pages manually.

But I wrote a small script (two years since I've used any Javascript, haha), to minimize the text above to just this:

https://www.fastcodesign.com/3043041/evidence/why-our-brains-love-high-ceilings
Why our brains love high ceilings
5
7

The (badbadbad) Javascript code is the following:

var lines = $('#wpTextbox1').val().split('\n');

for (var i=0; i<5; i++) {
if (typeof lines[i] == 'undefined') {lines[i]='';}
}

if (!ValidURL(lines[0])) {alert(lines[0]+" doesn't look like a valid URL.")};
if (lines[1]=='') {lines[1]=lines[0]};
if (lines[2]=='') {lines[2]='5'};
if (lines[3]=='') {lines[3]='5'};

if (parseInt(lines[2]) > 10 || parseInt(lines[2])<0 || isNaN(lines[2])) {
alert(lines[2]+' is not a valid value, setting to default 5');
lines[2]='5';
}

if (parseInt(lines[3]) > 10 || parseInt(lines[3])<0 || isNaN(lines[3])) {
alert(lines[3]+' is not a valid value, setting to default 5');
lines[3]='5';
}

var text="{{B|\n"+lines[0]+"\n|"+lines[1]+"\n|"+lines[2]+"\n|"+lines[3];
if (lines[4]!='') text+="\n|"+lines[4];
text+="\n}}";

var field = document.getElementById('wpTextbox1');
var textArray = field.value.split("\n");
textArray.splice(0, 4);
textArray[0] = text;
field.value = textArray.join("\n");

function ValidURL(str) {
var pattern = new RegExp('^(https?:\\/\\/)?'+
'((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.?)+[a-z]{2,}|'+
'((\\d{1,3}\\.){3}\\d{1,3}))'+
'(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*'+
'(\\?[;&a-z\\d%_.~+=-]*)?'+
'(\\#[-a-z\\d_]*)?$','i');
return pattern.test(str);
}

The minimized variant of the above now sits nice in my bookmarks bar, and is bound to a keypress in cvim. So I can fill just the URI, and it sets everything else to some default values and adds the Mediawiki template formatting.
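The same transformation, ported to Python as a hedged sketch: it mirrors what the script above does (defaults of 5, values from 0 to 10 accepted, title falling back to the URI) but leaves out the URL check and the browser parts. The function names are made up for this example.

```python
def _score(value: str, default: str = "5") -> str:
    """Return the value if it is an integer from 0 to 10, else the default."""
    try:
        number = int(value)
    except ValueError:
        return default
    return str(number) if 0 <= number <= 10 else default


def to_template_call(raw: str) -> str:
    """Turn 'URI / title / complexity / rating [/ only-link]' lines into {{B|...}}."""
    lines = [line.strip() for line in raw.split("\n")]
    lines += [""] * (5 - len(lines))  # pad so that all five fields exist
    url, title, complexity, rating, only_link = lines[:5]
    parts = [
        "{{B|",
        url,
        "|" + (title or url),  # no title given: use the URI
        "|" + _score(complexity),
        "|" + _score(rating),
    ]
    if only_link:
        parts.append("|" + only_link)
    parts.append("}}")
    return "\n".join(parts)


print(to_template_call("https://example.org/article\nAn example title\n5\n7"))
```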

TODO:

  1. Getting the page title automagically (see http://stackoverflow.com/questions/10793585/how-to-pick-the-title-of-a-remote-webpage), for which I'll need a PHP backend. It would also be interesting to check from the PHP whether the IP making the request is currently logged in to my wiki, and get the title only then, to prevent abuse
  2. Making a bookmarklet which populates automatically most of the fields, like my old Delicious bookmarklet (sigh.)
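The title-extraction part of the first TODO item could be sketched like this, in Python instead of the PHP mentioned above and using only the standard library. Fetching the page and the logged-in check are left out; extract_title and TitleParser are hypothetical names.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.title = None
        self._in_title = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            # Collapse newlines and repeated spaces inside the title.
            self.title = " ".join("".join(self._chunks).split())


def extract_title(html: str):
    """Return the page title, or None if the document has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


print(extract_title("<html><head><title>An example page</title></head></html>"))
```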

Searching the wiki

The search in Semantic Mediawiki is explained pretty well here. Now I can do neat things like "Give me the pages in the Category 'To read' with complexity < 4". And lastly, categories can be inside other categories. If X is in category A, which is a subcategory of B, it still shows up in searches for category B. (example) Pretty nice!
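The subcategory behaviour can be illustrated with a toy model in Python. This is only a sketch of the idea, not Semantic Mediawiki's query engine: the category names other than 'To read', the pages, and the one-parent-per-category simplification are all assumptions.

```python
# Hypothetical category tree: child -> parent.
PARENTS = {"To read": "Reading lists", "Reading lists": "Everything"}

# Hypothetical pages with their complexity value "c".
PAGES = [
    {"title": "Easy article", "categories": ["To read"], "c": 3},
    {"title": "Hard article", "categories": ["To read"], "c": 8},
    {"title": "Unrelated", "categories": ["Music"], "c": 2},
]


def with_ancestors(category, parents):
    """Return the category plus every category it is (indirectly) inside."""
    seen = set()
    while category is not None and category not in seen:
        seen.add(category)
        category = parents.get(category)
    return seen


def search(pages, category, parents, max_complexity=None):
    """Pages in `category` or its subcategories, optionally with c < max_complexity."""
    hits = []
    for page in pages:
        all_categories = set()
        for cat in page["categories"]:
            all_categories |= with_ancestors(cat, parents)
        if category in all_categories and (
            max_complexity is None or page["c"] < max_complexity
        ):
            hits.append(page["title"])
    return hits


print(search(PAGES, "To read", PARENTS, max_complexity=4))
print(search(PAGES, "Everything", PARENTS))
```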

Knowledge Management

Things I want to learn or will probably need pretty often will have their own pages, like the Formulating Knowledge page. Simply because interacting with the material always helps much more than just reading it. Also I like that it will be represented in a way relevant for me, without unnecessary data and with additional material I think should be there.

For the link pages, there will be the link + very short summary (it has been working pretty well) + a couple of thoughts about it, + maybe relevant data or links to other pages.

TODO: Quotes + Move there my "To Read" / "To Listen to" lists. Also think of a better name for it.

Why?

Warum einfach, wenn es auch kompliziert geht? (A nice German phrase about avoiding the unbearable simplicity of being: "Why simple, when it can be complicated as well?")

On a serious note, I don't have any doubts that in the long run I'll be thankful for this system.

Firstly, I control all of this data. Feels good. Take that, capitalist ad-ridden surveillance corporations!

Secondly, working with a lot of information has always been something I do often and enjoy immensely, and it would make sense to start accumulating everything in one place. Every day I stumble upon a lot of material on the Internet, of very different nature, and with not-obvious connections between them. I have more interests than I can count.

Organizing everything like this so far looks to me like the best alternative, and I'm reasonably certain it will work out. There's a lot that can be improved, and I think in a couple of months it will morph into something awesome.

Finding ways to use all the accumulated data is a topic for another day.

(Y)


A couple of nice relevant inspiring places:

http://yourcmc.ru/wiki/  - in Russian, a person using Mediawiki as central hub for everything.

http://konigi.com/wiki/  - personal wiki, mostly design.

http://thingelstad.com/2012/bookmarking-with-semantic-mediawiki/ a much more advanced version of what I'm trying to do, also using Semantic Mediawiki. I should drop him a line :)