20 Dec 2024

Panflute and pandoc for parsing qmd and other files

Quarto Markdown is a superset of Pandoc Markdown. An article about the latter (not about Quarto itself): Everything pandoc Markdown can do
Pandoc has a Python library: Pandoc (Python Library)
panflute is a more Pythonic way to parse documents and write pandoc filters, and I love it: User guide — panflute 2.3.0 documentation

I missed the ability to recursively find elements matching a condition in panflute, so I wrote this:

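from __future__ import annotations

from collections.abc import Callable

import panflute as pf


def _recursively_find_elements(
    element: pf.Element | list[pf.Element], condition: Callable[[pf.Element], bool]
) -> list[pf.Element]:
    """Return panflute element(s) and their descendants that match condition."""
    results: list[pf.Element] = []

    def action(el: pf.Element, doc: pf.Doc | None) -> None:
        # Only collect matches; returning None leaves the tree unchanged
        if condition(el):
            results.append(el)

    if not isinstance(element, list):
        element = [element]

    for e in element:
        # walk() visits e itself and all of its descendants
        e.walk(action)

    return results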

# Sample condition: level-2 headers
def is_header(e) -> bool:
    return e.tag == "Header" and e.level == 2  # and "data-pos" in e.attributes
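Since a condition is just a callable that takes an element, conditions like is_header can also be generated. A minimal sketch: make_header_condition is a hypothetical helper, not part of panflute. It only relies on the .tag, .level and .attributes fields, so duck-typed SimpleNamespace stand-ins are enough to try it without pandoc installed.

```python
from types import SimpleNamespace
from typing import Any, Callable


def make_header_condition(level: int, require_pos: bool = False) -> Callable[[Any], bool]:
    """Hypothetical factory: build a condition matching headers of one level.

    If require_pos is True, also require a data-pos attribute
    (only present when parsing with the +sourcepos extension).
    """

    def condition(e: Any) -> bool:
        # getattr() keeps non-Header elements (which have no .level) from raising
        if getattr(e, "tag", None) != "Header" or getattr(e, "level", None) != level:
            return False
        return not require_pos or "data-pos" in getattr(e, "attributes", {})

    return condition


# Stand-ins with the same field names as panflute elements, for a pandoc-free check
h2 = SimpleNamespace(tag="Header", level=2, attributes={"data-pos": "3:1-4:1"})
h2_no_pos = SimpleNamespace(tag="Header", level=2, attributes={})
h3 = SimpleNamespace(tag="Header", level=3, attributes={})
para = SimpleNamespace(tag="Para")

is_h2 = make_header_condition(2)
is_h2_with_pos = make_header_condition(2, require_pos=True)
print([is_h2(e) for e in (h2, h2_no_pos, h3, para)])  # [True, True, False, False]
print([is_h2_with_pos(e) for e in (h2, h2_no_pos)])  # [True, False]
```

With real elements this would plug straight into the finder, e.g. `_recursively_find_elements(ddoc, make_header_condition(2, require_pos=True))`.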

Ah, and to read (parse) Markdown text:

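import panflute as pf

# markdown: the document's source text as a str
ddoc = pf.convert_text(
    markdown,
    input_format="commonmark_x+raw_html+bracketed_spans+fenced_divs+sourcepos",
    output_format="panflute",
)
# Without standalone=True this is a list of block elements, not a pf.Doc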

To get an element's plain text in readable form:

pf.stringify(el).strip()

Pandoc/panflute get line numbers of elements

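input_format has to be commonmark[_x]+sourcepos:
- sourcepos isn't well documented and only works with commonmark (and commonmark_x)
- it basically sets el.attributes['data-pos'] to a value like 126:1-127:1 (start line:column, end line:column); per the pandoc manual, elements that can't carry attributes get wrapped in a Div or Span with a data-pos attribute instead
- the line numbers have always matched what I expected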
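Forgetting +sourcepos just yields elements without data-pos, so a quick check of the format string can help. A minimal sketch, assuming pandoc's FORMAT+EXT-EXT syntax and that later toggles override earlier ones; parse_format and has_sourcepos are hypothetical helpers, and only the two readers named above are accepted.

```python
import re

# Readers named in this note as supporting sourcepos (assumption: only these are checked)
SOURCEPOS_READERS = {"commonmark", "commonmark_x"}


def parse_format(fmt: str) -> tuple[str, set[str]]:
    """Split e.g. 'commonmark_x+raw_html-smart' into the base format and the set
    of explicitly enabled extensions (later +/- toggles override earlier ones)."""
    # Zero-width split before every '+' or '-': ['commonmark_x', '+raw_html', '-smart']
    base, *toggles = re.split(r"(?=[+-])", fmt)
    enabled: set[str] = set()
    for toggle in toggles:
        name = toggle[1:]
        if toggle.startswith("+"):
            enabled.add(name)
        else:
            enabled.discard(name)
    return base, enabled


def has_sourcepos(fmt: str) -> bool:
    base, enabled = parse_format(fmt)
    return base in SOURCEPOS_READERS and "sourcepos" in enabled


print(has_sourcepos("commonmark_x+raw_html+bracketed_spans+fenced_divs+sourcepos"))  # True
print(has_sourcepos("markdown+sourcepos"))  # False
```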

def _parse_data_pos(p: str) -> tuple[tuple[int, int], tuple[int, int]]:
    """Parse a data-pos string to (line, column) for start and end.

    Example: '126:1-127:1' -> ((126, 1), (127, 1))

    Arguments:
        p: data-pos string as generated by the commonmark+sourcepos extension.
    """
    start, end = p.split("-")
    start_l, start_c = start.split(":")
    end_l, end_c = end.split(":")
    return (int(start_l), int(start_c)), (int(end_l), int(end_c))
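With the positions parsed, the matching source text can be cut out of the original string. A minimal sketch with a hypothetical source_for_data_pos helper; it assumes 1-based lines and columns, an exclusive end (so that 126:1-127:1 covers exactly line 126), a single range without any file-name prefix, and columns counted in characters, which may not hold for tabs.

```python
def source_for_data_pos(text: str, data_pos: str) -> str:
    """Hypothetical helper: return the part of `text` covered by a data-pos value.

    Assumes 1-based 'line:col-line:col', an exclusive end, and a single range.
    """
    start, end = data_pos.split("-")
    start_l, start_c = (int(x) for x in start.split(":"))
    end_l, end_c = (int(x) for x in end.split(":"))

    # Offset of the first character of each line (splitting on '\n' only);
    # the extra trailing entry keeps 'last line + 1, column 1' a valid end position.
    offsets = [0]
    for line in text.split("\n"):
        offsets.append(offsets[-1] + len(line) + 1)

    return text[offsets[start_l - 1] + start_c - 1 : offsets[end_l - 1] + end_c - 1]


markdown = "# Title\n\n## Section\nBody\n"
print(repr(source_for_data_pos(markdown, "3:1-4:1")))  # '## Section\n'
print(repr(source_for_data_pos(markdown, "1:3-1:8")))  # 'Title'
```

For a header found with the finder above, this would be `source_for_data_pos(markdown, el.attributes["data-pos"])`.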

In the middle of the desert I can say whatever I want.

serhii.net
