Panflute and pandoc for parsing qmd and other files
- Quarto markdown is a superset of pandoc markdown. Unrelated, but here's an article about the latter: Everything pandoc Markdown can do
- Pandoc has a python library: Pandoc (Python Library)
- panflute is a more pythonic way to parse docs and write pandoc filters, and I love it: User guide — panflute 2.3.0 documentation
I was missing a way to recursively find elements matching a condition in panflute, so:
from typing import Callable

from panflute import Element


def _recursively_find_elements(
    element: Element | list[Element], condition: Callable
) -> list[Element]:
    """Return panflute element(s) and their descendants that match condition."""
    results = []

    # walk() calls action on every descendant and on the element itself
    def action(el, doc):
        if condition(el):
            results.append(el)

    if not isinstance(element, list):
        element = [element]

    for e in element:
        e.walk(action)
    return results
# sample condition
def is_header(e) -> bool:
    cond = e.tag == "Header" and e.level == 2  # and "data-pos" in e.attributes
    return cond
Ah, and to read a document:
import panflute as pf

# markdown: str holding the document source
ddoc = pf.convert_text(
    markdown,
    input_format="commonmark_x+raw_html+bracketed_spans+fenced_divs+sourcepos",
    output_format="panflute",
)
To output readably:
pf.stringify(el).strip()
Pandoc/panflute get line numbers of elements
- input_format has to be commonmark[_x]+sourcepos
- sourcepos isn't too well documented, only w/ commonmark
  - it basically sets el.attributes['data-pos'] a la 126:1-127:1
- line_no always matching what I expect
def _parse_data_pos(p: str) -> tuple[tuple[int, int], tuple[int, int]]:
    """Parse data-pos string to (line, char) for start and end.

    Example: '126:1-127:1' -> ((126, 1), (127, 1))

    Arguments:
        p: data-pos string as generated by commonmark+sourcepos extension.
    """
    start, end = p.split("-")
    start_l, start_c = start.split(":")
    end_l, end_c = end.split(":")
    return (int(start_l), int(start_c)), (int(end_l), int(end_c))