serhii.net

In the middle of the desert you can say anything you want

06 Oct 2023

Pandas aggregation with multiple columns and/or functions

One way to do it, if every function should be applied to every selected column:

df.groupby("collection")[
    ["num_pages", "num_chars", "num_tokens", "num_sentences"]
].agg(
    [
        # "count",
        "sum",
        "mean",
        # "std",
    ]
)
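For a quick feel of what that returns, here is a minimal sketch on a toy DataFrame. The column names follow the snippet above, but the values are made up, and the flattening step at the end is just one option I'm assuming is useful, not part of the original code. The result has two-level columns, one level for the source column and one for the function.

```python
import pandas as pd

# Toy data; the values are invented for illustration only.
df = pd.DataFrame(
    {
        "collection": ["a", "a", "b"],
        "num_pages": [10, 20, 5],
        "num_tokens": [1000, 3000, 400],
    }
)

# Apply every function in the list to every selected column.
stats = df.groupby("collection")[["num_pages", "num_tokens"]].agg(["sum", "mean"])

# The columns are a two-level MultiIndex of (source column, function).
print(stats.columns.tolist())

# Optional: flatten to single-level names such as "num_pages_sum".
stats.columns = ["_".join(col) for col in stats.columns]
print(stats)
```

The MultiIndex is fine for eyeballing, but it gets awkward once you want to refer to a single result column by name.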

An even better way, which picks a function per column and names the output columns directly:

# ...
].agg(
    num_documents=("num_pages", "count"),
    num_pages=("num_pages", "sum"),
    mean_pages=("num_pages", "mean"),
    mean_tokens=("num_tokens", "mean"),
)

They are literally named tuples: each keyword names an output column, and its value is a (column, aggfunc) pair. Yay for Named Aggregation!
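A self-contained sketch of the same idea, in case the elided snippet above is hard to run as-is. The toy values are made up, and I'm calling `.agg()` straight on the groupby without the column selection, which is my simplification. The second call spells the pairs out with `pd.NamedAgg`, which should give the same result as plain tuples.

```python
import pandas as pd

# Toy data; column names follow the post, values are invented.
df = pd.DataFrame(
    {
        "collection": ["a", "a", "b"],
        "num_pages": [10, 20, 5],
        "num_tokens": [1000, 3000, 400],
    }
)

# Keyword = output column name, value = (input column, aggregation function).
with_tuples = df.groupby("collection").agg(
    num_documents=("num_pages", "count"),
    num_pages=("num_pages", "sum"),
    mean_pages=("num_pages", "mean"),
    mean_tokens=("num_tokens", "mean"),
)

# Same aggregation, with the two fields named explicitly via pd.NamedAgg.
with_namedagg = df.groupby("collection").agg(
    num_documents=pd.NamedAgg(column="num_pages", aggfunc="count"),
    num_pages=pd.NamedAgg(column="num_pages", aggfunc="sum"),
    mean_pages=pd.NamedAgg(column="num_pages", aggfunc="mean"),
    mean_tokens=pd.NamedAgg(column="num_tokens", aggfunc="mean"),
)

print(with_tuples)
```

The output has flat, human-chosen column names, so there is no MultiIndex to flatten afterwards.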
