Pandas aggregation with multiple columns and/or functions
One way to do it, if you want every function applied to every selected column:
df.groupby("collection")[
    ["num_pages", "num_chars", "num_tokens", "num_sentences"]
].agg(
    [
        # "count",
        "sum",
        "mean",
        # "std",
    ]
)
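To see what this form returns, here is a minimal sketch on a toy frame. The data values are invented for illustration, and only two of the columns are used; the column names follow the snippet above. The point is that the result's columns come back as a two-level index of (column, function) pairs.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "collection": ["a", "a", "b"],
        "num_pages": [10, 20, 5],
        "num_tokens": [100, 300, 50],
    }
)

# Apply every function in the list to every selected column.
stats = df.groupby("collection")[["num_pages", "num_tokens"]].agg(["sum", "mean"])

# The columns are a MultiIndex of (column, function) pairs.
print(stats.columns.tolist())

# A single cell is addressed with the group label and the (column, function) pair.
print(stats.loc["a", ("num_pages", "mean")])
```

That two-level column index is handy for a quick look, but it gets awkward once you want to rename or pick individual statistics.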
An even better way:
# ...
].agg(
    num_documents=("num_pages", "count"),
    num_pages=("num_pages", "sum"),
    mean_pages=("num_pages", "mean"),
    mean_tokens=("num_tokens", "mean"),
)
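Each keyword becomes the name of an output column, and each value is a (column, function) tuple. pandas also provides pd.NamedAgg, documented as a namedtuple with the fields column and aggfunc, so they can be literally named tuples! Yay for Named Aggregation!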
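For a version that runs end to end, here is a small sketch of named aggregation on a toy frame. The data values are invented for illustration; the output names mirror the ones above. It mixes plain tuples with pd.NamedAgg to show that both spellings work in the same call.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "collection": ["a", "a", "b"],
        "num_pages": [10, 20, 5],
        "num_tokens": [100, 300, 50],
    }
)

summary = df.groupby("collection").agg(
    # keyword = output column name, value = (input column, function)
    num_documents=("num_pages", "count"),
    num_pages=("num_pages", "sum"),
    # pd.NamedAgg is the explicit spelling of the same (column, aggfunc) pair
    mean_tokens=pd.NamedAgg(column="num_tokens", aggfunc="mean"),
)

# The result has flat, already-renamed columns.
print(summary.columns.tolist())
print(summary)
```

Compared with passing a list of functions, the result here has a single level of column names, chosen by you, and each output can draw on a different input column.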