Pandas: set column value based on a filter (incl. groupby)
TL;DR: `df.loc[row_indexer, col_indexer] = value`. `col_indexer` can be a column that doesn't exist yet! And `row_indexer` can be anything, including a condition based on a groupby filter.
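A minimal sketch of the TL;DR on toy data (the DataFrame and column names here are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# 'b' does not exist yet - .loc creates it on assignment,
# leaving the unselected rows as NaN
df.loc[df["a"] > 1, "b"] = True
print(df)
```

Rows where the condition is False get NaN in the new column, which is exactly the annoyance worked around further below.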
Below, the groupby filter has `dropna=False`, which also returns the rows that don't match the filter (as NaN), giving a Series with the same index as the main df:
# E.g. this groupby filter - NB. dropna=False
df_item.groupby(['item.item_id']).filter(lambda x: x.items_available.max() > 0, dropna=False)['item.item_id']

# Then we use that in the condition; nice arbitrary example with `item.item_id` not being the index of the DF
mask = df_item['item.item_id'] == df_item.groupby(['item.item_id']).filter(
    lambda x: x.items_available.max() > 0, dropna=False
)['item.item_id']
df_item.loc[mask, 'item_active'] = True
I'm not sure whether this is the "best" way to incorporate groupby results, but it seems to work OK for now. One annoyance: the remaining rows end up with NaN instead of False. This can be worked around, though it's ugly:
df_item['item_active'] = df_item['item_active'].notna()
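The whole pattern above can be reproduced end-to-end on toy data (column names as in the post, values invented):

```python
import pandas as pd

df_item = pd.DataFrame({
    "item.item_id": ["a", "a", "b", "b"],
    "items_available": [0, 3, 0, 0],
})

# Groups whose max(items_available) > 0; dropna=False keeps the
# non-matching rows as NaN, so the result aligns with df_item's index
matching = df_item.groupby(["item.item_id"]).filter(
    lambda x: x.items_available.max() > 0, dropna=False
)["item.item_id"]

# Matching rows get True, the rest get NaN in the new column...
df_item.loc[df_item["item.item_id"] == matching, "item_active"] = True
# ...and the ugly NaN -> False workaround
df_item["item_active"] = df_item["item_active"].notna()
print(df_item)
```

A shorter alternative for this particular case would be `df_item.groupby('item.item_id')['items_available'].transform('max') > 0`, which yields an index-aligned boolean Series directly, with no NaN workaround needed.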
# For plotting purposes
sns.histplot(data=df_item.notna(), ... )
Pandas Filter by Column Value - Spark By {Examples} has more examples of conditions:
# From https://sparkbyexamples.com/pandas/pandas-filter-by-column-value/
df.loc[df['Courses'] == value]
df.loc[df['Courses'] != 'Spark']
df.loc[df['Courses'].isin(values)]
df.loc[~df['Courses'].isin(values)]
df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000)]
df[df["Courses"] == 'Spark']
df[df['Courses'].str.contains("Spark")]
df[df['Courses'].str.lower().str.contains("spark")]
df[df['Courses'].str.startswith("P")]
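A few of those conditions, runnable on an invented `Courses` DataFrame (data made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [20000, 25000, 26000],
    "Discount": [1000, 2300, 1500],
})

# Substring match - catches both "Spark" and "PySpark"
spark_rows = df[df["Courses"].str.contains("Spark")]

# Range condition combined with & (each clause needs its own parentheses)
mid_discount = df.loc[(df["Discount"] >= 1000) & (df["Discount"] <= 2000)]

# Prefix match
p_courses = df[df["Courses"].str.startswith("P")]
```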