In the middle of the desert you can say anything you want

15 May 2023

Pandas: set column value based on a (incl. groupby) filter


df.loc[row_indexer, col_indexer] = value

col_indexer can be a non-existing-yet column! And row_indexer can be anything, including based on a groupby filter.
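A minimal sketch of that, with made-up column names (`price`, `on_sale`) for illustration:

```python
import pandas as pd

df = pd.DataFrame({"item": ["a", "b", "c"], "price": [10, 0, 5]})

# 'on_sale' doesn't exist yet; .loc creates it.
# Rows not matched by the row_indexer are left as NaN, not False.
df.loc[df["price"] > 0, "on_sale"] = True
print(df)
```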

Below, the groupby filter uses dropna=False, which also returns the rows that don’t match the filter (filled with NaN), giving a Series with the same index as the main df:

# E.g. this groupby filter - NB.  dropna=False
df_item.groupby(['item.item_id']).filter(lambda x:x.items_available.max()>0, dropna=False)['item.item_id']

# Then we use that in the condition; a nicely arbitrary example, since `item.item_id` is not the index of the DF
df_item.loc[df_item['item.item_id']==df_item.groupby(['item.item_id']).filter(lambda x:x.items_available.max()>0, dropna=False)['item.item_id'],'item_active'] = True
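To see what dropna=False actually returns, here is a toy frame mimicking the shape of df_item above (the data itself is made up):

```python
import pandas as pd

df_item = pd.DataFrame({
    "item.item_id":    [1, 1, 2, 2],
    "items_available": [0, 3, 0, 0],
})

# With dropna=False, rows of groups that fail the predicate are kept
# as NaN, so the result has the same index as df_item
filtered = df_item.groupby(["item.item_id"]).filter(
    lambda x: x.items_available.max() > 0, dropna=False
)["item.item_id"]
print(filtered)
```

Group 1 has a row with items_available > 0, so its rows survive; group 2’s rows come back as NaN, which is what makes the `==` comparison against the original column line up row-by-row.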

I’m not sure whether this is the “best” way to incorporate groupby results, but it seems to work OK for now.

One wrinkle: the remaining rows end up with NaN instead of False. This can be worked around, but it’s ugly:

df_item['item_active'] = df_item['item_active'].notna()
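The notna() trick works because the only non-NaN values in the column are the Trues we just assigned. A standalone sketch (the Series values mirror the state after the .loc assignment above):

```python
import pandas as pd
import numpy as np

# State after the .loc assignment: matched rows True, the rest NaN
s = pd.Series([True, True, np.nan, np.nan], name="item_active")

# notna(): True -> True, NaN -> False
item_active = s.notna()
print(item_active.tolist())  # [True, True, False, False]
```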

# For plotting purposes
sns.histplot(data=df_item.notna(), ... )

Pandas Filter by Column Value - Spark By {Examples} has more examples of conditions:

# From the linked page:

df.loc[df['Courses'] == value]
df.loc[df['Courses'] != 'Spark']
df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000 )]

df[df["Courses"] == 'Spark'] 
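A runnable version of one of those conditions, on a toy frame matching the column names in the snippets (the data is made up):

```python
import pandas as pd

df = pd.DataFrame({
    "Courses":  ["Spark", "PySpark", "Hadoop"],
    "Fee":      [22000, 25000, 23000],
    "Discount": [1000, 2300, 1500],
})

# Combined conditions need parentheses around each clause,
# and & / | instead of Python's and / or
hits = df.loc[(df["Discount"] >= 1000) & (df["Discount"] <= 2000)]
print(hits["Courses"].tolist())  # ['Spark', 'Hadoop']
```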