In the middle of the desert you can say anything you want
The Radar chart and its caveats: “radar or spider or web chart” (c)
… are best done in plotly:
For a log axis:
fig.update_layout(
    template=None,
    polar=dict(
        radialaxis=dict(type="log"),
    ),
)
EDIT: for removing excessive margins, use
fig.update_layout( margin=dict(l=20, r=20, t=20, b=20), )
Trivial option: Label data points with Seaborn & Matplotlib | EasyTweaks.com
TL;DR
# x_position / y_position: sequences with the point coordinates (assumed)
for i, label in enumerate(data_labels):
    ax.annotate(label, (x_position[i], y_position[i]))
BUT! Overlapping texts are sad:
SO sent me to the library Home · Phlya/adjustText Wiki and it’s awesome
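import matplotlib.pyplot as plt
from adjustText import adjust_text

# x, y: sequences with the point coordinates
fig, ax = plt.subplots()
plt.plot(x, y, 'bo')
texts = [plt.text(x[i], y[i], 'Text%s' % i, ha='center', va='center') for i in range(len(x))]
# adjust_text(texts)
adjust_text(texts, arrowprops=dict(arrowstyle='->', color='red'))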
Not perfect but MUCH cleaner:
More advanced tutorial: adjustText/Examples.ipynb at master · Phlya/adjustText · GitHub
PyPI doesn’t have the latest version, which has:
min_arrow_len
expand
How To Apply Conditional Formatting Across An Entire Row:
$A$1 is a direct reference to A1 that won’t move if the formula is applied to a range
ISBLANK(..) means the cell is empty
AND(c1,c2,...,cN), OR(c1,c2,...,cN)
=$U1=1 is “if U of the current row is equal to 1” (then you can color the entire row green or whatever); the $ pins the column while the row number stays relative
This contains the entire list of all datasets I care about RE [230529-1413 Plants datasets taxonomy] for 230507-2308 230507-1623 Plants paper notes
GBIF
Pl@ntNet
Flora-On: Flora de Portugal Interactiva. (2023). Sociedade Portuguesa de Botânica. www.flora-on.pt. Consulta efectuada em 29-5-2023.
@herediaLargeScalePlantClassification2017 (2017)
iNaturalist-xxx
@vanhornINaturalistSpeciesClassification2018 (2018)
INaturalist
Flora Incognita
we curated a partly crowd-sourced image dataset, comprising 50,500 images of 101 species.
2494 observations with 3199 images from 588 species, 365 genera and 89 families
PlantCLEF
REALLY NICE OVERVIEW PAPER with really good overview of the existing datasets! Frontiers | Plant recognition by AI: Deep neural nets, transformers, and kNN in deep embeddings
Flavia
Datasets | The Leaf Genie has list of leaf datasets! TODO
Herbarium 2021
@delutioHerbarium2021HalfEarth2021 (2021)
pip install jupyter-black
To load:
%load_ext jupyter_black
It will automatically format all correct python code in the cells!
NB works much, much better with jupyterlab, in the notebook version it first executes the cell, then does black and hides cell output. It does warn about that everywhere though.
Old code I wrote for making ds.corr() more readable. I have looked for it three times already, ergo its place is here.
Basically: removes all small correlations, and optionally plots a colorful heatmap of that.
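from typing import Optional

import numpy as np
import pandas as pd

def plot_corr(res: pd.DataFrame) -> None:
    import seaborn as sns  # only needed when plotting
    sns.heatmap(res, annot=True, fmt=".1f", cmap="coolwarm")

def get_biggest_corr(ds_corr: pd.DataFrame, limit: float = 0.8, remove_diagonal: bool = True, remove_nans: bool = True, plot: bool = False) -> Optional[pd.DataFrame]:
    # keep only correlations stronger than +/- limit, the rest becomes NaN
    res = ds_corr[(ds_corr > limit) | (ds_corr < -limit)]
    if remove_diagonal:
        # blank out the self-correlations (works on a copy, not on ds_corr)
        res = res.mask(np.eye(len(res), dtype=bool))
    if remove_nans:
        # drop rows/columns that have no strong correlation left
        res = res.dropna(how='all', axis=0)
        res = res.dropna(how='all', axis=1)
    if plot:
        plot_corr(res)  # returns None, so a notebook shows only the heatmap
    else:
        return res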
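A quick way to see what the filtering does is to run it on a toy frame. The sketch below is not the function above but a compact stand-in with a hypothetical name (`strong_correlations`) and made-up data, with plotting left out; it only assumes pandas and numpy are installed.

```python
import numpy as np
import pandas as pd


def strong_correlations(corr: pd.DataFrame, limit: float = 0.8) -> pd.DataFrame:
    """Hypothetical compact stand-in for get_biggest_corr(), without plotting."""
    strong = corr.where(corr.abs() > limit)                # weak values -> NaN
    strong = strong.mask(np.eye(len(strong), dtype=bool))  # diagonal -> NaN
    return strong.dropna(how="all", axis=0).dropna(how="all", axis=1)


# made-up data: b follows a closely, c is a negated, d is unrelated noise
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 13],
    "c": [-1, -2, -3, -4, -5, -6],
    "d": [3, 1, 4, 1, 5, 2],
})

strong = strong_correlations(df.corr(), limit=0.8)
print(strong.round(2))
```

Column `d` correlates with nothing strongly enough, so its row and column disappear, and what is left is a small table that is much easier to scan than the full matrix.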
I like seaborn, but I kept googling the same things and could never find any internal ‘consistency’ in it. That led to a lot of small unsystematic posts (listed under “Previously” further down), and I still felt I was going in circles. This post is an attempt to actually read the documentation and understand the underlying logic of it all.
I’ll be using the context of my “Informationsvisualisierung und Visual Analytics 2023” HSA course’s “Aufgabe 6: Visuelle Exploration multivariater Daten”, and the dataset given for that task: UCI Machine Learning Repository: Student Performance Data Set:
This data approach student achievement in secondary education of two Portuguese schools. The data attributes include student grades, demographic, social and school related features) and it was collected by using school reports and questionnaires
Goal:
I’m not touching the seaborn.objects interface as the only place I’ve seen it mentioned is the official docu and I’m not sure it’s worth digging into for now.
An introduction to seaborn — seaborn 0.12.2 documentation
import seaborn as sns

# sets the default theme, which looks nice
# and is used in all pics of the tutorial
sns.set_theme()
Overview of seaborn plotting functions — seaborn 0.12.2 documentation:
Functions can be:
axes-level: plot onto a single matplotlib.axes.Axes object and return it
“The axes-level functions are written to act like drop-in replacements for matplotlib functions. While they add axis labels and legends automatically, they don’t modify anything beyond the axes that they are drawn into. That means they can be composed into arbitrarily-complex matplotlib figures with predictable results.”
figure-level: interface with matplotlib through a seaborn object (usually a FacetGrid)
one per plot family, choosing the plot type with the kind=xxx parameter
have col= and row= params that automatically create subplots!
“The figure-level functions wrap their axes-level counterparts and pass the kind-specific keyword arguments (such as the bin size for a histogram) down to the underlying function. That means they are no less flexible, but there is a downside: the kind-specific parameters don’t appear in the function signature or docstring”
Special cases:
sns.jointplot() has one plot with distributions around it and is a JointGrid
sns.pairplot() “visualizes every pairwise combination of variables simultaneously” and is a PairGrid
In the pic above, the figure-level functions are the blocks on top, their axes-level functions - below. (TODO: my version of that pic with the kind=xxx bits added)
The returned seaborn.FacetGrid can be customized in some ways (all examples here from that documentation link).
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip")
g.set_axis_labels("Total bill ($)", "Tip ($)")
g.set_titles(col_template="{col_name} patrons", row_template="{row_name}")
g.set(xlim=(0, 60), ylim=(0, 12), xticks=[10, 30, 50], yticks=[2, 6, 10])
g.tight_layout()
g.savefig("facet_plot.png")
It’s possible to access the underlying matplotlib axes:
g = sns.FacetGrid(tips, col="sex", row="time", margin_titles=True, despine=False)
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip")
g.figure.subplots_adjust(wspace=0, hspace=0)
for (row_val, col_val), ax in g.axes_dict.items():
    if row_val == "Lunch" and col_val == "Female":
        ax.set_facecolor(".95")
    else:
        ax.set_facecolor((0, 0, 0, 0))
And generally access matplotlib stuff:
ax: The matplotlib.axes.Axes when no faceting variables are assigned.
axes: An array of the matplotlib.axes.Axes objects in the grid.
axes_dict: A mapping of facet names to corresponding matplotlib.axes.Axes.
figure: Access the matplotlib.figure.Figure object underlying the grid (formerly fig)
legend: The matplotlib.legend.Legend object, if present.
FacetGrid.set()
(Previously: 230515-2257 seaborn setting titles etc. with matplotlib set)
FacetGrid.set() is used from time to time in the tutorial (e.g. .set(title="My title"), especially in Building structured multi-plot grids) but never explicitly explained; in its documentation, there’s only “Set attributes on each subplot Axes”.
It sets attributes for each subplot’s matplotlib.axes.Axes. Useful ones are:
title for plot title (set_title())
xticks, yticks
set_xlabel(), set_ylabel() (but not sequentially as return value is not the ax)
Axis-level functions “can be composed into arbitrarily complex matplotlib figures”.
Practically:
fig, axs = plt.subplots(2)
sns.heatmap(..., ax=axs[0])
sns.heatmap(..., ax=axs[1])
The documentation has an entire section on it; what follows is mostly rephrasing and stealing screenshots from it.
For axis-level functions, the size of the plot is determined by the size of the Figure it is part of and the axes layout in that figure. You basically use what you would do in matplotlib, the relevant one being:
matplotlib.figure.Figure.set_size_inches()
TL;DR figure-level functions have FacetGrid’s height= and aspect= params (aspect is the width/height ratio; 0.75 means 3 units wide for every 4 units high) that work per subplot.
Figure-level functions’ size has differences:
height and aspect work like this: width = height * aspect
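To make the width = height * aspect rule concrete, here is a small plain-Python sketch. The helper name `facet_grid_size` is made up for illustration, and the result is only the nominal figure size: it assumes each facet gets exactly height * aspect by height inches and ignores anything extra, such as a legend placed outside the grid.

```python
from typing import Tuple


def facet_grid_size(height: float, aspect: float,
                    n_rows: int = 1, n_cols: int = 1) -> Tuple[float, float]:
    """Nominal (width, height) in inches of a grid of equally sized facets."""
    facet_width = height * aspect  # the rule from the text
    return (float(n_cols * facet_width), float(n_rows * height))


# one facet, 4 inches high with aspect 0.75: 3 inches wide
single = facet_grid_size(height=4, aspect=0.75)

# a 2x3 grid of 5-inch square facets
grid = facet_grid_size(height=5, aspect=1, n_rows=2, n_cols=3)

print(single, grid)
```

The point of the per-subplot sizing is that adding a col= or row= variable makes the whole figure grow instead of squeezing the individual plots.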
Blocks doing similar kinds of plots, each with a figure-level function and multiple axis-level ones. Listed in the API reference.
relplot(): scatterplot() (kind="scatter"; the default), lineplot() (kind="line")
And again, the already mentioned special cases, now with pictures:
sns.jointplot() has one plot with distributions around it and is a JointGrid:
sns.pairplot() “visualizes every pairwise combination of variables simultaneously” and is a PairGrid:
The parameters for marks are described better in the tutorial than I ever could: Properties of Mark objects — seaborn 0.12.2 documentation:
TODO my main remaining question is where/how do I set this? Can this be done outside the seaborn.objects interface, which I don’t want to learn?
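Marker size: pass s=30 to the plotting function. (size= would be a column name)
from matplotlib.markers import MarkerStyle

sns.scatterplot(
    style="is_available",
    # marker=MarkerStyle("o", "left"),
    # half-filled circles: left half for True, right half for False
    markers={True: MarkerStyle("o", "left"), False: MarkerStyle("o", "right")},
)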
Controlling figure aesthetics — seaborn 0.12.2 documentation
There are five preset seaborn themes: dark, white, ticks, whitegrid, darkgrid. This picture contains the first four of the above in this order.
set_context()
The tutorial has this: Choosing color palettes — seaborn 0.12.2 documentation with both a theoretical basis about color and stuff, and the “how to set it in your plot”.
TL;DR sns.color_palette(PALETTE_NAME, NUM_COLORS, as_cmap=TRUE_IF_CONTINUOUS)
seaborn.color_palette() returns a list of colors or a continuous matplotlib ListedColormap colormap:
Accepts as palette, among other things:
‘light:<color>’, ‘dark:<color>’, ‘blend:<color>,<color>’
n_colors: will truncate if it’s less than palette colors, will extend/cycle palette if it’s more
as_cmap: whether to return a continuous ListedColormap
desat
You can do .as_hex() to get the list as hex colors.
You can use it as context manager: with sns.color_palette(...): to temporarily change the current defaults.
Matplotlib colormap name + _r gives the reversed colormap (tab10_r).
I needed a colormap where male is blue and female is orange; tab10 has these colors but in reversed order. This is how I got a colormap with the first two colors but reversed:
cm = sns.color_palette("tab10",2)[::-1]
First I generated a color_palette of 2 colors, then reversed the list of tuples it returned.
histplot has different approaches for plotting multiple distributions on the same plot, chosen with multiple=:
multiple="fill"
dodge=True
errwidth=
The error bars around an estimate of central tendency can show one of two general things: either the range of uncertainty about the estimate or the spread of the underlying data around it. These measures are related: given the same sample size, estimates will be more uncertain when data has a broader spread. But uncertainty will decrease as sample sizes grow, whereas spread will not.
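The difference between spread and uncertainty in the quote above is easy to check with numbers. This is a sketch with made-up normally distributed data, using only the standard library: the standard deviation (spread) stays close to the true value of 2 however many samples are drawn, while the standard error of the mean (uncertainty) shrinks as n grows.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def spread_and_uncertainty(n: int):
    """Return (standard deviation, standard error of the mean) for n samples."""
    sample = [random.gauss(10, 2) for _ in range(n)]
    sd = statistics.stdev(sample)  # spread of the data
    se = sd / n ** 0.5             # uncertainty about the mean
    return sd, se


results = {n: spread_and_uncertainty(n) for n in (20, 200, 2000)}
for n, (sd, se) in results.items():
    print(f"n={n:5d}  sd={sd:.2f}  se={se:.2f}")
```

In seaborn 0.12 these two quantities correspond to the "sd" and "se" choices of the errorbar= parameter, so picking one or the other decides which of the two stories the error bars tell.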
pd.sort_index()
annot=True, fmt=".1f"
vmin=/vmax=
Previously: small unsystematic posts about seaborn:
- Architecture-ish:
  - 230515-2257 seaborn setting titles etc. with matplotlib set
  - 230515-2016 seaborn things built on FacetGrid for easy multiple plots
- Small misc:
  - 230428-2042 Seaborn basics
  - 230524-2209 Seaborn visualizing distributions and KDE plots
Visualizing distributions of data — seaborn 0.12.2 documentation:
common_norm=True by default applies the same normalization to the entire distribution. False scales each independently. This is critical in many cases, esp. with stat="probability"
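What the two settings of common_norm mean for stat="probability" can be worked out by hand. The sketch below uses made-up observations and plain Python; it mimics the idea of the two normalizations and is not seaborn’s actual implementation.

```python
from collections import Counter

# made-up (group, value) observations; group "a" is three times larger than "b"
observations = [("a", 1)] * 6 + [("a", 2)] * 3 + [("b", 1)] * 1 + [("b", 2)] * 2

counts = Counter(observations)
total = len(observations)
group_sizes = Counter(group for group, _ in observations)

# like common_norm=True: every bar is divided by the size of the whole dataset
joint = {key: n / total for key, n in counts.items()}

# like common_norm=False: every bar is divided by the size of its own group
independent = {key: n / group_sizes[key[0]] for key, n in counts.items()}

for key in sorted(counts):
    print(key, round(joint[key], 2), round(independent[key], 2))
```

With the joint normalization all bars together sum to 1, so the small group "b" looks tiny; with the independent one each group sums to 1 on its own, which is usually what you want when comparing the shapes of groups of very different sizes.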
Generally: I read the seaborn documentation, esp. the high level architecture things, and a lot of things I’ve been asking myself since forever (e.g. 230515-2257 seaborn setting titles etc. with matplotlib set) have become much clearer - and will be its own post. I love seaborn and it’s honestly worth learning to use well and systematically.
ds = Dataset(...)
ds.set_format("pandas")
There’s cycler, a package. It returns cycles of dicts, finite or infinite:
import seaborn as sns
from cycler import cycler

# list of colors
pal = sns.color_palette("Paired")
# cycler(...) is a finite cycle; calling it, cycler(...)(), gives an infinite one
cols = iter(cycler(color=pal)())
# every time you need a color; each item is a dict like {"color": (r, g, b)}
my_color = next(cols)["color"]
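The finite versus infinite distinction is easiest to see with a tiny palette. A minimal sketch, assuming only that the cycler package is installed and using three made-up color names instead of a seaborn palette:

```python
from cycler import cycler

colors = cycler(color=["red", "green", "blue"])

# iterating the Cycler object itself stops after one pass over the values
finite = list(colors)

# calling the Cycler returns an endless iterator that wraps around
endless = colors()
first_five = [next(endless)["color"] for _ in range(5)]

print(finite)
print(first_five)
```

Each item is a dict keyed by the property name, which is why the color has to be pulled out with ["color"].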