In the middle of the desert you can say anything you want
When saving seaborn images there was weirdness going on, with borders either cutting off labels or being too big.
# bad: cut corners
ax.figure.savefig("inat_pnet_lorenz.png")
# good: no cut corners and OK bounding box
ax.figure.savefig("inat_pnet_lorenz.png", bbox_inches="tight")
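A minimal self-contained sketch of the same comparison, using plain matplotlib instead of seaborn and writing to memory instead of a file. The figure contents, the labels and the `save_png` helper are made up for illustration.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt


def save_png(fig, tight):
    """Render the figure to PNG bytes, optionally with bbox_inches='tight'."""
    buf = io.BytesIO()
    kwargs = {"bbox_inches": "tight"} if tight else {}
    fig.savefig(buf, format="png", **kwargs)
    return buf.getvalue()


# Small figure with long labels: the kind of plot where labels get cut
fig, ax = plt.subplots(figsize=(3, 2))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel("a long x label\nspanning two lines")
ax.set_ylabel("a long y label")

default_png = save_png(fig, tight=False)  # fixed canvas, labels may be clipped
tight_png = save_png(fig, tight=True)     # bounding box fitted to the content
plt.close(fig)
```

Writing both variants to disk and opening them side by side shows the difference in cropping.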
Didn’t find this in the documentation, but:
gg = ds.groupby(by=["species"])
# gg.groups is a dict, so it needs iter() before next()
lg = next(iter(gg.groups))
# lg is the group name tuple (in this case of one string)
group_df = gg.get_group(lg)
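An alternative sketch that avoids the key lookup altogether: iterating over the groupby object yields (group name, sub-dataframe) pairs. The `ds` dataframe below is a made-up stand-in, and the exact type of the group name (plain string vs. one-element tuple) depends on the pandas version, so the code does not rely on it.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
ds = pd.DataFrame(
    {
        "species": ["oak", "oak", "pine", "birch"],
        "count": [3, 5, 2, 7],
    }
)

gg = ds.groupby(by=["species"])

# Each iteration step gives the group name and the matching rows
frames = {}
for name, group_df in gg:
    frames[name] = group_df

# Only the first group, without materialising the rest
first_name, first_df = next(iter(gg))
```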
Looking for a way to have vertical/tree tabs, I found a mention of the Zotero web version being really good.
Then you can have multiple papers open (with all annotations etc.) in different browser tabs that can be easily navigated using whatever standard thing one uses.
You can read annotations but not edit them. Quite useful nonetheless!
Overleaf Keyboard Shortcuts - Overleaf, Online LaTeX Editor helpfully links to a PDF.
It seems to have cool multi-cursor functionality that might be worth learning sometime.
<C-b/i>. Same for copypaste etc.
%-style LaTeX comments.
<C-S-c> for adding Overleaf comments
Overleaf has a lot of templates: Templates - Journals, CVs, Presentations, Reports and More - Overleaf, Online LaTeX Editor
If your conference’s template is missing but the organizers send you a .zip, you can literally import it as-is in Overleaf, without even unpacking. Then you can “copy” it to somewhere else and start writing your paper.
%time refers to the statement that follows it on the same line,
%%time refers to the entire cell
occurrences.txt is an improved/cleaned/formalized
meta.xml has a list of all column data types etc.
metadata.xml is things like download DOI, license, number of rows, etc.
df = vx.read_csv(
    DS_LOCATION,
    convert="verbatim.hdf5",
    progress=True,
    sep="\t",
    quotechar=None,
    quoting=3,  # same value as csv.QUOTE_NONE
    chunk_size=500_000,
)
Things to try:
limit usecols= of pd.read_csv() to the ‘interesting’ ones
TIL that for readability,
x = 100000000 can be written as
x = 100_000_000 etc.! Works for all kinds of numbers - ints, floats, hex etc.!
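A quick sketch of what that looks like in practice (Python 3.6+). The variable names and values are arbitrary examples.

```python
# Underscores are purely visual: both literals are the same number
x = 100_000_000
same = x == 100000000

# Works for floats, hex and binary literals as well
ratio = 1_000.000_1
mask = 0xFF_FF
flags = 0b1010_0101

# int() and float() accept underscores in strings too
parsed_int = int("1_000_000")
parsed_float = float("1_000.5")

# And the format mini-language can put them back when printing
pretty = f"{x:_}"
```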
I have a larger-than-usual text-based dataset, need to do analysis, pandas is slow (hell, even
wc -l takes 50 seconds…)
Vaex: Pandas but 1000x faster - KDnuggets - that’s a way to catch one’s attention.
vx.from_csv() reads a CSV in memory, kwargs get passed to pandas'
vx.open() reads stuff lazily, but I can’t find a way to tell it that my
.txt file is a CSV, and more critically - how to pass params like
vx.from_ascii() has a parameter called seperator (sic)?! API documentation for vaex library - vaex 4.16.0 documentation
convert= that converts stuff to things like HDF5, optionally
chunk_size= is the chunk size in lines. It creates $N/chunk_size$ chunks and concatenates them at the end.
nrows= is the number of rows to read, works with convert etc.
usecols= limits to columns by name, id or callable, speeds up stuff too and by a lot
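A rough sketch of the $N/chunk_size$ arithmetic, to estimate how many intermediate chunks a conversion will produce. It assumes the last, partially filled chunk is written as well (hence the rounding up). `expected_chunks` is a made-up helper, not part of vaex, and the row counts are invented.

```python
def expected_chunks(n_rows: int, chunk_size: int) -> int:
    """Number of chunks needed for n_rows, rounding up for a partial last chunk."""
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division using integers only (no float rounding issues)
    return -(-n_rows // chunk_size)


# Hypothetical row counts with the chunk_size used in these notes
exact = expected_chunks(1_000_000, 500_000)    # two full chunks
partial = expected_chunks(1_200_000, 500_000)  # two full chunks + one partial
```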
df.export_hdf5() in vaex, but pandas can’t read that. It may be related to the opposite problem - vaex can’t open pandas HDF5 files directly, because one saves them as rows, the other as columns. (See FAQ)
object, in my case it was a boolean. Objects are not supported, and booleans are objects. Not a trivial situation, because converting that to, say, int would have meant reading the entire file - which is just what I don’t want to do, I want to convert to HDF5 to make it manageable.
Syntax is similar to pandas, but the documentation is somehow... I can’t put my finger on it, but I don’t enjoy it.
l_desc = df.describe()
# Find the columns that are not entirely NA (NA count != number of rows)
not_empty_cols = list(l_desc.T[l_desc.T.NA != df.count()].T.columns)
# Filter the description by them
interesting_desc = l_desc[not_empty_cols]
Create and activate it as usual, then:
python -m ipykernel install --user --name=myenv