In the middle of the desert you can say anything you want
When saving seaborn images there was weirdness going on, with borders either cutting off labels or being too big.
Solution:
# bad: cut corners
ax.figure.savefig("inat_pnet_lorenz.png")
# good: no cut corners and OK bounding box
ax.figure.savefig("inat_pnet_lorenz.png", bbox_inches="tight")
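Didn’t find this in the pandas documentation, but this gets the DataFrame of a single group:
gg = ds.groupby(by=["species"])
# gg.groups is a dict keyed by group name, so take its first key via iter()
lg = next(iter(gg.groups))
# lg is the group name tuple (in this case of one string)
group_df = gg.get_group(lg)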
Looking for a way to have vertical/tree tabs, I found a mention of the Zotero web version being really good.
Then you can have multiple papers open (with all annotations etc.) in different browser tabs that can be easily navigated using whatever standard thing one uses.
You can read annotations but not edit them. Quite useful nonetheless!
Overleaf Keyboard Shortcuts - Overleaf, Online LaTeX Editor helpfully links to a PDF.
It seems to have cool multi-cursor functionality that might be worth learning sometime.
<C-b/i> for bold/italic. Same for copypaste etc.
<C-/> for adding %-style LaTeX comments.
<C-S-c> for adding Overleaf comments.
Overleaf has a lot of templates: Templates - Journals, CVs, Presentations, Reports and More - Overleaf, Online LaTeX Editor
If your conference’s template is missing but the conference sends you a .zip, you can literally import it as-is in Overleaf, without even unpacking. Then you can “copy” it to somewhere else and start writing your paper.
Difference between %time and %%time in Jupyter Notebook - Stack Overflow
%time times the single statement on the same line, %%time times the entire cell.
occurrences.txt is an improved/cleaned/formalized verbatim.txt
meta.xml has a list of all column data types etc.
metadata.xml is things like download DOI, license, number of rows, etc.
' / " as quotechar
# quoting=3 is csv.QUOTE_NONE: quote characters are treated as ordinary text
df = vx.read_csv(DS_LOCATION, convert="verbatim.hdf5", progress=True, sep="\t", quotechar=None, quoting=3, chunk_size=500_000)
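What quoting=3 does can be sketched with the standard library alone, since the value 3 is csv.QUOTE_NONE. This is only an illustration of the quoting behaviour, not of what vaex or pandas do internally; the sample data is made up.

```python
import csv
import io

# Hypothetical tab-separated sample: one field starts with an unbalanced quote.
sample = 'id\tname\n1\t"Quercus robur\n2\tO\'Brien\n'

# The default dialect treats a leading " as the start of a quoted field,
# so everything up to the next closing quote (or end of input) is swallowed.
default_rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

# With QUOTE_NONE, quote characters are ordinary text and every line is one row.
rows = list(csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE))

print(csv.QUOTE_NONE)  # the integer constant passed as quoting=
print(len(default_rows), len(rows))
print(rows[1])
```

If the row count with the default dialect is lower than the line count of the file, stray quote characters are a likely suspect.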
Things to try:
pd.read_csv(usecols=...) to limit the columns to the ‘interesting’ ones
NaNs
meta.xml
TIL that for readability, x = 100000000 can be written as x = 100_000_000 etc.! Works for all kinds of numbers - ints, floats, hex etc.!
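A quick sketch of the underscore syntax (PEP 515, Python 3.6+) across literal types. The variable names are arbitrary.

```python
# Underscores between digits are ignored by the parser.
plain = 100000000
grouped = 100_000_000
ratio = 1_000.000_1      # floats
mask = 0xFF_FF           # hex
bits = 0b1010_0101       # binary

# int() and float() accept underscores in strings as well.
parsed = int("1_000")

print(plain == grouped)
print(mask, bits, parsed)

# The "_" format spec writes the separators back out.
print(f"{grouped:_}")
```

The underscores are purely visual: they do not change the value, and the grouping does not have to be in threes.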
I have a larger-than-usual text-based dataset, need to do analysis, pandas is slow (hell, even wc -l takes 50 seconds…)
Vaex: Pandas but 1000x faster - KDnuggets - that’s a way to catch one’s attention.
I/O Kung-Fu: get your data in and out of Vaex - vaex 4.16.0 documentation
vx.from_csv() reads a CSV in memory, kwargs get passed to pandas' read_csv()
vx.open() reads stuff lazily, but I can’t find a way to tell it that my .txt file is a CSV, and more critically - how to pass params like sep etc.
vx.from_ascii() has a parameter called seperator?! API documentation for vaex library - vaex 4.16.0 documentation
convert= converts stuff to things like HDF5; optionally, chunk_size= is the chunk size in lines. It’ll create $N/chunk_size$ chunks and concat them together at the end.
nrows= is the number of rows to read, works with convert etc.
usecols= limits to columns by name, id or callable, speeds up stuff too and by a lot
df.export_hdf5() exports to HDF5 in vaex, but pandas can’t read that. It may be related to the opposite problem - vaex can’t open pandas HDF5 files directly, because one saves them as rows, the other as columns. (See FAQ)
Problem with columns of type object, in my case it was a boolean. Objects are not supported, and booleans are objects. Not a trivial situation, because converting that to, say, int, would have meant reading the entire file - which is just what I don’t want to do, I want to convert to HDF5 to make it manageable.
Syntax is similar to pandas, but the documentation is somehow .. can’t put my finger on it, but I don’t enjoy it somehow.
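The chunk arithmetic mentioned for chunk_size= can be sketched in plain Python. This is not vaex's implementation, just the general pattern of reading a fixed number of lines at a time; iter_chunks and the numbers are made up for illustration.

```python
import math
from itertools import islice


def iter_chunks(lines, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


n_rows = 12      # hypothetical dataset length
chunk_size = 5   # hypothetical chunk size in lines

chunks = list(iter_chunks(range(n_rows), chunk_size))

# The last chunk is usually partial, so the count is N/chunk_size rounded up.
print(len(chunks), math.ceil(n_rows / chunk_size))
print([len(c) for c in chunks])
```

Only one chunk has to fit in memory at a time, which is the point of converting a large file this way.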
l_desc = df.describe()
# Find the columns that are not entirely NA (NA count differs from the dataset length)
not_empty_cols = list(l_desc.T[l_desc.T.NA!=df.count()].T.columns)
# Filter the description by them
interesting_desc = l_desc[not_empty_cols]
Use Virtual Environments Inside Jupyter Notebooks & Jupter Lab [Best Practices]
Create and activate the virtual environment as usual, then:
python -m ipykernel install --user --name=myenv