In the middle of the desert you can say anything you want
Difference between %time and %%time in Jupyter Notebook - Stack Overflow: %time refers to the line after it, %%time refers to the entire cell.
occurrences.txt is an improved/cleaned/formalized version of verbatim.txt.
meta.xml has the list of all columns, data types etc.
metadata.xml has things like download DOI, license, number of rows, etc.
/"
as quotechar
work.df = vx.read_csv(DS_LOCATION,convert="verbatim.hdf5",progress=True, sep="\t",quotechar=None,quoting=3,chunk_size=500_000)
Things to try:
pd.read_csv() with usecols= to limit parsing to the 'interesting' columns.
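A sketch of the usecols= idea on a tiny in-memory stand-in for the big TSV (the column names here are made up, not from the actual dataset):

```python
import io

import pandas as pd

# Tiny stand-in for the large tab-separated file (hypothetical columns)
raw = io.StringIO("id\ttaxon\tcountry\n1\tA\tDE\n2\tB\tFR\n")

# Only parse the columns we care about; pandas skips the rest entirely,
# which saves both time and memory on wide files
df = pd.read_csv(raw, sep="\t", usecols=["id", "country"])
print(list(df.columns))  # ['id', 'country']
```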
TIL that for readability, x = 100000000
can be written as x = 100_000_000
etc.! Works for all kinds of numbers - ints, floats, hex etc.!
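A quick check that the underscores are purely cosmetic:

```python
# Underscore separators don't change the value, only the readability
assert 100_000_000 == 100000000
assert 1_000.5 == 1000.5      # works in floats
assert 0xFF_FF == 0xFFFF      # and in hex literals

# Formatting back out with thousands separators
print(f"{100_000_000:,}")  # 100,000,000
```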
I have a larger-than-usual text-based dataset, need to do analysis, pandas is slow (hell, even wc -l
takes 50 seconds…)
Vaex: Pandas but 1000x faster - KDnuggets - that’s a way to catch one’s attention.
I/O Kung-Fu: get your data in and out of Vaex - vaex 4.16.0 documentation:
- vx.from_csv() reads a CSV into memory; kwargs get passed to pandas' read_csv().
- vx.open() reads stuff lazily, but I can't find a way to tell it that my .txt file is a CSV, and more critically - how to pass params like sep etc.
- vx.from_ascii() has a parameter called seperator?! (API documentation for vaex library - vaex 4.16.0 documentation)
- convert= converts stuff to formats like HDF5; optionally chunk_size= is the chunk size in lines. It'll create $N/chunk_size$ chunks and concat them together at the end.
- nrows= is the number of rows to read; works with convert etc.
- usecols= limits to columns by name, id or callable; speeds up stuff too, and by a lot.
- There's df.export_hdf5() in vaex, but pandas can't read that. It may be related to the opposite problem - vaex can't open pandas HDF5 files directly, because one saves them as rows, the other as columns. (See FAQ)
- Columns of dtype object are not supported - in my case it was a boolean, and booleans are objects. Not a trivial situation, because converting that to, say, int would have meant reading the entire file - which is just what I don't want to do, I want to convert to hdf5 to make it manageable.

Syntax is similar to pandas, but the documentation is somehow .. can't put my finger on it, but I don't enjoy it somehow.
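One workaround I'd try for the boolean-as-object problem above (a hedged sketch; the column name is made up, and it assumes the flag is encoded as 0/1 in the file): force the dtype at parse time, so the column never becomes object in the first place:

```python
import io

import pandas as pd

# Stand-in for the big file; "flag" is a hypothetical 0/1 boolean column
raw = io.StringIO("id\tflag\n1\t1\n2\t0\n")

# Forcing int8 while parsing keeps the column out of object dtype,
# so a later export to HDF5 has nothing to choke on
df = pd.read_csv(raw, sep="\t", dtype={"flag": "int8"})
print(df["flag"].dtype)  # int8
```

Since vx.from_csv() forwards its kwargs to pandas' read_csv(), the same dtype= trick should be passable there too.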
l_desc = df.describe()
# Find column names that are NOT all-NA (their NA count differs from the row count)
not_empty_cols = list(l_desc.T[l_desc.T.NA != df.count()].T.columns)
# Filter the description by them
interesting_desc = l_desc[not_empty_cols]
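In plain pandas (not vaex), the same "drop the all-NA columns" filtering can be done in one call - a sketch with toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [np.nan, np.nan, np.nan],  # entirely empty column
    "c": ["x", None, "z"],
})

# Drop columns where every single value is NA
not_empty = df.dropna(axis=1, how="all")
print(list(not_empty.columns))  # ['a', 'c']
```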
Use Virtual Environments Inside Jupyter Notebooks & Jupyter Lab [Best Practices]
Create and activate it as usual, then:
python -m ipykernel install --user --name=myenv
It all started with the menu bar disappearing on qutebrowser but not firefox:
Broke everything when trying to fix it, leading to non-working vim bindings in jupyterlab. Now I have vim bindings back and can live without the menu, I guess.
It took 4h of very frustrating trial and error that I don’t want to document anymore, but - the solution to get vim bindings inside jupyterlab was to use the steps for installing through jupyter of the extension for notebooks, not the recommended lab one.
Installation · lambdalisue/jupyter-vim-binding Wiki:
mkdir -p $(jupyter --data-dir)/nbextensions/vim_binding
jupyter nbextension install https://raw.githubusercontent.com/lambdalisue/jupyter-vim-binding/master/vim_binding.js --nbextensions=$(jupyter --data-dir)/nbextensions/vim_binding
jupyter nbextension enable vim_binding/vim_binding
I GUESS the issue was that previously I didn't use --data-dir and tried to install as-is, which led to permission hell. My downgrading -lab at some point maybe also helped.
The recommended jupyterlab-vim package was installed (through pip) and enabled, but didn't do anything: jwkvam/jupyterlab-vim: Vim notebook cell bindings for JupyterLab.
Also, trying to install it in a clean virtualenv and then doing the same with pyenv was not part of the solution and made everything worse.
Getting paths for both -lab
and classic:
> jupyter-lab paths
Application directory: /home/sh/.local/share/jupyter/lab
User Settings directory: /home/sh/.jupyter/lab/user-settings
Workspaces directory: /home/sh/.jupyter/lab/workspaces
> jupyter --paths
config:
/home/sh/.jupyter
/home/sh/.local/etc/jupyter
/usr/etc/jupyter
/usr/local/etc/jupyter
/etc/jupyter
data:
/home/sh/.local/share/jupyter
/usr/local/share/jupyter
/usr/share/jupyter
runtime:
/home/sh/.local/share/jupyter/runtime
Removing ALL packages I had locally:
pip uninstall --yes jupyter-black jupyter-client jupyter-console jupyter-core jupyter-events jupyter-lsp jupyter-server jupyter-server-terminals jupyterlab-pygments jupyterlab-server jupyterlab-vim jupyterlab-widgets
pip uninstall --yes jupyterlab nbconvert nbextension ipywidgets ipykernel nbclient nbclassic ipympl notebook
To delete all extensions: jupyter lab clean --all
Related: 230606-1428 pip force reinstall package
> pip freeze | ag "(jup|nb|ipy)"
ipykernel==6.23.1
ipython==8.12.2
ipython-genutils==0.2.0
jupyter-client==8.2.0
jupyter-contrib-core==0.4.2
jupyter-contrib-nbextensions==0.7.0
jupyter-core==5.3.0
jupyter-events==0.6.3
jupyter-highlight-selected-word==0.2.0
jupyter-nbextensions-configurator==0.6.3
jupyter-server==2.6.0
jupyter-server-fileid==0.9.0
jupyter-server-terminals==0.4.4
jupyter-server-ydoc==0.8.0
jupyter-ydoc==0.2.4
jupyterlab==3.6.4
jupyterlab-pygments==0.2.2
jupyterlab-server==2.22.1
jupyterlab-vim==0.16.0
nbclassic==1.0.0
nbclient==0.8.0
nbconvert==7.4.0
nbformat==5.9.0
scipy==1.9.3
widgetsnbextension==4.0.7
history | grep jup
“One of the 2.5 hours I’ll never get back”, Serhii H. (2023). Oil on canvas, Kitty terminal, scrot screenshotting tool, bash.
Docker image runs a Python script that uses print() a lot, but docker logs is silent, because Python's print() uses buffered output and it takes minutes to show up.
Solution: tell Python not to do that through an environment variable.
docker run --name=myapp -e PYTHONUNBUFFERED=1 -d myappimage
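Two related ways to get the same effect without the env var (standard Python behavior, nothing specific to this image): run the interpreter with python -u, or flush explicitly from inside the script:

```python
import sys
import time

# Option 1 (outside the script): python -u script.py
# Option 2: flush each print, so `docker logs` sees it immediately
for i in range(3):
    print(f"tick {i}", flush=True)
    time.sleep(0.01)

# Option 3: flush the whole stream manually at checkpoints
sys.stdout.flush()
```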
TIL about pip install packagename --force-reinstall
…
(On a third thought, I realized how good ChatGPT is at suggesting this stuff, making this list basically useless. Good news though.)
I love Dia, and today I discovered that:
Before and after: