In the middle of the desert you can say anything you want
When joining/adding two paths (as in discrete math union) located in different layers, the resulting path will be located in the layer selected when doing the joining.
.. are recursive! Grouping two groups works; ungrouping them yields the original two groups!
From Multiprocessing vs. Threading in Python: What Every Data Scientist Needs to Know
Processes: instances of a program being executed; don’t share memory space
Threads: components of a process that run in parallel; share memory, variables, code etc.
Race Condition: “A race condition occurs when multiple threads try to change the same variable simultaneously.” (Basically - when order of execution matters)
Starvation: “Starvation occurs when a thread is denied access to a particular resource for longer periods of time, and as a result, the overall program slows down.”
Deadlock: A deadlock is a state when a thread is waiting for another thread to release a lock, but that other thread needs a resource to finish that the first thread is holding onto.
Livelock : Livelock is when threads keep running in a loop but don’t make any progress.
In CPython, the Global Interpreter Lock (GIL) is a (mutex) mechanism to make sure that two threads don’t write in the same memory space.
Basically “for any thread to perform any function, it must acquire a global lock. Only a single thread can acquire that lock at a time, which means the interpreter ultimately runs the instructions serially.” Therefore, Python multithreading cannot make use of multiple CPUs; multithreading doesn’t help for CPU-intensive tasks, but does for places where the bottleneck is elsewhere - user interaction, networking, etc. Multiprocessing works for places w/o user interaction and other bottlenecks where the tasks are CPU-bound, like doing stuff with numbers.
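A minimal sketch of the "bottleneck is elsewhere" case, assuming time.sleep as a stand-in for a network wait (the function names are made up for illustration). Sleeping releases the GIL, so the threaded run should take roughly as long as a single wait, while the sequential run takes the sum of all waits.

```python
import threading
import time


def fake_io_task(seconds):
    """Stand-in for a network call: sleeping releases the GIL."""
    time.sleep(seconds)


def run_sequentially(n_tasks, seconds):
    """Run the tasks one after another; return the wall-clock time."""
    start = time.perf_counter()
    for _ in range(n_tasks):
        fake_io_task(seconds)
    return time.perf_counter() - start


def run_in_threads(n_tasks, seconds):
    """Run each task in its own thread; return the wall-clock time."""
    threads = [
        threading.Thread(target=fake_io_task, args=(seconds,))
        for _ in range(n_tasks)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequentially(4, 0.1):.2f}s")
    print(f"threaded:   {run_in_threads(4, 0.1):.2f}s")
```

Replacing the sleep with a pure-Python number-crunching loop would be expected to remove most of the threaded speedup, since only one thread can hold the GIL at a time.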
Tensorflow uses threading for parallel data transformation; pytorch uses multiprocessing to do that in the CPU.
TODO - why does Tensorflow do that?
Python has two libraries, threading and multiprocessing, with very similar syntax.
Both pictures from the same article above1:
From Python Multi-Threading vs Multi-Processing | by Furqan Butt | Towards Data Science:
Concurrency is essentially defined as handling a lot of work or different units of work of the same program at the same time.
Doing a lot of work of the same program at the same time to speed up the execution time.
Parallelism has a narrower meaning.
concurrent.futures for multithreading and multiprocessing
Multithreading:
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(function_name, iterable)
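This would submit one task for each element in iterable; the tasks run on a pool of worker threads (at most max_workers of them), not on one thread per element.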
Multiprocessing works in an extremely similar way:
import concurrent.futures
with concurrent.futures.ProcessPoolExecutor() as executor:
    executor.map(function_name, iterable)
More about it, as usual, in the docs:
The asynchronous execution can be performed with threads, using ThreadPoolExecutor, or separate processes, using ProcessPoolExecutor. Both implement the same interface, which is defined by the abstract Executor class. 2
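A small sketch of that shared interface, with a made-up helper parallel_map and a made-up task function. It runs on a thread pool here and records which worker thread handled each item, so the size of the pool is visible; passing concurrent.futures.ProcessPoolExecutor instead should work the same way, assuming the function is picklable and the call sits under an `if __name__ == "__main__":` guard.

```python
import concurrent.futures
import threading


def square_with_thread_name(x):
    """Return x squared together with the name of the worker thread."""
    return x * x, threading.current_thread().name


def parallel_map(executor_cls, func, items, max_workers=2):
    """Map func over items with the given Executor class (order is kept)."""
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(func, items))


if __name__ == "__main__":
    results = parallel_map(
        concurrent.futures.ThreadPoolExecutor, square_with_thread_name, range(10)
    )
    print([value for value, _ in results])
    # Ten items, but no more than two distinct worker threads.
    print(sorted({name for _, name in results}))
```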
Does concurrent.futures have any tradeoffs compared to doing multiprocessing.Pool() like the following?
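import multiprocessing
pool = multiprocessing.Pool()
pool.map(multiprocessing_func, range(1, 10))
pool.close()  # no more tasks will be submitted
pool.join()   # wait for the worker processes to exit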
Parallelising Python with Threading and Multiprocessing | QuantStart has a nice point:
time python thread_test.py
real 0m2.003s
user 0m1.838s
sys 0m0.161s
user and sys together approximately sum to the real time. => No parallelization (in the general case). After they use multiprocessing with two processes, real time roughly halves, while user/sys time stays the same. So the total CPU time is the same, but it is spread over two CPUs, and we get the benefit in real time.
time output:
Excellent article, copying directly: Where’s your bottleneck? CPU time vs wallclock time
real: the wall clock time.
user: the process CPU time.
sys: the operating system CPU time due to system calls from the process.
In this case the wall clock time was higher than the CPU time, so that suggests the process spent a bunch of time waiting (58ms or so), rather than doing computation the whole time. What was it waiting for? Probably it was waiting for a network response from the DNS server at my ISP.
Important: If you have lots of processes running on the machine, those other processes will use some CPU.
Directly copypasting from the article above, “CPU” here is “CPU Time” (so user in the output of the command), second is “real” (=wall; real-world) time.
If this is a single-threaded process:
If this is a multi-threaded process and your computer has N CPUs and at least N threads, CPU/second can be as high as N.
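The same ratio can be measured from inside Python. A rough sketch, assuming a made-up helper cpu_seconds_per_second; note that time.process_time() counts user plus sys CPU time of the current process, so it is slightly broader than the user figure alone.

```python
import time


def cpu_seconds_per_second(func):
    """Run func once; return (CPU time used) / (wall-clock time elapsed)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall


def waiting():
    """Mostly idle: stands in for waiting on the network or disk."""
    time.sleep(0.2)


def computing():
    """CPU-bound: pure-Python arithmetic."""
    sum(i * i for i in range(300_000))


if __name__ == "__main__":
    print(f"waiting:   {cpu_seconds_per_second(waiting):.2f} CPU s per wall s")
    print(f"computing: {cpu_seconds_per_second(computing):.2f} CPU s per wall s")
```

On an otherwise idle machine the waiting case should come out close to 0 and the computing case close to 1.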
def thread_task(lock):
    """
    task for thread
    calls increment function 100000 times.
    """
    for _ in range(100000):
        # "with" acquires the lock and releases it again, even on errors
        with lock:
            increment()
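The function above calls an increment() that is not shown. A self-contained sketch of the whole pattern, assuming increment() does a read-modify-write on a global counter (the counter, run() and the n_increments parameter are made up for illustration):

```python
import threading

counter = 0  # shared by all threads


def increment():
    """Read-modify-write on the shared counter; not atomic on its own."""
    global counter
    counter += 1


def thread_task(lock, n_increments):
    """Increment the counter n_increments times, holding the lock each time."""
    for _ in range(n_increments):
        with lock:
            increment()


def run(n_threads=2, n_increments=100_000):
    """Reset the counter, run the threads, return the final counter value."""
    global counter
    counter = 0
    lock = threading.Lock()
    threads = [
        threading.Thread(target=thread_task, args=(lock, n_increments))
        for _ in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(run())
```

With the lock held around every increment, the final value is always n_threads * n_increments; without it, updates can be lost to a race condition.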
This is the script of the DOCTOR program for ELIZA: eliza/doctor.txt at master · wadetb/eliza
The -L option can be specified multiple times within the same command. Every time with different ports. 1
Here’s an example:
ssh me@remote_server -L 8822:REMOTE_IP_1:22 -L 9922:REMOTE_IP_2:22
And an even better solution from there, adding this to ~/.ssh/config
Host port-forwarding
    Hostname remote_server
    User me
    LocalForward 6007 localhost:6007
    LocalForward 6006 localhost:6006
    Port 10000
and then just do ssh port-forwarding (the name after Host is the alias to connect with)!
A list of all colors in latex supported via the various packages: color - Does anyone have a newrgbcolor{colourname}{x.x.x} list? - TeX - LaTeX Stack Exchange
Pressing <Ctrl-C> in a Terminal where jupyter-notebook is running will show a list of running kernels/notebooks, which will include the token:
1 active kernel
Jupyter Notebook 6.2.0 is running at:
http://localhost:6007/?token=3563b961b19ac50677d86a0952c821c2396c0255e97229bc
or http://127.0.0.1:6007/?token=3563b961b19ac50677d86a0952c821c2396c0255e97229bc
Nice description: Measuring Object Detection models - mAP - What is Mean Average Precision?
TL;DR a way to uniformly calculate results of object detection over an entire dataset, accounting for different thresholds (“my 50% confidence is your 80%”). We get such thresholds that recall is 0.1, 0.2, …, 1.0 and then measure precision at these points; take the mean.
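A toy sketch of that calculation for a single class, under several assumptions that are not in the note: detections are already sorted by descending confidence and already marked as true or false positives, and precision at a recall level is taken as the best precision at any recall at or above that level (the usual interpolation). The function name and input format are made up; mAP would then be the mean of this value over all classes.

```python
def average_precision(is_true_positive, num_ground_truth, recall_levels=None):
    """Mean of interpolated precision at the given recall levels.

    is_true_positive: one bool per detection, sorted by descending confidence.
    num_ground_truth: number of ground-truth objects of this class.
    """
    if recall_levels is None:
        recall_levels = [i / 10 for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0

    # Precision and recall after each detection, walking down the ranking.
    precisions, recalls = [], []
    true_positives = 0
    for rank, hit in enumerate(is_true_positive, start=1):
        true_positives += int(hit)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Interpolated precision: best precision at recall >= level, else 0.
    interpolated = []
    for level in recall_levels:
        reachable = [p for p, r in zip(precisions, recalls) if r >= level]
        interpolated.append(max(reachable, default=0.0))
    return sum(interpolated) / len(interpolated)


if __name__ == "__main__":
    # Two objects, two detections, only the first one is correct.
    print(average_precision([True, False], num_ground_truth=2))
```

In the example, recall never gets past 0.5, so the five levels up to 0.5 score a precision of 1.0 and the five above it score 0.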
A bit more details: rafaelpadilla/Object-Detection-Metrics: Most popular metrics used to evaluate object detection algorithms.
One can use mount without arguments to get the list of mounted filesystems!
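Killing anything that uses a directory: 1
fuser -kim /address # kill any processes accessing the filesystem mounted at /address
umount /address
(-k is kill, -i is “ask nicely before killing”, i.e. prompt for confirmation; -m matches every process using the filesystem that /address is on)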
rbgirshick/yacs: YACS – Yet Another Configuration System is a “lightweight library to define and manage system configurations, such as those commonly found in software designed for scientific experimentation”. It’s used by detectron2, serializes configs in yaml files. Nicely supports standard settings and experiment overrides and CLI overrides. Basically what I’ve been trying to hack together in some of my scripts.
Got: FileNotFoundError: [Errno 2] No such file or directory: 'datasets/coco/annotations/instances_val2017.json'
at the end of training.
Solution was to have cfg.DATASETS.TEST = () explicitly set, not commented out like I had. 2
so it’s a mystery why cfg.DATASETS.TEST is looking for datasets/coco/annotations/instances_val2017.json
Indeed.
Example of how to use EvalHook to run functions: detectron2/train_net.py at master · facebookresearch/detectron2 (but I’d like to implement the eval as a subclass)
The python3 way to work with paths seems to be pathlib — Object-oriented filesystem paths — Python 3.9.2 documentation, not the old os.path.*
It is split into Path (concrete paths, close to really-existing things, with filesystem access) and PurePath (abstract paths, without connection to any real filesystem).
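A short sketch of the difference; the paths and the helper write_and_read are made up for illustration. PurePosixPath only manipulates the path as text, while Path adds methods that actually touch the disk.

```python
import tempfile
from pathlib import Path, PurePosixPath

# Pure path: string-like manipulation only, nothing has to exist on disk.
pure = PurePosixPath("/data/experiments") / "run1" / "config.yaml"
print(pure.name)    # config.yaml
print(pure.suffix)  # .yaml
print(pure.parent)  # /data/experiments/run1


def write_and_read(text):
    """Write text to a file in a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "notes.txt"  # concrete path: real filesystem
        target.write_text(text)
        assert target.exists()
        return target.read_text()


if __name__ == "__main__":
    print(write_and_read("hello"))
```

Pure paths have no exists(), read_text() and similar methods, which makes them handy for handling paths that belong to another machine.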
Shapely is awesome! And easy to play with in jupyter notebook
To access a Tensorboard (..or anything) running on a remote server servername on port 6006:
ssh -L 6006:127.0.0.1:6006 me@servername
After this, Tensorboard is reachable on the local port 6006, i.e. at 127.0.0.1:6006.
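Tensorboard has to be run with --host=127.0.0.1 on the server so that the tunnel, which connects to 127.0.0.1:6006 there, can reach it; it does not have to be exposed to the outside network.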
Jupyter - the link with the token can simply be followed (or copypasted), if the port is the same in both localhost and server.
Unsurprisingly intuitive:
ax.set_ylim(1, 0)
(of course, problematic if you don’t know your actual limit)
EDIT Wed 10 Mar 2021 19:23:20 CET: There’s an even better solution! 1
ax.invert_yaxis()
Pytorch officially doesn’t do CUDA 10.0.x, but I found this, worked perfectly: How to Install PyTorch with CUDA 10.0 - VarHowto
Installing:
pip install torch==1.4.0 torchvision==0.5.0 -f https://download.pytorch.org/whl/cu100/torch_stable.html
Testing installation and GPU:
import torch
x = torch.rand(5, 3)
print(x)
print(torch.cuda.is_available())  # True if the GPU is usable
Nice discussion: How do you manage your dotfiles across multiple and/or new developer machines? - DEV Community
This article also provides a really nice explanation of the general practice that many people seem to be taking: store dotfiles in GitHub, and then install them via a simple script that symlinks files and runs any additional init logic.
… not that I’ve ever used it or plan to (google, don’t ban me before I finished switching to FastMail!), but - NewPipe supports searching and playing videos from Youtube Music!
Serial-position effect “is the tendency of a person to recall the first and last items in a series best, and the middle items worst”. Related is the Von Restorff effect about the most different stimuli being easier to remember.