In the middle of the desert you can say anything you want
When training on different GPUs on the same server, I get errors like RuntimeError: DataLoader worker (pid 30141) exited unexpectedly with exit code 1.
The fix was to set the number of workers to 0:
cfg.DATALOADER.NUM_WORKERS = 0
From SO:
[..] the only difference between mAP for object detection and instance segmentation is that when calculating overlaps between predictions and ground truths, one uses the pixel-wise IoU rather than bounding box IoU.
Finding an optimal cutoff point on a ROC curve is largely arbitrary (or rather "depends on what you need"). There are a lot of ways to find one. (Nice list here, but I'd see if I can find a paper with a good overview: data visualization - How to determine best cutoff point and its confidence interval using ROC curve in R? - Cross Validated)
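One common (still somewhat arbitrary) choice from that list is Youden's J statistic, maximizing TPR - FPR; a minimal sketch with sklearn and toy data:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy data, purely for illustration
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = 0.5 * y_true + 0.7 * rng.random(200)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
# Youden's J: the threshold that maximizes TPR - FPR
best_threshold = thresholds[np.argmax(tpr - fpr)]
print(best_threshold)
```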
Nice series of posts on how Detectron2 works inside: Digging into Detectron 2 — part 1 | by Hiroto Honda | Medium
The best way to build intuition about how your model performs is by looking at predictions that it was confident about but got wrong. With FiftyOne, this is easy. For example, let’s create a view into our dataset looking at the samples with the most false positives
More examples of the same: IoU a better detection evaluation metric | by Eric Hofesmann | Towards Data Science
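A minimal sketch of that FiftyOne flow (the dataset and field names are placeholders; assumes detections were evaluated with an eval_key so per-sample false-positive counts exist):

```python
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.load_dataset("my-detections")  # placeholder name

# Stores per-sample eval_tp / eval_fp / eval_fn counts
dataset.evaluate_detections(
    "predictions", gt_field="ground_truth", eval_key="eval"
)

# Confident predictions only, samples with the most false positives first
view = (
    dataset
    .filter_labels("predictions", F("confidence") > 0.9)
    .sort_by("eval_fp", reverse=True)
)
fo.launch_app(view)
```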
In my text notes, I use indentation heavily, but use bullet-point-dashes (-) and just indentation almost interchangeably:
One two
Three
Four
Five
- six
- seven
- eight
Nine
Ten
- 12
- Thirteen
Next part
From now on: tensor.cpu().numpy() needs to be done when using a GPU.
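For context (a quick illustration; .numpy() only works on CPU tensors, so a CUDA tensor has to be copied to host memory first):

```python
import torch

t = torch.arange(4)
if torch.cuda.is_available():
    t = t.to("cuda")
    # t.numpy() would now raise something like:
    # TypeError: can't convert cuda:0 device type tensor to numpy.
    # Use Tensor.cpu() to copy the tensor to host memory first.

arr = t.cpu().numpy()  # works for both CPU and CUDA tensors
print(arr)
```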
Pasta with seafood in cream sauce, recipe – Italian cuisine: pasta and pizza. «Еда»
NVIDIA Nsight Systems | NVIDIA Developer
Found here (a nice article too): Object Detection from 9 FPS to 650 FPS in 6 Steps | paulbridger.com
Multiprocessing best practices — PyTorch 1.8.0 documentation
TL;DR: torch.multiprocessing is a drop-in replacement for Python's multiprocessing module.
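A minimal sketch of what that buys: tensors sent through torch.multiprocessing share their storage between processes:

```python
import torch
import torch.multiprocessing as mp

def worker(t):
    t.add_(1)  # in-place write, visible to the parent via shared memory

if __name__ == "__main__":
    t = torch.zeros(3)
    t.share_memory_()  # move the tensor's storage into shared memory
    p = mp.Process(target=worker, args=(t,))
    p.start()
    p.join()
    print(t)  # tensor([1., 1., 1.])
```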
If Detectron2 complains about wanting a GPU and finding no CUDA (because there’s none), the script can be set to CPU-only through the settings:
cfg.MODEL.DEVICE = 'cpu'
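In context, a sketch of the usual DefaultPredictor flow with that override (config merging and weights elided):

```python
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# ... merge model config and weights here as usual ...
cfg.MODEL.DEVICE = 'cpu'  # run inference on the CPU
predictor = DefaultPredictor(cfg)
```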
I should read documentation more often: detectron2.structures — detectron2 0.3 documentation
category_3_detections = instances[instances.pred_classes == 3]
confident_detections = instances[instances.scores > 0.9]
In general about model outputs: Use Models — detectron2 0.3 documentation
mytensor.numpy() is unsurprisingly easy.
Shapely geometries can be processed into a state that supports more efficient batches of operations.
(The Shapely User Manual — Shapely 1.7.1 documentation)
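That's Shapely's prepared-geometry feature; a minimal sketch:

```python
from shapely.geometry import Point, Polygon
from shapely.prepared import prep

polygon = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
prepared = prep(polygon)  # precomputed form, faster repeated predicates

points = [Point(x, y) for x in range(6) for y in range(6)]
inside = [p for p in points if prepared.contains(p)]
```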
if joined_boxes.geom_type == 'MultiPolygon':
is much cleaner than the isinstance(joined_boxes, MultiPolygon) I’ve been using!
Also - TODO - why is a Polygon that went into a MultiPolygon within() it, if within():
Returns True if the object's boundary and interior intersect only with the interior of the other (not its boundary or exterior).
Their boundaries should touch, so it shouldn't count as within()?
Nice (and one of the only..) graphic explanation: R-tree Spatial Indexing with Python – Geoff Boeing
Shapely has a partial implementation:
Pass a list of geometry objects to the STRtree constructor to create a spatial index that you can query with another geometric object. Query-only means that once created, the STRtree is immutable.
TL;DR:
tree = STRtree(all_geoms)
results = tree.query(query_geom)
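Fleshed out a little (a sketch; in Shapely 1.x, query() returns candidate geometries whose bounding boxes intersect the query, so exact predicates still need checking):

```python
from shapely.geometry import Point, Polygon
from shapely.strtree import STRtree

all_geoms = [Point(i, i).buffer(1) for i in range(10)]
tree = STRtree(all_geoms)  # immutable once built: query-only

query_geom = Polygon([(2, 0), (6, 0), (6, 4), (2, 4)])
candidates = tree.query(query_geom)  # bounding-box matches
hits = [g for g in candidates if g.intersects(query_geom)]
```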
In general, if I'll be working more with shapes, I should hang out in GIS places to absorb approaches and terminology. One of R-Tree's use-cases is, say, "find restaurants inside this block", which can also be solved by blind iteration (but shouldn't be).
Finally got the more familiar keybinding to work, as usual in config.py:
config.bind('<Ctrl-Shift-C>', 'yank selection')
config.bind(',y', 'yank selection')
johnnydep is really cool and visualizes the dependencies of something without installing them (but still downloads them!)
Found .local/share/Trash with 33GB of ..trash in it.
A .whl file is just an archive and can be unzipped. The entire list of dependencies is in yourpackage.dist-info/METADATA; it looks like this:
Requires-Python: >=3.6
Provides-Extra: all
Provides-Extra: dev
Requires-Dist: termcolor (>=1.1)
Requires-Dist: Pillow (>=7.1)
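Since it's just a zip, the standard library can read METADATA without unpacking; a sketch with placeholder file names:

```python
import zipfile

# A wheel is just a zip archive; read METADATA straight out of it.
# File and dist-info names below are placeholders.
with zipfile.ZipFile("yourpackage-1.0-py3-none-any.whl") as whl:
    metadata = whl.read("yourpackage-1.0.dist-info/METADATA").decode()

print(metadata)
```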
..exists, and in general I should pay more attention to the new python versions and their changes.
Ubuntu Manpage: tiffsplit - split a multi-image TIFF into single-image TIFF files
Installs as libtiff-tools; a basename can be used as the prefix for the output files.
When joining/adding two paths (as in discrete math union) located in different layers, the resulting path will be located in the layer selected when doing the joining.
Groups are recursive! Grouping two groups works; ungrouping them gives back the original two groups!
From Multiprocessing vs. Threading in Python: What Every Data Scientist Needs to Know
Processes: instances of a program being executed; don’t share memory space
Threads: components of a process that run in parallel; share memory, variables, code etc.
Race Condition: “A race condition occurs when multiple threads try to change the same variable simultaneously.” (Basically - when order of execution matters)
Starvation: “Starvation occurs when a thread is denied access to a particular resource for longer periods of time, and as a result, the overall program slows down.”
Deadlock: A deadlock is a state when a thread is waiting for another thread to release a lock, but that other thread needs a resource to finish that the first thread is holding onto.
Livelock: Livelock is when threads keep running in a loop but don’t make any progress.
In CPython, the Global Interpreter Lock (GIL) is a (mutex) mechanism to make sure that two threads don’t write in the same memory space.
Basically, “for any thread to perform any function, it must acquire a global lock. Only a single thread can acquire that lock at a time, which means the interpreter ultimately runs the instructions serially.” Therefore, Python multithreading cannot make use of multiple CPUs; multithreading doesn’t help for CPU-intensive tasks, but does where the bottleneck is elsewhere - user interaction, networking, etc. Multiprocessing is what works for CPU-bound tasks without such bottlenecks, like doing stuff with numbers.
Tensorflow uses threading for parallel data transformation; pytorch uses multiprocessing to do that on the CPU.
TODO - why does Tensorflow do that?
Python has two libraries, threading and multiprocessing, with very similar syntax.
Both pictures from the same article above:
From Python Multi-Threading vs Multi-Processing | by Furqan Butt | Towards Data Science:
Concurrency is essentially defined as handling a lot of work or different units of work of the same program at the same time.
Doing a lot of work of the same program at the same time to speed up the execution time.
Parallelism has a narrower meaning.
concurrent.futures for multithreading and multiprocessing
Multithreading:
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor() as executor:
executor.map(function_name, iterable)
This submits a task for each element in iterable, executed by a pool of worker threads.
Multiprocessing works in an extremely similar way:
import concurrent.futures
with concurrent.futures.ProcessPoolExecutor() as executor:
executor.map(function_name, iterable)
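One detail both snippets gloss over: with processes, the if __name__ == "__main__" guard is needed on platforms that spawn rather than fork. A self-contained sketch:

```python
import concurrent.futures

def square(n):
    return n * n

if __name__ == "__main__":
    # swap in ThreadPoolExecutor for I/O-bound work
    with concurrent.futures.ProcessPoolExecutor() as executor:
        print(list(executor.map(square, range(10))))
```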
More about it, as usual, in the docs:
The asynchronous execution can be performed with threads, using ThreadPoolExecutor, or separate processes, using ProcessPoolExecutor. Both implement the same interface, which is defined by the abstract Executor class.
Does concurrent.futures have any tradeoffs compared to doing multiprocessing.Pool() like the following?
pool = multiprocessing.Pool()
pool.map(multiprocessing_func, range(1,10))
pool.close()
Parallelising Python with Threading and Multiprocessing | QuantStart has a nice point:
time python thread_test.py
real 0m2.003s
user 0m1.838s
sys 0m0.161s
Both user and sys approximately sum to the real time => no parallelization (in the general case). After they switch to multiprocessing with two processes, real time drops by half, while user/sys time stays the same: the total CPU time is unchanged, but it is now spread over two CPUs, so we get wall-clock-time benefits.
time output:
Excellent article, copying directly: Where’s your bottleneck? CPU time vs wallclock time
real: the wall clock time.
user: the process CPU time.
sys: the operating system CPU time due to system calls from the process.
In this case the wall clock time was higher than the CPU time, so that suggests the process spent a bunch of time waiting (58ms or so), rather than doing computation the whole time. What was it waiting for? Probably it was waiting for a network response from the DNS server at my ISP.
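The same split can be seen from inside Python (my illustration, not from the article): time.process_time() counts only CPU time, time.perf_counter() counts wall-clock time:

```python
import time

start_wall = time.perf_counter()
start_cpu = time.process_time()

sum(i * i for i in range(10_000_000))  # CPU-bound work
time.sleep(1)  # waiting: wall time passes, CPU time doesn't

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print(f"wall: {wall:.2f}s  cpu: {cpu:.2f}s")  # wall is ~1s larger than cpu
```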
Important: If you have lots of processes running on the machine, those other processes will use some CPU.
Directly copypasting from the article above; “CPU” here is “CPU time” (so user in the output of the command), “second” is “real” (= wall, real-world) time.
If this is a single-threaded process: CPU/second can be at most 1.
If this is a multi-threaded process and your computer has N CPUs and at least N threads, CPU/second can be as high as N.
import threading

counter = 0

def increment():
    global counter
    counter += 1

def thread_task(lock):
    """Task for a thread: calls increment() 100000 times, holding the lock."""
    for _ in range(100000):
        lock.acquire()
        increment()
        lock.release()
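A driver for the above (my addition, not part of the original snippet) makes the race-condition point concrete - with the lock the final count is deterministic, without it increments can get lost:

```python
if __name__ == "__main__":
    lock = threading.Lock()
    threads = [threading.Thread(target=thread_task, args=(lock,))
               for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # 200000 on every run, thanks to the lock
```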
This is the script of the DOCTOR program for ELIZA: eliza/doctor.txt at master · wadetb/eliza
The -L option can be specified multiple times within the same command, every time with different ports.
Here’s an example:
ssh me@remote_server -L 8822:REMOTE_IP_1:22 -L 9922:REMOTE_IP_2:22
And an even better solution from there, adding this to ~/.ssh/config:
Host port-forwarding
Hostname remote_server
User me
LocalForward 6007 localhost:6007
LocalForward 6006 localhost:6006
Port 10000
and then just do ssh port-forwarding!
A list of all colors in latex supported via the various packages: color - Does anyone have a newrgbcolor{colourname}{x.x.x} list? - TeX - LaTeX Stack Exchange
Pressing <Ctrl-C> in a Terminal where jupyter-notebook is running will show a list of running kernels/notebooks, which will include the token:
1 active kernel
Jupyter Notebook 6.2.0 is running at:
http://localhost:6007/?token=3563b961b19ac50677d86a0952c821c2396c0255e97229bc
or http://127.0.0.1:6007/?token=3563b961b19ac50677d86a0952c821c2396c0255e97229bc
Nice description: Measuring Object Detection models - mAP - What is Mean Average Precision?
TL;DR: a way to uniformly calculate results of object detection over an entire dataset, accounting for different thresholds (“my 50% confidence is your 80%”). We find thresholds such that recall is 0.0, 0.1, …, 1.0, measure precision at each of those points, and take the mean.
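A sketch of that calculation for a single class (the classic PASCAL VOC 11-point interpolation, where precision at recall r is taken as the best precision at any recall >= r):

```python
import numpy as np

def eleven_point_ap(recalls, precisions):
    """11-point interpolated AP: average over r in 0.0, 0.1, ..., 1.0
    of the best precision achieved at any recall >= r."""
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recalls >= r
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / 11

# recalls/precisions: numpy arrays accumulated over ranked detections;
# mAP is then the mean of this AP over all classes.
```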
A bit more details: rafaelpadilla/Object-Detection-Metrics: Most popular metrics used to evaluate object detection algorithms.