Commit 79424a1

1 parent 811ab28 commit 79424a1

6 files changed: +58 -51 lines changed

400 KB

docs/assets/task_clock_median.png

399 KB
305 KB

docs/plots_v2.ipynb

Lines changed: 24 additions & 20 deletions
Large diffs are not rendered by default.

docs/report.md

Lines changed: 34 additions & 31 deletions
@@ -25,6 +25,12 @@ header-includes:
 - \lstset{basicstyle=\fontsize{8pt}{8pt}\selectfont\ttfamily}
 # code output
 - \DefineVerbatimEnvironment{verbatim}{Verbatim}{fontsize=\fontsize{8pt}{8pt}}
+# tables
+- \usepackage{etoolbox}
+- \AtBeginEnvironment{longtable}{\footnotesize}
+- \AtBeginEnvironment{tabular}{\footnotesize}
+- \setlength{\LTleft}{0pt}
+- \setlength{\LTright}{0pt}
 ---
 
 <!--
@@ -121,37 +127,34 @@ Extending CPython has negligible to no overhead and allows to share large chunks o
 
 # 3. Results
 
-We achieved a ~6x speedup of the plain python implementation, even marginally outperforming the `hashlib.sha1` library.
-
-![Performance Overview](docs/assets/perf.png){ width=100% }
-
-|command | mean| stddev| median| user| system| min| max|
-|:----------------------------------|---------:|---------:|---------:|---------:|---------:|---------:|---------:|
-|plain: itertools.py | 0.5674692| 0.0119883| 0.5700655| 0.5599028| 0.0074184| 0.5496380| 0.5865690|
-|plain: lib.py | 0.1003698| 0.0026018| 0.0995592| 0.0950988| 0.0051505| 0.0976092| 0.1086287|
-|plain: plain.py | 0.5631182| 0.0085463| 0.5607433| 0.5574100| 0.0056163| 0.5564152| 0.5863659|
-|multiprocessing: imap_unordered.py | 0.2258019| 0.0085966| 0.2235880| 0.5456730| 0.1661910| 0.2184310| 0.2449692|
-|multiprocessing: imap.py | 0.2328316| 0.0065632| 0.2306093| 0.5554529| 0.1625106| 0.2235672| 0.2426373|
-|multiprocessing: map_async.py | 0.4528332| 0.0248580| 0.4485649| 1.0100400| 0.1108107| 0.4314743| 0.5167882|
-|multiprocessing: map.py | 0.4400746| 0.0043315| 0.4405771| 0.9853628| 0.1084339| 0.4329375| 0.4467715|
-|multithreading: GIL=1 executor.py | 0.3696592| 0.0103798| 0.3658508| 0.3597924| 0.0092238| 0.3621005| 0.3968615|
-|multithreading: GIL=0 executor.py | 0.2102704| 0.0104267| 0.2121045| 0.4389796| 0.0105094| 0.1962787| 0.2321854|
-|multithreading: GIL=1 workers.py | 0.1304648| 0.0071726| 0.1292655| 0.1178397| 0.0081775| 0.1242261| 0.1582329|
-|multithreading: GIL=0 workers.py | 0.1677349| 0.0076839| 0.1685633| 0.1931002| 0.0202853| 0.1429964| 0.1760283|
-|ctypes: invoke_hashcat.py | 0.0934947| 0.0031416| 0.0929726| 0.0882496| 0.0049164| 0.0891827| 0.0996272|
-|ctypes: invoke_hashcat.py (openmp) | 0.1021338| 0.0056378| 0.1003631| 0.0986943| 0.0083828| 0.0976012| 0.1269725|
-|cpython: invoke_hashcat.py | 0.1006056| 0.0043579| 0.0997623| 0.0950439| 0.0052310| 0.0943297| 0.1081794|
-
-Each experiment was conducted with 3 warmup runs using the `hyperfine` library. The following observations can be made:
-
-- The fastest GIL-enabled parallelization using `imap_unordered` achieved a ~2.5x speedup over the plain python implementation.
-- The fastest GIL-free parallelization implementation, individually managing threads, managed to shave off another ~0.06s, achieving a ~3.4x speedup over the plain python implementation.
-- The overhead induced by OpenMP made it an unviable option for parallelization in our case.
-- The ctypes implementation outperformed the extension of CPython directly and achieved a ~6x speedup over the plain python implementation, even marginally outperforming the usage of the `hashlib.sha1` library in plain Python.
-
-Our findings, while insightful, are limited in their generalizability due to the simplicity of the experiments, the overhead introduced by containerization and various implicit assumptions about the underlying hardware and software environment. Nonetheless, this report serves as both a proof of concept and a practical guide for engineering parallelized Python code across different abstraction levels. It highlights the trade-offs between performance, complexity and usability.
-
-Crucially, we demonstrate that significant performance gains can be achieved by leveraging Python's robust ecosystem and its efficient bindings to C libraries. For instance, a mere four lines of Python code using the `hashlib.sha1` library outperformed several of our more intricate C implementations. This underscores the remarkable efficiency of Python's ecosystem and the importance of focusing on high-level optimization strategies—leveraging well-optimized libraries rather than reinventing the wheel.
+We beat the `hashlib` standard library by 13.525 ms of median task clock, or 101703681.5 instructions. This was achieved using the `ctypes` library and the CPython C API.
+
+![Median instructions per command](docs/assets/instructions_median.png){ width=100% }
+
+![Median task clock per command](docs/assets/task_clock_median.png){ width=100% }
+
+![Median task clock vs. instructions per command](docs/assets/task_clock_vs_instructions.png){ width=100% }
+
+|gil |type |command | instructions (med)| task_clock (med) | user_time (med)| sys_time (med) |
+|:-----|:---------------|:--------------------------------------|-------------------:|-----------------:|----------------:|---------------:|
+|true |**cpython** |**invoke_hashcat.py (openmp)** | 44442376| 8.760| 0.0090265| 0.0000000|
+|true |ctypes |invoke_hashcat.py (openmp) | 80107280| 39.820| 0.0160960| 0.0000000|
+|true |ctypes |invoke_hashcat.py | 118592997| 16.110| 0.0162195| 0.0000000|
+|true |**plain** |**lib.py (hashlib library)** | 146146058| 22.285| 0.0222055| 0.0000000|
+|false |multithreading |workers.py | 198008716| 39.985| 0.0258195| 0.0113530|
+|true |multithreading |workers.py | 1030919157| 106.765| 0.1006565| 0.0111120|
+|true |plain |itertools.py | 3945750392| 325.575| 0.3242800| 0.0000000|
+|true |plain |improved.py | 3959326962| 322.015| 0.3206755| 0.0000000|
+|true |plain |plain.py | 4752510454| 400.205| 0.3996415| 0.0000000|
+|true |multiprocessing |imap.py | 6620723294| 1743.685| 1.2987810| 0.4785180|
+|true |multiprocessing |imap_unordered.py | 6692752894| 1787.350| 1.3024570| 0.5340625|
+|true |multithreading |executor.py | 241741072306| 20136.600| 19.8526395| 0.6276710|
+|false |multithreading |executor.py | 241749347062| 63354.890| 63.2586825| 0.0327220|
+|true |multiprocessing |map_async.py | 244913585430| 61218.555| 61.0370955| 0.2066270|
+|true |multiprocessing |map.py | 245013383854| 61259.295| 61.0844710| 0.2048880|
+
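+Disabling the GIL while manually managing the threads and memory using the `multithreading` API also showed promising results: for `workers.py`, the median task clock dropped from 106.765 to 39.985 and the median instruction count from 1030919157 to 198008716.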
+
 
 # Addendum
 
docs/report.pdf

1.23 MB
Binary file not shown.
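
The headline comparison added to `docs/report.md` can be re-derived from the medians in the new results table. The following is a minimal sketch, not part of the commit: the `MEDIANS` dictionary, its labels, and the `compare` helper are hypothetical names introduced for illustration, and the task clock is assumed to be in milliseconds (the `user_time` column, which appears to be in seconds, is consistent with that reading).

```python
# Median (instructions, task_clock) pairs copied from the results table.
# Assumption: task_clock is in milliseconds. The labels are illustrative only.
MEDIANS = {
    "cpython: invoke_hashcat.py (openmp)": (44_442_376, 8.760),
    "plain: lib.py (hashlib)": (146_146_058, 22.285),
    "plain: plain.py": (4_752_510_454, 400.205),
}


def compare(baseline: str, candidate: str) -> dict:
    """Return how much `candidate` saves relative to `baseline`."""
    base_instr, base_clock = MEDIANS[baseline]
    cand_instr, cand_clock = MEDIANS[candidate]
    return {
        "instructions_saved": base_instr - cand_instr,
        "task_clock_saved_ms": round(base_clock - cand_clock, 3),
        "speedup": round(base_clock / cand_clock, 2),
    }


if __name__ == "__main__":
    result = compare(
        "plain: lib.py (hashlib)",
        "cpython: invoke_hashcat.py (openmp)",
    )
    for key, value in result.items():
        print(f"{key}: {value}")
```

With the table's medians this yields a task clock difference of 13.525 and an instruction difference of 101703682, which matches the report's figures up to the rounding of the instruction medians.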