Table of Contents
memory_profiler provides detailed memory usage measurements, with the aim of efficiently reducing memory consumption and optimizing memory usage to improve application performance..
⚠️ Before starting using this tool, it is important to mention the impact on the execution speed of the code while it is in use, since the execution time increases dozens of times, it is convenient to profiling time and CPU beforehand, and use
line_profiler only on sections identified to be optimized.
@profile decorator indicates the method to be evaluated.
However, analyzing the memory usage in Python is complicated, even if we consider that the garbage collector is freeing memory, we do not know for sure when a section of memory is freed and when an object is going to be released, or how long it will remain in the pool until it is finally disposed. Many times, it is convenient to handle methods that consume more memory, and not repeatedly allocate memory because it is an expensive process, and it is more efficient than analyzing the memory usage of the code line by line.
Using the sorting algorithm as an experiment, we perform the following exercise:
import numpy as np @profile def bubble_sorting(array): n = len(array) for i in range(n): sorted_item = True for j in range(n - i - 1): if array[j] > array[j + 1]: array[j], array[j + 1] = array[j + 1], array[j] sorted_item = False if sorted_item: break return array work_array = np.random.randint(1, 100000, size=10000) bubble_sorting(work_array)
In the following instruction, I use
gtime -p along with
memory_profiler to measure the execution time, but it is not required.
alexis@MacBook-Pro % gtime -p python -m memory_profiler main.py Filename: main.py Line # Mem usage Increment Occurrences Line Contents ============================================================= 3 56.859 MiB 56.859 MiB 1 @profile 4 def bubble_sorting(array): 5 56.859 MiB 0.000 MiB 1 n = len(array) 6 7 56.906 MiB -254060.531 MiB 9909 for i in range(n): 8 56.906 MiB -254060.531 MiB 9909 sorted_item = True 9 10 56.906 MiB -874942713.984 MiB 50000814 for j in range(n - i - 1): 11 56.906 MiB -874688626.062 MiB 49990905 if array[j] > array[j + 1]: 12 56.906 MiB -287907579.547 MiB 25155548 array[j], array[j + 1] = array[j + 1], array[j] 13 56.906 MiB -287907586.656 MiB 25155548 sorted_item = False 14 15 56.906 MiB -254101.422 MiB 9909 if sorted_item: 16 16.016 MiB -40.891 MiB 1 break 17 18 16.016 MiB 0.000 MiB 1 return array real 1855.21 user 1046.50 sys 808.82
There are two important things to note here: first, the output of
memory_profiler is very detailed line by line, which is what we are looking for when using this tool; and, second, the elapsed time is approximately 30 minutes, which is a considerable time compared to the original time (~10 seconds). These metrics should give us an indication of how and when to use this tool.
memory_profiler includes the
mprof command (requires
matplotlib to be installed), which allows you to visualize changes in memory usage, for example:
alexis@MacBook-Pro PythonProfiling % mprof run main.py mprof: Sampling memory every 0.1s running new process running as a Python program... alexis@MacBook-Pro PythonProfiling % mprof plot Using last profile data.
PySpy is a great tool that helps us to evaluate processes that are already running, without the need to stop and configure them beforehand. For this reason, it requires additional privileges to evaluate other processes. It is especially useful for running experiments in environments such as Production or UAT.
py-spy package must be installed with PIP or Conda, depending on the local Python installation, and the command it includes is
py-spy. To evaluate a running process it is important to obtain the Process Identifier (PID), for example:
alexis@MacBook-Pro % ps -A PID TTY TIME CMD ... 36588 ?? 0:51.08 /Users/alexis/miniforge3/bin/python /Users/alexis/PythonProfiling/main.py ...
C:\Users\alexis>tasklist Image Name PID Session Name Session# Mem Usage ========================= ======== ================ =========== ============ ... python3.10.exe 18212 Console 1 47,460 K ...
PySpy has 4 subcommands:
USAGE: py-spy <SUBCOMMAND> OPTIONS: -h, --help Prints help information -V, --version Prints version information SUBCOMMANDS: record Records stack trace information to a flamegraph, speedscope or raw file top Displays a top like view of functions consuming CPU dump Dumps stack traces for a target program to stdout help Prints this message or the help of the given subcommand(s)
Example of subcommand
alexis@MacBook-Pro % sudo env "PATH=$PATH" py-spy dump --pid 36588 Process 36588: /Users/alexis/miniforge3/bin/python /Users/alexis/PythonProfiling/main.py Python v3.9.13 (/Users/alexis/miniforge3/bin/python3.9) Thread 0x100FDC580 (active+gil): "MainThread" bubble_sorting (main.py:11) <module> (main.py:22)
C:\Users\alexi>py-spy dump --pid 18212 Process 18212: python main.py Python v3.9.12 (C:\Users\alexi\anaconda3\python.exe) Thread 2168 (active+gil): "MainThread" bubble_sorting (main.py:12) <module> (main.py:21)
Example of subcommand
top, it refreshes the information once per second:
alexis@MacBook-Pro % sudo env "PATH=$PATH" py-spy top --pid 36588 Collecting samples from '/Users/alexis/miniforge3/bin/python /Users/alexis/PythonProfiling/main.py' (python v3.9.13) Total Samples 1200 GIL: 100.00%, Active: 100.00%, Threads: 1 %Own %Total OwnTime TotalTime Function (filename:line) 38.00% 38.00% 5.86s 5.86s bubble_sorting (main.py:12) 52.00% 52.00% 5.65s 5.65s bubble_sorting (main.py:11) 9.00% 9.00% 0.420s 0.420s bubble_sorting (main.py:10) 1.00% 1.00% 0.070s 0.070s bubble_sorting (main.py:13) 0.00% 100.00% 0.000s 12.00s <module> (main.py:22)
C:\Users\alexi>py-spy top --pid 18212 Collecting samples from 'python main.py' (python v3.9.12) Total Samples 400 GIL: 100.00%, Active: 100.00%, Threads: 1 %Own %Total OwnTime TotalTime Function (filename:line) 58.00% 58.00% 2.27s 2.27s bubble_sorting (main.py:12) 36.00% 36.00% 1.51s 1.51s bubble_sorting (main.py:11) 5.00% 5.00% 0.160s 0.160s bubble_sorting (main.py:10) 1.00% 1.00% 0.060s 0.060s bubble_sorting (main.py:13) 0.00% 100.00% 0.000s 4.00s <module> (main.py:21)
Example of subcommand
record, it generates an image file with extension
.svg summarizing the experiment:
alexis@MacBook-Pro % sudo env "PATH=$PATH" py-spy record --pid 36588 py-spy> Sampling process 100 times a second. Press Control-C to exit. ^C py-spy> Stopped sampling because Control-C pressed py-spy> Wrote flamegraph data to '36588-2022-08-06T08:00:04-05:00.svg'. Samples: 24397 Errors: 0
C:\Users\alexi>py-spy record --pid 18212 --output profile.svg py-spy> Sampling process 100 times a second. Press Control-C to exit. py-spy> Stopped sampling because Control-C pressed py-spy> Wrote flamegraph data to 'profile.svg'. Samples: 603 Errors: 0
Understanding how the executing bytecode behaves, beyond the Python code that we have analyzed, can gives us an idea of how things work internally and why some pieces of code work better than others, and the
dis module is the tool that includes the default distribution of Python that we have for this purpose.
In : import dis In : import main In : dis.dis(main.bubble_sorting) 5 0 LOAD_GLOBAL 0 (len) 2 LOAD_FAST 0 (array) 4 CALL_FUNCTION 1 6 STORE_FAST 1 (n) 7 8 LOAD_GLOBAL 1 (range) 10 LOAD_FAST 1 (n) 12 CALL_FUNCTION 1 14 GET_ITER >> 16 FOR_ITER 96 (to 114) 18 STORE_FAST 2 (i) 8 20 LOAD_CONST 1 (True) 22 STORE_FAST 3 (sorted_item) 10 24 LOAD_GLOBAL 1 (range) 26 LOAD_FAST 1 (n) 28 LOAD_FAST 2 (i) 30 BINARY_SUBTRACT 32 LOAD_CONST 2 (1) 34 BINARY_SUBTRACT 36 CALL_FUNCTION 1 38 GET_ITER >> 40 FOR_ITER 62 (to 104) 42 STORE_FAST 4 (j) 11 44 LOAD_FAST 0 (array) 46 LOAD_FAST 4 (j) 48 BINARY_SUBSCR 50 LOAD_FAST 0 (array) 52 LOAD_FAST 4 (j) 54 LOAD_CONST 2 (1) 56 BINARY_ADD 58 BINARY_SUBSCR 60 COMPARE_OP 4 (>) 62 POP_JUMP_IF_FALSE 40 12 64 LOAD_FAST 0 (array) 66 LOAD_FAST 4 (j) 68 LOAD_CONST 2 (1) 70 BINARY_ADD 72 BINARY_SUBSCR 74 LOAD_FAST 0 (array) 76 LOAD_FAST 4 (j) 78 BINARY_SUBSCR 80 ROT_TWO 82 LOAD_FAST 0 (array) 84 LOAD_FAST 4 (j) 86 STORE_SUBSCR 88 LOAD_FAST 0 (array) 90 LOAD_FAST 4 (j) 92 LOAD_CONST 2 (1) 94 BINARY_ADD 96 STORE_SUBSCR 13 98 LOAD_CONST 3 (False) 100 STORE_FAST 3 (sorted_item) 102 JUMP_ABSOLUTE 40 15 >> 104 LOAD_FAST 3 (sorted_item) 106 POP_JUMP_IF_FALSE 16 16 108 POP_TOP 110 JUMP_ABSOLUTE 114 112 JUMP_ABSOLUTE 16 18 >> 114 LOAD_FAST 0 (array) 116 RETURN_VALUE
Clearly, this output is the disassembly of the code being executed. When interpreting it we can observe the following details:
- The first column corresponds to the equivalent line number in the Python code that you are referring to.
- In the second column, only some lines contain
>>, which corresponds to the redirects and moves in the Python code.
- The remaining columns correspond to the equivalent lines of the disassembled code.
The tools for profiling an application are now available, but there are several thoughts that we must take into account. Here are some recommendations:
- CPU Overclocking and BIOS-based Accelerators, as Eco Modes and Battery Optimizing, can affect the real performance of the code under testing. It is recommended to disable them.
- Telemetry Tools, Anti-Viruses, or Cloud-based File Sharing Systems should be disabled during the experiments.
- Perform several measurements to obtain valid statistics could help us to confirm the expected behavior of the changes, and confirm the hypothesis.
- Run experiments in a separated branches from the main repository, and keep the branches that reported code performance improvements
- Consider using PySpy and Pyinstrument for Web-based applications in environments where it is not possible to make changes to code.
If you think this content was helpful, consider buy me a cup of coffee. 😉☕️👉🏼