Multiprocessing basic concepts

In this article I'm going to explain basic principles standing behind programming using processes in python.
Level 1 - computer and CPU
What is CPU

To get familiar with multiprocessing we have to start from computer and CPU definition.
Computer is a device that computes instructions, saves results and interacts with other devices. The core component of computing power in computers is CPU.
CPU stands for central processing unit. Main purpose of the CPU is to execute instructions. Each CPU has its own set of instructions defined by manufacturer. Transferring a sequence of zeroes and ones as input to this device makes CPU to do some work and produce output as other sequence of zeroes and ones. This is what we call programming
. On the low level CPU understands only electronic signals such as 1
(voltage passed) and 0
(no voltage).
Executing instructions
CPU fetches operation codes from memory(RAM) and executes one by one.
Registers memory
CPU has special cells where you can store temporary data. They are called registers
. Register is a special memory cell located directly inside CPU allowing to store and fetch some data very fast. Programs running on a CPU always use registers to deal with some data. For example, register called Command Pointer
or CP
stores pointer to currently running command. Other register called Accumulator
stores result of mathematical computations. There are also other kinds of registers presented in the CPU.
I/O devices
Usually you can attach various devices to the CPU in order to increase your capabilities:
* keyboard - allows to write data in human convenient way without need to turn on/off voltage individually on each CPU/Computer line;
* monitor - allows to see output;
* RAM(Random Access Memory) - allows to store some data in runtime and access it fast. Doesn't save data after turning off the power;
* hard drive disk – allows to store data that will be preserved after turning off the power;

When you turn on your computer, the CPU starts to read commands from the first address available, for example RAM(random access memory).
To run a program we have to load CPU specific commands into RAM(random access memory). CPU fetches commands from RAM one by one and executes them. It's also possible to interact with the devices. For example, you can store data inside HDD(Hard Drive Disk) or show some graphics on a monitor. All such devices have their own interfaces(communication rules). Knowing device interfaces is required to write software interacting with these devices.
Level 2 - operating system
Complexity of writing software
Soon enough you will find that writing a program from scratch to execute on the CPU requires you to write a lot of repeatable and hard code dealing with the devices and resources management.

Hiding complexity under OS
To make development easier you need some kind of a library which can encapsulate the complexity of the computer. It should "operate" our system. It's called Operating System
or OS
. It's loaded when you start your computer. CPU starts reading something RAM, where BIOS already loaded your OS from HDD.
OS provides services which can be used by developers. For example, you don't need to remember how to interact with a specific HDD. Instead the code interacting with HDD is already written in the OS. Operating systems usually provide simple interface for complex things using libraries. For example, to create a file you can call single function defined in the OS provided library(C libraries) which will do everything instead of you. This kind of operations called system calls
. Using system calls you can tell the OS to do specific things like creating a file or printing something to terminal.

Different operating systems have different purposes and features. Key OS you might be familiar with:
- unix based(Linux and MacOS);
- windows;
Unix based operation systems have some agreement on how to provide specific interfaces. That's why MacOS and Ubuntu have some common parts and are able to execute similar programs. And that's why you might hear about linux "distributions" which have one common OS core(linux, unix based), but different set of preinstalled software(ubuntu, arch, gentoo, etc). However you can't run unix program inside windows because windows has different API to run programs. Unix program simply has different structure/definition of a program.
What is process
Operating system defines its own language, abstractions and rules which you should know. The first thing all operating systems provide is an abstraction called process
.

Process data structure is defined and managed by operating system. For example, linux kernel is written in C and processes are defined as C structs. The most common attribute you will see while using the OS is PID(process id). It allows us to separate different processes and identify them using integers. For example, OS starts init
process with PID=1
. This is the parent of all other processes. Running pycharm program also has its own PID. Knowing PID it's possible to close the process and do various operations with it.

Process memory layout
In RAM process represented as a set of special segments.
Usually process is defined using following segments:
- Text segment. Holds actual code(operations);
- Data segment. Holds various variables which size is defined statically.
- Context(registers). Holds information about registers state.
- Heap. Area of memory which allows to dynamically add new memory data(for example using
malloc
in C or creating a class instance in Java, C#, etc); - Stack. Area of memory where we store temporary variables like function arguments and return values.

Child processes
Process can create child process to do some work. Let's imagine you're opening a new tab in chrome browser. Under the hood the browser creates a new child process which starts to render the page. Each child process has his parent's PID.

Quasi-parallelism
At the same time CPU executes other processes. But how it's possible? Old computers were able to execute only one process at a time, users felt that software was running in parallel. That's called quasi-parallel
. The main idea of this is the OS switches each process very fast. Thus, you feel like everything runs in parallel and nothing blocks viewing of your chrome page.
Parallel execution with multiple cores
One CPU core can execute 1 process at a time. Each process has its own memory variables(context) which are stored inside core registers. Nowadays we have CPUs with multiple cores which means we can execute multiple processes at a time. Each core has its own set of registers and is able to run tasks in parallel with other core.
Creating a child process in Unix and Windows
Different operating systems have different systems calls to create a process. Because process management is handled by OS, you have to use libraries provied by OS. Usually they are implemanted in C. Functions from such libraries called systems calls. Some of them can create child process.
Unix | Windows |
---|---|
fork() | CreateProcess() |
clone() |
We will take a closer look at using fork
system call later. For now keep in mind that creating a child process in different operating systems requires you to call different functions/methods due to differences in API. These system calls have differences in how these processes spawned.
Level 3 - Python
Unix. Fork a process using python
On different operating systems creating a process is done via using system calls to OS core.
On Unix based operation system programmers have access to a system call "fork" which allows to make a copy of currently running process. New process can be picked up by OS and run in parallel. System call usually refers to a function provided by OS. Child process starts execution from the line where we called fork()
. It has the same code and variables. For performance reason fork
uses copy on write
mechanism. It means that data is located inside the same region of memory, but if you change something, the process copies that data into a new region.
We can fork a process using such code in python:
import os
def calculate_fib(goal_index: int) -> int:
# Function to be run in paralllel process
print('Calculating fib sequence: ', goal_index)
pid = os.getpid()
print('Running in process with ID: ', pid)
if goal_index <= 2:
return 1
previous, current = 1, 1
for _ in range(2, goal_index):
previous, current = current, previous + current
print(f"Process {pid} completed with result: {current}")
return current
# make system call to create a copy of processs
new_pid = os.fork()
# Inside child process OS sets new_pid to 0
if new_pid == 0:
calculate_fib(500)
# Inside parent process new_pid equals to actual child pid
else:
current_pid = os.getpid()
child_pid = new_pid
print(f"current_pid: {current_pid}, child: {child_pid}")
What happens here? Look at the video below.
Calling os.fork
creates a separate process with almost exact variables existied in previous process. They share the same memory till we change something. Inside child process our OS returns new_pid
as 0, which means we are in the child process. Inside parent process new_pid
equals to child's actual pid. Execution of both processes continues. The first process completes execution and goes away. The second process is still executing. Now process 2 doesn't have link to parent process. OS decides to reassign this link to init process with PID 1. Process with PID 1 is a process create by OS when you start your computer(unix based OS).
What if we want to wait till the second process finishes and then do some work in the first process? To do that you can use os.waitpid
# make system call to create a copy of processs
new_pid = os.fork()
# Inside child process OS sets new_pid to 0
if new_pid == 0:
calculate_fib(500)
# Inside parent process new_pid equals to actual child pid
else:
current_pid = os.getpid()
child_pid = new_pid
print(f"current_pid: {current_pid}, child: {child_pid}")
# Sleep untill child process is finished
os.waitpid(new_pid, 0)
# Do other work with results here
Forking is used in a big variety of web/application servers. You will see that many servers have option to use pre-fork
model which means they create processes via fork
before processing web requests.

Prefork is also used in celery as option to run worker.

Windows. Using CreateProcess() system call
Forking a process works only on unix based OS and has limitations. It doesn't work on windows. Python doesn't provide direct interface to call CreateProcess
, but it calls it under the hood in C code.
You should understand this concept of forking a process, but usually you will be spawning new processes using Process
class. Python hides all complexity of running processes on different platforms. Under the hood your python interpreter will choose a strategy to create a process based on your platform.
Python checks current OS platform.
For windows python uses WINAPI CreateProcess
function to create a process.

For unix it uses fork
If you want to write platform independent code, you should create processes using Process
class. Thus, your code will be working on both windows
and unix
based OS.
import os
from multiprocessing import Process
def calculate_fib(goal_index: int) -> int:
print('Calculating fib sequence: ', goal_index)
pid = os.getpid()
print('Running in process with ID: ', pid)
if goal_index <= 2:
return 1
previous, current = 1, 1
for _ in range(2, goal_index):
previous, current = current, previous + current
print(f"Process {pid} completed with result: {current}")
return current
if __name__ == '__main__':
# Create python class instance representing process
child_process = Process(name='calculate_fib', target=calculate_fib, args=(500,))
# Start the process. From this moment
# you can see a new process running in OS.
# For Windows use WINAPI CreateProcess;
# For Unix use fork;
child_process.start()
# Main process will wait for execution of the process spawned
child_process.join()
Python can use several methods to spawn a process called context
or start_method
:

However the bottomline is that in unix OS python uses fork
, on windows CreatProcess
system calls.

You can change start method using set_start_method
function in multiprocessing module.
import multiprocessing as mp
def foo(q):
q.put('hello')
if __name__ == '__main__':
mp.set_start_method('spawn')
q = mp.Queue()
p = mp.Process(target=foo, args=(q,))
p.start()
print(q.get())
p.join()
Daemonic process and join method
You can pass daemon=True
argument when starting a process. This will force the child process to be killed by parent when parent completes execution.

Join method to wait for children
Using join
method you can wait for child processes to exit. By default for non-daemonic processes python calls .join
automatically.

Inter Process Communication(IPC)
To utilize all power of processes your OS provides several tools how two different processes can communicate with each other. This set of tools is called IPC. Python uses IPC to give a good API for communicating with processes:
- communication using messages(queues);
- communication using pipes;
- communication using shared memory;
Let's take a closer look at this.
Messages(Queues) in python
Queue is a simple FIFO data structure shared between processes. These processes can read from queue and write into the queue. This object is thread
and process safe
.

Pipes in python
Pipe represents a bidirectional channel between two processes. Each part of the pipe can be used to receive or retrieve data.
If duplex is True
(the default) then the pipe is bidirectional. If duplex is False
then the pipe is unidirectional: conn1
can only be used for receiving messages and conn2
can only be used for sending messages.

Shared Memory in python
Processes communication can be established using shared memory model. To do that these processes have to agree on what memory segment to share. Usually OS doesn't allow separate processes to enter other process memory.

In terms of memory two processes map some region of their memory to a shared block of memory. This is done using OS.

Shared memory manager server
Python can create a separate server process which will manage shared memory. This is more flexible way to manage shared memory. This server process can be running on a separate machine. Thus, you can achieve process communication between processes located on different machines.

Race conditions
Race condition is a situation when two concurrent processes are trying to access shared data and the data is get corrupted due to the concurrent nature of the processes.
Let's take a look at code below:
from multiprocessing import Process, Value
def worker_1(a):
for i in range(10_000):
a.value += 1
def worker_2(a):
for i in range(10_000):
a.value += 1
if __name__ == '__main__':
a = Value('i', 0, lock=False)
process_1 = Process(target=worker_1, args=(a,))
process_2 = Process(target=worker_2, args=(a,))
process_1.start()
process_2.start()
# With shared memory you call .join explicitly
process_1.join()
process_2.join()
print(a)
At the code above we have shared integer a
with initial value 0
. Two processes are trying to increment this value 10 000
times each. What will happen? The variable should equal 20 000

As you can see on the gif above, our code sometimes produce incorrect value. But why? Because += 1
is not atomic operation.
Python disassembles += 1
into smaller steps such as: load 1, load a, add 1 to a, set a to result.
def worker_1(a):
a.value += 1
print(dis.dis(worker_1))
# 6 0 LOAD_FAST 0 (a)
# 2 DUP_TOP
# 4 LOAD_ATTR 0 (value)
# 6 LOAD_CONST 1 (1)
# 8 INPLACE_ADD
# 10 ROT_TWO
# 12 STORE_ATTR 0 (value)
# 14 LOAD_CONST 0 (None)
# 16 RETURN_VALUE
Now let's imagine that after bytecode LOAD_CONST
(1) our process starts sleeping due to OS scheduler. Another process has already incremented this integer. Then our first process wakes up and sums up old values, then overrides result value in shared memory. Thus, we have corrupted data.
How to solve this? Use locks. For Value
you can pass lock=True
, which will lock this variable and will not allow another process to increment the variable at the same time. However be aware that this increase time required to finish execution.
a = Value('i', 0, lock=True) # lock=True passed by default
Process and thread safe objects
When two concurrently running processes are trying to use shared object it might lead to undefined behaviour. Objects that have some kind of mechanism to prevent undefined behaviour called process/thread safe
.
For example, in django each process/thread processing a request has db connection
object. Inspecting transaction
source code shows a comment describing that transaction
is thread-safe
. It means that it can't a reason for race condition.

So, in simple words process/thread safety refers to a shared object which can cause race condition.
Pool of processes
Pool is very common design pattern in concurrent world. Parent process creates several processes called workers. These workers are listening for a message inside queue. When one of the workers get the message it starts executing it.
Pool allows easily to distribute work across multiple units.
from multiprocessing import Pool
def f(x):
return x*x
if __name__ == '__main__':
with Pool(5) as p:
print(p.map(f, [1, 2, 3]))
This is also called producer-consumer model
(pattern) . The model(pattern) itself is being used in a big variety of tools such as:
- celery;
- AWS Lambda;
- AWS SNS;
Sequential and asynchronous models of programming
Sequential model
In regular python programming you think using sequential model. It means that every operation you've written is likely executing sequentially. Let's take an example. Consider that you have a function run_timer
which prints something after several second when timer expires.
import time
def run_timer(seconds: int) -> None:
time.sleep(seconds)
print(f'Timer triggered after {seconds} seconds!')
def calculate_something():
print('calculating something')
if __name__ == '__main__':
run_timer(5) # <- this operation blocks main process
calculate_something()
When you run this command you will see that run_timer
function blocks your execution of the script for 5 second.

Asynchronous model
In asynchronous model run_timer
will not block the execution. In nodejs you will face asynchronous model of execution, however in python python interpreter uses sequential model.
asynchronous model can be achieved using a separate process. When you run the code below you will see that running timer don't block main flow. This is what we call asynchronous model.
import time
from multiprocessing import Process
def run_timer(seconds: int) -> None:
time.sleep(seconds)
print(f'Timer triggered after {seconds} seconds!')
def calculate_something():
print('calculating something')
if __name__ == '__main__':
run_timer_5_process = Process(target=run_timer, args=(5,))
run_timer_10_process = Process(target=run_timer, args=(10,))
print('calling run_timer with 5 seconds')
run_timer_5_process.start() # <- this operation doesn't block main process
print('calling run_timer with 10 seconds')
run_timer_10_process.start() # <- this operation doesn't block main process
print('we are not blocked, timer is ticking concurrently')
calculate_something()

After starting timer function our main flow is not blocked. That's why we call it asynchronous model.
You can notice a lot of methods inside multiprocessing module like:
pool.apply_async
;queue.get_nowait
;queue.put_nowait
;
They have the same meaning – they are not blocking the main flow of the program(asynchronous).
Further reading and watching
- Python multiprocessing docs (docs.python.org);
- Operating Systems youtube playlist (youtube.com);
- Concurrency in python (realpython.com);
- Forking in linux (youtube.com playlist).