Understanding Concurrency and Threading in Modern Computing
When building high-performance applications, it's crucial to understand the underlying mechanics of concurrency, especially when it comes to threads and processes. But have you ever wondered how expensive threads actually are and how to implement concurrency effectively in your programs? 💡
Let's dive in and clear things up! 🧵
💡 How Expensive is a Thread? 💡
When it comes to threads, it’s not just about how many you need, but how much they cost. 🧵
A thread is much more expensive than it might seem. The cost depends on the operating system you're using, as a thread is essentially a wrapper for a kernel thread provided by your OS. 🖥️
👉 Why are threads so expensive?
A kernel thread holds several MB of memory.
It can take milliseconds to create a single thread.
This is why applications try to create threads only when they start and keep them alive as long as possible to reuse them.
💭 This leads to asynchronous, callback-based programming, where we work with just a few threads, enabling scalability and efficiency.
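To put a number on this, here is a minimal sketch (assuming CPython; the figures vary widely by OS and hardware) that times how long it takes to create, start, and join plain kernel-backed threads:
import threading
import time

def noop():
    pass  # No work: we only measure thread creation/start/join overhead

count = 100
start = time.perf_counter()
threads = [threading.Thread(target=noop) for _ in range(count)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"{count} threads in {elapsed:.3f}s ({elapsed / count * 1000:.3f} ms per thread)")
On most systems the per-thread cost is large enough that creating threads in a hot path is a bad idea, which is exactly why thread pools create them once and reuse them.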
⚡ Enter Virtual Threads ⚡
Virtual threads change the game. They behave like regular objects: create them on demand and have as many as you need. Under the hood, many virtual threads are multiplexed onto a small pool of kernel threads, so they offer a simpler, more efficient way of handling concurrency without worrying about kernel-thread overhead. 🌐
🤔 Thread vs Process 🤔
Now that we’ve touched on threads, it's important to understand the difference between threads and processes:
Thread: A lightweight, smaller unit of a process that shares memory space. Threads are often used for tasks that need to run concurrently but don't require independent memory space.
Process: A heavier, independent unit of execution with its own memory space. Processes are isolated from each other and do not share memory, making them more resource-intensive but more secure and reliable in some cases.
Threads are often used for tasks that need to share data quickly and work in parallel, while processes are used when tasks need to be isolated.
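To see this difference in code, the sketch below increments a module-level counter from a thread and then from a process; only the thread's change is visible to the parent, because the process works on its own copy of memory (a minimal sketch; the names counter and increment are illustrative):
import multiprocessing
import threading

counter = 0

def increment():
    global counter
    counter += 1

if __name__ == "__main__":
    # A thread shares the parent's memory, so its change is visible here
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    print(f"After thread:  counter = {counter}")   # 1

    # A process gets its own memory space, so its change is not visible here
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    print(f"After process: counter = {counter}")  # still 1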
🧑‍💻 Various Ways of Implementing Concurrency in Python 🧑‍💻
Python offers a variety of ways to implement concurrency, but each has its own pros and cons. Here's an overview of common approaches:
Multithreading 🧵
Using Python's threading module, we can create threads that run concurrently. However, due to the Global Interpreter Lock (GIL), Python threads don't achieve true parallelism for CPU-bound tasks, but they are great for I/O-bound tasks like networking and disk operations (see the GIL sketch after this list).
Multiprocessing 💻
The multiprocessing module allows Python to run processes in parallel, bypassing the GIL. Each process has its own Python interpreter and memory space, making it ideal for CPU-bound tasks that require parallelism.
AsyncIO ⚡
With the asyncio module, Python provides asynchronous programming through coroutines. asyncio is designed for high-level structured network code and is great for tasks that involve waiting for I/O, like web scraping or making API calls. It's non-blocking and allows for efficient concurrency with just a single thread.
Concurrent Futures 🚀
The concurrent.futures module provides a higher-level interface for asynchronous execution. With ThreadPoolExecutor and ProcessPoolExecutor, you can easily execute tasks asynchronously using threads or processes.
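To make the GIL's effect on CPU-bound work concrete, here is a minimal timing sketch (assuming standard CPython; the exact numbers depend on your machine) that runs the same pure-Python function sequentially and then across four threads:
import threading
import time

def cpu_work():
    sum(i * i for i in range(5 * 10**6))  # Pure-Python CPU-bound loop

# Sequential baseline
start = time.perf_counter()
for _ in range(4):
    cpu_work()
print(f"Sequential: {time.perf_counter() - start:.2f}s")

# Four threads: the GIL lets only one thread run Python bytecode at a time,
# so this usually takes about as long as the sequential run
start = time.perf_counter()
threads = [threading.Thread(target=cpu_work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Threaded:   {time.perf_counter() - start:.2f}s")
Running the same comparison with multiprocessing instead of threading typically shows a near-linear speedup, which is exactly the trade-off described above.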
✅ Recommended Approach: What to Choose?
I/O-bound tasks: For tasks that spend most of their time waiting for input/output (like reading from a file or web requests), multithreading or asyncio is ideal. These approaches allow concurrent operations without heavy resource consumption.
CPU-bound tasks: When dealing with computationally heavy tasks (like data analysis or image processing), multiprocessing is the best option. It allows Python to take advantage of multiple CPUs, fully utilizing system resources.
For ease of use: If you want to simplify concurrency, concurrent.futures is a great option. It abstracts away many of the complexities of threading and multiprocessing, making it easier to write concurrent programs.
⚡ Conclusion: Virtual Threads & Beyond ⚡
The landscape of concurrency is evolving, and with virtual threads on the horizon, the future looks promising. Virtual threads allow for efficient, lightweight task management without the cost of kernel threads, making scalability much simpler.
In summary:
Threads are lighter than processes but still expensive to create, since each one maps to a kernel thread.
Processes offer isolation but come at a higher resource cost.
Virtual threads offer the best of both worlds for concurrent programming.
By understanding the trade-offs and using the right tool for the job, you can build applications that are both scalable and efficient.
Below are Python examples for each of the concurrency methods we discussed earlier: Multithreading, Multiprocessing, AsyncIO, and Concurrent Futures.
1. Multithreading (Using the threading module) 🧵
Multithreading is useful for I/O-bound tasks. In Python, threading allows multiple threads to run concurrently.
Example:
import threading
import time

# Function that simulates an I/O-bound task (e.g., reading from a file)
def io_task(task_id):
    print(f"Task {task_id} started")
    time.sleep(2)  # Simulate a time-consuming task
    print(f"Task {task_id} completed")

# Create and start threads
threads = []
for i in range(5):
    thread = threading.Thread(target=io_task, args=(i,))
    threads.append(thread)
    thread.start()

# Wait for all threads to complete
for thread in threads:
    thread.join()

print("All tasks completed!")
Explanation:
Multithreading is ideal for I/O-bound tasks because while one thread waits (e.g., for disk or network I/O), other threads can execute.
2. Multiprocessing (Using the multiprocessing module) 💻
Multiprocessing is ideal for CPU-bound tasks. It creates separate processes with their own memory space and bypasses Python's Global Interpreter Lock (GIL).
Example:
import multiprocessing

# Function to simulate a CPU-bound task (e.g., performing calculations)
def cpu_task(task_id):
    print(f"Task {task_id} started")
    result = sum(i * i for i in range(10**6))  # Simulate CPU work
    print(f"Task {task_id} completed with result {result}")

# The __main__ guard is required so child processes don't re-run this code
# on platforms that spawn rather than fork (e.g., Windows and macOS)
if __name__ == "__main__":
    # Create and start processes
    processes = []
    for i in range(3):
        process = multiprocessing.Process(target=cpu_task, args=(i,))
        processes.append(process)
        process.start()

    # Wait for all processes to complete
    for process in processes:
        process.join()

    print("All tasks completed!")
Explanation:
Multiprocessing is best for CPU-bound tasks, allowing you to fully utilize multiple cores on your machine.
3. AsyncIO (Using the asyncio module) ⚡
AsyncIO is ideal for tasks that involve waiting for external events (e.g., making web requests). It uses asynchronous programming with async/await.
Example:
import asyncio

# Async function to simulate an I/O task
async def async_task(task_id):
    print(f"Task {task_id} started")
    await asyncio.sleep(2)  # Simulate a time-consuming task (non-blocking)
    print(f"Task {task_id} completed")

# Main function to run the tasks concurrently
async def main():
    tasks = []
    for i in range(5):
        task = asyncio.create_task(async_task(i))
        tasks.append(task)
    # Wait for all tasks to complete
    await asyncio.gather(*tasks)

# Run the event loop
asyncio.run(main())
Explanation:
AsyncIO is designed for I/O-bound operations, allowing many tasks to run concurrently with a single thread.
4. Concurrent Futures (Using the concurrent.futures module) 🚀
concurrent.futures provides a high-level API for asynchronously executing tasks in threads or processes.
Example (Using ThreadPoolExecutor):
import concurrent.futures
import time

# Function to simulate a task
def task(task_id):
    print(f"Task {task_id} started")
    time.sleep(2)  # Simulate a time-consuming task
    print(f"Task {task_id} completed")

# Use ThreadPoolExecutor for concurrent threads
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # Submit tasks for execution
    future_tasks = [executor.submit(task, i) for i in range(5)]
    # Wait for all tasks to complete (result() re-raises any task exception)
    for future in concurrent.futures.as_completed(future_tasks):
        future.result()

print("All tasks completed!")
Example (Using ProcessPoolExecutor for CPU-bound tasks):
import concurrent.futures

# Function to simulate a CPU-bound task
def cpu_task(task_id):
    print(f"Task {task_id} started")
    result = sum(i * i for i in range(10**6))  # Simulate CPU work
    print(f"Task {task_id} completed with result {result}")

# The __main__ guard is required so worker processes don't re-run this code
if __name__ == "__main__":
    # Use ProcessPoolExecutor for concurrent processes
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Submit tasks for execution
        future_tasks = [executor.submit(cpu_task, i) for i in range(3)]
        # Wait for all tasks to complete (result() re-raises any task exception)
        for future in concurrent.futures.as_completed(future_tasks):
            future.result()

    print("All tasks completed!")
Explanation:
concurrent.futures.ThreadPoolExecutor simplifies the usage of threads.
concurrent.futures.ProcessPoolExecutor simplifies the usage of processes.
5. Joblib (Bonus - Parallel Processing) 🏃‍♂️
Joblib is a Python library that makes it easy to parallelize code by distributing tasks across multiple processes or cores. It's often used in scenarios where the same function is applied to many different data points, and it's especially popular in data science and machine learning workflows.
Example:
import joblib
import time

# Function to simulate a CPU-bound task
def cpu_task(task_id):
    print(f"Task {task_id} started")
    result = sum(i * i for i in range(10**6))  # Simulate CPU work
    print(f"Task {task_id} completed with result {result}")
    return result

# The __main__ guard is required because Joblib's default backend uses processes
if __name__ == "__main__":
    task_ids = range(3)  # Task IDs from 0 to 2
    start_time = time.time()

    # Use Parallel and delayed from Joblib to run the tasks across 3 processes
    results = joblib.Parallel(n_jobs=3)(
        joblib.delayed(cpu_task)(task_id) for task_id in task_ids
    )

    print(f"All tasks completed in {time.time() - start_time:.2f} seconds!")
Explanation:
Joblib is used for parallel computing and is a great choice when you need to run multiple tasks in parallel on multiple cores. It automatically handles the parallelization and efficiently uses your system's resources.
Parallel(n_jobs=3) lets you define how many parallel processes (or jobs) run at once. You can set it to the number of CPU cores available to fully utilize them.
joblib.delayed is a simple way to wrap a function so it can be executed in parallel.
Conclusion with Joblib:
Joblib is a great tool for simple parallel processing of CPU-bound tasks. It's often used in machine learning workflows to parallelize operations such as hyperparameter tuning, cross-validation, and large-scale data processing.
If you're doing CPU-intensive tasks and want a simple way to parallelize them, Joblib is an excellent tool to consider.
Final Summary:
Multithreading is for I/O-bound tasks.
Multiprocessing is for CPU-bound tasks.
AsyncIO is for I/O-bound tasks requiring asynchronous handling.
Concurrent Futures offers an easy-to-use interface for managing threads or processes.
Joblib is ideal for parallelizing CPU-bound tasks, especially in data science contexts.
Choose the right concurrency model based on whether your tasks are CPU-bound or I/O-bound, and the scale at which you need to parallelize! 🚀