Python is an excellent language for fast prototyping and code development, but one complaint people often make is that it runs slowly. This is a particular problem for data scientists and ML engineers, who routinely perform computationally intensive operations such as matrix multiplication, gradient descent calculations, and image processing.
Over time, Python has evolved internally to address some of these issues by introducing new features into the language, such as multi-threading, and by rewriting existing functionality for better performance. However, Python's global interpreter lock (GIL) often undermines these efforts.
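To see the GIL's effect for yourself, consider a CPU-bound task split across two threads. This is a minimal sketch, not from the original article: on a standard CPython build, the threaded version typically takes about as long as the serial one, because only one thread executes Python bytecode at a time.

```python
import threading
import time

def count_down(n):
    # A purely CPU-bound busy loop
    while n > 0:
        n -= 1

N = 2_000_000

# Run twice serially
start = time.time()
count_down(N)
count_down(N)
serial = time.time() - start

# Run the same work in two threads
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
start = time.time()
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.time() - start

# Under the GIL, the two timings come out roughly equal
print(f"serial: {serial:.2f}s, threaded: {threaded:.2f}s")
```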
Additionally, many external libraries have been written to bridge this perceived performance gap between Python and compiled languages such as C and Java. Perhaps the best known and most widely used of these is the NumPy library. Implemented in C, NumPy was designed from the ground up to support multiple CPU cores and ultra-fast numeric and array processing.
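NumPy's speed comes from moving loops out of the Python interpreter and into compiled C. As a quick illustration (my own example, not from the article), compare a pure-Python sum of squares with its vectorized NumPy equivalent:

```python
import timeit
import numpy as np

n = 1_000_000

def python_sum_of_squares(n):
    # The loop runs in the Python interpreter
    return sum(i * i for i in range(n))

def numpy_sum_of_squares(n):
    # The loop runs in compiled C code inside NumPy
    a = np.arange(n, dtype=np.int64)
    return int(np.sum(a * a))

# Both give the same answer; NumPy is typically an order of magnitude faster
assert python_sum_of_squares(n) == numpy_sum_of_squares(n)
print("python:", timeit.timeit(lambda: python_sum_of_squares(n), number=3))
print("numpy: ", timeit.timeit(lambda: numpy_sum_of_squares(n), number=3))
```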
There are alternatives to NumPy, and a recent TDS article showed that in many use cases the numexpr library outperforms NumPy. If you're interested in learning more, I've included a link to that story at the end of this article.
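For a flavour of what numexpr does, here is a minimal sketch (assuming the numexpr package is installed; the details are in the linked article). It evaluates a whole arithmetic expression in a single multi-threaded pass rather than NumPy's several passes with intermediate temporaries:

```python
import numpy as np
import numexpr as ne

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# NumPy evaluates this in several steps, allocating temporary arrays
numpy_result = 2 * a + 3 * b

# numexpr compiles the expression string and evaluates it in one pass
ne_result = ne.evaluate("2 * a + 3 * b")

assert np.allclose(numpy_result, ne_result)
```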
Another very effective external library is Numba. Numba provides a just-in-time (JIT) compiler for Python, which converts a subset of Python and NumPy code into fast machine code at runtime. It is designed to accelerate numerical and scientific computing tasks by leveraging the LLVM (Low Level Virtual Machine) compiler infrastructure.
In this article, I want to discuss another external library that enhances Python's runtime performance: Cython. It is among the most performant Python libraries, but it is also one of the least understood and least used. I think this is at least partly because you have to get your hands dirty a little and make some changes to your original code. However, if you follow the simple four-step plan outlined below, the performance benefits you can achieve are well worth it.
What is Cython?
If you've never heard of Cython, it is a superset of Python designed to provide C-like performance for code written mostly in Python. Cython translates Python code into C code, which can be compiled into a shared library that can be imported into Python just like a regular Python module. This process delivers the performance benefits of C while maintaining Python's readability.
I'll show you the exact benefits you can achieve by converting your code to use Cython, examining three use cases, presenting the four steps needed to convert existing Python code, and timing each run for comparison.
Setting up the development environment
Before continuing, you should set up a separate development environment for coding to isolate your project's dependencies. I use WSL2 Ubuntu on Windows and Jupyter notebooks for code development. I set up my development environment using the uv package manager, but feel free to use whatever tools and methods suit you.
$ uv init cython-test
$ cd cython-test
$ uv venv
$ source .venv/bin/activate
(cython-test) $ uv pip install cython jupyter numpy pillow matplotlib
Now type "jupyter notebook" at the command prompt. You should see your notebook open in your browser. If that doesn't happen automatically, you'll likely see a screenful of information after running the command. Near the bottom of it, there will be a URL that you can copy and paste into your browser to launch the Jupyter notebook.
Your URL will be different from mine, but it should look something like this:
http://127.0.0.1:8888/tree?token=3b9f7bd07b6966b41b68e2350721b2d0b6f388d248cc69d
Example 1 – Speeding up a loop
Before we start using Cython, let's establish a baseline with regular Python. This will be our benchmark.
We'll code a simple double-loop function that takes a few seconds to run, speed it up using Cython, and measure the runtime difference between the two methods.
Here is the baseline standard Python code:
# sum_of_squares.py
import timeit

# Define the standard Python function
def slow_sum_of_squares(n):
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * i + j * j
    return total

# Benchmark the Python function
print("Python function execution time:")
print("timeit:", timeit.timeit(
    lambda: slow_sum_of_squares(20000),
    number=1))
On my system, the above code produces the following output:
Python perform execution time:
13.135973724005453
Let's see how much Cython can improve on this.
A four-step plan for effective Cython use
Using Cython to boost code execution time in a Jupyter notebook is a straightforward four-step process.
Don't worry if you're not a notebook user. I'll show you how to convert a regular Python .py file to use Cython later on.
1/ In the first cell of the notebook, enter this command to load the Cython extension.
%load_ext Cython
2/ For any subsequent cell containing Python code that you want to run through Cython, add the %%cython magic command before the code. For example,
%%cython
def myfunction():
    etc ...
    ...
3/ Function definitions that take parameters must have those parameters given explicit C types.
4/ Finally, all variables must be typed using cdef declarations. Also, where it makes sense, use functions from the standard C library (made available in Cython via modules such as libc.stdlib).
Taking our original Python code as an example, here is how it looks, ready to run in a notebook with Cython, after applying all four steps above.
%%cython
def fast_sum_of_squares(int n):
    # Use a 64-bit accumulator: the result overflows a 32-bit int for n=20000
    cdef long long total = 0
    cdef int i, j
    for i in range(n):
        for j in range(n):
            total += i * i + j * j
    return total

import timeit
print("Cython function execution time:")
print("timeit:", timeit.timeit(
    lambda: fast_sum_of_squares(20000),
    number=1))
As you can see, converting your code is, in practice, much easier than the four procedural steps might suggest.
The runtime of the code above was impressive. On my system, this new Cython code produces the following output:
Cython perform execution time:
0.15829777799808653
That is a speed-up of over 80 times.
Example 2 – Calculating Pi using Monte Carlo methods
The second example examines a more complex use case, one that underpins many real-world applications.
An area where Cython can demonstrate significant performance improvements is numerical simulation, particularly simulations that involve heavy computation, such as Monte Carlo (MC) simulation. A Monte Carlo simulation involves performing many iterations of a random process to estimate the properties of a system. MC applies to a wide variety of fields, including climate and atmospheric science, computer graphics, AI search, and quantitative finance. It is, in most cases, a very computationally intensive process.
To illustrate, we'll use Monte Carlo in a simplified way to calculate the value of Pi. This is a well-known example: we take a square with sides of length 1 unit and inscribe a quarter circle of radius 1 unit inside it, as shown here.
The ratio of the quarter circle's area to the square's area is clearly Pi/4.
Therefore, if we consider many random (x, y) points that all lie on or inside the square's boundary, then, as the total number of points tends to infinity, the proportion of those points that fall on or inside the quarter circle tends to Pi/4. Multiplying this value by 4 gives the value of Pi itself.
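As a quick sanity check on the method itself (my own aside, not part of the article's benchmark), the same estimate can be written in a few lines of vectorized NumPy:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
num_samples = 1_000_000

# Draw all the random (x, y) points at once
x = rng.random(num_samples)
y = rng.random(num_samples)

# Fraction of points on or inside the quarter circle, times 4
pi_estimate = 4.0 * np.mean(x**2 + y**2 <= 1.0)
print(pi_estimate)  # close to 3.1416
```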
Here is typical Python code that can be used to model this:
import random
import time

def monte_carlo_pi(num_samples):
    inside_circle = 0
    for _ in range(num_samples):
        x = random.uniform(0, 1)
        y = random.uniform(0, 1)
        if (x**2) + (y**2) <= 1:
            inside_circle += 1
    return (inside_circle / num_samples) * 4

# Benchmark the standard Python function
num_samples = 100000000
start_time = time.time()
pi_estimate = monte_carlo_pi(num_samples)
end_time = time.time()
print(f"Estimated Pi (Python): {pi_estimate}")
print(f"Execution Time (Python): {end_time - start_time} seconds")
When I ran this, it produced the following timing results:
Estimated Pi (Python): 3.14197216
Execution Time (Python): 20.67279839515686 seconds
Now, here is the Cython implementation, obtained by following the four-step process.
%%cython
import cython
from libc.stdlib cimport rand, RAND_MAX

@cython.boundscheck(False)
@cython.wraparound(False)
def monte_carlo_pi(int num_samples):
    cdef int inside_circle = 0
    cdef int i
    cdef double x, y
    for i in range(num_samples):
        x = rand() / <double>RAND_MAX
        y = rand() / <double>RAND_MAX
        if (x**2) + (y**2) <= 1:
            inside_circle += 1
    return (inside_circle / num_samples) * 4

import time
num_samples = 100000000
# Benchmark the Cython function
start_time = time.time()
pi_estimate = monte_carlo_pi(num_samples)
end_time = time.time()
print(f"Estimated Pi (Cython): {pi_estimate}")
print(f"Execution Time (Cython): {end_time - start_time} seconds")
And here is the new output.
Estimated Pi (Cython): 3.1415012
Execution Time (Cython): 1.9987852573394775 seconds
Again, the Cython version is over 10 times faster than the plain Python version.
One of the things we did in this example was import an external function from the C standard library. That was this line,
from libc.stdlib cimport rand, RAND_MAX
The cimport statement is a Cython keyword used to import C functions, variables, constants, and types. Here, I used it to import an optimized C-language equivalent of Python's random() function.
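cimport works the same way for other parts of the C standard library. For instance, a hypothetical cell that needs fast math functions (this snippet is my own illustration, not from the article) could pull them from libc.math rather than Python's math module:

```cython
%%cython
from libc.math cimport sqrt

def euclidean_distance(double x1, double y1, double x2, double y2):
    # sqrt here is the C library function, called with no Python overhead
    return sqrt((x2 - x1)**2 + (y2 - y1)**2)
```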
Example 3 – Image manipulation
For the last example, we'll do some image manipulation: specifically, image convolution, which is a common operation in image processing with many use cases. Here, we'll use it to sharpen the slightly blurry image shown below.

First, here is the regular Python code.
from PIL import Image
import numpy as np
from scipy.signal import convolve2d
import time
import matplotlib.pyplot as plt

def sharpen_image_color(image):
    # Start timing
    start_time = time.time()
    # Convert image to RGB in case it's not already
    image = image.convert('RGB')
    # Define a sharpening kernel
    kernel = np.array([[0, -1, 0],
                       [-1, 5, -1],
                       [0, -1, 0]])
    # Convert image to numpy array
    image_array = np.array(image)
    # Debugging: check input values
    print("Input array values: Min =", image_array.min(), "Max =", image_array.max())
    # Prepare an empty array for the sharpened image
    sharpened_array = np.zeros_like(image_array)
    # Apply the convolution kernel to each channel (assuming RGB image)
    for i in range(3):
        channel = image_array[:, :, i]
        # Perform convolution
        convolved_channel = convolve2d(channel, kernel, mode='same', boundary='wrap')
        # Clip values to be within the range [0, 255]
        convolved_channel = np.clip(convolved_channel, 0, 255)
        # Store back in the sharpened array
        sharpened_array[:, :, i] = convolved_channel.astype(np.uint8)
    # Debugging: check output values
    print("Sharpened array values: Min =", sharpened_array.min(), "Max =", sharpened_array.max())
    # Convert array back to image
    sharpened_image = Image.fromarray(sharpened_array)
    # End timing
    duration = time.time() - start_time
    print(f"Processing time: {duration:.4f} seconds")
    return sharpened_image

# Correct path for WSL2 accessing the Windows filesystem
image_path = '/mnt/d/images/taj_mahal.png'
image = Image.open(image_path)

# Sharpen the image
sharpened_image = sharpen_image_color(image)

if sharpened_image:
    # Show using PIL's built-in show method (for debugging)
    #sharpened_image.show(title="Sharpened Image (PIL Show)")

    # Display the original and sharpened images using Matplotlib
    fig, axs = plt.subplots(1, 2, figsize=(15, 7))
    # Original image
    axs[0].imshow(image)
    axs[0].set_title("Original Image")
    axs[0].axis('off')
    # Sharpened image
    axs[1].imshow(sharpened_image)
    axs[1].set_title("Sharpened Image")
    axs[1].axis('off')
    # Show both images side by side
    plt.show()
else:
    print("Failed to generate the sharpened image.")
The output is this.
Enter array values: Min = 0 Max = 255
Sharpened array values: Min = 0 Max = 255
Processing time: 0.1034 seconds

Let's see if Cython can beat that 0.1034-second runtime.
%%cython
# cython: language_level=3
# distutils: define_macros=NPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION
import numpy as np
cimport numpy as np
import cython

@cython.boundscheck(False)
@cython.wraparound(False)
def sharpen_image_cython(np.ndarray[np.uint8_t, ndim=3] image_array):
    # Define sharpening kernel
    cdef int kernel[3][3]
    kernel[0][0] = 0
    kernel[0][1] = -1
    kernel[0][2] = 0
    kernel[1][0] = -1
    kernel[1][1] = 5
    kernel[1][2] = -1
    kernel[2][0] = 0
    kernel[2][1] = -1
    kernel[2][2] = 0
    # Declare variables outside of loops
    cdef int height = image_array.shape[0]
    cdef int width = image_array.shape[1]
    cdef int channel, i, j, ki, kj
    cdef int value
    # Prepare an empty array for the sharpened image
    cdef np.ndarray[np.uint8_t, ndim=3] sharpened_array = np.zeros_like(image_array)
    # Convolve each channel separately
    for channel in range(3):  # Iterate over RGB channels
        for i in range(1, height - 1):
            for j in range(1, width - 1):
                value = 0  # Reset value at each pixel
                # Apply the kernel
                for ki in range(-1, 2):
                    for kj in range(-1, 2):
                        value += kernel[ki + 1][kj + 1] * image_array[i + ki, j + kj, channel]
                # Clip values to be between 0 and 255
                sharpened_array[i, j, channel] = min(max(value, 0), 255)
    return sharpened_array

# Python part of the code
from PIL import Image
import numpy as np
import time as py_time  # Rename the Python time module to avoid a conflict
import matplotlib.pyplot as plt

# Load the input image
image_path = '/mnt/d/images/taj_mahal.png'
image = Image.open(image_path).convert('RGB')

# Convert the image to a NumPy array
image_array = np.array(image)

# Time the sharpening with Cython
start_time = py_time.time()
sharpened_array = sharpen_image_cython(image_array)
cython_time = py_time.time() - start_time

# Convert back to an image for display
sharpened_image = Image.fromarray(sharpened_array)

# Display the original and sharpened images
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.imshow(image)
plt.title("Original Image")
plt.subplot(1, 2, 2)
plt.imshow(sharpened_image)
plt.title("Sharpened Image")
plt.show()

# Print the time taken for the Cython processing
print(f"Processing time with Cython: {cython_time:.4f} seconds")
The output is:

Both programs worked well, but Cython was almost 25 times faster.
What about running Cython outside of a notebook environment?
So far, everything I've shown assumes you're running your code in a Jupyter notebook. I chose it because it's the easiest way to introduce Cython and get some code up and running quickly. Notebook environments are hugely popular among Python developers, but a vast amount of Python code still lives in regular files and is run from the terminal with the python command.
If that's your predominant mode of coding and running Python scripts, the %load_ext and %%cython IPython magic commands won't work, since only Jupyter/IPython understands them.
So, if you're running your code as a regular Python script, here's how to adapt the four-step Cython conversion process.
I'll use my earlier sum_of_squares example to show this.
1/ Create a .pyx file instead of using %%cython
Move your Cython-enhanced code into a file named sum_of_squares.pyx.
# sum_of_squares.pyx
def fast_sum_of_squares(int n):
    # Use a 64-bit accumulator: the result overflows a 32-bit int for n=20000
    cdef long long total = 0
    cdef int i, j
    for i in range(n):
        for j in range(n):
            total += i * i + j * j
    return total
All we did was remove the %%cython directive and the timing code (which moves into the calling module).
2/ Create a setup.py file to compile the .pyx file
# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="cython-test",
    ext_modules=cythonize("sum_of_squares.pyx", language_level=3),
    py_modules=["sum_of_squares"],  # Explicitly state the module
    zip_safe=False,
)
3/ Run the setup.py file using this command.
$ python setup.py build_ext --inplace
running build_ext
copying build/lib.linux-x86_64-cpython-311/sum_of_squares.cpython-311-x86_64-linux-g
4/ Create a regular Python module to invoke the Cython code, then run it, as shown below.
# main.py
import time, timeit
from sum_of_squares import fast_sum_of_squares

start = time.time()
result = fast_sum_of_squares(20000)

print("timeit:", timeit.timeit(
    lambda: fast_sum_of_squares(20000),
    number=1))
$ python main.py
timeit: 0.14675087109208107
Summary
Hopefully, I've convinced you of how effective the Cython library can be in your code. Although it might seem a little complicated at first glance, with a little effort you can achieve incredible performance improvements over regular Python, even when compared with fast numerical libraries such as NumPy.
I presented a four-step process for converting regular Python code to use Cython when running inside a Jupyter notebook environment. Additionally, I explained the steps required to run Cython code from the command line, outside a notebook environment.
Finally, I reinforced all of the above with worked examples of converting regular Python code to use Cython.
In the three examples shown, we achieved speed-ups of 80x, 10x, and 25x, which is not at all atypical.
As promised, here is a link to the earlier TDS article that uses the numexpr library to accelerate Python code:

