Cython: Simple, fast code
Which programming language you choose depends on the qualities you want. Do you want a simple language that you can learn easily? Python might be the right choice. Do you want a blazingly fast language? Maybe C/C++ is for you.
But what if one language offered the best of both worlds? Cython aims to do exactly that: it lets you write code that is nearly as simple as Python and that, once compiled, can run at speeds close to C++.
At its core, Cython translates Python-like code into C or C++ automatically and then compiles the result into fast extension modules. Cython also lets you call C/C++ functions from Python, which is useful when you want to use a fast C++ library but don’t want to write your entire project in C++.
How do I use it?
Writing Cython code is not very different from writing Python code. There are two major differences:
- First, adding static type declarations (like in C++) allows Cython to make your code faster.
- Second, you will need to compile your Cython code before you can import it and run it, in contrast to Python code, which is dynamically interpreted.
Static Type Declarations
Adding static type declarations to Cython code is similar to C++: you declare variables with their designated types. For example, you can create an integer in Cython using cdef int x or an array of 10 integers using cdef int[10] arr. You can also use C++ vectors, which are similar to Python lists, by first importing the vector class with from libcpp.vector cimport vector (notice the cimport instead of import) and then declaring them: cdef vector[int] vec.
Compiling Cython Code
Before you can run Cython code, you first need to compile it. You do this by creating a setup.py file that builds the extension. Inside setup.py, you call Cython.Build.cythonize on your file path and pass the result to setuptools.setup as the ext_modules parameter. Then you can build the extension by running python3 setup.py build_ext --inplace. The --inplace argument places the resulting shared object file in the current directory, so you can import it directly.
Other Notes About Syntax
Aside from static type declarations, Cython is almost identical to Python. There are no curly braces, no cin >> or cout <<, and no semicolons at the end of each line. Just simple Python syntax.
Using C++ code in Python
Cython also allows you to use code from other C++ files in your Cython code. You do this by declaring them with cdef extern from "yourfile.cpp": and then writing the function signature (e.g. vector[int] my_function(int a, int b)) indented on the next line.
Example: String alignment
String alignment is a well-known problem, with applications in genetics and biology, especially when comparing sequences of DNA bases or protein residues. We will consider a simple implementation that uses dynamic programming and runs in O(N²) by building on top of previously calculated results.
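Stated as a recurrence, the table that the implementation fills in looks like the following. Here dp[i][j] is the number of matched characters in the best alignment of the first i characters of a with the first j characters of b, and a_i, b_j are 1-indexed characters (a[i-1] and b[j-1] in the code). This is the familiar longest-common-subsequence recurrence, written out here only as a reading aid:

```latex
\[
dp[i][j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
\max\bigl(dp[i][j-1],\ dp[i-1][j],\ dp[i-1][j-1] + 1\bigr) & \text{if } a_i = b_j,\\
\max\bigl(dp[i][j-1],\ dp[i-1][j]\bigr) & \text{otherwise.}
\end{cases}
\]
```

Each entry depends only on three neighbouring entries, so filling the whole table takes time proportional to its size.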
We can first write it in Python:
dp = [[0 for _ in range(5000)] for _ in range(5000)] # Preallocate dp

def align_python(a, b):
    na, nb = len(a), len(b)
    for i in range(0, na+1):
        for j in range(0, nb+1):
            if i == 0 or j == 0:
                dp[i][j] = 0
                continue
            dp[i][j] = max(dp[i][j-1], dp[i-1][j])
            if a[i-1] == b[j-1]:
                dp[i][j] = max(dp[i][j], 1+dp[i-1][j-1])
    c, d = len(a), len(b)
    resulta, resultb = [], []
    while c > 0 or d > 0:
        if d > 0 and dp[c][d] == dp[c][d-1]:
            d -= 1
        elif c > 0 and dp[c][d] == dp[c-1][d]:
            c -= 1
        else:
            resulta.append(c-1)
            resultb.append(d-1)
            c -= 1
            d -= 1
    return resulta[::-1], resultb[::-1]
At its core, the code calculates the alignment by checking whether the current pair of characters in the two strings matches and combining that fact with the previously calculated alignment scores. Then, we retrace our steps through the alignment table to recover the optimal alignment as two lists of matched indices, one per string.
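Because the function returns two index lists, one quick way to sanity-check an implementation is to confirm that the lists have equal length, increase strictly, and point at equal characters. The helper below is a minimal sketch of such a check; the name is_valid_alignment is hypothetical and not part of the original code, and it checks consistency only, not optimality.

```python
def is_valid_alignment(a, b, idx_a, idx_b):
    """Return True if idx_a and idx_b describe a consistent set of matches.

    Hypothetical helper for illustration: it verifies consistency only and
    does not check that the alignment is the longest possible one.
    """
    # Each matched position in a needs exactly one partner in b.
    if len(idx_a) != len(idx_b):
        return False
    # Matches must move strictly left to right in both strings.
    for indices in (idx_a, idx_b):
        for prev, cur in zip(indices, indices[1:]):
            if cur <= prev:
                return False
    # Every matched pair must be in range and point at equal characters.
    for i, j in zip(idx_a, idx_b):
        if not (0 <= i < len(a) and 0 <= j < len(b)):
            return False
        if a[i] != b[j]:
            return False
    return True


# Example: G, A, T, C, A are matched between the two strings.
print(is_valid_alignment("GATTACA", "GCATGCA", [0, 1, 2, 5, 6], [0, 2, 3, 5, 6]))
```

The same check can be applied to the output of any alignment function that returns index lists in this form, which makes it handy for comparing two implementations.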
We can also write an alignment visualization tool that shows the two strings:
# !pip install colorama # if you don't have colorama installed
from colorama import Fore

def show_seqs(seq1, seq2, align_fn=align_python):
    a1, a2 = align_fn(seq1, seq2)
    mapped_indices = [max(a1[0], a2[0])+1]
    for i in range(1, len(a1)):
        mapped_indices.append(mapped_indices[-1]+max(a1[i]-a1[i-1], a2[i]-a2[i-1]))
    mapped_indices.append(mapped_indices[-1]+max(len(seq1)-a1[-1], len(seq2)-a2[-1]))
    result1 = [" "]*(mapped_indices[-1]+1)
    result2 = [" "]*(mapped_indices[-1]+1)
    for i in range(len(a1)):
        result1[mapped_indices[i]] = Fore.GREEN+seq1[a1[i]]+Fore.RESET
        result2[mapped_indices[i]] = Fore.GREEN+seq2[a2[i]]+Fore.RESET
    for i in range(len(mapped_indices)):
        for j in range(0 if i == 0 else a1[i-1]+1, a1[i] if i != len(a1) else len(seq1)):
            diff = j - (0 if i == 0 else a1[i-1]+1)
            result1[(0 if i == 0 else mapped_indices[i-1])+1+diff] = Fore.RED+seq1[j]+Fore.RESET
        for j in range(0 if i == 0 else a2[i-1]+1, a2[i] if i != len(a2) else len(seq2)):
            diff = j - (0 if i == 0 else a2[i-1]+1)
            result2[(0 if i == 0 else mapped_indices[i-1])+1+diff] = Fore.RED+seq2[j]+Fore.RESET
    print("".join(result1))
    print("".join(result2))
Running this on a sample string gives an example alignment:
Now, let’s translate our Python code into Cython in the file align_cython.pyx:
# distutils: language=c++
# cython: language_level=3
from libcpp.vector cimport vector # import the vector class
from libcpp.string cimport string # import the string class

cdef int[5000][5000] dp # allocate dp table globally

def align_cython(str a, str b):
    cdef int na, nb # static type declarations
    na, nb = len(a), len(b)
    cdef int i, j
    for i in range(0, na+1):
        for j in range(0, nb+1):
            if i == 0 or j == 0:
                dp[i][j] = 0
                continue
            dp[i][j] = max(dp[i][j-1], dp[i-1][j])
            if a[i-1] == b[j-1]:
                dp[i][j] = max(dp[i][j], 1+dp[i-1][j-1])
    cdef int c, d
    c, d = len(a), len(b)
    cdef vector[int] resulta, resultb
    while c > 0 or d > 0:
        if d > 0 and dp[c][d] == dp[c][d-1]:
            d -= 1
        elif c > 0 and dp[c][d] == dp[c-1][d]:
            c -= 1
        else:
            resulta.push_back(c-1)
            resultb.push_back(d-1)
            c -= 1
            d -= 1
    return resulta[::-1], resultb[::-1]
You’ll notice that the Cython code is remarkably similar to the Python version. The main differences are the static type declarations using cdef and the use of different types than in Python (e.g. vector instead of list).
Now, here’s our setup.py file:
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("align_cython.pyx")
)
After running python3 setup.py build_ext --inplace, we can now import the align_cython function with from align_cython import align_cython.
We can now test the speed difference between the two implementations by creating two random strings and using %timeit.
# Create two random strings
import random

length = 4000
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def random_string():
    # Pick `length` uppercase letters uniformly at random
    return "".join(random.choice(letters) for _ in range(length))

rs1 = random_string()
rs2 = random_string()
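%timeit is an IPython/Jupyter magic, so it is only available inside a notebook or IPython session. Outside of one, a comparable measurement can be sketched with the standard library's timeit module. The helper names best_time and speedup below are hypothetical, and the numbers you get will depend on your machine:

```python
import timeit


def best_time(fn, *args, repeats=3, number=1):
    """Return the fastest observed time in seconds for one call of fn(*args)."""
    timer = timeit.Timer(lambda: fn(*args))
    # repeat() returns one total per round; keep the minimum as the
    # measurement least disturbed by other activity on the machine.
    return min(timer.repeat(repeat=repeats, number=number)) / number


def speedup(slow_fn, fast_fn, *args, repeats=3):
    """Return how many times faster fast_fn is than slow_fn on the same input."""
    return best_time(slow_fn, *args, repeats=repeats) / best_time(fast_fn, *args, repeats=repeats)


# Assumed usage, once both implementations are importable:
# print(speedup(align_python, align_cython, rs1, rs2))
```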
Whoa! You can really see the speed difference between the two implementations: in this test, Cython runs the same code 42x faster!
To run this code yourself, see this Colab Notebook.
Conclusion
You may find yourself choosing between two programming languages, favoring one because of simplicity and the other because of performance. Why not have both? Cython allows you to write simple code that, when compiled, leads to blazingly fast performance like that of C/C++.