There are many ways to handle arrays in Cython, and I wanted to find the most optimized one. As a test I took a simple problem: counting the elements greater than a threshold in each of two arrays (20 for the first, 50 for the second). I profiled each version in IPython using the %prun command and analysed the results.

The version I started with was:

import numpy as np
cimport numpy as np


def fun(a, b):
	c1, c2 = 0, 0
	for i in range(len(a)):
		if a[i] > 20:
			c1 += 1
	for i in range(len(b)):
		if b[i] > 50:
			c2 += 1
	return c1, c2


def trial(a, b):
	return fun(a, b)

I created an array of 100000 elements using NumPy's linspace function and passed it to the function above.
The execution time came out to be 10.393 seconds.
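The measurement above can be approximated outside IPython with the standard cProfile module, which is what %prun is built on as far as I know. This is only a minimal sketch: the make_data helper is hypothetical, it uses a plain list as a stand-in for the NumPy array, and the 0 to 100 value range is an assumption because the linspace arguments are not given above.

```python
import cProfile
import io
import pstats


def fun(a, b):
    # Pure-Python baseline: count elements above 20 in a and above 50 in b.
    c1, c2 = 0, 0
    for i in range(len(a)):
        if a[i] > 20:
            c1 += 1
    for i in range(len(b)):
        if b[i] > 50:
            c2 += 1
    return c1, c2


def make_data(n, start=0.0, stop=100.0):
    # Hypothetical helper: n evenly spaced values, similar in spirit to
    # numpy.linspace(start, stop, n). The range 0..100 is an assumption.
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


data = make_data(100000)

# Profile a single call and capture the report as text.
profiler = cProfile.Profile()
result = profiler.runcall(fun, data, data)
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()

print(result)
print(report)
```

The absolute timings will differ from the ones quoted here, since they depend on the machine and on whether the code is compiled with Cython.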

Then I typed the arguments of fun as NumPy buffers [1] and declared the local variables with cdef [2], changing the first two lines of the function to:

[1] def fun(np.ndarray[int, ndim=1] a, np.ndarray[int, ndim=1] b):
[2]     cdef int i, c1 = 0, c2 = 0

After change [1] the execution time was 0.208 seconds, i.e. 50x faster, and after [2] it came down to 0.031 seconds.
As the timings were getting too small to compare, I created another data set of 1000000 elements (10x the previous one); on it this version took 0.23 seconds.
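The speedups for the first data set can be checked with a little arithmetic. The sketch below only uses the three timings reported above; the labels and variable names are mine.

```python
# Timings (in seconds) reported above for the 100000-element data set.
timings = [
    ("untyped", 10.393),
    ("typed arguments [1]", 0.208),
    ("typed locals [2]", 0.031),
]

baseline = timings[0][1]
speedups = {}
previous = None
for label, seconds in timings:
    # Speedup relative to the untyped version and to the previous step.
    vs_baseline = baseline / seconds
    vs_previous = previous / seconds if previous is not None else 1.0
    speedups[label] = (vs_baseline, vs_previous)
    print(f"{label:22s} {seconds:8.3f} s  {vs_baseline:7.1f}x total  {vs_previous:5.1f}x step")
    previous = seconds
```

So typing the arguments accounts for roughly 50x, typing the loop variables for a further factor of about 6.7, and the two together for a bit over 300x.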

Then I wanted to test the speed gain of the typed memoryview technique and changed the code to:

# cython: profile=True
import numpy as np
cimport numpy as np


cdef fun(int[:] a, int[:] b):
	cdef int i, c1 = 0, c2 = 0
	for i in range(len(a)):
		if a[i] > 20:
			c1 += 1
	for i in range(len(b)):
		if b[i] > 50:
			c2 += 1
	return c1, c2


def trial(a, b):
	return fun(a, b)

This had the same timings as the previous version, but when I declared the buffers as C-contiguous (int[::1]) I got 0.154 seconds, i.e. about 1.5x faster than the NumPy array version on the new data set.
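To illustrate what "C contiguous" means, here is a small sketch using Python's built-in memoryview, which exposes the same buffer protocol that Cython's typed memoryviews build on. It is not Cython code and says nothing about speed; it only shows the difference between a contiguous buffer and a strided one. The array contents are arbitrary.

```python
from array import array

# A buffer of C ints, similar in layout to a 1-D int array.
values = array("i", range(10))

view = memoryview(values)   # every element sits right after the previous one
strided = view[::2]         # every second element: no longer contiguous

print(view.c_contiguous, view.strides)
print(strided.c_contiguous, strided.strides)
print(strided.tolist())
```

With a contiguous buffer the stride is known to equal the item size, so the compiler can index it like a plain C array; a strided view needs the stride looked up on each access. That is my understanding of where the gain from the contiguous declaration comes from.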

The next modification was completely different: I made the function fun operate purely on C types.

# cython: profile=True
import numpy as np
cimport numpy as np


cdef fun(int *a, int *b, int sa, int sb):
	cdef int i, c1 = 0, c2 = 0
	for i in range(sa):
		if a[i] > 20:
			c1 += 1
	for i in range(sb):
		if b[i] > 50:
			c2 += 1
	return c1, c2


def trial(a, b):
	# PyArray_DATA does not return an int *, so cast the pointers explicitly
	return fun(<int *> np.PyArray_DATA(a), <int *> np.PyArray_DATA(b), len(a), len(b))

This version gave me another speedup of about 7%, bringing the timing down to 0.144 seconds.
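The idea of passing a raw data pointer together with an explicit length can be tried out in plain Python with ctypes. This is only an analogy to the Cython version above, under the assumption that the data is a buffer of C ints; count_greater is a hypothetical helper, and the loop still runs at Python speed.

```python
import ctypes
from array import array


def count_greater(ptr, size, threshold):
    # ptr is a bare int pointer: it carries no length, so size must be
    # passed separately, just like sa and sb in the C-typed fun().
    count = 0
    for i in range(size):
        if ptr[i] > threshold:
            count += 1
    return count


values = array("i", range(100))

# buffer_info() gives the address of the data and the number of elements.
address, length = values.buffer_info()
ptr = ctypes.cast(address, ctypes.POINTER(ctypes.c_int))

greater_than_20 = count_greater(ptr, length, 20)
print(greater_than_20)
```

As with the C version, nothing stops the loop from reading past the end of the buffer if the wrong size is passed, so the array must stay alive and the length must be right.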

Since fun() was now entirely a C function, I tried running it without the GIL.
I changed the line:

cdef fun(int *a, int *b, int sa, int sb):
to
cdef fun(int *a, int *b, int sa, int sb) nogil:

This gave me another speedup of 2x.
