There are many ways to handle arrays in Cython, and I tried to find the most optimized one. For that I took an easy problem: counting the elements above a threshold in two arrays (greater than 20 in the first, greater than 50 in the second). Then I profiled each version in IPython using the `%prun` command and analysed the outcome.
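All the timings below assume the usual Cython build step. The original post does not show its build configuration, so this is a sketch of a typical `setup.py` (the filename `fun.pyx` is my assumption):

```python
# setup.py -- illustrative Cython build script (not from the original post).
# Build with: python setup.py build_ext --inplace
from distutils.core import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("fun.pyx"),   # hypothetical module name
    include_dirs=[np.get_include()],    # needed for `cimport numpy`
)
```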

The version I started with was:

```cython
import numpy as np
cimport numpy as np

def fun(a, b):
    c1, c2 = 0, 0
    for i in range(len(a)):
        if a[i] > 20:
            c1 += 1
    for i in range(len(b)):
        if b[i] > 50:
            c2 += 1
    return c1, c2

def trial(a, b):
    return fun(a, b)
```

I created two arrays of 100000 elements using NumPy's `linspace` function (converted with `.astype(np.intc)`, since the typed versions that follow expect C `int` buffers) and passed them to the above function.

Execution time came out to be 10.393 seconds.
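For reference, here is a pure-Python sketch of the benchmark driver (the exact `linspace` arguments are my assumption, the original post doesn't give them):

```python
import numpy as np

def fun_py(a, b):
    # Pure-Python equivalent of the Cython baseline above
    c1, c2 = 0, 0
    for i in range(len(a)):
        if a[i] > 20:
            c1 += 1
    for i in range(len(b)):
        if b[i] > 50:
            c2 += 1
    return c1, c2

# 100000 evenly spaced values; .astype(np.intc) gives the C `int`
# dtype that the typed Cython versions expect.
a = np.linspace(0, 100, 100000).astype(np.intc)
b = np.linspace(0, 100, 100000).astype(np.intc)
print(fun_py(a, b))
```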

Then I did some modifications and typed the arguments of `fun`, changing it to:

```cython
def fun(np.ndarray[int, ndim=1] a, np.ndarray[int, ndim=1] b):  # [1]
    cdef int i, c1 = 0, c2 = 0                                  # [2]
```

The execution time after changing [1] was 0.208 seconds, i.e. 50x faster, and after [2] it came down to 0.031 seconds.
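Putting changes [1] and [2] together, the fully typed version of the function reads:

```cython
import numpy as np
cimport numpy as np

def fun(np.ndarray[int, ndim=1] a, np.ndarray[int, ndim=1] b):
    cdef int i, c1 = 0, c2 = 0
    for i in range(len(a)):
        if a[i] > 20:
            c1 += 1
    for i in range(len(b)):
        if b[i] > 50:
            c2 += 1
    return c1, c2
```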

As the timings were getting smaller and smaller, I created another data set of 1000000 elements (10x the last one), and now the timing was 0.23 seconds.

Then I thought of testing the speed gain of the memoryview technique and changed the code to:

```cython
# cython: profile=True
import numpy as np
cimport numpy as np

cdef fun(int[:] a, int[:] b):
    cdef int i, c1 = 0, c2 = 0
    for i in range(len(a)):
        if a[i] > 20:
            c1 += 1
    for i in range(len(b)):
        if b[i] > 50:
            c2 += 1
    return c1, c2

def trial(a, b):
    return fun(a, b)
```

This had the same timings as the last one, but when I changed the buffers to C-contiguous I got 0.154 seconds, i.e. 1.5x better than the last numpy-array version on the new data set.
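For reference, declaring the memoryviews C-contiguous only changes the signature; `int[::1]` is Cython's syntax for a one-dimensional, C-contiguous buffer:

```cython
cdef fun(int[::1] a, int[::1] b):
    # body unchanged; the contiguity guarantee lets Cython
    # generate simpler indexing code
    ...
```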

The next modification was completely different: I made `fun` a complete C-level function working on raw pointers.

```cython
# cython: profile=True
import numpy as np
cimport numpy as np
np.import_array()  # required before using the NumPy C API

cdef fun(int *a, int *b, int sa, int sb):
    cdef int i, c1 = 0, c2 = 0
    for i in range(sa):
        if a[i] > 20:
            c1 += 1
    for i in range(sb):
        if b[i] > 50:
            c2 += 1
    return c1, c2

def trial(a, b):
    # PyArray_DATA returns void*, so it has to be cast to int*
    return fun(<int*> np.PyArray_DATA(a), <int*> np.PyArray_DATA(b),
               len(a), len(b))
```

This version gave me another 10% speedup, and now the timing was 0.144 seconds.

Now, since `fun()` was entirely a C function, I thought of running it without the GIL.

I changed the declaration from:

```cython
cdef fun(int *a, int *b, int sa, int sb):
```

to

```cython
cdef (int, int) fun(int *a, int *b, int sa, int sb) nogil:
```

(the return type also has to be declared as a C type, here a ctuple, because a `nogil` function cannot create Python objects such as a Python tuple).

This gave me another speedup of 2x.
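Note that declaring a function `nogil` only makes it *safe* to run without the GIL; the caller still has to release the GIL explicitly with a `with nogil:` block. A minimal sketch of the wrapper, assuming `fun` is declared `nogil` with a C return type:

```cython
def trial(a, b):
    cdef int sa = len(a), sb = len(b)
    cdef int *pa = <int*> np.PyArray_DATA(a)
    cdef int *pb = <int*> np.PyArray_DATA(b)
    cdef (int, int) res
    with nogil:                   # GIL actually released here
        res = fun(pa, pb, sa, sb)
    return res                    # ctuple auto-converts back to a Python tuple
```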


What about using xrange instead of range? range creates a list of the whole length, while xrange is a sequence object that evaluates lazily.

I guess it could also give some improvement.

I had tried replacing range with xrange since it's faster than the former, but there was no significant speed gain. Btw, thanks for the suggestion!