There are many ways to handle arrays in Cython, and I wanted to find the most efficient one. As a test case I took a simple problem: counting the elements above a threshold in each of two arrays (20 for the first, 50 for the second). I then profiled each version in IPython with the %prun command and analysed the results.
The version I started with was:
import numpy as np
cimport numpy as np

def fun(a, b):
    c1, c2 = 0, 0
    for i in range(len(a)):
        if a[i] > 20:
            c1 += 1
    for i in range(len(b)):
        if b[i] > 50:
            c2 += 1
    return c1, c2

def trial(a, b):
    return fun(a, b)
I created an array of 100000 elements with NumPy's linspace function and passed it to the function above.
Execution time: 10.393 seconds.
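For readers who want to reproduce the baseline outside IPython, here is a minimal sketch using cProfile, the standard-library profiler that %prun is built on. It is not the exact setup used above: the plain list `data` is a hypothetical stand-in for the linspace array, and `profile_call` is a helper name invented for this example.

```python
import cProfile
import io
import pstats


def fun(a, b):
    """Count elements of a above 20 and elements of b above 50."""
    c1, c2 = 0, 0
    for i in range(len(a)):
        if a[i] > 20:
            c1 += 1
    for i in range(len(b)):
        if b[i] > 50:
            c2 += 1
    return c1, c2


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


# Hypothetical stand-in data: 100000 integers spread evenly over 0..99.
data = [i // 1000 for i in range(100000)]
counts, report = profile_call(fun, data, data)
print(counts)
print(report)
```

Running the same helper against each compiled variant is also a cheap way to check that every optimised version still returns the same counts.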
Then I made two modifications: I typed the arguments in the signature of fun [1] and declared the loop index and counters as C integers [2]:
[1] def fun(np.ndarray[int, ndim=1] a, np.ndarray[int, ndim=1] b):
[2]     cdef int i, c1 = 0, c2 = 0
After change [1] the execution time was 0.208 seconds, i.e. about 50x faster, and after change [2] it came down to 0.031 seconds.
As the timings were getting too small to compare, I created another data set of 1000000 elements (10x the size of the previous one), and the timing became 0.23 seconds.
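The ratios between these timings are easy to check by hand. The short sketch below recomputes them from the numbers quoted above; the helper name `ratio` and the variable names are made up for this example.

```python
def ratio(before, after):
    """Return how many times larger `before` is than `after`."""
    return before / after


# Timings in seconds, as quoted in the text.
untyped = 10.393       # plain Python loop, 100000 elements
typed_args = 0.208     # after change [1]
typed_locals = 0.031   # after change [2]
larger_data = 0.23     # same code as [2], 1000000 elements

print(f"[1] vs untyped: {ratio(untyped, typed_args):.1f}x faster")
print(f"[2] vs [1]:     {ratio(typed_args, typed_locals):.1f}x faster")
print(f"data growth:    {1000000 // 100000}x more elements")
print(f"time growth:    {ratio(larger_data, typed_locals):.1f}x longer")
```

The time grows a little less than the data does, which is what one would expect from a loop whose fixed call overhead is spread over more elements.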
Then I wanted to test the speed gain of the typed memoryview technique and changed the code to:
# cython: profile=True
import numpy as np
cimport numpy as np

# int[:] declares a one-dimensional typed memoryview of C ints.
cdef fun(int[:] a, int[:] b):
    cdef int i, c1 = 0, c2 = 0
    for i in range(len(a)):
        if a[i] > 20:
            c1 += 1
    for i in range(len(b)):
        if b[i] > 50:
            c2 += 1
    return c1, c2

def trial(a, b):
    return fun(a, b)
This had the same timing as the previous version, but when I declared the buffers as C-contiguous (int[::1] instead of int[:]) I got 0.154 seconds, i.e. about 1.5x faster than the NumPy array version on the new data set.
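To show what "C-contiguous" means for a one-dimensional buffer, here is a small sketch using Python's built-in memoryview. It is only an analogy: the built-in memoryview is not Cython's typed memoryview, but both describe a buffer by item size and strides. With a contiguous buffer the stride equals the item size, which is the guarantee a contiguous declaration gives the compiler.

```python
from array import array

# A buffer of C ints and a view over it.
values = array("i", range(10))
view = memoryview(values)

# Taking every other element yields a strided, non-contiguous view.
every_other = view[::2]

print("full view: ", view.c_contiguous, view.strides, view.itemsize)
print("every other:", every_other.c_contiguous, every_other.strides)
```

A strided view such as `every_other` still works with a general one-dimensional declaration, but it would be rejected by a function that demands contiguous input.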
The next modification was completely different: I made the function fun a pure C-typed function that works on raw pointers.
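# cython: profile=True
import numpy as np
cimport numpy as np

cdef fun(int *a, int *b, int sa, int sb):
    cdef int i, c1 = 0, c2 = 0
    for i in range(sa):
        if a[i] > 20:
            c1 += 1
    for i in range(sb):
        if b[i] > 50:
            c2 += 1
    return c1, c2

def trial(np.ndarray a, np.ndarray b):
    # PyArray_DATA returns void *, so cast it to int * explicitly.
    # The arrays must be contiguous and hold C ints for this to be valid.
    return fun(<int *> np.PyArray_DATA(a), <int *> np.PyArray_DATA(b), len(a), len(b))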
This version gave me a further speedup of roughly 7%, bringing the timing down to 0.144 seconds.
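The reason the pointer version needs the extra sa and sb arguments is that a raw pointer carries no length information. The sketch below illustrates the same idea in plain Python, with ctypes standing in for Cython's `int *`. It is an analogy under that assumption, not the code that was timed above, and `count_above` is a hypothetical helper.

```python
import ctypes
from array import array


def count_above(ptr, size, threshold):
    """Count items above threshold; the size must be passed separately."""
    count = 0
    for i in range(size):
        if ptr[i] > threshold:
            count += 1
    return count


values = array("i", range(100))          # contiguous buffer of C ints
address, length = values.buffer_info()   # raw address and item count
ptr = ctypes.cast(address, ctypes.POINTER(ctypes.c_int))

print(count_above(ptr, length, 20), count_above(ptr, length, 50))
```

Indexing past `length` would read unrelated memory without any error, which is the price paid for skipping bounds checks.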
Since fun() now worked entirely on C types, I decided to try running it without the GIL.
I changed the line:
cdef fun(int *a, int *b, int sa, int sb):
to
cdef fun(int *a, int *b, int sa, int sb) nogil:
This gave me another 2x speedup.