typedef vector<int> vi;
typedef vector<vi> vvi;
typedef pair<int,int> ii;
#define pb push_back
#define tr(c,i) for(typeof((c).begin()) i = (c).begin(); i != (c).end(); i++)
#define all(c) (c).begin(),(c).end()

vvi graph;

// start_vertex is the starting vertex.
// n is the number of nodes.
// Returns 1 if every node was reached from start_vertex, 0 otherwise.
int bfs(int start_vertex, int n)
{
    vi visited(n, 0);
    queue<int> q;
    visited[start_vertex] = 1;
    q.push(start_vertex);
    while(!q.empty()) {
        int idx = q.front();
        q.pop();
        tr(graph[idx], itr) {
            if(!visited[*itr]) {
                q.push(*itr);
                visited[*itr] = 1;
            }
        }
    }
    return (find(all(visited), 0) == visited.end());
}
int BinarySearch(int A[], int l, int r, int key)
{
    int m;
    while( l <= r )
    {
        m = l + (r - l)/2;
        if( A[m] == key )   // first comparison
            return m;
        if( A[m] < key )    // second comparison
            l = m + 1;
        else
            r = m - 1;
    }
    return -1;
}
// Input: A[l .... r-1]
// Note: A[r] is not being searched
int BinarySearch(int A[], int l, int r, int key)
{
    int m;
    while( r - l > 1 )
    {
        m = l + (r - l)/2;
        if( A[m] <= key )
            l = m;
        else
            r = m;
    }
    if( A[l] == key )
        return l;
    else
        return -1;
}
// E.g. the code will return 3 if searched for 4
// in the array [1,2,3,5,6]
// Input: A[l .... r-1]
// Note: A[r] is not being searched
int Floor(int A[], int l, int r, int key)
{
    int m;
    while( r - l > 1 )
    {
        m = l + (r - l)/2;
        if( A[m] <= key )
            l = m;
        else
            r = m;
    }
    return A[l];
}

// Initial call
int Floor(int A[], int size, int key)
{
    // Add error checking if key < A[0]
    if( key < A[0] )
        return -1;
    // Observe boundaries
    return Floor(A, 0, size, key);
}
// Input: Indices Range [l ... r)
// Invariant: A[l] <= key and A[r] > key
int GetRightPosition(int A[], int l, int r, int key)
{
    int m;
    while( r - l > 1 )
    {
        m = l + (r - l)/2;
        if( A[m] <= key )
            l = m;
        else
            r = m;
    }
    return l;
}

// Input: Indices Range (l ... r]
// Invariant: A[r] >= key and A[l] < key
int GetLeftPosition(int A[], int l, int r, int key)
{
    int m;
    while( r - l > 1 )
    {
        m = l + (r - l)/2;
        if( A[m] >= key )
            r = m;
        else
            l = m;
    }
    return r;
}

int CountOccurances(int A[], int size, int key)
{
    // Observe boundary conditions
    int left = GetLeftPosition(A, -1, size-1, key);
    int right = GetRightPosition(A, 0, size, key);
    // What if the element doesn't exist in the array?
    // These checks verify that the element exists.
    return (A[left] == key && key == A[right]) ? (right - left + 1) : 0;
}
Apart from these functions, there are also direct functions in the STL which can be used: lower_bound and upper_bound.
lower_bound returns an iterator to the first occurrence of key in the array if key is present; otherwise it returns an iterator to the first element greater than key (in other words, the first element that is not less than key). Similarly, upper_bound returns an iterator to the first element greater than key in all cases. The example will make it clearer.
#include <iostream>     // std::cout
#include <algorithm>    // std::lower_bound, std::upper_bound, std::sort
#include <vector>       // std::vector

int main ()
{
    int myints[] = {10,20,30,30,20,10,10,20};
    std::vector<int> v(myints, myints+8);           // 10 20 30 30 20 10 10 20

    std::sort (v.begin(), v.end());                 // 10 10 10 20 20 20 30 30

    std::vector<int>::iterator low, up;
    low = std::lower_bound (v.begin(), v.end(), 20); //          ^
    up  = std::upper_bound (v.begin(), v.end(), 20); //                   ^

    std::cout << "lower_bound at position " << (low - v.begin()) << '\n';
    std::cout << "upper_bound at position " << (up - v.begin()) << '\n';

    return 0;
}
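The same idea exists in Python's standard library: bisect_left and bisect_right behave like lower_bound and upper_bound, so their difference counts occurrences. A small sketch (not from the original post, just an analogy):

```python
from bisect import bisect_left, bisect_right

v = sorted([10, 20, 30, 30, 20, 10, 10, 20])  # [10, 10, 10, 20, 20, 20, 30, 30]

low = bisect_left(v, 20)   # index of first element >= 20, like lower_bound
up = bisect_right(v, 20)   # index of first element > 20, like upper_bound

print(low, up)             # 3 6
print(up - low)            # number of occurrences of 20: 3
```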
Reference: http://www.geeksforgeeks.org/the-ubiquitous-binary-search-set-1/
Then there was another conflict: functions whose input arrays have a fused data type. If the variables are declared in the same file that defines the fused type, using the fused type is very easy. But writing a function whose fused-typed input comes from the user is a very tedious task. We finally found a way of implementing it, but it is quite complex and uses function pointers, which makes it horrible to maintain. We are still trying to find an alternative. In my next blog I will explain how I have used function pointers for fused data types where the type depends upon the input data given by the user.
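This is not the actual SciPy code, but the idea of dispatching on a runtime element type can be sketched in plain Python with a table of callables, the analogue of a C dispatch table of function pointers. All names below are made up for illustration:

```python
import math

# Hypothetical sketch: two typed "kernels" for the same operation.
def _sum_int(buf):
    return sum(buf)             # exact integer sum

def _sum_float(buf):
    return math.fsum(buf)       # accurate float summation

# Dispatch table keyed by a type code, playing the role of
# a C array of function pointers indexed by dtype.
_KERNELS = {'i': _sum_int, 'd': _sum_float}

def fused_sum(typecode, buf):
    try:
        kernel = _KERNELS[typecode]
    except KeyError:
        raise TypeError("unsupported type code: %r" % typecode)
    return kernel(buf)

print(fused_sum('i', [1, 2, 3]))    # 6
print(fused_sum('d', [1.5, 2.5]))   # 4.0
```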
The link to the PR is http://www.github.com/scipy/scipy/pull/5006
</Keep Coding>
Image processing functions are generally thought of as operating over two-dimensional arrays of values. There are, however, a number of cases where we need to operate over images with more than two dimensions. The scipy.ndimage module is an excellent collection of general image processing functions designed to operate over arrays with arbitrary dimensions. The module is a Python extension written in C, using the Python-C API to improve its speed. The whole module can be broadly divided into 3 categories:-
Instead of using NumPy's PyArrayIterObject, a container with all the required members, the ndimage module has its own optimized iterator structure, NI_Iterator.

But before jumping directly to the iterators, it would be better if we first understood PyArrayObject; it will help us decipher the iterators better. In C, every ndarray is a pointer to a PyArrayObject structure. It contains all the information required to deal with an ndarray in C. All instances of ndarray have this structure. It is defined as:
typedef struct PyArrayObject {
    PyObject_HEAD
    char *data;
    int nd;
    npy_intp *dimensions;
    npy_intp *strides;
    PyObject *base;
    PyArray_Descr *descr;
    int flags;
    PyObject *weakreflist;
} PyArrayObject;
The rest of the members of the container are not of much use to us for now, so we can safely ignore them.
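As a sketch of how nd, dimensions and strides fit together: for a C-contiguous array, the stride of an axis is the item size times the product of all later dimensions. This pure-Python helper (not from ndimage; the name is illustrative) computes them:

```python
# Illustrative helper: compute C-order (row-major) strides in bytes
# for a given shape and item size, as stored in PyArrayObject.
def c_strides(shape, itemsize):
    strides = []
    for i in range(len(shape)):
        step = itemsize
        for dim in shape[i + 1:]:    # product of all later dimensions
            step *= dim
        strides.append(step)
    return strides

# A 3x4 array of 8-byte doubles: moving one row skips 4 items (32 bytes),
# moving one column skips 1 item (8 bytes).
print(c_strides((3, 4), 8))     # [32, 8]
print(c_strides((2, 3, 5), 8))  # [120, 40, 8]
```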
Now let’s come back to PyArrayIterObject. It is another container, defined in NumPy, holding the information required to iterate through an array.
The NI_Iterator container defined in ni_support.h looks something like:-
typedef struct {
    int rank_m1;
    npy_intp dimensions[MAXDIM];
    npy_intp coordinates[MAXDIM];
    npy_intp strides[MAXDIM];
    npy_intp backstrides[MAXDIM];
} NI_Iterator;
Here rank_m1 is the rank of the array minus one, dimensions holds each dimension minus one, coordinates tracks the current position, and strides and backstrides are derived from the PyArrayObject of the array which is to be iterated.

The PyArrayIterObject is an extended version of NI_Iterator. Along with the members of NI_Iterator, it contains some extra members which provide some extra functionality. Following is the exact struct of PyArrayIterObject, annotated with the function of each of its members:
typedef struct {
    PyObject_HEAD
    /* Same as rank_m1 in NI_Iterator */
    int nd_m1;
    /* The current 1-D index into the array. */
    npy_intp index;
    /* The total size of the array to be iterated. */
    npy_intp size;
    /* Same as coordinates in NI_Iterator */
    npy_intp coordinates[NPY_MAXDIMS];
    /* Same as dimensions in NI_Iterator */
    npy_intp dims_m1[NPY_MAXDIMS];
    /* Same as strides in NI_Iterator */
    npy_intp strides[NPY_MAXDIMS];
    /* Same as backstrides in NI_Iterator */
    npy_intp backstrides[NPY_MAXDIMS];
    /* This array is used to convert a 1-D index to N-D coordinates. */
    npy_intp factors[NPY_MAXDIMS];
    /* The pointer to the underlying array. */
    PyArrayObject *ao;
    /* Pointer to the element in the ndarray indicated by index. */
    char *dataptr;
    /* This flag is true if the underlying array is C-contiguous. */
    /* It is used to simplify calculations. */
    Bool contiguous;
} PyArrayIterObject;

As the comments indicate, several of these members were also part of NI_Iterator. The extra members are mostly convenience values and can be derived at the time of requirement.
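The role of factors can be sketched in pure Python: factors[i] is the number of elements spanned by one step along axis i in a C-ordered array, so repeated integer division recovers N-D coordinates from a flat index. An illustrative sketch, not ndimage code:

```python
# Illustrative sketch: convert a flat (1-D) index into N-D coordinates
# using per-axis "factors", as PyArrayIterObject's factors array allows.
def make_factors(shape):
    factors = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        factors[i] = factors[i + 1] * shape[i + 1]
    return factors

def unravel(index, shape):
    coords = []
    for f in make_factors(shape):
        coords.append(index // f)   # how many whole steps along this axis
        index %= f                  # remainder falls to the later axes
    return coords

# In a 3x4 array, flat index 7 is row 1, column 3.
print(make_factors((3, 4)))   # [4, 1]
print(unravel(7, (3, 4)))     # [1, 3]
```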
For iterating through any ndarray, the members of the NI_Iterator need to be set according to the ndarray to be iterated. This is done by the NI_InitPointIterator function. It takes values from the input array and initializes all the members of NI_Iterator. The function is quite simple to understand and is defined as:-
int NI_InitPointIterator(PyArrayObject *array, NI_Iterator *iterator)
{
    int ii;

    iterator->rank_m1 = array->nd - 1;
    for(ii = 0; ii < array->nd; ii++) {
        iterator->dimensions[ii] = array->dimensions[ii] - 1;
        iterator->coordinates[ii] = 0;
        iterator->strides[ii] = array->strides[ii];
        iterator->backstrides[ii] =
            array->strides[ii] * iterator->dimensions[ii];
    }
    return 1;
}
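To see why backstrides is strides * (dimension - 1): advancing the iterator bumps the last axis, and when that axis overflows, the data pointer must jump back over the whole axis before the previous axis is bumped. A pure-Python simulation of that pointer arithmetic (illustrative only; the byte offset plays the role of the char* data pointer):

```python
# Simulate NI_Iterator's "next point" step for a C-ordered 2x3 array
# of 8-byte elements.
shape = (2, 3)
strides = (24, 8)                                 # bytes per step on each axis
dims_m1 = [d - 1 for d in shape]                  # the dimensions member: shape - 1
backstrides = [s * d for s, d in zip(strides, dims_m1)]

coords = [0, 0]
offset = 0
visited = [offset]
for _ in range(shape[0] * shape[1] - 1):
    # Bump the last axis; on overflow, rewind it with backstrides
    # and carry into the previous axis.
    for axis in range(len(shape) - 1, -1, -1):
        if coords[axis] < dims_m1[axis]:
            coords[axis] += 1
            offset += strides[axis]
            break
        coords[axis] = 0
        offset -= backstrides[axis]
    visited.append(offset)

print(visited)   # [0, 8, 16, 24, 32, 40] -- every element, in memory order
```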
This sets the NI_Iterator to iterate over all the dimensions of the input array. But there may be cases when we don't need to iterate over the whole ndarray, only over some axes. For this, two other variant functions of NI_InitPointIterator are available. These are:
I would like to thank my mentor Jaime who helped me in understanding this.
In the next blog I will give a detailed explanation of NI_LineBuffer and NI_FilterIterator.
I will explain these constructs in detail in further blogs.
For rewriting the module, our plan of action, as suggested by my mentor Jaime, is something like:-
In order to identify the specific functions in points 2, 4 and 6, I have made a list of all the constructs used by the various functions in the module. It can be seen here. We will use this list to decide the order of porting of the functions.
</keep coding>
Objective of the project: the whole module has been written in C using the Python-C API, and my task is to rewrite the module in Cython, which will make the code easier to read and maintain. The main challenge expected is to maintain the same speed as that of the C implementation.
About scipy.ndimage module:-
The module basically has the following major divisions:-
Further details and examples of above functions can be seen here.
scipy.ndimage implementation details:- As said above, the whole module is written in C using the Python-C API. The module depends upon 3 basic constructs defined in the support.c file:-
In the next blog I will write about how these constructs actually work and explain their implementation.
The initial plan for porting is: we will try to find a pair of a basic construct in support.c and an underlying function in the module which uses only that construct. Going this way will help us keep track of correctness and performance at each step of the rewrite.
If an array is intermediate in the program, it can be declared as a C or Cython array. But that may not work best for returning values to Python code, etc. So instead we can use a NumPy array. NumPy arrays are already implemented in C, and Cython has a direct interface with them, so they can be used.
Accessing elements of NumPy arrays has roughly the same speed as accessing elements of a C array. Modes can also be specified for NumPy arrays, like 'c' or 'fortran'. These work best when they match our mode of iteration over the array (row- or column-wise): 'c' mode works best when the iteration is row-wise and 'fortran' when it is column-wise. Specifying the mode gives no extra speed if the array is already arranged that way, but if it is not, an exception is raised. Using it we can still use the different array operations, and each element access is also faster; for that, though, we have to access the array element-wise.
There are also some cons of using NumPy arrays. Passing NumPy array slices between functions can cause a significant speed loss, since these slices are Python objects. For this we have memoryviews. They support the same fast indexing as NumPy arrays, and their slices continue to support the optimized buffer access. They also work well when passed between different functions in a module, and they can be used with inline functions too.
However, passing a memoryview object to a NumPy function becomes slower, as the memoryview has to first be converted to a NumPy array. If you need to use NumPy functions and pass memoryviews between functions, you can define a memoryview object that views the same data as your array. If for some reason a memoryview is to be converted into a NumPy array, the np.asarray() function can be used. A memoryview can also be declared as C contiguous or Fortran contiguous, depending on conditions, using the ::1 syntax. E.g. a two-dimensional, int type, Fortran-contiguous array can be declared as 'cdef int[::1, :]'. Cython also supports pointers. Many operations can be done with pointers at almost the same speed as optimized array lookups, but code readability is compromised heavily. * is not supported in Cython for dereferencing pointers; indexing (p[0]) should be used for that purpose.
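Plain Python's built-in memoryview works over the same buffer protocol that Cython's typed memoryviews use, so the zero-copy behaviour of slices can be demonstrated without Cython (an illustrative analogy, not the Cython syntax itself):

```python
# Slicing a list copies data; slicing a memoryview just creates
# another view over the same buffer -- the zero-copy behaviour
# Cython's typed memoryviews rely on.
buf = bytearray(b"hello world")
view = memoryview(buf)

s = view[0:5]          # a slice: no data is copied
s[0] = ord('H')        # writing through the slice...

print(bytes(buf))      # b'Hello world' -- ...mutates the original buffer

lst = list(b"hello world")
ls = lst[0:5]          # a list slice: an independent copy
ls[0] = ord('H')
print(bytes(lst))      # b'hello world' -- original unchanged
```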
I will try to upload an example of each of the above statements in the upcoming blogs, which will make them clearer.
Until then </keep coding>
Also, can we beat the speed of NumPy with explicit C looping? We will see.
The basic function with which I started was:
def fun(a):
    x = np.sin(a)
    return x

def trial(a):
    return fun(a)
Runtime: 195ms
Note:- Using cdef instead of def and defining the type of all variables causes no difference in runtime.
Then instead of passing whole array in function, I did explicit looping and used sin function from libc.math.
Code was:-
cdef fun(a):
    return sin(a)

def trial(np.ndarray[double] a):
    for i in xrange(a.shape[0]):
        a[i] = fun(a[i])
    return a
Runtime now was: 340ms
There are some more points to note in this function:-
1) If we use def instead of cdef while declaring fun() runtime escalates to 500ms. This is the change a pure C loop without python overhead can bring.
2) Another thing: if np.sin is used in place of libc.math's sin, the runtime is 16s. np.sin is a Python function which has some Python overhead; when called many times, this overhead is added on every call and slows the code heavily. But if we need to pass whole arrays, np.sin works quite well, as was seen in case 1.
Now, if only the type of a is defined as double, the run-time comes down to 252ms.
Note:- If in "def trial(np.ndarray[double] a):" we don't define the type of a, the runtime is 1.525s.
Next I removed the function fun() and did the computations in trial function itself.
from libc.math cimport sin

def trial(np.ndarray[double] a):
    for i in xrange(a.shape[0]):
        a[i] = sin(a[i])
    return a
This time run-time was 169ms. We have finally beaten the 1st code.
I have not yet explained the variation in runtime in many of these cases. I will try to do it soon.
NB:
1. Data over which all the calculations were done was generated by a = np.linspace(1,1000,1000000)
2. Using typed memoryview instead of np.ndarray caused no change.
3. Timings were estimated by using Ipython magic function %timeit.
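The per-call overhead the post keeps measuring exists in pure Python too, and can be timed with the standard timeit module. An illustrative analogue of the def-vs-inlined difference, no Cython required:

```python
import math
import timeit

def apply_via_helper(values):
    # Each element goes through an extra Python-level function call.
    def helper(x):
        return math.sin(x)
    return [helper(x) for x in values]

def apply_inline(values):
    # Same work with the per-element call overhead stripped out.
    sin = math.sin
    return [sin(x) for x in values]

data = [i / 1000.0 for i in range(100000)]

t_helper = timeit.timeit(lambda: apply_via_helper(data), number=3)
t_inline = timeit.timeit(lambda: apply_inline(data), number=3)

# Both produce identical results; the helper version just pays an
# extra function call per element.
assert apply_via_helper(data[:10]) == apply_inline(data[:10])
print("helper: %.3fs  inline: %.3fs" % (t_helper, t_inline))
```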
I profiled the code with the IPython %prun command and analysed the outcome.
The version I started with was:-
import numpy as np
cimport numpy as np

def fun(a, b):
    c1, c2 = 0, 0
    for i in range(len(a)):
        if a[i] > 20:
            c1 += 1
    for i in range(len(b)):
        if b[i] > 50:
            c2 += 1
    return c1, c2

def trial(a, b):
    return fun(a, b)
I created an array of 100000 elements using the NumPy linspace function and passed it to the above function.
Execution time came out to be:- 10.393 seconds.
Then I did some modifications: I typed the arguments of fun [1] and gave the loop counters C types [2], changing the code to:-
[1] def fun(np.ndarray[int, ndim = 1] a, np.ndarray[int, ndim = 1] b):
[2]     cdef int i, c1 = 0, c2 = 0
The execution time after changing [1] was 0.208 sec i.e. 50x faster and after [2] it came down to 0.031 seconds.
As the timing was getting smaller and smaller, I created another data-set of 1000000 elements (100x the last one), and now the timing was 0.23 seconds.
Then I thought of testing the speed gain of the memoryview technique and changed the code to:-
# cython: profile=True
import numpy as np
cimport numpy as np

cdef fun(int[:] a, int[:] b):
    cdef int i, c1 = 0, c2 = 0
    for i in range(len(a)):
        if a[i] > 20:
            c1 += 1
    for i in range(len(b)):
        if b[i] > 50:
            c2 += 1
    return c1, c2

def trial(a, b):
    return fun(a, b)
This had the same timings as the last one, but when I changed the buffers to C-type contiguous I got 0.154 seconds, i.e. 1.5x better than the last NumPy array version over the new data set.
The next modification was completely different: I made the function fun a complete C-level function.
# cython: profile=True
import numpy as np
cimport numpy as np

cdef fun(int *a, int *b, int sa, int sb):
    cdef int i, c1 = 0, c2 = 0
    for i in range(sa):
        if a[i] > 20:
            c1 += 1
    for i in range(sb):
        if b[i] > 50:
            c2 += 1
    return c1, c2

def trial(a, b):
    return fun(<int*> np.PyArray_DATA(a), <int*> np.PyArray_DATA(b),
               len(a), len(b))
This version gave me another 10% speedup and now the timing was .144 seconds.
Now, since the fun() function was entirely a C function, I thought of using it without the GIL.
I changed the line:
cdef fun(int *a, int *b, int sa, int sb): to cdef fun(int *a, int *b, int sa, int sb) nogil:
This gave me another speed up of 2x.