# Aman Singh

## Think Twice.. Code Once

Following is a simple implementation of BFS using the STL in C++:

```cpp
#include <vector>
#include <queue>
#include <algorithm>
using namespace std;

typedef vector<int> vi;
typedef vector<vi> vvi;
typedef pair<int,int> ii;
#define pb push_back
#define tr(c,i) for(typeof((c).begin()) i = (c).begin(); i != (c).end(); i++)
#define all(c) (c).begin(),(c).end()

vvi graph;
// start_vertex is the starting vertex.
// n is the number of nodes.
// Returns 1 if every node is reachable from start_vertex, 0 otherwise.
int bfs(int start_vertex, int n){
    vi visited(n, 0);
    queue<int> q;
    visited[start_vertex] = 1;
    q.push(start_vertex);
    while(!q.empty()){
        int idx = q.front();
        q.pop();
        tr(graph[idx], itr){
            if(!visited[*itr]) {
                q.push(*itr);
                visited[*itr] = 1;
            }
        }
    }
    return (find(all(visited), 0) == visited.end());
}
```

We often need to modify binary search to solve different competitive programming problems. I am archiving some of the most-used variations of binary search:

• Normal binary search with two comparisons:

```cpp
int BinarySearch(int A[], int l, int r, int key)
{
    int m;
    while( l <= r )
    {
        m = l + (r-l)/2;

        if( A[m] == key ) // first comparison
            return m;

        if( A[m] < key ) // second comparison
            l = m + 1;
        else
            r = m - 1;
    }
    return -1;
}
```
• Binary search with fewer comparisons:

```cpp
// Input: A[l .... r-1]
// Note A[r] is not being searched
int BinarySearch(int A[], int l, int r, int key)
{
    int m;
    while( r - l > 1 )
    {
        m = l + (r-l)/2;

        if( A[m] <= key )
            l = m;
        else
            r = m;
    }

    if( A[l] == key )
        return l;
    else
        return -1;
}
```
• Binary search used to find the floor value:
Note: If the key is present, the loop settles on its last occurrence and the key itself is returned (the code returns a value, not an index).

```cpp
// E.g. the code will return 3 if searched for 4
// in the array [1,2,3,5,6]
// Input: A[l .... r-1]
// Note A[r] is not being searched
int Floor(int A[], int l, int r, int key)
{
    int m;
    while( r - l > 1 )
    {
        m = l + (r - l)/2;
        if( A[m] <= key )
            l = m;
        else
            r = m;
    }
    return A[l];
}

// Initial call
int Floor(int A[], int size, int key)
{
    // No floor exists if key is smaller than the first element
    if( key < A[0] )
        return -1;
    // Observe the boundaries
    return Floor(A, 0, size, key);
}
```
• Finding the number of occurrences of a number in a sorted array:

```cpp
// Input: Indices Range [l ... r)
// Invariant: A[l] <= key and A[r] > key
int GetRightPosition(int A[], int l, int r, int key)
{
    int m;

    while( r - l > 1 )
    {
        m = l + (r - l)/2;

        if( A[m] <= key )
            l = m;
        else
            r = m;
    }

    return l;
}

// Input: Indices Range (l ... r]
// Invariant: A[r] >= key and A[l] < key
int GetLeftPosition(int A[], int l, int r, int key)
{
    int m;

    while( r - l > 1 )
    {
        m = l + (r - l)/2;

        if( A[m] >= key )
            r = m;
        else
            l = m;
    }

    return r;
}

int CountOccurances(int A[], int size, int key)
{
    // Observe boundary conditions
    int left = GetLeftPosition(A, -1, size-1, key);
    int right = GetRightPosition(A, 0, size, key);

    // What if the element doesn't exist in the array?
    // These checks verify that the element exists.
    return (A[left] == key && key == A[right]) ?
        (right - left + 1) : 0;
}
```
• Apart from these functions there are also direct STL functions which can be used: `lower_bound` and `upper_bound`.
`lower_bound` returns an iterator to the first element that is not less than key, i.e. the first occurrence of key if it is present, otherwise the first element greater than key. Similarly, `upper_bound` returns an iterator to the first element greater than key in all cases. The example will make it clearer.

```cpp
#include <iostream>     // std::cout
#include <algorithm>    // std::lower_bound, std::upper_bound, std::sort
#include <vector>       // std::vector

int main () {
    int myints[] = {10,20,30,30,20,10,10,20};
    std::vector<int> v(myints,myints+8);           // 10 20 30 30 20 10 10 20

    std::sort (v.begin(), v.end());                // 10 10 10 20 20 20 30 30

    std::vector<int>::iterator low,up;
    low=std::lower_bound (v.begin(), v.end(), 20); //          ^
    up= std::upper_bound (v.begin(), v.end(), 20); //                   ^

    std::cout << "lower_bound at position " << (low- v.begin()) << '\n';
    std::cout << "upper_bound at position " << (up - v.begin()) << '\n';

    return 0;
}
```

I have made the first PR of my GSoC project, which ports the function find_objects in the measurements submodule of ndimage. This is one of the most basic functions of the ndimage module: it finds objects in a labelled image and returns slice objects which can be applied to the image to extract those objects. In porting this function we had several problems to deal with. Firstly, it had to run on a whole range of platforms, whether Solaris or Ubuntu 14.10, and on both big-endian and little-endian machines. So the first challenge was to manage byteswapped pointers. For this we used two NumPy APIs. The first, PyArray_ISBYTESWAPPED(), checks whether the data behind a given pointer is byteswapped or not. The second, copyswap(), converts a byteswapped element into a native-order one which we can dereference normally. Initially we used this approach, but it made the whole function look like a plain C function. So we decided to use another, higher-level NumPy API, which is costlier than the original implementation (as it makes a copy of the whole array) but makes the implementation more Cythonic and easier to maintain. We have yet to benchmark this version; if the results come out well, we are sticking with it.

Then there was another conflict regarding functions using fused data types for the input arrays. If the variables are declared in the same file, using a fused type within that file is very easy. But writing a function whose fused type comes from user input is a very tedious task. We finally found a way of implementing it, but it is quite complex and uses function pointers, which makes it horrible to maintain. We are still trying to find an alternative. In my next blog I will explain how I have used function pointers for fused data types where the type depends on the input data given by the user.

Link to the PR: http://www.github.com/scipy/scipy/pull/5006

</Keep Coding> Aman Singh


Iterators are one of the basic pillars on which the ndimage module stands. As the name suggests, they are used to iterate over the arbitrary-dimensional input arrays. Even though the NumPy library contains a complete iterator container, `PyArrayIterObject`, with all the required members, the ndimage module has its own optimized iterator structure, `NI_Iterator`.

But before jumping directly to `Iterators` it would be better if we first understand `PyArrayObject`; it will help us decipher `Iterators` better. In C, every ndarray is a pointer to a `PyArrayObject` structure. It contains all the information required to deal with an ndarray in C. All instances of ndarray have this structure. It is defined as:

```c
typedef struct PyArrayObject {
    char *data;
    int nd;
    npy_intp *dimensions;
    npy_intp *strides;
    PyObject *base;
    PyArray_Descr *descr;
    int flags;
    PyObject *weakreflist;
} PyArrayObject;
```
• data is the pointer to the first element of the array.
• nd refers to the number of dimensions of the array.
• dimensions is an array of integers giving the size of each dimension.
• strides is an array of integers providing, for each dimension, the number of bytes that must be skipped to get to the next element in that dimension.

The rest of the members of the container are not of much use to us right now, so we can safely ignore them.

Now let’s come back to `PyArrayIterObject`. It is another container defined in Numpy containing information required to iterate through the array.

The NI_Iterator container defined in ni_support.h looks something like:-

```c
typedef struct {
    int rank_m1;
    npy_intp dimensions[MAXDIM];
    npy_intp coordinates[MAXDIM];
    npy_intp strides[MAXDIM];
    npy_intp backstrides[MAXDIM];
} NI_Iterator;
```
• rank_m1 is basically the rank of the array to be iterated. Its value is N - 1, where N is the number of dimensions of the underlying array.
• dimensions is an array containing the size of each dimension the iterator iterates over. Each value is one less than the corresponding dimension of the `PyArrayObject` of the array to be iterated.
• coordinates refers to an N-dimensional index into the array, i.e. it tells us the last position visited by the iterator.
• strides tells us the stride along each dimension of the array. It is the same as the strides of the `PyArrayObject` of the array to be iterated.
• For each dimension, backstrides tells us the number of bytes needed to jump from the end of a dimension back to its beginning. Note that backstride[k] = dimension[k] * stride[k] (with dimension[k] already reduced by one, as above). It is saved only for optimization purposes.

`PyArrayIterObject` is a richer version of `NI_Iterator`: along with the members of `NI_Iterator` it contains some extra members which provide extra functionality. Following is the exact struct of `PyArrayIterObject`, annotated with the purpose of each of its members:

```c
typedef struct {
    /* Same as rank_m1 in NI_Iterator */
    int nd_m1;
    /* The current 1-D index into the array. */
    npy_intp index;
    /* The total size of the array to be iterated. */
    npy_intp size;
    /* Same as coordinates in NI_Iterator */
    npy_intp coordinates[NPY_MAXDIMS];
    /* Same as dimensions in NI_Iterator */
    npy_intp dims_m1[NPY_MAXDIMS];
    /* Same as strides in NI_Iterator */
    npy_intp strides[NPY_MAXDIMS];
    /* Same as backstrides in NI_Iterator */
    npy_intp backstrides[NPY_MAXDIMS];
    /* This array is used to convert a 1-D index to an N-D index. */
    npy_intp factors[NPY_MAXDIMS];
    /* The pointer to the underlying array */
    PyArrayObject *ao;
    /* Pointer to the element in the ndarray indicated by the index */
    char *dataptr;
    /* This flag is true if the underlying array is C-contiguous. */
    /* It is used to simplify calculations. */
    Bool contiguous;
} PyArrayIterObject;
```

The members marked "Same as ... in NI_Iterator" were also part of `NI_Iterator`. As we can see, all the extra members beyond `NI_Iterator` are mostly convenience values and can be derived when required.
Before iterating through any ndarray, the members of `NI_Iterator` must be set according to the ndarray to be iterated. This is done by the `NI_InitPointIterator` function: it takes values from the input array and initializes all the members of `NI_Iterator`. The function is quite simple to understand and is defined as:-

```c
int NI_InitPointIterator(PyArrayObject *array, NI_Iterator *iterator)
{
    int ii;
    iterator->rank_m1 = array->nd - 1;
    for(ii = 0; ii < array->nd; ii++) {
        iterator->dimensions[ii] = array->dimensions[ii] - 1;
        iterator->coordinates[ii] = 0;
        iterator->strides[ii] = array->strides[ii];
        iterator->backstrides[ii] =
            array->strides[ii] * iterator->dimensions[ii];
    }
    return 1;
}
```

This sets `NI_Iterator` to iterate over all the dimensions of the input array. But there may be cases when we don't need to iterate over the whole ndarray, only some axes. For this, two other variant functions of `NI_InitPointIterator` are available. These are:

• NI_SubspaceIterator:- This function sets the iterator to iterate through a given set of axes. It takes two arguments, an iterator and the set of axes to be iterated, and initializes the iterator. Its code can be seen here.
• NI_LineIterator:- This function sets the iterator to iterate over only one given axis. It basically calls NI_SubspaceIterator with only one axis in the `axes` argument. Its code can be seen here.

I would like to thank my mentor Jaime who helped me in understanding this.

In the next blog I will give a detailed explanation of NI_LineBuffer and NI_FilterIterator.

Image processing functions are generally thought to operate over two-dimensional arrays of values. There are, however, a number of cases where we need to operate on images with more than two dimensions. The scipy.ndimage module is an excellent collection of general image processing functions designed to operate over arrays with arbitrary dimensions. The module is a Python extension library written in C using the Python-C API to improve its speed. The whole module can be broadly divided into 3 categories:-

• Files containing wrapper functions:- This includes the nd_image.h and nd_image.c files. The nd_image.c file mainly contains the functions required for extending the module in C, viz. all the wrapper functions along with the module initialization function and the method table.
• Files containing basic constructs:- These are in the files ni_support.c and ni_support.h. These constructs include a mixture of containers, macros and various functions. They are like the arteries of the module and can be summarised in three groups:-
1. NI_Iterators and its other variants, NI_LineIterator and NI_SubspaceIterator
2. NI_FilterIterator
3. NI_LineBuffer

I will explain these constructs in detail in further blogs.

• Files containing all the underlying functions:- It includes files ni_filter.c, ni_fourier.c, ni_interpolation.c, ni_measure.c and ni_morphology.c. All the general image processing functions of the module are defined in these files.

For rewriting the module our plan of action as suggested by my mentor Jaime is something like:-

1. Implement basic iteration
2. Implement a couple of selected functions that only use basic iteration
3. Implement iteration over buffered lines
4. Implement a couple of selected functions that only use these two types of iteration
5. Implement iteration over a neighborhood of a point, i.e. filter iteration
6. Implement a couple of selected functions that use the three types of iteration
7. Is there any other basic functionality needed by any function? While true, repeat the above two-step pattern.
8. Once all basic functionality is in place, implement all other missing functions.

In order to identify specific functions in points 2, 4 and 6, I have made a list of all constructs used by various functions in the module. It can be seen here. We will use this list to decide the order of porting of functions.

</keep coding>

The project I am working on in Google Summer of Code '15 is a combined project of two organizations, scikit-image and scipy. There is a module, ndimage, in scipy for multidimensional image processing. Scikit-image also uses this module for many of its image processing functions.

Objective of the project:- The whole module has been written in C using the Python-C API, and my task is to rewrite the module in Cython, which will make the code easier to read and maintain. The main challenge expected is to maintain the same speed as that of the C implementation.

The module has basically following major divisions:-

1. Filter functions:- All the functions in this section perform some kind of spatial filtering of the input array: each output element is some function of the neighbouring elements of the corresponding input element. These neighbouring elements are termed the kernel.
2. Correlation and convolution:- The correlation functions calculate both one-dimensional and multidimensional correlation of the input array with a given kernel. The convolution functions do the correlation after mirroring the kernel, so the result is shifted in the opposite direction.
3. Smoothing and generic filter functions:- In this section mainly Gaussian and generic filters are implemented.
4. Interpolation functions:- This section contains different functions based on B-spline theory. These functions mainly employ spline interpolation to effect some type of geometric transformation of the input array.
5. Morphology functions:- The functions under this section are used for the analysis and processing of geometrical structures. This also includes an implementation of grey-scale morphology.
6. Segmentation and labeling:- Segmentation is the process of separating objects of interest from the background. The function label generates an array where each object is assigned a unique number.

Further details and examples of above functions can be seen here.

scipy.ndimage implementation details:- As said above, the whole module is written in C using the Python-C API. The module depends upon 3 basic constructs defined in the ni_support.c file:-

1. Iterators: NI_Iterator in its three variants:- point, subspace and line.
2. Buffer: NI_LineBuffer
3. Filter iterator: NI_FilterIterator

In the next blog I will write about how these constructs actually work and explain their implementation.

For porting, the initial plan is:- we will try to find a pair of a basic construct in ni_support.c and an underlying function in the module which uses only that construct. Going this way will help us keep track of correctness and performance at each step of the rewrite.

The __getitem__ method (array access using []) of NumPy arrays is written for Python, and using it in Cython carries the corresponding overhead. We can get around it using the following suggestions:-

If the array is intermediate in the program, it can be declared as a C or Cython array. But that may not work well for returning values to Python code, etc. So instead we can use a NumPy array. NumPy arrays are already implemented in C, and Cython has a direct interface with them, hence they can be used.
Accessing elements of NumPy arrays has roughly the same speed as accessing elements of a C array. For NumPy arrays, modes can also be specified, like 'c' or 'fortran'. These work best when they match our mode of iteration over the array (row- or column-wise): 'c' mode works best when the iteration is row-wise and 'fortran' when it is column-wise. Specifying the mode gives no extra speed if the array is already arranged that way, but if it is not, an exception is raised. With this we can still use the different array operations, and each element access is also faster, but for that we have to access the array element-wise.
There are some cons of using NumPy arrays too. Passing NumPy array slices between functions can have a significant speed loss, since these slices are Python objects. For this we have memoryviews. They support the same fast indexing as NumPy arrays, and their slices continue to support the optimized buffer access. They also work well when passed between different functions in a module, and they can be used with inline functions too.

However, passing a memoryview object to a NumPy function becomes slower, as the memoryview has to first be converted to a NumPy array. If you need to use NumPy functions and pass memoryviews between functions, you can define a memoryview object that views the same data as your array. If for some reason a memoryview has to be converted to a NumPy array, the np.asarray() function can be used. A memoryview can also be declared as C-contiguous or Fortran-contiguous, depending on conditions, using the ::1 syntax. For example, a two-dimensional, int-typed, Fortran-contiguous array can be declared as 'cdef int [::1, :]'. Cython also supports pointers. Many operations can be done using pointers with almost the same speed as optimized array lookup, but code readability is compromised heavily. * is not supported in Cython for dereferencing pointers; [0] should be used for that purpose.

I will try to upload an example of each of the above statements in upcoming blogs, which will make them clearer.
Until then </keep coding>

Just to understand which factors affect the scale of optimization in Cython, I thought I would do some experiments. I took a basic function which passes large arrays as arguments and tried to analyse the effect of various changes.

Also can we beat the speed of numpy with explicit C looping? We will see 🙂

The basic function with which I started was:

```cython
import numpy as np

def fun(a):
    x = np.sin(a)
    return x

def trial(a):
    return fun(a)
```

Runtime: 195 ms
Note:- Using cdef instead of def and defining the types of all variables causes no difference in runtime.

Then, instead of passing the whole array to the function, I did explicit looping and used the sin function from libc.math.

Code was:-

```cython
from libc.math cimport sin
cimport numpy as np

cdef fun(a):
    return sin(a)

def trial(np.ndarray[double] a):
    for i in xrange(a.shape[0]):
        a[i] = fun(a[i])
    return a
```

Runtime now was: 340ms
There are some more points to note in this function:-
1) If we use def instead of cdef while declaring fun(), the runtime escalates to 500 ms. This is the difference a pure C call without Python overhead can make.
2) Another thing: if np.sin is used in place of libc's sin, the runtime is 16 s. np.sin is a Python function with some Python overhead; when the function is called many times, this overhead is added on every call and slows the code heavily. But if we pass whole arrays, np.sin works quite well, as was seen in case 1.

Now if we additionally define the type of fun's argument a as double, the run-time comes down to 252 ms.
Note:- In "def trial(np.ndarray[double] a):", if we don't define the type of a, the runtime is 1.525 s.

Next I removed the function fun() and did the computations in the trial function itself.

```cython
from libc.math cimport sin
cimport numpy as np

def trial(np.ndarray[double] a):
    for i in xrange(a.shape[0]):
        a[i] = sin(a[i])
    return a
```

This time run-time was 169ms. We have finally beaten the 1st code.

I have not yet explained the variation in runtime in all these cases. I will try to do it soon.

NB:
1. The data over which all the calculations were done was generated by a = np.linspace(1, 1000, 1000000).
2. Using a typed memoryview instead of np.ndarray caused no change.
3. Timings were estimated using the IPython magic function %timeit.

There are many ways to handle arrays in Cython, and I tried to find the most optimized one. For this I took an easy problem: counting the number of elements above a threshold in two arrays. Then I profiled the versions in IPython using the `%prun` command and analysed the outcome.

The version I started with was:-

```cython
import numpy as np
cimport numpy as np

def fun(a, b):
    c1, c2 = 0, 0
    for i in range(len(a)):
        if a[i] > 20:
            c1 += 1
    for i in range(len(b)):
        if b[i] > 50:
            c2 += 1
    return c1, c2

def trial(a, b):
    return fun(a, b)
```

I created an array using the numpy linspace function of 100000 elements and passed it to the above function.
Execution time came out to be:- 10.393 seconds.

Then I did some modifications, typing the arguments of fun and its counters, and changed it to:-

```cython
def fun(np.ndarray[int, ndim = 1] a, np.ndarray[int, ndim = 1] b):
    cdef int i, c1 = 0, c2 = 0
```

The execution time after typing the function arguments was 0.208 sec, i.e. 50x faster, and after typing the loop variables it came down to 0.031 seconds.
As the timing was getting smaller and smaller, I created another data-set of 1000000 elements (100x the last one), and now the timing was:- 0.23 seconds.

Then I thought of testing the speed gain of the memoryview technique and changed the code to:-

```cython
# cython: profile=True
import numpy as np
cimport numpy as np

cdef fun(int[:] a, int[:] b):
    cdef int i, c1 = 0, c2 = 0
    for i in range(len(a)):
        if a[i] > 20:
            c1 += 1
    for i in range(len(b)):
        if b[i] > 50:
            c2 += 1
    return c1, c2

def trial(a, b):
    return fun(a, b)
```

This had the same timings as the last one, but when I changed the buffers to C-contiguous (int[::1]) I got 0.154 seconds, i.e. 1.5x better than the last NumPy array version over the new data set.

The next modification was completely different: I made the function fun a pure C-style function.

```cython
# cython: profile=True
import numpy as np
cimport numpy as np

cdef fun(int *a, int *b, int sa, int sb):
    cdef int i, c1 = 0, c2 = 0
    for i in range(sa):
        if a[i] > 20:
            c1 += 1
    for i in range(sb):
        if b[i] > 50:
            c2 += 1
    return c1, c2

def trial(a, b):
    return fun(<int*> np.PyArray_DATA(a), <int*> np.PyArray_DATA(b),
               len(a), len(b))
```

This version gave me another 10% speedup and now the timing was .144 seconds.

Now, since fun() was entirely a C function, I thought of using it without the GIL.
I changed the line:

```cython
cdef fun(int *a, int *b, int sa, int sb):
```

to

```cython
cdef fun(int *a, int *b, int sa, int sb) nogil:
```

This gave me another speed up of 2x.