
# Parallel loop the loop in numpy

Slow data analysis code can be a real drag. There are numerous ways to accelerate bottleneck code in NumPy, such as compiling expressions with NumExpr or Pythran. However, if you are calling a third-party module, you may not be able to use these approaches. In this case your best option might be to loop through the array in parallel, calling a function on each iteration.

Trying to get your head around parallel processing in Python can be bewildering: there are so many libraries to get to grips with. In this repository I’ve set out an example notebook called `numpyParallelSimple.ipynb` that demonstrates some of the best libraries for this task.

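If you don’t want to run the notebook, you can test the core idea in a Python kernel that runs in your browser using WebAssembly! The setup creates a small random three-dimensional array and a function to apply to each two-dimensional slice along the time axis: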

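```python
import numpy as np
from joblib import Parallel, delayed

# Array dimensions: two spatial axes and one time axis
xyLength = 3
timesteps = 5
arr = np.random.standard_normal(size=(xyLength, xyLength, timesteps))

def timestepFunc(array2D, timeIndex):
    # Return the time index with the result so the outputs can be re-ordered later
    return np.exp(array2D), timeIndex
```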

The joblib function is then:

```python
def joblibProcessing(arr: np.ndarray, backend="threading", nJobs: int = -1):
    # Iterate through the third dimension of the array in parallel
    resultList = Parallel(backend=backend, n_jobs=nJobs)(
        delayed(timestepFunc)(arr[:, :, timestep], timestep)
        for timestep in range(arr.shape[2])
    )
    # Sort the results back into their original order
    # (joblib already returns them in submission order, so this is a safeguard)
    resultList = sorted(resultList, key=lambda x: x[1])
    resultList = [el[0] for el in resultList]
    # Convert the list of results back into a three-dimensional numpy array
    return np.stack(resultList, axis=2)

# Run the function with threading and check that the output matches
# the serial equivalent, which for this timestepFunc is np.exp(arr)
outputJoblib = joblibProcessing(arr=arr)
assert np.allclose(outputJoblib, np.exp(arr))
```
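To make the comparison with serial processing explicit, here is a minimal self-contained sketch. The `serialProcessing` function is a hypothetical baseline written for this illustration (it is not taken from the notebook); it runs the same `timestepFunc` in a plain Python loop so the two outputs can be compared with `np.allclose`.

```python
import numpy as np
from joblib import Parallel, delayed


def timestepFunc(array2D, timeIndex):
    # Same toy function as in the post: element-wise exponential plus the index
    return np.exp(array2D), timeIndex


def serialProcessing(arr):
    # Hypothetical serial baseline: plain loop over the third axis
    results = [timestepFunc(arr[:, :, t], t)[0] for t in range(arr.shape[2])]
    return np.stack(results, axis=2)


def joblibProcessing(arr, backend="threading", nJobs=-1):
    # Parallel loop over the third axis, as in the post
    resultList = Parallel(backend=backend, n_jobs=nJobs)(
        delayed(timestepFunc)(arr[:, :, t], t) for t in range(arr.shape[2])
    )
    resultList = sorted(resultList, key=lambda x: x[1])
    return np.stack([el[0] for el in resultList], axis=2)


arr = np.random.standard_normal(size=(3, 3, 5))
outputSerial = serialProcessing(arr)
outputJoblib = joblibProcessing(arr)

# Both checks should hold if the parallel loop preserves order and shape
sameShape = outputJoblib.shape == arr.shape
matchesSerial = np.allclose(outputSerial, outputJoblib)
print(sameShape, matchesSerial)
```

A check like this is worth keeping around when you swap in your own function, since a mistake in the re-ordering step would otherwise go unnoticed.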

You can then test the performance for different values of `xyLength` and `timesteps`:

```python
# IPython/Jupyter magics: one loop, one repeat per backend
%timeit -n 1 -r 1 joblibProcessing(arr=arr,backend="threading")
%timeit -n 1 -r 1 joblibProcessing(arr=arr,backend="multiprocessing")
```
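The `%timeit` magic only works inside IPython or Jupyter. If you are running a plain Python script, a rough equivalent can be sketched with `time.perf_counter`. The `timeBackend` helper and the two array sizes below are assumptions chosen for illustration, and the sketch only times the threading backend.

```python
import time

import numpy as np
from joblib import Parallel, delayed


def timestepFunc(array2D, timeIndex):
    return np.exp(array2D), timeIndex


def joblibProcessing(arr, backend="threading", nJobs=-1):
    resultList = Parallel(backend=backend, n_jobs=nJobs)(
        delayed(timestepFunc)(arr[:, :, t], t) for t in range(arr.shape[2])
    )
    resultList = sorted(resultList, key=lambda x: x[1])
    return np.stack([el[0] for el in resultList], axis=2)


def timeBackend(arr, backend):
    # Hypothetical helper: wall-clock seconds for a single run, plus the output
    start = time.perf_counter()
    out = joblibProcessing(arr, backend=backend)
    return time.perf_counter() - start, out


# Illustrative sizes only; pick values that resemble your own data
timings = {}
for xyLength, timesteps in [(3, 5), (200, 20)]:
    arr = np.random.standard_normal(size=(xyLength, xyLength, timesteps))
    elapsed, out = timeBackend(arr, "threading")
    timings[(xyLength, timesteps)] = elapsed
    print(f"xyLength={xyLength}, timesteps={timesteps}: {elapsed:.4f} s")
```

Passing `"multiprocessing"` as the backend should work the same way, but in a script it is safest to put the timing loop under an `if __name__ == "__main__":` guard, because worker processes may re-import the module. A single run is noisy, so treat the numbers as a rough guide rather than a benchmark.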