Parallel loop the loop in numpy

Slow data analysis code can be a real drag. There are numerous ways to accelerate bottleneck code in NumPy, such as compiling expressions with NumExpr or Pythran. However, if you are calling a third-party module, you may not be able to use these approaches. In that case, your best option might be to loop through the array in parallel, calling a function on each iteration.

Getting your head around parallel processing in Python can be bewildering because there are so many libraries to choose from. In this repository I’ve set out an example notebook called numpyParallelSimple.ipynb that uses some of the best libraries for this task.

If you don’t want to run the notebook, you can test the core idea in a Python kernel that runs in your browser using WebAssembly!

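import numpy as np
from joblib import Parallel, delayed

# Array dimensions: an xyLength-by-xyLength grid at each of `timesteps` time steps
xyLength = 3
timesteps = 5
arr = np.random.standard_normal(size=(xyLength, xyLength, timesteps))

def timestepFunc(array2D, timeIndex):
    # Apply the function to one 2D slice and return it with its time index
    return np.exp(array2D), timeIndex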

The joblib function is then:

def joblibProcessing(arr: np.ndarray, backend: str = "threading", nJobs: int = -1):
    # Iterate through the third dimension of the array in parallel
    resultList = Parallel(backend=backend, n_jobs=nJobs)(
        delayed(timestepFunc)(arr[:, :, timestep], timestep)
        for timestep in range(arr.shape[2])
    )
    # Sort the results back into their original order
    # (joblib already returns results in input order, so this is a safeguard)
    resultList = sorted(resultList, key=lambda x: x[1])
    resultList = [el[0] for el in resultList]
    # Convert the list of results back into a three-dimensional numpy array
    return np.stack(resultList, axis=2)

# Run the function with threading and check that the output matches serial processing
# (here, np.exp applied to the whole array at once)
outputJoblib = joblibProcessing(arr=arr)
assert np.allclose(outputJoblib, np.exp(arr))
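
The serial processing used as the reference for that comparison is not shown in this post. A minimal sketch of what it might look like is below; the name serialProcessing is a hypothetical helper, not something taken from the notebook. It walks the same third dimension one timestep at a time and stacks the results, so its output should match both the parallel version and a direct call to np.exp.

```python
import numpy as np

xyLength = 3
timesteps = 5
arr = np.random.standard_normal(size=(xyLength, xyLength, timesteps))


def timestepFunc(array2D, timeIndex):
    # Same per-timestep function as in the post
    return np.exp(array2D), timeIndex


def serialProcessing(arr):
    # Hypothetical serial counterpart: same loop, one timestep at a time
    resultList = [timestepFunc(arr[:, :, t], t)[0] for t in range(arr.shape[2])]
    # Stack the 2D results back into a three-dimensional array
    return np.stack(resultList, axis=2)


outputSerial = serialProcessing(arr)
# np.exp is elementwise, so the looped result should match a direct call
assert np.allclose(outputSerial, np.exp(arr))
```

With both functions defined, np.allclose(outputSerial, outputJoblib) is one way to confirm that the parallel loop returns the same array.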

You can then test the performance for different values of xyLength and timesteps:

%timeit -n 1 -r 1 joblibProcessing(arr=arr,backend="threading")
%timeit -n 1 -r 1 joblibProcessing(arr=arr,backend="multiprocessing")
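
The %timeit magic only works inside IPython or Jupyter. In a plain Python script, a rough equivalent can be sketched with the standard-library timeit module. The helper timeBackends below is a hypothetical name introduced for illustration, and the array sizes are arbitrary choices rather than values from the notebook. It defaults to the threading backend only; passing backends=("threading", "multiprocessing") should mirror the comparison above, though process-based backends in a script may need an if __name__ == "__main__": guard on some platforms.

```python
import timeit

import numpy as np
from joblib import Parallel, delayed


def timestepFunc(array2D, timeIndex):
    # Same per-timestep function as in the post
    return np.exp(array2D), timeIndex


def joblibProcessing(arr, backend="threading", nJobs=-1):
    # Condensed version of the post's function
    results = Parallel(backend=backend, n_jobs=nJobs)(
        delayed(timestepFunc)(arr[:, :, t], t) for t in range(arr.shape[2])
    )
    results = sorted(results, key=lambda x: x[1])
    return np.stack([el[0] for el in results], axis=2)


def timeBackends(xyLength, timesteps, backends=("threading",)):
    # Hypothetical helper: wall-clock seconds for a single call per backend
    arr = np.random.standard_normal(size=(xyLength, xyLength, timesteps))
    return {
        backend: timeit.timeit(
            lambda: joblibProcessing(arr, backend=backend), number=1
        )
        for backend in backends
    }


# Arbitrary example sizes; larger arrays give the workers more to do per call
for xyLength in (10, 100):
    print(xyLength, timeBackends(xyLength, timesteps=5))
```

A single run per size is noisy, so raising number (and dividing the total by it) should give steadier figures.
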
This post is licensed under CC BY 4.0 by the author.