Slow data analysis code can be a real drag. There are numerous ways to accelerate bottleneck NumPy code, such as compiling expressions with NumExpr or Pythran. However, if you are calling a third-party module, you may not be able to use these approaches. In that case your best option might be a parallel loop over the array, calling a function on each iteration.
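To make the first approach concrete, here is a minimal sketch of compiling an expression with NumExpr (the expression and array sizes are my own illustration, not from the repository):

```python
import numpy as np
import numexpr as ne

a = np.random.standard_normal(1_000_000)
b = np.random.standard_normal(1_000_000)

# ne.evaluate compiles the string expression once and evaluates it
# in multithreaded, cache-friendly chunks
result = ne.evaluate("exp(a) + 2 * b")

# The result matches the plain (single-pass-per-op) NumPy computation
assert np.allclose(result, np.exp(a) + 2 * b)
```

This only works when the bottleneck is an expression you can write out element-wise, which is exactly why it doesn't help when the per-iteration work lives inside a third-party function.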

Trying to get your head around libraries for parallel processing in Python can be bewildering - there are so many to get to grips with. In this repository I’ve set out an example notebook, `numpyParallelSimple.ipynb`, covering some of the best libraries for this task.

If you don’t want to run the notebook, you can test the core idea in a Python kernel that runs in your browser using WebAssembly!

```python
import numpy as np
from joblib import Parallel, delayed

xyLength = 3
timesteps = 5
arr = np.random.standard_normal(size=(xyLength, xyLength, timesteps))

def timestepFunc(array2D, timeIndex):
    # Stand-in for an expensive per-timestep computation;
    # return the index too so results can be re-ordered later
    return np.exp(array2D), timeIndex
```
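For reference, the serial baseline that the parallel versions should reproduce is just a plain Python loop over the third axis (the name `serialProcessing` is my own for this sketch; the setup is repeated so the block runs standalone):

```python
import numpy as np

xyLength = 3
timesteps = 5
arr = np.random.standard_normal(size=(xyLength, xyLength, timesteps))

def timestepFunc(array2D, timeIndex):
    return np.exp(array2D), timeIndex

def serialProcessing(arr: np.ndarray) -> np.ndarray:
    # Apply timestepFunc to each 2D slice in order, then restack
    results = [timestepFunc(arr[:, :, t], t)[0] for t in range(arr.shape[2])]
    return np.stack(results, axis=2)

outputSerial = serialProcessing(arr)
# For this toy timestepFunc, the whole thing reduces to np.exp(arr)
assert np.allclose(outputSerial, np.exp(arr))
```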

The joblib function is then:

```python
def joblibProcessing(arr: np.ndarray, backend="threading", nJobs: int = -1):
    # Iterate through the third dimension of the array in parallel
    resultList = Parallel(backend=backend, n_jobs=nJobs)(
        delayed(timestepFunc)(arr[:, :, timestep], timestep)
        for timestep in range(arr.shape[2])
    )
    # Sort the results back into their original order
    resultList = sorted(resultList, key=lambda x: x[1])
    resultList = [el[0] for el in resultList]
    # Convert the list of results back into a three-dimensional numpy array
    return np.stack(resultList, axis=2)

# Run the function with threading and check that the outputs
# are the same as for the serial processing
outputJoblib = joblibProcessing(arr=arr)
```
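One way to make that output check concrete is `np.allclose` against the serial result (a sketch assuming the definitions above, repeated here so the block runs standalone):

```python
import numpy as np
from joblib import Parallel, delayed

arr = np.random.standard_normal(size=(3, 3, 5))

def timestepFunc(array2D, timeIndex):
    return np.exp(array2D), timeIndex

def joblibProcessing(arr, backend="threading", nJobs=-1):
    resultList = Parallel(backend=backend, n_jobs=nJobs)(
        delayed(timestepFunc)(arr[:, :, t], t) for t in range(arr.shape[2])
    )
    resultList = sorted(resultList, key=lambda x: x[1])
    return np.stack([el[0] for el in resultList], axis=2)

outputJoblib = joblibProcessing(arr)
# Serial equivalent of applying timestepFunc to every slice
outputSerial = np.exp(arr)
assert np.allclose(outputJoblib, outputSerial)
```

The sort-by-index step matters: some backends may return results out of order, and the explicit `timeIndex` tag is what lets us restore the original ordering before stacking.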

You can then test the performance for different values of `xyLength` and `timesteps`:

```python
%timeit -n 1 -r 1 joblibProcessing(arr=arr, backend="threading")
%timeit -n 1 -r 1 joblibProcessing(arr=arr, backend="multiprocessing")
```
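Outside a notebook, where the `%timeit` magic isn't available, the same comparison can be sketched with `time.perf_counter` (the array sizes here are my own illustration; the definitions above are repeated so the block runs standalone):

```python
import time
import numpy as np
from joblib import Parallel, delayed

arr = np.random.standard_normal(size=(50, 50, 20))

def timestepFunc(array2D, timeIndex):
    return np.exp(array2D), timeIndex

def joblibProcessing(arr, backend="threading", nJobs=-1):
    resultList = Parallel(backend=backend, n_jobs=nJobs)(
        delayed(timestepFunc)(arr[:, :, t], t) for t in range(arr.shape[2])
    )
    resultList = sorted(resultList, key=lambda x: x[1])
    return np.stack([el[0] for el in resultList], axis=2)

for backend in ("threading", "multiprocessing"):
    start = time.perf_counter()
    joblibProcessing(arr, backend=backend)
    elapsed = time.perf_counter() - start
    print(f"{backend}: {elapsed:.3f} s")
```

Note that the threading backend avoids the process-spawning and pickling overhead of multiprocessing, which tends to favour it when, as with `np.exp`, the underlying NumPy call releases the GIL.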