Nicholas P. Rougier, From Python To Numpy [1], 2017

Introduction

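import random

def randomwalkfaster(n=1000):
    from itertools import accumulate
    # Draw all n steps at once (random.choices requires Python 3.6+)
    steps = random.choices([-1,+1], k=n)
    # Running sum of the steps, starting from position 0
    return [0]+list(accumulate(steps))

walk = randomwalkfaster(1000)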

accumulate([1,2,3,4,5]) --> 1 3 6 10 15 [2]

By vectorizing the problem instead of using loops, we get an 85% increase in performance compared with the loop-based version:

>>> from tools import timeit
>>> timeit("randomwalkfaster(n=10000)", globals())
10 loops, best of 3: 2.21 msec per loop
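To make explicit what the loop-free version replaces, here is a minimal sketch that computes the walk positions twice from the same steps: once with an explicit Python loop and once with itertools.accumulate. The names positions_loop and positions_accumulate are hypothetical and chosen for illustration only; the seed is arbitrary.

```python
import random
from itertools import accumulate

def positions_loop(steps):
    """Running position computed with an explicit Python loop."""
    position = 0
    walk = [position]
    for step in steps:
        position += step
        walk.append(position)
    return walk

def positions_accumulate(steps):
    """Same positions, with the loop delegated to itertools.accumulate."""
    return [0] + list(accumulate(steps))

random.seed(0)  # arbitrary seed, only for repeatability
steps = random.choices([-1, +1], k=1000)

# Both approaches must agree on every position of the walk.
assert positions_loop(steps) == positions_accumulate(steps)
```

Because both functions receive the same list of steps, any difference in run time comes from how the running sum is computed, not from the random draws.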

Translating this to numpy, we get:

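import numpy as np

def randomwalkfastest(n=1000):
    # np.random.choice (singular) draws all n steps at once
    steps = np.random.choice([-1,+1], n)
    # Cumulative sum of the steps gives the successive positions
    return np.cumsum(steps)

walk = randomwalkfastest(1000)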

>>> from tools import timeit
>>> timeit("randomwalkfastest(n=10000)", globals())
1000 loops, best of 3: 14 usec per loop
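The timings above rely on a tools module that is not shown here. As a rough way to reproduce the comparison with the standard library only, the sketch below checks that np.cumsum and itertools.accumulate give the same running sum, then times both with the stdlib timeit module. The helper names cumsum_python and cumsum_numpy are hypothetical, and the measured numbers will depend on the machine.

```python
import timeit
from itertools import accumulate

import numpy as np

def cumsum_python(steps):
    """Running sum of a Python list using itertools.accumulate."""
    return list(accumulate(steps))

def cumsum_numpy(steps):
    """Running sum of a numpy array using np.cumsum."""
    return np.cumsum(steps)

steps = np.random.choice([-1, +1], 10000)  # numpy array of -1/+1 steps
steps_list = steps.tolist()                # same steps as a Python list

# Both versions must produce the same positions.
assert cumsum_numpy(steps).tolist() == cumsum_python(steps_list)

repeats = 100
t_python = timeit.timeit(lambda: cumsum_python(steps_list), number=repeats)
t_numpy = timeit.timeit(lambda: cumsum_numpy(steps), number=repeats)
print(f"accumulate: {t_python / repeats * 1e6:.1f} usec per call")
print(f"np.cumsum:  {t_numpy / repeats * 1e6:.1f} usec per call")
```

Note that np.cumsum returns n positions without a leading 0, whereas the accumulate-based function shown earlier prepends the starting position and returns n+1 values.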

Readability vs Speed

The massive speedups from numpy often come at the cost of readability, so comment your code!

  • Your future self will thank you
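As an illustration of why comments matter in vectorized code, here is a small sketch of a compact numpy idiom whose behavior is easy to misread without them. The function first_passage is hypothetical and not part of the original material; it finds the first index at which a walk is at least a given distance from the origin.

```python
import numpy as np

def first_passage(walk, distance):
    """Index of the first position at least `distance` away from 0, or -1."""
    walk = np.asarray(walk)
    # Boolean mask: True wherever the walk is far enough from the origin.
    reached = np.abs(walk) >= distance
    # np.argmax on a boolean array returns the index of the first True,
    # but it also returns 0 when there is no True at all, so check first.
    if not reached.any():
        return -1
    return int(np.argmax(reached))
```

Without the two comments, the argmax call reads like a search for a maximum value rather than for a first occurrence, and the guard against an all-False mask looks unnecessary.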

Anatomy of an array

Code vectorization

Problem vectorization

Custom vectorization

Beyond Numpy

Footnotes

  1. https://www.labri.fr/perso/nrougier/from-python-to-numpy/

  2. https://docs.python.org/3.6/library/itertools.html?highlight=accumulate#itertools.accumulate