My recent post on How To Make Python Run As Fast As Julia triggered some interest, probably because it challenged the current hype about Julia's speed a bit. I am afraid I'll add a bit more in that respect today. I wasn't planning to, but I stumbled across this question on reddit the other day, where someone asked why their Julia code wasn't running as fast as equivalent Python code:

*I have a problem which requires a loop of the kind below. It is not exactly like that, but conceptually pretty similar, so I tried to benchmark a simple example. My naive approach to implement this in Julia is as follows:*

```julia
A = rand(2000, 2000)
x = rand(2000) .< .1
y = rand(2000) .< .1

function f(A, x, y)
    for k = 1:2000
        for l = 1:2000
            if x[k] && y[l]
                A[l,k] += .1
            end
        end
    end
end

f(A, x, y)
@time f(A, x, y)
```

*In Python I would do*

```python
import scipy as sp
from numba import jit

A = sp.rand(2000, 2000)
x = sp.rand(2000) < .1
y = sp.rand(2000) < .1

@jit
def f(A, x, y):
    for k in range(2000):
        for l in range(2000):
            if x[k] and y[l]:
                A[k,l] += .1

%timeit f(A,x,y)
```

*The timings are 8ms for Julia and 2ms for Python. I swapped the indices k and l in the Julia implementation as compared to the Python implementation, since Julia uses column-major order and Python row-major order.*

*Thanks for your suggestions!*

I like that question because it is a no-brainer for the author to use the Python scientific stack and Numba. That's what he or she benchmarks against Julia, and that's also what I used in How To Make Python Run As Fast As Julia.

On my machine, with Python 3.4.3 and Julia 0.4.2, the timings are 3.1 milliseconds for Python and 4.7 milliseconds for Julia.

The best answer to the above question suggested looping only over the indices where x and y are true:

```julia
function h(A, x, y)
    iy = find(y)
    for k in find(x)
        for l in iy
            A[l,k] += .1
        end
    end
end
```

This takes 244 microseconds: about 20 times faster than the naive Julia code, and 13 times as fast as the Python version!
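The magnitude of the speedup is plausible if you work out the expected amount of work. With the `< .1` threshold in the original code, roughly 10% of each mask is true, so the filtered version visits about 200 × 200 cells instead of scanning all 2000 × 2000 pairs. A quick sanity check of that arithmetic:

```python
# Expected loop counts for the naive vs. index-filtered versions,
# assuming ~10% of each boolean mask is true (the `< .1` threshold).
n = 2000
density = 0.1

naive_iterations = n * n                  # every (k, l) pair is visited
expected_true = int(n * density)          # ~200 true entries per mask
filtered_iterations = expected_true ** 2  # only true/true pairs do work

print(naive_iterations)                         # 4000000
print(filtered_iterations)                      # 40000
print(naive_iterations // filtered_iterations)  # 100
```

The inner-loop count drops by a factor of 100; the observed 20x speedup is smaller because the naive version's rejected iterations are cheap (a single boolean test) compared to the filtered version's array updates.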

Wait a minute, can't we do the same in Python? As a matter of fact, we can, using the NumPy where() function. Here is the code:

```python
import numpy as np
from numba import jit

@jit
def h(A, x, y):
    ix = np.where(x)[0]
    iy = np.where(y)[0]
    for k in range(len(ix)):
        Ak = A[ix[k]]  # grab the row once per k
        for l in range(len(iy)):
            Ak[iy[l]] += .1
```

It runs in 236 microseconds. Same as the Julia code! I say the same because it is close enough, and because we are not testing on a dedicated machine. Moreover, we're running on Windows, so we measure wall-clock time, not CPU time.
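As an aside, NumPy can express this particular update without any explicit loop at all, using np.ix_ to build an open mesh from the two index arrays. This is only a sketch of an alternative, not what the question asked for (the loop there stands in for a more complicated kernel), but it is worth knowing:

```python
import numpy as np

A = np.random.rand(2000, 2000)
x = np.random.rand(2000) < .1
y = np.random.rand(2000) < .1

ix = np.where(x)[0]
iy = np.where(y)[0]

# np.ix_ turns the two 1-D index arrays into an open mesh, so this
# adds .1 to every cell A[k, l] with x[k] and y[l] true -- the same
# update the double loop performs.
A[np.ix_(ix, iy)] += .1
```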

I have a few takeaways:

- When people start optimizing their code, compiled Python and Julia come close.
- Julia's syntax is nicer. With Python, I am afraid we must get used to the fact that NumPy arrays aren't the default Python arrays.
- There is a trick, though. When looping over a multi-dimensional array in Julia, it is faster to put the last index in the outermost loop, so that the innermost loop runs over the first index, because Julia uses column-major storage, like Fortran, whereas NumPy uses row-major storage, like C. In Python, it is faster to loop over the first index first.
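The row-major vs. column-major point can be made concrete without timing anything, by inspecting NumPy's strides (a sketch; `np.asfortranarray` stands in here for Julia's native layout). In a C-ordered array, consecutive elements of a row are adjacent in memory, so the inner loop should run over the last index; a Fortran-ordered array is the opposite.

```python
import numpy as np

A = np.zeros((3, 4))            # NumPy default: C (row-major) order
print(A.flags['C_CONTIGUOUS'])  # True -- each row is contiguous in memory
print(A.strides)                # (32, 8): moving along a row steps 8 bytes

F = np.asfortranarray(A)        # Fortran (column-major) order, like Julia
print(F.flags['F_CONTIGUOUS'])  # True -- each column is contiguous instead
print(F.strides)                # (8, 24): moving down a column steps 8 bytes
```

The stride of 8 bytes (one float64) marks the fast axis: the last index for C order, the first for Fortran order. That is exactly why the reddit poster swapped k and l between the two implementations.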