ALL POSTS
Why Python
Why are you recommending Python? That's the question a colleague of mine asked when I was pitching Python for data science work. It is a fair question, and I tried to answer with facts and not opinions. Indeed, answering a question about why a language is better than others can quickly turn into a religious war. So, let me try to avoid that with some disclaimers. First of all, I don't think one size fits all: Python is not going to become THE programming language. Depending on the task, other... [More]

Prescriptive Analytics Modeling for Python
There are two kinds of data scientists: those who limit themselves to data analysis, and those who care about turning insights into actions. The latter look for actionable insights. They can use various techniques to turn insights into actions, but any set of tools should include mathematical optimization. Why mathematical optimization? Because it is the only technique that lets you solve a problem by defining an objective and constraints on how you can reach that objective. Other techniques require you to... [More]

A Speed Comparison Of C, Julia, Python, Numba, and Cython on LU Factorization
How fast can compiled Python be compared to, say, C? You'd be surprised by the answer. The study below contradicts the common wisdom that you cannot get close to C for matrix-oriented computation. A good example of a study supporting the common wisdom is Sebastian F. Walter's Speed comparision Numba vs C vs pure Python at the example of the LU factorization. He has shown that Numba, a recent compiler that can be used with Python, is between 2x and... [More]
Tags: julia numpy c gcc cython python numba scipy
Hiring Data Scientists
IKEA interview credit: http://www.canarypete.be/ Hiring data scientists can be one of the hardest jobs these days. One might think that it is hard because of a lack of available candidates. I'd say it is just the opposite. Data science is so hot that many people rebrand themselves as data scientists. One of the issues when hiring is to discover, among the many candidates, which ones are true data scientists and which ones pretend to be data scientists. Here is how I proceed when I interview applicants. It is by... [More]

Installing PyCUDA On Anaconda For Windows
PyCUDA is a great library if you want to use GPU computing with NVIDIA chips. If you want a more portable approach, or if you have ATI chips instead of NVIDIA, then you might consider PyOpenCl instead of PyCUDA. I provided instructions on how to install PyOpenCl on Anaconda for Windows in a previous entry. Installing PyCUDA on Anaconda for Windows can be tricky. Here is what you can do; it worked fine for me. I am using the latest Anaconda distribution with Python 3.5 in it.... [More]
Tags: pycuda big_data python machine_learning anaconda analytics
Top Posts For 2015
I wish all my readers, their families, and their friends, all the best for 2016. May your dreams come true. I also want to warmly thank you, my readers, for your continued interest. This led me to write more entries than ever, with 54 entries in 2015. I still blogged on optimization and how it fits within the analytics and data science landscape, but I added two more streams in 2015: emerging technologies for cloud computing, like Docker, and Python as a language of choice for data science and technical computing. These streams... [More]
Tags: anaytics docker data_science optimization python cloud big_data machine_learning
How To Quickly Compute The Mandelbrot Set In Python
Introduction: My Christmas Gift was about creating nice images of the Mandelbrot set. A comment on reddit made me write this sequel. The comment suggests that I should use a vectorized version of the code rather than the sequential one I am using. I take this excellent suggestion as an excuse to review several ways of computing the Mandelbrot set in Python using vectorized code and GPU computing. I will specifically have a look at Numpy, NumExpr, Numba, Cython, TensorFlow, PyOpenCl, and... [More]
Tags: python pycuda pyopencl math fractals gpu dataviz opencl mandelbrot
Installing PyOpenCl On Anaconda For Windows
PyOpenCl is a great library if you want to use GPU computing and you do not necessarily want to rely on NVIDIA chips only. If you are fine targeting only NVIDIA chips, then you may consider PyCUDA instead of PyOpenCl. I provide instructions to install PyCUDA here. Installing PyOpenCl on Windows can be tricky, however. Here is what you can do; it worked fine for me. Install the latest Anaconda distribution with Python 3.5 in it. I used the 64-bit version. Download starts... [More]

My Christmas Gift: Mandelbrot Set Computation In Python
My mother likes fractals for their strange beauty. I decided to give her a simple way to generate beautiful fractal images of the Mandelbrot set. As you may guess, I'll be using Python for this. There is Python code available for this on the web, but it is either slow or it does not produce nice images. Hence my own attempt at it. The code used here is available in a notebook on GitHub or on nbviewer. I explore various ways to speed up that code in How To Quickly Compute The Mandelbrot Set In Python. In... [More]

How To Make Julia Run As Fast As Python
My recent post on How To Make Python Run As Fast As Julia triggered some interest, probably because it challenged the current hype about Julia speed a bit. I am afraid I'll add a bit more in that respect today. I wasn't planning to, but I stumbled across this question on reddit the other day. Someone asked why his Julia code wasn't running as fast as equivalent Python code: I have a problem which requires a loop of the kind below. It is not exactly like that... [More]

Operational Data Science: Part I
I just came across this blog post on Reasons why analytics projects are falling short and what we can do about it. The author identifies three issues which plague many analytics projects: Analytics projects start from the wrong place. They start with a data source when they should start from a business question. I discussed that in Start With A Question. Analytics projects end too soon. Analytics projects take too long ... and they fall short. The proposed... [More]

What Python Really Is
My recent post about How To Make Python Run As Fast As Julia is quite popular, with over 45k views as of today. It may be popular because it triggered some controversy on Reddit. Some of the controversy came from the fact that I did not use 'pure' Python in my own way of benchmarking Python, which, according to critics, misses the point of the Julia micro benchmarks. I tried to answer some of it, but it degenerated into arguing about what Python really is. It was a deadlock until I found the... [More]

Solving The GCHQ Christmas Puzzle As A MIP With Python
Introduction: You may have seen that Britain's security and intelligence organisation has proposed a gridshading puzzle, also known as a nonogram. Let me cite their puzzle definition. In this type of gridshading puzzle, each square is either black or white. Some of the black squares have already been filled in for you. Each row or column is labelled with a string of numbers. The numbers indicate the length of all consecutive runs of black squares, and are displayed in the order that the runs appear in that line. For example, a label "2... [More]

Elementary Matrix Operations In Python
Octave and Matlab are high-level languages that support vectors and matrices with a very simple syntax. Python support for matrices is not as nice, but a few little tricks should do the job. Let me first briefly introduce how Octave and Matlab support elementary matrix operations, then we'll look at how to achieve the same with Python. The following is based on the Octave tutorial. The code runs fine with Matlab. Octave and Matlab: Vectors and matrices can be created from a list of elements. Commas are used to... [More]
Tags: python matlab octave numpy
Python Is Not C: Take Two
When I wrote Python Is Not C six months ago, I did not imagine that it would be my most popular post ever, with more than 67k views. The conclusion of that post reads: The lesson is clear: do not write Python code as you would do in C. Use numpy array operations rather than iterate on arrays. For me it meant a mental shift. Given that the Python ecosystem is rapidly evolving, I decided to revisit this conclusion using the performance improvement tools that I discuss in my previous post. Let me briefly introduce... [More]
Tags: numba python numpy nearest_neighbors scipy sklearn
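
The lesson quoted in the entry above (use numpy array operations rather than iterating over arrays) can be illustrated with a minimal sketch. This is not the code from the post: the function names and the tiny data set are hypothetical, chosen only to contrast a C-style loop with one broadcasted numpy expression on a nearest-neighbor style distance computation.

```python
import numpy as np


def squared_distances_loop(points, query):
    """C-style version: iterate over every point and every coordinate."""
    result = []
    for p in points:
        total = 0.0
        for a, b in zip(p, query):
            total += (a - b) ** 2
        result.append(total)
    return result


def squared_distances_numpy(points, query):
    """Array version: one broadcasted subtraction, then one reduction per row."""
    diff = np.asarray(points, dtype=float) - np.asarray(query, dtype=float)
    return (diff ** 2).sum(axis=1)


# Hypothetical sample data: three 2-D points and a query point at the origin.
points = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
query = [0.0, 0.0]

loop_result = squared_distances_loop(points, query)
numpy_result = squared_distances_numpy(points, query)

# Index of the point closest to the query.
nearest = int(np.argmin(numpy_result))
```

Both functions return the same distances. The numpy version pushes the inner loops into compiled array code, which is where the speed difference on large arrays would be expected to come from.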