OSX is much better than Windows, isn't it? That's common wisdom, and it seemed to be confirmed once more when I installed XGBoost on both operating systems. Before I dive in, let me briefly describe XGBoost. It is a machine learning algorithm that yields great results in recent Kaggle competitions. I decided to install it on my two laptops: an old PC running Windows 7 and a brand new MacBook Pro running OSX. I thought the OSX installation was a no-brainer compared to the Windows one, as explained in Installing XGBoost For Anaconda on Windows.
Reality is a bit different, and the OSX installation isn't as smooth as it seems. To be accurate, the default OSX installation of XGBoost runs in single thread mode, as explained in these instructions.
Why is this a problem? Because XGBoost is a machine learning algorithm, and running it can be time consuming. I am currently working on a dataset with only about 100k rows (samples), and tuning XGBoost on my old Windows laptop (a Lenovo W520) takes about 2 hours. What surprised me is that it takes 7 hours on my brand new MacBook Pro! That is odd, given that both machines have quad-core Intel i7 CPUs and that the Mac has the higher clock speed. Add to this the premium price of the Mac, and you can see why I was really surprised.
I further observed that other CPU-intensive tasks are faster on the MacBook Pro. Something is definitely wrong, but the culprit is easy to spot: it is all about XGBoost being single threaded on OSX.
Before I explain how to enable multi-threading for XGBoost, let me point you to this excellent Complete Guide to Parameter Tuning in XGBoost (with codes in Python). I found it useful when I started using XGBoost, and I assume it could interest you too if you have read this far.
Back to XGBoost. The installation instructions do explain how to get the multi-threaded version of XGBoost. Unfortunately, they did not work for me. The following is what worked for me. I am sharing it in case it helps others. I had to perform the following steps:
- Get Homebrew if it is not installed yet. It is a very useful open source installer for OSX. Installing it is straightforward: open a terminal, then paste and execute the instruction available on the Homebrew home page. I reproduce it here for convenience:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
- Get gcc with OpenMP support. Once the Homebrew installation is completed, paste and execute the following command in your terminal:
brew install gcc --without-multilib
This automatically downloads and builds gcc. It can take a while; it took about 30 minutes for me. Be patient.
- Get XGBoost. Go to wherever you want in your filesystem, say <directory>. Then type the git clone command and execute it:
git clone --recursive https://github.com/dmlc/xgboost
This downloads the XGBoost code into a new directory named xgboost.
- The next step is to build XGBoost. By default, the build process uses the default compilers, cc and c++, which do not support the OpenMP option that XGBoost relies on for multi-threading. We need to tell the build to use the compiler we just installed. That's the step that was missing from the installation instructions on the XGBoost site.
There are various ways to do it; here is the one I used.
- Go to where we downloaded XGBoost
- Then open make/config.mk and uncomment these two lines
export CC = gcc
export CXX = g++
- Depending on your g++ installation, you may need to change the above two lines into:
export CC = gcc-6
export CXX = g++-6
- We then build with the following commands.
cp make/config.mk .
make -j4
- Once the build is finished, we can use XGBoost from the command line. I am using Python, so I performed this final step to install the Python package. You may need to enter the admin password to execute it.
cd python-package; sudo python setup.py install
This concludes the installation.
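If you are unsure which compiler name to put in config.mk, here is a minimal sketch, in Python since that is what I use, that tries to build a tiny OpenMP program with each candidate compiler. It is only an illustration: the helper names (supports_openmp, first_openmp_compiler) and the candidate list are my own assumptions, not something that ships with XGBoost or Homebrew.

```python
import os
import shutil
import subprocess
import tempfile

# Tiny C program that only compiles and links when OpenMP is available.
OPENMP_TEST_SOURCE = """
#include <omp.h>
int main(void) {
    return omp_get_max_threads() > 0 ? 0 : 1;
}
"""


def supports_openmp(compiler):
    """Return True if `compiler` can build a small program with -fopenmp."""
    if shutil.which(compiler) is None:
        return False  # the compiler is not on the PATH
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "omp_test.c")
        binary = os.path.join(tmp, "omp_test")
        with open(source, "w") as f:
            f.write(OPENMP_TEST_SOURCE)
        # -fopenmp is the gcc flag that turns on OpenMP support.
        result = subprocess.run(
            [compiler, "-fopenmp", source, "-o", binary],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0


def first_openmp_compiler(candidates):
    """Return the first candidate that supports OpenMP, or None."""
    for name in candidates:
        if supports_openmp(name):
            return name
    return None


# Hypothetical candidate names: adjust them to match your installation.
CANDIDATES = ["gcc-6", "gcc-5", "gcc", "cc"]
```

Calling first_openmp_compiler(CANDIDATES) returns the first name that passes the check, or None if none does. A name that passes is a reasonable value for CC in config.mk, with the matching g++ name for CXX.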
I tested it with my Anaconda distribution with Python 3.5. It worked fine, and I could run XGBoost. The speedup thanks to multi-threading is noticeable, and my MacBook Pro is now faster than my old PC.
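To check the speedup on your own machine, a rough sketch along the following lines can help. It times one training run per thread count using XGBoost's nthread parameter. The synthetic data, its size, the other parameter values, and the helper names are arbitrary assumptions chosen for illustration; this is not the dataset or the tuning job mentioned above.

```python
import time


def time_training(train_fn, thread_counts):
    """Call train_fn(n) for each thread count and return {n: seconds}."""
    timings = {}
    for n in thread_counts:
        start = time.perf_counter()
        train_fn(n)
        timings[n] = time.perf_counter() - start
    return timings


def xgboost_train(nthread):
    """Train a small model on synthetic data with the given thread count."""
    # Imported here so that time_training can be used without XGBoost.
    import numpy as np
    import xgboost as xgb

    rng = np.random.RandomState(0)
    X = rng.rand(20000, 20)                   # synthetic features
    y = (X[:, 0] + X[:, 1] > 1).astype(int)   # synthetic binary labels
    dtrain = xgb.DMatrix(X, label=y)
    params = {
        "objective": "binary:logistic",
        "max_depth": 6,
        "nthread": nthread,  # number of threads XGBoost may use
    }
    xgb.train(params, dtrain, num_boost_round=50)
```

Running time_training(xgboost_train, [1, 4]) returns the elapsed seconds for one and four threads. Each timing includes generating the data and building the DMatrix. If the two numbers are nearly identical, the build is probably still single threaded.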
Updated on July 16, 2016. Makefile changed in xgboost, making it easier to use gcc.
Updated on Jan 4, 2017. Updated the gcc and g++ declarations in the makefile. The original way didn't work on some g++ installations. Thanks to Brandon Mitchell, who spotted the issue.