Explore Linux as a scientific computing platform

Crunch numbers with Sage and Python

Linux® is a great platform for scientific computing and is heavily used by the academic community for numerous tasks. While many open source projects address specific applications, the Sage mathematical project delivers a more generic problem-solving capability. Python is the primary language for many of the highest-profile scientific applications, several of which this article discusses.

Paul Ferrill, CTO, ATAC

Paul Ferrill has been writing in the computer trade press for more than 20 years. He got his start writing networking reviews for PC Magazine on products like LANtastic and early versions of Novell Netware. Paul holds both BSEE and MSEE degrees and has written software for more computer platforms and architectures than he can remember.



07 August 2012


Linux in the scientific community

It should come as no surprise that Linux has a substantial presence in the scientific community. Solutions abound from high-performance computing clusters to visualization software. There's even an entire Linux distribution based on Red Hat Enterprise Linux targeted for scientific computation, appropriately named Scientific Linux.

Sage and Enthought Python Distribution

This article looks at two different ways to use a Linux workstation for scientific computation. The first is the Sage open source mathematics system and the second is the Enthought Python Distribution (EPD). Both use a number of core open source Python tools under the covers to perform the heavy lifting. If you want to try them, install the individual pieces using the Ubuntu software manager.

Sage is the more comprehensive of the two in that it is more of a shell over a number of different underlying engines. From the Sage command line you can even interact with commercial products such as MATLAB or Mathematica. At the Sage prompt you essentially interact with IPython with access to all its features. You also have to think in terms of objects and methods when you start to explore the capabilities of Sage. Sage includes a number of different computer algebra systems and allows the user to interact with them from the command line.

It's important to note that Sage is based on Python but pre-parses each statement before passing it to the Python interpreter. This can cause some confusion when looking at simple interactive Sage commands. The rationale for this behavior is a desire to make typing commands into Sage as mathematically intuitive as possible. One good example is the symbol for exponentiation. In pure Python you must type 2**4 to raise two to the fourth power. In Sage you can use the caret symbol (^), as in 2^4. Sage also handles some operations, such as integer division, differently from basic Python.
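
To see why the pre-parser matters, it helps to check what plain Python does with the same characters. The following is a minimal sketch meant to be run in ordinary Python, not at the Sage prompt. In Python the caret is the bitwise XOR operator, so 2^4 does not mean two to the fourth power there.

```python
# Plain Python (no Sage pre-parser): ** is exponentiation, ^ is bitwise XOR.
power = 2 ** 4      # two to the fourth power
xor = 2 ^ 4         # 0b010 XOR 0b100 = 0b110

print(power)  # 16
print(xor)    # 6
```

At the Sage prompt, the pre-parser rewrites the caret form before Python sees it, which is why the same input gives a different answer there.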

NumPy and SciPy

The two best-known pieces of the underlying puzzle for both Sage and EPD are NumPy and SciPy. Both projects have been around since the mid to late 1990s and were originally started by Travis Oliphant, now an employee at Enthought. NumPy provides the core numerical methods to manipulate arrays and matrices. SciPy depends on NumPy for its basic array data structure and contains a wide range of modules for everything from linear algebra to signal processing. Enthought is a sponsor of both projects and continues to contribute heavily to new releases.
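
As a rough illustration of the array handling described above, the sketch below builds a small NumPy array, applies arithmetic to it as a whole, and solves a two-equation linear system. The system and its numbers are made up for the example; SciPy modules such as scipy.linalg operate on the same array type.

```python
import numpy as np

# Coefficient matrix and right-hand side for the made-up system
#   3x +  y = 9
#    x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Arithmetic applies to every element of the array at once.
scaled = 2 * A

# numpy.linalg.solve returns the vector x for which A times x equals b.
x = np.linalg.solve(A, b)
print(x)  # approximately x = 2, y = 3
```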


Installing Sage

Several options are available for installing Sage. To give Sage a look without physically installing it, you can use a live CD version. You can also run the live CD in a virtual machine such as Oracle VM VirtualBox or VMware Player. This article looks at installing Sage on both a physical and virtual machine running the latest release of Ubuntu (12.04).

Physical machine

Installing Sage from source is the most reliable method to get up and running on Ubuntu 12.04 because the operating system has only recently been released. Download the source tarball and unpack it into a convenient directory. You also need to install a few prerequisites before building, which you can accomplish with the following commands:

sudo apt-get install build-essential gfortran
sudo apt-get install texlive xpdf evince
sudo apt-get install tk8.5-dev

After you have the prerequisites installed, you should be ready to build Sage, which you can launch with the following command:

make

At this point, go watch a movie or take a nap. The full build process can take several hours on a typical desktop machine. You should see a screen like the one in Figure 1 if everything builds correctly.

Figure 1. Sage build screen
Screen capture of the completed Sage build process

Virtual machine

Installing on a virtual machine is a good option if you're less adventurous and just want to get started using Sage. An Open Virtualization Archive (OVA) package is available for download on the Sage site along with binary packages for Fedora 16 and Ubuntu 10.04.3. OVA is a single package file (essentially a tar file) containing all the files needed to launch the virtual machine in the Open Virtualization Format (OVF).
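
Because an OVA is essentially a tar file, you can inspect one with ordinary tar tools before importing it. The sketch below uses Python's standard tarfile module. To stay runnable without a download, it first builds a tiny stand-in archive; the file names in it are hypothetical and only resemble what an OVF package might contain.

```python
import io
import os
import tarfile
import tempfile

def list_archive_members(path):
    """Return the member names stored in a tar-format archive such as an OVA."""
    with tarfile.open(path) as archive:
        return archive.getnames()

# Build a tiny stand-in archive so the example runs without a real download.
# The member names below are hypothetical placeholders.
workdir = tempfile.mkdtemp()
ova_path = os.path.join(workdir, "example.ova")
with tarfile.open(ova_path, "w") as archive:
    for name, payload in [("example.ovf", b"<Envelope/>"),
                          ("example-disk1.vmdk", b"\x00" * 16)]:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

members = list_archive_members(ova_path)
print(members)
```

Pointing list_archive_members at a downloaded OVA should list its descriptor and disk image files in the same way.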


Using Sage

When you have Sage installed, you're ready to begin your journey of mathematical discovery and exploration. Sage has both a command line and a web-based interface.

Command-line interface

To launch the command-line version from the directory where you built Sage, simply type the following command:

./sage

Figure 2 shows an example of the Sage interpreter and some of the mathematical interpretations mentioned earlier.

Figure 2. Sage command line interface and mathematical interpretations
Screen capture of the Sage command-line interface with mathematical interpretations

The difference in integer division comes from how Python handles the types of the operands (integer, float, and so on). In Python 2, the expression 2/3 produces an integer result that rounds down to zero (Python 3 returns a float instead). Sage treats the division operator as a constructor for rational numbers, meaning you can perform operations on fractions in much the same way as you do on paper. For example, if you type 2/3 + 2/3 at the Sage prompt, you get the result 4/3. If you type the same thing in Python 2, the result is 0. You can explicitly force the type of literals using float() or int() as in Figure 2.
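
Plain Python can approximate the rational arithmetic that Sage provides by default. The sketch below uses the standard library's fractions module as a stand-in; it is not how Sage implements rationals, only a way to see the same contrast between truncating division and exact fractions.

```python
from fractions import Fraction

# Floor division always yields an integer quotient, so 2 // 3 is 0.
# (In Python 2, plain 2/3 on integers behaves this way.)
floor_sum = 2 // 3 + 2 // 3

# Fraction keeps exact numerators and denominators, like arithmetic on paper.
exact_sum = Fraction(2, 3) + Fraction(2, 3)

print(floor_sum)  # 0
print(exact_sum)  # 4/3
```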

Web-based interface

To use the Sage Notebook, you can either type notebook() after starting Sage or use this command:

./sage --notebook

This launches the Sage server and opens the default web browser to the Sage Notebook home page. You'll find many features in the notebook to facilitate manipulating your work, such as saving and loading worksheets to a file, plus the normal copy, delete, and rename functions. Sage facilitates collaborative work with the Share and Publish functions in Figure 3.

Figure 3. Sage Notebook features
Screen capture of the Sage Notebook showing a simple test, equations computed and graphed

Computer algebra packages

The base Sage distribution includes a number of computer algebra packages including GAP, Maxima, PARI, and Singular. Each has its own following in the mathematical community and provides slightly different functionality. The key here is that Sage includes each of these packages in the base distribution, meaning you don't have to download and install them separately.

Figure 4 shows an example of using Maxima to perform several matrix operations. This was done using the Sage Notebook, and it shows the user input in black along with the output in blue. The first two lines create a matrix with entries equal to i/j where i and j range from 1 to 4. Notice that these are rational numbers (fractions).

Figure 4. Matrix operations with Maxima
Screen capture with examples of how Maxima interprets matrix operations, with equations followed by the results
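
For readers without Sage at hand, the matrix described above can be reproduced in plain Python. This is only a sketch using the standard fractions module, not the Maxima commands shown in Figure 4; it builds the same 4x4 matrix of exact fractions i/j.

```python
from fractions import Fraction

# Build a 4x4 matrix whose entry in row i, column j is the exact fraction i/j.
n = 4
matrix = [[Fraction(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]

for row in matrix:
    print("  ".join(str(entry) for entry in row))

# Exact arithmetic carries through later operations, for example the trace.
trace = sum(matrix[k][k] for k in range(n))
print(trace)  # 4, since every diagonal entry i/i equals 1
```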

Publishing scientific papers

Publishing scientific papers is a requirement for many in the educational community. Sage requires that every object has a LaTeX representation. This is directly tied to the need to produce publication-quality graphics and text using the TeX language. Be aware that you need a full installation of TeX to take advantage of all the features Sage has to offer.


Installing EPD

Scientific computing with Python is what Enthought as a company is all about. Enthought's commercial product offerings and support are what pay the bills, but they still contribute to the open source community in a great way. Their contributions come through directly working on the NumPy and SciPy code base as well as presenting at the annual PyCon conference and hosting the SciPy conference.

You can use the Ubuntu software manager to install the various pieces needed to get running with IPython, NumPy, and SciPy, or you can simply download and install the free version of the EPD, known as EPD Free. Enthought provides both 32- and 64-bit versions of EPD Free that have been tested on Red Hat, Ubuntu, and openSUSE. They do warn that some 64-bit Linux systems don't include 32-bit libraries, hence the need for a 64-bit version of the package.

To install EPD Free, first download the installer script and then run it with the following command:

bash epd_free-7.2-2-rh5-x86.sh

To make it easier to launch EPD in the future, add the EPD bin directory to your PATH in your shell startup file (either .cshrc or .bash_profile). Ubuntu uses the bash shell as the default, so the code shown here is based on that. For this article I used the following line:

export PATH=/home/paul/Downloads/epd_free-7.2-2-rh5-x86_64/bin:$PATH

Using EPD

IPython is another common denominator between EPD and Sage as it is the primary user interface tool. EPD installs both a command-line interface and a web-based notebook, much like Sage.

Command-line interface

The Qt-based console offers some enhanced functionality, including full syntax highlighting using the Pygments library. It also provides the ability to do in-line plots. To get this capability, you can launch IPython with the following command:

ipython qtconsole --pylab=inline

Figure 5 shows the Qt console with an embedded plot.

Figure 5. IPython Qt console with an embedded plot
Screen capture of the IPython Qt console with a graph drawn at the bottom

Web-based interface

The latest release of IPython (0.12) includes a web-based notebook capability that is similar to Sage. To get the notebook functionality, you need to install several dependencies, including ZeroMQ and the Tornado web server. To launch the notebook from the command line, type the following in a terminal window:

ipython notebook

This starts the Tornado web engine and launches the default web browser open to the dashboard page. If you then select the default notebook, you should see a new window like the one in Figure 6.

Figure 6. IPython Notebook
Screen capture of the project from Figure 5 implemented in the IPython Notebook

The one thing that is different about the web notebook is that you need to use Control-Enter when you want to execute code. This makes it possible to enter multiple lines of code, as in Figure 6, and have everything in the input box executed sequentially. Therefore, you can easily break functions into manageable blocks.

You can annotate your notebooks using the Markdown syntax. If you're not familiar with Markdown, it's essentially a way to create formatted Hypertext Markup Language (HTML) using plain text. For example, a single pound sign (#) followed by a space is used to indicate an H1 in HTML, while two pound signs are used for H2 and so on. This allows you to add annotation or documentation to supplement your code and graphics.
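
The heading rule described above can be sketched in a few lines of Python. The function name heading_to_html is hypothetical, and the code covers only the pound-sign heading rule; it is not a Markdown implementation and does no HTML escaping.

```python
import re

def heading_to_html(line):
    """Convert a Markdown pound-sign heading to HTML; return other lines unchanged."""
    match = re.match(r"^(#{1,6}) +(.*?)\s*$", line)
    if match is None:
        return line
    level = len(match.group(1))  # number of pound signs gives the heading level
    return "<h{0}>{1}</h{0}>".format(level, match.group(2))

print(heading_to_html("# Results"))   # <h1>Results</h1>
print(heading_to_html("## Method"))   # <h2>Method</h2>
print(heading_to_html("plain text"))  # plain text
```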

Pandas

The Python Data Analysis Library, or pandas, is another tool with ties to SciPy and NumPy created specifically to address the task of data analysis. Pandas incorporates a large number of libraries along with some standard data models to provide the tools needed to manipulate large datasets efficiently. Comma-separated values (CSV) files represent one of the most common ways of distributing data amongst interested parties. Pandas provides an optimized library function to read and write multiple file formats, including CSV and the efficient HDF5 format.

The read_csv function knows how to parse typical CSV files with header information in the first row. It also knows how to handle files with dates or times using a built-in parser. Pandas includes a datetools module with a long list of manipulation routines for performing various kinds of date math. Listing 1 shows a snippet of code from the pandas documentation showing how to find a date four months and five days from another:

Listing 1. Pandas date projection sample
from datetime import datetime
import pandas

d = datetime(2012, 4, 20)
d + pandas.DateOffset(months=4, days=5)
# Result shown in the pandas documentation: datetime.datetime(2012, 8, 25, 0, 0)
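
The CSV parsing described above can be sketched as follows. The sample data is hypothetical and is read from an in-memory string so the example needs no file on disk; parse_dates is the read_csv parameter that asks for a column to be converted to dates.

```python
import io
import pandas as pd

# Hypothetical sample data; the first row holds the column headers.
csv_text = """date,station,temperature
2012-04-20,A,14.5
2012-04-21,A,15.1
2012-04-22,B,13.8
"""

# parse_dates asks read_csv to convert the named column to datetime values.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(df.columns.tolist())
print(df.dtypes)
```

With a real file, you would pass its path to read_csv in place of the StringIO object.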

Pandas really shines when it comes to slicing and dicing large datasets. After you have your data imported into a native data structure, you have a wide range of tools at your disposal for selecting, transforming, and summarizing it. You can slice the data using the standard Python slicing syntax, perform operations on all or part of the data, or plot it using matplotlib. If you need to do any data manipulation tasks, you definitely want to get up to speed with pandas.
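
A short sketch of those operations follows. The readings are made-up values, and the example uses the positional .iloc indexer with Python slice syntax so the intent stays unambiguous on a date-indexed series.

```python
import pandas as pd

# Hypothetical daily measurements, indexed by date.
index = pd.date_range("2012-04-20", periods=6, freq="D")
readings = pd.Series([14.5, 15.1, 13.8, 16.2, 15.7, 14.9], index=index)

# Python slice syntax by position: the first three rows.
first_three = readings.iloc[:3]

# An operation on part of the data: the mean of the last three readings.
late_mean = readings.iloc[3:].mean()

# An operation on all of the data: convert Celsius to Fahrenheit.
fahrenheit = readings * 9 / 5 + 32

print(first_three)
print(late_mean)
```

From here, calling readings.plot() in a session with matplotlib available would chart the same series.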


Wrapping up

Linux is well qualified to support virtually any scientific computing task you can throw at it. It has heavy support in the academic community and is rapidly gaining new industry users looking for ways to reduce their software budgets. Tools such as Sage, IPython, NumPy, SciPy, and pandas provide a capable substitute for their commercial counterparts and, best of all, they are all free. For customers looking for fully supported software, Enthought offers its commercial EPD product, with full customer support and training for all paying customers.
