Linux in the scientific community
It should come as no surprise that Linux has a substantial presence in the scientific community. Solutions abound, from high-performance computing clusters to visualization software. There's even an entire Linux distribution based on Red Hat Enterprise Linux targeted for scientific computation, appropriately named Scientific Linux.
Sage and Enthought Python Distribution
This article looks at two different ways to use a Linux workstation for scientific computation. The first is the Sage open source mathematics system and the second is the Enthought Python Distribution (EPD). Both use a number of core open source Python tools under the covers to perform the heavy lifting. If you want to try those individual pieces on their own, you can install them using the Ubuntu software manager.
Sage is the more comprehensive of the two in that it is more of a shell over a number of different underlying engines. From the Sage command line you can even interact with commercial products such as MATLAB or Mathematica. At the Sage prompt you essentially interact with IPython with access to all its features. You also have to think in terms of objects and methods when you start to explore the capabilities of Sage. Sage includes a number of different computer algebra systems and allows the user to interact with them from the command line.
It's important to note that Sage is based on Python but preparses each statement before passing it to the Python interpreter. This can cause some confusion when looking at simple interactive Sage commands. The rationale for this behavior is to make typing commands into Sage as intuitive as possible from a mathematical standpoint. One good example is the symbol for exponentiation. In pure Python you must type 2**4 to raise two to the fourth power, because ^ is Python's bitwise XOR operator. In Sage you use the caret symbol (^), as in 2^4.

Sage also handles some operations, such as integer division, differently than basic Python.
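To make the preparsing idea concrete, here is a toy preparser in plain Python 3. It is a hypothetical sketch, not Sage's implementation: the names `toy_preparse` and `toy_eval` are invented for illustration, and the sketch imitates only two behaviors. It rewrites `^` as `**`, and it wraps bare integer literals in `fractions.Fraction` so that `/` yields exact rationals (real Sage wraps them in its own `Integer` type instead).

```python
import io
import tokenize
from fractions import Fraction


def toy_preparse(source: str) -> str:
    """Rewrite one line of math-style input into plain Python (hypothetical).

    - '^' becomes '**' (exponentiation instead of bitwise XOR)
    - bare integer literals become Fraction(n), so '/' stays exact
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP and tok.string == "^":
            out.append((tokenize.OP, "**"))
        elif tok.type == tokenize.NUMBER and tok.string.isdigit():
            out.extend([
                (tokenize.NAME, "Fraction"),
                (tokenize.OP, "("),
                (tokenize.NUMBER, tok.string),
                (tokenize.OP, ")"),
            ])
        else:
            out.append((tok.type, tok.string))
    # Rewriting at the token level means '^' picks up the precedence of '**'
    # (so 2^4 + 1 is 17) and string literals are left untouched.
    return tokenize.untokenize(out)


def toy_eval(source: str):
    """Preparse, then evaluate with only Fraction in scope."""
    return eval(toy_preparse(source), {"Fraction": Fraction})


print(toy_eval("2^4"))          # 16
print(toy_eval("2/3 + 2/3"))    # 4/3
```

Plain Python 3, by contrast, evaluates `2^4` as bitwise XOR (giving 6) and `2/3 + 2/3` as a float.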
NumPy and SciPy
The two best-known pieces of the underlying puzzle for both Sage and EPD are NumPy and SciPy. Both projects trace their roots to the mid-to-late 1990s, and Travis Oliphant, now an employee at Enthought, played a central role in creating them. NumPy provides the core numerical methods to manipulate arrays and matrices. SciPy depends on NumPy for its basic array data structure and contains a wide range of modules for everything from linear algebra to signal processing. Enthought is a sponsor of both projects and continues to contribute heavily to new releases.
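As a small illustration of that division of labor, the sketch below builds a NumPy array and hands it to SciPy's linear algebra module to solve a 2 x 2 system. It assumes NumPy and SciPy are installed (for example, from EPD or the Ubuntu packages); the matrix values are arbitrary.

```python
import numpy as np
from scipy import linalg

# NumPy provides the array type...
a = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# ...and SciPy builds numerical routines on top of it.
x = linalg.solve(a, b)          # solves a @ x == b

print(x)
print(np.allclose(a @ x, b))    # True
```

The same system could also be solved with `numpy.linalg.solve`; SciPy's version is shown here because the point is SciPy consuming NumPy arrays.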
Installing Sage
Several options are available for installing Sage. To give Sage a look without physically installing it, you can use a live CD version. You can also run the live CD in a virtual machine such as Oracle VM VirtualBox or VMware Player. This article looks at installing Sage on both a physical and virtual machine running the latest release of Ubuntu (12.04).
Physical machine
Installing Sage from source is the most reliable method to get up and running on Ubuntu 12.04 because that release is so new that prebuilt Sage binaries aren't yet available for it. Download the source tarball and unpack it into a convenient directory. You also need to install a few prerequisites before building, which you can do with the following commands:
sudo apt-get install build-essential gfortran
sudo apt-get install texlive xpdf evince
sudo apt-get install tk8.5-dev
After you install the prerequisites, change to the directory where you unpacked the source and start the build with the following command:
make
At this point, go watch a movie or take a nap. The full build process can take several hours on a typical desktop machine. You should see a screen like the one in Figure 1 if everything builds correctly.
Figure 1. Sage build screen
Virtual machine
Installing on a virtual machine is a good option if you're less adventurous and just want to get started using Sage. An Open Virtualization Archive (OVA) package is available for download on the Sage site along with binary packages for Fedora 16 and Ubuntu 10.04.3. OVA is a single package file (essentially a tar file) containing all the files needed to launch the virtual machine in the Open Virtualization Format (OVF).
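Because an OVA is an ordinary tar archive, you can inspect one before importing it. The sketch below uses Python's standard `tarfile` module; the helper names are hypothetical, the file name `sage.ova` is a placeholder for whatever you downloaded, and the member names depend on the package (typically an `.ovf` descriptor plus one or more disk images).

```python
import tarfile


def list_ova(path):
    """Return (name, size in bytes) for every member of an OVA/tar archive."""
    with tarfile.open(path) as archive:   # mode 'r' auto-detects compression
        return [(member.name, member.size) for member in archive.getmembers()]


def ovf_descriptors(members):
    """Pick out the OVF descriptor(s), the XML files that describe the VM."""
    return [name for name, _ in members if name.lower().endswith(".ovf")]
```

Usage would look like `members = list_ova("sage.ova")` followed by `print(ovf_descriptors(members))`; nothing is extracted, so this is safe to run on a large download.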
Using Sage
When you have Sage installed, you're ready to begin your journey of mathematical discovery and exploration. Sage has both a command-line and a web-based interface.
Command-line interface
To launch the command-line version from the directory where you built Sage, type the following command:
./sage
Figure 2 shows an example of the Sage interpreter and some of the mathematical interpretations mentioned earlier.
Figure 2. Sage command line interface and mathematical interpretations
The difference in integer division has to do with how Python handles the type of the operation (integer, float, and so on). In Python 2, the expression 2/3 divides two integers and returns an integer result, rounded down to 0. Sage treats the division operator as a constructor for rational numbers, meaning you can perform operations on fractions in much the same way as you do on paper. For example, if you type 2/3 + 2/3 at the Sage prompt, you get the result 4/3. If you type the same thing in Python 2, the result is 0. You can explicitly force the type of literals using float() or int(), as in Figure 2.
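If you try this comparison with a current Python, note that Python 3 changed the rules: `/` always performs true division and returns a float, while `//` reproduces the old floor-division result. The standard-library `fractions` module provides exact rationals in plain Python, which is the closest analogue to Sage's behavior. A short sketch:

```python
from fractions import Fraction

true_div = 2 / 3 + 2 / 3                  # Python 3 true division: a float close to 1.3333
floor_div = 2 // 3 + 2 // 3               # floor division, what Python 2 did for 2/3: 0 + 0
exact = Fraction(2, 3) + Fraction(2, 3)   # exact rational arithmetic, like Sage: 4/3

# Explicit conversions, in the spirit of float() and int() at the Sage prompt.
as_float = float(2) / 3                   # float division regardless of Python version
as_int = int(exact)                       # truncates 4/3 toward zero, giving 1

print(true_div, floor_div, exact, as_float, as_int)
```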
Web-based interface
To use the Sage Notebook, you can either type notebook() after starting Sage or use this command:
./sage -notebook
This starts the Sage server and opens the default web browser to the Sage Notebook home page. The notebook includes many features for managing your work, such as saving and loading worksheets to a file, plus the usual copy, delete, and rename functions. Sage supports collaborative work through the Share and Publish functions shown in Figure 3.
Figure 3. Sage Notebook features
Computer algebra packages
The base Sage distribution includes a number of computer algebra packages including GAP, Maxima, PARI, and Singular. Each has its own following in the mathematical community and provides slightly different functionality. The key here is that Sage includes each of these packages in the base distribution, meaning you don't have to download and install them separately.
Figure 4 shows an example of using Maxima to perform several matrix operations. This was done using the Sage Notebook, and it shows the user input in black along with the output in blue. The first two lines create a matrix with entries equal to i/j where i and j range from 1 to 4. Notice that these are rational numbers (fractions).
Figure 4. Matrix operations with Maxima
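For readers without Sage at hand, the sketch below rebuilds the same 4 x 4 matrix of exact fractions in plain Python with `fractions.Fraction`. It is not Maxima, and the helper names are invented for illustration. Every row is a multiple of the first (row i is i times row 1), so the matrix has rank 1, its determinant is 0, and since the (i, k) entry of M·M is the sum over j of (i/j)(j/k) = 4·i/k, it satisfies M·M = 4M.

```python
from fractions import Fraction


def ratio_matrix(n):
    """n x n matrix whose (i, j) entry is the exact fraction i/j, for i, j = 1..n."""
    return [[Fraction(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def determinant(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in m]          # work on a copy
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as the pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)         # no pivot: the matrix is singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                 # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


M = ratio_matrix(4)
for row in M:
    print("  ".join(str(x) for x in row))
print("det(M) =", determinant(M))
```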
Publishing scientific papers
Publishing scientific papers is a requirement for many in the academic community. Sage requires that every object have a LaTeX representation, which ties directly to the need to produce publication-quality graphics and text using the TeX typesetting language. Be aware that you need a full TeX installation to take advantage of all the features Sage has to offer.
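Sage's convention is that an object supplies its own LaTeX through a `_latex_` method, which Sage's `latex()` function calls. The sketch below imitates that convention in plain Python; `LatexFraction` and this `latex()` helper are hypothetical stand-ins, not Sage code.

```python
from fractions import Fraction


class LatexFraction(Fraction):
    """Fraction that knows how to render itself as LaTeX (hypothetical)."""

    def _latex_(self):
        if self.denominator == 1:
            return str(self.numerator)
        sign = "-" if self < 0 else ""
        return rf"{sign}\frac{{{abs(self.numerator)}}}{{{self.denominator}}}"


def latex(obj):
    """Use the object's own _latex_ method when present, else fall back to str()."""
    method = getattr(obj, "_latex_", None)
    return method() if callable(method) else str(obj)


print(latex(LatexFraction(4, 3)))   # \frac{4}{3}
```

The string returned can be dropped straight into a TeX document, for example inside `$...$`. Putting the rendering logic on each object, rather than in one central formatter, is what lets new types participate without changing `latex()` itself.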
Installing EPD
Scientific computing with Python is what Enthought as a company is all about. Enthought's commercial products and support pay the bills, but the company still contributes substantially to the open source community, both by working directly on the NumPy and SciPy code bases and by presenting at the annual PyCon conference and hosting the SciPy conference.
You can use the Ubuntu software manager to install the various pieces needed to get running with IPython, NumPy, and SciPy, or you can simply download and install the free version of the EPD, known as EPD Free. Enthought provides both 32- and 64-bit versions of EPD Free that have been tested on Red Hat, Ubuntu, and openSUSE. They do warn that some 64-bit Linux systems don't include 32-bit libraries, hence the need for a 64-bit version of the package.
To install EPD Free, first download the installer script and then run it with the following command:
bash epd_free-7.2-2-rh5-x86.sh
To make it easier to launch EPD in the future, add EPD's bin directory to your PATH in your shell startup file (.bash_profile for bash or .cshrc for csh). Ubuntu uses bash as the default shell, so I show the bash version. For this article I used the following line:
export PATH=/home/paul/Downloads/epd_free-7.2-2-rh5-x86_64/bin:$PATH
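After updating PATH and opening a new terminal, a short script like the one below can confirm which interpreter is being picked up and which of the packages discussed in this article it provides. The package list and the `report` helper are assumptions for illustration; edit the list as needed.

```python
import importlib
import importlib.util
import sys

# Packages discussed in this article; adjust the list as needed.
PACKAGES = ["numpy", "scipy", "matplotlib", "IPython", "pandas"]


def report(packages=PACKAGES):
    """Map each package name to its version string, or None if it is not installed."""
    versions = {}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            versions[name] = None
            continue
        try:
            module = importlib.import_module(name)
            versions[name] = str(getattr(module, "__version__", "unknown"))
        except Exception as exc:  # installed but fails to import
            versions[name] = f"import failed: {exc}"
    return versions


# If PATH is set up as above, this should point into the EPD bin directory.
print("Interpreter:", sys.executable)
for name, version in report().items():
    print(f"{name:<12} {version or 'not installed'}")
```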
Using EPD
IPython is another common denominator between EPD and Sage, as it is the primary user interface tool for both. EPD installs both a command-line interface and a web-based notebook, much like Sage.
Command-line interface
IPython's Qt-based console offers some enhanced functionality, including full syntax highlighting using the Pygments library. It can also display plots inline. To get this capability, you can launch IPython with the following command:
ipython qtconsole --pylab=inline
Figure 5 shows the Qt console with an embedded plot.
Figure 5. IPython Qt console with an inline plot
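Pylab mode pulls NumPy and matplotlib's pyplot names into the session namespace, so in the Qt console with inline plotting you could simply call `plot(x, y)`. The sketch below shows the equivalent explicit imports for a simple plot. It assumes NumPy and matplotlib are installed, selects the non-interactive Agg backend, and saves the figure as PNG bytes in memory so it also runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")               # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.legend()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")   # write the figure as PNG bytes
png_bytes = buffer.getvalue()
plt.close(fig)
```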
Web-based interface
The latest release of IPython (0.12) includes a web-based notebook similar to Sage's. To get the notebook functionality, you need to install several dependencies, including ZeroMQ and the Tornado web server. To launch the notebook from the command line, type the following in a terminal window:
ipython notebook
This starts the Tornado web engine and opens the default web browser to the dashboard page. If you then select the default notebook, you should see a new window like the one in Figure 6.
Figure 6. IPython Notebook
One difference in the web notebook is that pressing Enter adds a new line to the input box rather than executing it; you press Control-Enter when you want to execute code. This makes it possible to enter multiple lines of code, as in Figure 6, and have everything in the input box executed sequentially, so you can easily break functions into manageable blocks.
You can annotate your notebooks using the Markdown syntax. If you're not familiar with Markdown, it's essentially a way to create formatted Hypertext Markup Language (HTML) using plain text. For example, a single pound sign (#) followed by a space is used to indicate an H1 in HTML, while two pound signs are used for H2 and so on. This allows you to add annotation or documentation to supplement your code and graphics.
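The heading rule just described is simple enough to sketch. The toy function below handles only ATX headings (one to six `#` characters followed by a space) and wraps any other line in a paragraph tag. `heading_to_html` is a hypothetical illustration, not the Markdown renderer IPython uses.

```python
import html
import re

# One to six '#' characters, a space, then the heading text.
HEADING = re.compile(r"^(#{1,6}) (.+)$")


def heading_to_html(line: str) -> str:
    """Turn '# Title' into '<h1>Title</h1>', '## Title' into '<h2>Title</h2>', and so on.

    Any other line is HTML-escaped and wrapped in <p> tags.
    """
    match = HEADING.match(line)
    if match:
        level = len(match.group(1))
        text = html.escape(match.group(2).strip())
        return f"<h{level}>{text}</h{level}>"
    return f"<p>{html.escape(line)}</p>"


print(heading_to_html("# Results"))   # <h1>Results</h1>
print(heading_to_html("## Method"))   # <h2>Method</h2>
```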
Pandas
The Python Data Analysis Library, or pandas, is another tool with ties to SciPy and NumPy, created specifically to address the task of data analysis. Pandas combines a number of libraries with some standard data structures to provide the tools needed to manipulate large datasets efficiently. Comma-separated values (CSV) files are one of the most common ways of distributing data among interested parties, and pandas provides optimized functions to read and write multiple file formats, including CSV and the efficient HDF5 format.
The read_csv function knows how to parse typical CSV files with header information in the first row. It also knows how to handle files with dates or times using a built-in parser. Pandas includes a datetools module with a long list of routines for performing various kinds of date math. Listing 1 shows a snippet of code from the pandas documentation that finds the date four months and five days after another:
Listing 1. Pandas date projection sample
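>>> from datetime import datetime
>>> import pandas
>>> d = datetime(2012, 4, 20)
>>> d + pandas.DateOffset(months=4, days=5)
datetime.datetime(2012, 8, 25, 0, 0)
Current pandas releases return the same date as a Timestamp('2012-08-25 00:00:00') rather than a datetime object.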
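Putting read_csv's header and date handling together, here is a minimal sketch that assumes a reasonably recent pandas. The column names and values are invented, and `io.StringIO` stands in for a file on disk; `read_csv`, its `parse_dates` argument, and `to_csv` are real pandas API.

```python
import io

import pandas as pd

csv_text = """date,station,temp_c
2012-04-20,A,14.5
2012-04-21,A,15.1
2012-04-20,B,12.0
2012-04-21,B,13.4
"""

# The first row is treated as the header; parse_dates converts the 'date' column.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(df.dtypes)
print(df.head())

# Writing back out is symmetric; HDF5 output via to_hdf additionally needs PyTables.
out = df.to_csv(index=False)
```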
Pandas really shines when it comes to slicing and dicing large datasets. After you import your data into a native data structure, you have a wide range of tools at your disposal for performing almost any kind of manipulation. You can slice the data using the standard Python slicing syntax, perform operations on all or part of the data, or plot it using matplotlib. If you need to do any data manipulation tasks, you definitely want to get up to speed with pandas.
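To give a flavor of that slicing and dicing, the sketch below builds a small DataFrame in memory from invented data, selects rows by position and by condition, computes a derived column, and aggregates by group. It assumes pandas is installed. Plotting would be one more call (`df.plot()`, which needs matplotlib), omitted here so the example runs without a display.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2012-04-20", "2012-04-21", "2012-04-20", "2012-04-21"]),
    "station": ["A", "A", "B", "B"],
    "temp_c": [14.5, 15.1, 12.0, 13.4],
})

first_two = df.iloc[0:2]                     # standard Python slice syntax, by position
warm = df[df["temp_c"] > 13.0]               # a boolean mask selects matching rows
df["temp_f"] = df["temp_c"] * 9 / 5 + 32     # vectorized operation on a whole column
mean_by_station = df.groupby("station")["temp_c"].mean()

print(first_two)
print(warm)
print(mean_by_station)
```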
Wrapping up
Linux is without question highly qualified to support virtually any scientific computational task you can throw at it. It has heavy support in the academic community and is rapidly gaining new industry users looking for ways to reduce their software budgets. These tools provide a more than adequate substitute for their commercial counterparts, and, best of all, they are all free. For customers who want fully supported software, Enthought offers its commercial EPD, with full customer support and training for paying customers.