Linux in the scientific community
It should come as no surprise that Linux has a substantial presence in the scientific community. Solutions abound from high-performance computing clusters to visualization software. There's even an entire Linux distribution based on Red Hat Enterprise Linux targeted for scientific computation, appropriately named Scientific Linux.
Sage and Enthought Python Distribution
This article looks at two different ways to use a Linux workstation for scientific computation. The first is the Sage open source mathematics system and the second is the Enthought Python Distribution (EPD). Both use a number of core open source Python tools under the covers to perform the heavy lifting. If you want to try them, install the individual pieces using the Ubuntu software manager.
Sage is the more comprehensive of the two in that it is more of a shell over a number of different underlying engines. From the Sage command line you can even interact with commercial products such as MATLAB or Mathematica. At the Sage prompt you essentially interact with IPython with access to all its features. You also have to think in terms of objects and methods when you start to explore the capabilities of Sage. Sage includes a number of different computer algebra systems and allows the user to interact with them from the command line.
It's important to note that Sage is based on Python but does pre-parse each
statement before passing it to the Python interpreter. This can cause some
confusion when looking at simple interactive Sage commands. The rationale
for this behavior is a desire to make typing commands into Sage as
intuitive from a mathematical sense as possible. One good example is the
symbol for exponentiation. In pure Python you must type
2**4 to raise two to the fourth power. In Sage
you can use the caret symbol (^) instead, as in 2^4.
Sage also handles some operations such as integer division differently
than basic Python.
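To make the distinction concrete, here is a minimal plain-Python sketch. The helper `sage_style_power` is a hypothetical toy, not Sage's actual preparser (which does considerably more); it only illustrates the idea of rewriting a statement before Python evaluates it.

```python
# In plain Python, ** is exponentiation and ^ is bitwise XOR.
power = 2 ** 4   # two to the fourth power
xor = 2 ^ 4      # 0b010 XOR 0b100 = 0b110


def sage_style_power(expression):
    """Toy stand-in for a preparser: rewrite ^ as ** before evaluation.

    Hypothetical helper for illustration only; it blindly replaces
    every caret and knows nothing about strings or comments.
    """
    return expression.replace("^", "**")


rewritten = sage_style_power("2^4")
print(power, xor, rewritten, eval(rewritten))
```

Typing `2^4` into an ordinary Python session therefore silently gives a different answer than it does in Sage, which is the kind of confusion the pre-parsing step is meant to avoid.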
NumPy and SciPy
The two best-known pieces of the underlying puzzle for both Sage and EPD are NumPy and SciPy. Both projects have been around since the mid to late 1990s and were originally started by Travis Oliphant, now an employee at Enthought. NumPy provides the core numerical methods to manipulate arrays and matrices. SciPy depends on NumPy for its basic array data structure and contains a wide range of modules for everything from linear algebra to signal processing. Enthought is a sponsor of both projects and continues to contribute heavily to new releases.
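As a rough illustration of how the two libraries divide the work, the sketch below builds a small linear system as NumPy arrays and hands it to a SciPy linear algebra routine. The system itself is made up for the example.

```python
import numpy as np
from scipy import linalg

# Illustrative system (not from the article):
#   3x +  y = 9
#    x + 2y = 8
a = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# SciPy's solver operates directly on NumPy arrays and returns one.
x = linalg.solve(a, b)

# Check the answer by substituting it back into the system.
residual = a.dot(x) - b
print(x, residual)
```

NumPy supplies the array type and elementwise arithmetic; SciPy layers the solver on top of it.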
Several options are available for installing Sage. To give Sage a look without physically installing it, you can use a live CD version. You can also run the live CD in a virtual machine such as Oracle VM VirtualBox or VMware Player. This article looks at installing Sage on both a physical and virtual machine running the latest release of Ubuntu (12.04).
Installing Sage from source is the most reliable method to get up and running on Ubuntu 12.04 because the operating system has only recently been released. Download the source tarball and unpack it into a convenient directory. You also need to install a few prerequisites before building, which you can accomplish with the following commands:
sudo apt-get install build-essential gfortran
sudo apt-get install texlive xpdf evince
sudo apt-get install tk8.5-dev
After you have the prerequisites installed, you should be ready to build Sage, which you can launch with the following command:
At this point, go watch a movie or take a nap. The full build process can take several hours on a typical desktop machine. You should see a screen like the one in Figure 1 if everything builds correctly.
Figure 1. Sage build screen
Installing on a virtual machine is a good option if you're less adventurous and just want to get started using Sage. An Open Virtualization Archive (OVA) package is available for download on the Sage site along with binary packages for Fedora 16 and Ubuntu 10.04.3. OVA is a single package file (essentially a tar file) containing all the files needed to launch the virtual machine in the Open Virtualization Format (OVF).
When you have Sage installed, you're ready to begin your journey of mathematical discovery and exploration. Sage has both a command line and a web-based interface.
To launch the command-line version from the directory where you built Sage, simply type the following command:
Figure 2 shows an example of the Sage interpreter and some of the mathematical interpretations mentioned earlier.
Figure 2. Sage command line interface and mathematical interpretations
The difference in integer division has to do with how Python handles
the types of the operands (integer, float, and so on). In Python 2, the
expression 2/3 yields an integer result that rounds down to zero. Sage
treats the division operator as a constructor for rational numbers,
meaning you can perform operations on fractions in much the same way as
you do on paper. For example, if you type 2/3 + 2/3 at the Sage prompt,
you get the result 4/3. If you type the same thing in Python 2, the
result is 0. You can explicitly force the type of literals using
int(), as in Figure 2.
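The same contrast can be reproduced in plain Python using the standard library's `fractions` module as a stand-in for Sage's rational numbers. This is only an analogy, not how Sage implements them. The sketch uses `//` for floor division so that it behaves the same under Python 2 and Python 3 (where `2/3` on integers yields a float instead).

```python
from fractions import Fraction

# Exact rational arithmetic, similar in spirit to typing 2/3 + 2/3 in Sage.
total = Fraction(2, 3) + Fraction(2, 3)

# Floor division on integers: each term rounds down to zero,
# which is what 2/3 + 2/3 produces in Python 2.
floor_sum = 2 // 3 + 2 // 3

print(total, floor_sum)
```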
To use the Sage Notebook, you can either type
notebook() after starting Sage or use this
This launches the Sage server and opens the default web browser to the Sage Notebook home page. You'll find many features in the notebook to facilitate manipulating your work, such as saving and loading worksheets to a file, plus the normal copy, delete, and rename functions. Sage facilitates collaborative work with the Share and Publish functions in Figure 3.
Figure 3. Sage Notebook features
Computer algebra packages
The base Sage distribution includes a number of computer algebra packages including GAP, Maxima, PARI, and Singular. Each has its own following in the mathematical community and provides slightly different functionality. The key here is that Sage includes each of these packages in the base distribution, meaning you don't have to download and install them separately.
Figure 4 shows an example of using Maxima to perform several matrix operations. This was done using the Sage Notebook, and it shows the user input in black along with the output in blue. The first two lines create a matrix with entries equal to i/j where i and j range from 1 to 4. Notice that these are rational numbers (fractions).
Figure 4. Matrix operations with Maxima
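For readers without Sage at hand, the matrix described above can be approximated in plain Python. This sketch uses the standard library's `fractions` module in place of Maxima, so it shows the idea of a matrix with exact rational entries and not the Maxima commands in the figure.

```python
from fractions import Fraction

n = 4

# Entry in row i, column j is the exact fraction i/j, for i, j in 1..4.
matrix = [[Fraction(i, j) for j in range(1, n + 1)]
          for i in range(1, n + 1)]

for row in matrix:
    print("  ".join(str(entry) for entry in row))

# Every diagonal entry is i/i = 1, so the trace equals n.
trace = sum(matrix[k][k] for k in range(n))
print("trace =", trace)
```

Because the entries stay as fractions, no rounding error creeps in, which is the property the figure highlights.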
Publishing scientific papers
Publishing scientific papers is a requirement for many in the educational community. Sage requires that every object has a LaTeX representation. This is directly tied to the need to produce publication-quality graphics and text using the TeX language. Be aware that you need a full installation of TeX to take advantage of all the features Sage has to offer.
Scientific computing with Python is what Enthought as a company is all about. Enthought's commercial product offerings and support are what pay the bills, but the company still contributes substantially to the open source community. Its contributions come through working directly on the NumPy and SciPy code bases as well as presenting at the annual PyCon conference and hosting the SciPy conference.
You can use the Ubuntu software manager to install the various pieces needed to get running with IPython, NumPy, and SciPy, or you can simply download and install the free version of the EPD, known as EPD Free. Enthought provides both 32- and 64-bit versions of EPD Free that have been tested on Red Hat, Ubuntu, and openSUSE. They do warn that some 64-bit Linux systems don't include 32-bit libraries, hence the need for a 64-bit version of the package.
To install EPD Free, first download the installer script and then run it with the following command:
To make it easier to launch EPD in the future, you need to add a few lines to your shell startup file (either .cshrc or .bash_profile). Ubuntu uses the bash shell as the default, so I show the code based on that. For this article I used the following lines:
IPython is another common denominator between EPD and Sage as it is the primary user interface tool. EPD installs both a command-line interface and a web-based notebook, much like Sage.
The Qt-based console offers some enhanced functionality, including full syntax highlighting using the Pygments library. It also provides the ability to do in-line plots. To get this capability, you can launch IPython with the following command:
Figure 5 shows the Qt console with an embedded plot.
Figure 5. IPython Qt console with an embedded plot
The latest release of IPython (0.12) includes a web-based notebook capability that is similar to Sage. To get the notebook functionality, you need to install several dependencies, including ZeroMQ and the Tornado web server. To launch the notebook from the command line, type the following in a terminal window:
This starts the Tornado web engine and launches the default web browser open to the dashboard page. If you then select the default notebook, you should see a new window like the one in Figure 6.
Figure 6. IPython Notebook
The one thing that is different about the web notebook is that you need to use Control-Enter when you want to execute code. This makes it possible to enter multiple lines of code, as in Figure 6, and have everything in the input box executed sequentially. Therefore, you can easily break functions into manageable blocks.
You can annotate your notebooks using the Markdown syntax. If you're not familiar with Markdown, it's essentially a way to create formatted Hypertext Markup Language (HTML) using plain text. For example, a single pound sign (#) followed by a space is used to indicate an H1 in HTML, while two pound signs are used for H2 and so on. This allows you to add annotation or documentation to supplement your code and graphics.
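The heading rule described above can be sketched in a few lines of Python. The function `heading_to_html` is a hypothetical helper written for this illustration; it is not part of IPython or of any Markdown library, and it handles only headings (no HTML escaping, no other Markdown syntax).

```python
import re


def heading_to_html(line):
    """Convert a Markdown heading line to an HTML heading.

    One to six pound signs followed by at least one space map to
    <h1> through <h6>. Any other line is returned unchanged.
    """
    match = re.match(r"^(#{1,6}) +(.*)$", line)
    if match is None:
        return line
    level = len(match.group(1))
    text = match.group(2).strip()
    return "<h{0}>{1}</h{0}>".format(level, text)


print(heading_to_html("# Results"))
print(heading_to_html("## Method"))
print(heading_to_html("plain text stays as it is"))
```

A real Markdown renderer covers far more (emphasis, lists, links), but the mapping from pound signs to heading levels works as shown.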
The Python Data Analysis Library, or pandas, is another tool with ties to SciPy and NumPy, created specifically to address the task of data analysis. Pandas incorporates a large number of libraries along with some standard data models to provide the tools needed to manipulate large datasets efficiently. Comma-separated values (CSV) files represent one of the most common ways of distributing data among interested parties. Pandas provides optimized library functions to read and write multiple file formats, including CSV and the efficient HDF5 format.
The read_csv function knows how to parse typical CSV files with header information in the first row. It also knows how to handle files with dates or times using a built-in parser. Pandas includes a datetools module with a long list of manipulation routines for performing various kinds of date math. Listing 1 shows a snippet of code from the pandas documentation showing how to find a date four months and five days from another:
Listing 1. Pandas date projection sample
>>> from datetime import datetime
>>> import pandas
>>> d = datetime(2012, 4, 20)
>>> d + pandas.DateOffset(months=4, days=5)
datetime.datetime(2012, 8, 25, 0, 0)
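The CSV handling described above can be tried without any data file by reading from an in-memory string. This is a minimal sketch: the column names and values are invented for the example, and it assumes a reasonably recent pandas release, where the result of the date arithmetic is a pandas `Timestamp` instead of a plain `datetime`.

```python
import io

import pandas as pd

# Made-up sample data with a header row and a date column.
csv_text = u"""date,temperature,pressure
2012-04-20,21.5,1012
2012-04-21,22.1,1009
2012-04-22,19.8,1015
"""

# The header row supplies column names; parse_dates converts the
# "date" column, and index_col makes it the row index.
frame = pd.read_csv(io.StringIO(csv_text),
                    parse_dates=["date"],
                    index_col="date")

# Date math in the style of Listing 1, applied to the first parsed date.
projected = frame.index[0] + pd.DateOffset(months=4, days=5)

print(frame)
print(projected)
```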
Pandas really shines when it comes to slicing and dicing large datasets. After you have your data imported into a native data structure, you have a wide range of tools at your disposal for reshaping, filtering, and aggregating it. You can slice the data using the standard Python slicing syntax, perform operations on all or part of the data, or plot it using matplotlib. If you need to do any data manipulation tasks, you definitely want to get up to speed with pandas.
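A short sketch of those slicing and whole-column operations follows. The data is synthetic and the column names are arbitrary; the point is only to show the syntax.

```python
import numpy as np
import pandas as pd

# Six days of synthetic data indexed by date.
dates = pd.date_range("2012-04-20", periods=6, freq="D")
frame = pd.DataFrame({"signal": np.arange(6.0),
                      "noise": np.ones(6)},
                     index=dates)

# Standard Python slice syntax selects rows by position.
first_three = frame[:3]

# Label-based slicing on the date index (the start label is included).
april_22_on = frame.loc["2012-04-22":]

# Operations apply to a whole column at once.
scaled = frame["signal"] * 2
mean_signal = frame["signal"].mean()

print(first_three)
print(april_22_on)
print(scaled.iloc[-1], mean_signal)
```

With matplotlib installed, calling `frame.plot()` on the same object draws both columns against the date index.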
Linux is without question highly qualified to support virtually any scientific computational task you can throw at it. It has heavy support in the academic community and is rapidly gaining new industry users looking for ways to reduce their software budgets. These tools provide a more than adequate substitute for their commercial counterparts and, best of all, they are all free. For customers looking for fully supported software, there is Enthought and their EPD commercial offering. They provide full customer support and training to all paying customers.
Get products and technologies
- Visit the Sage project site.
- Explore the Enthought Python Distribution site.
- Check out the SciPy site.
- Find the Python resources you need at Python.org.