The first time you sign into developerWorks, a profile is created for you. Select information in your profile (name, country/region, and company) is displayed to the public and will accompany any content you post. You may update your IBM account at any time.

All information submitted is secure.

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

All information submitted is secure.

## Events

An introduction to functional programming

David Mertz has been writing the developerWorks columns Charming Python and XML Matters since 2000. Check out his book Text Processing in Python . For more on David, see his personal Web page. You can contact David at mertz@gnosis.cx.

Summary:  This tutorial is for programmers of imperative languages wanting to learn about functional programming in the language Haskell. If you have programmed in languages such as C, Pascal, Fortran, C++, Java™, Cobol, Ada, Perl, TCL, REXX, JavaScript, Visual Basic, or many others, you have been using an imperative paradigm. This tutorial provides a gentle introduction to the paradigm of functional programming, with specific illustrations in the Haskell 98 language.

Date:  27 Sep 2001
Level:  Introductory PDF:  A4 and Letter (67 KB | 17 pages)Get Adobe® Reader®

Activity:  40235 views

A new expressiveness

A Haskell program consists, basically, of a set of function definitions. Functions are bound to names in a manner that looks very similar to variable assignment in other languages. However, it really is not the same thing; a Haskell bound name is much more similar to a binding in a mathematical proof, where we might say "Let tau refer to the equation . . .." A name binding just provides a shorthand for later use of an equation, but the name can only be bound a single time within a program -- trying to change it generates a program error.

 ```myNum :: Int -- int myNum() { myNum = 12+13 -- return 12+13; } square :: Int -> Int -- int square(int n) { square n = n*n -- return = n*n; } double :: Int -> Int -- int double(int n) { double n = 2*n -- return 2*n; } dubSqr :: Int -> Int -- int dubSqr(int n) { dubSqr n = square (double n) -- return square(double(n)); } ```

Defining functions

There are (optionally) two parts to a function definition. The first part (conceptually, not necessarily within a listing) is the type signature of a function. In a function, the type signature defines all the types of the input, and the type of the output. Some analogous C definitions are given in the end-of-line comments in the example.

The second part of a function definition is the actual computation of the function. In this second part, often (but not always) some ad hoc variables are provided to the left of the equal sign that are involved in the computation to the right. Unlike variables in C, however -- and much like variables in mathematics -- the Haskell variables refer to the exact same "unknown quantity" on both sides of the equal sign (not to a "container" where a value is held).

 ```example :: Int example = double (myNum - square (2+2)) dubSqr2 :: Int -> Int dubSqr2 = square . double -- Function composition ```

It is also often possible to bypass explicit naming of variables entirely in function definitions. In `dubSqr2`, it is enough to say that we should `square` whatever thing is `double`'d. For this, there is no need to mention a variable name since the thing `dubSqr2`'d is just whatever expression follows the bound name in later expressions. Of course, `double` must itself take the same type of input `dubSqr2` expects, and in turn output the type of output `square` needs as input.

More simple function definitions

Like C, Haskell is rigidly typed. The `averageThree` is a good example of a function that requires type coercion in order to return the right value type. However, the `difSquare` function shows something distinct to Haskell. `difSquare` has no type signature, so Haskell will infer the appropriate type signature from the operations involved in the function definition. At first appearance this might seem to be the same thing that dynamically or loosely typed languages do; but what Haskell does is quite different. `difSquare` is rigidly typed at compile time -- there is no runtime dynamism to this, but the type of `difSquare` has a Type Class that includes both integers and floats (and also rationals, complex numbers, etc.). We can find the inferred type within Hugs:

 ```Main> :type difSquare difSquare :: Num a => a -> a -> a ```

That is, both of the input arguments, as well as the output, are inferred to have the Type Class `Num`. Had we explicitly declared a type like `Int`, the function would operate over a narrower range of values (which is good or bad, depending on our needs).

 ```-- Average of three Integers as floating point averageThree :: Int -> Int -> Int -> Float averageThree l m n = fromInt(l+m+n) / 3 -- float averageThree(int l, int m, int n) { -- return ((float)(l+m+n))/3; } difSquare x y = (x-y)^2 -- C lacks polymorphic type inference ```

Recursion

Absent loop structures, flow in Haskell programs is usually expressed as recursion. Thinking about all flow in terms of recursion takes some work, but it turns out to be just as expressive and powerful as the `while` and `for` constructs in other languages.

The trick to recursion is that we would like it to terminate eventually (at least we usually do). One way to guarantee termination of recursion is to use primitive recursion. This amounts to taking a "thing" to recurse on, and making sure that the next call is closer to a terminal condition than the call that got us here. In practice, we can assure this either by decrementing an integer for each call (and terminating at zero, or some other goal), or by taking only the `tail` of a list for each successive call (and terminating at an empty list). Both versions of factorial listed in the example assume they will be passed an integer greater than zero (and will fail otherwise; exercise: how?).

Non-primitive recursion also exists, but it is more difficult to know for sure that a recursion will terminate. Also, mutual recursion between functions is allowed (and frequently encountered), but primitive recursion is still the safest and most common form.

 ```-- Factorial by primitive recursion on decreasing num fac1 :: Int -> Int fac1 n = if n==1 then 1 else (n * fac1 (n-1)) -- Factorial by primitive recursion on list tail fac2 :: Int -> Int fac2 n = prodList [1 .. n] prodList lst = if (length lst)==1 then head lst else head lst*(prodList (tail lst)) ```

Pattern matching

In functional programming, we are "more concerned with how something is defined than with the specifics of how it is calculated" (take this motto with a grain of salt, however, efficiency still matters in some cases). The idea is that it is a compiler or interpreter's job to figure out how to reach a solution, not the programmer's.

One useful way of specifying how a function is defined is to describe what results it will return given different types of inputs. A powerful way of describing "different types of inputs" in Haskell is using pattern matching. We can provide multiple definitions of a function, each having a particular pattern for input arguments. The first listed definition that succeeds in matching a given function call is the one used for that call. In this manner, you can pull out the head and tail of a list, match specific input values, identify empty lists as arguments (for recursion usually), and analyze other patterns. You cannot, however, perform value comparisons with pattern matching (for example, "`n <= 3`" must be detected differently). An underscore is used in a position where something should match, but where the matched value is not used in the definition.

 ```prodLst2 [] = 0 -- Return 0 as product of empty list prodLst2 [x] = x -- Return elem as prod of one-elem list prodLst2 (x:xs) = x * prodLst2 xs third (a,b,c,d) = c -- The third item of a four item tuple three = third (1,2,3,4) -- 'three' is 3 -- Is a sequence a sub-sequence of another sequence? isSubseq [] _ = True isSubseq _ [] = False isSubseq lst (x:xs) = (lst==start) || isSubseq lst xs where start = take (length lst) (x:xs) ```

Guards

Somewhat analogous to pattern matching, and also similar to `if .. then .. else`, constructs (which we saw examples of earlier) are guards in function definitions. A guard is simply a condition that might obtain, and a definition of a function that pertains in that case. Anything that could be stated with pattern matching can also be rephrased into a guard, but guards allow additional tests to be used as well. Whichever guard matches first (in the order listed) becomes the definition of the function for the particular application (other guards might match also, but they are not used for a call if listed later).

In terms of efficiency, pattern matching is usually best, when possible. It is often possible to combine guards with pattern matching, as in the `isSublist` example.

 ```prodLst3 lst -- Guard version of list product | length lst==0 = 0 | length lst==1 = head lst | otherwise = head lst * prodLst3 (tail lst) -- A sublist is a string that occurs in order, but not -- necessarily contiguously in another list isSublist [] _ = True isSublist _ [] = False isSublist (e:es) (x:xs) | e==x && isSublist es xs = True | otherwise = sublist (e:es) xs ```

List comprehensions

One of the most powerful constructs in Haskell is list comprehensions (for mathematicians: this term comes from the "Axiom of Comprehension" of Zermelo-Frankel set theory). Like other functional languages, Haskell builds a lot of power on top of manipulation of lists. In Haskell, however, it is possible to generate a list in a compact form that simply states where the list elements come from and what criteria elements meet. Lists described with list comprehensions must be generated from other starting lists; but fortunately, Haskell also provides a quick "enumeration" syntax to specify starting lists.

 ```-- Odd little list of even i's, multiple-of-three j's, -- and their product; but limited to i,j elements -- whose sum is divisible by seven. myLst :: [(Int,Int,Int)] myLst = [(i,j,i*j) | i <- [2,4..100], j <- [3,6..100], 0==((i+j) `rem` 7)] -- Quick sort algorithm with list comp and pattern matching -- '++' is the list concatenation operator; we recurse on both -- the list of "small" elements and the list of "big" elements qsort [] = [] qsort (x:xs) = qsort [y | y<-xs, y<=x] ++ [x] ++ qsort [y | y<-xs, y>x] ```

Lazy evaluation I

In imperative languages -- and also in some functional languages -- expression evaluation is strict and immediate. If you write `x = y+z;` in C, for example, you are telling the computer to make a computation and put a value into the memory called 'x' right now! (whenever the code is encountered). In Haskell, by contrast, evaluation is lazy -- expressions are only evaluated when, and as much, as they need to be (in fairness, C does include shortcutting of Boolean expressions, which is a minor kind of laziness). The definitions of `f` and `g` in the example show a simple form of the difference.

While a function like `g` is somewhat silly, since `y` is just not used, functions with pattern matching or guards will very often use particular arguments only in certain circumstances. If some arguments have certain properties, those or other arguments might not be necessary for a given computation. In such cases, the needless computations are not performed. Furthermore, when lists are expressed in computational ways (list comprehensions and enumeration ellipsis form), only as many list elements as are actually utilized are actually calculated.

 ```f x y = x+y -- Non-lazy function definition comp1 = f (4*5) (17-12) -- Must compute arg vals in full g x y = x+37 -- Lazy function definition comp2 = g (4*5) (17-12) -- '17-12' is never computed -- Lazy guards and patterns -- Find the product of head of three lists prodHeads :: [Int] -> [Int] -> [Int] -> Int prodHeads [] _ _ = 0 -- empty list give zero product prodHeads _ [] _ = 0 prodHeads _ _ [] = 0 prodHeads (x:xs) (y:ys) (z:zs) = x*y*z -- Nothing computed because empty list matched comp3 = prodHeads [1..100] [] [n | n <- [1..1000], (n `rem` 37)==0] -- Only first elem of first, third list computed by lazy evaluation comp4 = prodHeads [1..100] [55] [n | n <- [1..1000], (n `rem` 37)==0] ```

Lazy evaluation II

A truly remarkable thing about Haskell -- and about lazy evaluation -- is that it is possible to work with infinite lists. Not just large ones, but actual infinities! The trick, of course, is that those parts of the list that are unnecessary for a particular calculation are not calculated explicitly (just the rule for their expansion is kept by the runtime environment).

A famous and ancient algorithm for finding prime numbers is the Sieve of Eratosthenes. The idea here is to keep an initial element of the list of integers, but strike off all of its multiples as possible primes. The example does this, but is performed only as far as needed for a specific calculation. The list `primes`, however, really is exactly the list of all the prime numbers!

 ```-- Define a list of ALL the prime numbers primes :: [Int] primes = sieve [2 .. ] -- Sieve of Eratosthenes sieve (x:xs) = x : sieve [y | y <- xs, (y `rem` x)/=0] memberOrd :: Ord a => [a] -> a -> Bool memberOrd (x:xs) n | x

First class functions (passing functions)

A powerful feature of Haskell (as with all functional programming) is that functions are first class. The first class status of functions means that functions are themselves simply values. Just as you might pass an integer as an argument to a function, in Haskell you can pass another function to a function. To a limited extent, you can do the same with function pointers in a language like C, but Haskell is far more versatile.

The power of Haskell's first class functions lies largely in Haskell's type checking system. In C, one might write a "quicksort" function that accepted a function pointer as an argument, much as in the Haskell example. However, in C you would have no easy way to make sure that the function (pointed to) had the correct type signature. That is, whatever function serves as an argument to `qsortF` must take two arguments of the same type ("a" stands for a generic type) and produce a `Bool` result. Naturally, the list passed as the second argument to `qsortF` must also be of the same type "a." Notice also that the type signature given in the sample code is only needed for documentation purposes. If the signature is left out, Haskell infers all these type constraints automatically. `tailComp` meets the right type signature, with the type `String` being a specialization of the generic type allowed in `qsortF` arguments (a different comparison function might operate over a different type or type class).

 ```-- Quick sort algorithm with arbitrary comparison function qsortF :: (a -> a -> Bool) -> [a] -> [a] qsortF f [] = [] qsortF f (x:xs) = qsortF f [y | y<-xs, f y x] ++ [x] ++ qsortF f [y | y<-xs, not (f y x)] -- Comparison func that alphabetizes from last letter back tailComp :: String -> String -> Bool tailComp s t = reverse s < reverse t -- List of sample words myWords = ["foo", "bar", "baz", "fubar", "bat"] -- tOrd is ["foo","bar","fubar","bat","baz"] tOrd = qsortF tailComp myWords -- hOrd is ["bar","bat","baz","foo","fubar"] hOrd = qsortF (<) myWords ```

First class functions (function factories)

Passing functions to other functions is only half the power of first class functions. Functions may also act as factories, and produce new functions as their results. The ability to create functions with arbitrary capabilities within the program machinery can be quite powerful. For example, one might computationally produce a new comparison function that, in turn, was passed to the `qsortF` function in the previous panel.

Often, a means of creating a function is with lambda notation. Many languages with functional features use the word "lambda" as the name of the operator, but Haskell uses the backslash character (because it looks somewhat similar to the Greek letter, lambda). A lambda notation looks much like a type signature. The arrow indicates that a lambda notation describes a function from one type of thing (the thing following the backslash) to another type of thing (whatever follows the arrow).

The example factory `mkFunc` packs a fair amount into a short description. The main thing to notice is that the lambda indicates a function from `n` to the result. By the type signature, everything is an `Int`, although type inference would allow a broader type. The form of the function definition is primitive recursive. An empty list produces a result of zero. A non-empty list produces either the result given by its `head` pair, or the result that would be produced if only its `tail` is considered (and the tail eventually shrinks to empty by recursion).

 ```-- Make an "adder" from an Int mkAdder n = addN where addN m = n+m add7 = mkAdder 7 -- e.g. 'add7 3' is 10 -- Make a function from a mapping; first item in pair maps -- to second item in pair, all other integers map to zero mkFunc :: [(Int,Int)] -> (Int -> Int) mkFunc [] = (\n -> 0) mkFunc ((i,j):ps) = (\n -> if n==i then j else (mkFunc ps) n) f = mkFunc [(1,4),(2,3),(5,7)] -- Hugs session: -- Main> f 1 -- 4 :: Int -- Main> f 3 -- 0 :: Int -- Main> f 5 -- 7 :: Int -- Main> f 37 -- 0 :: Int ```

4 of 7

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Linux
ArticleID=132894
publish-date=09272001
author1-email=mertz@gnosis.cx
author1-email-cc=

4 of 7