How To do IO in Haskell

Introduction

This tutorial aims to give a thorough understanding of IO in Haskell without using Monad theory. Instead, there is an emphasis on types. My justification for this approach appears at the end of this document, in the section entitled Reflections.

Throughout the tutorial, there are lots of examples, and a few exercises (with solutions!). You are strongly encouraged to experiment with the examples, and attempt the exercises. You can download them, but you will learn even more if you retype them into your favourite Haskell editor.

All feedback is welcome: please email any comments to info@libra-aries-books.co.uk

Hello, World!

Our first example may be familiar: it's called helloworld.hs.

main = putStr "Hello, World!\n"

This is about as straightforward as it gets, but there is a lot to glean here. As promised, we will focus our attention on the types involved.

This example introduces the function putStr, which has type putStr :: String -> IO (). In words, putStr is a function which takes one argument of type String and returns a result of type IO (). So, when we apply putStr to an argument, putStr s, where s :: String, the complete expression has type IO ().

We will define an IO action to be any expression of type IO (). By this definition, putStr s is an IO action. When putStr s is evaluated, it writes s to standard output.

Now, we've happily been talking about expressions of type IO (), but that's a pretty strange looking type. Let's examine it in detail.

First of all, IO is a type constructor. This means that for any type a, there is a related type IO a. You already know about type constructors, even if you've done very little Haskell, since [] is a type constructor allowing us to define a list type for any existing type, such as [Int], a list of Ints. If you've done just a little bit more Haskell, you've probably come across type constructors such as Tree a, which allows us to define the type of a tree of any other type, for example Tree String. The IO type constructor is exactly like Tree.

We will soon be seeing types like IO String, but what is IO ()? You may not have come across () before, but it is indeed a type, in fact the trivial type. There is only one value of type (), which is also written (): you can think of it as a tuple with no components. The trivial type is fairly useless on its own, but IO () perfectly expresses the type of an IO action that returns no result, just like putStr.

Just to complete our discussion of types: the type of main is, the Haskell report tells us, IO a for some type a. In this case, we have instantiated a as the trivial type (), and the entire program is type correct.

Of course, we can define our own functions with an IO () result type, as in hellofred.hs.

main = hello "Fred"

hello n = putStrLn ("Hello, " ++ n ++ "!")

This example introduces the function putStrLn :: String -> IO (). When putStrLn is evaluated, it writes its String argument to standard output, followed by a newline character.

The type of hello is hello :: String -> IO (), the same as putStrLn itself.
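
The types discussed so far can also be written into the program as explicit signatures, so that the compiler checks our reading of them. Here is a minimal sketch of hellofred.hs with signatures added. The helper greeting is a name introduced here for illustration and is not part of the original example.

```haskell
-- hellofred.hs with explicit type signatures (a sketch).
-- `greeting` is a hypothetical pure helper, not in the original example.

main :: IO ()
main = hello "Fred"

-- An IO action parameterised by a name.
hello :: String -> IO ()
hello n = putStrLn (greeting n)

-- Building the text involves no IO at all, so IO does not appear in its type.
greeting :: String -> String
greeting n = "Hello, " ++ n ++ "!"
```

Splitting the string construction from the output is a style choice; the behaviour is the same as the version without signatures.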

Exercises

1. Define myPutStrLn (identical to putStrLn) in terms of putStr. Solution: myputstrln0.hs.

2. Given main = birthday "Fred" 37, write the function birthday that will output the line Happy Birthday Fred, 37 years old today!. What is the type of birthday? Solution: birthday.hs.

How do you do?

Let's make things just a tiny bit more complicated with howdo.hs.

main = do
  putStrLn "Hello"
  putStrLn "How do you do?"

This is our first encounter with do, which plays a starring role in this tutorial: we will be exploring exactly what do means and what you can do with it. It's easy to get the idea that do is a magic incantation which means "perform some IO now", but in our earlier examples we performed IO without ever mentioning do. We'll firm up our ideas of what do means as we go along.

This example shows one of the three things that can go inside a do expression: an IO action. In fact, we have two IO actions here, and the do construct can be read as "perform this IO action, and then perform this IO action". The notion of sequence is important here; normally Haskell doesn't say much about when expressions will be evaluated, but inside a do expression they are evaluated in order.

So our initial understanding of do is that it evaluates, in order, the IO actions it contains.

As already mentioned, the do construction is an expression, so - like every Haskell expression - it has a type. The type of the do expression is the type of the last IO action it contains: in this case our old friend IO ().

We can define our own IO actions (expressions of type IO ()) and use them inside a do expression, as in hello-bye.hs.

main = do
  hello "Fred"
  goodbye

hello n = putStrLn ("Hello " ++ n ++ "...")
goodbye = putStrLn "...and Goodbye."

Definitions of IO actions do not have to be separate top-level functions: they can also occur, like any definition, inside let expressions and where clauses. Which of these to use is largely a matter of style. The hibye.hs example demonstrates all the possibilities.

main =
    let bye = putStrLn "Bye!" in
    do
      hi "Fred"; how; bye
    where
      hi n = putStrLn ("Hi " ++ n)

how = putStrLn "How're ya doin'?"

This example also demonstrates that normal layout rules apply to do expressions: you can separate IO actions either by putting them on separate lines (suitably indented), or with a semi-colon ;.

Exercise

3. Define myPutStrLn again, but this time with a do expression. Solution: myputstrln1.hs.

More output functions

So far we've met putStr and putStrLn. Here is a complete list of all the output functions defined in the Standard Prelude. Note that the type FilePath is a synonym for String.

putStr :: String -> IO ()
Writes its String argument to standard output.
putStrLn :: String -> IO ()
Writes its String argument to standard output, followed by a newline character.
print :: Show a => a -> IO ()
Writes its argument to standard output, followed by a newline character. You can hand a value of any type to print (provided the type is an instance of type class Show).
putChar :: Char -> IO ()
Writes its Char argument to standard output.
writeFile :: FilePath -> String -> IO ()
Writes its String argument to the file named by the FilePath argument. If the file doesn't exist, it will be created; if it does exist, it will be overwritten.
appendFile :: FilePath -> String -> IO ()
Writes its String argument to the end of the file named by the FilePath argument. If the file doesn't exist, it will be created.
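
As a rough illustration, the following sketch calls each of these functions once. The file name output-demo.txt is an assumption made for the example; the program creates it in the current directory.

```haskell
-- A sketch exercising each Prelude output function once.
-- "output-demo.txt" is a hypothetical file name chosen for illustration.

demoFile :: FilePath
demoFile = "output-demo.txt"

main :: IO ()
main = do
  putStr "no newline here, "
  putStrLn "but a newline here"
  print (42 :: Int)                    -- print x behaves like putStrLn (show x)
  print "quoted"                       -- show on a String adds the quotes
  putChar 'c'
  putChar '\n'
  writeFile demoFile "first line\n"    -- creates or overwrites the file
  appendFile demoFile "second line\n"  -- adds to the end of the file
```

After running it, output-demo.txt should hold the two lines written by writeFile and appendFile.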

Exercises

4. In queens.hs is a definition of queens, which solves the n-queens problem (adapted from Bird and Wadler). For example, queens 9 returns a 352-element list containing all the solutions for a 9x9 board.

Define a function writeQueens that uses writeFile to write the list to a file. For example, main = writeQueens 8 should be a program that creates a file called queens8.out containing all the solutions to the 8-queens problem. Initially, output the list in Haskell format (i.e. use show). What is the file size of queens9.out? Solution: writequeens.hs.

5. Now adapt your solution so that the program writes ASCII board images. For example, a 4x4 board might be output like this:

.X..
...X
X...
..X.

Separate each board image with a blank line. What is the file size of queens9.out now? My first solution, prettyqueens0.hs, takes the obvious route of leaving queens alone and modifying writeQueens. After a little thought, I realised that the type class system allows an alternative solution where writeQueens does not change (but queens itself does, slightly): prettyqueens1.hs.

Input, binding

So far, our IO examples have included plenty of O, but no I at all! Let's start very simply, with hello.hs.

main = do
  putStr "What is your name? "
  n <- getLine
  putStrLn ("Hello, " ++ n ++ "!")

This example introduces the function getLine :: IO String, which reads a line from standard input. Just for now, we will say that functions like getLine are "IO computations" and return a value which is "IO encumbered".

This example also demonstrates some new syntax, and the second sort of thing we can put in a do expression. The line n <- getLine binds a name n to the value returned by the IO computation getLine, removing the IO encumbrance. So in this example, the type of n is n :: String, which makes it a suitable candidate for the string concatenations in the next line.

Unsurprisingly, names bound by <- in a do expression are in scope till the end of the do expression.

The only way to retrieve a value from an IO computation, and remove the IO encumbrance, is to bind a name to the value with <- in a do expression. It's easy to think that getLine is returning a plain String value, and try to write code like badhello.hs.

main = do
  putStr "What is your name? "
  putStrLn ("Hello, " ++ getLine ++ "!") -- illegal

This won't work, and the compiler will tell you so in no uncertain terms. (If you know much about lazy evaluation, you'll see that even if it didn't contain a type error, this example probably wouldn't do what we intend: it would start writing Hello before calling getLine. Remember that do helps us to ensure that IO actions occur in a useful sequence.)

Just as we can define new IO actions, we can define our own IO computations, as we see in prompt.hs.

main = do
  n <- prompt "What is your name? "
  putStrLn ("Hello " ++ n ++ "!")

prompt p = do
  putStr p
  getLine

This behaves just like the first input example, of course, but we have defined a new function prompt which has type prompt :: String -> IO String.

In this case, the value returned by getLine is simply propagated by prompt. Life gets more interesting when we construct new values, as we see in countlines0.hs.

main = do
  l <- countLines "/etc/passwd"
  putStrLn (show l)

countLines f = do
  x <- readFile f
  return (length (lines x))

This example introduces the function readFile :: FilePath -> IO String, which returns the entire contents of the file as a single string. We have used readFile to define a new function countLines with type countLines :: FilePath -> IO Int (where FilePath is a type synonym for String).

The interesting part is the function return, which has the polymorphic type return :: a -> IO a [1]. In other words, return takes an argument of any type at all, and "IO encumbers" it. In this case, a is Int (the return type of length), so return produces a value of type IO Int, and this is also the type of the entire do expression and hence the return type of countLines itself.

[1] Actually, return is a class method for any instance of Monad, not just IO, and its real type is return :: (Monad a) => b -> a b. But we're trying to avoid Monad theory.
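
To see return at more than one type, here is a small sketch. The names answer, name and describe are invented for this illustration.

```haskell
-- return wraps a plain value of any type; <- unwraps it again.
-- answer, name and describe are hypothetical names for illustration.

answer :: IO Int
answer = return 42

name :: IO String
name = return "Fred"

-- A pure function working on the unwrapped values.
describe :: String -> Int -> String
describe s n = s ++ " says " ++ show n

main :: IO ()
main = do
  n <- answer            -- n :: Int
  s <- name              -- s :: String
  putStrLn (describe s n)
```

Neither answer nor name performs any IO; each simply produces an IO encumbered value that can be bound with <- like any other.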

So far we've seen IO computations that return values of types IO String and IO Int, but there are (literally) infinitely many other possibilities. Here is echo0.hs, a Haskell implementation of the Unix echo command, which simply writes its command-line arguments to standard output, separated by spaces.

import Data.List (intersperse)
import System.Environment (getArgs)
    
main = do
  args <- getArgs
  putStrLn (concat (intersperse " " args))

This example introduces the function getArgs (from the System.Environment module), which has type getArgs :: IO [String]. It returns a list of the command-line arguments (not including the command name).

Here's echo1.hs, a slightly different, but exactly equivalent, way of writing the previous example.

import Data.List (intersperse)
import System.Environment (getArgs)
    
main = do
  args <- getArgs
  let r = concat (intersperse " " args)
  putStrLn r

Finally we meet the third sort of thing that we can put in a do expression: a let expression. Note that there is no in keyword; the names bound by let are in scope for the remainder of the do expression.

Apart from that, let inside do behaves exactly like a normal let. In particular, you can use layout to bind several names at once. The definitions in such a "multi-let" can even be mutually recursive, as in multilet.hs, an admittedly contrived example.

showGt a b = do
  let
      g1 x y = if x >= y then show x else g2 x y
      g2 x y = g1 y x
  putStrLn (g1 a b)

return is not control-flow syntax

If you have a background in procedural languages, you will probably find the behaviour of this example, return0.hs, surprising.

main = stuff

stuff = do
  putStrLn "hello"
  return 5
  putStrLn "goodbye"

Both strings are output! Doesn't the return statement cause an early return from stuff?

No, because there is no return statement in Haskell. It's a return expression. In this case, the return expression produces a value of type IO Int, which is immediately discarded. It has no effect on how (or whether) the remainder of the do expression is evaluated.

So if return is not control-flow syntax, what is it for? As we have already seen, return usually appears at the end of a do expression, because the value of a do expression is the value of the last expression it contains.

There is another use of return: when the language requires an IO action, but you don't actually want to perform any IO. Indeed, perhaps the best way to see return () is as an IO action that does no IO! Our next example, return1.hs demonstrates this.

main = do
  putStr "How old are you? "
  x <- getLine
  let age = read x
  if age > 99
  then putStrLn "My, that's old!"
  else return ()
  if age < 10
  then putStrLn "Never too young to learn Haskell!"
  else return ()
  putStrLn ("Nice age to be, " ++ show age)

In Haskell, if always comes with an else branch, which must have the same type as the then branch. In this case, we don't want the else branch to do anything, but it needs to have type IO () to match putStrLn, so return () is ideal. (We'll see a neater way to handle this situation later.) As you should expect by now, the Nice age... message is always output.

Incidentally, you might be tempted to try to eliminate the name x in this example by writing age <- read (getLine). That won't work, because read needs a String argument, while getLine returns a value of type IO String. The binding with <- extracts that string from its IO encumbrance.
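
One way to keep the intermediate name out of main is to move the binding into a small helper action. The following is only a sketch: readAge is a name invented here, and read will still raise a run-time error if the input is not a number.

```haskell
-- readAge is a hypothetical helper that hides the intermediate String.
readAge :: IO Int
readAge = do
  x <- getLine          -- x :: String, extracted from IO String
  return (read x)       -- read x :: Int, wrapped back up as IO Int

main :: IO ()
main = do
  putStr "How old are you? "
  age <- readAge        -- age :: Int
  putStrLn ("Nice age to be, " ++ show age)
```

The binding with <- has not gone away; it has just moved inside readAge, whose type IO Int tells callers that they must bind its result in turn.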

Actions versus computations

By now, you might be starting to smell a rat: is there really much difference between "IO actions" and "IO computations"? Well, no.

IO actions do return a value, just not a very interesting one, as you can see in putstr.hs.

main = do
  x <- putStr "putStr returns the value "
  print x

The output of this program is always putStr returns the value ().

Conversely, you don't have to bind the value returned by an IO computation. Suppose we define printn that writes some output and also returns a count of how many characters it wrote (a bit like printf() and friends in C), in printn.hs.

printn x =
    let out = show x
    in do
      putStr out
      return (length out)

main = do
  c <- printn [0..9]
  printn (" has " ++ show c ++ " characters")

In the first call, we bind the value returned by printn to the name c; in the second call we simply ignore it.

So forget IO computations; we will use the term IO action henceforth, whether the action returns a useful value or not. Of course, you need to take account of whether an IO action returns a value that you want to capture by binding it with <-.

In practice, the functions defined by the Haskell Report all seem to fall into one category or the other: either they do something (write some output, change current directory, etc.) and return IO (), or they retrieve a value and return it (read some input, report current directory, etc.). It is probably wise to follow this style and avoid creating functions like printn.

Summary of the do expression

We have now met three kinds of statements we can put in a do expression, and there is a fourth: an empty statement. (This will no doubt be a boon to any Haskell user whose keyboard has a stuttering ; key!)

The do expression

When a do expression is evaluated, the statements it contains are evaluated in order. Four types of statements can occur in a do expression:

  1. A bare IO action, like putStrLn "hello". The IO will be performed. If this is the last statement in the do expression, its result becomes the value of the entire expression. Otherwise, any result is discarded.
  2. A name binding, like args <- getArgs. To the right hand side of <- is an IO action to be performed; its result is bound to the name on the left hand side. The name bound is in scope till the end of the do expression.
  3. A let binding, like let acts = map output args. There are no IO actions here: this is exactly like a normal let binding (except there is no in keyword). The names bound are in scope till the end of the do expression. Names bound earlier in the same do expression, whether by let or <-, are of course in scope and may appear on the right hand side of the =.
  4. An empty statement.

The final statement in a do expression must be of type 1, which includes return expressions. This final statement gives its type and value to the entire do expression.
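
The four kinds of statement can be seen together in one short sketch. The helper countWords is a hypothetical name used only for this illustration.

```haskell
import System.Environment (getArgs)

-- countWords is a hypothetical pure helper used only for illustration.
countWords :: [String] -> Int
countWords args = length (concatMap words args)

main :: IO ()
main = do
  putStr "Counting words";; putStrLn "..."  -- bare IO actions; ";;" encloses an empty statement
  args <- getArgs                           -- name binding
  let n = countWords args                   -- let binding (no "in")
  print n                                   -- final statement: a bare IO action
```

The type of the whole do expression is IO (), taken from the final print n.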

The Handle type

An important type we haven't yet met is the Handle. Here's a slightly different version of the line-counting example, the motivation for which will become clear soon, countlines1.hs.

import System.IO
    
main = do
  l <- countLinesFile "/etc/passwd"
  putStrLn (show l)
    
countLinesFile f = do
  h <- openFile f ReadMode
  hCountLines h
    
hCountLines h = do
  x <- hGetContents h
  return (length (lines x))

The function hCountLines has type hCountLines :: Handle -> IO Int; like the functions defined by the Haskell report, we prefix an h to the name to indicate that a Handle argument is expected. The handle comes, of course, from the call to openFile, which has type openFile :: FilePath -> IOMode -> IO Handle.

The function hGetContents :: Handle -> IO String returns as a String the entire (remaining) contents of the file referenced by the handle argument. It also implicitly closes the handle [2]. So hGetContents does for a file Handle what readFile does for a file name.

[2] Strictly, hGetContents and readFile put the handle into a semi-closed state, but the upshot is the same: we don't need to close the handle explicitly.

As you might expect, three values of type Handle are already in existence when the program starts: stdin, stdout, and stderr. There is a related function getContents which is equivalent to hGetContents stdin; in other words, it reads the remainder of the program's standard input.
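
A short sketch of the three standard handles in use follows. It assumes hPutStrLn from System.IO, which writes a line to a given handle; summary is a hypothetical helper introduced for the example.

```haskell
import System.IO (hPutStrLn, stderr, stdout)

-- summary is a hypothetical pure helper.
summary :: String -> String
summary input = show (length (lines input)) ++ " lines read"

main :: IO ()
main = do
  hPutStrLn stderr "reading standard input..."  -- diagnostics go to stderr
  input <- getContents                          -- same as hGetContents stdin
  hPutStrLn stdout (summary input)              -- same effect as putStrLn
```

Running it with input redirected from a file should show the diagnostic on standard error and the count on standard output.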

Now, we can combine countLinesFile and getArgs to produce a program that counts the lines in the file(s) given as command line arguments. We'll build up to it in stages, starting with a program that works for just one file, countlines2.hs.

import System.IO
import System.Environment (getArgs)
    
main = do
  args <- getArgs
  let f = head args
  output f
    
output f = do
  l <- countLinesFile f
  putStrLn (f ++ ": " ++ show l)
    
{- definitions of countLinesFile and hCountLines as before -}

This program, of course, ignores all but its first command line argument. It also fails if there are no command line arguments. So let's move on to countlines3.hs.

{- imports as before -}
    
main = do
  args <- getArgs
  let acts = map output args
  sequence_ acts

{- definitions of output, countLinesFile, and hCountLines as before -}

This probably appears a bit mysterious. Let's look closely at what's going on. First, we know that args :: [String], and output :: String -> IO (). So the type of acts is acts :: [IO ()]; in other words, it is a list of IO actions! Can we create such a thing? Of course: this is Haskell after all. (Hey, we can even create an infinite list of IO actions if we want.)

Merely defining acts like this doesn't perform any IO, though. To do that, we need to sequence_ the list. As you can probably guess by now, sequence_ has the type sequence_ :: [IO ()] -> IO (): it takes a list of IO actions and performs them in order, which is itself an IO action [3].

[3] OK, so this is another simplification. Like return, sequence_ applies to all sorts of Monads, and its type really is sequence_ :: Monad m => [m a] -> m ().
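
To make the point that a list of IO actions is an ordinary value, here is a small sketch. The names greetings and endless are invented for the illustration.

```haskell
-- A list of IO actions is an ordinary value; nothing runs until it is sequenced.
-- greetings and endless are hypothetical names for illustration.

greetings :: [IO ()]
greetings = map putStrLn ["one", "two", "three"]

-- An infinite list of actions; laziness means it is never built in full.
endless :: [IO ()]
endless = repeat (putStrLn "again")

main :: IO ()
main = do
  let n = length greetings          -- inspecting the list performs no IO
  putStrLn ("about to run " ++ show n ++ " actions")
  sequence_ (reverse greetings)     -- prints three, two, one in that order
  sequence_ (take 2 endless)        -- performs only the first two actions
```

Because the actions are just list elements, the usual list functions (length, reverse, take) can rearrange or select them before anything is performed.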

Running sequence_ over the result of a map is a common pattern; so much so that the Prelude includes the following definition:

mapM_ f as = sequence_ (map f as)

You might guess that the M here stands for Monad. The trailing underscore we'll talk more about later. Using mapM_ we can express our line counter even more succinctly, as in countlines4.hs.

{- imports as before -}
    
main = do
  args <- getArgs
  mapM_ output args

{- definitions of output, countLinesFile, and hCountLines as before -}

There are still a couple of remaining flaws in this program. One is that if you hand this program no command line arguments at all, it produces no output at all: reasonable, but perhaps not very useful. A Unix program would instead count the lines in its standard input in this case. We can fix that quite easily, with countlines5.hs.

import System.IO
import System.Environment (getArgs)
    
main = do
  args <- getArgs
  case args of
    [] -> do
         l <- hCountLines stdin
         putStrLn (show l)
    xs -> mapM_ output xs
    
output f = do
  l <- countLinesFile f
  putStrLn (f ++ ": " ++ show l)
    
countLinesFile f = do
  h <- openFile f ReadMode
  hCountLines h
    
hCountLines h = do
  x <- hGetContents h
  return (length (lines x))

Finally, the motivation for hCountLines becomes clear! If we can count the lines for any Handle, then we can use the same function for either named files or standard input.

The other flaw with this program is that it gives up if it cannot read a file. Here's an interactive session that demonstrates this problem:

$ runghc countlines5.hs /etc/passwd /etc/printcap
/etc/passwd: 35
/etc/printcap: 4

$ runghc countlines5.hs /etc/passwd /etc/nonesuch /etc/printcap
/etc/passwd: 35
*** Exception: /etc/nonesuch: openFile: does not exist (No such file or directory)

In the second case, a Unix program would display an error message, and then proceed to count the lines in /etc/printcap. We can do this in Haskell, but not till we've looked at error handling.

Error Handling

In many programming languages, IO gets smothered under the extra code required to check for and handle possible errors. (Even the classic Hello, world! program in C is arguably incomplete, as there is no error checking.) Another minor miracle of the IO monad in Haskell is that it provides basic error handling totally for free.

So far, we have seen that IO actions return an "IO encumbered" value (which may be (), or something more exciting). But any IO action can also fail. An IO failure is "out of band": you do not have to examine the return value to see if the operation failed. Indeed, there won't be a return value!

Suppose we have a do expression containing a number of IO actions, and one of them fails. The rest of the do expression will not be evaluated, instead the entire do expression (which is itself an IO action) immediately fails. The failure propagates upwards to main, and beyond: when main fails, the run-time system prints an error message, and the program terminates.

We'll use a new example to explore error handling. Here's a program, interest0.hs, which - for each file given on the command line - uses a simple test to identify whether or not it is interesting.

import Data.List (isPrefixOf)
import System.Environment (getArgs)

main = do
  args <- getArgs
  mapM_ identify args

identify f = do
  putStr (f ++ ": ")
  g <- isInteresting f
  putStrLn (if g then "interesting" else "boring")

isInteresting f = do
  x <- readFile f
  return (x `contains` "Haskell")

[] `contains` _ = False
(x:xs) `contains` y = y `isPrefixOf` (x:xs) || xs `contains` y

We can provoke an IO failure from this program simply by asking it about a file which does not exist:

$ runhugs interest0.hs index.txt helloworld.hs NONESUCH age.hs
index.txt: interesting
helloworld.hs: boring
NONESUCH:
Program error: NONESUCH: IO.openFile: does not exist (file does not exist)

As expected, the IO failure causes the program to terminate with an error message immediately. Note that the failure is generated in the function isInteresting, but it causes an early exit from the do expression in identify (the putStrLn statement is not executed), and also from the mapM_ in main (subsequent files are not examined).

For many programs, this default handling of IO failures is ideal. At the very least, it's a good default. But sometimes we need to take control and recover gracefully from failure. In the case of this example, we would like to report non-existent (or otherwise unreadable) files, and then proceed to test the remaining files.

The function catch takes two arguments, so let's say we've invoked it as catch action handler. The first argument, action, is an IO action to be performed. If the IO action succeeds, catch simply returns its value. But if the IO action fails, then instead of the usual failure propagation, the function handler, another IO action, is invoked. The handler function receives a single argument: we'll see what this is in a moment. It returns a value of the same type as action. Armed with this knowledge, we can start improving the example. Here's interest1.hs.

import Data.List (isPrefixOf)
import System.Environment (getArgs)

main = do
  args <- getArgs
  mapM_ identify args

identify f = do
  putStr (f ++ ": ")
  g <- catch (isInteresting f) handler
  putStrLn (if g then "interesting" else "boring")
    where
      handler e = do
               putStr "(unreadable) "
               return False

isInteresting f = do
  x <- readFile f
  return (x `contains` "Haskell")

[] `contains` _ = False
(x:xs) `contains` y = y `isPrefixOf` (x:xs) || xs `contains` y

And this is what it looks like in action:

$ runhugs interest1.hs index.txt helloworld.hs NONESUCH age.hs
index.txt: interesting
helloworld.hs: boring
NONESUCH: (unreadable) boring
age.hs: interesting

Not perfection, but a step in the right direction! At least the program now continues and examines all the files. There are a couple of obvious improvements to be made, though. First, it would be nice to get more information about why the file is unreadable. Secondly, it's a bit presumptuous to say that every unreadable file is boring, but we've painted ourselves into a corner by using Bool types: there simply is no room (in the type!) for anything other than True or False. Remember that the handler must have the same return type as the action.

For more information about the failure, we need to examine the argument to our handler function. This is of type IOError. There are various things we can do with a value of type IOError, which we'll come to soon. For now, we will simply show (or print) it, which should produce a reasonable error message for human consumption.

To avoid the conclusion that all unreadable files are boring, we need to use a type with more than two values. We could define a new type, but there is an obvious candidate that already exists in Haskell: the type Maybe Bool, which of course has three possible values (Just True, Just False, and Nothing). This brings us to interest2.hs.

import Data.List (isPrefixOf)
import System.Environment (getArgs)

main = do
  args <- getArgs
  mapM_ identify args

identify f = do
  g <- catch (isInteresting f) handler
  case g of
    Just h -> putStrLn (f ++ ": " ++ if h then "interesting" else "boring")
    Nothing -> return ()
  where
    handler e = do
               print e
               return Nothing

isInteresting f = do
  x <- readFile f
  return (Just (x `contains` "Haskell"))

[] `contains` _ = False
(x:xs) `contains` y = y `isPrefixOf` (x:xs) || xs `contains` y

And in action:

$ runhugs interest2.hs index.txt helloworld.hs NONESUCH age.hs
index.txt: interesting
helloworld.hs: boring
NONESUCH: IO.openFile: does not exist (file does not exist)
age.hs: interesting

This is the effect we were after, but the code seems a little hard to follow. An alternative implementation uses the Either type constructor, which is defined in the standard prelude. The Either type constructor is very similar to the Maybe type constructor, but it also has room for a reason why there is a missing value: ideal for error handling. Conventionally, an operation which can either return a result or fail returns the result as a Right value of the Either type (like Just), or the reason for failure as a Left value (like Nothing, but with extra information). Yes, this is a rather weak pun on "right" as opposed to both "left" and "wrong".

The Left and Right sides of an Either type are independent: they can, and usually will, be of different types. So here's interest3.hs, using the type Either IOError Bool and - in my opinion - looking a little cleaner than our previous version.

import Data.List (isPrefixOf)
import System.Environment (getArgs)
import System.IO (hPrint, stderr)

main = do
  args <- getArgs
  mapM_ identify args

identify f = do
  g <- isInteresting f
  case g of
    Right h -> putStrLn (f ++ ": " ++ if h then "interesting" else "boring")
    Left e  -> hPrint stderr e

isInteresting f = catch action handler
    where
      action = do
        x <- readFile f
        return (Right (x `contains` "Haskell"))
      handler e = return (Left e)

[] `contains` _ = False
(x:xs) `contains` y = y `isPrefixOf` (x:xs) || xs `contains` y

As a bonus, in this version the errors are now written to standard error (with hPrint stderr), as is conventional. Otherwise, this version behaves identically to the previous one.

We are getting cannier in our handling of errors, but it is possible to be more subtle yet. As an example problem, consider the rc shell, which reads an initialization file $HOME/.rcrc on startup. We do not consider it a problem if this file doesn't exist, but we should warn the user if the file exists, but cannot be read (most likely because it has faulty permissions). How can we do this in Haskell? (I should point out that rc is written in C, not Haskell!)

We need a way to distinguish different possible errors; in this case, we need to handle a "file does not exist" error differently from any other error. There is a whole slew of functions in System.IO.Error that examine error values: the one we want is isDoesNotExistError :: IOError -> Bool. This function takes a value of type IOError, in other words the argument to our catch handler function, and returns True if the error represents "file does not exist", and False otherwise.

So here's rcrc.hs, which looks for the user's .rcrc file (using System.Environment.getEnv to discover their home directory). If the file can be read, it is copied to standard output. If an error occurs, the error message is written to standard error in the usual way, unless the error was "file does not exist", in which case it is silently ignored.

import Control.Monad (unless)
import System.Environment (getEnv)
import System.IO (hPrint, stderr)
import System.IO.Error (isDoesNotExistError)

main = do
  home <- getEnv "HOME"
  let f = home ++ "/.rcrc"
  catch (runrcrc f) norcrc

runrcrc f = do
  x <- readFile f
  putStr x

norcrc e = unless (isDoesNotExistError e) (hPrint stderr e)

Here it is in action:

$ echo Hello, world! > $home/.rcrc # readable file...
$ runghc rcrc.hs                   # ...is copied to stdout
Hello, world!

$ chmod 0 $home/.rcrc              # unreadable file...
$ runghc rcrc.hs                   # ...provokes an error
/home/libra/.rcrc: openFile: permission denied (Permission denied)

$ rm $home/.rcrc                   # non-existent file...
$ runghc rcrc.hs                   # ...is silently ignored
$

Here is the complete list of functions for testing error values.

These ungainly-named functions are defined in System.IO.Error for interrogating error values in a catch handler function. They all have type IOError -> Bool, and return True iff the error value represents an error of the appropriate type.

isAlreadyExistsError :: IOError -> Bool
The operation failed because one of its arguments already exists. For example, if you createDirectory "/tmp/", you will get an "already exists" error (at least on any sane Unix box!).
isDoesNotExistError :: IOError -> Bool
The operation failed because one of its arguments does not exist. For example, if you createDirectory "/tmp/foo/bar", you will get a "does not exist" error (unless you happen to have a directory called /tmp/foo!).
isAlreadyInUseError :: IOError -> Bool
The operation failed because one of its arguments is a single-use resource, which is already being used. This is a slightly hard one to provoke, but do { writeFile "/tmp/foo" "foo"; x <- readFile "/tmp/foo"; writeFile "/tmp/foo" "qux" } does the job. The reason for this is that readFile reads the file lazily, so since we haven't used its result (the value of x), the file is still in a "semi-closed" state. Thus the second writeFile fails with this error.
isFullError :: IOError -> Bool
The operation failed because the device is full.
isEOFError :: IOError -> Bool
The operation failed because the end of file has been reached.
isIllegalOperation :: IOError -> Bool
A catch-all error: the operation was not possible.
isPermissionError :: IOError -> Bool
The operation failed because the user does not have sufficient operating system privilege to perform that operation.
isUserError :: IOError -> Bool
A programmer-defined error value has been raised using fail.
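To see how several of these predicates might be combined in one handler, here is a minimal sketch. The names describe, report, and runFile, and the path /no/such/file, are my own inventions for illustration. I have also assumed catchIOError from System.IO.Error, which has the same type as catch, because recent versions of GHC no longer export catch from the Prelude.

```haskell
import System.IO.Error
  ( catchIOError
  , isAlreadyExistsError
  , isDoesNotExistError
  , isPermissionError
  , isUserError
  )

-- Hypothetical helper: turn an error value into a short label.
-- The guards are tried in order, so the first matching test wins.
describe :: IOError -> String
describe e
  | isDoesNotExistError e  = "does not exist"
  | isAlreadyExistsError e = "already exists"
  | isPermissionError e    = "permission denied"
  | isUserError e          = "user error"
  | otherwise              = "something else"

-- Hypothetical handler: report the label on standard output.
report :: IOError -> IO ()
report e = putStrLn ("caught: " ++ describe e)

-- Copy a file to standard output, as runrcrc does.
runFile :: FilePath -> IO ()
runFile f = do
  x <- readFile f
  putStr x

main :: IO ()
main = do
  -- a programmer-defined error, raised deliberately
  catchIOError (ioError (userError "oops")) report
  -- a path that is assumed not to exist on your system
  catchIOError (runFile "/no/such/file") report
```

Keeping describe as a plain function from IOError to String means that only report and main need IO types.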

IO encumbrance can be cumbersome

We have said that IO actions return a type which is "IO encumbered". For example, readFile :: FilePath -> IO String returns not a plain String, but an IO String. We can remove the IO encumbrance by binding a name inside a do expression, but then that name is only in scope within the do expression.

Consider a program where several different functions need the contents of a file. It would be nice to write a top-level function that returns the contents as a plain String. Then we could call that function whenever we needed it. Here's badlineschars.hs.

file = "/etc/passwd"

lineCount = length (lines contents)
charCount = length contents

{- This cannot be done in Haskell! -}
contents :: String
contents = do
  s <- readFile file
  s

main = do
  putStrLn (show lineCount ++ " lines")
  putStrLn (show charCount ++ " characters")

It can't be done. The type system ensures that any function which calls readFile (or any other IO action) will itself have an IO type.

There are two options. The first option is to call readFile once at the top level, and pass the result down to each function that needs it, as in lineschars0.hs.

file = "/etc/passwd"

lineCount s = length (lines s)
charCount s = length s

main = do
  x <- readFile file
  putStrLn (show (lineCount x) ++ " lines")
  putStrLn (show (charCount x) ++ " characters")

Note that in this case, the functions lineCount and charCount avoid IO types. However, they have an extra argument.

The second option is to give the subsidiary functions IO types, as in lineschars1.hs.

file = "/etc/passwd"

lineCount = do
  s <- readFile file
  return (length (lines s))

charCount = do
  s <- readFile file
  return (length s)

main = do
  ls <- lineCount
  putStrLn (show ls ++ " lines")
  cs <- charCount
  putStrLn (show cs ++ " characters")

In this version, lineCount and charCount no longer take an argument (as in the original, broken attempt), but they now have IO types. Note also that the file is read twice.

Obviously, these are trivial examples, but you will encounter similar situations time & again. In general, solutions in the first style seem preferable (they confine "IO encumbrance" to the top level, and in any case do less work), although it can be a nuisance to pass all that state down to lower level functions.
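One way to take the first style a little further is to gather all the pure work into a single function, so that main does nothing but read and print. This is only a sketch: the function name report is my own, and it is not one of the downloadable examples.

```haskell
file :: FilePath
file = "/etc/passwd"

-- Hypothetical pure function: builds every output line from the
-- file contents. No IO type appears here.
report :: String -> [String]
report s =
  [ show (length (lines s)) ++ " lines"
  , show (length s) ++ " characters"
  ]

-- All the IO lives at the top level: read once, then print each line.
main :: IO ()
main = do
  x <- readFile file
  mapM_ putStrLn (report x)
```

Because report has no IO type, it can be tried out directly on a literal string, for example report "ab\ncd\n", without touching the file system.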

Similar problems can occur with output. Consider sudan.hs.

sudan n x y | n == 0    = x + y
            | y == 0    = x
            | otherwise = sudan (n - 1) sudan' (sudan' + y)
    where sudan' = sudan n x (y - 1)

sudTup (n, x, y) = sudan n x y
sudList [n, x, y] = sudTup (n, x, y)

main = print (sudList [1, 5, 4])

The sudTup and sudList functions are there just to make sudan a "low-level" function, some distance from main. Suppose we want to create a variant of sudan that prints a trace of how it is called. Easily done, but the new sudan will have an IO type, and this will propagate through all the intermediate functions up to main. Here's sudantrace0.hs.

sudan n x y | n == 0    = trace (x + y)
            | y == 0    = trace x
            | otherwise = do
  trace 0
  sudan' <- sudan n x (y - 1)
  sudan (n - 1) sudan' (sudan' + y)
    where
      trace r = do
        putStrLn ("sudan " ++ show n ++ " " ++ show x ++ " " ++ show y)
        return r

sudTup (n, x, y) = do { p <- sudan n x y; return p }
sudList [n, x, y] = do { q <- sudTup (n, x, y); return q }

main = do
  r <- sudList [1, 5, 4]
  print r

Not only are the changes to the function sudan invasive, but also sudList, sudTup, and even main itself all have to change. This is, to put it mildly, a nuisance. Worse still, the output is not a trace of the original sudan function, since by rewriting it with an IO type we are explicitly specifying (some of) the evaluation order. As with the input example, though, there is no way around this.

At this point, you may well be tempted to go back to programming in <insert name of your favourite programming language before you discovered Haskell>. I will endeavour to offer some crumbs of comfort.

First, the restrictions on IO were not capriciously foisted upon us by ivory tower academics in order to keep Haskell pure. The restrictions are the only way that it is possible to do IO safely in a lazy language.

Secondly, if you find yourself chasing up & down a program adding "IOness" to lots of functions (as a permanent feature), you probably didn't design the program right in the first place. (You have my sympathy: lots of my programs are scarcely designed at all, they started as quick hacks and "just growed". Haskell offers its sympathy by making it much easier than most languages to implement redesigns.) Of course, if you are writing anything larger than a very tiny program, it is well worth pausing before you start to decide which parts of the program need to perform IO.

Thirdly, if you are debugging and really just need to see what some data structure looks like ("bung in a printf"), there are a couple of handy kludges you can use. In the module System.IO.Unsafe there is a function unsafePerformIO :: IO a -> a. As the extraordinary type indicates, unsafePerformIO strips "IOness" from a value. The downside is, as the name implies, this operation is unsafe: there are no guarantees that it will do what you expect when you expect it to. And it breaks the Haskell type system. In theory unsafePerformIO should not exist, but in practice it's sometimes so useful that it is allowed to persist.

To put it more vividly:

Fortunately (at least for supervisors and code reviewers) you will have to import System.IO.Unsafe at the top of any module that uses unsafePerformIO, so a quick glance will reveal this transgression of good Haskell [4].

[4] We used to joke that the requirement on predeclaring labels (line numbers) in Pascal was so that supervisors could quickly reject any program that used goto, without having to read the entire code.

With all the caveats out of the way, how do we use unsafePerformIO? Here's sudanunsafe.hs. (If you don't understand the use of the $ operator here, please see my note about it.)

import System.IO.Unsafe (unsafePerformIO)

sudan n x y = unsafePerformIO $ do
                putStrLn ("sudan " ++ show n ++ " " ++ show x ++ " " ++ show y)
                return (realSudan n x y)

realSudan n x y | n == 0    = x + y
                | y == 0    = x
                | otherwise = sudan (n - 1) sudan' (sudan' + y)
    where sudan' = sudan n x (y - 1)

sudTup (n, x, y) = sudan n x y
sudList [n, x, y] = sudTup (n, x, y)

main = print (sudList [1, 5, 4])

The output of this example may not be pretty: it actually varies from one Haskell environment to another, which emphasizes my point that when unsafePerformIO is evaluated is unpredictable. However, the example demonstrates that it is possible to get some handle on what sudan is doing without percolating "IOness" up and down the entire program: note that sudTup, sudList, and main have not changed at all. In a tight debugging spot, this is just the ticket; but please do tidy up the code and remove unsafePerformIO once the bugs have been squashed!

As a convenience, the module Debug.Trace provides the function trace :: String -> a -> a. This function writes its first argument to standard error, then returns its second argument. You will not be surprised to learn that trace utilizes unsafePerformIO, and therefore all the same caveats apply. Calls to trace must be excised from your program before you can consider it finished. Here's sudantrace1.hs, which traces the sudan function using trace.

import Debug.Trace (trace)

sudan n x y = trace msg realSudan n x y
    where msg = "sudan " ++ show n ++ " " ++ show x ++ " " ++ show y

realSudan n x y | n == 0    = x + y
                | y == 0    = x
                | otherwise = sudan (n - 1) sudan' (sudan' + y)
    where sudan' = sudan n x (y - 1)

sudTup (n, x, y) = sudan n x y
sudList [n, x, y] = sudTup (n, x, y)

main = print (sudList [1, 5, 4])
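Since calls to trace have to come out again, it can help to route them all through one wrapper, so that there is a single place to switch them off and a single name to search for when tidying up. The following is a sketch of that idea rather than anything Debug.Trace provides: the names debugging and debug are hypothetical.

```haskell
import Debug.Trace (trace)

-- Hypothetical switch: set to False to silence all tracing.
debugging :: Bool
debugging = True

-- Hypothetical wrapper with the same type as trace.
debug :: String -> a -> a
debug msg x
  | debugging = trace msg x
  | otherwise = x

sudan :: Integer -> Integer -> Integer -> Integer
sudan n x y = debug msg (realSudan n x y)
  where msg = "sudan " ++ show n ++ " " ++ show x ++ " " ++ show y

realSudan :: Integer -> Integer -> Integer -> Integer
realSudan n x y
  | n == 0    = x + y
  | y == 0    = x
  | otherwise = sudan (n - 1) s (s + y)
  where s = sudan n x (y - 1)

main :: IO ()
main = print (sudan 1 5 4)
```

The same caveats apply as for trace itself: when the messages appear depends on when the values are evaluated.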

Exercises

17. Try sudantrace1.hs and sudanunsafe.hs in all the Haskell environments you have available. Is the output the same?

18. Write a program that uses unsafePerformIO and dumps core, or otherwise crashes.

Reflections

I wrote this tutorial for people like the person I was a year ago: a Haskell programmer whose grasp of IO was still a little shaky. I'd been writing tiny and small programs in Haskell for some years, and had read a number of articles and tutorials on Monadic IO, plus the Haskell Report itself, but was still unconfident about the whole business of IO.

About a year ago, it became clear that a large shell script I had been hacking on for some time was in need of a serious rewrite. Despite my best efforts, it had become too large, too messy, too inflexible, and too slow for the job. Had I been under pressure from an employer, I would probably have chosen Perl for the rewrite. One of the best things about working for yourself is that sometimes you can choose what's right over what's expedient, and I decided instead at least to try the rewrite in Haskell.

Well, I learned a lot along the way, and eventually the Haskell version worked well enough to replace the original script. Since then, I have learned even more while cleaning up my original Haskell code. Realising that I do now have a pretty confident grasp of how to do IO in Haskell, I felt I should try to lay out my particular learning curve, in the hope that it will help others.

I think the major drawback, for me, of the other articles and tutorials I've read concerning IO in Haskell is that they talk too much about monad theory. Monads are indeed a minor miracle. (I'm old enough to have programmed in a pure functional language, called "glide" if I remember correctly, which lacked monadic IO. It severely limited what the language could do.) But I've come to realise that it isn't necessary to understand monads to exploit the power of the IO monad! So in this tutorial there will be almost no talk of monads (these few paragraphs excepted).

Perhaps I should give a concrete example of the kind of bewilderment I felt. At least a couple of different articles I had read on the subject started with the monadic binding functions >> and >>=, then introduced do notation as "syntactic sugar". As a result, I would sometimes take a broken do expression and rewrite it with the monadic operators, having got the idea that I somehow needed to scrape under do's sugar coating. Unsurprisingly, this rarely brought enlightenment.

Here's a rough analogy. Pattern matching in function definitions is, it turns out, just syntactic sugar for case expressions. But I've yet to see a Haskell tutorial that starts by teaching function definition with case, before moving on to the normal, sweetened, syntax. Indeed, when I want to write a case expression, I think "backwards" from function definition, not the other way round! Similarly, now that I feel I have grasped do, I'm reasonably comfortable with the monadic binding functions.
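For the curious, the analogy can be made concrete with a small sketch. The function names len and len' are my own, chosen only for illustration: both compute the length of a list, the first in the usual sweetened syntax, the second with an explicit case expression.

```haskell
-- The normal, sweetened syntax: one equation per pattern.
len :: [a] -> Int
len []       = 0
len (_ : xs) = 1 + len xs

-- The same function with the patterns moved into a case expression.
len' :: [a] -> Int
len' ys = case ys of
  []       -> 0
  (_ : xs) -> 1 + len' xs

main :: IO ()
main = do
  print (len "hello")
  print (len' "hello")
```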

This is not to disparage those other tutorials. No two people learn in the same way, and doubtless moving from monadic generalities to IO specifics is more efficient if it works for you. I am not claiming that this tutorial is better than any other, just different, and there's no harm in having more choice.

By the way, you might be wondering how my Haskell rewrite of that large, messy, inflexible, and slow shell script panned out. Well, the initial Haskell version was about the same length (in terms of lines of code) as the shell script. After some cleanup, it would be somewhat shorter, except it's acquired more features in the meantime. Less messy and more flexible, undoubtedly: the shell script had reached the stage where even small changes required a strong cup of coffee and several minutes poring over the code to understand how it worked before changing a line. By contrast, the Haskell version is reasonably clean, has proved easy to extend, and of course with real data structures I can do things of which I had never previously dreamed.

Speed was not the primary motivation for the rewrite, although it was extremely frustrating to work with a shell script that took nearly 2 minutes to run. (The spaghetti nature of the code also made it hard to extract small pieces to work on in isolation.) So I was surprised and delighted that the Haskell version runs in just 3 seconds. (That's with ghc -O2, but even with runghc it takes only about 10 seconds.) Given that the output is about 6M spread over 1500 files, that's shifting some. Mind, it does gobble up a lot of memory!

Tim Goodwin
November 2007
