Haskell Exchange 2014

I was surprised to find that, despite an all-star line-up of speakers, this year’s Haskell Exchange was less well attended and lower key than the 2013 edition. We were back at SkillsMatter headquarters, this time round with gym music blasting from the speakers during breaks and no aircon for added frowziness. The talks were good though, some of them very much so.

Safe, Zero-Cost Coercions in Haskell

Simon Peyton-Jones arrived bare-footed and dispelled an illusion I had been holding on to for quite a while: that the minimal core language of GHC, System FC, is sound beyond any reasonable doubt. After all, hasn’t this soundness been proven in peer-reviewed papers? It has, but even in the intellectually astute and pure realms of Haskell, adding a new, seemingly innocent feature such as newtype can lead to catastrophic consequences. In this case it was GHC bug #1496, which demonstrated how generalised newtype deriving coupled with type families can lead to GHC producing code that segfaults. This was a source of great embarrassment for Simon and took seven years to sort out. The solution relies on restricting the circumstances under which type coercions can be performed. First, there is a new built-in type class introduced in GHC, along with a function:

class Coercible a b

coerce :: Coercible a b => a -> b

A Coercible a b instance indicates that the runtime representation of values of type a is identical to that of values of type b, so no work needs to be done when converting from one to the other. Instances of this class are auto-generated by GHC for custom data types and newtypes; for example

newtype HTML = MkHTML String

will lead to generation of

instance Coercible HTML String
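
Here is a minimal sketch of my own showing the class in action (assuming GHC 7.8+, where coerce lives in Data.Coerce):

import Data.Coerce (coerce)

newtype HTML = MkHTML String

-- unwrap at zero runtime cost: HTML and String share a representation
unHTML :: HTML -> String
unHTML = coerce

-- the lifted instance for lists (described below) makes this free, too
unHTMLs :: [HTML] -> [String]
unHTMLs = coerce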

coerce does not introduce any runtime code, but can be invoked by the programmer to transform data between types. Lifting instances, such as

instance Coercible a b => Coercible [a] [b]

are also generated for parameterised types, including user-defined types. Therein lies a problem:

newtype HTML = Mk String

type family F a
type instance F String = Char
type instance F HTML   = Bool

data Tricky a = MkT a (F a)

veryNaughty :: Tricky String -> Tricky HTML
veryNaughty ts = coerce ts

What would coerce do to the second type argument, which in Tricky String is a Char but in Tricky HTML should be a Bool? This is where the unsoundness stems from. To address this, a new concept needs to be introduced: roles of type parameters. There are three possible roles:

  • representational – the parameter is not used in a type-function argument
  • nominal – the parameter is used in a type-function argument
  • phantom – the parameter does not appear in the representation at all

Lifting instances are not generated over nominal parameters. For example, in:

type family F a
-- instances ..

data T a b c = T1 b (F a)
             | T2 a b

T would have the lifting instance Coercible b b' => Coercible (T a b c) (T a b' c') generated for it, but not Coercible a a' => Coercible (T a b c) (T a' b c'). Roles can be inferred, but if a restriction is desirable they can be specified explicitly using the syntax

type role Map nominal _

where we say that the role of the first type argument to Map is nominal and the second should be inferred. This specific example is actually useful, because for Map the internal layout is determined by the ordering of keys (i.e. the Ord instance of the first type parameter), and that could change under coercion to a newtype! Lennart Augustsson pointed out that the necessity to write these annotations can lead to buggy code – an assertion difficult to debunk. Another drawback to which Lennart drew attention was that the current scheme does not work for higher-kinded types.
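
To see what a nominal key role buys us, here is a hedged sketch of my own (the Reversed wrapper is hypothetical): if the key parameter of Map were representational, GHC would accept a coercion that silently breaks the tree’s ordering invariant.

import Data.Coerce (coerce)
import qualified Data.Map as Map

-- representationally just an Int, but ordered backwards
newtype Reversed = Reversed Int deriving (Eq, Show)

instance Ord Reversed where
  compare (Reversed a) (Reversed b) = compare b a

-- Thanks to the nominal role of Map's key, GHC rejects this definition;
-- were it accepted, the coerced map would violate the ordering invariant
-- that lookups rely on.
-- broken :: Map.Map Int v -> Map.Map Reversed v
-- broken = coerce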

While Simon argued that nothing simpler than roles can give a sound account of type functions, the talk served as a useful reminder that even such rigorously developed and internally simple languages as Haskell suffer from the complexity that software developers require from their tools. After all, newtypes, type families and so on were introduced to address genuine engineering challenges.

Categories for the Working Haskeller

We all now know what a functor is. An algebra for a functor F is a pair (B, f) where f :: F B -> B. A homomorphism between algebras (B, f) and (C, g) is a function h :: B -> C such that the following diagram commutes:

         F h
  F B ---------> F C
   |              |
 f |              | g
   v              v
   B -----------> C
          h

An algebra (B, f) is initial if there is a unique homomorphism from it to every algebra (C, g).

How does this help? Jeremy Gibbons argued that it allows for the construction of elegant APIs – mostly as an inspiration, mind you, not the end product that users look at. Take folds over a list as an example. The function foldr :: (a -> b -> b) -> b -> [a] -> b encodes a certain traversal scheme for lists, but if we want a similar kind of collapsing performed on our custom data type, a binary tree for example, we are on our own. Or are we?

First, we can separate two concerns in the data type definition: one that specifies the shape, the other that it’s a recursive data type. For a list it can be done as follows:

data ListS a b = NilS | ConsS a b -- shape
data Fix s a = In (s a (Fix s a)) -- recursion
type List a = Fix ListS a         -- put together

Here, the list [1,2,3] would be represented by In (ConsS 1 (In (ConsS 2 (In (ConsS 3 (In NilS)))))). As ListS has two type parameters (the type of the elements and the type of the tail), we can construct a “double map” (or bimap) function that will map over both of them:

bimap :: (a -> a') -> (b -> b') -> ListS a b -> ListS a' b'
bimap f g NilS        = NilS
bimap f g (ConsS a b) = ConsS (f a) (g b)

and implement a fold using it:

foldList :: (ListS a b -> b) -> List a -> b
foldList f = f . bimap id (foldList f) . out  -- out unwraps In; defined below

This seems overly complicated, but lets us generalise beyond lists to other shapes:

class Bifunctor s where
  bimap :: (a -> a') -> (b -> b') -> s a b -> s a' b'

out :: Fix s a -> s a (Fix s a)
out (In x) = x

fold :: Bifunctor s => (s a b -> b) -> Fix s a -> b
fold f = f . bimap id (fold f) . out

Why Bifunctor? Because it’s a functor in both type arguments. That means that ListS Integer is a functor. Furthermore, (List Integer, In) and (Integer, add) are some of its algebras:

In  :: ListS Integer (List Integer) -> List Integer
add :: ListS Integer Integer -> Integer
add NilS        = 0
add (ConsS n m) = n + m

and sum :: List Integer -> Integer is a homomorphism between these two. Furthermore, (List Integer, In) is the initial algebra for ListS Integer, and the unique homomorphism from it to a given (C, g) is fold g. The commuting of the diagram translates to the condition h . f = g . bimap id h. Categorical duality gives us the morphisms in the opposite direction (unfolding) virtually for free, and in a form that makes the duality obvious:

unfold :: Bifunctor s => (b -> s a b) -> (b -> Fix s a)
unfold f = In . bimap id (unfold f) . f
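
As a quick usage sketch (downFrom is my own name, assuming a Bifunctor instance for ListS matching the bimap given earlier):

-- build the list [n, n-1 .. 1] from a seed
downFrom :: Integer -> List Integer
downFrom = unfold gen
  where
    gen 0 = NilS
    gen n = ConsS n (n - 1)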

Best of all, we can apply this to any type that can be represented using an unfixed shape functor. Yay. That’s what Tony Morris calls “taking the DRY principle seriously”.
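
For instance, delivering on the binary-tree promise from earlier – a sketch of my own (TreeS, Tree and sumTree are my names, not Jeremy’s):

data TreeS a b = LeafS | NodeS b a b  -- shape: a leaf, or an element with two subtrees

instance Bifunctor TreeS where
  bimap f g LeafS         = LeafS
  bimap f g (NodeS l x r) = NodeS (g l) (f x) (g r)

type Tree a = Fix TreeS a

-- collapse a tree with the generic fold
sumTree :: Tree Integer -> Integer
sumTree = fold alg
  where
    alg LeafS         = 0
    alg (NodeS l x r) = l + x + r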

Jeremy recommended we check out the MSc course in Software Engineering offered by Oxford University. I will definitely take a close look.

Performance Measurement and Optimisation in Haskell

There are more ways to screw up microbenchmarking than one can list in three quarters of an hour, and there is a solution to them that fits in two words: use criterion. Bryan O’Sullivan borrowed ideas from leading tools to put together something that is easy to use and accurate. Always wanted to benchmark that dodgy fib 1 = 1; fib 2 = 1; fib n = fib (n - 1) + fib (n - 2)? There you go:

import Criterion.Main

-- the dodgy fib from above
fib :: Int -> Int
fib 1 = 1
fib 2 = 1
fib n = fib (n - 1) + fib (n - 2)

main :: IO ()
main = defaultMain [
  bgroup "fib" [ bench "1" $ whnf fib 1
               , bench "5" $ whnf fib 5
               , bench "9" $ whnf fib 9
               ]]

Criterion will take care that the thunk gets evaluated in every benchmark iteration, will detect how noisy the system is, produce a kernel density estimate, use ordinary least-squares regression to factor out the cost of the time measurements themselves, and generate nice HTML reports with graphs and all – all this while making sure that you don’t forget that performance measurement is a statistical activity, and that you should get used to working with distributions rather than single statistics such as the mean or median.

Finally, Bryan’s pro tips for performance optimisation:

  • keep the overhead of data types in mind
  • keep the uses of your data in mind – e.g., can we stream, or do we have to consume it all?
  • keep strictness of code and data in mind – neither strictness nor laziness is always appropriate; see the sketch below.
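
A classic instance of that last point, as a hedged sketch: a lazy left fold builds a long chain of thunks, while its strict cousin runs in constant space.

import Data.List (foldl')

-- lazy accumulator: builds a chain of thunks, may exhaust memory on large inputs
lazySum :: [Int] -> Int
lazySum = foldl (+) 0

-- strict accumulator: forced at every step, runs in constant space
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0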

Rest - Building APIs Easily

Erik Hasselink kicked off the industrial afternoon with an overview of Rest, a library for (you guessed it) writing RESTful web services that has recently been open-sourced by his company, Silk. While the code snippets presented by Erik did not seem overly exciting, what was impressive was the list of features provided by the library, chief among them the generation of client libraries in Haskell and JavaScript, so that you can start using your service from a separate app without writing any additional code. The library is web-framework agnostic; Silk wrote bindings for Happstack, Snap and WAI.

Coping with Change: Data Schema Migration in Haskell

Chris Dornan and Adam Gundry took on the task of migrating a video sharing platform implemented in Ruby onto a new software stack. One of the main challenges was the requirement to carry on migrating data as the software undergoes active development. To that end a schema description DSL was devised:

ur :: User = record
           | name :: Name
           | admin :: AdminType

nm :: Name = basic String

This is used for describing both the data written to persistent stores and the communication protocol. To support migration of data this has been extended with human- and machine-readable change logs:

ur :: User
    = record
      | name :: Name
      | admin :: AdminType
      | password :: ? Password

changes

  version "0.2"
    changed record User
      field added password :: ? Password

  version "0.1"

The build process verifies that the change log is in sync with the entity definition. Rollbacks are not currently supported but are on Adam and Chris’s agenda.

Structural Typing for Structured Products

This talk made me sad. Peter Marks and Tim Williams described FPF Lucid, a DSL they are developing at Barclays for describing payoffs of exotic equity derivatives. A Lucid function looks like this:

function capFloor(perf, {cap, floor})
  return max(floor, min(cap, perf))
end

and its type inferred by the compiler is

capFloor: Num a => (a, {cap: a, floor: a|r}) -> a

where r is a row variable, indicating possible further fields of the record. Users of the language don’t have to specify any types and can use records of different shapes as inputs to the same function, as long as the records contain the required fields – i.e. the typing is structural, not nominal. This gives the language a “dynamic” feel, like coding in Python or Ruby, but with the benefits of comprehensive compile-time type checking.
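
For instance (my own Lucid-style pseudocode, extrapolated from the snippet above), both of these calls would type-check, because each record supplies the cap and floor fields the function requires:

capFloor(1.2, {cap: 1.5, floor: 0.8})
capFloor(1.2, {cap: 1.5, floor: 0.8, strike: 1.0})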

Designing and implementing a programming language, with full Hindley-Milner type inference involving structural types and row polymorphism, incorporating some cutting-edge programming-language research results, coding in Haskell, all that in a perhaps non-ideal but familiar banking environment, with direct influence on the company’s competitive advantage – it sounded close to a dream job. And to think that I spent over a year lurking in the FPF team’s chat channel and did not work up the nerve to speak to them and ask if they had any openings before deciding to leave Barclays for good…

Strongly Typed Publish/Subscribe over WebSockets via Singleton Types

Like Erik, Oliver Charles works on web APIs. In his case the focus was not on REST, though, but on subscription channels implemented over WebSockets. The challenge there was ensuring that the server-side implementation of the Socket.IO protocol is reliable. To that end, relying on strings as the event representation was not good enough, as it did not guarantee that the correct type of data was being sent for a given subscription. It was too late in the day for me to comprehend all the details of the solution, but in short, combining GADTs, TypeFamilies and DataKinds gives enough dependent typing to set up something called singleton types, which can in turn be used to tag the types of data sent over a channel and ensure that the type of each response matches that of the request.
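
For flavour, here is a minimal sketch of the singleton pattern with toy names of my own – not Oliver’s actual code:

{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

-- channel events, promoted to the type level by DataKinds
data EventType = NewComment | NewLike

-- the singleton: each constructor pins down its type-level event
data SEventType (e :: EventType) where
  SNewComment :: SEventType 'NewComment
  SNewLike    :: SEventType 'NewLike

-- the payload type each event carries
type family Payload (e :: EventType) where
  Payload 'NewComment = String
  Payload 'NewLike    = Int

-- sending only type-checks with a payload matching the event
send :: SEventType e -> Payload e -> IO ()
send SNewComment text   = putStrLn ("comment: " ++ text)
send SNewLike    userId = putStrLn ("like from user " ++ show userId)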

Park Bench

The perennial question of how to popularise Haskell in the industry sprang up again, with no new insights. After half an hour I finally registered that the first ten questions would be rewarded with a copy of Real World Haskell, and while I don’t have much use for it at home, it could make a decent conversion device to be placed somewhere conspicuous at work. My question was about the need for module-system improvements such as Backpack. It was met mostly with blank stares from the panel; clearly none of the esteemed park benchers found this area of Haskell enough of a problem to pay close attention to recent developments.

On a separate note, one interesting opinion from Duncan Coutts was that type classes are overused, and that often a more suitable solution, where an OO-like construct is called for, is just a record of functions. And that’s all that stuck in my head from this small but excellent Exchange.