8c6794b6.github.io

Benchmarking arrow syntax

Recently I read a paper about "Audio Processing and Sound Synthesis in Haskell" which uses Arrow and [Double] for its building block of signal data. While reading the document, it shows that using Arrow syntax will make the code slow, due to its wrapping and unwrapping in desugared tuples.

Does arrow syntax make the code slow?

> {-# LANGUAGE Arrows, BangPatterns, NoImplicitPrelude #-}
> module BenchArrowSyntax where
> import Control.Arrow
> import Control.Category
> import Prelude hiding ((.), id)
>
> import Control.DeepSeq
> import Criterion.Main

Comparing performance of code written in below styles:

  1. Pure function, no Arrows (plain).
  2. Arrows, without syntax sugar (bitter).
  3. Arrows, with syntax sugar (sweet).
  4. Arrows, with syntax sugar, inlined (sweet-inline).

The GHC version used to benchmark this code was 7.0.3. The main function looks like this:

> main :: IO ()
> main =
>  let !xs = let ys = [1..10000] in ys `deepseq` ys
>  in  defaultMain
>        [ bench "plain" (whnf (map plain) xs)
>        , bench "bitter" (whnf (unA bitter) xs)
>        , bench "sweet" (whnf (unA sweet) xs)
>        , bench "sweet-inline" (whnf (unB sweet_inline) xs)
>        ]

Showing the result of running the compiled code:

$ ghc --make -O2 -o a.out -main-is BenchArrowSyntax BenchArrowSyntax.lhs
[1 of 1] Compiling BenchArrowSyntax ( bench_arrow_syntax.lhs, bench_arrow_syntax.o )
Linking a.out ...
$ ./a.out
warming up
estimating clock resolution...
mean is 16.06060 us (40001 iterations)
found 2976 outliers among 39999 samples (7.4%)
  1930 (4.8%) high mild
  1041 (2.6%) high severe
estimating cost of a clock call...
mean is 103.2901 ns (61 iterations)

benchmarking plain
mean: 29.50057 ns, lb 29.43500 ns, ub 29.60982 ns, ci 0.950
std dev: 422.9860 ps, lb 275.5593 ps, ub 700.1025 ps, ci 0.950

benchmarking bitter
mean: 29.60260 ns, lb 29.57317 ns, ub 29.63793 ns, ci 0.950
std dev: 164.7924 ps, lb 139.0684 ps, ub 196.2178 ps, ci 0.950

benchmarking sweet
mean: 261.4728 ns, lb 261.1549 ns, ub 261.9058 ns, ci 0.950
std dev: 1.876539 ns, lb 1.454937 ns, ub 3.033409 ns, ci 0.950

benchmarking sweet-inline
mean: 135.2866 ns, lb 134.9794 ns, ub 135.5847 ns, ci 0.950
std dev: 1.552157 ns, lb 1.412139 ns, ub 1.726597 ns, ci 0.950

It shows that the code written with arrow syntax runs about 9 times slower than the others, and there's not so much difference between plain function from arrow codes written without syntax sugar. What's going on with arrow syntax?

Below 2 functions were used in this benchmark. Simple addition and multiplication for Int.

> func1 :: Int -> Int
> func1 = (+3)
>
> func2 :: Int -> Int
> func2 = (*7)

The type to implement Arrow instance for bitter and sweet. It is a simple newtype wrapper for mapping function over list.

> newtype A a b = A {unA :: [a] -> [b]}
>
> instance Category A where
>   id = A id
>   A f . A g = A (f . g)
>
> instance Arrow A where
>   arr f = A (map f)
>   first (A f) = A $ uncurry zip . first f . unzip

First, the plain function. In other word, a function with (->) type.

> plain :: (->) Int Int
> plain = func2 . func1

Dumped core of plain, removed couple type signatures and variable suffixes for clarity.

| plain :: Int -> Int
| plain = \(x :: Int) -> case x of _ { I# x1 -> I# (*# (+# x1 3) 7) }

Next, arrow without syntax. Manually wrapping functions with arr, and combining with (>>>) operator.

> bitter :: A Int Int
> bitter = arr func1 >>> arr func2

Excerpt of dumped core of bitter. Again, GHC type prefix and variable suffixes are removed.

| bitter :: A Int Int
| bitter =
|   bitter1
|   `cast` (sym (A Int Int)
|           :: ([Int] -> [Int]) ~ BenchArrowSyntax.A Int Int)
|
| bitter1 :: [Int] -> [Int]
| bitter1 = \(x :: [Int]) -> map bitter2 x
|
| bitter2 :: Int -> Int
| bitter2 = \(x :: Int) -> case x of _ { I# x1 -> I# (*# (+# x1 3) 7) }

Though delegation from bitter to bitter1 and bitter2 exists, from what we see in the function body of bitter2, the guts of its definition remains same as plain.

And then, with arrow syntax. In its body, result of each arrow are bounded to varialbe with explicit name. Requires Arrows LANGUAGE pragma.

> sweet :: A Int Int
> sweet = proc x -> do
>   y <- arr func1 -< x
>   z <- arr func2 -< y
>   returnA -< z

Excerpt dumped core of sweet. We see zipping and unzipping done during the desugaring of syntax, in main12 and $fArrowA3.

| sweet :: A Int Int
| sweet =
|   main12 `cast` (sym (NTCo:A Int Int) :: ([Int] -> [Int]) ~ A Int Int)
|
| main12 :: [Int] -> [Int]
| main12 =
|   \(x :: [Int]) ->
|    case $fArrowA3 (map main17 x) of _
|      { (# ww1, ww2 #) ->
|        case $fArrowA3 (map main16 (zip (map main15 ww1) ww2)) of _
|           { (# ww4, ww5 #) -> map main14 (zip (map main13 ww4) ww5)
|      }
|    }
|
| main17 :: Int -> (Int, ())
| main17 = \(x :: Int) -> (x, ())
|
| main16 :: (Int, ()) -> (Int, ())
| main16 =
|   \(x :: (Int, ())) ->
|    (case x of _ { (y, ds1) -> case ds1 of _ { () -> y } }, ())
|
| main15 :: Int -> Int
| main15 = \(x :: Int) -> case x of _ { I# x1 -> I# (+# x1 3) }
|
| main14 :: (Int, ()) -> Int
| main14 =
|   \(x :: (Int, ())) ->
|    case x of _ { (z2, ds1) ->
|      case ds1 of _ { () -> z2 }
|    }
|
| main13 :: Int -> Int
| main13 = \ (x :: Int) -> case x of _ { I# x1 -> I# (*# x1 7) }
|
| Rec {
| $fArrowA3 :: forall b d. [(b, d)] -> (# [b], [d] #)
| $fArrowA3 =
|   \(w :: [(b, d)]) ->
|    case w of _ {
|      [] -> (# [], [] #);
|      : y ys ->
|        case y of _ { (a1, b) ->
|        let {
|          ds1 :: ([b], [d])
|          ds1 =
|            case $fArrowA3 ys of _ { (# ww1, ww2 #) -> (ww1, ww2) } } in
|        (# : a1 (case ds1 of _ { (as, bs) -> as }),
|           : b (case ds1 of _ { (as, bs) -> bs }) #)
|        }
|    }
| end Rec }

Finally, Arrow syntax with INLINE pragma. The only difference between newtype A and B is, the use of INLINE in function first. The function body of sweet and sweet_inline, and definitions for Arrow class in A and B are identical.

> newtype B a b = B {unB :: [a] -> [b]}
>
> instance Category B where
>   id = B id
>   B f . B g = B (f . g)
>
> instance Arrow B where
>   arr f = B (map f)
>   {-# INLINE first #-}
>   first (B f) = B $ uncurry zip . first f . unzip
>
> sweet_inline :: B Int Int
> sweet_inline = proc x -> do
>   y <- arr func1 -< x
>   z <- arr func2 -< y
>   returnA -< z

Excerpt dumped core of sweet_inline. Note that zip function is not used. Instead, more Rec blocks are used. sweet_inline runs 2x faster than sweet, a big difference made from single INLINE pragma.

| sweet_inline :: B Int Int
| sweet_inline =
|   main8 `cast` (sym (NTCo:B Int Int) :: ([Int] -> [Int]) ~ B Int Int)
|
| main8 :: [Int] -> [Int]
| main8 =
|   \(x :: [Int]) ->
|    case $wgo x of _ { (# ww1, ww2 #) ->
|      case main_go1 ww1 ww2 of _ { (x1, y) ->
|        main_go x1 y
|      }
|    }
|
| Rec {
| $wgo :: [Int] -> (# [Int], [()] #)
| $wgo =
|   \(w :: [Int]) ->
|    case w of _ {
|      [] -> (# [] @ Int, [] @ () #);
|      : y ys ->
|        let {
|          ys1 :: ([Int], [()])
|          ys1 =
|            case $wgo ys
|            of _ { (# ww1, ww2 #) ->
|            (ww1, ww2)
|            } } in
|        (# : y (case ys1 of _ { (as, bs) -> as }),
|           : () (case ys1 of _ { (as, bs) -> bs }) #)
|    }
| end Rec }
|
| Rec {
| main_go1 :: [Int] -> [()] -> ([Int], [()])
| main_go1 =
|   \(ds :: [Int]) ->
|    case ds of _ {
|      [] -> z;
|      : y ys ->
|        let {
|          _x :: Int
|          _x = y of _ { I# x -> I# (+# x 3) } } in
|        let {
|          ys1 :: [()] -> ([Int], [()])
|          ys1 = main_go1 ys } in
|        \ (ds1 :: [()]) ->
|          case ds1 of _ {
|            [] -> n_r2wU;
|            : y1 ys2 ->
|              let {
|                ys3 :: ([Int], [()])
|                ys3 = ys1 ys2 } in
|              (: (case y1 of _ { () -> _x })
|                 (case ys3 of _ { (as, bs) -> as }),
|               : ()
|                 (case ys3 of _ { (as, bs) -> bs }))
|          }
|    }
| end Rec }
|
| n_r2wU :: ([Int], [()])
| n_r2wU = ([], [])
|
| Rec {
| main_go :: [Int] -> [()] -> [Int]
| main_go =
|   \(ds :: [Int]) ->
|    case ds of _ {
|      [] -> z1;
|      : y ys ->
|        let {
|          _x :: Int
|          _x = case y of _ { I# x -> I# (*# x 7) } } in
|        let {
|          ys1 :: [()] -> [Int]
|          ys1 = main_go ys } in
|        \ (ds1 :: [()]) ->
|          case ds1 of _ {
|            [] -> [];
|            : y1 ys2 ->
|              : (case y1 of _ { () -> _x })
|                (ys1 ys2)
|          }
|    }
| end Rec }

So, indeed, arrow syntax made the code slow, as written in the paper, mainly due to those use of tupling and untupling to reference varialbes by names, as we can find in sweet and sweet_inline. When arrow is used without syntax sugar, and there's no tuple packing nor unpacking, its performance is almost same as pure function variant, as we see in comparison of plain and bitter.

When do want to use arrow syntax, even we know that it runs slower? The main cause of slow down was packing and unpacking of tuples. When we see that these use of tupes are inevitable, it may show a better looking code. I have not yet met a practical situation heavily using arrow syntax, which seems valueable enough for performance trade-off.

TAGGED: haskell, arrow, performance