Sambori_al_carrer_de_la_Corona,_València

Making FlatMap lazy

flatMap() is pretty cool. When I first learned what map() did, it got shoved into every line of code I had, regardless of how applicable it was. (That’s usually a good sign.) Then, along came flatMap(). It’s deceptively simple: its name pretty much explains exactly what it does. Useful, not game-changing, although it does let you do some pretty cool things:

extension CollectionType where
  SubSequence : CollectionType,
  SubSequence.SubSequence == SubSequence,
  SubSequence.Generator.Element == Generator.Element,
  Generator.Element : SequenceType  {
  
  func everyOf() -< [[Generator.Element.Generator.Element]] {
    guard let fSeq = first else { return [[]] }
    let rest = dropFirst(self).everyOf()
    return fSeq.flatMap {
      el in rest.map { [el] + $0 }
    }
  }
}

[[1, 2], [3, 4, 5]].everyOf() // [[1, 3], [1, 4], [1, 5], [2, 3], [2, 4], [2, 5]]

flatMap() is much, much more than just that, though. For instance, if I try to rewrite that function without the guard, something funny happens:

func everyOf() -< [[Generator.Element.Generator.Element]] {
  return first.flatMap {
    el in dropFirst(self).everyOf().map { [el] + $0 }
  }
}

It’s much more compact, kind of cooler looking. (although less efficient: the let rest in the first version is being recalculated for every element in the second version) But it doesn’t compile. The problem is pretty easy to see – first is optional – but that’s not the error. Looking at the documentation for flatMap(), apparently it can work on an Optional, albeit in a strange way:

func flatMap(x: T?, f: @noescape(T) -< U?) -< U?

Returns f(self)! iff self and f(self) are not nil.

Whaaaaat? That definitely doesn’t make sense. I mean, I can kind of understand what it does, but that has nothing to do with the other flatMap(). Why would you give these two unrelated functions the same name? (I realise I’m mixing Swift 1.2 and Swift 2 here, by the way) This is how they get you hooked. They slip monads in right under your nose. (and it’s so sneaky! no mention of monads, nothing in the Swift book, just some small, understated function signature.)

But that’s not what I’m going to talk about. After the rabbit hole of flatMap() (after? during?), you get a feel for what’s going on with these functional functions. You get a picture in your mind of what exists and what doesn’t, even before reading the documentation. flatMap() and map() are bread-and-butter functions in the standard library of most functional programming languages, and there are well accepted conventions for what they do. Their Swift counterparts follow those conventions.

There’s actually even more in Swift than just the conventions. A concept very close to my own heart, laziness, is implemented pretty impressively on several of the sequence functions in the standard library: map(), filter(), contains()enumerate(), etc. None of these force the full evaluation of the underlying sequence they’re called on, and on top of that, map(), filter(), and enumerate() produce sequences that are themselves lazy. However, poor old flatMap() is left out in the cold. It has no lazy version – it just returns an array. A while ago, I tried to add my own lazy version.

To do this, you’ve got to get to grips with the internals of sequences. In particular, you’ve got to wrangle GeneratorType.

Encapsulates iteration state and interface for iteration over a sequence.

The protocol has one requirement: any GeneratorType must have a method called next(), which returns an Optional.

public struct NaturalNumbers: GeneratorType {
  
  private let upTo: Int
  private var i   : Int = 0
  
  public mutating func next() -< Int? {
    return i == upTo ? nil : i++
  }

  public init(upTo: Int) {
    self.upTo = upTo
  }
}

var g = NaturalNumbers(upTo: 5)

g.next() // 0
g.next() // 1
g.next() // 2
g.next() // 3
g.next() // 4
g.next() // nil

This will look pretty familiar to a lot of people. Generators in Python are one of the most fun, interesting aspects of the language, and Java has a similar construct to Swift. (unlike C# and Python, though, Swift has no yield keyword. The reasons – which I do not understand – are complex, and possibly fundamental to the language, so Swift may never get a yield. Still, though, managing state yourself is a pain, and defer seems to be doing something similar to what yield would do. We can hope!)

Generators look a little strange to me, though. They clash a lot of different ways of doing things. First off, they’re about as stateful as you can get. And they come with all of that dangerous goodness of stateful things. Take this, for instance:

var i = 0

public struct NaturalNumbers: GeneratorType {
  
  private let upTo: Int
  
  public mutating func next() -< Int? {
    return i == upTo ? nil : i++
  }

  public init(upTo: Int) {
    self.upTo = upTo
  }
}

The bugs here are fantastic. The obvious one is messing with i outside of the generator:

var g = NaturalNumbers(upTo: 4)

g.next() // 0
g.next() // 1
g.next() // 2

i = 0

g.next() // 0

How about we make two generators:

var g0 = NaturalNumbers(upTo: 3)
var g1 = NaturalNumbers(upTo: 6)

g0.next() // 0
g0.next() // 1

g1.next() // 2
g1.next() // 3

Or, if we pass the stop point for the lower generator:

g0.next() // 0
g0.next() // 1

g1.next() // 2
g1.next() // 3
g1.next() // 4

g0.next() // 5
g0.next() // 6
g0.next() // 7

g1.next() // 8

Now both of them will continue indefinitely.

None of the bugs above will raise any kind of compiler warnings. The only hint you get as to the dodginess of the generators is that they have to be declared with var. Unless you forget to mark the next() method as mutating. (which is perfectly valid, since i is outside of the struct) If you go passing generators around a lot, sometimes the mutable and immutable semantics get messed up: say you pass a generator to a function which passes it to another function – the second function can mutate the generator all it wants, and the first function can go on thinking it’s completely untouched. If you make an array of generators, and the compiler will helpfully tell you it should be let. (Although, I have to concede, you’d be asking for it there.)

The weirdness is that most generators are structs, not classes. A class, with its reference semantics, is clearly a better suit. In fact, the construct you get for making your own generators, anyGenerator, is a class (one of the few in the standard library). If you want to go making your own types that conform to GeneratorType, though, there’s nothing stopping you from making them all structs, and perverting the value semantics as much as you want.

Why is it this way? Joe Groff is pretty amazingly participatory on twitter. Usually, whenever I have some question about why something was done in Swift, he’s answered it, several times over, six months ago:

So there’s the answer. Performance.

Maybe you’re not yet impressed with the bugs above. Alright, then, how about this: one of the tempting uses for generators is functional-style head-tail decomposition. After all, once you call next(), the generator is left with only the tail. (Another clash! FP and OOP this time).

extension GeneratorType {
  mutating func uncons() -< (Element, Self)? {
    return next().map { ($0, self) }
  }
}

Fantastic! Let’s use it in something complex. This is a version of the everyOf() (cartesian product) function from above, using generators:

extension GeneratorType where Element : CollectionType {
  mutating func product() -< [[Element.Generator.Element]] {
    return uncons().map {
      (head, var tail) in
      return head.flatMap {
        el in tail.product().map {
          [el] + $0
        }
      }
    } ?? [[]]
  }
}

Except that it doesn’t. The first time you traverse tail is the last time you traverse tail. Here’s what you get:

var g = [[1, 2], [3, 4]].generate()

g.product() // [[1, 3], [1, 4], [2]]

Easy enough to fix, though. Just force a thunk on tail:

extension GeneratorType where Element : CollectionType {
  mutating func product() -< [[Element.Generator.Element]] {
    return uncons().map {
      (head, var tail) in
      let thunked = tail.product()
      return head.flatMap {
        el in thunked.map {
          [el] + $0
        }
      }
    } ?? [[]]
  }
}

That’s actually not the end of the bugs with generators – there are some rules you have to follow. We’re in a fast-and-loose part of the language now, so the compiler won’t enforce these rules. You’ve got to do it yourself. For instance:

  • Once a generator gives you nil, leave it alone.

This is a big one. You cannot call a generator’s next() method once it has already returned nil. It mightn’t be immediately obvious why. Look back up to the NaturalNumbers generator above: it has its own, personal logic for finding out when it finishes. Now, as it happens, it will keep returning nil once it does so once, but there’s no guarantee that every generator will do that. The next() method could have been written like this:

return ++i == upTo ? nil : i

It makes sense in that contrived example, but why in the standard library? Again, Joe Groff to the rescue:

So it’s a rule, and we’re not to violate it. For instance, this handy looking function, which zips two sequences together, but pads the shorter out with nil:

func zipAndPad<
  S0: SequenceType, S1: SequenceType
  <(s0: S0, _ s1: S1)
  -< AnySequence<(S0.Generator.Element?, S1.Generator.Element?)< {
    var (g0, g1) = (s0.generate(), s1.generate())
    return AnySequence( anyGenerator {
      _ -< (S0.Generator.Element?, S1.Generator.Element?)? in
      let (e0, e1) = (g0.next(), g1.next())
      return (e0 != nil || e1 != nil) ? (e0, e1) : nil
    } )
}
zipAndPad([1, 2, 3], [1, 2]) // (1, 1), (2, 2), (3, nil)

Is not allowed. (The actual way to manage it is a bit of a headache)

That in mind, let’s look at what flatMap() does eagerly.

Return an Array containing the non-nil results of mapping transform over self

flatMap(_: (Self.Generator.Element) -< T?) -< [T]

Return an Array containing concatenated results of mapping transform over self

flatMap<S : SequenceType<(_: (Self.Generator.Element) -< S) -< [S.Generator.Element]

There are two distinct behaviours there. The first works with optionals, the second with sequences. The first one seems a little strange to me, actually. I’m not sure if flatMap() in other languages acts like that. I mean, if SequenceType has a flatMap() that works like that, shouldn’t Optional have a version like this?

func flatMap<S : SequenceType< ( T -< S) -< S

But it doesn’t make sense. I have no idea what it would do. At any rate, I’m happy it exists: it’s an incredibly useful function. So let’s try make it lazy. Here’s our generator:

public struct FlatMapOptGen<G : GeneratorType, T< : GeneratorType {
  private let transform: G.Element -< T?
  private var g : G
  mutating public func next() -< T? {
    while let next = g.next() {
      if let ret = transform(next) {
        return ret
      }
    }
    return nil
  }
}

Pretty simple. It uses some generator its given – from the underlying sequence – and a while loop. No extra calls, no problems. The only thing is… when you’re in a generator, you’ve got the next() method available to you, even in the next() method itself. I smell recursion. Here’s what it looks like:

  mutating public func next() -< T? {
    return g.next().flatMap { transform($0) ?? next() }
  }

Ooh, that’s cool. We even get to use flatMap(). Unfortunately, it’s far, far less efficient than the iterative version. I’m not sure if tail-call optimisation is possible for this method in particular, but at any rate, TCO isn’t something you can rely on in Swift:

So the actual implementation can’t have this lovely pretty version. The rest is boilerplate:

public struct FlatMapSeq<S0 : SequenceType, S1 : SequenceType< : SequenceType {
  private let seq: S0
  private let transform: S0.Generator.Element -< S1
  public func generate() -< FlatMapGen<S0.Generator, S1< {
    return FlatMapGen(transform: transform, g: seq.generate())
  } 
}
public extension SequenceType {  
  func flatMap<S : SequenceType<
    (transform: (Generator.Element) -< S) -< FlatMapSeq<Self, S< {
      return FlatMapSeq(seq: self, transform: transform)
  }
}

I’m not sure I fully understand the naming conventions for the standard library, but I think that FlatMapSequenceView would be the proper name (instead of FlatMapSeq).

The final one is the sequence flattening. Here’s what it looks like finished:

public struct FlatMapGen<G : GeneratorType, S : SequenceType< : GeneratorType {
  
  private let transform: G.Element -< S
  private var g : G
  private var innerGen : S.Generator?
  
  public mutating func next() -< S.Generator.Element? {
    for ; innerGen != nil; innerGen = g.next().map(transform)?.generate() {
      if let next = innerGen?.next() {
        return next
      }
    }
    return nil
  }
  private init(transform: G.Element -< S, var g: G) {
    self.transform = transform
    self.innerGen = g.next().map(transform)?.generate()
    self.g = g
  }
}

C-style for loops, vars left right and centre, this is very imperative code. I think that’s kind of cool – it’s a lazy flatMap(), one of the bedrock functions in functional languages, and to get it in Swift, we’ve got to go in completely the opposite direction.

At long last, there’s still one problem. If we add these as methods on SequenceType, we’re going to override the normal flatMap(). That’s no good. You can go through each of the lazy sequence structs, or, you can make an overarching Lazy protocol. This is what I did (all these functions are from SwiftSequence), but it raises some other questions.

For instance, say we want to go from lazy to eager. It’s very, very easy to do that on a lazy sequence: just wrap it in the Array init function, and you’re thunked right away. On the other hand, going from eager to lazy? Well, I got a blog post out of it. I’m tempted to think, then: why isn’t Swift lazy by default? Laziness is certainly cooler, and surely you’re not losing anything, if you always have a thunk available to you?

That experiment has already been done. And the verdict is pretty clear: default laziness is more of a pain than it’s worth, especially in performance-critical code. In Haskell, in particular, performance is notoriously hard to reason about. Simon Peyton-Jones has a lecture on the topic (kind of on the topic): The next Haskell will be Strict.It seems everyone gets tempted, and then burned: even in early Swift, laziness had a more default role in Sequences than it does now. So, cool and fun as laziness is, like the real-world version, it’s good only in moderation.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s