Décorations_rues_Montignac

Sequence Conformance

So you want to conform to SequenceType. Maybe you’re even more ambitious than that: CollectionType, MutableSliceable, RangeReplaceableCollectionType – you want it all. What’s involved?

SequenceType

This is the most basic kind of sequence: anything that conforms to it can be iterated with a for...in loop, and gets some cool methods like map, flatMap, reduce, etc. It has one basic requirement: a generate() method, which returns something that conforms to GeneratorType.

GeneratorType

Generators are stateful objects that represent iteration. They must have one method – next() – which returns an optional. This optional should be successive elements of the sequence, until the sequence is finished, when it should be nil. (the protocol only requires the generator to return nil once: calling next() after it has returned nil is not allowed) These can’t be iterated with a for...in loop. If we take the example of iterating through an array, it’s easy to see what’s going on:

let nums = [1, 2, 3]

// nums is an Array, which conforms to
// SequenceType 

var numGenerator = nums.generate()

// numGenerator is an IndexingGenerator, which
// conforms to GeneratorType 

numGenerator.next() // 1?
numGenerator.next() // 2?
numGenerator.next() // 3?
numGenerator.next() // nil

for i in nums {
  // Here, the for-loop calls the generate() method on nums.
}

// The for-loop is equivalent to this:

var g = nums.generate()
while let i = g.next() {

}

There’s one extra complication here: IndexingGenerator also conforms to SequenceType. How? Like this:

extension IndexingGenerator : SequenceType {
  func generate() -> IndexingGenerator {
    return self
  }
}

You’ll actually see this with a lot of the standard library generators. Also, because of default methods on protocols, anything that conforms to GeneratorType can be made to conform to SequenceType with no extra code:

extension SomeGenerator : SequenceType {}

Swift will fill in the method for you.

AnyGenerator

Declaring a new type with all of the conformances and whatnot needed can be a bit of a pain, so Swift has a few helpers to make generators and sequences a bit easier.

The first is a class called AnyGenerator. It’s worth noting that it’s a class, rather than a struct: since generators are stateful and reference-y, a class is a better semantic fit. (However, the std lib generators are all structs: this is because a struct allows for much faster iteration. It’s worth keeping in mind if you’re defining your own sequence where performance may be important)

Making your own generator with AnyGenerator is easy, though. You can wrap it around some other generator, or you can use a closure that represents the next() method:

var i = 1
var upToThree = anyGenerator { i <= 3 ? i++ : nil }
upToThree.next() // 1
upToThree.next() // 2
upToThree.next() // 3
upToThree.next() // nil

Note the lower case: anyGenerator is a function that produces an AnyGenerator, the class.

(AnyGenerator also conforms to SequenceType)

Making a sequence

So, to define your own sequence, you can use AnyGenerator for your generate() method:

struct UpTo : SequenceType {
  private let from, to: Int
  func generate() -> AnyGenerator<Int> {
    var i = from
    return anyGenerator { i <= self.to ? nil : i++ }
  }
}

There’s one problem with SequenceType since beta 5, though. As an example, here’s a List type:

enum List<Element> : GeneratorType, SequenceType {
  case Nil
  indirect case Cons(Element, List<Element>)
  
  mutating func next() -> Element? {
    guard case let .Cons(x, xs) = self else { return nil }
    self = xs
    return x
  }
}

In those few lines, we actually get a surprising amount of default methods. dropFirst(), for instance:

let list = List.Cons(1, .Cons(2, .Cons(3, .Nil)))

list.dropFirst()

Previously, that function was only available on CollectionTypes. Here, we get a default implementation that returns an AnySequence. In a list, though, dropFirst() could be made more efficient:

extension List {
  func dropFirst(n: Int = 1) -> List {
    switch (n, self) {
    case (0, _), (_, .Nil): return self
    case let (_, .Cons(_, xs)): return xs.dropFirst(n - 1)
    }
  }
}

But now we get an error: type 'List' does not conform to protocol 'SequenceType'. Because we’ve overridden one default protocol method, we lose all the others. Swift was expecting several methods for slicing, including dropFirst, dropLast, prefix, etc. From those, it would infer the SubSequence typealias. When we were using the default methods, the type inferred was AnySequence. Not anymore. Now it’s List, so when Swift looks for the other slicing methods that return a list, it can’t find them, and it tells you you’re not conforming to the protocol. This means that if you want to implement one, you’ve got to implement them all. Ugh:

public protocol SequenceType {
    typealias Generator : GeneratorType
    typealias SubSequence
    public func generate() -> Self.Generator
    public func underestimateCount() -> Int
    public func map<T>(@noescape transform: (Self.Generator.Element) -> T) -> [T]
    public func filter(@noescape includeElement: (Self.Generator.Element) -> Bool) -> [Self.Generator.Element]
    public func forEach(@noescape body: (Self.Generator.Element) -> ())
    public func dropFirst(n: Int) -> Self.SubSequence
    public func dropLast(n: Int) -> Self.SubSequence
    public func prefix(maxLength: Int) -> Self.SubSequence
    public func suffix(maxLength: Int) -> Self.SubSequence
    public func split(maxSplit: Int, allowEmptySlices: Bool, @noescape isSeparator: (Self.Generator.Element) -> Bool) -> [Self.SubSequence]
}

Anything that returns Self.SubSequence there has to be reimplemented. Most painful is split – I haven’t yet found a decent algorithm for that function with any sequence that I’ve had.

Indexable

This is a new protocol in beta 5: it doesn’t actually inherit from anything else. Here’s the requirements:

typealias Index : ForwardIndexType
public var startIndex: Self.Index { get }
public var endIndex: Self.Index { get }
public subscript (position: Self.Index) -> Self._Element { get }

ForwardIndexType is the base protocol for all indices – BidirectionalIndexType and RandomAccessIndexType both conform. All it needs is to be Equatable, and have a successor() method. The variables and subscript can give us a clue as to how IndexingGenerator might work:

struct IndexingGenerator<Base : Indexable> : GeneratorType {
  private let base: Base
  private var i: Base.Index
  mutating func next() -> Base._Element? {
    return i == base.endIndex ? nil : base[i++]
  }
  init(_ base: Base) {
    self.base = base
    i = base.startIndex
  }
}

This also sheds a little light on why Swift does “one-past-the-end” indexing. (the endIndex variable represents the index after the last index, confusingly) Since indices don’t have to be comparable, you couldn’t write IndexingGenerator like this:

mutating func next() -> Base._Element? {
  return i <= base.endIndex ? base[i++] : nil
}

So you’re stuck with only being able to use ==. If endIndex was the last index, then you’d only be able to tell that you were on the last element, not that you had passed it. Now, you could use flags (a finished boolean, or something), but that’s less efficient.

There’s one more interesting aspect of IndexingGenerator: it’s used everywhere. Arrays, ContiguousArrays, ArraySlices: they all use IndexingGenerator. That’s kind of interesting: you’d think that since SequenceType is the base protocol, the iteration is handled first, and the indexing is built on top of that. For the array types, at least, it’s the other way around: iteration is built on indexing. You get some interesting performance characteristics as a result.

Here’s an example. enumerate() might not work the way you’d expect it to: it doesn’t return a tuple of (index, value) for a given collection. Instead, it returns a tuple of (Int, value): that means that if you enumerated over a String, for instance, you couldn’t actually index back into the string with the first member of the tuple:

let word = "hello".characters

for (index, value) in word.enumerate() {
  print(value)
  word[index] // error
}

So it might make sense to define our own enumerate, that does use the indices of the base. There was a good discussion of this on Twitter, specifically about making a forEachIndex:

extension CollectionType {
  func forEachIndex(@noescape body: (Index, Generator.Element) -> ()) {
    for (i, v) in zip(indices, self) {
      body(i, v)
    }
  }
}

Compare that to this method:

extension CollectionType {
  func forEachIndex(@noescape body: (Index, Generator.Element) -> ()) {
    for i in indices {
      body(i, self[i])
    }
  }
}

First one must be faster, right? I mean, the second one has an index lookup, which must be slow. All the first one does is increment each index, and then increment each element.

But it’s not. The second method is about twice the speed. Why? Well, when you iterate through self, you’re using an IndexingGenerator. It’s not just the array types, either: String acts like this, too. That’s really surprising: surely String has really slow indexing? What with all the unicode-correctness?

It looks like the index for String contains information about the memory location of the character itself. (or information about the character itself) So index is far more than just an indication of how many characters into the string you are – it’s very closely tied to the actual string itself.

This is important with regards to the standard library algorithms. Methods and functions on CollectionTypes expect that an index is the fastest way to get to any individual element. That means that making List indexable, like this:

extension List : Indexable {
  var startIndex: Int { return 0 }
  var endIndex: Int {
    guard case let .Cons(_, xs) = self else { return 0 }
    return xs.endIndex + 1
  }
  subscript(i: Int) -> Element {
    switch (i, self) {
    case (0, .Cons(let v, _)): return v
    case (_, .Cons(_, let x)): return x[i - 1]
    default: fatalError()
    }
  }
}

Might be a bad idea. (Airspeed Velocity had a post on making Lists O(1) indexable)

CollectionType

This is where the sequence protocols come into their own. With this protocol, you get a truckload of powerful methods, with surprisingly little implementation:

public protocol CollectionType : Indexable, SequenceType {
    typealias Generator : GeneratorType = IndexingGenerator<Self>
    public func generate() -> Self.Generator
    typealias SubSequence : Indexable, SequenceType = Slice<Self>
    public subscript (position: Self.Index) -> Self.Generator.Element { get }
    public subscript (bounds: Range<Self.Index>) -> Self.SubSequence { get }
    public func prefixUpTo(end: Self.Index) -> Self.SubSequence
    public func suffixFrom(start: Self.Index) -> Self.SubSequence
    public func prefixThrough(position: Self.Index) -> Self.SubSequence
    public var isEmpty: Bool { get }
    public var count: Self.Index.Distance { get }
    public var first: Self.Generator.Element? { get }
}

There’s two extra requirements that can’t be expressed by the protocol: the collection should be multi-pass, and SubSequence should be a CollectionType. You can conform to this with just three lines:

struct UpTo: CollectionType {
  let startIndex = 0
  let endIndex: Int
  subscript(i: Int) -> Int { return i }
}

And you get all of the default methods that others get. For instance, slicing gives you a wrapper struct back:

UpTo(endIndex: 10)[3...7] // Slice<UpTo>

There’s a good example here, by the way, of why you shouldn’t assume 0 is the startIndex:

UpTo(endIndex: 10)[3...7].startIndex // 3

Along with beta 5 came some new cool functions, like popFirst(), popLast(), as well as more slicing tools.

RangeReplaceableCollectionType

public protocol RangeReplaceableCollectionType : CollectionType {
    public init()
    public mutating func replaceRange<C : CollectionType where C.Generator.Element == Generator.Element>(subRange: Range<Self.Index>, with newElements: C)
    public mutating func reserveCapacity(n: Self.Index.Distance)
    public mutating func append(x: Self.Generator.Element)
    public mutating func extend<S : SequenceType where S.Generator.Element == Generator.Element>(newElements: S)
    public mutating func insert(newElement: Self.Generator.Element, atIndex i: Self.Index)
    public mutating func splice<S : CollectionType where S.Generator.Element == Generator.Element>(newElements: S, atIndex i: Self.Index)
    public mutating func removeAtIndex(i: Self.Index) -> Self.Generator.Element
    public mutating func removeFirst() -> Self.Generator.Element
    public mutating func removeFirst(n: Int)
    public mutating func removeRange(subRange: Range<Self.Index>)
    public mutating func removeAll(keepCapacity keepCapacity: Bool)
}

The number of functions available to you here really gets to silly levels – but you only need three of the basic methods implemented: replaceRange, init(), and reserveCapacity: a function that can replace an arbitrary range of elements with an arbitrary number of new elements, an initialiser that can initialise an empty instance, and… reserveCapacity.

reserveCapacity

So this guy is interesting. Arrays in Swift aren’t necessarily contiguous (unless you use ContiguousArray), but they’re mostly contiguous. For this reason, it often makes sense to allocate the space in memory for you to fit the array into – especially if you have a rough idea of how big the array is going to be. For instance, the method map() will always produce an array the same length as the sequence mapped over. So map() should use reserveCapacity before it fills the array. That information is difficult to get, though. For one, map() works on SequenceTypes, not just CollectionTypes. SequenceType doesn’t have a count property – so how can you know the length of the sequence? Even if it did, the count property on CollectionTypes returns Index.Distance, whereas for reserveCapacity on an array, you need an Int.

underestimateCount()

This is a method on all SequenceTypes. It’s kind of a stand-in for count It has one requirement: it must be nondestructive. (you can tell they’re serious: “nondestructive” is in bold in the docs. The only other bold words are scary important-sounding things like “raw memory”, and “Equality implies substitutability”) The default returns 0. However, several of the std lib SequenceTypes are able to return the actual number of elements they contain: if you’re writing a method on SequenceType, or something that conforms to SequenceType, bear this in mind.

Laziness

There’s a little bit more to this in the standard library than you might think. The standard SequenceTypes like Array, Set, and Dictionary are all strict. Then there are those that have lazy in the name: LazySequence, LazyForwardCollection, LazyBidirectionalCollection, and LazyRandomAccessCollection. These are lazy, obviously. But there’s a third group: sequences that are lazy, but are treated as if they’re strict: AnySequence, Zip2Sequence, and EnumerateSequence, among others. (when I say “treated” I mean that they get the strict versions of map and filter)

Why have strict, explicitly lazy, and implicitly lazy? Well, the implicitly lazy sequences above are all built out of some underlying sequence. They’re really just a wrapper. For efficiency’s sake, they won’t force evaluation on the underlying sequence: but for simplicity’s sake, they get the standard map and filter.

What’s the use of this laziness? Well, if you want to chain sequence methods together, as is the Swifty way, the strict versions will kill your performance. Every method will allocate, fill, and destroy an array. However, if your methods are lazy, there are no intermediate arrays created, and you can decide when you want to allocate and fill. (David Owens had a great post on this)

Great! So how do I go about using all of this lazy goodness? Well, for any of the std lib structs, just call the lazy function on them, and then all subsequent map and filter methods will be piped. However, LazySequence is a struct, not a protocol (yet…), so if you want your own custom fancy sequence to get the lazy methods, you’ll have to wrap it in LazySequence. (or one of the other lazy structs, where applicable)

If you want to write a method on lazy sequences, there are currently 2 options available to you:

  1. You can write 4 versions of your method, one for each lazy sequence struct in the std lib.
  2. Define a new protocol, LazySequenceType, and have the std lib structs conform, like this:
public protocol LazySequenceType : SequenceType {}

extension LazySequence                : LazySequenceType {}
extension LazyForwardCollection       : LazySequenceType {}
extension LazyBidirectionalCollection : LazySequenceType {}
extension LazyRandomAccessCollection  : LazySequenceType {}

Then, just define your methods on LazySequenceType once, and you’re good to go.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s