So you want to conform to SequenceType
. Maybe you’re even more ambitious than that: CollectionType
, MutableSliceable
, RangeReplaceableCollectionType
– you want it all. What’s involved?
SequenceType
This is the most basic kind of sequence: anything that conforms to it can be iterated with a for...in
loop, and gets some cool methods like map
, flatMap
, reduce
, etc. It has one basic requirement: a generate()
method, which returns something that conforms to GeneratorType
.
GeneratorType
Generators are stateful objects that represent iteration. They must have one method – next()
– which returns an optional. This optional should be successive elements of the sequence, until the sequence is finished, when it should be nil. (the protocol only requires the generator to return nil once: calling next()
after it has returned nil is not allowed) These can’t be iterated with a for...in
loop. If we take the example of iterating through an array, it’s easy to see what’s going on:
let nums = [1, 2, 3] // nums is an Array, which conforms to // SequenceType var numGenerator = nums.generate() // numGenerator is an IndexingGenerator, which // conforms to GeneratorType numGenerator.next() // 1? numGenerator.next() // 2? numGenerator.next() // 3? numGenerator.next() // nil for i in nums { // Here, the for-loop calls the generate() method on nums. } // The for-loop is equivalent to this: var g = nums.generate() while let i = g.next() { }
There’s one extra complication here: IndexingGenerator
also conforms to SequenceType
. How? Like this:
extension IndexingGenerator : SequenceType { func generate() -> IndexingGenerator { return self } }
You’ll actually see this with a lot of the standard library generators. Also, because of default methods on protocols, anything that conforms to GeneratorType
can be made to conform to SequenceType
with no extra code:
extension SomeGenerator : SequenceType {}
Swift will fill in the method for you.
AnyGenerator
Declaring a new type with all of the conformances and whatnot needed can be a bit of a pain, so Swift has a few helpers to make generators and sequences a bit easier.
The first is a class called AnyGenerator
. It’s worth noting that it’s a class, rather than a struct: since generators are stateful and reference-y, a class is a better semantic fit. (However, the std lib generators are all structs: this is because a struct allows for much faster iteration. It’s worth keeping in mind if you’re defining your own sequence where performance may be important)
Making your own generator with AnyGenerator
is easy, though. You can wrap it around some other generator, or you can use a closure that represents the next()
method:
var i = 1 var upToThree = anyGenerator { i <= 3 ? i++ : nil } upToThree.next() // 1 upToThree.next() // 2 upToThree.next() // 3 upToThree.next() // nil
Note the lower case: anyGenerator
is a function that produces an AnyGenerator
, the class.
(AnyGenerator
also conforms to SequenceType
)
Making a sequence
So, to define your own sequence, you can use AnyGenerator
for your generate()
method:
struct UpTo : SequenceType { private let from, to: Int func generate() -> AnyGenerator<Int> { var i = from return anyGenerator { i <= self.to ? nil : i++ } } }
There’s one problem with SequenceType
since beta 5, though. As an example, here’s a List
type:
enum List<Element> : GeneratorType, SequenceType { case Nil indirect case Cons(Element, List<Element>) mutating func next() -> Element? { guard case let .Cons(x, xs) = self else { return nil } self = xs return x } }
In those few lines, we actually get a surprising amount of default methods. dropFirst()
, for instance:
let list = List.Cons(1, .Cons(2, .Cons(3, .Nil))) list.dropFirst()
Previously, that function was only available on CollectionType
s. Here, we get a default implementation that returns an AnySequence
. In a list, though, dropFirst()
could be made more efficient:
extension List { func dropFirst(n: Int = 1) -> List { switch (n, self) { case (0, _), (_, .Nil): return self case let (_, .Cons(_, xs)): return xs.dropFirst(n - 1) } } }
But now we get an error: type 'List' does not conform to protocol 'SequenceType'
. Because we’ve overridden one default protocol method, we lose all the others. Swift was expecting several methods for slicing, including dropFirst
, dropLast
, prefix
, etc. From those, it would infer the SubSequence
typealias. When we were using the default methods, the type inferred was AnySequence
. Not anymore. Now it’s List
, so when Swift looks for the other slicing methods that return a list, it can’t find them, and it tells you you’re not conforming to the protocol. This means that if you want to implement one, you’ve got to implement them all. Ugh:
public protocol SequenceType { typealias Generator : GeneratorType typealias SubSequence public func generate() -> Self.Generator public func underestimateCount() -> Int public func map<T>(@noescape transform: (Self.Generator.Element) -> T) -> [T] public func filter(@noescape includeElement: (Self.Generator.Element) -> Bool) -> [Self.Generator.Element] public func forEach(@noescape body: (Self.Generator.Element) -> ()) public func dropFirst(n: Int) -> Self.SubSequence public func dropLast(n: Int) -> Self.SubSequence public func prefix(maxLength: Int) -> Self.SubSequence public func suffix(maxLength: Int) -> Self.SubSequence public func split(maxSplit: Int, allowEmptySlices: Bool, @noescape isSeparator: (Self.Generator.Element) -> Bool) -> [Self.SubSequence] }
Anything that returns Self.SubSequence
there has to be reimplemented. Most painful is split
– I haven’t yet found a decent algorithm for that function with any sequence that I’ve had.
Indexable
This is a new protocol in beta 5: it doesn’t actually inherit from anything else. Here’s the requirements:
typealias Index : ForwardIndexType public var startIndex: Self.Index { get } public var endIndex: Self.Index { get } public subscript (position: Self.Index) -> Self._Element { get }
ForwardIndexType
is the base protocol for all indices – BidirectionalIndexType
and RandomAccessIndexType
both conform. All it needs is to be Equatable
, and have a successor()
method. The variables and subscript can give us a clue as to how IndexingGenerator
might work:
struct IndexingGenerator<Base : Indexable> : GeneratorType { private let base: Base private var i: Base.Index mutating func next() -> Base._Element? { return i == base.endIndex ? nil : base[i++] } init(_ base: Base) { self.base = base i = base.startIndex } }
This also sheds a little light on why Swift does “one-past-the-end” indexing. (the endIndex
variable represents the index after the last index, confusingly) Since indices don’t have to be comparable, you couldn’t write IndexingGenerator
like this:
mutating func next() -> Base._Element? { return i <= base.endIndex ? base[i++] : nil }
So you’re stuck with only being able to use ==
. If endIndex
was the last index, then you’d only be able to tell that you were on the last element, not that you had passed it. Now, you could use flags (a finished
boolean, or something), but that’s less efficient.
There’s one more interesting aspect of IndexingGenerator
: it’s used everywhere. Array
s, ContiguousArray
s, ArraySlice
s: they all use IndexingGenerator
. That’s kind of interesting: you’d think that since SequenceType
is the base protocol, the iteration is handled first, and the indexing is built on top of that. For the array types, at least, it’s the other way around: iteration is built on indexing. You get some interesting performance characteristics as a result.
Here’s an example. enumerate()
might not work the way you’d expect it to: it doesn’t return a tuple of (index, value)
for a given collection. Instead, it returns a tuple of (Int, value)
: that means that if you enumerated over a String
, for instance, you couldn’t actually index back into the string with the first member of the tuple:
let word = "hello".characters for (index, value) in word.enumerate() { print(value) word[index] // error }
So it might make sense to define our own enumerate
, that does use the indices of the base. There was a good discussion of this on Twitter, specifically about making a forEachIndex
:
extension CollectionType { func forEachIndex(@noescape body: (Index, Generator.Element) -> ()) { for (i, v) in zip(indices, self) { body(i, v) } } }
Compare that to this method:
extension CollectionType { func forEachIndex(@noescape body: (Index, Generator.Element) -> ()) { for i in indices { body(i, self[i]) } } }
First one must be faster, right? I mean, the second one has an index lookup, which must be slow. All the first one does is increment each index, and then increment each element.
But it’s not. The second method is about twice the speed. Why? Well, when you iterate through self
, you’re using an IndexingGenerator
. It’s not just the array types, either: String
acts like this, too. That’s really surprising: surely String
has really slow indexing? What with all the unicode-correctness?
It looks like the index for String
contains information about the memory location of the character itself. (or information about the character itself) So index is far more than just an indication of how many characters into the string you are – it’s very closely tied to the actual string itself.
This is important with regards to the standard library algorithms. Methods and functions on CollectionType
s expect that an index is the fastest way to get to any individual element. That means that making List
indexable, like this:
extension List : Indexable { var startIndex: Int { return 0 } var endIndex: Int { guard case let .Cons(_, xs) = self else { return 0 } return xs.endIndex + 1 } subscript(i: Int) -> Element { switch (i, self) { case (0, .Cons(let v, _)): return v case (_, .Cons(_, let x)): return x[i - 1] default: fatalError() } } }
Might be a bad idea. (Airspeed Velocity had a post on making Lists O(1) indexable)
CollectionType
This is where the sequence protocols come into their own. With this protocol, you get a truckload of powerful methods, with surprisingly little implementation:
public protocol CollectionType : Indexable, SequenceType { typealias Generator : GeneratorType = IndexingGenerator<Self> public func generate() -> Self.Generator typealias SubSequence : Indexable, SequenceType = Slice<Self> public subscript (position: Self.Index) -> Self.Generator.Element { get } public subscript (bounds: Range<Self.Index>) -> Self.SubSequence { get } public func prefixUpTo(end: Self.Index) -> Self.SubSequence public func suffixFrom(start: Self.Index) -> Self.SubSequence public func prefixThrough(position: Self.Index) -> Self.SubSequence public var isEmpty: Bool { get } public var count: Self.Index.Distance { get } public var first: Self.Generator.Element? { get } }
There’s two extra requirements that can’t be expressed by the protocol: the collection should be multi-pass, and SubSequence
should be a CollectionType
. You can conform to this with just three lines:
struct UpTo: CollectionType { let startIndex = 0 let endIndex: Int subscript(i: Int) -> Int { return i } }
And you get all of the default methods that others get. For instance, slicing gives you a wrapper struct back:
UpTo(endIndex: 10)[3...7] // Slice<UpTo>
There’s a good example here, by the way, of why you shouldn’t assume 0 is the startIndex
:
UpTo(endIndex: 10)[3...7].startIndex // 3
Along with beta 5 came some new cool functions, like popFirst()
, popLast()
, as well as more slicing tools.
RangeReplaceableCollectionType
public protocol RangeReplaceableCollectionType : CollectionType { public init() public mutating func replaceRange<C : CollectionType where C.Generator.Element == Generator.Element>(subRange: Range<Self.Index>, with newElements: C) public mutating func reserveCapacity(n: Self.Index.Distance) public mutating func append(x: Self.Generator.Element) public mutating func extend<S : SequenceType where S.Generator.Element == Generator.Element>(newElements: S) public mutating func insert(newElement: Self.Generator.Element, atIndex i: Self.Index) public mutating func splice<S : CollectionType where S.Generator.Element == Generator.Element>(newElements: S, atIndex i: Self.Index) public mutating func removeAtIndex(i: Self.Index) -> Self.Generator.Element public mutating func removeFirst() -> Self.Generator.Element public mutating func removeFirst(n: Int) public mutating func removeRange(subRange: Range<Self.Index>) public mutating func removeAll(keepCapacity keepCapacity: Bool) }
The number of functions available to you here really gets to silly levels – but you only need three of the basic methods implemented: replaceRange
, init()
, and reserveCapacity
: a function that can replace an arbitrary range of elements with an arbitrary number of new elements, an initialiser that can initialise an empty instance, and… reserveCapacity
.
reserveCapacity
So this guy is interesting. Arrays in Swift aren’t necessarily contiguous (unless you use ContiguousArray), but they’re mostly contiguous. For this reason, it often makes sense to allocate the space in memory for you to fit the array into – especially if you have a rough idea of how big the array is going to be. For instance, the method map()
will always produce an array the same length as the sequence mapped over. So map()
should use reserveCapacity
before it fills the array. That information is difficult to get, though. For one, map()
works on SequenceType
s, not just CollectionType
s. SequenceType
doesn’t have a count
property – so how can you know the length of the sequence? Even if it did, the count
property on CollectionType
s returns Index.Distance
, whereas for reserveCapacity
on an array, you need an Int
.
underestimateCount()
This is a method on all SequenceType
s. It’s kind of a stand-in for count
It has one requirement: it must be nondestructive. (you can tell they’re serious: “nondestructive” is in bold in the docs. The only other bold words are scary important-sounding things like “raw memory”, and “Equality implies substitutability”) The default returns 0. However, several of the std lib SequenceType
s are able to return the actual number of elements they contain: if you’re writing a method on SequenceType
, or something that conforms to SequenceType
, bear this in mind.
Laziness
There’s a little bit more to this in the standard library than you might think. The standard SequenceType
s like Array, Set, and Dictionary are all strict. Then there are those that have lazy in the name: LazySequence
, LazyForwardCollection
, LazyBidirectionalCollection
, and LazyRandomAccessCollection
. These are lazy, obviously. But there’s a third group: sequences that are lazy, but are treated as if they’re strict: AnySequence
, Zip2Sequence
, and EnumerateSequence
, among others. (when I say “treated” I mean that they get the strict versions of map
and filter
)
Why have strict, explicitly lazy, and implicitly lazy? Well, the implicitly lazy sequences above are all built out of some underlying sequence. They’re really just a wrapper. For efficiency’s sake, they won’t force evaluation on the underlying sequence: but for simplicity’s sake, they get the standard map
and filter
.
What’s the use of this laziness? Well, if you want to chain sequence methods together, as is the Swifty way, the strict versions will kill your performance. Every method will allocate, fill, and destroy an array. However, if your methods are lazy, there are no intermediate arrays created, and you can decide when you want to allocate and fill. (David Owens had a great post on this)
Great! So how do I go about using all of this lazy goodness? Well, for any of the std lib structs, just call the lazy
function on them, and then all subsequent map
and filter
methods will be piped. However, LazySequence
is a struct, not a protocol (yet…), so if you want your own custom fancy sequence to get the lazy methods, you’ll have to wrap it in LazySequence
. (or one of the other lazy structs, where applicable)
If you want to write a method on lazy sequences, there are currently 2 options available to you:
- You can write 4 versions of your method, one for each lazy sequence struct in the std lib.
- Define a new protocol,
LazySequenceType
, and have the std lib structs conform, like this:
public protocol LazySequenceType : SequenceType {} extension LazySequence : LazySequenceType {} extension LazyForwardCollection : LazySequenceType {} extension LazyBidirectionalCollection : LazySequenceType {} extension LazyRandomAccessCollection : LazySequenceType {}
Then, just define your methods on LazySequenceType
once, and you’re good to go.