Iterators and Generators#

So far, you have seen things like

a = [1,2,3]
for x in a:
    print(x)
1
2
3

This looks very different from a C-style for loop where we loop over the variable index:

for (size_t i = 0; i < 3; i++) {
    printf("%d\n", i);
}

Or for instance, we can use something called a range

for i in range(3):
    print(i)
0
1
2

or other data types

d = {"hello": 1, "goodbye": 2}
for k,v in d.items():
    print(k, ' : ', v)
hello  :  1
goodbye  :  2

The key to using this sort of syntax is the concept of iterator. This is common in object-oriented programming (not just in Python), but you probably haven’t seen iterators before if you’ve only used imperative languages.

An object is iterable if it implements the __iter__ method, which is expected to return an iterator object. An object is an iterator if it implements the __next__ method, which either

  1. returns the next element of the iterable object

  2. raises the StopIteration exception if there are no more elements to iterate over

A Basic Iterator#

What if we want to replicate range?

r = range(3)
type(r)
range

we can produce an iterator using the iter function

ri = iter(r)
type(ri)
range_iterator

we can explicitly run through the iterator using the next function

next(ri)
0
r = range(1,5,2)
ri = iter(r)
print(next(ri))
print(next(ri))
1
3
class my_range_iterator(object):
    def __init__(self, start, stop, stride):
        self.state = start
        self.stop = stop
        self.stride = stride
        
    def __next__(self):
        if self.state >= self.stop:
            raise StopIteration  # signals "the end"
        ret = self.state # we'll return current state
        self.state += self.stride # increment state
        return ret
        
        
# an iterable
class my_range(object):
    def __init__(self, start, stop, stride=1):
        self.start = start
        self.stop = stop
        self.stride = stride
    
    def __iter__(self):
        return my_range_iterator(self.start, self.stop, self.stride)
r = my_range(0,3)
type(r)
__main__.my_range
ri = iter(r)
type(ri)
__main__.my_range_iterator
next(ri)
0
for i in my_range(0,3):
    print(i)
0
1
2
for i in range(0,3):
    print(i)
0
1
2

You can also create classes that are both iterators and iterables

# an iterable and iterator
class my_range2(object):
    def __init__(self, start, stop, stride=1):
        self.start = start
        self.stop = stop
        self.stride = stride
        self.state = start
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.state >= self.stop:
            raise StopIteration  # signals "the end"
        ret = self.state # we'll return current state
        self.state += self.stride # increment state
        return ret

Using Iterators for Computation#

Let’s now use iterators for something more interesting - computing the Fibonacci numbers.

class FibonacciIterator(object):
    def __init__(self):
        self.a = 0 # current number
        self.b = 1 # next number
        
    def __iter__(self):
        return self
    
    def __next__(self):
        ret = self.a
        self.a, self.b = self.b, self.a + self.b # advance the iteration
        return ret
for i in FibonacciIterator():
    if i > 1000:
        break
    print(i)
0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987

Note that we never raise a StopIteration exception - the iterator will just keep going if we let it.

Exercise#

Define FibonacciIterator so it will iterate over all Fibonacci numbers until they are greater than a parameter n.

## Your code here

Generators#

Often, a more elegant way to define an iterator is using a generator

This is a special kind of iterator defined using a function instead of using classes that implement the __iter__ and __next__ methods.

See this post for more discussion.

def my_range3(state, stop, stride=1):
    while state < stop:
        yield state
        state += stride
        

Note that we use the def keyword instead of the class keyword for our declaration. The yield keyword returns subsequent values of the iteration.

r = my_range3(0,3)
type(r)
generator
ri = iter(r)
type(ri)
generator
next(ri)
0
for i in my_range3(0,3):
    print(i)
0
1
2

Our Fibonacci example re-written using a generator:

def FibonacciGenerator():
    a = 0
    b = 1
    while True:
        yield a
        a, b = b, a + b
for i in FibonacciGenerator():
    if i > 1000:
        break
    print(i)
0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987

Exercise#

Define FibonacciGenerator so it will iterate over all Fibonacci numbers until they are greater than a parameter n.

## Your code here
def FibonacciGenerator(n):
    a = 0
    b = 1
    while a < n:
        yield a
        a, b = b, a + b
        
for i in FibonacciGenerator(1000):
    print(i)
0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987

Iteration tools#

Some useful tools for iterators that come in handy are:

zip - iterates over multiple iterators simulataneously

for i, a in zip([0,1,2], ['a', 'b', 'c']):
    print(i,a)
0 a
1 b
2 c

reversed - iterates in reverse order

for i in reversed(range(3)):
    print(i)
2
1
0

enumerate - returns the iteration step count as well as the iterator value

for i, a in enumerate('abc'):
    print(i, a)
0 a
1 b
2 c

Exercise#

Implement your own versions of zip and enumerate using generators

## Your code here
Hide code cell content
def my_zip(a, b):
    ai = iter(a)
    bi = iter(b)
    while True:
        try:
            yield next(ai), next(bi)
        except:
            return
        
for i, a in my_zip(range(3), 'abc'):
    print(i,a)
0 a
1 b
2 c
Hide code cell content
def my_enumerate(a):
    ct = 0
    for x in a:
        yield ct, x
        ct = ct + 1
        
for i, a in my_enumerate('abcd'):
    print(i, a)
0 a
1 b
2 c
3 d

The Itertools Package#

A useful package for dealing with iterators is the itertools package. Here are a few examples - click on the link to see what else the package provides.

product gives the equivalent of a nested for-loop

from itertools import product

for i, j in product(range(2), range(3)):
    print(i, j)
0 0
0 1
0 2
1 0
1 1
1 2
# equivalent to:
for i in range(2):
    for j in range(3):
        print(i,j)
0 0
0 1
0 2
1 0
1 1
1 2

repeat just repeats a value

from itertools import repeat

for i in repeat(10, 5):
    print(i)
10
10
10
10
10

permutations

from itertools import permutations

for p in permutations(range(3)):
    print(p)
(0, 1, 2)
(0, 2, 1)
(1, 0, 2)
(1, 2, 0)
(2, 0, 1)
(2, 1, 0)

Exercise#

Implement your own version of product and repeat using generators.

## Your code here
Hide code cell content
def my_product(a, b):
    for x in a:
        for y in b:
            yield x, y
            
for x, y in my_product(range(2), range(3)):
    print(x,y)
0 0
0 1
0 2
1 0
1 1
1 2
Hide code cell content
def my_repeat(n, ct):
    for i in range(ct):
        yield n
        
for x in my_repeat(10, 5):
    print(x)
10
10
10
10
10

Iterators for Scientific Computing#

One way you might use an iterator in scientific computing is when implementing an iterative algorithm.

Here is an example of the power method, which finds the largest eigenvalue-eigenvector pair of a matrix.

import numpy as np

def PowerMethodGenerator(A, x):
    
    def RayleighQuotient(A, x):
        """
        x^T A x / x^T x
        """
        return np.dot(x, A @ x) / np.dot(x,x)
    
    x = x / np.linalg.norm(x)
    rq_prev = np.inf
    rq = RayleighQuotient(A, x)
    
    while True:
        # yield state: RayleighQuotient, x, and difference from previous iteration
        yield rq, x, np.abs(rq - rq_prev)
        
        # compute next iteration
        x = A @ x
        x = x / np.linalg.norm(x)
        rq_prev = rq
        rq = RayleighQuotient(A, x)
    
# here's a version that uses the generator in a while-loop

n = 100
A = np.random.randn(n, n)
A = A @ A.T + 5 # constant increases eigenvalue in constant vector direction
x0 = np.random.randn(n)

solver = PowerMethodGenerator(A, x0)
tol = 1e-4

while True:
    rq, x, eps = next(solver)
    print(rq, eps)
    if eps < tol:
        break
91.65353860349349 inf
264.39569700399113 172.74215840049766
331.6639107292152 67.26821372522409
368.28772596250286 36.623815233287644
413.3976718417469 45.10994587924404
484.7479524293232 71.35028058757632
562.2652416355212 77.51728920619797
611.8783070096393 49.61306537411815
633.5529078330354 21.674600823396077
641.4382055406876 7.885297707652171
644.1195027473336 2.681297206646036
645.012162607875 0.8926598605413574
645.3077508031024 0.29558819522742397
645.4055758031813 0.0978250000788421
645.437974639844 0.03239883666276455
645.4487147429135 0.010740103069451834
645.4522779526445 0.0035632097310553945
645.4534608841237 0.0011829314792066725
645.4538538008304 0.00039291670668717416
645.4539843618311 0.00013056100067387888
645.4540277587727 4.339694157806662e-05

If we decide that we’re not satisfied with convergence yet, we can resume where we left off

tol = 1e-6

while True:
    rq, x, eps = next(solver)
    print(rq, eps)
    if eps < tol:
        break
645.4540421868085 1.4428035797209304e-05
645.454046984525 4.797716542270791e-06
645.4540485801232 1.5955981780280126e-06
645.454049110837 5.307138053467497e-07

You can do the same thing with for-loops

# here's a version that uses the generator in a for-loop

n = 100
A = np.random.randn(n, n)
A = (A @ A.T) + 5 # constant increases eigenvalue in constant vector direction
x0 = np.random.randn(n)

solver = PowerMethodGenerator(A, x0)
tol = 1e-4

for rq, x, eps in solver:
    print(rq, eps)
    if eps < tol:
        break

tol = 1e-6
print('\nresuming iteration after decreasing tolerance\n')
for rq, x, eps in solver:
    print(rq, eps)
    if eps < tol:
        break
99.0121738951512 inf
265.67917789316414 166.66700399801294
314.90946480059256 49.23028690742842
336.75004510501526 21.8405803044227
362.4605434157907 25.71049831077545
411.99372643132097 49.533183015530255
486.8135814069874 74.81985497566643
552.3993200771835 65.58573867019606
586.9473008072683 34.54798073008487
600.5193815571838 13.572080749915472
605.2751068811125 4.7557253239286865
606.8931962730779 1.6180893919654409
607.4446441957763 0.551447922698344
607.6347540168648 0.1901098210885266
607.7012112789654 0.06645726210058456
607.7247647284495 0.023553449484097655
607.7332199154426 0.008455186993160169
607.7362907038118 0.0030707883692002724
607.7374177947916 0.0011270909797076456
607.7378354685817 0.00041767379013890604
607.7379916195705 0.0001561509888006185
607.7380504785261 5.885895564006205e-05

resuming iteration after decreasing tolerance

607.7380728365575 2.2358031401381595e-05
607.7380813921574 8.555599833925953e-06
607.7380846893398 3.297182388450892e-06
607.7380859687665 1.2794267831850448e-06
607.7380864685549 4.997883706892026e-07