Skip to content

Generators and iterators

Iteration

As we already know, Python has several objects that implement an iterative protocol.

*

    a = [1, 2, 3, 4]
    for item in a:
        print(item)

In this case, we iterate through the list and print each successive element.

*

    a = "Alice has a cat"
    for item in a:
        print(item)

Here we iterate over the string, in each loop we print a single character that is in it.

*

    a = {'name': "John", 'surname': "Smith"}
    for item in a:
        print(item)

In this case, we iterate over the dictionary keys, printing one of them at each step.

Iterating through a large collection

Consider the following problem: we would like to spell one million primes. Let's look at the code below:

    from math import sqrt


    def is_prime(n):
        for i in range(2, int(sqrt(n)) + 1):
            if n % i == 0:
                return False
        return True


    def get_n_primes(n):
        primes = []
        i = 2
        while len(primes) != n:
            if is_prime(i):
                primes.append(i)
            i += 1
        return primes


    lst = get_n_primes(1000000)
    for elem in lst:
        print(elem)

First, we create a function that will check if the number is prime - that is, if any of the numbers from 2 to its root are its divisors (i.e. whether it is completely divided by them) - if so, we return False, because since the number has a divisor it is definitely not the first. If we iterate over the entire range and find no divisor, the number is prime and we return True.

Next, we create the get_n_primes function, which takes the n parameter, which signifies how many of these primes we want to generate. We create an empty list where we will store the generated numbers. We check whether the current value is a prime number - if so, then we add it to the list, if not - we increase the value of the variable and go to the next iteration of the loop. We repeat the process until the length of this list is not equal to n.

When invoking, we first create a list that contains a million primes, and then iterate over it. As we can see, there are a lot of these iterations here and it is not an optimal solution.

Iterator

Or maybe there is a way not to keep all items in memory (i.e. not to allocate a large amount of memory to a large collection), but to remember which number was generated last and return the next one based on that?

It exists, we call it iterator. It is a class that implements an iterative protocol, it has methods __iter__, __next__ and throws exception StopIteration.

    class PrimeIterator:
        # Iterator that iterates over n primes
        def __init__(self, n):
            self.n = n
            self.generated_numbers = 0
            self.number = 1

        def __iter__(self):
            return self

        def __next__(self):
            self.number += 1
            if self.generated_numbers >= self.n:
                raise StopIteration
            elif is_prime(self.number):
                self.generated_numbers += 1
                return self.number
            return self.__next__()


    iter = PrimeIterator(1000000)
    for elem in iter:
        print(elem)

__iter__ method will return a reference to an instance of this class, and method __next__ consecutive prime numbers.

Using an iterator allows us not to generate the entire large collection of numbers - instead, we have access to the next prime in each subsequent iteration step.

Let's follow the execution for a simpler example, suppose we want the iterator to give us access to three prime numbers, i.e.

    iter = PrimeIterator(3)
    for elem in iter:
        print(elem)
  1. First, we create an instance of the PrimeIterator class, remember how many numbers are to be generated and create two additional parameters: the first to store the number of prime numbers generated so far (during initialization we set the value to 0), the second - to remember the value of the currently considered number (initially we set this value on 1).
  2. During the first step of iteration, __next__ is run for the first time. We increment the number parameter so that its value is now equal to 2. We check whether we have generated as many numbers as it is given during the initialization of n. So far we've found 0 primes, and we need to specify 3, so we're not finishing the iterator yet. We check if the value of the number parameter is a prime number, for the value 2 is_prime will return True, so we increase the generated_numbers parameter and return the prime found - the screen will display the value 2 in this step.
  3. We go to the next step of the iteration, the __next__ method is run again: we increase the number parameter, now its value is 3. We compare the parameters generated_numbers and n: 1 is less than 3, so we are not finished yet the operation of the iterator. The is_prime function returns True for a value of 3, so we increment the generated_numbers parameter (it will now be equal to 2) and return the prime found: the screen will print the value 3.
  4. We go to the next iteration step, we are back in the __next__ method: we increment the number parameter, now its value is 4. We check if generated_numbers is greater than or equal to n - it is not (2 is less than 3) so we don't finish the action. The is_prime function for the value 4 will return False, so we haven't found the prime, we go to the last line of the method which tells us that in that case we run the method again in the same iteration step. So we increase the number parameter, now its value is 5. We compare generated_numbers and n, the latter is still larger, so we do not terminate the iterator. The is_prime function returns True for a value of 5, so we increment the generated_numbers parameter (it will now be equal to 3) and return the currently found prime - the screen will print 5.
  5. We go to the next iteration step, increase the number parameter so that now its value equals 6. We check if generated_numbers is greater than or equal to n - this is because both parameters have a value of 3, so the StopIteration exception is thrown, and our the iterator ends here.

Generator

We can achieve a similar memory optimization by using generators.

Generator is a function that uses the yield keyword when it returns a value. We can treat yield as a return statement, except that here we return the value and store the state of the function, i.e. we store the values of its variables in memory at each iteration.

    def prime_generator(n):
        # Generator to iterate over n prime numbers
        number = 2
        generated_numbers = 0
        while generated_numbers != n:
            if is_prime(number):
                yield number
                generated_numbers += 1
            number += 1


    gen = prime_generator(1000000)
    for elem in gen:
        print(elem)

The execution will be analogous to what happened in the iterator: we create auxiliary variables, one called number to store the value of the currently considered number suspected of being the first, and the second, called generated_numbers to remember how many primes have already been generated. As long as the value of this parameter is different from the passed n we will check if the is_prime function returns True for the current number parameter: if so, we know that we have found a prime number, so we return its value using the yield keyword and increase the value of the variable generated_numbers. Then we increase the value of the number variable and go to the next loop.