Multithreading

Python allows different parts of your code to run concurrently using threads, which are sequences of program instructions that can be executed independently of the rest of the code. We use the standard library's threading module for this.

Example

Let's look at an example program code that uses this module:

    import threading


    def iterate_print(iter):
        for item in iter:
            print(item)


    if __name__ == "__main__":
        # creating threads
        t1 = threading.Thread(target=iterate_print, args=(range(5),))  # writing out successive natural numbers
        t2 = threading.Thread(target=iterate_print, args=("Python",))  # writing out successive characters of the string

        # starting threads
        t1.start()
        t2.start()

        # waiting until both threads have finished executing before executing further code
        t1.join()
        t2.join()

        print("Done!")

We define the function iterate_print, which takes an iterable object and prints each of its elements in turn. We then create two threads: the first applies this function to the natural numbers from 0 to 4, the second to the string "Python". We start the threads with the start method, and with the join method we tell the program to wait for both threads to finish before executing the rest of the code.

Executing the above code will produce this result in the console:

    0
    P
    1
    y
    2
    t
    3
    h
    o
    n
    4
    Done!

As we can see, the output of the two function calls can interleave. This shows that instead of being executed sequentially, the program was run concurrently on two threads, and the exact ordering may differ between runs.
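
To see which thread produced each line, we could include the thread name in the output. Below is a minimal sketch (not part of the original example): it reuses the iterate_print idea but prefixes every printed item with the name of the thread executing it; the thread names "numbers" and "letters" are chosen here only for illustration.

    import threading


    def iterate_print(items):
        for item in items:
            # current_thread().name identifies the thread executing this call
            print(threading.current_thread().name, item)


    if __name__ == "__main__":
        t1 = threading.Thread(target=iterate_print, args=(range(5),), name="numbers")
        t2 = threading.Thread(target=iterate_print, args=("Python",), name="letters")

        t1.start()
        t2.start()

        t1.join()
        t2.join()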

An example with returning a value

If we want to run a function on a thread and get back its return value, we can define our own thread class that inherits from threading.Thread and overrides the run and join methods. Let's look at the example below:

    import threading
    import time


    class ThreadWithReturnValue(threading.Thread):
        def __init__(self, target, args=(), kwargs=None):
            if kwargs is None:
                kwargs = {}
            self.target = target
            self.args = args
            self.kwargs = kwargs
            self.result = None  # will hold the return value of target once run() finishes
            super().__init__()

        def run(self):
            # call the target function and store its return value
            self.result = self.target(*self.args, **self.kwargs)

        def join(self, timeout=None):
            # wait for the thread to finish, then return the stored result
            super().join(timeout)
            return self.result


    def print_cube(num):
        # A function that prints the third power of the number given as a parameter
        time.sleep(5)
        print(f"Cube: {num * num * num}")


    def print_square(num):
        # A function that returns the square of the number given as a parameter
        time.sleep(5)
        return num * num


    if __name__ == "__main__":
        # creating threads
        t1 = ThreadWithReturnValue(target=print_square, args=(10,))
        t2 = threading.Thread(target=print_cube, args=(10,))

        # starting threads
        t1.start()
        t2.start()

        # waiting until both threads have finished executing before executing further code
        print(t1.join())
        t2.join()

        print("Done!")

In the run method of the new class, we call the function passed in the target parameter with the given positional (args) and keyword (kwargs) arguments and store the return value in the result attribute. In the join method, we call the parent class's join and then return the value of the result attribute.

The result of executing the code will be:

    Cube: 1000
    100
    Done!

As before, we create two threads, except that this time one of them uses a function that prints a value to the screen, while the other uses a function that returns a value but does not print anything by itself. This is why the call to join on the thread running that function is wrapped in print; otherwise we would not see the value on the screen.
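
It is worth mentioning that the standard library's concurrent.futures module offers a ready-made way to get a return value from a thread, without defining a custom Thread subclass. A minimal sketch (the square function below is introduced only for illustration):

    from concurrent.futures import ThreadPoolExecutor


    def square(num):
        return num * num


    if __name__ == "__main__":
        with ThreadPoolExecutor(max_workers=1) as executor:
            # submit schedules the call on a worker thread and returns a Future
            future = executor.submit(square, 10)
            # result() blocks until the function has finished and returns its value
            print(future.result())  # 100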

Example 1: multiple threads

So let's check whether executing code snippets at the same time actually saves time. For this purpose, let's define a function that takes two parameters, a URL and a file name, and writes the content of the page at that URL to the text file. Next, let's call this function for 3 addresses, first sequentially in a loop, and then on three simultaneous threads. The execution time of both approaches will be measured with the timeit module.

Here is the code:

    import timeit
    import requests


    def crawl(url, dest):
        try:
            result = requests.get(url).text
            with open(dest, "a") as f:
                f.write(result)

        except requests.exceptions.RequestException:
            print("Error with URL check!")


    def wo_threading_func(urls):
        for url in urls:
            crawl(url, "without_threads.txt")


    def with_threading_func(urls):
        import threading

        threads = []
        for url in urls:
            th = threading.Thread(target=crawl, args=(url, "with_threads.txt"))
            th.start()
            threads.append(th)

        for th in threads:
            th.join()


    if __name__ == "__main__":
        wo_threading = "wo_threading_func(urls)"
        with_threading = "with_threading_func(urls)"

        setup = '''
    from __main__ import wo_threading_func, with_threading_func

    urls = [
        "https://jsonplaceholder.typicode.com/comments/1",
        "https://jsonplaceholder.typicode.com/comments/2",
        "https://jsonplaceholder.typicode.com/comments/3"
    ]
        '''

        print("Without threads:", timeit.timeit(stmt=wo_threading,
                                           setup=setup,
                                           number=100))
        print("With threads", timeit.timeit(stmt=with_threading,
                                         setup=setup,
                                         number=100))

Running it gives the following result:

    Without threads: 37.47563172900118
    With threads: 20.870981110958382

As you can see, we managed to gain time by using threads.
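
One detail worth noting: in with_threading_func all three threads append to the same file, so their writes may interleave. If we wanted each response to land in the file as one uninterrupted block, we could guard the write with a lock. A minimal sketch (write_lock is a name introduced here only for illustration):

    import threading

    import requests

    write_lock = threading.Lock()


    def crawl(url, dest):
        try:
            result = requests.get(url).text
            # only one thread at a time may hold the lock and write to the file
            with write_lock:
                with open(dest, "a") as f:
                    f.write(result)

        except requests.exceptions.RequestException:
            print("Error with URL check!")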

Example 2: multiple threads

Let's try another function this time: a count function that takes two parameters, _from and _to, and keeps decreasing the value of _from until it reaches _to. Let's call this function in the standard way first, with _from set to 400000 and _to to 0. Next, let's divide this interval in half and run count from 400000 to 200000 on one thread and from 200000 to 0 on the other. Both approaches do the same work, counting down from 400,000 to 0, but can we predict which one will be faster? We will again compare the execution times with the timeit module.

Here is the code:

    import timeit


    def count(_from, _to):
        while _from >= _to:
            _from -= 1


    def wo_threading_func():
        count(400000, 0)


    def with_threading_func():
        import threading

        t1 = threading.Thread(target=count, args=(400000, 200000))
        t2 = threading.Thread(target=count, args=(200000, 0))

        t1.start()
        t2.start()

        t1.join()
        t2.join()


    if __name__ == "__main__":
        wo_threading = "wo_threading_func()"
        with_threading = "with_threading_func()"
        setup = "from __main__ import wo_threading_func, with_threading_func"

        print("Without threads:", timeit.timeit(stmt=wo_threading,
                                           setup=setup,
                                           number=100))
        print("With threads:", timeit.timeit(stmt=with_threading,
                                         setup=setup,
                                         number=100))

The result is:

    Without threads: 2.2036261550383642
    With threads: 2.406025374075398

It might seem that by dividing the work between two threads we should get roughly half the time, yet the measured time turned out to be higher than without threads. Why?

GIL

Python has the GIL (Global Interpreter Lock), which allows only one thread to execute Python code at a time. The exception is I/O operations (input/output): while a thread waits for I/O, the lock is released, so other threads can run. This is why in the first example we were able to run the code faster using threads.

Multiprocessing

If we want to parallelize computations, we can use the multiprocessing module, which runs the code in subprocesses. Each of them gets its own interpreter and memory space, so the GIL is not a problem, and we can take advantage of multiple processor cores.

Let's add a third approach to the previous example, in which we will also divide the interval into two parts, but this time run each of them in a separate process.

By calling the following code:

    def with_multiprocessing_func():
        import multiprocessing

        p1 = multiprocessing.Process(target=count, args=(400000, 200000))
        p2 = multiprocessing.Process(target=count, args=(200000, 0))

        p1.start()
        p2.start()

        p1.join()
        p2.join()


    if __name__ == "__main__":
        wo_threading = "wo_threading_func()"
        with_threading = "with_threading_func()"
        with_multiprocessing = "with_multiprocessing_func()"
        setup = "from __main__ import wo_threading_func, with_threading_func, with_multiprocessing_func"

        print("Without threads:", timeit.timeit(stmt=wo_threading,
                                           setup=setup,
                                           number=100))
        print("With threads:", timeit.timeit(stmt=with_threading,
                                         setup=setup,
                                         number=100))
        print("With_multiprocessing:", timeit.timeit(stmt=with_multiprocessing,
                                              setup=setup,
                                              number=100))

we will get:

    Without threads: 2.169961199979298
    With threads: 2.344646318932064
    With multiprocessing: 1.4635963779874146

As we can see, thanks to the multiprocessing module it was possible to parallelize the calculations and perform them faster than with the standard approach.
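
Finally, if we also needed the return values computed in the subprocesses, the multiprocessing.Pool interface can collect them for us. A minimal sketch (the square function is introduced here only for illustration):

    import multiprocessing


    def square(num):
        return num * num


    if __name__ == "__main__":
        # map distributes the calls across worker processes and gathers the results
        with multiprocessing.Pool(processes=2) as pool:
            results = pool.map(square, [1, 2, 3, 4])

        print(results)  # [1, 4, 9, 16]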