Data structures¶
Data structures are a way of storing information in a program. So far we have learned about many complex types - tuples, lists, dictionaries, sets; these are only a few which have a Python implementation. There are more structures widely used in computer science. They were not taken out of nowhere, but "discovered" by programmers during many years of software development. Some of them will be presented in this chapter.
Stack¶
The first data structure to look at closely is the stack. This is an abstract data structure - in Python there is no separate type for the stack, you have to implement it yourself. Despite this, it is treated as a very important concept that is worth knowing.
The stack resembles a plate feeder known from some restaurants, a pipe with balls or a stack of books. In all the structures just mentioned, each new element is stacked on top of the previous ones, creating the * stack * from which the structure takes its name. More importantly, new data can only be added to the top of the stack, and the rest are unavailable to us until we remove the item from above.
A new item is added to the stack with the push
operation. If we get a value from the stack, then also only the one at the top - this operation is called pop
. The first item to be removed will be the one that got there last. Hence, the stack is called a LIFO (Last In, First Out) structure. Ideally, it is possible to visualize it on the basis of the plate dispenser - at the moment, we only have access to the one located at the top, and picking those below is possible only after getting rid of those from the top.
The idea of the stack is shown below.
In computer science, often referring to the stack is meant a fragment isolated in memory, the access to which is accelerated, due to the manner and architecture of the structure, which is the stack.
Queue¶
While in the case of a stack, the last element added is the first to leave the stack, the queue is different - the first element added will leave it first. This is the principle of First In, First Out (FIFO).
An example from the real world of this structure is a regular queue to the store - the person who gets there first will be served first.
It is a structure used in all systems and devices to properly manage tasks and processes. Were it not for the queuing of orders, the printer could print on one page several documents sent to it at the same time by different users - it certainly would not be something expected
Queue operation:
List¶
A list is a sequential data structure that consists of a string of elements of the same type. Note that the data structure of list is different from list as a data type in Python.
A one-way list, that is, one in which you can only walk through items in one direction (shown by arrows on the picture), is shown below.
As shown on the picture above, from a given element of the list we can only go to the next element (in the case of a one-way list) or to the next or previous one (a two-way list - such a list would require an additional parameter that remembers and points to the previous element). Getting to the i-th element requires going through the next elements from the first to the target, it is in vain to look for an index approach here (although it can be implemented, why not?). A single element of the list consists of data (on the picture: attribute data), pointer to the next element (attribute: next) and possibly a pointer to the previous element (prev - only in the case of two-way lists). When creating a list, you usually add two additional pointers: head (points to the first element of the list) and tail (points to the last element). The count (incremented with each addition of a new element) is used to count the length of the list.
List items don't have to be next to each other in memory. Thus, the list does not require a contiguous memory area and can be distributed across its various segments. New items can be added quickly anywhere in the list, so the list can grow dynamically in memory. You can also remove any elements from the list, which causes the list to shrink. If we want to find a specific item in a list, we have to go from the first item (head) to the last (tail) until we find the data we are looking for. It is not possible to reach an item via an index.
The Python implementation of the list type draws heavily from the general principles and definition of a data structure of the same name.
Trees¶
A tree is a structure made of nodes, as shown in the picture above (nodes: A, B, C, D, E, F, G, H). Nodes store data, they are related to each other in a hierarchical manner by edge, represented by arrows. The first node of the tree is called the root node, in the picture: A. The remaining nodes, which we will call the child node, leave the root. Sons are child nodes in a hierarchical structure. The sons of the same father are called sibling node. We call the parent node of the son. If a node has no children, it is called a leaf node, C, E, F, G, H in the picture, otherwise it is called an internal node. A sequence of nodes connected by edges is a path. There is only one way (path) to get from the root to a specific node in the tree.
In computer science, trees are said to grow from top to bottom, which is due to having leaves at the bottom and a root at the top of the structure.
Binary tree¶
A binary tree is one in which nodes can have a maximum of two sons (children). Child nodes are called left child node and right child node, respectively. A specific development of such a structure is the assumption that on the left side of a given node there may be nodes with smaller values, and on its right only those that store larger data. This approach perfectly simplifies the search algorithms.
Trees allow you to easily operate on sorted data, they are used in practically every field of computer science (eg databases, computer graphics, word processing, telecommunications, servers).
Heap¶
A Heap is a binary tree with a special property - in the root (at the top of the heap) there is always the smallest value if the heap is minimum or the largest if the heap is maximum.
Python provides an implementation of a minimal heap in the heapq
module. Although the heap is technically a tree, in practice it is stored as a list. This structure is poorly searchable, but great for grabbing the smallest element from its vertex. Then the next smallest element (or the largest, if the heap is the so-called maximum) will jump into that place. Then we take the items in ascending order from the heap (this is what heap sort works). Create a heap from the list of random elements and pull them out until it is empty. Since the heap always returns the largest (or the smallest depending on its type) value, we will get an ordered list of items. The heap can also be treated as a priority queue - after all, we will always remove an element with an extreme priority from it.