Operations on strings

What are strings in Python?

A string is an ordered sequence of characters. Python can handle every character defined by the Unicode standard, and it offers two data types for this purpose: str for text and bytes for raw binary data (for example, encoded text). Each character is stored in memory, and Python joins the characters into words in the order given by the sequence. This makes it easy to determine which characters a str or bytes value consists of, and to find the exact position of each one. Importantly, Python numbers the elements of a sequence starting from 0, so the first character has index 0 (in 'Alice have cat', the letter 'A'). This is why strings are sometimes called character chains: every element is a link in the chain.
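
The difference between the two types can be illustrated with a short sketch. The sample word and the variable names are only illustrative assumptions; the point is that a str counts Unicode characters, while a bytes value counts the bytes produced by an encoding (UTF-8 is assumed here).

```python
# A str holds Unicode characters; a bytes object holds raw byte values.
text = "caf\u00e9"             # the word "café", chosen only for illustration
data = text.encode("utf-8")    # encode the characters into bytes (UTF-8 assumed)

print(len(text))   # 4 characters
print(len(data))   # 5 bytes, because the accented letter needs 2 bytes in UTF-8
print(data[0])     # 99: indexing bytes gives an integer (the byte value of 'c')
print(text[0])     # c: indexing a str gives a one-character str
print(data.decode("utf-8") == text)  # True: decoding restores the original text
```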

Here is the string "Hello, World!" with the index of each element (character) in the sequence. The symbol ␣ marks the space, which is a character too:

Character:  H   e   l   l   o   ,   ␣   W   o   r   l   d   !
Index:      0   1   2   3   4   5   6   7   8   9   10  11  12

Understanding how Python stores strings is important for working with them. Because a string is a sequence built of elements, we can access each character through the index at which it is located. To do this, write the string (or a variable that stores it) followed by square brackets [ ] containing the index of the character we want to get.

    >>> print(f"First letter in word 'Alice have cat' is '{'Alice have cat'[0]}'.")
    First letter in word 'Alice have cat' is 'A'.
    >>> hello = 'Hello, World!'
    >>> print(hello[7])
    W

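Why does print(hello[7]) return the character 'W'? The space after the comma is also a character (shown as ␣) and occupies index 6, so 'W' is at index 7:

Character:  H   e   l   l   o   ,   ␣   W   o   r   l   d   !
Index:      0   1   2   3   4   5   6   7   8   9   10  11  12
From end:  -13 -12 -11 -10 -9  -8  -7  -6  -5  -4  -3  -2  -1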

Python also lets us access an element of a sequence by counting its index from the end. Instead of counting up to the last element, we can write string[-1] to get the last character. Similarly, string[-2] gets the one before the last.
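
As a small sketch of negative indexing, the snippet below reads characters from the end of the same string. The variable names are illustrative only.

```python
hello = "Hello, World!"

last = hello[-1]           # the last character
before_last = hello[-2]    # the one before the last

# A negative index -k refers to the same character as index len(hello) - k.
same_char = hello[-6] == hello[len(hello) - 6]

print(last, before_last, same_char)  # ! d True
```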

Python provides many functions for working with strings. Some of them are shown below. To learn more, see the official language documentation: https://docs.python.org/3/

Function len()

The len() function returns the number of characters in a string.

    sentence = "Lorem ipsum dolor sit amet..."
    print(len(sentence))  # prints 29
    print(len("Alice have cat"))  # prints 14 (spaces are counted too)

Note that len() returns a number one greater than the index of the last character, because len() counts from 1 while indexes start at 0.
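
The relationship between the length and the last index can be checked with a short sketch. The names length and last_index are illustrative assumptions.

```python
sentence = "Lorem ipsum dolor sit amet..."

length = len(sentence)
last_index = length - 1          # the highest valid index

print(length, last_index)        # 29 28
print(sentence[last_index])      # the same character as sentence[-1]

# Using the length itself as an index goes one position past the end.
try:
    sentence[length]
except IndexError as error:
    print("IndexError:", error)
```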

Function .index()

The .index() function differs from len(): it is a method, so it must be called on a string value (or on a variable that holds a string) using the dot operator. It returns the index of the first occurrence of the character passed as its argument.

The following code checks the index of the letter 'o' in the string 'Hello, World!':

    hello = "Hello, World!"
    print(hello.index('o'))  # prints 4

Character:  H   e   l   l   o   ,   ␣   W   o   r   l   d   !
Index:      0   1   2   3   4   5   6   7   8   9   10  11  12
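
Two details go slightly beyond the example above and may be worth a sketch: .index() accepts an optional start position, and it raises ValueError when nothing is found, whereas the related .find() method returns -1. The variable names are illustrative.

```python
hello = "Hello, World!"

first_o = hello.index("o")                 # first occurrence
second_o = hello.index("o", first_o + 1)   # search again, starting after the first hit
print(first_o, second_o)                   # 4 8

# .index() raises ValueError when the text is not found...
try:
    hello.index("x")
except ValueError:
    print("'x' not found")

# ...while .find() returns -1 in that case.
print(hello.find("x"))                     # -1
```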

Function .count()

The .count() function returns the number of occurrences of the character passed as its argument.

    hello = "Hello, World!"
    print(hello.count('o'))  # prints 2

Because the character 'o' occurs twice in the string 'Hello, World!', .count() returns 2.
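
A brief sketch of a few more cases: .count() also accepts longer substrings, returns 0 when nothing matches, and counts occurrences without overlapping. The sample arguments are chosen only for illustration.

```python
hello = "Hello, World!"

print(hello.count("l"))      # 3
print(hello.count("lo"))     # 1 ("Hello" contains "lo" once)
print(hello.count("x"))      # 0 (no error when nothing matches)

# Occurrences are counted without overlapping.
print("aaaa".count("aa"))    # 2
```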

String cut (string slicing)

The [ ] operator can also be used to get several characters from a string at once. The result is a new string.

    # Get a substring from the string
    hello = "Hello, World!"
    print(hello[7:12])  # prints World

Character:  H   e   l   l   o   ,   ␣   W   o   r   l   d   !
Index:      0   1   2   3   4   5   6   7   8   9   10  11  12

Important: remember that hello[7:12] returns 5 characters (12 - 7), and the character at index 12 is not included! This is a common mistake among beginner programmers. The expression returns the characters at positions 7, 8, 9, 10 and 11 of the variable hello.
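
The rule about the excluded end index can be verified with a short sketch. The names start, stop and word are illustrative assumptions.

```python
hello = "Hello, World!"

start, stop = 7, 12
word = hello[start:stop]

print(word)                        # World
print(len(word) == stop - start)   # True
print(hello[stop])                 # ! (the character at index 12 is not part of the slice)

# Unlike single-index access, a slice that runs past the end does not raise an error.
print(hello[7:100])                # World!
```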

String slicing with some characters exclusion

The previous example can be extended. A slice can also move through the string by more than one position at a time. The following code gets part of 'Hello, World!' but skips every second character:

    # Get a substring, skipping every second character
    hello = "Hello, World!"
    print(hello[7:12:2])  # prints Wrd

Character:  H   e   l   l   o   ,   ␣   W   o   r   l   d   !
Index:      0   1   2   3   4   5   6   7   8   9   10  11  12

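The general form of a slice is:

string[START : STOP : STEP]

where START is the index at which the slice begins, STOP is the index at which it ends (the character at STOP is not included), and STEP is the number of positions to advance each time.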
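
One way to read the three parts of a slice is to compare them with the arguments of range(). The sketch below rebuilds the slice by hand; the names start, stop, step and picked are illustrative assumptions.

```python
hello = "Hello, World!"
start, stop, step = 7, 12, 2

# range(start, stop, step) visits the same indexes that the slice does: 7, 9, 11.
picked = "".join(hello[i] for i in range(start, stop, step))

print(picked)                            # Wrd
print(picked == hello[start:stop:step])  # True
```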

If we omit the start or the end, Python takes the slice from the first character or up to the last one, respectively:

    test = "Test string"
    print(test[:4])  # prints 'Test', the first 4 characters
    print(test[5:])  # prints 'string', from index 5 to the end of the string
    print(test[:])  # prints 'Test string', the whole string from beginning to end
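
A useful consequence of omitted bounds, sketched below, is that cutting a string at any position and joining the two halves gives back the original. The loop variable n is illustrative.

```python
test = "Test string"

# Splitting at any position and joining the halves restores the original string.
for n in range(len(test) + 1):
    assert test[:n] + test[n:] == test

print(test[:4] + "|" + test[4:])   # Test| string
print(test[::2])                   # Ts tig (every second character of the whole string)
print(test[-6:])                   # string (negative indexes work in slices too)
```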

String reverse

With a particular use of the [ ] operator we can read a string from the end.

    # Get the characters in reversed order
    hello = "Hello, World!"
    print(hello[::-1])  # prints !dlroW ,olleH

We read the whole string one character at a time, but with a negative step, so Python starts at the last character and moves toward the first. That is why the string comes back reversed. This is a common Python idiom.
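
As a sketch of where reversing is handy, the snippet below compares the slice with the built-in reversed() function and uses it in a palindrome check. The helper is_palindrome is a hypothetical name introduced only for this example.

```python
hello = "Hello, World!"

reversed_hello = hello[::-1]
print(reversed_hello)                              # !dlroW ,olleH

# The same result built with reversed() and str.join().
print("".join(reversed(hello)) == reversed_hello)  # True


def is_palindrome(text):
    """Hypothetical helper: True if text reads the same in both directions."""
    return text == text[::-1]


print(is_palindrome("level"), is_palindrome(hello))  # True False
```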

Function .upper()

The .upper() function returns a copy of the string on which it is called, with all characters converted to capital letters.

    # Get the string in capital letters
    hello = "Hello, World!"
    print(hello.upper())  # prints HELLO, WORLD!

Function .lower()

The .lower() function returns a copy of the string on which it is called, with all characters converted to lowercase.

    # Get the string in lowercase
    hello = "Hello, World!"
    print(hello.lower())  # prints hello, world!
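
A typical use of .upper() and .lower(), sketched here under the assumption of simple ASCII text, is comparing strings without regard to letter case. The variable shouted is an illustrative name.

```python
hello = "Hello, World!"

# The original string is unchanged; each call returns a new string.
shouted = hello.upper()
print(hello, shouted)                         # Hello, World! HELLO, WORLD!

# Comparing lowercase copies ignores differences in letter case.
print("PYTHON".lower() == "python".lower())   # True

# Characters without case (digits, punctuation) are left as they are.
print("abc-123".upper())                      # ABC-123
```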

Function .replace()

The .replace() function replaces all occurrences of a given character (the first argument) with the character given as the second argument. It does not modify the original variable (strings are immutable!); instead it returns a new string with the values replaced.

    hello = "Hello, World!"
    print(hello.replace('l', 'Z'))  # prints HeZZo, WorZd!
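
The immutability point can be made concrete with a short sketch. It also shows two details not covered above: the arguments may be longer than one character, and an optional third argument limits the number of replacements. The variable changed is an illustrative name.

```python
hello = "Hello, World!"

changed = hello.replace("l", "Z")
print(changed)   # HeZZo, WorZd!
print(hello)     # Hello, World! (the original is untouched)

# The arguments may be longer than one character.
print(hello.replace("World", "Python"))   # Hello, Python!

# An optional third argument limits how many occurrences are replaced.
print(hello.replace("l", "Z", 1))         # HeZlo, World!

# To "update" the variable, assign the result back to it.
hello = hello.replace("!", "?")
print(hello)     # Hello, World?
```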