Python - Loops

Overview
Questions:
  • How can I make a program do many things?

Objectives:
  • Explain what for loops are normally used for.

  • Trace the execution of a simple (unnested) loop and correctly state the values of variables in each iteration.

  • Write for loops that use the Accumulator pattern to aggregate values.

Requirements:
Time estimation: 40 minutes
Level: Introductory
Supporting Materials:
Last modification: Sep 28, 2022
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
Best viewed in a Jupyter Notebook

This tutorial is best viewed in a Jupyter notebook! You can load this notebook in one of the following ways:

Launching the notebook in Jupyter in Galaxy

  1. Instructions to Launch JupyterLab
  2. Open a Terminal in JupyterLab with File -> New -> Terminal
  3. Run wget https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-loops/data-science-python-loops.ipynb
  4. Select the notebook that appears in the list of files on the left.

Downloading the notebook

  1. Right click one of these links: Jupyter Notebook (With Solutions), Jupyter Notebook (Without Solutions)
  2. Save Link As...

A for loop tells Python to execute some statements once for each value in a list, a character string, or some other collection: “for each thing in this group, do these operations.”

Comment

This tutorial is significantly based on the Carpentries Programming with Python, Programming with Python, and Plotting and Programming in Python, which are licensed under CC-BY 4.0.

Adaptations have been made to make this work better in a GTN/Galaxy environment.

Agenda

In this tutorial, we will cover:

  1. For Loops
    1. Structure
    2. A for loop is made up of a collection, a loop variable, and a body.
    3. Things You Can Loop Over
    4. Indentation
    5. Variable Naming
    6. Range
    7. Accumulation
    8. Exercises
    9. Cumulative Sum

For Loops

Which of these would you rather write?

Manually:
print(2)
print(3)
print(5)
print(7)
print(11)
With a loop:
for number in [2, 3, 5, 7, 11]:
    print(number)

The advantage may be less obvious here, because each number needs only one operation (print). Now imagine two, three or more operations for every number. Without a loop, you would have to write each of them out again and again.
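Here is a rough sketch of that point: a loop body with three operations. The particular operations, squaring and checking for evenness, are only illustrative. Without the loop, each of these three lines would have to be written out five times.

```python
# Three operations per number, written once in the loop body.
for number in [2, 3, 5, 7, 11]:
    square = number * number      # operation 1: square the number
    is_even = number % 2 == 0     # operation 2: check whether it is even
    print(f"{number}: square = {square}, even = {is_even}")  # operation 3: report
```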

Structure

A for loop is made up of a collection, a loop variable, and a body.

for number in [2, 3, 5]:
    doubled = number * 2
    print(f"{number} doubled is {doubled}")
  • number - this is the loop variable. It’s a new variable that is assigned each value from the collection in turn. It does not need to be defined before the loop.
  • [2, 3, 5] - this is the collection, a list of numbers, which we can tell from the square brackets: [, ]
  • the loop body - here we double a number and then print out a message. The loop body is what gets executed on every iteration of the loop.
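One detail is worth knowing. The loop variable is an ordinary variable, so in Python it still exists after the loop has finished and holds the last value it was assigned. A minimal check:

```python
for number in [2, 3, 5]:
    doubled = number * 2

# The loop has ended, but both variables keep their final values.
print(number)   # 5
print(doubled)  # 10
```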
The loop:
for number in [2, 3, 5]:
    doubled = number * 2
    print(f"{number} doubled is {doubled}")
What's really happening internally:
# First iteration
number = 2
doubled = number * 2
print(f"{number} doubled is {doubled}")
# Second iteration
number = 3
doubled = number * 2
print(f"{number} doubled is {doubled}")
# Third iteration
number = 5
doubled = number * 2
print(f"{number} doubled is {doubled}")

Writing loops saves us time and helps keep our code accurate. The body is written only once, so we can’t accidentally introduce a typo into just one of many copies.

Things You Can Loop Over

You can loop over the characters in a string:

dna_string = 'ACTGGTCATCG'
for base in dna_string:
    print(base)

You can loop over lists:

cast = ['Elphaba', 'Glinda', 'Fiyero', 'Nessarose']
for character in cast:
    print(character)
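Loops also work over other collections. Here is a small sketch; the tuple and the three-entry codon_table dictionary are example data, not part of the lesson. A tuple behaves just like a list. Looping over a dictionary gives you its keys, and the dictionary’s .items() method gives key-value pairs, which can be unpacked into two loop variables:

```python
# A tuple works just like a list in a for loop.
primes = (2, 3, 5)
for prime in primes:
    print(prime)

# Looping over a dictionary gives you its keys...
codon_table = {'ATG': 'Met', 'TGG': 'Trp', 'TAA': 'Stop'}
for codon in codon_table:
    print(codon)

# ...and .items() gives (key, value) pairs, unpacked into two variables.
for codon, amino_acid in codon_table.items():
    print(f'{codon} -> {amino_acid}')
```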

Indentation

The first line of the for loop must end with a colon, and the body must be indented. Python accepts any consistent indentation, but four spaces is the standard convention recommended by PEP 8. Many editors indent automatically for you and can even convert Tabs into four spaces.

The colon at the end of the first line signals the start of a block of statements.

y = [5, 12, 20]
for x in y:
    print(x)

or

x = 15
if x > 10:
    print(x)

or even further nesting is possible:

y = [5, 12, 20]
for x in y:
    if x > 10:
        print(x)

Indentation is not optional, because Python uses it to decide which statements belong to a block. Notice how this fails:

#Fix me!
for number in [2, 3, 5]:
print(number)

And, likewise, this:

patient1 = "z2910"
  patient2 = "y9583"
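This sketch lets you see both errors side by side without interrupting the rest of your notebook. It passes each broken snippet to Python’s built-in compile() function, which parses code without running it, and catches the IndentationError that results. The list names snippets and errors are illustrative.

```python
# The two broken snippets from above, stored as strings.
snippets = [
    "for number in [2, 3, 5]:\nprint(number)\n",
    'patient1 = "z2910"\n  patient2 = "y9583"\n',
]

errors = []
for source in snippets:
    try:
        # compile() only parses the code; nothing is executed.
        compile(source, '<snippet>', 'exec')
        errors.append(None)
    except IndentationError as error:
        # error.msg holds Python's description of the problem.
        errors.append(error.msg)
        print(f'IndentationError: {error.msg}')
```

The wording of the first message differs slightly between Python versions. From Python 3.10 onwards, it also names the for statement that expected an indented block.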

Variable Naming

Loop variables can be called anything. The names i, j and k are very common defaults because of their long history in other programming languages. As with all variables, loop variables are created on demand, and their names mean nothing to Python, so they can be anything at all.

for kitten in [2, 3, 5]:
    print(kitten)

But meaningless names are bad for the people reading the code. Whenever possible, we should pick useful, accurate variable names that help us remember what’s going on:

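sequences = ['ACTGGTCATCG', 'GATTTGGGG']   # example data
clinic_patients = ['z2910', 'y9583']
dna_sequence = 'ACTG'
for sequence in sequences:
    print(sequence)
for patient in clinic_patients:
    print(patient)
for nucleotide in dna_sequence:
    print(nucleotide)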

Range

You can use range to iterate over a sequence of numbers. It is a built-in function (check help(range)!), so it’s always available without importing anything. The range excludes its stop value: range(N) gives the numbers 0 to N-1, so the result has exactly N elements.

for number in range(10):
    print(number)

In Python, range is a special kind of iterable: its numbers are not created until they are needed.

print(range(5))
print(range(-3, 8)[0:4])

The easiest way to see what numbers are actually in there is to convert it to a list:

print(list(range(5)))
print(list(range(-3, 8)))
print(list(range(0, 10, 2)))
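A range computes its numbers on demand. As a result, len() and the in test work even on enormous ranges without ever building a list. A negative step counts downwards. The numbers below are only illustrations:

```python
# A range knows its length without creating every number.
huge = range(10**12)
print(len(huge))      # 1000000000000
print(999 in huge)    # True, checked arithmetically

# A negative step counts down; the stop value (0) is still excluded.
countdown = list(range(10, 0, -2))
print(countdown)      # [10, 8, 6, 4, 2]
```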

Accumulation

In programming you’ll often want to combine many values into a single result, for example by counting them or adding them up. This is called the Accumulator pattern. You create a variable to store your result, run a loop over some data, and update the result variable in each iteration of that loop.

# Sum the first 10 integers (range(1, 11) gives 1 to 10).
total = 0
for number in range(1, 11):
    total = total + number
print(f"final: {total}")

But how did we get that result? We can add some “debugging” lines to the code above to trace each step. Try adding the following line inside the loop:

print(f'Currently {number}, our total is {total}')

You can add it before you update total, after it, or both! Compare the outputs to understand what’s happening on each line.
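The same pattern works for counting. This sketch counts how many bases in the DNA string from earlier are G or C, adding one to the accumulator each time it finds one. The names gc_count and gc_fraction are illustrative.

```python
dna_string = 'ACTGGTCATCG'

# Accumulator: start at zero, add one for every G or C we see.
gc_count = 0
for base in dna_string:
    if base == 'G' or base == 'C':
        gc_count = gc_count + 1

# Use the accumulated count to work out the GC fraction.
gc_fraction = gc_count / len(dna_string)
print(f'{gc_count} of {len(dna_string)} bases are G or C ({gc_fraction:.0%})')
```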

Python also provides two statements for controlling a loop from inside its body: continue and break.

When Python encounters continue, it skips the rest of the loop body for the current value and moves straight on to the next iteration of the loop.

for letter in 'Galaxy':
    if letter == 'l':
        continue
    print(f'The letters are: {letter}')

When Python encounters break, it exits the loop immediately, skipping any remaining values, and carries on with the first statement after the loop.

for letter in 'Galaxy':
    if letter == 'l':
        break
    print(f'The letters are: {letter}')
print('Done')
# Test break and continue here
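Both statements can also appear in the same loop. In this sketch, the read and the rules are hypothetical. Unknown bases, marked N, are skipped with continue, and the loop stops at the first gap character ('-') with break. The bases that are kept are accumulated into a string:

```python
sequence = 'ACNGT-TTA'   # hypothetical read: N = unknown base, '-' = gap
kept = ''
for base in sequence:
    if base == 'N':
        continue          # skip this base and go on to the next one
    if base == '-':
        break             # stop the loop entirely at the first gap
    kept = kept + base
print(kept)               # ACGT
```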

Exercises

Question: Tracing Execution

Create a table showing the numbers of the lines that are executed when this program runs, and the values of the variables after each line is executed.

total = 0
for char in "tin":
    total = total + 1
Line   Variables
1      total = 0
2      total = 0   char = 't'
3      total = 1   char = 't'
2      total = 1   char = 'i'
3      total = 2   char = 'i'
2      total = 2   char = 'n'
3      total = 3   char = 'n'
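One way to check your table is to let Python record it. After each step, this sketch stores the line number and the variable values as a tuple, with None standing in for char before it exists. The list name trace is illustrative.

```python
trace = []
total = 0
trace.append((1, total, None))       # after line 1: char does not exist yet
for char in "tin":
    trace.append((2, total, char))   # after line 2: char takes the next letter
    total = total + 1
    trace.append((3, total, char))   # after line 3: total has been updated

for line, total_value, char_value in trace:
    print(f'line {line}: total = {total_value}, char = {char_value}')
```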
#Test your code here!
Question: Reversing a String

Fill in the blanks in the program below so that it prints “desserts” (the reverse of the original character string “stressed”).

original = "stressed"
result = ____
for char in original:
    result = ____
print(result)
original = "stressed"
result = ""
for char in original:
    result = char + result
print(result)
# Test your code here!
original = "stressed"
result = ____
for char in original:
    result = ____
print(result)
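Outside this exercise, Python also offers a shortcut for reversing a sequence: a slice with a step of -1 walks over the string backwards.

```python
original = "stressed"
# A slice [start:stop:step] with step -1 walks backwards over the whole string.
print(original[::-1])   # desserts
```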
Question: Practice Accumulating

Fill in the blanks in each of the programs below to produce the indicated result.

# Total length of the strings in the list: ["red", "green", "blue"] => 12
total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word)
print(total)
total = 0
for word in ["red", "green", "blue"]:
    total = total + len(word)
print(total)
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
lengths = ____
for word in ["red", "green", "blue"]:
    lengths.____(____)
print(lengths)
lengths = []
for word in ["red", "green", "blue"]:
    lengths.append(len(word))
print(lengths)
# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
words = ["red", "green", "blue"]
result = ____
for ____ in ____:
    ____
print(result)
words = ["red", "green", "blue"]
result = ""
for word in words:
    result = result + word
print(result)

Create an acronym: Starting from the list ["red", "green", "blue"], create the acronym "RGB" using a for loop.

Hint: You may need to use a string method to properly format the acronym.

acronym = ""
for word in ["red", "green", "blue"]:
    acronym = acronym + word[0].upper()
print(acronym)
#Test your code here!

Cumulative Sum

Reorder and properly indent the lines of code below so that they print a list with the cumulative sum of data. The result should be [1, 3, 5, 10].

cumulative.append(total)
for number in data:
cumulative = []
total += number
total = 0
print(cumulative)
data = [1,2,2,5]
total = 0
data = [1,2,2,5]
cumulative = []
for number in data:
    total += number
    cumulative.append(total)
print(cumulative)
# Test your code here!
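For comparison, the standard library packages this accumulator as itertools.accumulate, which yields the running totals one at a time. Writing the loop yourself, as in the exercise, shows what it does internally.

```python
from itertools import accumulate

data = [1, 2, 2, 5]
# accumulate() yields each running total; list() collects them.
cumulative = list(accumulate(data))
print(cumulative)   # [1, 3, 5, 10]
```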
Question: A classic programmer test: Fizz Buzz

FizzBuzz is a classic “test” question that some job interviews use to screen out candidates who do not really understand programming. Your task is this:

Write a for loop that loops over the numbers 1 to 50.

  • If the number is divisible by 3, write Fizz instead of the number
  • If the number is divisible by 5, write Buzz instead of the number
  • If the number is divisible by 3 and 5 both, write FizzBuzz instead of the number
  • Otherwise, write the number itself.
for i in range(1, 51):  # range excludes the stop value, so use 51 to include 50
    if i % 3 == 0 and i % 5 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)
# Do a FizzBuzz
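A common alternative builds each output string piece by piece. The “divisible by both” case then falls out naturally, because a multiple of 15 passes both checks and collects both words. This is only a sketch, and the list name results is our own choice.

```python
results = []
for i in range(1, 51):        # 51 because range excludes the stop value
    word = ''
    if i % 3 == 0:
        word = word + 'Fizz'
    if i % 5 == 0:
        word = word + 'Buzz'
    # An empty string counts as False, so `or` falls back to the number.
    results.append(word or i)

print(results[:15])
```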
Question: Identifying Item Errors
  1. Read the code below and try to identify what the errors are without running it.
  2. Run the code, and read the error message. What type of error is it?
  3. Fix the error.
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print(f'My favorite season is {seasons[4]}')

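Indexing starts at 0 in Python, so this list of 4 elements has indices 0 to 3. Accessing seasons[4] therefore raises an IndexError (list index out of range). The index of the last element is 3: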

seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print(f'My favorite season is {seasons[3]}')
# Fix me!
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print(f'My favorite season is {seasons[4]}')
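Negative indices count from the end of a list, so seasons[-1] always gives the last element, whatever the list’s length. Equivalently, the last positive index is len(seasons) - 1:

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print(f'My favorite season is {seasons[-1]}')   # Winter
print(f'The last index is {len(seasons) - 1}')  # 3
```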
Question: Correct the errors

This code is completely missing its indentation and needs to be fixed. Can you guess how far each line should be indented?

data = [1, 3, 5, 9]
acc = 0
for i in data:
if i < 4:
acc = acc + i * 2
else:
acc = acc + i
print(f'The value at {i} is {acc}')
print(f'The answer is {acc}')
data = [1, 3, 5, 9]
acc = 0
# There is a : character at the end of this line, so you KNOW the next line
# must be indented.
for i in data:
    # Same here, another :
    if i < 4:
        acc = acc + i * 2
    # And again! Another :
    else:
        acc = acc + i
# But what about these lines?
print(f'The value at {i} is {acc}')
print(f'The answer is {acc}')

Here the code is genuinely ambiguous: we don’t know how far the two print statements should be indented. This very synthetic example lacks good context, but there are three places the first print could go, each with a different effect.

There are two bits of knowledge we can use, however:

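  • the first print uses i, so it most likely belongs inside the loop (outside the loop, it would print only once, for the last value of i)
  • the second print cannot be indented more than the first print (Why? Indenting further would require a block opened by something like for ... : or if ... :.)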

The first option indents the print one level, the same level as if and else. It prints the value once per iteration of the loop, which seems good:

[...]
    else:
        acc = acc + i
    print(f'The value at {i} is {acc}')

The second option indents it two levels. It prints the value only in the else case, not otherwise:

    else:
        acc = acc + i
        print(f'The value at {i} is {acc}')

So the second option is probably wrong, and we should take the first. That leaves two options for the final print: no indentation, or the same level as our first print statement. We probably want to print the final result once, after the loop has finished, so it should not be indented.

data = [1, 3, 5, 9]
acc = 0
for i in data:
    if i < 4:
        acc = acc + i * 2
    else:
        acc = acc + i
    print(f'The value at {i} is {acc}')
print(f'The answer is {acc}')
# This code accidentally lost its indentation! Can you fix it?
data = [1, 3, 5, 9]
acc = 0
for i in data:
if i < 4:
acc = acc + i * 2
else:
acc = acc + i
print(f'The value at {i} is {acc}')
print(f'The answer is {acc}')
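This sketch compares the two candidate placements for the first print directly by running both versions on the same data. Instead of printing, each version appends its message to a list, so the number of messages shows how often each placement runs. The list names option1 and option2 are illustrative.

```python
data = [1, 3, 5, 9]

# Option 1: message at loop-body level (runs on every iteration).
acc = 0
option1 = []
for i in data:
    if i < 4:
        acc = acc + i * 2
    else:
        acc = acc + i
    option1.append(f'The value at {i} is {acc}')

# Option 2: message inside the else block (runs only when i >= 4).
acc = 0
option2 = []
for i in data:
    if i < 4:
        acc = acc + i * 2
    else:
        acc = acc + i
        option2.append(f'The value at {i} is {acc}')

print(option1)
print(option2)
```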
Question: Trimming a FASTQ string

Given a FASTQ string and a list of quality scores, use break to print out just the good part of the DNA sequence and its quality scores.

# We've got a Read
read = """
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
55CCF>>>>>>CCCCCCC65!''*((((***+))%%%++)(%%%%).1***-+*''))**
""".strip().split('\n')

def quality_to_percent(q):
    return 100 * (1 - (10 ** (q / -10)))

sequence = read[1]
quality_scores = [ord(x) - 33 for x in read[3]]

for i in ... # TODO

There are two ways to do this, one you might be able to guess, and one that might be new:

  1. Loop over a range() using len(sequence). Since len(sequence) == len(quality_scores), when we access the Nth position of either, they match up.
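  2. zip(sequence, quality_scores) will loop over both of these together. On each iteration it produces a pair (a tuple) such as ('G', 20), then ('A', 20), then ('T', 34), and so on. In Python 3, zip returns a lazy iterator rather than a list; wrap it in list() to see all the pairs at once.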

The naïve solution is quite easy and readable:

for i in range(len(sequence)):
    if quality_scores[i] < 15:
        break
    print(f'Base {i} = {sequence[i]} with {quality_to_percent(quality_scores[i])}% accuracy')

But we can make this a bit prettier using the zip() function:

for base, score in zip(sequence, quality_scores):
    if score < 15:
        break
    print(f'Base = {base} with {quality_to_percent(score)}% accuracy')

But note that we don’t have the position in the list anymore, so we remove it from the print statement.
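If you want the position back, you can wrap the built-in enumerate() around zip(). It yields (index, item) pairs, and here each item is itself a (base, score) pair. This sketch re-creates the read and helper from the cells above so that it runs on its own. It also accumulates the good bases into a string named good_bases, which is an illustrative name.

```python
# Re-create the read, scores and helper from the cells above.
read = """
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
55CCF>>>>>>CCCCCCC65!''*((((***+))%%%++)(%%%%).1***-+*''))**
""".strip().split('\n')

def quality_to_percent(q):
    return 100 * (1 - (10 ** (q / -10)))

sequence = read[1]
quality_scores = [ord(x) - 33 for x in read[3]]

# enumerate() adds a counter; each item it yields is (index, (base, score)).
good_bases = ''
for i, (base, score) in enumerate(zip(sequence, quality_scores)):
    if score < 15:
        break
    good_bases = good_bases + base
    print(f'Base {i} = {base} with {quality_to_percent(score)}% accuracy')
```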

# We've got a Read
read = """
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
55CCF>>>>>>CCCCCCC65!''*((((***+))%%%++)(%%%%).1***-+*''))**
""".strip().split('\n')

def quality_to_percent(q):
    return 100 * (1 - (10 ** (q / -10)))

# Extract the sequence
sequence = read[1]
# And the quality scores, and map those to the correct values.
quality_scores = [ord(x) - 33 for x in read[3]]

# Write something here
# That loops over BOTH the sequence and Quality Scores.
# And prints them out
# If the quality scores are `<15`, then break and quit printing.
for i in ...
Key points
  • A for loop executes commands once for each value in a collection.

  • A for loop is made up of a collection, a loop variable, and a body.

  • The first line of the for loop must end with a colon, and the body must be indented.

  • Indentation is always meaningful in Python.

  • Loop variables can be called anything (but it is strongly advised to give the loop variable a meaningful name).

  • The body of a loop can contain many statements.

  • Use range to iterate over a sequence of numbers.

  • The Accumulator pattern turns many values into one.

Frequently Asked Questions

Have questions about this tutorial? Check out the FAQ page for the Foundations of Data Science topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum.


Citing this Tutorial

  1. The Carpentries, Helena Rasche, Donny Vrins, Bazante Sanders, 2022 Python - Loops (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-loops/tutorial.html Online; accessed TODAY
  2. Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012


@misc{data-science-python-loops,
author = "The Carpentries and Helena Rasche and Donny Vrins and Bazante Sanders",
title = "Python - Loops (Galaxy Training Materials)",
year = "2022",
month = "09",
day = "28",
url = "\url{https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-loops/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
    doi = {10.1016/j.cels.2018.05.012},
    url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
    year = 2018,
    month = {jun},
    publisher = {Elsevier {BV}},
    volume = {6},
    number = {6},
    pages = {752--758.e1},
    author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
    title = {Community-Driven Data Analysis Training for Biology},
    journal = {Cell Systems}
}
                   

Congratulations on successfully completing this tutorial!