Control Structures

This past weekend was the Super Bowl, so for this post I decided to use a Kaggle data set, put together by Timo Bozsolik, that covers the Super Bowl games from 1967 to the present.

In this post, I’m going to get into control structures. According to DataQuest, control structures are needed whenever code has to be applied repeatedly or an action should run only if a condition is met. Without them, writing code would be very tedious and not as fun.

I admit I struggled with control structures in other languages. However, with R, I finally feel like I have a grasp on this concept. Something clicked in this lesson. So without further delay, let’s get into it!

Conditional Statements

A conditional statement is a type of control structure that performs different calculations or actions depending on whether a predefined condition is TRUE or FALSE. Conditions are usually written with comparison operators such as >, <, or ==. One type of conditional statement is the if statement. Let’s look at the screenshot below. I created a conditional statement that looks at the titles of the Super Bowl games. In this case, the condition is that the first element of the Super Bowl title column is greater than 53, and the action is to print the statement “Jennifer Lopez and Shakira performed at this Super Bowl.”

if statement in R.
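Since the screenshot did not survive, here is a minimal sketch of an if statement in the same spirit. It assumes the titles are stored as text such as "LIV (54)" (an assumption about the data's format), so it first pulls out the game number: comparing a character string directly with 53 would make R compare the two as text rather than as numbers. The name sb_titles is hypothetical.

```r
# Hypothetical stand-in for clean_superbowl$sb: titles stored as text,
# newest game first (the exact title format is an assumption)
sb_titles <- c("LIV (54)", "LIII (53)", "LII (52)")

# Pull out the number inside the parentheses so it can be compared with 53
first_game_number <- as.numeric(sub(".*\\(([0-9]+)\\)", "\\1", sb_titles[1]))

# The action runs only if the condition is TRUE
if (first_game_number > 53) {
  print("Jennifer Lopez and Shakira performed at this Super Bowl.")
}
```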

I could add another type of conditional statement called the else statement. Combining if and else gives what is called an if/else statement: the action in the if statement is executed when its condition is TRUE, and the action in the else statement is executed when that condition is FALSE.

if/else statement in R.
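A minimal if/else sketch, again assuming titles stored as text like "LIII (53)". This time it checks the second title, so the condition is FALSE and the else branch runs. The variable names and the else message are made up for illustration; they are not taken from the original screenshot.

```r
# Hypothetical stand-in for clean_superbowl$sb (title format is an assumption)
sb_titles <- c("LIV (54)", "LIII (53)", "LII (52)")

# This time check the second title, Super Bowl LIII (53)
game_number <- as.numeric(sub(".*\\(([0-9]+)\\)", "\\1", sb_titles[2]))

if (game_number > 53) {
  halftime_note <- "Jennifer Lopez and Shakira performed at this Super Bowl."
} else {
  # Runs only when the condition above is FALSE
  halftime_note <- "Jennifer Lopez and Shakira did not perform at this Super Bowl."
}
print(halftime_note)
```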

There are also nested if/else statements, where one if/else statement is written inside another, like so…

nested else/if statements in R.
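Here is a small nested if/else sketch. The values are typed in by hand for Super Bowl LIV (Chiefs 31, 49ers 20), and the messages and variable names are hypothetical. The inner if/else is only evaluated when the outer condition is TRUE.

```r
# Hand-typed values for one game, Super Bowl LIV
game_number <- 54
winner_pts <- 31
loser_pts <- 20

if (game_number > 53) {
  # Inner if/else runs only when the outer condition is TRUE
  if (winner_pts - loser_pts <= 10) {
    game_summary <- "A recent game with a close finish."
  } else {
    game_summary <- "A recent game with a comfortable win."
  }
} else {
  game_summary <- "An earlier game."
}
print(game_summary)
```

With an 11-point margin, the inner else branch is the one that runs.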

On their own, conditional statements only check one value at a time. If I wanted to run the same check on every game, I would have to write a separate conditional statement for each one, which quickly becomes lengthy and repetitive. Another type of control structure solves this problem of repetition: for-loops.

For-loops

Print Elements of A Sequence

For-loops perform an operation a given number of times, allowing me to execute a piece of code repeatedly on elements of a sequence. For example, let’s say I want to print all the Super Bowl titles. I could write a for-loop that looks like this:

using a for-loop to print elements of a sequence.

Let’s examine this for-loop. The index variable i takes on each element of the sequence in turn. I can read this as “for every element in the sb column of the clean_superbowl data frame, print the element.” Although not shown here, this for-loop printed every single Super Bowl title until it reached the last one (in this case, Super Bowl I (1)).

I don’t always have to use i; I could use any variable name. The variable name should describe what the variable represents, which makes the code more readable. I would also avoid reusing the names of built-in functions. For example, I would not use sum as a variable name: R still finds the sum() function when I call it, but having an object and a function with the same name makes the code confusing to read.
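Since the screenshot of this loop is missing, here is a minimal sketch under assumptions: a four-title stand-in for the sb column (the title format is an assumption, and the real data has 54 rows), looped over first with i and then with the more descriptive name title.

```r
# Four-title stand-in for clean_superbowl, newest game first
# (title format is an assumption)
clean_superbowl <- data.frame(
  sb = c("LIV (54)", "LIII (53)", "LII (52)", "I (1)")
)

# Generic index name: i takes each title in turn
for (i in clean_superbowl$sb) {
  print(i)
}

# Same loop with a descriptive name, which reads more clearly
for (title in clean_superbowl$sb) {
  print(title)
}
```

Both loops print the same four titles; only the readability changes.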

Looping Over Rows in a Data Frame

When writing a for-loop, the elements I loop over can be values, vectors, or other data structures. In this next example, I wrote a for-loop that performs an operation on each row of a data frame: it calculates how many points the winning team won by in each game.

looping over a data frame

Let’s break this down further, starting with the first line of code. In the clean_superbowl data frame, each game has its own row. Since I want to perform the subtraction for each row of the data frame, the first part of the for-loop defines i as an element of the sequence of numbers from one to fifty-four (the number of rows in the data frame).

nrow(clean_superbowl) returns the number of rows in the data frame, and the print() function displays each result.
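A row-by-row sketch of the same idea, using four real scores typed in by hand as a stand-in for the full data. It uses seq_len(nrow(...)) rather than 1:nrow(...) (a swap-in, not necessarily what the screenshot showed): both give 1 through the number of rows here, but if a data frame ever had zero rows, 1:0 would produce c(1, 0) and run the loop twice, while seq_len(0) runs it zero times.

```r
# Four-game stand-in for clean_superbowl (real scores, hand-typed)
clean_superbowl <- data.frame(
  sb = c("LIV (54)", "LIII (53)", "LII (52)", "I (1)"),
  winner_pts = c(31, 13, 41, 35),
  loser_pts = c(20, 3, 33, 10)
)

# i runs over the row numbers 1, 2, ..., nrow(clean_superbowl)
for (i in seq_len(nrow(clean_superbowl))) {
  # Winning margin for the game in row i
  print(clean_superbowl$winner_pts[i] - clean_superbowl$loser_pts[i])
}
```

For this particular calculation, R can also subtract the two columns in one step with clean_superbowl$winner_pts - clean_superbowl$loser_pts; the loop is useful for seeing how iteration works.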

Nested Control Structures

As I mentioned previously, placing one or more control structures inside another is called nesting. A for-loop can also contain conditional statements. In this example, I used a for-loop that, for each row in the clean_superbowl data frame, prints “Aww it was so close!” if the difference between the winner points and the loser points is less than 10, and “A Total Blowout!” if not.

nested control structures using a for-loop.
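A minimal sketch of an if/else nested inside a for-loop, on the same hand-typed four-game stand-in. The intermediate variable margin is an addition for readability.

```r
# Four-game stand-in for clean_superbowl (real scores, hand-typed)
clean_superbowl <- data.frame(
  sb = c("LIV (54)", "LIII (53)", "LII (52)", "I (1)"),
  winner_pts = c(31, 13, 41, 35),
  loser_pts = c(20, 3, 33, 10)
)

for (i in seq_len(nrow(clean_superbowl))) {
  margin <- clean_superbowl$winner_pts[i] - clean_superbowl$loser_pts[i]
  # The if/else is evaluated once per row
  if (margin < 10) {
    print("Aww it was so close!")
  } else {
    print("A Total Blowout!")
  }
}
```

Because the test is strictly less than 10, Super Bowl LIII (13 to 3, a 10-point margin) lands in the "blowout" branch.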

Storing For-Loop Output in Objects

Though I can print the output of my for-loop, I ultimately want to store it in an object so that I can use it later. In this next example, I wanted to calculate the total number of points scored in each Super Bowl game. I first created an empty vector, total_points_scored. I then wrote the for-loop to add new elements to the vector. The new elements are the sums of winner_pts and loser_pts for each game.

storing a for-loop output in a vector.

After running the for-loop, the total_points_scored vector contains the sum of winner_pts and loser_pts for each game.
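A sketch of storing loop output, on the hand-typed four-game stand-in. One change from the approach described above: instead of starting from an empty vector and appending, it preallocates one slot per game with numeric(), which avoids growing the vector one element at a time. The final line checks the loop against R's built-in element-wise addition.

```r
# Four-game stand-in for clean_superbowl (real scores, hand-typed)
clean_superbowl <- data.frame(
  sb = c("LIV (54)", "LIII (53)", "LII (52)", "I (1)"),
  winner_pts = c(31, 13, 41, 35),
  loser_pts = c(20, 3, 33, 10)
)

# Preallocate a numeric vector with one slot per game
total_points_scored <- numeric(nrow(clean_superbowl))

for (i in seq_len(nrow(clean_superbowl))) {
  # Fill slot i with that game's combined score
  total_points_scored[i] <- clean_superbowl$winner_pts[i] + clean_superbowl$loser_pts[i]
}

print(total_points_scored)

# Vectorized equivalent: R adds the two columns element by element
all.equal(total_points_scored, clean_superbowl$winner_pts + clean_superbowl$loser_pts)
```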

More Than Two Cases

There are times when my code needs to handle more than two outcomes. This is where selection control statements come in: by adding else if statements between my if and else statements, I can specify more than two outcomes. Let’s say I wanted to specify three conditions: games won by 10 points or fewer, games won by 11 to 20 points, and games won by more than 20 points. I can write the following:

More than two conditions using a for loop

As you can see, each row printed a statement based on the condition I specified.
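A sketch of the three-way split with an if / else if / else chain, on the hand-typed four-game stand-in. The labels and the margin_category vector are hypothetical. Because the conditions are checked in order, the else if branch is reached only when the margin is above 10, and the final else catches everything above 20, so no margin falls through a gap.

```r
# Four-game stand-in for clean_superbowl (real scores, hand-typed)
clean_superbowl <- data.frame(
  sb = c("LIV (54)", "LIII (53)", "LII (52)", "I (1)"),
  winner_pts = c(31, 13, 41, 35),
  loser_pts = c(20, 3, 33, 10)
)

margin_category <- character(nrow(clean_superbowl))

for (i in seq_len(nrow(clean_superbowl))) {
  margin <- clean_superbowl$winner_pts[i] - clean_superbowl$loser_pts[i]
  if (margin <= 10) {
    margin_category[i] <- "won by 10 points or fewer"
  } else if (margin <= 20) {
    # Reached only when margin > 10, so this covers 11 to 20 points
    margin_category[i] <- "won by 11 to 20 points"
  } else {
    # Everything left over: more than 20 points
    margin_category[i] <- "won by more than 20 points"
  }
  print(paste(clean_superbowl$sb[i], margin_category[i]))
}
```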

One Last Note

Before I sign off, I want to introduce an R package I learned about a few days ago: the janitor package. It has simple functions for examining and cleaning data, and it came in handy for cleaning up the column names in this data set.

Cleaning data using janitor package in R.

I first installed the janitor package. Second, I loaded the package using the library() function shown in the above photo. Then, I imported the data. Next, I created a new object called clean_superbowl that has the cleaned names. The last line of code gives you a side-by-side comparison of the column names of the imported data and the cleaned data frame. You can find more information about the janitor package here.
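Here is a self-contained sketch of those steps. It skips reading the Kaggle file and instead builds a tiny hypothetical data frame whose column names ("SB", "Winner Pts", "Loser Pts") are an assumption about how the raw file is labeled. clean_names() is janitor's function for converting column names to snake_case, and cbind() is one way (not necessarily the one in the screenshot) to line up the old and new names.

```r
# install.packages("janitor")  # run once if the package is not installed
library(janitor)

# Hypothetical raw data with spaces and capitals in the column names;
# check.names = FALSE keeps the names exactly as written
superbowl <- data.frame(
  "SB" = c("LIV (54)", "LIII (53)"),
  "Winner Pts" = c(31, 13),
  "Loser Pts" = c(20, 3),
  check.names = FALSE
)

# Convert the column names to lowercase snake_case
clean_superbowl <- clean_names(superbowl)

# Side-by-side comparison of the original and cleaned column names
cbind(original = names(superbowl), cleaned = names(clean_superbowl))
```

The cleaned names (sb, winner_pts, loser_pts) are the ones used throughout this post.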

Okay, that’s all for this post. I’m signing off for now! Until next time…
