When working with data in R, it’s essential to have it organized in a clear and consistent format. This makes analysis easier and more reliable. The tidyr package in R helps us achieve this by providing simple functions to tidy up our data.

What is Tidy Data?

Tidy data means that:

  1. Each variable has its own column. For example, if you’re recording information about students, variables could be “Name,” “Age,” and “Grade,” each in separate columns.
  2. Each observation has its own row. Each student would have their own row with their respective information.
  3. Each value has its own cell. The intersection of a row and column should contain a single value, like a student’s age or grade.
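
As a small illustration of these three rules, the sketch below builds a tidy table in base R. The student names, ages, and grades are made-up values used only for illustration.

```r
# Hypothetical student records: one column per variable, one row per student
students_info <- data.frame(
  Name  = c("Alice", "Bob"),
  Age   = c(14, 15),
  Grade = c("A", "B")
)

# Each cell holds exactly one value, for example Bob's age
bob_age <- students_info$Age[students_info$Name == "Bob"]

print(students_info)
print(bob_age)
```

Because every variable is a column and every student is a row, a single value can be looked up by naming the column and filtering the rows.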

Installing and Loading tidyr

Before using tidyr, you need to install and load it into your R environment:

install.packages("tidyr")  # Install tidyr
library(tidyr)             # Load tidyr

Common Functions in tidyr

Here are some basic functions in tidyr that help in tidying data:

1. pivot_longer()

This function transforms data from a wide format to a long format. It’s useful when several columns hold values of the same variable, such as one score column per subject.

Example:

Suppose you have a dataset of students’ scores in different subjects:

Name   Math  Science  English
Alice  85    90       88
Bob    78    82       85

To tidy this data:

library(tidyr)

# Original data
students <- data.frame(
  Name = c("Alice", "Bob"),
  Math = c(85, 78),
  Science = c(90, 82),
  English = c(88, 85)
)

# Use pivot_longer to tidy the data
tidy_students <- pivot_longer(students, cols = Math:English, names_to = "Subject", values_to = "Score")

print(tidy_students)

The result will be:

Name   Subject  Score
Alice  Math     85
Alice  Science  90
Alice  English  88
Bob    Math     78
Bob    Science  82
Bob    English  85

Now, each row represents a single observation of a student’s score in a subject.
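
The cols argument accepts the usual tidyverse selection syntax, so the score columns can also be chosen by excluding Name rather than naming a range. The sketch below rebuilds the students table and compares the two selections, then uses the long format for a per-subject average with base R's tapply(). The variable names tidy_by_range, tidy_by_exclusion, and mean_scores are only for illustration.

```r
library(tidyr)

students <- data.frame(
  Name = c("Alice", "Bob"),
  Math = c(85, 78),
  Science = c(90, 82),
  English = c(88, 85)
)

# Select the score columns as a range ...
tidy_by_range <- pivot_longer(students, cols = Math:English,
                              names_to = "Subject", values_to = "Score")

# ... or as "every column except Name"
tidy_by_exclusion <- pivot_longer(students, cols = -Name,
                                  names_to = "Subject", values_to = "Score")

# Compare the two results
print(all.equal(tidy_by_range, tidy_by_exclusion))

# With one score per row, a per-subject average is a one-liner
mean_scores <- tapply(tidy_by_range$Score, tidy_by_range$Subject, mean)
print(mean_scores)
```

Excluding the identifier column tends to be more robust when new subject columns are added later, since the range does not need to be updated.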

2. pivot_wider()

This function does the opposite of pivot_longer(). It transforms data from a long format back to a wide format.

Example:

Using the tidy_students data from above:

# Use pivot_wider to spread the data
wide_students <- pivot_wider(tidy_students, names_from = Subject, values_from = Score)

print(wide_students)

The result will be:

Name   Math  Science  English
Alice  85    90       88
Bob    78    82       85

This returns the data to its original wide format.
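
Long data does not always contain every combination of student and subject. In that case pivot_wider() leaves the missing cells as NA, and the values_fill argument can supply a default instead. The sketch below assumes Bob has no English score recorded and uses 0 as the fill value; both choices are made up purely for illustration.

```r
library(tidyr)

# Long-format scores with one combination (Bob, English) missing
scores_long <- data.frame(
  Name    = c("Alice", "Alice", "Alice", "Bob", "Bob"),
  Subject = c("Math", "Science", "English", "Math", "Science"),
  Score   = c(85, 90, 88, 78, 82)
)

# Default behaviour: the missing cell becomes NA
wide_na <- pivot_wider(scores_long, names_from = Subject, values_from = Score)

# values_fill replaces missing cells with a chosen default
wide_filled <- pivot_wider(scores_long, names_from = Subject,
                           values_from = Score, values_fill = 0)

print(wide_na)
print(wide_filled)
```

Whether 0 is a sensible default depends on the data; for scores, leaving the cell as NA is often the more honest choice.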

3. separate()

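This function splits a single column into multiple columns based on a separator. Since tidyr 1.3.0, separate() is marked as superseded in favour of separate_wider_delim() and related functions, but it still works and is used here for simplicity.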

Example:

Suppose you have a dataset with full names:

FullName
Alice Johnson
Bob Smith

To separate the full names into first and last names:

# Original data (named full_names to avoid masking base R's names() function)
full_names <- data.frame(
  FullName = c("Alice Johnson", "Bob Smith")
)

# Use separate to split the FullName column at the space
separated_names <- separate(full_names, col = FullName,
                            into = c("FirstName", "LastName"), sep = " ")

print(separated_names)

The result will be:

FirstName  LastName
Alice      Johnson
Bob        Smith

Now, the full names are split into two separate columns.
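
Real name columns often contain more than one space. By default, separate() keeps only as many pieces as there are names in into and warns about the rest; setting extra = "merge" keeps the remainder in the last column instead. A minimal sketch with made-up names (the people and split_people tables are hypothetical):

```r
library(tidyr)

# Hypothetical names, one of which has three parts
people <- data.frame(
  FullName = c("Alice Johnson", "Mary Ann Lee")
)

# Split at the first space only; anything left over stays in LastName
split_people <- separate(people, col = FullName,
                         into = c("FirstName", "LastName"),
                         sep = " ", extra = "merge")

print(split_people)
```

This only controls where the split happens. Whether "Ann" belongs to the first or the last name is a data question that no separator rule can answer on its own.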

4. unite()

This function combines multiple columns into a single column.

Example:

Using the separated_names data from above:

# Use unite to combine FirstName and LastName
united_names <- unite(separated_names, col = FullName, FirstName, LastName, sep = " ")

print(united_names)

The result will be:

FullName
Alice Johnson
Bob Smith

This combines the first and last names back into a single column.
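
By default, unite() drops the columns it combines. If the original pieces are still needed, remove = FALSE keeps them next to the new column. A minimal sketch, rebuilding the two name columns so it runs on its own (name_parts and with_full are illustrative names):

```r
library(tidyr)

name_parts <- data.frame(
  FirstName = c("Alice", "Bob"),
  LastName  = c("Johnson", "Smith")
)

# remove = FALSE keeps FirstName and LastName alongside the new FullName
with_full <- unite(name_parts, col = "FullName", FirstName, LastName,
                   sep = " ", remove = FALSE)

print(with_full)
```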