Tidying Data with tidyr - Sangy Academy

When working with data in R, it’s essential to have it organized in a clear and consistent format. This makes analysis easier and more reliable. The tidyr package in R helps us achieve this by providing simple functions to tidy up our data.

What is Tidy Data?

Tidy data means that:

Each variable has its own column. For example, if you’re recording information about students, variables could be “Name,” “Age,” and “Grade,” each in separate columns.
Each observation has its own row. Each student would have their own row with their respective information.
Each value has its own cell. The intersection of a row and column should contain a single value, like a student’s age or grade.
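
As a minimal sketch of the three rules above, the following base R snippet contrasts an untidy layout with a tidy one. The student records and the object names `untidy_info` and `tidy_info` are invented for illustration only.

```r
# Untidy: the AgeGrade column packs two values into each cell.
untidy_info <- data.frame(
  Name     = c("Alice", "Bob"),
  AgeGrade = c("14/9", "15/10")
)

# Tidy: one column per variable, one row per student, one value per cell.
tidy_info <- data.frame(
  Name  = c("Alice", "Bob"),
  Age   = c(14, 15),
  Grade = c(9, 10)
)

# Inspect the structure: three variables, two observations.
str(tidy_info)
```

In the tidy version, Age and Grade are numeric columns that can be summarised directly, whereas the packed AgeGrade strings would first have to be split apart.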

Installing and Loading tidyr

Before using tidyr, install it once and then load it in each R session where you need it:

install.packages("tidyr")  # Install tidyr
library(tidyr)             # Load tidyr
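
Because installation is only needed once, a script that is run repeatedly may prefer to install the package conditionally. The following is one possible sketch of that pattern, not the only way to do it:

```r
# Install tidyr only when it is not already available on this machine.
if (!requireNamespace("tidyr", quietly = TRUE)) {
  install.packages("tidyr")
}

# Load the package for the current session.
library(tidyr)

# Show which version is installed.
packageVersion("tidyr")
```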

Common Functions in tidyr

Here are some basic functions in tidyr that help in tidying data:

1. `pivot_longer()`

This function transforms data from a wide format to a long format. It’s useful when several columns hold values of the same variable, such as one score column per subject.

Example:

Suppose you have a dataset of students’ scores in different subjects:

Name	Math	Science	English
Alice	85	90	88
Bob	78	82	85

To tidy this data:

library(tidyr)

# Original data
students <- data.frame(
  Name = c("Alice", "Bob"),
  Math = c(85, 78),
  Science = c(90, 82),
  English = c(88, 85)
)

# Use pivot_longer to tidy the data
tidy_students <- pivot_longer(students, cols = Math:English, names_to = "Subject", values_to = "Score")

print(tidy_students)

The result will be:

Name	Subject	Score
Alice	Math	85
Alice	Science	90
Alice	English	88
Bob	Math	78
Bob	Science	82
Bob	English	85

Now, each row represents a single observation of a student’s score in a subject.
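
The `cols` argument accepts other selection styles as well. As a sketch, the same reshaping can be written by excluding the identifier column instead of naming the range of subject columns; the object name `tidy_by_exclusion` is only illustrative.

```r
library(tidyr)

# Same example data as above.
students <- data.frame(
  Name = c("Alice", "Bob"),
  Math = c(85, 78),
  Science = c(90, 82),
  English = c(88, 85)
)

# Pivot every column except Name into Subject/Score pairs.
tidy_by_exclusion <- pivot_longer(
  students,
  cols = !Name,
  names_to = "Subject",
  values_to = "Score"
)

print(tidy_by_exclusion)
```

Selecting by exclusion can be convenient when new subject columns may be added later, since they are picked up without changing the call.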

2. `pivot_wider()`

This function does the opposite of pivot_longer(). It transforms data from a long format back to a wide format.

Example:

Using the tidy_students data from above:

# Use pivot_wider to spread the data
wide_students <- pivot_wider(tidy_students, names_from = Subject, values_from = Score)

print(wide_students)

The result will be:

Name	Math	Science	English
Alice	85	90	88
Bob	78	82	85

This returns the data to its original wide layout. Note that the result is a tibble rather than a base data frame.
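
To check the round trip yourself, one option is to convert the result back to a plain data frame and compare it with the starting data. This is a sketch that rebuilds the example from scratch; the name `round_trip` is only illustrative.

```r
library(tidyr)

students <- data.frame(
  Name = c("Alice", "Bob"),
  Math = c(85, 78),
  Science = c(90, 82),
  English = c(88, 85)
)

# Wide -> long -> wide.
tidy_students <- pivot_longer(students, cols = Math:English,
                              names_to = "Subject", values_to = "Score")
wide_students <- pivot_wider(tidy_students,
                             names_from = Subject, values_from = Score)

# Convert the tibble to a base data frame before comparing.
round_trip <- as.data.frame(wide_students)

# Compare column names and values with the original data.
identical(names(round_trip), names(students))
all.equal(round_trip, students)
```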

3. `separate()`

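This function splits a single column into multiple columns based on a separator. Since tidyr 1.3.0, separate() is marked as superseded in favor of separate_wider_delim() and related functions, but it still works as shown here.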

Example:

Suppose you have a dataset with full names:

FullName
Alice Johnson
Bob Smith

To separate the full names into first and last names:

# Original data (named full_names to avoid masking base R's names() function)
full_names <- data.frame(
  FullName = c("Alice Johnson", "Bob Smith")
)

# Use separate to split the FullName column at the space
separated_names <- separate(full_names, col = FullName, into = c("FirstName", "LastName"), sep = " ")

print(separated_names)

The result will be:

FirstName	LastName
Alice	Johnson
Bob	Smith

Now, the full names are split into two separate columns.
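
Real names do not always contain exactly one space. As a sketch of one way to handle that, the `extra = "merge"` argument tells separate() to split only as many times as there are target columns and keep the remainder together. The data below, including the name "Mary Ann Lee", is invented for illustration.

```r
library(tidyr)

# Hypothetical data: the second name contains two spaces.
long_names <- data.frame(
  FullName = c("Alice Johnson", "Mary Ann Lee")
)

# Split at the first space only; everything after it goes to LastName.
split_names <- separate(
  long_names,
  col = FullName,
  into = c("FirstName", "LastName"),
  sep = " ",
  extra = "merge"
)

print(split_names)
```

Without `extra = "merge"`, the default behaviour is to warn and drop the pieces that do not fit into the requested columns.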

4. `unite()`

This function combines multiple columns into a single column.

Example:

Using the separated_names data from above:

# Use unite to combine FirstName and LastName
united_names <- unite(separated_names, col = FullName, FirstName, LastName, sep = " ")

print(united_names)

The result will be:

FullName
Alice Johnson
Bob Smith

This combines the first and last names back into a single column.
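
By default unite() drops the columns it combines. If you want to keep them alongside the new column, a possible approach is to pass `remove = FALSE`, as in this sketch; the object name `kept_names` is only illustrative.

```r
library(tidyr)

# Same first and last names as in the example above.
separated_names <- data.frame(
  FirstName = c("Alice", "Bob"),
  LastName = c("Johnson", "Smith")
)

# Build FullName but keep FirstName and LastName in the result.
kept_names <- unite(
  separated_names,
  col = "FullName",
  FirstName, LastName,
  sep = " ",
  remove = FALSE
)

print(kept_names)
```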
