When learning the R programming language, you will often work with data. Data can come in many shapes and sizes, such as numbers, text, or lists. In R, there is a special way to organize data into tables. These tables are easy to work with and are called tibbles.
Tibbles are an improved version of data frames in R. If you already know what a data frame is, think of a tibble as a smarter and more user-friendly version. If you don’t know what a data frame is, don’t worry! We’ll explain tibbles from the beginning.
What Is a Tibble?
A tibble is like a spreadsheet. Imagine a table with rows and columns, where:
- Each column has a name (like “Name”, “Age”, or “Salary”).
- Each row is a single observation or entry (like information about one person).
For example, here is a simple table:
Name | Age | Salary |
---|---|---|
Alice | 25 | 50000 |
Bob | 30 | 60000 |
Charlie | 35 | 70000 |
A tibble stores data in this same row-and-column layout.
Why Use Tibbles?
Tibbles are often more convenient than regular data frames because:
- They are easier to read.
  - By default, printing a tibble shows only the first 10 rows and as many columns as fit on your screen. This avoids overwhelming you with too much information.
- They don’t change data types automatically.
  - With data frames, text could silently turn into factors (a type of category in R); this was the default behavior of data.frame() before R 4.0.0. Tibbles keep the data exactly as it is.
- They are more predictable.
  - Tibbles don’t surprise you with hidden rules. For example, if you select one column with single square brackets, the result is still a tibble, not a plain vector.
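The point about predictability can be seen in a short comparison. This is a minimal sketch, assuming the tibble package is installed; the names df and tb are made up for illustration.

```r
library(tibble)

# The same data stored two ways (df and tb are illustrative names)
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
tb <- tibble(Name = c("Alice", "Bob"), Age = c(25, 30))

# Selecting one column with [ , ] turns a data frame into a plain vector
class(df[, "Age"])

# The same selection on a tibble still gives a tibble
class(tb[, "Age"])
```

The first call reports a numeric vector, while the second reports the tibble classes, so code written for tibbles does not have to handle two different result types.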
How to Create a Tibble
To create a tibble, you can use the tibble() function from the tibble package. Let’s make the table above as a tibble.
# Load the tibble package
library(tibble)
# Create a tibble
data <- tibble(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Salary = c(50000, 60000, 70000)
)
# Print the tibble
data
When you run this code, R will display the tibble like this:
# A tibble: 3 × 3
  Name      Age Salary
  <chr>   <dbl>  <dbl>
1 Alice      25  50000
2 Bob        30  60000
3 Charlie    35  70000
- <chr> means the column contains text (character data).
- <dbl> means the column contains numbers (double-precision values).
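If your data already lives in a regular data frame, you do not have to retype it. The sketch below assumes the tibble package is loaded and uses its as_tibble() and is_tibble() functions; the names df and tb are illustrative only.

```r
library(tibble)

# A regular data frame (df is an illustrative name)
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35)
)

# Convert the data frame to a tibble
tb <- as_tibble(df)

# Check which of the two objects is a tibble
is_tibble(df)
is_tibble(tb)
```

Only the converted object tb is reported as a tibble; the original data frame df is left unchanged.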
How to Work with Tibbles
1. Access Columns
You can get a single column using the $ symbol:
# Get the Age column
data$Age
This will show:
[1] 25 30 35
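Square brackets offer two more ways to reach a column, and they behave differently. The following is a small sketch that rebuilds a shortened version of the example tibble so it runs on its own.

```r
library(tibble)

data <- tibble(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35)
)

# Double brackets return the column as a plain vector, just like $
ages <- data[["Age"]]
ages

# Single brackets return a tibble that has only that one column
age_table <- data["Age"]
age_table
```

Use $ or double brackets when you need the values themselves (for example, to compute a mean), and single brackets when you want to keep working with a table.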
2. Add a New Column
You can add a new column by assigning values to it:
# Add a new column called Bonus
data$Bonus <- c(5000, 6000, 7000)
data
Now, the tibble looks like this:
# A tibble: 3 × 4
  Name      Age Salary Bonus
  <chr>   <dbl>  <dbl> <dbl>
1 Alice      25  50000  5000
2 Bob        30  60000  6000
3 Charlie    35  70000  7000
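A new column can also be calculated from existing ones instead of typed by hand. The sketch below uses mutate() from the dplyr package, which is one common option rather than the only one; the 10% bonus rule and the name with_bonus are assumptions made for this example.

```r
library(tibble)
library(dplyr)

data <- tibble(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Salary = c(50000, 60000, 70000)
)

# Assumed rule for illustration: the bonus is one tenth of the salary
with_bonus <- mutate(data, Bonus = Salary / 10)
with_bonus
```

Unlike assigning with $, mutate() returns a new tibble and leaves the original data untouched, so the result has to be stored under a name if you want to keep it.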
3. Filter Rows
To filter rows, use the filter() function from the dplyr package:
# Load dplyr package
library(dplyr)
# Filter rows where Age is greater than 25
filtered_data <- filter(data, Age > 25)
filtered_data
This will show:
# A tibble: 2 × 4
  Name      Age Salary Bonus
  <chr>   <dbl>  <dbl> <dbl>
1 Bob        30  60000  6000
2 Charlie    35  70000  7000
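filter() also accepts several conditions at once; conditions separated by commas must all be true for a row to be kept. Here is a minimal sketch that rebuilds the example tibble so it runs on its own; the name mid_range is just an illustration.

```r
library(tibble)
library(dplyr)

data <- tibble(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Salary = c(50000, 60000, 70000)
)

# Keep rows where Age is above 25 AND Salary is below 70000
mid_range <- filter(data, Age > 25, Salary < 70000)
mid_range
```

Only Bob satisfies both conditions, so the result is a tibble with a single row.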