Introduction to Loops and Vectorization in R

What Are Loops and Vectorization?

In R, loops execute a block of code repeatedly, while vectorization applies an operation to an entire vector or array at once, without an explicit loop. Understanding both helps you write efficient, readable R programs.

Loops in R

Loops are control structures that repeat a set of instructions multiple times. R supports several types of loops, including for, while, and repeat loops.

The for Loop

A for loop iterates over a sequence, executing a block of code for each element.

Syntax:

for (variable in sequence) {
  # Code to execute
}

Example: Printing Numbers from 1 to 5

for (i in 1:5) {
  print(i)
}

Output:

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
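
A for loop can walk over the elements of any vector, not only a numeric range. The sketch below uses a hypothetical character vector, fruits, which is not part of the examples above. It also uses seq_along() to produce index positions; unlike 1:length(x), seq_along() yields an empty sequence for an empty vector, so the loop body is simply skipped.

```r
# Hypothetical example data, chosen only for illustration
fruits <- c("apple", "banana", "cherry")

# Iterate over the elements directly
for (fruit in fruits) {
  print(fruit)
}

# Iterate over index positions when the position itself is needed
labels <- character(length(fruits))
for (i in seq_along(fruits)) {
  labels[i] <- paste(i, fruits[i])
}
print(labels)

# seq_along() on an empty vector is empty, so this body never runs
empty <- character(0)
n_iterations <- 0
for (i in seq_along(empty)) {
  n_iterations <- n_iterations + 1
}
```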

The while Loop

A while loop repeats its body as long as a specified condition evaluates to TRUE. The condition is checked before each iteration, so the body may not run at all.

Syntax:

while (condition) {
  # Code to execute
}

Example: Printing Numbers Until a Condition is Met

x <- 1
while (x <= 5) {
  print(x)
  x <- x + 1
}
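
A while loop is most useful when the number of iterations is not known in advance. As a minimal sketch, the code below halves a value until it drops below 1 and counts the steps; the names value and steps, and the starting value of 100, are illustrative assumptions.

```r
value <- 100   # illustrative starting value
steps <- 0     # number of halvings performed so far

# Keep halving until the value falls below 1
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}

print(steps)
print(value)
```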

The repeat Loop

A repeat loop runs indefinitely until explicitly stopped using break.

Syntax:

repeat {
  # Code to execute
  if (condition) {
    break
  }
}

Example: Using repeat to Print Numbers

x <- 1
repeat {
  print(x)
  x <- x + 1
  if (x > 5) {
    break
  }
}
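
Because repeat has no condition of its own, its body always runs at least once, and a missing or unreachable break produces an infinite loop. One common safeguard, sketched here with a hypothetical max_iterations cap, is to combine the real stopping condition with an iteration limit.

```r
x <- 1
iterations <- 0
max_iterations <- 1000   # hypothetical safety cap, not a built-in R setting

repeat {
  x <- x * 2
  iterations <- iterations + 1
  # Stop when the target is passed or the safety cap is reached
  if (x > 100 || iterations >= max_iterations) {
    break
  }
}

print(x)
print(iterations)
```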

Breaking and Skipping Iterations

  • break exits the innermost enclosing loop immediately.
  • next skips the rest of the current iteration and moves on to the next one.

Example: Skipping Even Numbers

for (i in 1:10) {
  if (i %% 2 == 0) {
    next
  }
  print(i)
}

Output:

[1] 1
[1] 3
[1] 5
[1] 7
[1] 9
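
For comparison, here is a small sketch of break. It scans a hypothetical vector, values, and stops at the first element greater than 10; the remaining elements are never inspected.

```r
values <- c(3, 8, 12, 5, 20)   # hypothetical data for illustration
first_big <- NA                # stays NA if no element qualifies

for (v in values) {
  if (v > 10) {
    first_big <- v
    break   # leave the loop as soon as a match is found
  }
}

print(first_big)
```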

Vectorization in R

Many R functions and operators are vectorized: they operate on an entire vector in a single call rather than on one element at a time.

Vectorized Operations vs Loops

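Explicit loops in R can be slow for large inputs because each iteration is evaluated by the interpreter. Vectorized operations do the same work in compiled code, so they are usually faster and more concise.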

Example: Squaring Numbers (Loop vs Vectorized Approach)

Using a for loop:

numbers <- c(1, 2, 3, 4, 5)
squared <- numeric(length(numbers))

for (i in seq_along(numbers)) {
  squared[i] <- numbers[i]^2
}
print(squared)

Using vectorization:

numbers <- c(1, 2, 3, 4, 5)
squared <- numbers^2
print(squared)

Both return:

[1]  1  4  9 16 25

Vectorized Functions

Many built-in R functions are vectorized, allowing them to operate on entire vectors at once.

Example: Applying Mathematical Functions

x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)

sum_result <- x + y  # Element-wise addition
log_result <- log(x)  # Logarithm applied to each element
sqrt_result <- sqrt(y)  # Square root applied to each element
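
The same element-wise behavior extends to comparisons, conditional logic, and subsetting. The sketch below reuses x and y from the example above; the names is_even, parity, large, and total are introduced here only for illustration.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)

# A comparison returns one logical value per element
is_even <- x %% 2 == 0

# ifelse() chooses a value element by element
parity <- ifelse(is_even, "even", "odd")

# A logical vector can be used to subset another vector
large <- x[x > 2]

# Vectorized results can be passed straight to summary functions
total <- sum(x + y)

print(parity)
print(large)
print(total)
```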

apply(), sapply(), and lapply()

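Functions like apply(), sapply(), and lapply() call a function on each element (or each row or column) of a data structure and collect the results. They still iterate internally, so they are not necessarily faster than a well-written for loop, but they are often more concise and avoid manual bookkeeping of the result.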

Example: Using sapply() to Compute Factorial

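# factorial() is already vectorized, so factorial(1:5) gives the same result;
# sapply() is used here to show the calling pattern.
factorial_vector <- sapply(1:5, factorial)
print(factorial_vector)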

Output:

[1]   1   2   6  24 120
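
The three functions differ mainly in what they iterate over and what they return. As a brief sketch: lapply() always returns a list, sapply() simplifies the result to a vector when it can, and apply() works over the rows or columns of a matrix. The matrix m and the anonymous squaring function are illustrative choices.

```r
# lapply() returns a list with one element per input
squares_list <- lapply(1:3, function(n) n^2)

# sapply() simplifies the same result to a numeric vector
squares_vec <- sapply(1:3, function(n) n^2)

# apply() iterates over a matrix margin: 1 = rows, 2 = columns
m <- matrix(1:6, nrow = 2)   # filled by column: rows are 1 3 5 and 2 4 6
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)

print(squares_vec)
print(row_sums)
print(col_sums)
```

For sums and means of rows or columns specifically, the built-in rowSums(), colSums(), rowMeans(), and colMeans() are vectorized alternatives to apply().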

Performance Comparison: Loops vs Vectorization

Using system.time() to compare execution times:

# Using a loop
system.time({
  result <- numeric(100000)
  for (i in 1:100000) {
    result[i] <- i^2
  }
})

# Using vectorization
system.time({
  result <- (1:100000)^2
})

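The vectorized version typically finishes in a fraction of the loop's time, although exact timings depend on hardware and R version. When a vectorized operation is available, it is usually the better choice for large inputs.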
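
Before comparing speed, it is worth confirming that both approaches produce the same values. The sketch below does that and then stores the elapsed times so they can be compared directly; the variable names are illustrative, and no timings are shown because they vary from machine to machine.

```r
n <- 100000

# Loop version with a preallocated result vector
loop_time <- system.time({
  loop_result <- numeric(n)
  for (i in seq_len(n)) {
    loop_result[i] <- i^2
  }
})["elapsed"]

# Vectorized version
vec_time <- system.time({
  vec_result <- seq_len(n)^2
})["elapsed"]

# Check that both approaches agree before trusting the comparison
same <- isTRUE(all.equal(loop_result, vec_result))

print(same)
print(c(loop = unname(loop_time), vectorized = unname(vec_time)))
```

A single system.time() call on a short task can be noisy, so repeating the measurement a few times gives a more reliable picture.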