Dynamic Column Names in R: Seven Approaches Compared

metaprogramming

r-language

A comprehensive guide to adding columns with programmatically generated names in R dataframes. Covers base R, tidyverse (classic and modern), data.table, collapse, rlang expression splicing, and do.call patterns.

Author

Zenn

Published

February 16, 2026

0.1 Introduction

The following pattern turned up recently while reviewing some production R code:

data |>
  mutate(
    !!paste0(measure, "_bl") := baseline_value,
    !!paste0(measure, "_cng") := current - baseline_value
  )

The !! and := operators were unfamiliar. On investigation, they turned out to belong to tidy evaluation, the metaprogramming framework implemented by the rlang package and used throughout dplyr: a powerful but often misunderstood feature for programmatic data manipulation.

This exploration led to cataloguing seven different approaches to creating columns with dynamic names in R:

Base R using [[ assignment
Base R using do.call() and setNames()
The “classic” tidyverse approach with !! and :=
The modern tidyverse glue-style syntax
The rlang expression splicing pattern
The data.table approach
The collapse package approach

0.2 The Problem

Consider a function that calculates change scores for any cognitive measure. Given a dataframe with columns rid (subject ID), vis (visit), and a measure like mmse, we wish to create two new columns:

mmse_bl: the baseline value for each subject
mmse_cng: the change from baseline at each visit

The column names must be constructed dynamically because the function should work for any measure (mmse, adas13, cdr, etc.).

Here is our sample data:

Code

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Code

df <- tibble(
  rid = rep(c("001", "002"), each = 3),
  vis = rep(c("bl", "m06", "m12"), 2),
  mmse = c(28, 27, 25, 30, 29, 28)
)

df

# A tibble: 6 × 3
  rid   vis    mmse
  <chr> <chr> <dbl>
1 001   bl       28
2 001   m06      27
3 001   m12      25
4 002   bl       30
5 002   m06      29
6 002   m12      28

0.3 Approach 1: Base R with `[[` Assignment

The most straightforward approach uses base R’s [[ assignment operator:

Code

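calculate_change_base <- function(data, measure) {
  bl_col <- paste0(measure, "_bl")
  cng_col <- paste0(measure, "_cng")

  result <- data
  # Pre-allocate the new columns so each iteration fills in one subject's rows
  result[[bl_col]] <- NA_real_
  result[[cng_col]] <- NA_real_

  for (id in unique(data$rid)) {
    idx <- data$rid == id
    # First baseline value for this subject
    baseline <- data[[measure]][idx & data$vis == "bl"][1]
    result[[bl_col]][idx] <- baseline
    result[[cng_col]][idx] <- data[[measure]][idx] - baseline
  }

  result
}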

calculate_change_base(df, "mmse")

# A tibble: 6 × 5
  rid   vis    mmse mmse_bl mmse_cng
  <chr> <chr> <dbl>   <dbl>    <dbl>
1 001   bl       28      28        0
2 001   m06      27      28       -1
3 001   m12      25      28       -3
4 002   bl       30      30        0
5 002   m06      29      30       -1
6 002   m12      28      30       -2

0.3.1 Discussion

How it works: The [[ operator accepts character strings for column names, unlike $ which requires literal names. Writing df[["mmse"]] is equivalent to df$mmse, but the former allows df[[variable]] where variable contains the column name.
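A minimal sketch of that difference, using a throwaway two-row data.frame; the names toy and col are illustrative only:

```r
# Toy data: the column name we want is stored in a variable
toy <- data.frame(mmse = c(28, 27))
col <- "mmse"

# [[ evaluates its argument, so it looks up the column named by `col`
same <- identical(toy[[col]], toy$mmse)   # TRUE

# $ does not evaluate: it looks for a column literally called "col"
literal <- toy$col                         # NULL

# [[<- accepts a computed name, which is what makes dynamic columns possible
toy[[paste0(col, "_bl")]] <- toy[[col]][1]
names(toy)                                 # "mmse" "mmse_bl"
```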

Advantages:

No dependencies beyond base R
Syntax is familiar to programmers from other languages
Explicit and easy to debug

Disadvantages:

Verbose, especially with grouped operations
Requires explicit loops for by-group calculations
Imperative style: result is updated piecemeal inside the loop
Does not integrate with dplyr pipelines
Performance degrades with many groups, because every iteration scans the full rid column

When to use: Small scripts with no package dependencies, or when interfacing with code from other languages where this pattern is standard.
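The explicit loop is not intrinsic to [[ assignment. As a hedged sketch (the function calculate_change_match() and the data.frame df_base are illustrative names, not part of the original code), match() can broadcast each subject's baseline in one vectorised step, assuming each subject has a single "bl" row:

```r
# Hypothetical loop-free variant of calculate_change_base()
calculate_change_match <- function(data, measure) {
  bl_rows <- data[data$vis == "bl", ]
  # Position of each row's subject among the baseline rows (first match wins)
  pos <- match(data$rid, bl_rows$rid)
  baseline <- bl_rows[[measure]][pos]

  data[[paste0(measure, "_bl")]] <- baseline
  data[[paste0(measure, "_cng")]] <- data[[measure]] - baseline
  data
}

df_base <- data.frame(
  rid = rep(c("001", "002"), each = 3),
  vis = rep(c("bl", "m06", "m12"), 2),
  mmse = c(28, 27, 25, 30, 29, 28)
)

out <- calculate_change_match(df_base, "mmse")
out$mmse_cng   # 0 -1 -3 0 -1 -2
```

Because match() returns the first matching position, this mirrors the [1] in the loop version.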

0.4 Approach 2: Base R with `do.call()` and `setNames()`

A more functional base R approach avoids explicit loops:

Code

calculate_change_docall <- function(data, measure) {
  bl_col <- paste0(measure, "_bl")
  cng_col <- paste0(measure, "_cng")

  baselines <- tapply(
    data[[measure]][data$vis == "bl"],
    data$rid[data$vis == "bl"],
    FUN = `[`, 1
  )

  bl_values <- baselines[data$rid]
  cng_values <- data[[measure]] - bl_values

  new_cols <- setNames(
    list(as.numeric(bl_values), as.numeric(cng_values)),
    c(bl_col, cng_col)
  )

  do.call(cbind, c(list(data), new_cols))
}

calculate_change_docall(df, "mmse")

  rid vis mmse mmse_bl mmse_cng
1 001  bl   28      28        0
2 001 m06   27      28       -1
3 001 m12   25      28       -3
4 002  bl   30      30        0
5 002 m06   29      30       -1
6 002 m12   28      30       -2

0.4.1 Discussion

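How it works: tapply() computes one baseline per subject without an explicit loop, returning a one-dimensional array named by rid. Indexing that array with data$rid (baselines[data$rid]) repeats each subject's baseline on every one of that subject's rows. setNames() then builds a named list whose names come from a character vector, and do.call(cbind, ...) passes the list to cbind() as named arguments, binding the new columns to the original dataframe.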
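To see what do.call() contributes, a small sketch (base_df and the hard-coded values are illustrative) comparing it with the equivalent hand-written cbind() call:

```r
# setNames() attaches computed names to a list of column vectors
new_cols <- setNames(list(c(28, 28), c(0, -1)), c("mmse_bl", "mmse_cng"))

base_df <- data.frame(rid = c("001", "001"), mmse = c(28, 27))

# do.call() turns the list elements into named arguments, so this call
# should match cbind(base_df, mmse_bl = c(28, 28), mmse_cng = c(0, -1))
via_docall <- do.call(cbind, c(list(base_df), new_cols))
by_hand <- cbind(base_df, mmse_bl = c(28, 28), mmse_cng = c(0, -1))

identical(via_docall, by_hand)
```

The advantage of the do.call() form is that the argument names never have to be typed; they arrive as data.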

Advantages:

No external dependencies
Functional style (no mutable state)
Vectorized operations for better performance

Disadvantages:

Dense, less readable syntax
tapply() returns a one-dimensional array rather than a plain vector, so its result needs careful indexing
The array's dim and names attributes leak into downstream results (hence the as.numeric() calls)
Returns a data.frame, not a tibble

When to use: Package development where minimising dependencies is critical, or performance-sensitive code where the overhead of dplyr is unacceptable.

0.5 Approach 3: Classic Tidyverse with `!!` and `:=`

The tidyverse introduced tidy evaluation to handle dynamic column names. Two operators are central:

:= (the “walrus” operator): Marks an argument whose left-hand side should be evaluated rather than taken as a literal name
!! (bang-bang): Unquotes an expression, evaluating it immediately and inlining the result

Code

calculate_change_classic <- function(data, measure) {
  bl_col <- paste0(measure, "_bl")
  cng_col <- paste0(measure, "_cng")

  data |>
    group_by(rid) |>
    mutate(
      !!bl_col := .data[[measure]][vis == "bl"][1],
      !!cng_col := .data[[measure]] - .data[[bl_col]]
    ) |>
    ungroup()
}

calculate_change_classic(df, "mmse")

# A tibble: 6 × 5
  rid   vis    mmse mmse_bl mmse_cng
  <chr> <chr> <dbl>   <dbl>    <dbl>
1 001   bl       28      28        0
2 001   m06      27      28       -1
3 001   m12      25      28       -3
4 002   bl       30      30        0
5 002   m06      29      30       -1
6 002   m12      28      30       -2

0.5.1 Discussion

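How it works: Standard mutate() syntax such as mutate(new_col = value) treats new_col as a literal name. R's parser only accepts a bare name or a string literal before = in a function call, so mutate(!!bl_col = value) is a syntax error. := is an operator that base R parses but does not define, which lets tidy evaluation functions capture the whole bl_col := value pair and evaluate its left-hand side. In mutate(!!bl_col := value), the !! forces bl_col to be evaluated (yielding "mmse_bl"), and := uses that string as the column name.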

The .data pronoun explicitly references columns in the dataframe, avoiding ambiguity between dataframe columns and local variables.
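The parser behaviour described above can be checked directly. A sketch, assuming only that rlang is installed: parse() on the = form fails before anything is evaluated, while expr() shows the call that mutate() would receive once !! has inlined the string:

```r
library(rlang)

bl_col <- "mmse_bl"

# An expression before `=` in an argument list is rejected by the parser
parse_ok <- tryCatch(
  {
    parse(text = "mutate(df, !!bl_col = 1)")
    TRUE
  },
  error = function(e) FALSE
)
parse_ok   # FALSE: syntax error

# `:=` parses fine; expr() captures the call with the value of bl_col inlined
captured <- expr(!!bl_col := 1)
captured[[2]]   # "mmse_bl"
```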

Advantages:

Integrates seamlessly with dplyr pipelines
Grouped operations are trivial
Declarative, readable intent (once the syntax is learnt)

Disadvantages:

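Unfamiliar syntax for newcomers (in base R, !! is just double negation and := is undefined; both take on their meaning only inside functions that support tidy evaluation)
Requires understanding tidy evaluation concepts
Relies on rlang, which dplyr imports (no separate library() call is needed)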

When to use: Legacy tidyverse code (pre-dplyr 1.0), or when the full power of quasiquotation is needed for complex metaprogramming.

0.6 Approach 4: Modern Tidyverse with Glue Syntax

Starting with dplyr 1.0 (June 2020), a cleaner syntax emerged using glue-style interpolation:

Code

calculate_change_modern <- function(data, measure) {
  data |>
    group_by(rid) |>
    mutate(
      "{measure}_bl" := .data[[measure]][vis == "bl"][1],
      "{measure}_cng" := .data[[measure]] - .data[[paste0(measure, "_bl")]]
    ) |>
    ungroup()
}

calculate_change_modern(df, "mmse")

# A tibble: 6 × 5
  rid   vis    mmse mmse_bl mmse_cng
  <chr> <chr> <dbl>   <dbl>    <dbl>
1 001   bl       28      28        0
2 001   m06      27      28       -1
3 001   m12      25      28       -3
4 002   bl       30      30        0
5 002   m06      29      30       -1
6 002   m12      28      30       -2

0.6.1 Discussion

How it works: The string "{measure}_bl" is interpolated in the style of glue::glue(): the value of measure is substituted, giving "mmse_bl". This interpolation is applied only to names supplied with :=, so := is still required; with = the string is used verbatim as the column name.
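A quick check of that rule, sketched on a one-column toy tibble (toy is an illustrative name); following the behaviour described above, only the := form should yield mmse_bl:

```r
library(dplyr)

measure <- "mmse"
toy <- tibble(mmse = c(28, 27))

# With :=, the glue-style name is interpolated
with_walrus <- mutate(toy, "{measure}_bl" := 28)
names(with_walrus)   # "mmse" "mmse_bl"

# With =, the string is taken literally
with_equals <- mutate(toy, "{measure}_bl" = 28)
names(with_equals)   # "mmse" "{measure}_bl"
```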

Advantages:

Most readable of all tidyverse approaches
Reduces cognitive load (no need to remember !! semantics)
Consistent with glue syntax used elsewhere in the tidyverse

Disadvantages:

Requires dplyr >= 1.0
Still requires := (cannot use =)
Less flexible than !! for complex quasiquotation

When to use: New tidyverse code. This is the current recommended approach for dynamic column names in dplyr.

0.7 Approach 5: rlang Expression Splicing

For more explicit metaprogramming, rlang provides expr() for building expressions and !!! for splicing them into function calls:

Code

library(rlang)

calculate_change_rlang <- function(data, measure) {
  bl_col <- paste0(measure, "_bl")
  cng_col <- paste0(measure, "_cng")

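  # Named list of unevaluated expressions (called new_exprs rather than
  # exprs, to avoid masking rlang::exprs())
  new_exprs <- list(
    expr(.data[[!!measure]][vis == "bl"][1]),
    expr(.data[[!!measure]] - .data[[!!bl_col]])
  )
  names(new_exprs) <- c(bl_col, cng_col)

  data |>
    group_by(rid) |>
    mutate(!!!new_exprs) |>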
    ungroup()
}

calculate_change_rlang(df, "mmse")

# A tibble: 6 × 5
  rid   vis    mmse mmse_bl mmse_cng
  <chr> <chr> <dbl>   <dbl>    <dbl>
1 001   bl       28      28        0
2 001   m06      27      28       -1
3 001   m12      25      28       -3
4 002   bl       30      30        0
5 002   m06      29      30       -1
6 002   m12      28      30       -2

0.7.1 Discussion

How it works: expr() captures an expression without evaluating it, while !! inside expr() forces evaluation of specific parts. The result is a list of named expressions. The !!! (splice) operator unpacks this list into mutate(), equivalent to writing each expression as a separate argument.
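Because expr() returns ordinary R objects, an expression can be examined before it is spliced. A minimal sketch, reusing the names from calculate_change_rlang():

```r
library(rlang)

measure <- "mmse"
bl_col <- paste0(measure, "_bl")

# !! inlines the current values of measure and bl_col into the captured code
cng_expr <- expr(.data[[!!measure]] - .data[[!!bl_col]])

# The expression can be printed or logged before it is ever evaluated
deparse(cng_expr)   # ".data[[\"mmse\"]] - .data[[\"mmse_bl\"]]"
```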

Advantages:

Build expressions programmatically before evaluation
Useful when the number of columns is dynamic
Expressions can be inspected, modified, or logged before use
Clear separation between expression construction and evaluation

Disadvantages:

More verbose for simple cases
Requires understanding quasiquotation deeply
Overkill for straightforward dynamic naming

When to use: Complex metaprogramming where expressions must be built dynamically, such as generating an unknown number of columns based on input data, or when inspecting or logging expressions before evaluation is desirable.
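As an illustration of the dynamic case, here is a sketch that builds the expressions for any number of measures and splices them into a single mutate() call. build_change_exprs() is a hypothetical helper; it keeps each *_cng expression after the *_bl column it references, because mutate() evaluates its arguments in order:

```r
library(dplyr)
library(rlang)

# Hypothetical helper: one baseline and one change expression per measure
build_change_exprs <- function(measures) {
  out <- list()
  for (m in measures) {
    bl <- paste0(m, "_bl")
    out[[bl]] <- expr(.data[[!!m]][vis == "bl"][1])
    out[[paste0(m, "_cng")]] <- expr(.data[[!!m]] - .data[[!!bl]])
  }
  out
}

df_multi <- tibble(
  rid = rep(c("001", "002"), each = 3),
  vis = rep(c("bl", "m06", "m12"), 2),
  mmse = c(28, 27, 25, 30, 29, 28),
  adas = c(12, 14, 18, 10, 12, 15)
)

change_exprs <- build_change_exprs(c("mmse", "adas"))
names(change_exprs)   # "mmse_bl" "mmse_cng" "adas_bl" "adas_cng"

# A single mutate() call, however many measures are supplied
result <- df_multi |>
  group_by(rid) |>
  mutate(!!!change_exprs) |>
  ungroup()
```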

0.8 Approach 6: data.table

The data.table package has its own := operator, which predates the tidyverse's use of it:

Code

library(data.table)


Attaching package: 'data.table'

The following object is masked from 'package:rlang':

    :=

The following objects are masked from 'package:dplyr':

    between, first, last

Code

calculate_change_dt <- function(data, measure) {
  bl_col <- paste0(measure, "_bl")
  cng_col <- paste0(measure, "_cng")

  dt <- as.data.table(data)

  dt[, (bl_col) := .SD[[measure]][vis == "bl"][1], by = rid]
  dt[, (cng_col) := .SD[[measure]] - .SD[[bl_col]], by = rid]

  dt[]
}

calculate_change_dt(df, "mmse")

      rid    vis  mmse mmse_bl mmse_cng
   <char> <char> <num>   <num>    <num>
1:    001     bl    28      28        0
2:    001    m06    27      28       -1
3:    001    m12    25      28       -3
4:    002     bl    30      30        0
5:    002    m06    29      30       -1
6:    002    m12    28      30       -2

0.8.1 Discussion

How it works: In data.table, wrapping a variable in parentheses on the left-hand side of := forces evaluation: (bl_col) evaluates to "mmse_bl". Without parentheses, bl_col := value would create a column literally named “bl_col”. The .SD pronoun (Subset of Data) is a data.table holding the current group's rows for every column except the grouping columns, so .SD[[measure]] selects the measure within each rid group.
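A minimal sketch of the parentheses rule on a two-row table (dt and its values are illustrative):

```r
library(data.table)

dt <- data.table(rid = c("001", "001"), mmse = c(28, 27))
bl_col <- "mmse_bl"

dt[, bl_col := 28]    # bare name: creates a column literally called "bl_col"
dt[, (bl_col) := 28]  # parenthesised: uses the value of bl_col, "mmse_bl"

names(dt)             # "rid" "mmse" "bl_col" "mmse_bl"
```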

Advantages:

Extremely fast, especially for large datasets
Memory efficient (modifies in place by reference)
Mature, battle-tested codebase
The by argument handles grouping concisely

Disadvantages:

Different mental model from base R and tidyverse
Modify-by-reference semantics can cause subtle bugs
Less readable for those unfamiliar with data.table idioms
Parentheses syntax (col) is easy to forget

When to use: Performance-critical applications, very large datasets (millions of rows), or codebases already using data.table.

0.9 Approach 7: collapse

The collapse package offers high-performance data manipulation with its own idioms:

Code

library(collapse)

collapse 2.1.6, see ?`collapse-package` or ?`collapse-documentation`


Attaching package: 'collapse'

The following object is masked from 'package:data.table':

    fdroplevels

The following object is masked from 'package:stats':

    D

Code

calculate_change_collapse <- function(data, measure) {
  bl_col <- paste0(measure, "_bl")
  cng_col <- paste0(measure, "_cng")

  bl_data <- fsubset(data, vis == "bl")
  bl_values <- get_vars(bl_data, c("rid", measure))
  names(bl_values)[2] <- bl_col

  result <- join(data, bl_values, on = "rid", how = "left")
  result[[cng_col]] <- result[[measure]] - result[[bl_col]]

  result
}

calculate_change_collapse(df, "mmse")

left join: data[rid] 6/6 (100%) <3:1st> bl_values[rid] 2/2 (100%)

# A tibble: 6 × 5
  rid   vis    mmse mmse_bl mmse_cng
  <chr> <chr> <dbl>   <dbl>    <dbl>
1 001   bl       28      28        0
2 001   m06      27      28       -1
3 001   m12      25      28       -3
4 002   bl       30      30        0
5 002   m06      29      30       -1
6 002   m12      28      30       -2

0.9.1 Discussion

How it works: collapse provides fast versions of common operations (fsubset, get_vars, join, etc.). Unlike dplyr, collapse does not use tidy evaluation, so dynamic column access requires base R syntax like get_vars() with character vectors and direct [[ assignment. The approach above uses a join strategy rather than grouped mutation.

Advantages:

Extremely fast (on some operations, faster than data.table)
Low memory footprint
Works with both data.frames and data.tables
Good for time series and panel data

Disadvantages:

Smaller user community than tidyverse or data.table
No tidy evaluation: idioms such as !!, glue-style names and .data do not carry over, since collapse uses its own simpler non-standard evaluation
Often requires mixing collapse functions with base R syntax
Less comprehensive documentation

When to use: Performance-critical code, especially time series or panel data analysis. Also useful when speed is needed but data.table’s reference semantics are to be avoided.

0.10 Comparison Summary

Approach	Dependencies	Readability	Performance	Best For
Base R `[[`	None	Moderate	Low	No-dependency scripts
Base R `do.call`	None	Low	Moderate	Package development
Classic `!!` `:=`	dplyr	Low	Moderate	Legacy tidyverse code
Modern glue	dplyr >= 1.0	High	Moderate	New tidyverse code
rlang `!!!` splice	dplyr + rlang	Moderate	Moderate	Complex metaprogramming
data.table	data.table	Moderate	High	Large datasets
collapse	collapse	Moderate	Very High	Performance-critical work

0.11 Practical Example: Multiple Measures

Because the column names are computed at run time, the same pattern extends to several measures by looping over a character vector:

Code

df_multi <- tibble(
  rid = rep(c("001", "002"), each = 3),
  vis = rep(c("bl", "m06", "m12"), 2),
  mmse = c(28, 27, 25, 30, 29, 28),
  adas = c(12, 14, 18, 10, 12, 15),
  cdr = c(0.5, 0.5, 1.0, 0.5, 0.5, 0.5)
)

calculate_all_changes <- function(data, measures) {
  result <- data |> group_by(rid)

  for (measure in measures) {
    result <- result |>
      mutate(
        "{measure}_bl" := .data[[measure]][vis == "bl"][1],
        "{measure}_cng" := .data[[measure]] - .data[[paste0(measure, "_bl")]]
      )
  }

  result |> ungroup()
}

calculate_all_changes(df_multi, c("mmse", "adas", "cdr"))

# A tibble: 6 × 11
  rid   vis    mmse  adas   cdr mmse_bl mmse_cng adas_bl adas_cng cdr_bl cdr_cng
  <chr> <chr> <dbl> <dbl> <dbl>   <dbl>    <dbl>   <dbl>    <dbl>  <dbl>   <dbl>
1 001   bl       28    12   0.5      28        0      12        0    0.5     0  
2 001   m06      27    14   0.5      28       -1      12        2    0.5     0  
3 001   m12      25    18   1        28       -3      12        6    0.5     0.5
4 002   bl       30    10   0.5      30        0      10        0    0.5     0  
5 002   m06      29    12   0.5      30       -1      10        2    0.5     0  
6 002   m12      28    15   0.5      30       -2      10        5    0.5     0
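For comparison, dplyr's across() can handle several measures without an explicit loop, using its .names glue specification instead of := to build the new names. This is a sketch, not the approach used above; it should reproduce the same values, although the new columns come out ordered by suffix (all *_bl columns, then all *_cng columns):

```r
library(dplyr)

df_multi <- tibble(
  rid = rep(c("001", "002"), each = 3),
  vis = rep(c("bl", "m06", "m12"), 2),
  mmse = c(28, 27, 25, 30, 29, 28),
  adas = c(12, 14, 18, 10, 12, 15),
  cdr = c(0.5, 0.5, 1.0, 0.5, 0.5, 0.5)
)
measures <- c("mmse", "adas", "cdr")

result <- df_multi |>
  group_by(rid) |>
  mutate(
    # {.col} in .names is the name of the column being transformed
    across(all_of(measures), ~ .x[vis == "bl"][1], .names = "{.col}_bl"),
    across(all_of(measures), ~ .x - .x[vis == "bl"][1], .names = "{.col}_cng")
  ) |>
  ungroup()

names(result)
```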

0.12 Common Pitfalls

0.12.1 1. Forgetting := on the LHS (tidyverse)

# Wrong: creates column literally named "{measure}_bl"
mutate("{measure}_bl" = value)

# Correct: evaluates the string
mutate("{measure}_bl" := value)

0.12.2 2. Forgetting parentheses (data.table)

# Wrong: creates column named "bl_col"
dt[, bl_col := value]

# Correct: evaluates bl_col to get "mmse_bl"
dt[, (bl_col) := value]

0.12.3 3. Confusing column references

# Wrong: measure is the local string "mmse" (unless the data has a column
# named measure), so mmse_bl is silently filled with the text "mmse"
mutate("{measure}_bl" := measure[vis == "bl"][1])

# Correct: .data[[measure]] references the column
mutate("{measure}_bl" := .data[[measure]][vis == "bl"][1])

0.12.4 4. Reference semantics in data.table

dt <- as.data.table(df)
dt2 <- dt  # This is NOT a copy!
dt2[, new_col := 1]  # Modifies both dt and dt2

# Safe copy:
dt2 <- copy(dt)

1 What Did We Learn?

1.1 Lessons Learnt

:= enables dynamic LHS: In both tidyverse and data.table, this operator allows the left side of an assignment to be evaluated
!! is the unquote operator: Forces immediate evaluation of an expression in tidy eval contexts
Glue syntax is cleaner: "{var}_suffix" is more readable than !!paste0(var, "_suffix")
Parentheses matter in data.table: (col) evaluates, col is literal
.data removes ambiguity: Always use .data[[col]] for column references in functions
Choose based on context: Readability (glue) vs performance (data.table) vs no dependencies (base R)

1.3 Reproducibility

Code

sessionInfo()

R version 4.5.3 (2026-03-11)
Platform: aarch64-apple-darwin25.3.0
Running under: macOS Tahoe 26.5

Matrix products: default
BLAS:   /opt/homebrew/Cellar/openblas/0.3.32/lib/libopenblasp-r0.3.32.dylib 
LAPACK: /opt/homebrew/Cellar/r/4.5.3/lib/R/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] collapse_2.1.6    data.table_1.17.8 rlang_1.2.0       dplyr_1.2.1      

loaded via a namespace (and not attached):
 [1] digest_0.6.37     utf8_1.2.6        R6_2.6.1          fastmap_1.2.0    
 [5] tidyselect_1.2.1  xfun_0.56         magrittr_2.0.5    glue_1.8.0       
 [9] tibble_3.3.1      knitr_1.50        parallel_4.5.3    pkgconfig_2.0.3  
[13] htmltools_0.5.8.1 generics_0.1.4    rmarkdown_2.29    lifecycle_1.0.5  
[17] cli_3.6.6         vctrs_0.7.3       compiler_4.5.3    tools_4.5.3      
[21] pillar_1.11.1     evaluate_1.0.5    Rcpp_1.1.0        yaml_2.3.10      
[25] jsonlite_2.0.0    htmlwidgets_1.6.4

2 Let’s Connect

Have questions, suggestions, or spot an error? Let me know.

GitHub: rgt47
Twitter/X: @rgt47
LinkedIn: Ronald Glenn Thomas
Email: Contact form

I would enjoy hearing from readers who:

Spot an error or a better approach to any of the code in this post.
Have suggestions for topics to cover.
Want to discuss R programming, data science, or reproducible research.
Have questions about anything in this tutorial.
Simply want to say hello and connect.
