focusonr

Notes on R, statistical computing, and applied biostatistics

Note: The lab now lives at rgtlab.org

The Thomas Lab activities (curriculum, publications, software, teaching) have moved to rgtlab.org. This site retains the blog only.

Welcome to focusonr, a working blog on R, statistical computing, and applied biostatistics by Ronald ‘Ryy’ G. Thomas. Recent posts below.

Install Linux Mint on a MacBook Air

linux
macos
r
python

A practical guide to installing Linux Mint 22 on a 2016 MacBook Air, transforming aging Apple hardware into a functional data science workstation.

Apr 30, 2026

Set Names to Lowercase for Multiple Dataframes in R

r
data-cleaning

A systematic approach to programmatically finding every dataframe in an R environment and standardising column names using eapply() and list2env().

Apr 30, 2026

Setting Up Multi-Language Quarto Documents on macOS

quarto
r
python
julia

A practical guide to the configuration plumbing required to render a multi-language Quarto document from scratch on macOS.

Apr 30, 2026

Functional Plot Generation with purrr

r
data-visualization

I did not really know how to programmatically generate multiple plots from grouped data until I discovered purrr’s map2 and pmap functions – this post walks through the approach step by step using Palmer Penguins.

Apr 30, 2026

Rapid Conversion of Draft R Scripts to Formal Rmd Reports

r
rmarkdown
reproducibility

I did not really know how to quickly convert a working R script into a presentable report until I discovered knitr::spin() and a few supporting workflows that changed how I share analytical results.

Apr 30, 2026

Refactoring a Personal Toolbox: Scripts versus Shell Functions

shell

Personal toolboxes accumulate helpers across years of small fixes: some end up as shell functions in ‘.zshrc’, some as scripts in ‘~/bin’, often with no consistent rule for which goes where. A principled split (function only when shell state must change, versioned script otherwise) removes hundreds of lines of logic from the average dotfile, makes every helper shellcheck-able, and reduces shell startup time.

Apr 25, 2026

Building a statistical computing textbook in the Age of AI

quarto
teaching
reproducibility

I did not really appreciate how much structural decision-making goes into a textbook until I tried to draft two at once, both under the ‘in the Age of AI’ framing.

Apr 24, 2026

A pocket terminal for your Linux laptop with ttyd and Tailscale

linux
shell

A reproducible configuration for browser-based terminal access to a Linux laptop from a mobile device, using ttyd for the terminal emulation layer and Tailscale for authenticated network transport, with no public-internet exposure.

Apr 15, 2026

Configure the Command Line for Data Science Development

shell
macos
git
neovim

A systematic approach to configuring command-line tools for data science workflows, covering terminal emulation, shell configuration, version control, and readline settings.

Feb 16, 2026

Dynamic Column Names in R: Seven Approaches Compared

r
metaprogramming

A comprehensive guide to adding columns with programmatically-generated names in R dataframes. Covers base R, tidyverse (classic and modern), data.table, collapse, rlang::inject(), and do.call patterns.

Feb 16, 2026

Updating an R Package: A Complete Development Workflow

r
package-development
git

I did not really understand the full lifecycle of modifying an R package until I had to push a feature branch through CI/CD and watch it either pass or fail on three operating systems. This post walks through the entire process.

Feb 11, 2026

Setting Up a Comprehensive Research Backup System on macOS

shell
git
macos
reproducibility

I didn’t really know how fragile my research files were until a colleague lost months of work to a corrupted hard drive. This post walks through the three-tier backup system I built to protect 300+ Git repositories across 20GB of research data.

Feb 11, 2026

Setting Up a Virtual Server on AWS EC2 Console

aws
shiny
docker

I did not really know how cloud servers worked until I needed to host a Shiny app for collaborators and had to figure out AWS EC2 from scratch.

Feb 11, 2026

Creating a GitHub Dotfiles Repository

git
shell

I did not really know how much time I was wasting reconfiguring every new machine until I set up a dotfiles repository on GitHub and watched a four-hour ritual collapse into a single command.

Feb 11, 2026

Setting Up Git for Data Science Workflows

git
reproducibility

I did not really know how much time I was wasting without proper version control until a single overwritten script cost me three days of work.

Feb 11, 2026

Setting Up Neovim as a Data Science IDE

neovim
vim
r
python

I did not really know how much faster code editing could be until I switched from a mouse-driven IDE to Neovim’s modal, keyboard-centric workflow.

Feb 11, 2026

Reproducible Blog Posts with ZZCOLLAB: A Quarto Workflow

quarto
r
docker
reproducibility
zzcollab

I did not really appreciate how fragile technical blog posts are until one of my own stopped rendering six months after publication. This post documents the workflow I built to treat each blog post as a standalone ZZCOLLAB reproducible research project with Docker, renv, and CI/CD.

Feb 10, 2026

The Pipe Equivalence Myth: When f() |> g() Is Not the Same as g(f())

r
metaprogramming

Piping and nesting function calls are semantically different operations. A subtle bug in an expression-capturing wrapper reveals how R’s lazy evaluation interacts with the pipe operator.

Jan 31, 2026

Running ZZedc Independently for Clinical Research Data Management

clinical-trials
shiny
aws
reproducibility

I did not really know how achievable investigator independence in clinical data management was until I deployed ZZedc on a personal AWS instance and ran a pilot study without vendor involvement.

Dec 7, 2025

Archiving 400 GitHub Repos Locally

git
shell
reproducibility

A systematic approach to archiving hundreds of private GitHub repositories locally with verified backups, metadata exports, and selective deletion.

Dec 2, 2025

From Markdown to Blog Post: A ZZCOLLAB Conversion Workflow

quarto
zzcollab
reproducibility

A systematic workflow for converting standalone markdown documentation into professional blog posts using ZZCOLLAB symlinks and Quarto metadata.

Dec 2, 2025

Combining Observable JS and Shiny in a Single Quarto Document

quarto
shiny
javascript
r
data-visualization

I did not really know how difficult it would be to combine Observable JS and Shiny in one Quarto document until every data-loading approach I tried failed except fetching from a public URL.

Dec 1, 2025

Testing Data Analysis Workflows in R

r
testing
reproducibility

I did not really know how to systematically test a data analysis pipeline until I applied software engineering practices from testthat and assertr to my own research workflow.

Jul 25, 2025

Embedding Python Code in UltiSnips Snippets

vim
python

I did not really know that UltiSnips could execute Python code at expansion time until I needed a snippet that generated a variable number of elements based on user input.

Jul 20, 2025

Controlling Table Placement in R Markdown PDF Reports

r
rmarkdown

I did not really know how to control where tables appeared in my R Markdown PDF reports until I stopped fighting LaTeX’s float algorithm and learnt to work with it instead.

Jul 15, 2025

Writing a Simple R Package Using S3 Methods

r
package-development

I did not really know how S3 method dispatch worked in R until I tried to build a Table 1 function that handled numeric and categorical variables differently without writing separate code paths.

Jul 10, 2025

Sharing Shiny Applications Reproducibly with Docker

r
shiny
docker
reproducibility

I did not really know how unreliable it was to share a Shiny app as a collection of R scripts until my colleague spent an afternoon installing packages before he could even launch the application.

Jun 28, 2025

Sharing R Markdown Code Reproducibly with Docker

r
docker
rmarkdown
reproducibility

I did not really know how fragile sharing R code could be until my colleague spent an afternoon debugging missing packages, and I realised Docker could have prevented every single error.

Jun 25, 2025

Configuring Yabai as a Tiling Window Manager on macOS

macos
shell

I did not really know how much faster a tiling window manager could make my daily workflow until I configured yabai with keyboard shortcuts and stopped reaching for the mouse.

Jun 20, 2025

Setting Up R, Vimtex, and UltiSnips in Vim

vim
r

I did not really know how to connect Vim, R, and LaTeX into a single productive workflow until I spent a weekend configuring vimtex and UltiSnips from scratch.

Jun 15, 2025

Writing a Simple Vim Plugin for REPL Interaction

vim
r

I did not really know how Vim’s terminal API worked until I wrote a small plugin that sends code from an editing buffer to a running R session.

May 20, 2025

Prototyping a Shiny App with ChatGPT

r
shiny

I did not really know how effective ChatGPT could be as a prototyping partner until I iteratively built a modular Shiny app for Palmer Penguin exploration in three prompts.

May 15, 2025

A Mac Workflow for Tracking Daily Research Progress

shell
git
macos

I did not really know how to keep a consistent daily research log until I combined macOS dictation with ChatGPT summarization and a few short bash scripts. This post walks through the entire workflow from folder structure to searchable, version-controlled notes.

Apr 12, 2025

Clinical Trial Data Validation Across Languages and Tools

clinical-trials
data-cleaning
r
lua
javascript

I did not really understand how many layers of validation sit between a clinical trial data entry form and a reliable analysis dataset until I started mapping out EDC edit checks, open-source tools, and the possibility of generating JavaScript validation from a simple spreadsheet processed by Lua.

Jan 15, 2025

Palmer Penguins Part 5: Random Forest versus Linear Models

r
random-forest
machine-learning
model-selection

A head-to-head comparison reveals that a random forest outperforms the linear model by only two percentage points, prompting reflection on the interpretability-performance tradeoff.

Jan 5, 2025

Palmer Penguins Part 4: Model Diagnostics and Interpretation

r
regression
model-selection

Verifying regression model trustworthiness through systematic diagnostic checks on linearity, normality, and influence.

Jan 4, 2025

Palmer Penguins Part 3: Cross-Validation and Model Comparison

r
cross-validation
random-forest
machine-learning

Testing whether regression models hold up on new data through ten-fold cross-validation and comparison against a random forest.

Jan 3, 2025

Palmer Penguins Part 2: Multiple Regression and Species Effects

r
regression

Adding species identity to the body mass prediction model causes R-squared to jump from 0.76 to 0.86, demonstrating the power of biological groupings in multiple regression.

Jan 2, 2025

Palmer Penguins Part 1: Exploratory Data Analysis and Simple Regression

r
regression
data-visualization

An exploration of how a simple flipper measurement can reveal substantial information about penguin body mass through the Palmer Penguins dataset and simple linear regression.

Jan 1, 2025

Predictive Modeling of Penguin Body Mass

r
regression
data-visualization

Regression models on the Palmer Penguins dataset reveal how much species identity shapes morphometric relationships. Simpson’s Paradox emerges clearly, reversing an apparent correlation once species is accounted for.

Jan 1, 2025

Palmer Penguins Part 1: Exploratory Data Analysis and Simple Regression

r
regression
zzcollab
reproducibility

Exploring the Palmer Penguins dataset reveals how much a single morphometric measurement can predict about penguin body mass. A simple regression model provides a strong baseline for subsequent analyses.

Jan 1, 2025

Constructing a reproducible blog post using zzcollab tools

r
zzcollab
reproducibility

I didn’t really know much about [topic] until I tried to [implement/understand] it myself. Here’s what I learned along the way.

Jan 1, 2025

Setup Post Template (worked example: AWS CLI provisioning)

aws
shell
reproducibility

I did not really appreciate how tedious it was to click through the EC2 console until I automated the whole process with four short bash scripts and the AWS CLI.

Jan 1, 2025

Launching AWS EC2 Instances with Bash Scripts and the AWS CLI

aws
shell

I did not appreciate how tedious it was to click through the EC2 console until I automated the whole process with four short bash scripts and the AWS CLI.

Apr 12, 2023

    Copyright 2023-2026, Ronald ‘Ryy’ G. Thomas. The lab’s other activities live at rgtlab.org.