focusonr

Notes on R, statistical computing, and applied biostatistics

The lab now lives at rgtlab.org

The Thomas Lab activities (curriculum, publications, software, teaching) have moved to rgtlab.org. This site retains the blog only.

Welcome to focusonr, a working blog on R, statistical computing, and applied biostatistics by Ronald ‘Ryy’ G. Thomas. Recent posts below.

Install Linux Mint on a MacBook Air

linux
macos
r
python
A practical guide to installing Linux Mint 22 on a 2016 MacBook Air, transforming aging Apple hardware into a functional data science workstation.
May 29, 2026

Functional Plot Generation with purrr

r
data-visualization
I did not really know how to programmatically generate multiple plots from grouped data until I discovered purrr’s map2 and pmap functions. This post walks through the approach step by step using Palmer Penguins.
May 29, 2026

Setting Up Multi-Language Quarto Documents on macOS

quarto
r
python
julia
A practical guide to the configuration plumbing required to render a multi-language Quarto document from scratch on macOS.
May 29, 2026

Rapid Conversion of Draft R Scripts to Formal Rmd Reports

r
rmarkdown
reproducibility
I did not really know how to quickly convert a working R script into a presentable report until I discovered knitr::spin() and a few supporting workflows that changed how I share analytical results.
May 29, 2026

Seven Required Elements for a zzc Manuscript report.Rmd

zzcollab
rmarkdown
reproducibility
workflow
publishing
Every zzcollab manuscript compendium needs seven specific YAML and structural elements in its report.Rmd to produce a correctly formatted, stamped, and staged PDF. Missing any one of them is easy to do and surprisingly hard to diagnose later.
May 25, 2026

GitHub Actions workflows for zzcollab research compendia

ci
github-actions
zzcollab
reproducibility
r-package
Not all zzcollab repos need the same CI. A manuscript-package compendium requires both an R CMD check workflow and a report-rendering workflow; a data-analysis compendium needs only the latter; a blog-post compendium needs neither. This post maps the three workspace types to their correct workflow combinations and walks through each YAML file line by line.
May 18, 2026

Sharing R Code via Docker: R Markdown Reports and Shiny Applications

r
docker
rmarkdown
shiny
reproducibility
I did not really know how fragile sharing R code could be until my colleague spent an afternoon debugging missing packages, and I realised Docker could have prevented every single error – whether the output was a static PDF or a live Shiny app.
May 17, 2026

Unix Command-Line Workspace Setup for Data Science Researchers

shell
macos
linux
dotfiles
git
workflow
setup
A new researcher arrives with two macOS laptops and a Linux Mint workstation. This post builds the terminal layer, the shell layer, and the dotfiles repository that keeps all three machines consistent from day one.
May 17, 2026

Multi-Laptop macOS Bootstrap: Migrating Dotfiles to a Versioned Git Repository

dotfiles
shell
macos
setup
reproducibility
I did not appreciate how fragile my development environment was until I tried to set up a second MacBook and discovered my entire shell configuration had no git history, no rollback, and no deployment mechanism.
May 17, 2026

Extending the R-Vim Workflow: LaTeX Integration and Dynamic Snippets

vim
r
python
A complete configuration guide for the Vim-based R and LaTeX workflow: vimtex for LaTeX compilation, ALE for linting, UltiSnips for static snippet expansion, and UltiSnips Python interpolation for dynamic, parametric snippets.
May 17, 2026

Research Backup Architecture: Ongoing System and GitHub Archival

git
shell
macos
reproducibility
A unified treatment of research backup architecture: the three-tier ongoing system (automated Git pushes, cloud sync, and Time Machine) and the bulk GitHub archival procedure for migrating 400+ private repositories to local storage with verified backups and selective deletion.
May 17, 2026

Setting Up pass: a Unix Password Manager

unix
shell
gpg
security
setup
I did not appreciate how fragile browser-based password storage was until I migrated 600 credentials to an encrypted, version-controlled store in a single afternoon.
May 17, 2026

Secrets Management for the Workflow Construct

setup
security
secrets
gpg
unix
cloud
aws
An open-source secrets-management scheme for the Workflow Construct. The post documents the installation and configuration of pass (the Unix password store), the AWS IAM Identity Center workflow that replaces plaintext credentials, and the secret-injection patterns that keep API keys and tokens out of dotfiles, shell history, and the working tree.
May 17, 2026

Security Foundations for a Multi-Laptop Research Cluster

security
gpg
dotfiles
macos
setup
Posts 64-66 establish a portable dotfiles repository, a multi-laptop sync architecture, and a GPG-encrypted credential store. This post asks what that infrastructure is worth if the threat model is wrong or the controls are incomplete.
May 17, 2026

Provisioning AWS EC2 Instances: Console and CLI Methods

aws
shell
shiny
docker
Setting up an AWS EC2 instance from scratch, with two parallel paths: a console walkthrough for understanding the components, and four bash scripts for repeatable CLI automation.
May 17, 2026

Migrating Off Dropbox: Beyond Dotfiles

workflow
sync
migration
dropbox
Post 24 establishes a portable dotfiles repository. This post extends the same goal to the rest of a single-user research workflow: project content, backup-pipeline source paths, and the append-only history files that Dropbox handles particularly badly. It frames the problem as three layers and walks through the trade-offs for each.
May 7, 2026

A tiered CI strategy for zzcollab research compendia

ci
github-actions
reproducibility
zzcollab
renv
I did not realise how much my CI was lying to me until I added a single explicit failure check and watched five ‘passing’ projects turn honestly red. A workflow-style migration guide for zzcollab projects across four workspace types.
May 6, 2026

From testthat to tinytest: Converting an R Package Test Suite

r
testing
tinytest
testthat
packaging
I did not really appreciate how much ceremony testthat carries until I rewrote a small package’s suite in tinytest and lost roughly two-thirds of the lines without losing any coverage.
May 2, 2026

Setting up OBS for Live R Coding Screencasts

setup
obs
youtube
screencast
r
reproducibility
A reproducible workflow for producing short, focused screencasts of R data analysis using OBS Studio and YouTube. Includes installation, scene configuration, recording, editing, and a worked five-minute example based on the Palmer Penguins dataset.
May 2, 2026

A 55-Item Initiation Checklist for zzcollab Data Analyses

r
zzcollab
reproducibility
workflow
checklist
quarto
A clinical research collaborator emails a single CSV file with twelve columns, two hundred and eighty-seven rows, no codebook, and a promise of an updated extract ‘next week’. We walk through the fifty-five items that move that attachment from the inbox to a reproducible zzcollab compendium ready for archival.
Apr 29, 2026

LLM-Augmented Editing for the Workflow Construct

setup
llm
ai
claude
editing
workflow
An open-architecture pattern for LLM-augmented editing scoped to an applied biostatistician’s daily work. The post documents Claude Code at the shell as the primary integration, distinguishing it from in-buffer plugins, graphical AI editors, and browser-based prompting. The CLAUDE.md per-project convention is presented as the load-bearing artefact that makes the integration usable rather than novel.
Apr 28, 2026

Modern CLI Replacements for the Shell Layer

setup
shell
cli
workflow
Eight modern command-line utilities, six implemented in Rust (ripgrep, fd, bat, eza, zoxide, delta) and two in Go (lazygit, fzf), that compose into a Shell-layer extension of the Workflow Construct. The post documents the installation, configuration, daily-use substitutions, and failure modes for each, plus the autojump-to-zoxide migration that preserves the j muscle memory while upgrading the underlying frecency database.
Apr 27, 2026

A Workflow Construct for the Modern Data Scientist

setup
workflow
zsh
vim
docker
reproducibility
A reproducible reference for the laptop-scale workflow construct used in day-to-day data science and biostatistics work. The post documents the existing layers (hardware, operating systems, file system, shell, editor, scripts, applications, cloud, backup, R packages) drawn from a working notebook outline, then proposes a set of evidence-graded extensions appropriate to a 2026 polyglot practice (LLM-augmented editing, modern CLI replacements, secret scanning, secure remote access, polyglot reproducibility, and knowledge management).
Apr 26, 2026

Refactoring a Personal Toolbox: Scripts versus Shell Functions

shell
Personal toolboxes accumulate helpers across years of small fixes: some end up as shell functions in ‘.zshrc’, some as scripts in ‘~/bin’, often with no consistent rule for which goes where. A principled split (function only when shell state must change, versioned script otherwise) removes hundreds of lines of logic from the average dotfile, makes every helper shellcheck-able, and reduces shell startup time.
Apr 25, 2026

Building a statistical computing textbook in the Age of AI

quarto
teaching
reproducibility
I did not really appreciate how much structural decision-making goes into a textbook until I tried to draft two at once, both under the ‘in the Age of AI’ framing.
Apr 24, 2026

A pocket terminal for your Linux laptop with ttyd and Tailscale

linux
shell
A reproducible configuration for browser-based terminal access to a Linux laptop from a mobile device, using ttyd for the terminal emulation layer and Tailscale for authenticated network transport, with no public-internet exposure.
Apr 15, 2026

Dynamic Column Names in R: Seven Approaches Compared

r
metaprogramming
A comprehensive guide to adding columns with programmatically-generated names in R dataframes. Covers base R, tidyverse (classic and modern), data.table, collapse, rlang::inject(), and do.call patterns.
Feb 16, 2026

Setting Up Git for Data Science Workflows

git
reproducibility
I did not really know how much time I was wasting without proper version control until a single overwritten script cost me three days of work.
Feb 11, 2026

Setting Up Neovim as a Data Science IDE

neovim
vim
r
python
I did not really know how much faster code editing could be until I switched from a mouse-driven IDE to Neovim’s modal, keyboard-centric workflow.
Feb 11, 2026

Updating an R Package: A Complete Development Workflow

r
package-development
git
I did not really understand the full lifecycle of modifying an R package until I had to push a feature branch through CI/CD and watch it either pass or fail on three operating systems. This post walks through the entire process.
Feb 11, 2026

Reproducible Blog Posts with ZZCOLLAB: A Quarto Workflow

quarto
r
docker
reproducibility
zzcollab
I did not really appreciate how fragile technical blog posts are until one of my own stopped rendering six months after publication. This post documents the workflow I built to treat each blog post as a standalone ZZCOLLAB reproducible research project with Docker, renv, and CI/CD.
Feb 10, 2026

The Pipe Equivalence Myth: When f() |> g() Is Not the Same as g(f())

r
metaprogramming
Piping and nesting function calls are semantically different operations. A subtle bug in an expression-capturing wrapper reveals how R’s lazy evaluation interacts with the pipe operator.
Jan 31, 2026

Running ZZedc Independently for Clinical Research Data Management

clinical-trials
shiny
aws
reproducibility
I did not really know how achievable investigator independence in clinical data management was until I deployed ZZedc on a personal AWS instance and ran a pilot study without vendor involvement.
Dec 7, 2025

From Markdown to Blog Post: A ZZCOLLAB Conversion Workflow

quarto
zzcollab
reproducibility
A systematic workflow for converting standalone markdown documentation into professional blog posts using ZZCOLLAB symlinks and Quarto metadata.
Dec 2, 2025

Combining Observable JS and Shiny in a Single Quarto Document

quarto
shiny
javascript
r
data-visualization
I did not really know how difficult it would be to combine Observable JS and Shiny in one Quarto document until every data-loading approach I tried failed except fetching from a public URL.
Dec 1, 2025

Testing Data Analysis Workflows in R

r
testing
reproducibility
I did not really know how to systematically test a data analysis pipeline until I applied software engineering practices from testthat and assertr to my own research workflow.
Jul 25, 2025

Configuring Yabai as a Tiling Window Manager on macOS

macos
shell
I did not really know how much faster a tiling window manager could make my daily workflow until I configured yabai with keyboard shortcuts and stopped reaching for the mouse.
Jun 20, 2025

Writing a Simple Vim Plugin for REPL Interaction

vim
r
I did not really know how Vim’s terminal API worked until I wrote a small plugin that sends code from an editing buffer to a running R session.
May 20, 2025

Prototyping a Shiny App with ChatGPT

r
shiny
I did not really know how effective ChatGPT could be as a prototyping partner until I iteratively built a modular Shiny app for Palmer Penguin exploration in three prompts.
May 15, 2025

A Mac Workflow for Tracking Daily Research Progress

shell
git
macos
I did not really know how to keep a consistent daily research log until I combined macOS dictation with ChatGPT summarization and a few short bash scripts. This post walks through the entire workflow from folder structure to searchable, version-controlled notes.
Apr 12, 2025

Clinical Trial Data Validation Across Languages and Tools

clinical-trials
data-cleaning
r
lua
javascript
I did not really understand how many layers of validation sit between a clinical trial data entry form and a reliable analysis dataset until I started mapping out EDC edit checks, open-source tools, and the possibility of generating JavaScript validation from a simple spreadsheet processed by Lua.
Jan 15, 2025

Palmer Penguins Part 5: Random Forest versus Linear Models

r
random-forest
machine-learning
model-selection
A head-to-head comparison reveals that a random forest outperforms the linear model by only two percentage points, prompting reflection on the interpretability-performance tradeoff.
Jan 5, 2025

Palmer Penguins Part 4: Model Diagnostics and Interpretation

r
regression
model-selection
Verifying regression model trustworthiness through systematic diagnostic checks on linearity, normality, and influence.
Jan 4, 2025

Palmer Penguins Part 3: Cross-Validation and Model Comparison

r
cross-validation
random-forest
machine-learning
Testing whether regression models hold up on new data through ten-fold cross-validation and comparison against a random forest.
Jan 3, 2025

Palmer Penguins Part 2: Multiple Regression and Species Effects

r
regression
Adding species identity to the body mass prediction model causes R-squared to jump from 0.76 to 0.86, demonstrating the power of biological groupings in multiple regression.
Jan 2, 2025

Constructing a reproducible blog post using zzcollab tools

r
zzcollab
reproducibility
I didn’t really know much about [topic] until I tried to [implement/understand] it myself. Here’s what I learned along the way.
Jan 1, 2025

Palmer Penguins Part 1: Exploratory Data Analysis and Simple Regression

r
regression
data-visualization
An exploration of how a simple flipper measurement can reveal substantial information about penguin body mass through the Palmer Penguins dataset and simple linear regression.
Jan 1, 2025

Predictive Modeling of Penguin Body Mass

r
regression
data-visualization
Regression models on the Palmer Penguins dataset reveal how much species identity shapes morphometric relationships. Simpson’s Paradox emerges clearly, reversing an apparent correlation once species is accounted for.
Jan 1, 2025

Copyright 2023-2026, Ronald ‘Ryy’ G. Thomas. The lab’s other activities live at rgtlab.org.