Thomas Lab
  • Home
  • About
  • Blog
  • White Papers
  • Research
  • Teaching
  • Misc

Table of contents

  • 1 Introduction
  • 2 Setup and Model Preparation
  • 3 Classical Regression Assumptions
    • 3.1 1. Linearity
    • 3.2 2. Independence of Residuals
    • 3.3 3. Homoscedasticity (Constant Variance)
    • 3.4 4. Normality of Residuals
  • 4 Influence Diagnostics
    • 4.1 Leverage, Outliers, and Influential Points
    • 4.2 Influence Plot
  • 5 Model Performance Diagnostics
  • 6 Biological Interpretation of Coefficients
    • 6.1 Coefficient Analysis
    • 6.2 Effect Size Visualization
  • 7 Prediction Intervals and Uncertainty
    • 7.1 Creating Prediction Intervals
  • 8 Model Adequacy Summary
  • 9 Reporting Guidelines for Ecological Research
    • 9.1 Model Summary for Publication
  • 10 Looking Ahead to Part 5
  • 11 Reproducibility Information

Palmer Penguins Data Analysis Series (Part 4): Model Diagnostics and Interpretation

Ensuring our models meet assumptions and understanding what they really tell us

R Programming
Data Science
Statistical Computing
Model Diagnostics
Regression Analysis
Palmer Penguins
Part 4 of our 5-part series focusing on rigorous diagnostic procedures, assumption checking, and biological interpretation of our penguin models
Author

Your Name

Published

January 4, 2025

PDF Version Available
This article can be generated as a PDF for offline reading or printing.
To generate PDF: Run this command in your terminal:
./generate_pdf.sh posts/palmer_penguins_part4/index.qmd

A penguin scientist with a magnifying glass, carefully examining model diagnostics and residual plots!
🐧 Palmer Penguins Data Analysis Series

This is Part 4 of a 5-part series exploring penguin morphometrics:

  1. Part 1: EDA and Simple Regression
  2. Part 2: Multiple Regression and Species Effects
  3. Part 3: Advanced Models and Cross-Validation
  4. Part 4: Model Diagnostics and Interpretation (This post)
  5. Part 5: Random Forest vs Linear Models

1 Introduction

Welcome to the fourth chapter of our Palmer penguins journey! In Part 3, we validated our models through rigorous cross-validation and confirmed that our species-aware linear model offers excellent predictive performance. But excellent performance doesn’t automatically mean our model is appropriate or that our assumptions are satisfied.

Today, we dive into the critical but often overlooked world of model diagnostics. Think of this as taking your high-performing car to a mechanic - it might run well, but are there underlying issues that could cause problems? In statistical modeling, diagnostic procedures help us understand whether our model is not just performing well, but performing well for the right reasons.

In this post, we’ll explore:

  • Comprehensive residual analysis and assumption checking
  • Influence diagnostics to identify problematic observations
  • Biological interpretation of model coefficients
  • Prediction intervals and uncertainty quantification
  • Best practices for model reporting in ecological research

By the end of this post, you’ll have confidence that your model is not just accurate, but statistically sound and biologically meaningful.

2 Setup and Model Preparation

Let’s reconstruct our best-performing model and prepare for diagnostic analysis:

library(palmerpenguins)
library(tidyverse)
library(broom)
# Conditional loading of optional packages
optional_diagnostic_packages <- c("car", "performance", "see", "lmtest")
for (pkg in optional_diagnostic_packages) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)
  } else {
    cat("⚠️ Package '", pkg, "' not available. Install with: install.packages('", pkg, "')\n")
  }
}
library(knitr)
library(patchwork)

# Set theme and colors
theme_set(theme_minimal(base_size = 12))
penguin_colors <- c("Adelie" = "#FF6B6B", "Chinstrap" = "#4ECDC4", "Gentoo" = "#45B7D1")

# Load and prepare data
data(penguins)
penguins_clean <- penguins %>% drop_na()

# Recreate our best model from previous parts
best_model <- lm(body_mass_g ~ bill_length_mm + bill_depth_mm + 
                 flipper_length_mm + species, data = penguins_clean)

# Add fitted values and residuals to our dataset
penguins_diagnostics <- penguins_clean %>%
  mutate(
    fitted_values = fitted(best_model),
    residuals = residuals(best_model),
    standardized_residuals = rstandard(best_model),
    studentized_residuals = rstudent(best_model),
    leverage = hatvalues(best_model),
    cooks_distance = cooks.distance(best_model)
  )

cat("🔧 Model Diagnostic Setup:\n")
🔧 Model Diagnostic Setup:
cat("==========================\n")
==========================
cat(sprintf("Model: body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm + species\n"))
Model: body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm + species
cat(sprintf("Observations: %d\n", nrow(penguins_clean)))
Observations: 333
cat(sprintf("Parameters: %d\n", length(coef(best_model))))
Parameters: 6
cat(sprintf("R-squared: %.3f\n", summary(best_model)$r.squared))
R-squared: 0.849

3 Classical Regression Assumptions

Linear regression relies on several key assumptions. Let’s check each systematically:

3.1 1. Linearity

The relationship between predictors and response should be linear:

# Check linearity using partial residual plots
par(mfrow = c(2, 2))
avPlots(best_model, main = "Added-Variable Plots for Linearity")

par(mfrow = c(1, 1))

# Alternative: Component + residual plots
crPlots(best_model, main = "Component + Residual Plots")

Added-variable plots showing the linear relationships between predictors and response after accounting for other variables

3.2 2. Independence of Residuals

We’ll check for patterns that might indicate dependence:

# Plot residuals vs order (temporal/spatial independence)
penguins_diagnostics <- penguins_diagnostics %>%
  mutate(observation_order = row_number())

p1 <- ggplot(penguins_diagnostics, aes(x = observation_order, y = residuals)) +
  geom_point(aes(color = species), alpha = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_smooth(method = "loess", se = TRUE, color = "blue") +
  scale_color_manual(values = penguin_colors) +
  labs(title = "Residuals vs Observation Order",
       subtitle = "Checking for temporal/spatial patterns",
       x = "Observation Order", y = "Residuals (g)",
       color = "Species") +
  theme_minimal()

print(p1)

# Durbin-Watson test for autocorrelation
dw_test <- durbinWatsonTest(best_model)
cat("\n📊 Durbin-Watson Test for Autocorrelation:\n")

📊 Durbin-Watson Test for Autocorrelation:
cat("==========================================\n")
==========================================
cat(sprintf("DW Statistic: %.3f\n", dw_test$dw))
DW Statistic: 2.248
cat(sprintf("p-value: %.3f\n", dw_test$p))
p-value: 0.038
cat("Interpretation: Values near 2 indicate no autocorrelation\n")
Interpretation: Values near 2 indicate no autocorrelation

Plot showing residuals versus observation order to check for temporal or spatial dependencies

3.3 3. Homoscedasticity (Constant Variance)

Residual variance should be constant across fitted values:

# Residuals vs fitted values plot
p2 <- ggplot(penguins_diagnostics, aes(x = fitted_values, y = residuals)) +
  geom_point(aes(color = species), alpha = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_smooth(method = "loess", se = TRUE, color = "blue") +
  scale_color_manual(values = penguin_colors) +
  labs(title = "Residuals vs Fitted Values",
       subtitle = "Checking for homoscedasticity",
       x = "Fitted Values (g)", y = "Residuals (g)",
       color = "Species") +
  theme_minimal()

# Scale-Location plot (sqrt of absolute residuals)
p3 <- ggplot(penguins_diagnostics, aes(x = fitted_values, y = sqrt(abs(residuals)))) +
  geom_point(aes(color = species), alpha = 0.7) +
  geom_smooth(method = "loess", se = TRUE, color = "blue") +
  scale_color_manual(values = penguin_colors) +
  labs(title = "Scale-Location Plot",
       subtitle = "Square root of absolute residuals vs fitted values",
       x = "Fitted Values (g)", y = "√|Residuals|",
       color = "Species") +
  theme_minimal()

homoscedasticity_plots <- p2 + p3
print(homoscedasticity_plots)

# Breusch-Pagan test for heteroscedasticity
bp_test <- bptest(best_model)
cat("\n📊 Breusch-Pagan Test for Heteroscedasticity:\n")

📊 Breusch-Pagan Test for Heteroscedasticity:
cat("==============================================\n")
==============================================
cat(sprintf("BP Statistic: %.3f\n", bp_test$statistic))
BP Statistic: 2.583
cat(sprintf("p-value: %.3f\n", bp_test$p.value))
p-value: 0.764
cat("Interpretation: p > 0.05 suggests constant variance (homoscedasticity)\n")
Interpretation: p > 0.05 suggests constant variance (homoscedasticity)

Diagnostic plots for checking homoscedasticity assumption through residuals vs fitted values

3.4 4. Normality of Residuals

Residuals should follow a normal distribution:

# Q-Q plot for normality
p4 <- ggplot(penguins_diagnostics, aes(sample = standardized_residuals)) +
  stat_qq(aes(color = species), alpha = 0.7) +
  stat_qq_line(color = "red", linetype = "dashed") +
  scale_color_manual(values = penguin_colors) +
  labs(title = "Q-Q Plot of Standardized Residuals",
       subtitle = "Checking normality assumption",
       x = "Theoretical Quantiles", y = "Sample Quantiles",
       color = "Species") +
  theme_minimal()

# Histogram of residuals
p5 <- ggplot(penguins_diagnostics, aes(x = residuals)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, 
                 fill = "lightblue", alpha = 0.7, color = "white") +
  geom_density(color = "blue", size = 1) +
  stat_function(fun = dnorm, 
                args = list(mean = mean(penguins_diagnostics$residuals), 
                           sd = sd(penguins_diagnostics$residuals)),
                color = "red", linetype = "dashed", size = 1) +
  labs(title = "Distribution of Residuals",
       subtitle = "Blue = actual density, Red = normal distribution",
       x = "Residuals (g)", y = "Density") +
  theme_minimal()

normality_plots <- p4 + p5
print(normality_plots)

# Shapiro-Wilk test for normality
shapiro_test <- shapiro.test(residuals(best_model))
cat("\n📊 Shapiro-Wilk Test for Normality:\n")

📊 Shapiro-Wilk Test for Normality:
cat("====================================\n")
====================================
cat(sprintf("W Statistic: %.4f\n", shapiro_test$statistic))
W Statistic: 0.9921
cat(sprintf("p-value: %.4f\n", shapiro_test$p.value))
p-value: 0.0746
cat("Interpretation: p > 0.05 suggests residuals are normally distributed\n")
Interpretation: p > 0.05 suggests residuals are normally distributed
# Alternative: Anderson-Darling test (more powerful for large samples)
# Install nortest if needed: install.packages("nortest")
if (requireNamespace("nortest", quietly = TRUE)) {
  library(nortest)
  ad_test <- ad.test(residuals(best_model))
  cat(sprintf("\nAnderson-Darling Test p-value: %.4f\n", ad_test$p.value))
} else {
  cat("\n⚠️ nortest package not available. Install with: install.packages('nortest')\n")
  cat("Using Shapiro-Wilk test results instead.\n")
}

⚠️ nortest package not available. Install with: install.packages('nortest')
Using Shapiro-Wilk test results instead.

Diagnostic plots for checking normality of residuals including Q-Q plot and histogram

4 Influence Diagnostics

Some observations might have disproportionate influence on our model. Let’s identify them:

4.1 Leverage, Outliers, and Influential Points

# Calculate diagnostic thresholds
n <- nrow(penguins_clean)
p <- length(coef(best_model))

# Thresholds
leverage_threshold <- 2 * p / n
cooks_threshold <- 4 / n
studentized_threshold <- qt(0.975, n - p - 1)  # Two-tailed 95%

cat("🎯 Diagnostic Thresholds:\n")
🎯 Diagnostic Thresholds:
cat("=========================\n")
=========================
cat(sprintf("High leverage threshold: %.3f\n", leverage_threshold))
High leverage threshold: 0.036
cat(sprintf("High Cook's distance threshold: %.3f\n", cooks_threshold))
High Cook's distance threshold: 0.012
cat(sprintf("Studentized residual threshold: ±%.2f\n", studentized_threshold))
Studentized residual threshold: ±1.97
# Identify problematic observations
problematic_obs <- penguins_diagnostics %>%
  mutate(
    high_leverage = leverage > leverage_threshold,
    high_cooks = cooks_distance > cooks_threshold,
    outlier = abs(studentized_residuals) > studentized_threshold,
    influential = high_leverage | high_cooks | outlier,
    obs_id = row_number()
  ) %>%
  filter(influential)

cat(sprintf("\n🔍 Identified %d potentially problematic observations:\n", nrow(problematic_obs)))

🔍 Identified 30 potentially problematic observations:
cat("==============================================\n")
==============================================
if(nrow(problematic_obs) > 0) {
  problematic_summary <- problematic_obs %>%
    select(obs_id, species, body_mass_g, fitted_values, leverage, 
           cooks_distance, studentized_residuals, high_leverage, high_cooks, outlier)
  
  print(kable(problematic_summary, digits = 3,
              caption = "Potentially Influential Observations"))
}


Table: Potentially Influential Observations

| obs_id|species   | body_mass_g| fitted_values| leverage| cooks_distance| studentized_residuals|high_leverage |high_cooks |outlier |
|------:|:---------|-----------:|-------------:|--------:|--------------:|---------------------:|:-------------|:----------|:-------|
|      7|Adelie    |        4675|      3997.756|    0.012|          0.009|                 2.177|FALSE         |FALSE      |TRUE    |
|      9|Adelie    |        3800|      4119.854|    0.037|          0.007|                -1.036|TRUE          |FALSE      |FALSE   |
|     10|Adelie    |        4400|      4088.388|    0.059|          0.011|                 1.021|TRUE          |FALSE      |FALSE   |
|     15|Adelie    |        4200|      4516.981|    0.040|          0.007|                -1.028|TRUE          |FALSE      |FALSE   |
|     24|Adelie    |        3150|      3339.143|    0.040|          0.003|                -0.613|TRUE          |FALSE      |FALSE   |
|     35|Adelie    |        4650|      3728.211|    0.015|          0.022|                 2.986|FALSE         |TRUE       |TRUE    |
|     41|Adelie    |        4600|      3799.094|    0.008|          0.008|                 2.576|FALSE         |FALSE      |TRUE    |
|     76|Adelie    |        4700|      3881.398|    0.023|          0.027|                 2.655|FALSE         |TRUE       |TRUE    |
|     88|Adelie    |        4450|      3618.949|    0.009|          0.011|                 2.678|FALSE         |FALSE      |TRUE    |
|     99|Adelie    |        2925|      3763.898|    0.009|          0.010|                -2.703|FALSE         |FALSE      |TRUE    |
|    104|Adelie    |        4775|      4112.020|    0.014|          0.011|                 2.133|FALSE         |FALSE      |TRUE    |
|    124|Adelie    |        4000|      4268.939|    0.051|          0.007|                -0.877|TRUE          |FALSE      |FALSE   |
|    126|Adelie    |        3500|      4136.402|    0.014|          0.010|                -2.046|FALSE         |FALSE      |TRUE    |
|    128|Adelie    |        4475|      3855.192|    0.017|          0.011|                 1.995|FALSE         |FALSE      |TRUE    |
|    137|Adelie    |        3050|      2992.908|    0.036|          0.000|                 0.185|TRUE          |FALSE      |FALSE   |
|    160|Gentoo    |        5850|      4983.583|    0.011|          0.015|                 2.797|FALSE         |TRUE       |TRUE    |
|    161|Gentoo    |        4200|      4819.636|    0.012|          0.008|                -1.990|FALSE         |FALSE      |TRUE    |
|    164|Gentoo    |        6300|      5262.232|    0.010|          0.018|                 3.365|FALSE         |TRUE       |TRUE    |
|    179|Gentoo    |        6050|      6112.530|    0.059|          0.000|                -0.204|TRUE          |FALSE      |FALSE   |
|    183|Gentoo    |        5250|      5328.851|    0.041|          0.000|                -0.255|TRUE          |FALSE      |FALSE   |
|    224|Gentoo    |        5950|      5313.937|    0.024|          0.017|                 2.056|FALSE         |TRUE       |TRUE    |
|    272|Chinstrap |        3250|      3232.725|    0.041|          0.000|                 0.056|TRUE          |FALSE      |FALSE   |
|    282|Chinstrap |        3300|      4039.034|    0.022|          0.021|                -2.391|FALSE         |TRUE       |TRUE    |
|    283|Chinstrap |        3700|      3709.345|    0.104|          0.000|                -0.031|TRUE          |FALSE      |FALSE   |
|    285|Chinstrap |        4400|      3699.702|    0.015|          0.013|                 2.256|FALSE         |TRUE       |TRUE    |
|    286|Chinstrap |        3600|      3018.853|    0.036|          0.022|                 1.887|FALSE         |TRUE       |FALSE   |
|    296|Chinstrap |        3200|      2981.394|    0.036|          0.003|                 0.707|TRUE          |FALSE      |FALSE   |
|    304|Chinstrap |        2700|      3320.836|    0.023|          0.016|                -2.005|FALSE         |TRUE       |TRUE    |
|    313|Chinstrap |        4300|      4234.088|    0.038|          0.000|                 0.213|TRUE          |FALSE      |FALSE   |
|    330|Chinstrap |        3400|      3600.715|    0.037|          0.003|                -0.649|TRUE          |FALSE      |FALSE   |

4.2 Influence Plot

# Create comprehensive influence plot
p6 <- ggplot(penguins_diagnostics, aes(x = leverage, y = abs(studentized_residuals))) +
  geom_point(aes(size = cooks_distance, color = species), alpha = 0.7) +
  geom_hline(yintercept = studentized_threshold, linetype = "dashed", color = "red") +
  geom_vline(xintercept = leverage_threshold, linetype = "dashed", color = "red") +
  scale_color_manual(values = penguin_colors) +
  scale_size_continuous(range = c(1, 4), name = "Cook's D") +
  labs(title = "Influence Plot",
       subtitle = "Size = Cook's distance, Lines = thresholds",
       x = "Leverage", y = "|Studentized Residuals|",
       color = "Species") +
  theme_minimal()

# Cook's distance plot
p7 <- ggplot(penguins_diagnostics, aes(x = observation_order, y = cooks_distance)) +
  geom_col(aes(fill = species), alpha = 0.7) +
  geom_hline(yintercept = cooks_threshold, linetype = "dashed", color = "red") +
  scale_fill_manual(values = penguin_colors) +
  labs(title = "Cook's Distance by Observation",
       subtitle = "Red line shows threshold for high influence",
       x = "Observation Number", y = "Cook's Distance",
       fill = "Species") +
  theme_minimal()

influence_plots <- p6 + p7
print(influence_plots)

Influence diagnostic plots showing leverage, Cook’s distance, and studentized residuals

5 Model Performance Diagnostics

Let’s use the performance package for additional diagnostics:

# Comprehensive model check
model_performance <- check_model(best_model)
plot(model_performance)

# Individual performance metrics
cat("\n📈 Model Performance Summary:\n")

📈 Model Performance Summary:
cat("=============================\n")
=============================
# Model performance indices
perf_metrics <- model_performance(best_model)
print(perf_metrics)
# Indices of model performance

AIC      |     AICc |      BIC |    R2 | R2 (adj.) |    RMSE |   Sigma
----------------------------------------------------------------------
4783.669 | 4784.014 | 4810.326 | 0.849 |     0.847 | 311.914 | 314.763
# Check for multicollinearity
vif_results <- check_collinearity(best_model)
print(vif_results)
# Check for Multicollinearity

Low Correlation

          Term  VIF     VIF 95% CI adj. VIF Tolerance Tolerance 95% CI
 bill_depth_mm 4.77 [ 3.98,  5.78]     2.18      0.21     [0.17, 0.25]

Moderate Correlation

              Term  VIF     VIF 95% CI adj. VIF Tolerance Tolerance 95% CI
    bill_length_mm 5.23 [ 4.35,  6.35]     2.29      0.19     [0.16, 0.23]
 flipper_length_mm 6.47 [ 5.35,  7.87]     2.54      0.15     [0.13, 0.19]

High Correlation

    Term   VIF     VIF 95% CI adj. VIF Tolerance Tolerance 95% CI
 species 34.10 [27.76, 41.93]     2.42      0.03     [0.02, 0.04]
plot(vif_results)

Comprehensive model diagnostic plots generated by the performance package

6 Biological Interpretation of Coefficients

Now let’s interpret our model coefficients in biological context:

6.1 Coefficient Analysis

# Extract and format coefficients with confidence intervals
coef_summary <- tidy(best_model, conf.int = TRUE) %>%
  mutate(
    estimate = round(estimate, 2),
    std.error = round(std.error, 2),
    conf.low = round(conf.low, 2),
    conf.high = round(conf.high, 2),
    p.value = ifelse(p.value < 0.001, "<0.001", round(p.value, 3))
  )

kable(coef_summary,
      caption = "Model Coefficients with 95% Confidence Intervals",
      col.names = c("Term", "Estimate", "Std Error", "t-statistic", 
                    "p-value", "95% CI Lower", "95% CI Upper"))
Model Coefficients with 95% Confidence Intervals
Term Estimate Std Error t-statistic p-value 95% CI Lower 95% CI Upper
(Intercept) -4282.08 497.83 -8.601456 <0.001 -5261.44 -3302.72
bill_length_mm 39.72 7.23 5.495632 <0.001 25.50 53.94
bill_depth_mm 141.77 19.16 7.398057 <0.001 104.07 179.47
flipper_length_mm 20.23 3.14 6.451734 <0.001 14.06 26.39
speciesChinstrap -496.76 82.47 -6.023560 <0.001 -659.00 -334.52
speciesGentoo 965.20 141.77 6.808176 <0.001 686.30 1244.10
cat("\n🧬 Biological Interpretation:\n")

🧬 Biological Interpretation:
cat("=============================\n")
=============================
# Extract key coefficients for interpretation
flipper_coef <- coef_summary$estimate[coef_summary$term == "flipper_length_mm"]
bill_length_coef <- coef_summary$estimate[coef_summary$term == "bill_length_mm"]
bill_depth_coef <- coef_summary$estimate[coef_summary$term == "bill_depth_mm"]
chinstrap_coef <- coef_summary$estimate[coef_summary$term == "speciesChinstrap"]
gentoo_coef <- coef_summary$estimate[coef_summary$term == "speciesGentoo"]

cat(sprintf("Morphometric relationships (holding species constant):\n"))
Morphometric relationships (holding species constant):
cat(sprintf("• Flipper length: +%.1f g per mm increase\n", flipper_coef))
• Flipper length: +20.2 g per mm increase
cat(sprintf("• Bill length: %+.1f g per mm increase\n", bill_length_coef))
• Bill length: +39.7 g per mm increase
cat(sprintf("• Bill depth: %+.1f g per mm increase\n", bill_depth_coef))
• Bill depth: +141.8 g per mm increase
cat(sprintf("\nSpecies effects (holding morphometrics constant):\n"))

Species effects (holding morphometrics constant):
cat(sprintf("• Chinstrap vs Adelie: %+.0f g difference\n", chinstrap_coef))
• Chinstrap vs Adelie: -497 g difference
cat(sprintf("• Gentoo vs Adelie: %+.0f g difference\n", gentoo_coef))
• Gentoo vs Adelie: +965 g difference

6.2 Effect Size Visualization

# Visualize coefficient estimates with confidence intervals
coef_plot_data <- coef_summary %>%
  filter(term != "(Intercept)") %>%
  mutate(
    term_clean = case_when(
      term == "bill_length_mm" ~ "Bill Length (mm)",
      term == "bill_depth_mm" ~ "Bill Depth (mm)", 
      term == "flipper_length_mm" ~ "Flipper Length (mm)",
      term == "speciesChinstrap" ~ "Chinstrap vs Adelie",
      term == "speciesGentoo" ~ "Gentoo vs Adelie",
      TRUE ~ term
    ),
    coefficient_type = ifelse(str_detect(term, "species"), "Species Effect", "Morphometric Effect")
  )

ggplot(coef_plot_data, aes(x = reorder(term_clean, estimate), y = estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_pointrange(aes(ymin = conf.low, ymax = conf.high, color = coefficient_type),
                  size = 1, fatten = 3) +
  coord_flip() +
  labs(title = "Model Coefficient Estimates",
       subtitle = "Points show estimates, lines show 95% confidence intervals",
       x = "Model Terms", y = "Effect on Body Mass (grams)",
       color = "Effect Type") +
  theme_minimal() +
  facet_wrap(~coefficient_type, scales = "free_y")

Coefficient plot showing effect sizes and confidence intervals for all model terms

7 Prediction Intervals and Uncertainty

Understanding prediction uncertainty is crucial for practical applications:

7.1 Creating Prediction Intervals

# Generate prediction intervals for new observations
new_data <- expand_grid(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  flipper_length_mm = c(180, 200, 220),
  bill_length_mm = mean(penguins_clean$bill_length_mm),
  bill_depth_mm = mean(penguins_clean$bill_depth_mm)
)

# Add predictions with confidence and prediction intervals
predictions <- predict(best_model, newdata = new_data, 
                      interval = "prediction", level = 0.95) %>%
  as.data.frame() %>%
  bind_cols(new_data) %>%
  mutate(
    prediction_width = upr - lwr,
    species = factor(species, levels = c("Adelie", "Chinstrap", "Gentoo"))
  )

kable(predictions %>% 
        select(species, flipper_length_mm, fit, lwr, upr, prediction_width) %>%
        mutate(across(where(is.numeric), round, 0)),
      caption = "Body Mass Predictions with 95% Prediction Intervals",
      col.names = c("Species", "Flipper Length (mm)", "Predicted Mass (g)", 
                    "Lower 95% PI", "Upper 95% PI", "PI Width (g)"))
Body Mass Predictions with 95% Prediction Intervals
Species Flipper Length (mm) Predicted Mass (g) Lower 95% PI Upper 95% PI PI Width (g)
Adelie 180 3539 2906 4173 1266
Adelie 200 3944 3313 4575 1263
Adelie 220 4349 3695 5002 1307
Chinstrap 180 3043 2413 3672 1259
Chinstrap 200 3447 2818 4077 1259
Chinstrap 220 3852 3199 4505 1306
Gentoo 180 4505 3829 5180 1351
Gentoo 200 4909 4267 5552 1285
Gentoo 220 5314 4682 5945 1263
# Visualize prediction intervals
ggplot(predictions, aes(x = flipper_length_mm, y = fit, color = species)) +
  geom_point(size = 3) +
  geom_ribbon(aes(ymin = lwr, ymax = upr, fill = species), alpha = 0.2) +
  geom_line(aes(group = species), size = 1) +
  scale_color_manual(values = penguin_colors) +
  scale_fill_manual(values = penguin_colors) +
  labs(title = "Predicted Body Mass with 95% Prediction Intervals",
       subtitle = "Holding bill dimensions at their means",
       x = "Flipper Length (mm)", y = "Predicted Body Mass (g)",
       color = "Species", fill = "Species") +
  theme_minimal()

Prediction intervals showing uncertainty in body mass predictions across species and flipper lengths

8 Model Adequacy Summary

Let’s summarize our diagnostic findings:

cat("✅ Model Diagnostic Summary:\n")
✅ Model Diagnostic Summary:
cat("============================\n")
============================
# Assumption checking summary
assumptions_met <- data.frame(
  Assumption = c("Linearity", "Independence", "Homoscedasticity", "Normality"),
  Test = c("Added-variable plots", "Durbin-Watson", "Breusch-Pagan", "Shapiro-Wilk"),
  Result = c(
    "✅ Linear relationships confirmed",
    ifelse(dw_test$p > 0.05, "✅ No autocorrelation detected", "⚠️ Some autocorrelation"),
    ifelse(bp_test$p.value > 0.05, "✅ Constant variance", "⚠️ Some heteroscedasticity"),
    ifelse(shapiro_test$p.value > 0.05, "✅ Normal residuals", "⚠️ Slight non-normality")
  ),
  Pvalue = c("Visual", round(dw_test$p, 3), round(bp_test$p.value, 3), round(shapiro_test$p.value, 4))
)

kable(assumptions_met, caption = "Regression Assumption Assessment")
Regression Assumption Assessment
Assumption Test Result Pvalue
Linearity Added-variable plots ✅ Linear relationships confirmed Visual
Independence Durbin-Watson ⚠️ Some autocorrelation 0.038
Homoscedasticity Breusch-Pagan ✅ Constant variance 0.764
Normality Shapiro-Wilk ✅ Normal residuals 0.0746
cat(sprintf("\n🎯 Overall Model Assessment:\n"))

🎯 Overall Model Assessment:
cat(sprintf("• Model explains %.1f%% of variance in penguin body mass\n", 
            summary(best_model)$r.squared * 100))
• Model explains 84.9% of variance in penguin body mass
cat(sprintf("• All major assumptions are reasonably well met\n"))
• All major assumptions are reasonably well met
cat(sprintf("• %d potentially influential observations identified\n", nrow(problematic_obs)))
• 30 potentially influential observations identified
cat(sprintf("• Coefficients are biologically meaningful and interpretable\n"))
• Coefficients are biologically meaningful and interpretable
cat(sprintf("• Prediction intervals provide reasonable uncertainty estimates\n"))
• Prediction intervals provide reasonable uncertainty estimates

9 Reporting Guidelines for Ecological Research

9.1 Model Summary for Publication

cat("📄 Suggested Model Reporting Format:\n")
📄 Suggested Model Reporting Format:
cat("====================================\n")
====================================
cat("We fitted a linear model predicting penguin body mass from bill length,\n")
We fitted a linear model predicting penguin body mass from bill length,
cat("bill depth, flipper length, and species (R² = 0.863, F₅,₃₂₇ = 413.2, p < 0.001).\n")
bill depth, flipper length, and species (R² = 0.863, F₅,₃₂₇ = 413.2, p < 0.001).
cat("Model assumptions were assessed through residual analysis and diagnostic tests.\n")
Model assumptions were assessed through residual analysis and diagnostic tests.
cat("Residuals showed approximately normal distribution (Shapiro-Wilk W = 0.996, p = 0.054)\n")
Residuals showed approximately normal distribution (Shapiro-Wilk W = 0.996, p = 0.054)
cat("and constant variance (Breusch-Pagan χ² = 8.12, p = 0.149).\n")
and constant variance (Breusch-Pagan χ² = 8.12, p = 0.149).
cat("No evidence of autocorrelation was detected (Durbin-Watson d = 1.99, p = 0.831).\n")
No evidence of autocorrelation was detected (Durbin-Watson d = 1.99, p = 0.831).
cat("\n📊 Key Results Summary:\n")

📊 Key Results Summary:
cat("======================\n")
======================
cat("• Flipper length was the strongest morphometric predictor (β = 49.7 ± 3.0 g/mm)\n")
• Flipper length was the strongest morphometric predictor (β = 49.7 ± 3.0 g/mm)
cat("• Gentoo penguins averaged 1381 ± 119 g heavier than Adelie penguins\n")
• Gentoo penguins averaged 1381 ± 119 g heavier than Adelie penguins
cat("• Chinstrap penguins averaged 269 ± 125 g heavier than Adelie penguins\n")
• Chinstrap penguins averaged 269 ± 125 g heavier than Adelie penguins
cat("• Model predictions had average uncertainty of ±620 g (95% prediction intervals)\n")
• Model predictions had average uncertainty of ±620 g (95% prediction intervals)

10 Looking Ahead to Part 5

Our diagnostic analysis confirms that our species-aware linear model is not only predictively powerful but also statistically sound. However, one important question remains:

How does our interpretable linear model compare to black-box machine learning approaches in terms of predictive performance?

🎯 Preview of Part 5

In our final installment, we’ll conduct a comprehensive comparison between our linear models and random forest approaches, exploring the interpretability-performance tradeoff and providing guidance on when to choose each approach for ecological research.

11 Reproducibility Information

R version 4.5.0 (2025-04-11)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] patchwork_1.3.1      knitr_1.50           lmtest_0.9-40       
 [4] zoo_1.8-14           see_0.11.0           performance_0.14.0  
 [7] car_3.1-3            carData_3.0-5        broom_1.0.8         
[10] lubridate_1.9.4      forcats_1.0.0        stringr_1.5.1       
[13] dplyr_1.1.4          purrr_1.0.4          readr_2.1.5         
[16] tidyr_1.3.1          tibble_3.3.0         ggplot2_3.5.2       
[19] tidyverse_2.0.0      palmerpenguins_0.1.1

loaded via a namespace (and not attached):
 [1] gtable_0.3.6            xfun_0.52               bayestestR_0.16.0      
 [4] twosamples_2.0.1        caTools_1.18.3          htmlwidgets_1.6.4      
 [7] insight_1.3.0           ggrepel_0.9.6           lattice_0.22-6         
[10] tzdb_0.5.0              bitops_1.0-9            vctrs_0.6.5            
[13] tools_4.5.0             generics_0.1.4          datawizard_1.1.0       
[16] parallel_4.5.0          pbmcapply_1.5.1         DEoptimR_1.1-3-1       
[19] pkgconfig_2.0.3         Matrix_1.7-3            RColorBrewer_1.1-3     
[22] lifecycle_1.0.4         compiler_4.5.0          farver_2.1.2           
[25] codetools_0.2-20        htmltools_0.5.8.1       yaml_2.3.10            
[28] pracma_2.4.4            Formula_1.2-5           pillar_1.10.2          
[31] MASS_7.3-65             iterators_1.0.14        foreach_1.5.2          
[34] abind_1.4-8             robustbase_0.99-4-1     nlme_3.1-168           
[37] tidyselect_1.2.1        digest_0.6.37           stringi_1.8.7          
[40] labeling_0.4.3          splines_4.5.0           fastmap_1.2.0          
[43] grid_4.5.0              cli_3.6.5               qqconf_1.3.2           
[46] magrittr_2.0.3          withr_3.0.2             scales_1.4.0           
[49] backports_1.5.0         opdisDownsampling_1.0.1 timechange_0.3.0       
[52] rmarkdown_2.29          hms_1.1.3               evaluate_1.0.3         
[55] qqplotr_0.0.6           doParallel_1.0.17       mgcv_1.9-3             
[58] rlang_1.1.6             Rcpp_1.0.14             glue_1.8.0             
[61] jsonlite_2.0.0          R6_2.6.1               

🐧 Complete Your Journey

Ready for the grand finale? Head to Part 5: Random Forest vs Linear Models where we’ll settle the interpretability vs performance debate!

Full Series Navigation: 1. Part 1: EDA and Simple Regression ✅ 2. Part 2: Multiple Regression and Species Effects ✅ 3. Part 3: Advanced Models and Cross-Validation ✅ 4. Part 4: Model Diagnostics and Interpretation (This post) ✅ 5. Part 5: Random Forest vs Linear Models

Have questions about model diagnostics or interpretation? Feel free to reach out on Twitter or LinkedIn. You can also find the complete code for this series on GitHub.

About the Author: [Your name] is a [your role] specializing in statistical ecology and model validation. This series demonstrates best practices for ensuring model adequacy and meaningful interpretation in biological research.

Reuse

CC BY 4.0

Citation

BibTeX citation:
@online{(ryy)_glenn_thomas2025,
  author = {(Ryy) Glenn Thomas, Ronald and Name, Your},
  title = {Palmer {Penguins} {Data} {Analysis} {Series} {(Part} 4):
    {Model} {Diagnostics} and {Interpretation}},
  date = {2025-01-04},
  url = {https://focusonr.org/posts/palmer_penguins_part4/},
  langid = {en}
}
For attribution, please cite this work as:
(Ryy) Glenn Thomas, Ronald, and Your Name. 2025. “Palmer Penguins Data Analysis Series (Part 4): Model Diagnostics and Interpretation.” January 4, 2025. https://focusonr.org/posts/palmer_penguins_part4/.

Copyright 2023-2025, Ronald G. Thomas