Reproducibility, Reporting, and Diagnostics

Session 8: Final Course Wrap-Up

Agenda

  • Put CohortMethod in perspective: know the design logic, not every default
  • Briefly situate other estimation options, especially SelfControlledCaseSeries
  • Make diagnostics and failure rules explicit before reviewing estimates
  • Use renv to lock and restore package versions
  • Review how Strategus results move from folders to databases and reports

What You Are Actually Expected to Know

  • You are not expected to memorize every default in CohortMethod or CohortMethodModule
  • You are expected to understand the design choices: target, comparator, outcome, washout, risk window, PS strategy, outcome model, and diagnostics
  • Inspect defaults with the documentation and formals() before you accept them
  • In real projects, design choices are usually reviewed with both a clinician and a statistician
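As a concrete illustration of inspecting defaults, formals() lists a function's arguments together with their default values. The sketch below uses a base R function so it runs anywhere; the same one-liner works on any CohortMethod settings constructor once that package is installed.

```r
# formals() returns a function's arguments with their (unevaluated)
# default values. Shown here on a base R function; the identical call
# works on CohortMethod's create*Args() constructors.
defaults <- formals(stats::lm)

names(defaults)    # all argument names, including those without defaults
defaults$method    # the default for `method`: "qr"
```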

Other Population-Level Estimation Packages

  • SelfControlledCaseSeries (SCCS) is a within-person, case-only design
  • Each person serves as their own control, so time-invariant confounding is reduced by design
  • SCCS is attractive when exposure timing and outcome timing are the key question
  • SCCS is not a drop-in replacement for CohortMethod; it comes with its own assumptions and diagnostics
  • In Strategus, this appears through SelfControlledCaseSeriesModule

Diagnostics Need To Be Prespecified

  • Do not wait until after the estimates to decide what counts as “good enough”
  • Prespecify diagnostics and what failure means before reviewing results
  • Typical examples: balance, overlap/equipoise, attrition, minimum detectable relative risk (MDRR), and expected absolute systematic error (EASE)
  • If diagnostics fail, the right conclusion is often “not interpretable yet,” not “no effect”
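One way to make prespecification concrete is to write the pass/fail rules down as code before any estimates exist. The thresholds below are illustrative assumptions for the sketch, not OHDSI-mandated defaults.

```r
# Prespecified pass/fail rules, written down before reviewing estimates.
# Threshold values are illustrative assumptions, not standard defaults.
diagnosticRules <- list(
  maxStdDiff   = 0.10, # covariate balance: max absolute standardized difference
  minEquipoise = 0.50, # minimum share of subjects in clinical equipoise
  maxEase      = 0.25  # maximum expected absolute systematic error (EASE)
)

passesDiagnostics <- function(maxStdDiff, equipoise, ease,
                              rules = diagnosticRules) {
  maxStdDiff <= rules$maxStdDiff &&
    equipoise >= rules$minEquipoise &&
    ease <= rules$maxEase
}

passesDiagnostics(maxStdDiff = 0.05, equipoise = 0.8, ease = 0.1)  # TRUE
passesDiagnostics(maxStdDiff = 0.20, equipoise = 0.8, ease = 0.1)  # FALSE
```

A failed check means the estimate is not interpretable yet, so it should never be silently dropped from reporting.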

Why We Must Publish All Results

Figure: volcano plots comparing the published observational literature with a large-scale systematic study.

Selective publication can make an evidence base look much stronger and much more directional than it really is.

Why We Need renv

  • Without version control for packages, the same code can work on one machine and fail on another
  • renv creates a project-specific library instead of relying on whatever happens to be installed globally
  • renv.lock records exact package versions so collaborators can recreate the environment
  • Strategus helps reproducibility at the study-design level; renv helps reproducibility at the software-environment level

The Minimum renv Workflow

# 1. Turn the project into an renv project (creates a private library)
renv::init()

# 2. Install or update packages as needed, e.g. renv::install("pkgname")

# 3. Record the current package versions in renv.lock
renv::snapshot()

# 4. On another machine: rebuild the project library from renv.lock
renv::restore()
  • snapshot() writes the current package state to the lockfile
  • restore() rebuilds the project library from that lockfile
  • Important: renv::install() alone is often not enough to put a package into renv.lock

How renv Decides What To Lock

  • By default, renv::snapshot() uses implicit discovery of project dependencies
  • That means install() changes the project library, while snapshot() decides what gets declared in the lockfile
  • It scans the whole project recursively, not just the file you happen to run today
  • It looks for patterns like library(pkg), require(pkg), and pkg::fun()
  • Use renv::dependencies() to inspect what renv thinks the project uses
  • Use a small project or .renvignore when you want tighter control over the scan
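To see the scan in action, renv exposes its dependency discovery directly. The sketch below assumes renv is installed and that you are inside an renv project; the .renvignore patterns are illustrative.

```r
# Ask renv which packages it believes the project uses.
# Returns a data frame with one row per discovered reference,
# including the file it was found in and the package name.
deps <- renv::dependencies()
unique(deps$Package)

# To exclude folders from the recursive scan, list patterns in a
# .renvignore file at the project root, one per line, e.g.:
#   scratch/
#   old-analyses/
```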

Demo Pattern We Will Use

renv::init()
renv::install("dplyr")    # installs into the project library only
dplyr::tibble(x = 1:3)    # this reference is what snapshot() discovers
renv::snapshot()          # now dplyr is recorded in renv.lock
renv::restore()
  • We keep the demo in a tiny project and put the workflow in RenvManagement.R
  • That gives one obvious file to run, while still letting renv scan the whole project
  • Note: Installed does not automatically mean locked

Same Idea For GitHub Packages

renv::install("OHDSI/Strategus")        # install from GitHub
renv::install("OHDSI/CohortGenerator")

# Referencing the package in project code is what lets an implicit
# snapshot() discover it as a dependency
Strategus::createResultsDataModelSettings(
  resultsDatabaseSchema = "study_results",
  resultsFolder = "demo-results"
)
renv::snapshot()
  • renv.lock records the GitHub remote metadata as well as the package version
  • renv::install("OHDSI/Strategus") alone may not lock it; you still need project code that references it before an implicit snapshot()
  • This matters because many OHDSI workflows depend on packages installed from GitHub
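For reference, a GitHub-installed package appears in renv.lock with remote metadata alongside the version. The excerpt below follows renv's lockfile format, but the version, ref, and omitted hash fields are illustrative, not real values.

```json
{
  "Packages": {
    "Strategus": {
      "Package": "Strategus",
      "Version": "1.0.0",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteUsername": "OHDSI",
      "RemoteRepo": "Strategus",
      "RemoteRef": "main"
    }
  }
}
```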

Reporting After Strategus Runs

  • Every module writes result files to the resultsFolder in your execution settings
  • For quick checks, you can inspect CSV outputs directly
  • For integrated review across modules, load results into a database
  • From there you can use the OHDSI Shiny results viewer or OhdsiReportGenerator
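A quick check on module output can be as simple as listing and reading the CSVs. The sketch below builds a toy results folder so it is self-contained; the file name and columns are made-up stand-ins, not actual module output names.

```r
# Toy stand-in for a Strategus resultsFolder (names are illustrative)
resultsFolder <- file.path(tempdir(), "demo-results")
dir.create(resultsFolder, recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(analysis_id = 1:2, rr = c(1.1, 0.9)),
          file.path(resultsFolder, "estimates.csv"), row.names = FALSE)

# List everything the modules wrote, then read one file directly
list.files(resultsFolder, pattern = "\\.csv$", recursive = TRUE)
read.csv(file.path(resultsFolder, "estimates.csv"))
```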

Reporting Note For Today

  • We are not going to cover how to run the OHDSI Shiny apps locally on your machines
  • In practice this can be temperamental because of environment, database, and package issues
  • Today the goal is to understand the reporting workflow and browse existing examples

Public Results Apps To Browse

Results Database Workflow

# 1. Describe where the result files live and which schema they target
resultsDataModelSettings <- Strategus::createResultsDataModelSettings(
  resultsDatabaseSchema = "study_results",
  resultsFolder = "path/to/results"
)

# 2. Create the result tables in the (empty) results schema
Strategus::createResultDataModel(
  analysisSpecifications = analysisSpecifications,
  resultsDataModelSettings = resultsDataModelSettings,
  resultsConnectionDetails = resultsConnectionDetails
)

# 3. Upload the module output files into those tables
Strategus::uploadResults(
  analysisSpecifications = analysisSpecifications,
  resultsDataModelSettings = resultsDataModelSettings,
  resultsConnectionDetails = resultsConnectionDetails
)
  • The results schema must already exist, and it should be empty before createResultDataModel() is run
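The workflow above assumes resultsConnectionDetails already exists. A minimal sketch using a local SQLite file via DatabaseConnector (assumes DatabaseConnector is installed; the file name is illustrative):

```r
# Connection details for a local SQLite results database
# (for SQLite, the results schema is simply "main")
resultsConnectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "sqlite",
  server = "study_results.sqlite"
)
```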

Reality Check: Manual Extraction Still Happens

  • Historically, we have often had to manually extract results from the results schema
  • That workflow is not ideal: the schema is confusing, extraction is error-prone, and it is easy to make inconsistent choices
  • It also goes somewhat against the OHDSI norm of using shared tooling and standard result viewers
  • But in practice it is sometimes still required while the software and reporting workflows continue to mature

OHDSI Norms And Debates

  • Oversimplifying a bit, there are different instincts within the OHDSI community
  • One instinct is to keep everything “on rails”
  • Standard data model, standard packages, standard diagnostics, standard reporting
  • Main benefit: comparability and fewer analyst degrees of freedom

A Different Instinct

  • Another instinct is to treat OMOP as a great schema and HADES as a strong toolbox
  • That view is more comfortable doing bespoke analyses with other tools
  • Main benefit: flexibility and tailoring the method to the question
  • If you go off rails, the burden on documentation, diagnostics, and justification goes up

Final Takeaways

  • Learn the logic of the design; do not try to memorize every default
  • Use renv so your software environment is reproducible too
  • Plan the results-review pipeline before running the study
  • Treat diagnostics as hard-and-fast decision rules, not as optional extras
  • Publish and report all results, especially when they are inconvenient