| Title: | Data Quality Check Framework for Clinical and Analytical Data |
|---|---|
| Description: | A configuration-driven framework for running domain-level data quality checks and consolidating findings into structured Excel reports with role-based feedback routing. It supports trial-level and study-level checks across multiple data domains. Reports are routed to separate feedback channels for Data Management (DM), Medical Writing (MW), Study Data Tabulation Model (SDTM) programmers, and Analysis Data Model (ADaM) programmers, as well as other relevant data roles. Reviewer responses are incorporated automatically on re-run. |
| Authors: | Ganesh Babu G [aut, cre] |
| Maintainer: | Ganesh Babu G <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.1.0 |
| Built: | 2026-05-24 07:09:09 UTC |
| Source: | https://github.com/ganeshbabunn/rcoregage |
Merges new findings with previously saved issues, incorporates reviewer
feedback from role-separated feedback folders, and writes four
role-specific reports plus all_open.xlsx and
all_closed.xlsx to cfg$reports.
build_reports(cfg, state)
cfg |
A named list of paths from |
state |
The state object returned by |
Feedback loop:
After the reports are distributed, reviewers place their updated files in
the appropriate subfolder under cfg$feedback:
feedback/DM/ - Data Management reviewer
feedback/MW/ - Medical Writing reviewer
feedback/SDTM/ - SDTM programmer
feedback/ADAM/ - ADaM programmer
On the next run, build_reports() reads the most recent file
from each feedback folder, merges analyst_note,
review_note and status updates back into the findings, and
overwrites the reports with the merged version.
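The "most recent file" step can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: latest_feedback_file() is a hypothetical helper, the .xlsx filter is an assumption, and "most recent" is assumed to mean latest modification time, which the text above does not specify.

```r
# Hypothetical helper: return the most recently modified .xlsx file in a
# folder, or NA if the folder has none. "Most recent" is ASSUMED to mean
# latest modification time; the package may use a different rule.
latest_feedback_file <- function(dir) {
  files <- list.files(dir, pattern = "\\.xlsx$", full.names = TRUE,
                      ignore.case = TRUE)
  if (length(files) == 0) return(NA_character_)
  files[which.max(as.numeric(file.mtime(files)))]
}

# Throwaway feedback tree mirroring the four role folders listed above
feedback_root <- file.path(tempdir(), "feedback_demo")
roles <- c("DM", "MW", "SDTM", "ADAM")
for (role in roles) {
  dir.create(file.path(feedback_root, role), recursive = TRUE,
             showWarnings = FALSE)
}

# Two placeholder DM files with explicitly different timestamps
old_file <- file.path(feedback_root, "DM", "dm_round1.xlsx")
new_file <- file.path(feedback_root, "DM", "dm_round2.xlsx")
file.create(old_file, new_file)
Sys.setFileTime(old_file, Sys.time() - 3600)
Sys.setFileTime(new_file, Sys.time())

# One entry per role; NA where a reviewer has not returned anything yet
latest <- vapply(file.path(feedback_root, roles), latest_feedback_file,
                 character(1))
names(latest) <- roles
```

Roles whose folder is empty come back as NA, so a caller could skip the merge for those roles.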
Invisibly NULL. Reports are written to cfg$reports.
tmp_rep <- tempdir()
cfg <- list(
  rule_registry = system.file("extdata", "rule_registry.xlsx",
                              package = "rCoreGage"),
  trial_checks  = tmp_rep,
  study_checks  = tmp_rep,
  inputs        = tmp_rep,
  reports       = tmp_rep,
  feedback      = tmp_rep
)
state <- setup_coregage(cfg)
state$domains <- load_inputs(cfg)
state <- run_checks(cfg, state)
build_reports(cfg, state)
Called at the end of every individual check block. Validates the
findings data frame, counts valid observations via
count_valid, and appends to state$issues and
state$summary_log.
collect_findings(
  state,
  df,
  id,
  desc_col = "description",
  sobs = TRUE,
  unblind_codes = character(0)
)
state |
The current state object. |
df |
A data frame produced by the check. Must contain columns:
|
id |
Character. The check ID matching the ID column in
|
desc_col |
Character. Name of the description column in |
sobs |
Logical. Whether to limit output to 20 observations.
Defaults to |
unblind_codes |
Character vector. Topic codes that could unblind
the study. Rows with negative subject IDs where these codes are
absent from the description are excluded. Defaults to |
Updated state object.
# Build a minimal state object
state <- list(
  issues = data.frame(
    id = character(0),
    subj_id = character(0),
    vis_id = numeric(0),
    description = character(0),
    review = character(0),
    stringsAsFactors = FALSE
  ),
  summary_log = data.frame(
    headlink = character(0),
    nu = integer(0),
    rule_set = character(0),
    sobs = character(0),
    stringsAsFactors = FALSE
  )
)

# A findings data frame produced by a check
MY_CHECK <- data.frame(
  subj_id = c("SUBJ-001", "SUBJ-002"),
  vis_id = NA_real_,
  description = c("AESEV missing for RASH", "AESEV missing for HEADACHE"),
  stringsAsFactors = FALSE
)

state <- collect_findings(state, MY_CHECK, id = "AECHK001")
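The example above passes in a hand-built findings data frame. In a real check block the findings would more likely be derived from a loaded domain. The sketch below shows one way that might look in base R. The AE data, its column names (USUBJID, AETERM, AESEV), and the rule itself are illustrative assumptions. Only the output columns subj_id, vis_id and description are taken from the example above.

```r
# Hypothetical AE domain, standing in for something like state$domains$ae
ae <- data.frame(
  USUBJID = c("SUBJ-001", "SUBJ-002", "SUBJ-003"),
  AETERM  = c("RASH", "HEADACHE", "NAUSEA"),
  AESEV   = c(NA, "", "MILD"),
  stringsAsFactors = FALSE
)

# Illustrative rule: flag records whose severity is NA or blank
missing_sev <- ae[is.na(ae$AESEV) | trimws(ae$AESEV) == "", ]

# Shape the result into the columns used in the example above.
# sprintf() returns a zero-length vector for zero-length input, so this
# also works when the check finds nothing.
MY_CHECK <- data.frame(
  subj_id     = missing_sev$USUBJID,
  vis_id      = rep(NA_real_, nrow(missing_sev)),
  description = sprintf("AESEV missing for %s", missing_sev$AETERM),
  stringsAsFactors = FALSE
)
```

The resulting MY_CHECK has the same shape as the hand-built one, so it could be passed to collect_findings() in the same way.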
Returns the number of rows in df, optionally excluding rows
that could unblind the study (negative subject IDs where none of
the unblinding topic codes appear in the description).
count_valid(df, unblind_codes = character(0))
df |
A data frame with at least |
unblind_codes |
Character vector of topic codes that flag
potential unblinding. Defaults to |
Integer. Number of valid rows.
df <- data.frame(
  subj_id = c("001", "-002"),
  description = c("Issue A", "Issue B"),
  stringsAsFactors = FALSE
)
count_valid(df)
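To make the exclusion rule concrete, here is a standalone sketch of the logic as described: drop rows with a negative subject ID whose description mentions none of the unblinding codes. It is not the package's code. count_valid_sketch() is a hypothetical name, and three points are assumptions: a "negative" ID is one that starts with "-", codes are matched as literal substrings, and nothing is excluded when unblind_codes is empty.

```r
# Hypothetical re-statement of the rule described above; NOT the package
# implementation of count_valid().
count_valid_sketch <- function(df, unblind_codes = character(0)) {
  # ASSUMPTION: with no codes supplied, no rows are excluded
  if (length(unblind_codes) == 0) return(nrow(df))

  # ASSUMPTION: a "negative subject ID" is one that starts with "-"
  negative_id <- startsWith(as.character(df$subj_id), "-")

  # TRUE when at least one code appears (literal match) in the description
  has_code <- vapply(
    df$description,
    function(d) any(vapply(unblind_codes, grepl, logical(1),
                           x = d, fixed = TRUE)),
    logical(1),
    USE.NAMES = FALSE
  )

  # Exclude negative-ID rows whose description mentions none of the codes
  sum(!(negative_id & !has_code))
}

demo <- data.frame(
  subj_id     = c("001", "-002", "-003"),
  description = c("Issue A", "Issue B", "TRT01 mismatch"),
  stringsAsFactors = FALSE
)

n_all     <- count_valid_sketch(demo)                           # no codes given
n_blinded <- count_valid_sketch(demo, unblind_codes = "TRT01")  # "-002" dropped
```

Here "TRT01" is a made-up topic code. With it supplied, only the "-002" row is excluded, because its description contains no code.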
Scaffolds a complete CoreGage project folder structure at the specified
path and copies all required template files from the package installation.
After running this function, open the generated .Rproj file in
RStudio and start working.
create_project(name, path, overwrite = FALSE)
name |
Character. Project name. Used as the folder name and
|
path |
Character. Parent directory where the project folder will be created. Defaults to the current working directory. |
overwrite |
Logical. Whether to overwrite an existing project at
the same path. Defaults to |
Invisibly returns the full path to the created project.
proj_path <- file.path(tempdir(), "example_project")
dir.create(proj_path, showWarnings = FALSE)
create_project(name = "TRIAL_ABC", path = proj_path)
Reads every .csv and (optionally) .sas7bdat file from the
inputs/ folder into a named list. The name of each element is the
lowercase filename without extension (e.g. AE.csv becomes
domains$ae). Drop a new domain file into inputs/ and it is
picked up automatically on the next run - no code change required.
load_inputs(cfg)
cfg |
A named list of paths from |
A named list of data frames, one per domain file found.
# Create a temporary inputs folder with a sample CSV
tmp_inp <- tempdir()
write.csv(
  data.frame(USUBJID = "SUBJ-001", AETERM = "RASH"),
  file.path(tmp_inp, "AE.csv"),
  row.names = FALSE
)
cfg <- list(inputs = tmp_inp)
domains <- load_inputs(cfg)
# domains$ae contains the loaded Adverse Events data
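The naming convention (lowercase filename without extension) can be reproduced in a few lines of base R. The sketch below covers only the CSV case and is an illustration of the convention, not the package's loader. load_csv_domains() is a hypothetical helper, and .sas7bdat handling is left out because it would need an additional reader package.

```r
# Hypothetical helper: read every .csv in a folder into a named list,
# keyed by the lowercase filename without its extension.
load_csv_domains <- function(inputs_dir) {
  files <- list.files(inputs_dir, pattern = "\\.csv$", full.names = TRUE,
                      ignore.case = TRUE)
  domains <- lapply(files, read.csv, stringsAsFactors = FALSE)
  names(domains) <- tolower(tools::file_path_sans_ext(basename(files)))
  domains
}

# Demonstration folder with two small domain files
demo_dir <- file.path(tempdir(), "inputs_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(USUBJID = "SUBJ-001", AETERM = "RASH"),
          file.path(demo_dir, "AE.csv"), row.names = FALSE)
write.csv(data.frame(USUBJID = "SUBJ-001", VSTESTCD = "SYSBP"),
          file.path(demo_dir, "VS.csv"), row.names = FALSE)

domains <- load_csv_domains(demo_dir)
# AE.csv is available as domains$ae, VS.csv as domains$vs
```

Because the list is built from whatever files are present, adding a third CSV to the folder adds a third element without any change to the helper.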
Loops through all active rule sets defined in the rule registry,
sources each check script from the appropriate folder
(trial checks from cfg$trial_checks, study checks from
cfg$study_checks), and calls check_RULESET(state, cfg),
where RULESET is the value from the Rule_Set column in
rule_registry.xlsx.
run_checks(cfg, state)
cfg |
A named list of paths from |
state |
The state object returned by |
Updated state object with issues and
summary_log populated.
tmp_rep <- tempdir()
cfg <- list(
  rule_registry = system.file("extdata", "rule_registry.xlsx",
                              package = "rCoreGage"),
  trial_checks  = tmp_rep,
  study_checks  = tmp_rep,
  inputs        = tmp_rep,
  reports       = tmp_rep,
  feedback      = tmp_rep
)
state <- setup_coregage(cfg)
state$domains <- load_inputs(cfg)
state <- run_checks(cfg, state)
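The check_RULESET(state, cfg) naming convention implies that the function to call is looked up by name. The sketch below shows one common way to do that in base R. It is an illustration of the pattern, not the package's code: dispatch_checks() is a hypothetical helper, the two check functions are stand-ins for what sourced check scripts might define, and skipping a missing function with a warning is an assumed behaviour.

```r
# Stand-ins for functions that sourced check scripts might define.
# Each takes and returns the state object, per the convention above.
check_AE <- function(state, cfg) {
  state$ran <- c(state$ran, "AE")
  state
}
check_VS <- function(state, cfg) {
  state$ran <- c(state$ran, "VS")
  state
}

# Hypothetical dispatcher: build "check_<RULESET>" and call it by name.
dispatch_checks <- function(rule_sets, state, cfg, envir = parent.frame()) {
  for (rs in rule_sets) {
    fn_name <- paste0("check_", rs)
    # ASSUMPTION: a rule set without a matching function is skipped
    if (!exists(fn_name, envir = envir, mode = "function")) {
      warning("No check function found for rule set: ", rs)
      next
    }
    check_fn <- get(fn_name, envir = envir, mode = "function")
    state <- check_fn(state, cfg)  # thread the updated state through
  }
  state
}

state <- dispatch_checks(c("AE", "VS"),
                         state = list(ran = character(0)),
                         cfg = list())
```

Threading state through each call, rather than mutating a global, matches the documented signature in which every check receives and returns the state object.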
Reads the rule registry (rule_registry.xlsx), builds the active rule switch vector, and initialises all empty master tables required for a check run. This is the first function called in every run.
setup_coregage(cfg)
cfg |
A named list of paths produced by sourcing project_config.R.
Must contain: |
A named list (the state object) containing:
rule_registry, active_rules, session,
issues, summary_log, review_log.
tmp_rep <- tempdir()
cfg <- list(
  rule_registry = system.file("extdata", "rule_registry.xlsx",
                              package = "rCoreGage"),
  trial_checks  = tmp_rep,
  study_checks  = tmp_rep,
  inputs        = tmp_rep,
  reports       = tmp_rep,
  feedback      = tmp_rep
)
state <- setup_coregage(cfg)