Resample Study Data
Creates a resampled version of a study with new subject and study IDs. Supports stratified sampling based on domain activity (e.g., oversampling patients with many protocol deviations) and randomizes site assignments.
Usage
ResampleStudy(
  lRaw,
  strNewStudyID,
  nSubjects = NULL,
  TargetSiteCount = NULL,
  replacement = TRUE,
  strOversamplDomain = NULL,
  vOversamplQuantileRange = c(0, 1),
  seed = NULL
)
Arguments
- lRaw
Named list of raw data domains (e.g., Raw_SUBJ, Raw_AE, etc.)
- strNewStudyID
Character string for the new study ID
- nSubjects
Integer number of subjects to sample. If NULL (default), samples as many subjects as are enrolled in the original data
- TargetSiteCount
Numeric; must be a positive integer. Approximate target number of sites in the resampled study. If NULL (default), the sites of the sampled subjects are used as they are. If specified, approximately that many sites are generated, with weighted patient distributions. Note: the final site count may be lower than the target because sites with zero patients are excluded.
- replacement
Logical indicating whether to sample with replacement (default: TRUE)
- strOversamplDomain
Character string naming a domain to use for stratified sampling. NULL (default) samples from all enrolled subjects
- vOversamplQuantileRange
Numeric vector of length 2 with quantile range (0-1) for oversampling. Default c(0, 1) includes all subjects
- seed
Integer seed for reproducibility. NULL (default) uses current random state
Details
This function performs the following steps:
- Optionally filters subjects by their activity level in a specified domain
- Samples subjects with or without replacement
- Randomizes site assignments by shuffling invid values (or generates new sites if TargetSiteCount is specified)
- Updates all subject, study, and site IDs across all domains
- Maintains referential integrity across domains
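The filtering and sampling steps above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data, the variable names, and the choice to count subjects with zero records when computing quantiles are all assumptions made for illustration.

```r
# Hypothetical toy data; column names and values are assumptions.
subj <- data.frame(subjid = sprintf("%04d", 1:8), stringsAsFactors = FALSE)
ae <- data.frame(
  subjid = c("0001", "0001", "0001", "0002", "0003", "0003", "0004",
             "0005", "0005", "0005", "0005", "0006"),
  stringsAsFactors = FALSE
)

# Count records per subject in the chosen domain (zero if none).
n_records <- vapply(
  subj$subjid,
  function(id) sum(ae$subjid == id),
  integer(1),
  USE.NAMES = FALSE
)

# Keep subjects whose record count falls inside the quantile range.
quantile_range <- c(0.75, 1)
bounds <- unname(quantile(n_records, probs = quantile_range))
keep <- n_records >= bounds[1] & n_records <= bounds[2]
eligible <- subj$subjid[keep]

# Sample from the eligible pool, here with replacement.
set.seed(123)
sampled <- sample(eligible, size = 5, replace = TRUE)

# Give every sampled row a fresh subject ID so duplicates stay distinct.
new_ids <- sprintf("%04d", seq_along(sampled))
id_map <- data.frame(old = sampled, new = new_ids, stringsAsFactors = FALSE)
```

Keeping an explicit old-to-new mapping such as id_map is one way to carry the new IDs into every other domain, which is what preserving referential integrity requires.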
When TargetSiteCount is specified:
- Generates TargetSiteCount site IDs with metadata sampled from original sites
- Samples patient counts per site from the distribution observed in sampled subjects
- Creates weighted site assignment: subjects are assigned to sites proportionally to sampled patient counts
- Final site count may be less than target if some sites receive no patients through sampling
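A rough base R sketch of the weighted assignment described above follows. The site labels, the new site ID format, and the variable names are hypothetical, and the actual function may differ in detail; the sketch only shows why the final site count can fall below the target.

```r
set.seed(789)

# Hypothetical input: the original site of each sampled subject.
sampled_sites <- c("S01", "S01", "S01", "S02", "S02", "S03", "S03", "S03",
                   "S03", "S04")
target_site_count <- 6L

# Patient counts per site as observed among the sampled subjects.
observed_counts <- as.integer(table(sampled_sites))

# Generate new site IDs (format is an assumption) and draw one patient
# count per new site from the observed distribution to use as a weight.
new_sites <- sprintf("SITE%03d", seq_len(target_site_count))
weights <- sample(observed_counts, size = target_site_count, replace = TRUE)

# Assign each subject to a site with probability proportional to its weight.
assigned <- sample(new_sites, size = length(sampled_sites),
                   replace = TRUE, prob = weights)

# Sites that received no subjects drop out of the result.
final_sites <- unique(assigned)
length(final_sites) <= target_site_count
```

Because assignment is random, a site with a small weight can end up with no subjects, which is why TargetSiteCount is described as approximate.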
The function handles multiple subject ID formats:
- subjid: Simple ID (e.g., "0496")
- subjectid: Composite ID (e.g., "X1670496-113XXX")
- subject_nsv: NSV format (e.g., "0496-113XXX")
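Judging only from the three example values above, the composite and NSV formats appear to embed the simple ID directly before a hyphen. Under that assumption, which the source does not confirm, a new simple ID could be propagated to the other formats as in this sketch; the new ID value and variable names are hypothetical.

```r
# Example values taken from the list above.
old_subjid <- "0496"
subjectid <- "X1670496-113XXX"
subject_nsv <- "0496-113XXX"

# Hypothetical new simple ID.
new_subjid <- "0001"

# Replace the simple ID where it sits immediately before the hyphen.
# fixed = TRUE treats the pattern as a literal string, not a regex.
old_token <- paste0(old_subjid, "-")
new_token <- paste0(new_subjid, "-")
new_subjectid <- sub(old_token, new_token, subjectid, fixed = TRUE)
new_subject_nsv <- sub(old_token, new_token, subject_nsv, fixed = TRUE)

new_subjectid
new_subject_nsv
```

Matching the ID together with the trailing hyphen reduces the chance of altering digits elsewhere in the composite string that happen to coincide with the simple ID.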
Examples
# Load test data
lRaw <- list(
  Raw_SUBJ = clindata::rawplus_dm,
  Raw_AE = clindata::rawplus_ae,
  Raw_SITE = clindata::ctms_site,
  Raw_STUDY = clindata::ctms_study
)
# Standard resampling
lStudy1 <- ResampleStudy(lRaw, "STUDY001", seed = 123)
# Oversample from high-AE patients (top 25%)
lStudy2 <- ResampleStudy(
  lRaw,
  "STUDY002",
  nSubjects = 50,
  strOversamplDomain = "Raw_AE",
  vOversamplQuantileRange = c(0.75, 1.0),
  seed = 456
)
#> Filtered to 309 subjects with Raw_AE records in 0.75-1.00 quantile range (6-31 records)
# Generate study with target of ~30 sites
lStudy3 <- ResampleStudy(
  lRaw,
  "STUDY003",
  nSubjects = 200,
  TargetSiteCount = 30,
  seed = 789
)