Preventing and Accounting for Missing Clinical Trial Data for FDA Drug Approvals: Part 2


In Part One of this series, we discussed ways to prevent missing clinical trial data by incorporating key strategies in study design and patient follow-up. However, the reality of clinical trials is that having some missing data is inevitable, so we must also consider how to appropriately address the issue in the statistical analysis.

As the FDA becomes more stringent about how it expects sponsors to handle missing data in clinical trials supporting device and drug approvals, sponsors should give careful consideration to pre-specified analyses that will support the robustness of the trial results. To understand the different approaches to accounting for missing data at the analysis stage, it’s important to first understand the types of missing data that can occur.


Types of Missing Data

We classify missing data in clinical trials as being one of three types:

  1. Missing Completely At Random (MCAR) – the probability that data are missing is unrelated to the outcome or to any other variable in the dataset (e.g., the site coordinator forgot to record key endpoint measurements on the case report form for a particular visit).
  2. Missing At Random (MAR) – the probability that data are missing is unrelated to the unobserved outcome values once other measured variables in the clinical study are taken into account (e.g., the patient missed a visit due to an extended vacation, but the patient remained on the study treatment).
  3. Missing Not At Random (MNAR) – the probability that data are missing depends on the unobserved outcome itself, even after accounting for measured variables (e.g., a patient had side effects from the drug and was more likely to stop taking the study medication and miss study visits).

If all the missing data in the study were MCAR, we could ignore them completely in the analysis without risk of bias, at the cost of some loss of precision. In practice, however, most missing data in trials are not missing completely at random, so this assumption can’t be made. Therefore, our discussion below of contemporary approaches to analyzing missing data assumes MAR or MNAR.
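To make the three mechanisms concrete, here is a minimal simulation sketch in Python. Everything in it is an illustrative assumption rather than data from a real trial: the outcome model, the missingness probabilities, and the function name complete_case_bias are all hypothetical. It measures how far the average of the observed values drifts from the average of all values when missing records are simply dropped.

```python
import math
import random

def complete_case_bias(mechanism, n=20000, seed=1):
    """Simulate changes in systolic BP, delete some under `mechanism`,
    and return (mean of observed changes) - (mean of all changes)."""
    rng = random.Random(seed)
    all_changes, observed = [], []
    for _ in range(n):
        baseline = rng.gauss(150, 10)  # baseline SBP in mmHg, always recorded
        # Hypothetical outcome model: higher baseline -> larger reduction.
        change = -10 - 0.5 * (baseline - 150) + rng.gauss(0, 8)
        if mechanism == "MCAR":
            # Missingness is unrelated to anything in the data.
            p_missing = 0.3
        elif mechanism == "MAR":
            # Missingness depends only on the observed baseline value.
            p_missing = 1 / (1 + math.exp(-(baseline - 150) / 5))
        elif mechanism == "MNAR":
            # Missingness depends on the (possibly unobserved) outcome itself.
            p_missing = 1 / (1 + math.exp(-(change + 10) / 5))
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        all_changes.append(change)
        if rng.random() >= p_missing:
            observed.append(change)
    return sum(observed) / len(observed) - sum(all_changes) / len(all_changes)

bias = {m: complete_case_bias(m) for m in ("MCAR", "MAR", "MNAR")}
for mechanism, value in bias.items():
    print(f"{mechanism}: complete-case bias = {value:+.2f} mmHg")
```

Under these assumptions, dropping missing records leaves the MCAR average essentially unbiased, while the MAR and MNAR averages are shifted. In the MAR case the shift could in principle be corrected by a model that uses the observed baseline; in the MNAR case the observed data alone cannot reveal it.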


Analyses for Missing Data

There is no “one-size-fits-all” approach for analyzing missing data in a trial. Each trial will require its own considerations. For illustration purposes, let’s use a simple example of a randomized, double-blind, placebo-controlled trial for a product to treat hypertension. Let’s assume that the primary endpoint in this trial is reduction in systolic blood pressure.

Simple imputation methods used extensively in the past, such as last observation carried forward (LOCF) and baseline observation carried forward (BOCF), are now actively discouraged by the FDA. The agency’s rationale is two-fold: these methods treat imputed values as if they had actually been observed, so they don’t properly account for uncertainty in the missing values, and their underlying assumptions are typically not justifiable. For example, suppose a patient randomized to active treatment discontinued the drug due to an adverse event before the primary endpoint was assessed and then did not come back for any additional study visits. With LOCF, we would impute the patient’s primary endpoint value with a blood pressure measurement taken while the patient was still on active treatment. This is clearly inappropriate and likely overestimates the treatment effect, because we wouldn’t expect the improvement in blood pressure to continue after the treatment has been discontinued. BOCF, on the other hand, can be overly conservative and lead to an underestimate of the treatment effect. For example, it would not be reasonable to impute “no improvement in systolic blood pressure” for a patient who missed the primary endpoint visit due to a family emergency but had been an excellent responder to treatment at all other study visits. In general, these simple imputations do not fully reflect the variability inherent in missing data and can lead to biased estimates of the treatment effect.
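The mechanics of the two methods can be sketched in a few lines of Python. The visit values below are hypothetical, and the helper names locf and bocf are ours for illustration only. Missing visits are represented as None, and the first entry is the baseline measurement.

```python
def locf(visits):
    """Fill each missing value (None) with the most recent observed value."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def bocf(visits):
    """Fill each missing value (None) with the baseline (first) value."""
    baseline = visits[0]
    return [baseline if value is None else value for value in visits]

# Hypothetical systolic BP (mmHg): baseline, week 4, week 8, week 12.
# The patient discontinued after week 4, so weeks 8 and 12 are missing.
patient = [160, 148, None, None]

locf_filled = locf(patient)  # week 12 imputed as 148
bocf_filled = bocf(patient)  # week 12 imputed as 160

locf_change = locf_filled[-1] - locf_filled[0]  # -12: on-treatment benefit carried forward
bocf_change = bocf_filled[-1] - bocf_filled[0]  # 0: no benefit at all assumed
print(locf_change, bocf_change)
```

The same patient contributes a 12 mmHg reduction under LOCF and no reduction under BOCF. In both cases the single filled-in number is then analyzed as though it had been measured, which is why neither method reflects the uncertainty about the missing value.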

The most common methods for handling missing data assume that data are missing at random (MAR). These include mixed models for repeated measures (MMRM) and multiple imputation (MI). MI is a common choice as a first-line method for missing data because other information in the dataset (e.g., gender, if missing data rates differ by gender, or the reason for drop-out) can be incorporated into the imputation model and therefore into the estimate of the treatment effect. Both methods produce estimates that appropriately account for uncertainty and allow for valid inference if the MAR assumption holds. However, the MAR assumption cannot be tested using the observed data.
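Much of how MI accounts for uncertainty comes from the pooling step, commonly known as Rubin’s rules: the analysis is run on each imputed dataset, and the variability between those results is added to the usual within-dataset variance. The sketch below shows only that pooling step. The function name pool_rubin and the input numbers are hypothetical, and the imputation model itself is assumed to have been run elsewhere.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Combine results from m imputed datasets using Rubin's rules.

    estimates: treatment-effect estimate from each imputed dataset
    variances: squared standard error of each of those estimates
    Returns (pooled estimate, pooled standard error). Requires m >= 2.
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)        # average of the m estimates
    within = statistics.mean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between     # total variance
    return pooled, math.sqrt(total)

# Hypothetical treatment differences in SBP change (mmHg) from three imputed datasets.
pooled_estimate, pooled_se = pool_rubin([-9.0, -10.0, -11.0], [1.0, 1.0, 1.0])
print(f"pooled estimate = {pooled_estimate:.2f}, pooled SE = {pooled_se:.2f}")
```

In this toy input each dataset alone would report a standard error of 1.0, but the pooled standard error is larger because the imputed datasets disagree with one another. That extra term is what single imputation methods leave out. In practice far more than three imputations would typically be used.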

The most common analytic method for data assumed to be Missing Not At Random (MNAR) is the pattern-mixture model (PMM). These models assume that the response for patients who drop out and/or discontinue treatment differs from that of patients who continue through the study. In our example of the hypertension trial, a PMM could assume that (1) when patients are still on treatment, missing data are similar to the observed on-treatment values, but (2) when patients drop out, they tend to have values less favorable than the observed data (e.g., blood pressure values returning to baseline). Sensitivity analyses like PMM help establish whether the treatment effect is robust to a range of reasonable assumptions about the missing values.
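One simple way to picture this kind of sensitivity analysis is a delta adjustment: assume that active-arm dropouts average some amount, delta, worse than active-arm completers, and watch how the estimated treatment effect changes as delta grows. The sketch below is a deliberately simplified, deterministic version with hypothetical summary numbers and a hypothetical function name, pmm_effect. It assumes placebo-arm missing data are handled separately, and it ignores the uncertainty that a real analysis would carry through (for example, by combining the adjustment with multiple imputation).

```python
def pmm_effect(active_obs_mean, n_active_obs, n_active_drop, placebo_mean, delta):
    """Treatment difference in mean SBP change (active minus placebo) when
    active-arm dropouts are assumed to average `delta` mmHg less favorable
    than active-arm completers."""
    n_active = n_active_obs + n_active_drop
    dropout_mean = active_obs_mean + delta  # assumed mean change for the dropout pattern
    # Overall active-arm mean is a mixture of the completer and dropout patterns.
    active_mean = (n_active_obs * active_obs_mean
                   + n_active_drop * dropout_mean) / n_active
    return active_mean - placebo_mean

# Hypothetical summaries: 80 active completers with mean change -12 mmHg,
# 20 active dropouts, and a placebo-arm mean change of -4 mmHg.
effects = {}
for delta in (0.0, 4.0, 8.0, 12.0):
    effects[delta] = pmm_effect(-12.0, 80, 20, -4.0, delta)
    print(f"delta = {delta:4.1f} mmHg -> treatment difference = {effects[delta]:.1f} mmHg")
```

With delta = 0 the dropouts are assumed to look like completers, which corresponds to a MAR-like assumption. With delta = 12 the dropouts are assumed to have returned to baseline, and the estimated benefit shrinks accordingly. If the conclusion survives the largest delta that clinical experts consider plausible, the result can reasonably be described as robust to that range of assumptions.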

Overall, any analysis with missing data should be scientifically justifiable and, for the purposes of an FDA advisory committee, easily understood by a broad audience. Clinical experts and statisticians should work collaboratively to ensure that pre-planned analyses for missing data are statistically sound and clinically reasonable.



All analyses for missing data rely on assumptions that cannot be tested or confirmed, so the primary effort should be toward preventing missing data in the first place. Incorporating practical elements into the study design, paired with a commitment to minimize missing data during the clinical study, will provide the most robust evidence for device and drug approval.



Chris Miller, MS is a biostatistician who brings experience in the design, analysis, and interpretation of clinical trials to 3D clients. As a senior project manager, Chris leverages statistical expertise with excellent communications skills to integrate complex data with key messages.