The conundrum of missing data vs. inter-rater variability.
Reliable data is critical for making informed decisions about the efficacy and safety of new treatments in clinical trials. While self-reporting by trial participants themselves on signs, symptoms, feelings, and function is always preferable, in certain populations it is common, or indeed necessary, for data reported by caregivers (as distinct from healthcare professionals) to form the primary or secondary endpoint. For example, trials in children who are unable to self-report, or in individuals with cognitive dysfunction or neurodegeneration such as Alzheimer’s disease, often rely on data reported by their caregivers. Such observer-generated outcome data is termed observer-reported outcome (ObsRO) data.
In the ideal scenario, the same caregiver would respond throughout the trial to maintain consistency in the data being reported; however, it is not always the case that a trial participant will have one dedicated caregiver making and reporting all relevant observations. Although ObsROs should only be based on observable parameters, there is still potential for subjectivity in the way an observer rates a given behavior, symptom, or event, and, therefore, the potential for variability between raters.1,2 If multiple raters were to report on the same endpoint, any discordance between raters when pooling data could obscure treatment effects.
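To make this concrete, the following is a minimal, purely illustrative sketch (not drawn from any trial) of how a systematic offset between two raters inflates the variability of a pooled endpoint relative to a single-rater series. The one-point severity offset and all other values are assumptions for illustration only.

```python
# Illustrative sketch: simulating how a systematic difference between two
# raters inflates the variability of pooled ObsRO scores. All numbers are
# hypothetical.
import numpy as np

rng = np.random.default_rng(42)
n = 200                                  # hypothetical number of observations
true_score = rng.normal(5.0, 1.0, n)     # participant's "true" symptom level

# Rater A scores close to the true level; rater B applies a systematic
# offset (e.g., tends to rate symptoms one point more severe).
rater_a = true_score + rng.normal(0.0, 0.5, n)
rater_b = true_score + 1.0 + rng.normal(0.0, 0.5, n)

single_rater = rater_a
# Pooled data: each timepoint is reported by whichever rater was available.
pooled = np.where(rng.random(n) < 0.5, rater_a, rater_b)

print(f"SD, single rater:  {single_rater.std():.2f}")
print(f"SD, pooled raters: {pooled.std():.2f}")  # larger -> noisier endpoint
```

The wider spread in the pooled series is precisely the kind of discordance that can obscure a treatment effect of comparable magnitude.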
Assuming the same caregiver is always with the participant is unrealistic, especially when data is being captured remotely: children may split time between separated parents or be looked after by grandparents, and older adults are often cared for by multiple caregivers. This reality needs to be addressed in the way ObsRO data are managed, analyzed, and reported to regulatory authorities.
It is well documented that missing data presents a major challenge to evaluating treatment efficacy, and although there are established techniques for handling missing data during analyses, these cannot be seen as a replacement for the actual data points and, in any case, are only appropriate in certain circumstances. This leads to the question of whether restricting data collection to a single caregiver, and potentially increasing the number of missing data points, is preferable to allowing multiple reporters, which could yield more complete but higher-variability data.
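As a hedged illustration of that trade-off, the sketch below uses hypothetical weekly diary values to compare a single-reporter rule, which leaves gaps, with a rule that lets a second caregiver fill in missed entries.

```python
# Hypothetical sketch of the trade-off: a weekly symptom diary where
# restricting reporting to one caregiver leaves gaps, while allowing a
# second caregiver fills them at the cost of between-rater variability.
import pandas as pd

weeks = pd.RangeIndex(1, 9, name="week")
# None = no entry that week (all values are hypothetical).
primary = pd.Series([4, 5, None, 4, None, None, 3, 3], index=weeks)
secondary = pd.Series([None, None, 5, None, 5, 4, None, None], index=weeks)

single = primary                           # single-reporter rule: gaps remain
multi = primary.combine_first(secondary)   # allow the second reporter

print(f"missing, single reporter: {single.isna().sum()} of {len(single)}")
print(f"missing, multi reporter:  {multi.isna().sum()} of {len(multi)}")
print(f"mean, single (complete case): {single.mean():.2f}")
print(f"mean, multi:                  {multi.mean():.2f}")
```

Note how the two rules can yield different summary estimates from the same participant, which is the heart of the analytical dilemma.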
It is worth highlighting that this is not a new issue. It is unlikely that the same caregiver consistently completes ObsRO(s) during a trial for a given participant, and even when it is the same reporter, they may not complete the ObsRO(s) within the specified time window. Yet these measures have traditionally been completed on paper, a modality for which ensuring attributability and contemporaneousness is far more challenging.
ObsROs are increasingly completed electronically, yet electronic data capture remains subject to a higher level of scrutiny despite the numerous benefits it offers over paper. Have the increased safeguards offered by electronic data capture, such as unique logins and the prevention of data completion outside predefined time windows, simply brought greater attention to the long-standing issue of multiple caregiver reporting?
Here, we highlight some considerations for data collection in clinical trials that involve multiple caregivers.
Training is key for all stakeholders in clinical trials, and if there are multiple caregivers reporting in a trial, it is important that they all receive role-specific training. It can be challenging to provide training on specific copyrighted ObsRO measures (beyond any training already mandated by the copyright holder), as these measures have often gone through a validation process to ensure that the instructions and items can be understood without outside assistance.
When the same individual is reporting throughout, any difference in the way items are interpreted and rated between reporters (e.g., mothers of different participants) in a trial would be less concerning, as there would still be consistency within the reporter associated with a given participant.
Where there are multiple reporters associated with the same participant (e.g., mother and father), any differences in the way an ObsRO is interpreted can introduce noise into the data when a different reporter responds at different timepoints. Thus, some form of training to reduce this variability (e.g., on what constitutes a severe symptom) may be beneficial. However, as highlighted, this will need to be carefully thought through for copyrighted measures and according to the specifics of the protocol.
For home-grown measures, this can be more straightforward. For example, it is common for event-driven diaries to be developed for a specific study, and training on what constitutes a discrete event and how to record it will be a core component for all raters.
It is important to ensure the electronic clinical outcome assessment (eCOA) system is set up to facilitate multiple caregiver reports. This includes giving each reporter a unique identifier to ensure data attributability. It is also important that data flow into the same location and, given the risk of duplicate data entry, that devices and accounts are synced as close to real time as possible. Planning how duplicate data entry, if it does occur, will be addressed in the dataset and analyses will also be important.
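As a sketch of what such a setup might involve, the following hypothetical data model attributes each entry to a uniquely identified reporter and resolves duplicates for the same participant and assessment window. The field names and the "earliest submission wins" rule are assumptions for illustration, not taken from any specific eCOA system or protocol.

```python
# Minimal sketch (hypothetical data model) of reporter attributability and
# duplicate resolution in an eCOA backend. Field names are illustrative.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ObsroEntry:
    participant_id: str
    reporter_id: str       # unique login per caregiver -> attributability
    window_id: str         # e.g., a daily diary's assessment window
    submitted_at: datetime
    score: int

def resolve_duplicates(entries: list[ObsroEntry]) -> dict[tuple[str, str], ObsroEntry]:
    """Keep the earliest submission per (participant, window); later
    submissions are treated as duplicates, per a predefined rule."""
    resolved: dict[tuple[str, str], ObsroEntry] = {}
    for entry in sorted(entries, key=lambda e: e.submitted_at):
        resolved.setdefault((entry.participant_id, entry.window_id), entry)
    return resolved

entries = [
    ObsroEntry("P001", "mother-01", "week-3", datetime(2024, 3, 4, 19, 5), 4),
    ObsroEntry("P001", "father-01", "week-3", datetime(2024, 3, 4, 21, 40), 6),
]
kept = resolve_duplicates(entries)
print(kept[("P001", "week-3")].reporter_id)  # -> mother-01 (earliest wins)
```

Whatever rule is chosen, the key point is that it is specified prospectively rather than decided after duplicates appear in the dataset.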
Inter-rater variability is a known phenomenon, and it is well documented that pooling data from multiple respondents can create noise. An ISPOR Task Force report on this topic states that it “raises the question of how data from different reporters can support a single claim.”3 That report mainly addresses the scenario in which different age-group versions of the same measure are used and the reporter can vary between parent and child, though the concept of multiple reporters is similar. While one approach could be to pool the data into a single analysis, it would be necessary to demonstrate an acceptable level of agreement between the raters. It would also be important to conduct sensitivity analyses and examine whether outcomes are similar. Further guidance on this would be beneficial, including how results should be interpreted if two reporters have different perceptions.
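One common way to quantify such agreement for ordinal severity ratings is a weighted Cohen's kappa; the sketch below uses hypothetical ratings from two caregivers scoring the same participants, and the 0.6 threshold is an illustrative rule of thumb, not a regulatory requirement.

```python
# Hypothetical sketch: quantifying agreement between two caregivers who
# rated the same participants at the same timepoints, using weighted
# Cohen's kappa for ordinal severity scores (0-3).
from sklearn.metrics import cohen_kappa_score

mother = [0, 1, 2, 2, 3, 1, 0, 2, 3, 1]   # hypothetical severity ratings
father = [0, 1, 1, 2, 3, 2, 0, 2, 2, 1]

kappa = cohen_kappa_score(mother, father, weights="quadratic")
print(f"weighted kappa: {kappa:.2f}")
if kappa < 0.6:  # illustrative threshold, not a regulatory standard
    print("agreement may be too low to pool without sensitivity analyses")
```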
For any clinical trial, a robust statistical analysis plan is critical to outline the approach that will be taken.
As with any topic related to medical product development, regulatory considerations should inform study design. Understanding what constitutes an (un)acceptable level of missing data, and what evidence would be required to approve a treatment whose efficacy is based on data from multiple caregivers, will help ongoing conversations around the reality of multiple caregiver reports.
While the preference is always for participants to self-report and, where ObsROs are used, for a single observer to report throughout, this is not the reality. Requests to enable multiple reporters within electronic data capture systems are becoming more common, and best practices should start being developed. While such guidance does not yet exist, we cannot ignore this reality, and we hope this area receives more attention to ensure the optimal approach is taken.
Authored on behalf of Critical Path Institute’s eCOA Consortium by Florence Mowlem, PhD, Chief Scientific Officer, uMotif; and Estelle Haenel, PhD, PharmD, Chief Medical Officer, Kayentis.
References
1. Langberg, J.M.; Epstein, J.N.; Simon, J.O.; et al. Parental Agreement on ADHD Symptom-Specific and Broadband Externalizing Ratings of Child Behavior. J Emot Behav Disord. 2010. 18 (1), 41–50. https://journals.sagepub.com/doi/abs/10.1177/1087054714561290
2. Davé, S.; Nazareth, I.; Senior, R.; et al. A Comparison of Father and Mother Report of Child Behavior on the Strengths and Difficulties Questionnaire. Child Psychiatry Hum Dev. 2008. 39 (4), 399–413. https://pubmed.ncbi.nlm.nih.gov/18266104/
3. Matza, L.S.; Patrick, D.L.; Riley, A.W.; et al. Pediatric Patient-Reported Outcome Instruments for Research to Support Medical Product Labeling: Report of the ISPOR PRO Good Research Practices for the Assessment of Children and Adolescents Task Force. Value Health. 2013. 16 (4), 461–479. https://pubmed.ncbi.nlm.nih.gov/23796280/