Rid Yourself Of Dirty Data
Real-world data holds enormous potential. But it also risks being riddled with bias.
Real-world data (RWD) holds great promise for revealing the true effectiveness of treatments, but it lacks the rigor and randomization of clinical trials; it therefore comes freighted with bias that can undermine the story it appears to tell.
Bias comes in many guises, but in a scientific context it refers to any systematic error that results in an inaccurate estimation of the effect of an exposure on an outcome.
And as the sources of RWD proliferate, from electronic medical records and billing data to novel ones including wearables and social media, so do the sources of bias. Understanding, and where possible correcting for, such bias is therefore vital to cementing trust in the claims made for RWD.
Through ever more sophisticated statistical methods, progress is being made to address and resolve at least some of these biases. “For the past 20 years or so biostatisticians and econometricians have been coming up with a variety of different approaches and in general they work for observable forms of bias,” says David Thompson, Senior Vice President for Real World and Late Phase at Syneos Health.
It is the missing, inconsistent or unobserved data that almost by definition remain problematic.
Unseen RWD bias comes from multiple sources. A good example is medical records.
Electronic medical records may be filled out in different ways, different software packages may omit certain fields of data and physicians will inevitably vary in the extent to which they populate data fields they regard as irrelevant to a treatment at hand.
“Just because you have a data field for a certain variable does not mean it will be populated consistently in billing claims or medical records,” says Thompson. “We are talking about software systems where there is incredible variability even if the physician uses the same software system because they will use it differently.
“Some do the absolute minimum, some use all the bells and whistles. That kind of variation exists among users of all kinds of software and electronic record keeping and there is a lot of provider induced variability as well.”
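The provider-to-provider variability Thompson describes can be made visible with a simple completeness audit before any analysis begins. The sketch below uses invented provider names, fields and records purely for illustration; it is not drawn from any real EMR system.

```python
# Hypothetical illustration: auditing how consistently each provider
# populates a given EMR field. All providers, fields and values are invented.
from collections import defaultdict

records = [
    {"provider": "A", "smoking_status": "never",  "bmi": 27.4},
    {"provider": "A", "smoking_status": "former", "bmi": None},
    {"provider": "B", "smoking_status": None,     "bmi": None},
    {"provider": "B", "smoking_status": None,     "bmi": 31.0},
]

def completeness_by_provider(records, field):
    """Fraction of each provider's records in which `field` is populated."""
    filled, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r["provider"]] += 1
        if r.get(field) is not None:
            filled[r["provider"]] += 1
    return {p: filled[p] / total[p] for p in total}

print(completeness_by_provider(records, "smoking_status"))
# → {'A': 1.0, 'B': 0.0}: provider A always records smoking status,
# provider B never does. An analysis that treats the field as uniformly
# recorded inherits that gap as hidden, provider-induced bias.
```

The point is not the arithmetic but the habit: profiling field completeness by data source is a cheap first screen for the "some do the absolute minimum" problem before any outcome comparison is attempted.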
Another source of hidden bias is the clinically sound but unrecorded reasoning behind physicians' decisions to use a particular drug. A simple example would be where they tend to use more powerful, broader-spectrum antibiotics for patients who are more seriously ill and treat more straightforward cases with a milder antibiotic, says Thompson.
“If you have that decision making built into the comparison you are going to recognize one group as on average far more ill and so as you compare outcomes of care, one drug comes out looking terrible because it is being used on sicker patients with a poorer prognosis.”
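Thompson's antibiotic example is a classic case of confounding by indication, and the arithmetic behind it can be shown with a few invented numbers. In the sketch below (all figures are hypothetical), the stronger drug loses the naive overall comparison yet matches or beats the milder drug within every severity stratum, because it is disproportionately given to sicker patients.

```python
# Hypothetical figures illustrating confounding by indication: a "strong"
# antibiotic is mostly given to severely ill patients, a "mild" one to
# straightforward cases. All counts are invented for illustration.

# (drug, severity, recovered, total)
strata = [
    ("strong", "severe",    40, 80),  # strong drug goes mainly to the sickest
    ("strong", "mild_case", 18, 20),
    ("mild",   "severe",     5, 20),
    ("mild",   "mild_case", 72, 80),
]

def recovery_rate(drug, severity=None):
    """Recovery rate for a drug, overall or within one severity stratum."""
    rec = sum(r for d, s, r, n in strata if d == drug and severity in (None, s))
    tot = sum(n for d, s, r, n in strata if d == drug and severity in (None, s))
    return rec / tot

# Naive comparison: the strong drug "comes out looking terrible"...
print(recovery_rate("strong"), recovery_rate("mild"))                # 0.58 0.77
# ...but stratifying by severity reverses the picture:
print(recovery_rate("strong", "severe"), recovery_rate("mild", "severe"))        # 0.5 0.25
print(recovery_rate("strong", "mild_case"), recovery_rate("mild", "mild_case"))  # 0.9 0.9
```

Stratification is only the simplest of the adjustment techniques Thompson alludes to; methods such as propensity scoring extend the same idea, but all of them require the confounder (here, severity) to be observed in the data.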
As new sources of data emerge, for example novel data from connected devices or analysis of unstructured data from social media, the machine learning needed to interpret them requires new computational methods that may be opaque to some consumers of the results, says Sean McElligott, Director Market Access, Real World Value and Evidence at Janssen.
“When you’re talking about clinical notes or social listening you are going from the world of structured to unclean, unstructured data, which raises a whole new set of issues of what insights you can get from it. As you go from left to right on the data spectrum, from controlled clinical trials towards uncontrolled social listening, you also get much larger sizes and that brings different ways of looking at it because your sample size and variable dimensionality become potentially larger.”
Addressing bias, and either adjusting for it or understanding the extent to which it makes a particular clinical claim more or less valid, is important in cementing trust in the use of RWD among payers and regulators in particular.
But the need to unpick the bias in the data matters more in some arenas than others, depending in part on the use to which the interpretation of the data is put. When RWD is needed only to provide a directional signal for a decision, says McElligott, long discussions on statistical methods aren’t necessarily called for.
“People are often looking for a signal that points in the right direction even if it is not deterministic.” For example: “If you are asking, ‘Are we dosing too often?’, and you are looking at a product that is dosed every day or every six months and the data suggests patients remember or forget to take their medications, you might have enough information to ask whether you need a new formulation in terms of dosing.”
At the far extreme of the spectrum, regulators require a high level of certainty that RWD is robust.
So far it looks unlikely that current methods of interpreting RWD are considered trustworthy enough for regulators to permit its use in informing their decisions.
Regulators including the FDA, which is evaluating the potential use of RWD for regulatory decisions, are looking at the issue but it seems they are so far not convinced, says Thompson.
“I am skeptical that the FDA and regulatory authorities more generally will ever have sufficient comfort levels with non-randomized data such that they would allow a company to submit a label extension based on it.”
There needs to be more of a concerted effort across the healthcare industry to communicate clearly how bias is dealt with and the assumptions made to adjust for it.
“We have to open up about methods and there has to be a broad education among everyone,” says McElligott. “They have to be comfortable with it. Not everyone has a PhD in statistics, so we need something that translates it into plain English.”
An independent arbiter able to weigh in on bias and to approve the methodologies used to understand and screen for it would help foster trust beyond econometricians and biostatisticians, he says.
“At the moment there’s no consensus on, nor a governing body to adjudicate, the methods of how to deal with observed and unobserved bias.”
As well as achieving consensus on the methods by which bias is addressed, so that more healthcare professionals can weigh the evidence with greater clarity, Thompson observes that there is also a ‘middle pathway’ to better real-world insights through broader use of pragmatic clinical trials (PCTs).
These could be employed, for example, where payers have restricted the use of a treatment for reasons of cost; a PCT would then offer payers a way to re-assess their reimbursement policy where improved outcomes are demonstrated without higher costs. PCTs could also be of use when a product is late to market but offers better results than prior treatments.
“The PCT concept tends to be less expensive to conduct because the monitoring requirements are lighter, and you don’t require patients to come in for as many assessments. It is the most expensive form of real-world data, but compared with randomized controlled trials they are relatively inexpensive.”
While PCTs offer a relatively robust means of gathering actionable RWD, their use will inevitably be limited, and other forms of RWD still have insights to offer, so it is important not to let the existence of bias disqualify their use, says McElligott.
“RCTs have randomization and therefore you can determine causality, but it is a controlled environment not reflecting the real-world practical use of medications. It is difficult to understand things like adherence, for example, because an RCT is a high-touch environment and patients are getting a lot of encouragement to stay on the drugs.
“Being cognizant of the [issues across the] entire spectrum of data is helpful because that helps with the ultimate questions of what sort of biases do we have to deal with and what is the appropriate way to deal with them? As long as we all accept the method we can deal with the biases in RWD.”