Answering Big Questions on Big Data
Patient-reported outcomes gathered from the internet hold huge potential for the pharma industry, and there is increasing regulatory and healthcare provider interest in these data. But just how can we make the most of the mass of information?
Every day millions of people are sharing their experiences with medicines over the internet. With such a plethora of information comes a huge opportunity for the pharma industry to learn more about patient experiences in the drive to deliver improved products to the market. But the use of big data also raises many unanswered questions from the industry. So how can we really capture all the benefits big data offers?
“Rather than wondering what questions can we ask given the available technology, we are saying let’s build the technology to answer the questions that we need to ask”
James Sawyer, CEO of Prism Ideas, has been working in the field of big data for many years and believes that the answer revolves around building the technology to fit the important questions: “Rather than wondering what questions can we ask given the available technology, we are saying let’s build the technology to answer the questions that we need to ask.”
“You’ve got comments, forums, blogs, micro blogs, social networks, and then ratings and reviews - so it’s important to take a broad brush approach so you collect data from as many different sources as possible.”
Pharma hasn’t had the best experience with big data. In a fairly conservative industry, the generic approaches available for analysing patient reporting have not been particularly satisfactory. Sawyer explains that both the dictionary-based approach (pulling key words) and the sentiment-based approach (identifying key emotions) simply haven’t worked, because the terms they flag end up being taken out of context. This can result in major problems with the accuracy of the data. “In fact, the chances are that you’ll perhaps only get 10 per cent accuracy if you use that approach in terms of what’s being said”, observes Sawyer.
So he is proposing a different approach, a variation on relational modelling, in which a number of predicates, including key words, contextual language, emotional triggers and other specific details, are all linked together: “Using this kind of model you can achieve 80 to 90 per cent accuracy of data”. Pretty compelling statistics.
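The article does not give the details of Prism Ideas’ model, but the contrast it draws can be illustrated with a toy sketch: a bare dictionary lookup fires on any mention of a key word, while a predicate-linked match requires several cues (key word, drug context, first-person experience marker) to co-occur. All vocabularies and examples below are invented for illustration only.

```python
import re

# Toy vocabularies -- purely illustrative, not a real pharmacovigilance dictionary.
KEYWORDS = {"headache", "nausea"}
DRUG_CONTEXT = {"took", "taking", "dose", "tablet", "prescribed"}
EXPERIENCE_CUES = {"i", "my", "me"}  # first-person experience markers

def tokens(text):
    """Lower-case word set for a post."""
    return set(re.findall(r"[a-z']+", text.lower()))

def keyword_match(text):
    """Dictionary-style approach: fires on any key-word mention,
    regardless of context."""
    return bool(KEYWORDS & tokens(text))

def predicate_match(text):
    """Linked-predicate approach: key word AND drug context AND
    first-person cue must all be present in the same post."""
    t = tokens(text)
    return (bool(KEYWORDS & t)
            and bool(DRUG_CONTEXT & t)
            and bool(EXPERIENCE_CUES & t))

post1 = "I took one tablet and my headache got worse."
post2 = "Does anyone know if headache is listed as a side effect?"

# keyword_match fires on both posts; predicate_match fires only on the
# first, where the writer reports their own experience with the drug.
```

Linking predicates in this way is one plausible reason a contextual model filters out the out-of-context mentions that sink a plain dictionary approach.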
The text gathered from social media sites is all “free text”, and when aggregated from numerous sources it results in databases with multiple structures and formats. The vital next step is “normalisation”, in which the information is formatted into a single, usable stream of data.
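As a hedged sketch of what such normalisation might look like, records scraped from different sources could arrive with different field names and be mapped onto one common schema before analysis. The field and source names here are assumptions for illustration, not Prism Ideas’ actual schema.

```python
# Hypothetical sketch: map records from heterogeneous sources onto a
# single common schema so downstream steps see one uniform stream.

def normalise(record, source):
    """Map a raw record from a named source onto the common schema."""
    if source == "forum":
        return {"text": record["body"], "author": record["user"],
                "source": source}
    if source == "review":
        return {"text": record["comment"], "author": record["reviewer"],
                "source": source}
    raise ValueError(f"unknown source: {source}")

raw = [
    ({"body": "Felt dizzy after the new dose.", "user": "anon42"}, "forum"),
    ({"comment": "Works well, mild nausea.", "reviewer": "p_77"}, "review"),
]

stream = [normalise(rec, src) for rec, src in raw]
# Every record now carries the same keys: text, author, source.
```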
“Once you’ve got that free text, then you aggregate the data and then filter it according to particular categories that are within that data. You find out whether the patient is talking about health, to start off with, because we’ll look at a car website, because people will talk about their bad back on that.”
“So you find out first whether the topic is in fact health related, and whether your chosen topics are mentioned within it, so you filter out that way around. But the free text is the raw material, wherever it comes from, and once you’ve got that data aggregated and you’ve pulled the filters, then you can start looking and seeing whether specific data points are mentioned within it.”
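The two-stage filter Sawyer describes, first health-related or not, then whether the chosen topics appear, could be sketched as follows. The vocabularies and posts are invented; note how a post from a car forum still passes the health filter, echoing his bad-back example.

```python
# Illustrative two-stage filter over normalised free text:
# stage 1 keeps health-related posts; stage 2 keeps posts mentioning
# the chosen topics. Toy vocabularies, for illustration only.

HEALTH_TERMS = {"back", "pain", "medicine", "dose", "nausea"}
TOPICS = {"nausea"}

def is_health_related(text):
    """Stage 1: does the post touch on health at all?"""
    return bool(HEALTH_TERMS & set(text.lower().split()))

def mentions_topic(text):
    """Stage 2: does the post mention one of the chosen topics?"""
    return bool(TOPICS & set(text.lower().split()))

posts = [
    "my new car handles badly on gravel",              # car site, not health
    "driving this car gives me terrible back pain",    # car site, but health
    "this medicine gave me nausea",
]

health = [p for p in posts if is_health_related(p)]
relevant = [p for p in health if mentions_topic(p)]
```

Filtering this way round means the source of the free text does not matter: a health disclosure on a car website survives stage 1 just as readily as one on a patient forum.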
Automation is another critical feature because, Sawyer points out, as soon as a human reviewer’s subjectivity is introduced, noise is added to the problem. The algorithms Prism Ideas employs, for instance, identify the magnitude of the benefit the patient describes, which then allows their statement to be graded.
Questions are often raised about the sources of the data: whether the information is correct, what kind of permission is needed, and how to keep it anonymous.
“Sourcing this sort of data is like reviewing data from the literature. Anything that is posted on the internet is public data; it’s publicly available. Yes, there are discussions about Facebook and its privacy rules in terms of the legalities, but the data is there; it’s published, for all intents and purposes. The data people have provided is a personal disclosure and it’s utilised in an anonymous fashion. It’s complicated, but that’s the way it works.”
One of the other challenges for the industry is the requirement to report any potential side effect as soon as it becomes aware of one, even in incomplete cases, so potentially every single adverse event reported on the internet must be passed on. This, notes Sawyer, could result in an awfully large burden of work, which neither the pharma industry nor the regulators want. But he has some thoughts on potential remedies: “One solution is that we can look at the data as observational studies, so using the information as a signal detection mechanism to identify things that might warrant employing traditional approaches”.
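Purely as an assumption on my part, one of the simplest forms such signal detection could take is counting drug–event co-mentions and flagging pairs whose count crosses a threshold for traditional pharmacovigilance follow-up. The names, counts and threshold below are invented, and real signal detection would use proper disproportionality statistics rather than raw counts.

```python
from collections import Counter

# Hedged sketch of signal detection over aggregated posts: count how
# often each (drug, event) pair is co-mentioned, and flag pairs whose
# count reaches a threshold. Illustrative data, not a real method.

mentions = [
    ("drugA", "dizziness"), ("drugA", "dizziness"),
    ("drugA", "dizziness"), ("drugB", "rash"),
]
THRESHOLD = 3

counts = Counter(mentions)
signals = [pair for pair, n in counts.items() if n >= THRESHOLD]
# Only ("drugA", "dizziness") crosses the threshold here.
```

The point of the observational-studies framing is exactly this: the internet data generates candidate signals, and only the flagged pairs are escalated into the formal reporting machinery.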
But what can be done to counteract the inherent bias that can come through in any data from the internet? Sawyer simply doesn’t see this as a problem: the data is just so big that any bias is “drowned out” by all the other data. One of the further benefits of the normalisation process is that it identifies repeat individuals, so a single prolific poster does not masquerade as many independent reports.
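One hypothetical way of identifying repeat individuals during normalisation is to collapse posts sharing the same author handle, so one person posting many times contributes a single voice. The field names are assumed for illustration.

```python
# Illustrative deduplication step: keep the first post per author
# handle so repeat individuals are counted once. Assumed schema.

posts = [
    {"author": "anon42", "text": "dizzy again today"},
    {"author": "anon42", "text": "still dizzy"},
    {"author": "p_77", "text": "mild nausea"},
]

seen, unique = set(), []
for p in posts:
    if p["author"] not in seen:
        seen.add(p["author"])
        unique.append(p)
# Three posts collapse to two individuals.
```

In practice the same individual may post under different handles on different sites, so real deduplication would need fuzzier matching than an exact handle comparison.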
“At the moment the use of big data within the pharma industry is piecemeal and that’s really because the process is so complicated. But there is some huge potential out there if we get the process and technology right”
The industry is now discussing the possibility of using big data for a wide range of business activities, such as planning clinical trial recruitment. There are also benefits from utilising such information for gene pools to help identify patients with a particular target for genetic therapy. Sawyer’s team has completed big data evaluation projects on a wide range of health topics, including hay fever, multiple sclerosis and stopping smoking.
There is also a whole new big data field that has recently opened up in the UK, where the government has indicated that it is going to open its patient data to the industry. This, explains Sawyer, is big data in a different form, as it is already semi-structured: there will be free text, but also structured diagnoses and information on medicines. “At the moment the use of big data within the pharma industry is piecemeal and that’s really because the process is so complicated. But there is some huge potential out there if we get the process and technology right”.
Big data offers some significant benefits, both in the clinical development of pharmaceuticals and in combating patient non-adherence, and it seems that because there is simply so much data out there the industry will have no choice but to take notice of it. “Pharma will take more and more time and attention to find out what patients are saying in general, as well as from big data. They will also take more time to utilise the data that’s available from other sources, be they health care records or other places we talked about, and then the regulators will get involved. In fact the regulators are getting involved”, says Sawyer, and if that is the case, then this is a topic pharma needs to get to grips with sooner rather than later.