Carpe IDMP: Leveraging the Commoditization of Scientific Data

Real World Evidence & Market Access Summit USA

Dec 3, 2015 - Dec 4, 2015, Philadelphia

Leverage Real Life Data & Analytics for Value-based Market Access

Regulators, patients and payers are dragging pharma into the digital age, but can the machines make sense of it all?

By Tom MacFarlane on Aug 19, 2015

The research firm Gartner publishes an annual analysis of our collective hopes for various technological buzzwords: a 'Hype Cycle' plotting the level of expectation attached to a particular innovation or trend over time. In the most recent (20th) edition, Big Data is deemed to have completed its ascent of the 'Peak of Inflated Expectations' and must now slip inexorably into the 'Trough of Disillusionment' before it can rise, phoenix-like, onto the 'Plateau of Productivity'.

Hyperbole aside, this trajectory will be recognizable to anyone familiar with Big Data's reputation among pharmaceutical executives. Those areas of the industry that follow more conventional rules of sales and marketing (notably over-the-counter products and, to a lesser extent, prescription medicines in regions where direct-to-consumer advertising is legal) have surely benefitted from the ability to mine vast seams of media and online information. However, the ubiquitous promise of Big Data-assisted innovation in the life sciences has yet to materialize.

Yet there are signs that this Plateau of Productivity may be reached sooner than expected, most strikingly in a recent issue of the journal PLOS Computational Biology.

The Silicon Scientist

A mechanism to explain how individual cells, each containing identical DNA instructions, are able to coordinate, localize and limit their own replication to form complex 3D shapes (an arm, for example) has eluded biologists for more than a century. Now a solution has been found not by researchers, but by a computer at the Centre for Regenerative and Developmental Biology, Tufts University.

Daniel Lobo and Michael Levin set their computer the task of crunching through voluminous pharmacological, genetic and experimental data in an attempt to establish the mechanisms of morphological regulation in a specific species of flatworm. Astonishingly, over the course of several days, the computer not only derived a regulatory network that fit every experimental scenario fed into it, but also hypothesized the involvement of two additional, previously unidentified biological products. The predicted interaction profiles of these two products then pointed to two candidate genes, both of which had already been sequenced within the animal's genome, making the final model the most comprehensive and reliable explanation of planarian regeneration to date.

Lobo and Levin's findings will hopefully lead to advances in regenerative medicine, but the success of their method has more momentous and wide-reaching implications, provided certain hurdles can be overcome.

Although the actual computer run-time was relatively short, the necessary preparatory work took more than a year. The principal difficulty the duo faced was the lack of standardization, both in the experimental data and in the mathematical relationships describing them. This problem is common to most datasets aggregated from different sources and is one of the key challenges associated with Big Data analytics.

The Regulators' Digital Decree

In an ideal world, at least from an information-science perspective, all data would be codified according to a rigorous and universal system of conventions, harmonizing them across conditions, companies, regions and languages. The pharmaceutical industry has had to implement a degree of standardization (for example, the MedDRA dictionary of adverse events), but for the most part there is no overarching, industry-wide system governing how its accumulated information is structured and recorded.

Indeed, many of the world's regulatory authorities consider this lack of standardization to be an obstacle to their effective monitoring of drug safety, because it hinders the reliable tracking of product batches and, significantly, the performance of meta-analyses on accumulated pharmacovigilance data (for example, analyses of different brands of the same drug produced by different companies; of products originating from the same manufacturing facility; or of products with a similar therapeutic class or indication).

Cue the entry of the ISO Identification of Medicinal Products (IDMP) standard, a multi-stakeholder effort to digitize and harmonize information pertaining to the origin, composition, manufacture, licensing and clinical particulars of medicines. The European Commission (EC) has already mandated the implementation of IDMP in the EU, and other regions, including the US and Canada, have indicated that they will follow suit.

Aggregating and preparing the data required for IDMP will prove a considerable challenge for most companies with large product portfolios, especially portfolios accumulated inorganically through mergers and acquisitions. Hence both industry and regulators are currently seeking to negotiate (essentially, to postpone) the EC's 2016 deadline. And yet IDMP is only the most recent and visible manifestation of a more fundamental trend: the rise of Health Level Seven (HL7)-based architecture.

Bringing Order out of Chaos

HL7, like ISO, with which it works closely, is a standards development organization, and together the two provide a basis for the interoperability of healthcare data. Whenever a hospital needs to transfer information about a patient or a clinical test, whether internally or to an outside institution, the chances are it will do so via an HL7-based message. However, the evolving healthcare landscape is increasingly bringing together patients, clinicians, payers, researchers and industry, so inter-hospital communication is no longer the only requirement.
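For readers who have never seen an HL7-based message, the sketch below parses a single segment in the pipe-delimited HL7 Version 2 encoding that is still widespread in hospital messaging (not the XML-based Version 3 described below). It assumes the default delimiters and ignores escape sequences and repetitions; the sample PID (patient identification) segment and its values are invented for illustration, and a real system would rely on a dedicated HL7 library rather than this hypothetical `parse_segment` helper.

```python
from typing import Dict, List

# Minimal sketch: split one HL7 v2 segment into fields and components.
# Assumptions: default delimiters ("|" for fields, "^" for components),
# no escape sequences, no repetitions ("~"), and not an MSH segment
# (MSH numbers its fields differently because MSH-1 is the "|" itself).
def parse_segment(segment: str) -> Dict[int, List[str]]:
    """Return {field_number: [components]} for a non-MSH segment."""
    parts = segment.split("|")
    # parts[0] is the segment ID (e.g. "PID"); field n sits at parts[n].
    return {n: parts[n].split("^") for n in range(1, len(parts))}

# Hypothetical patient record, invented for this example.
pid = "PID|1||12345^^^HOSP^MR||Doe^John"
fields = parse_segment(pid)

print(pid.split("|")[0])  # segment ID
print(fields[3][0])       # PID-3, first component: patient identifier
print(fields[5])          # PID-5: patient name as family^given
```

The field-number keys mirror the way HL7 v2 documentation refers to fields (PID-3, PID-5), which is why the segment ID at position 0 is skipped rather than re-indexed.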

The most recent suite of HL7 specifications, Version 3, brings a scalable information architecture that can be used to model any entity in the health domain. It is this versatility that is driving the ever-increasing adoption of HL7-based methodology, some products of which will already be familiar. The Individual Case Safety Report (ICSR), for example, is an HL7 artefact; so too is US Structured Product Labelling (SPL); while HL7's newest standard, FHIR (Fast Healthcare Interoperability Resources), is expected to become dominant in the coming explosion of web-enabled medical devices and healthcare apps.

All of this matters for Big Data because standards bring with them unified data classification, quantification, interrelation and, in the form of controlled vocabularies, nomenclature. IDMP will use more than 60 such vocabularies. Translating data into this 'language' is one of the key difficulties that make the EC's deadline unachievable (along with the fact that the vocabularies themselves, like some other aspects of the standard, have not yet been finalized), but once done it should greatly facilitate exploratory analytics.
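To illustrate why controlled vocabularies make the cross-company meta-analyses described earlier possible, here is a minimal sketch: adverse-event reports that name the same substance differently are mapped onto a single identifier before being counted. The vocabulary, its identifiers (`SUB-0001`, `SUB-0002`) and the reports are all hypothetical; they do not come from IDMP, whose vocabularies were still being finalized at the time of writing.

```python
from collections import Counter
from typing import Optional

# Hypothetical controlled vocabulary: local free-text names -> one substance ID.
# The IDs are invented for illustration and are not real IDMP identifiers.
SUBSTANCE_VOCAB = {
    "paracetamol": "SUB-0001",
    "acetaminophen": "SUB-0001",
    "ibuprofen": "SUB-0002",
}

def normalize(term: str) -> Optional[str]:
    """Map a free-text substance name to its controlled ID, or None if unmapped."""
    return SUBSTANCE_VOCAB.get(term.strip().lower())

# Invented adverse-event reports from four companies, each using its own naming.
reports = [
    {"company": "A", "substance": "Paracetamol"},
    {"company": "B", "substance": "Acetaminophen "},
    {"company": "C", "substance": "acetaminophen"},
    {"company": "C", "substance": "Ibuprofen"},
    {"company": "D", "substance": "APAP"},  # local abbreviation, not in the vocabulary
]

counts: Counter = Counter()
unmapped = []
for report in reports:
    code = normalize(report["substance"])
    if code is None:
        unmapped.append(report["substance"])  # needs manual curation
    else:
        counts[code] += 1

print(counts)    # Counter({'SUB-0001': 3, 'SUB-0002': 1})
print(unmapped)  # ['APAP']
```

Without the shared identifier, the three paracetamol/acetaminophen reports would be split across separate buckets; the unmapped "APAP" entry hints at the translation effort that makes IDMP deadlines so demanding.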

To be clear, IDMP data does not, in and of itself, constitute Big Data in terms of its scale. Digitization of information pertaining to a company's products will undoubtedly lead to benefits in terms of business processes, marketing and operations, perhaps even research for large enough entities, but the full potential is only likely to be realized when these data can be effectively combined with other, external sources.

And the external sources are legion. Examples include information on chemical compounds, their synthesis, targets, interactions and bioactivity (e.g. Reaxys); cell-signalling pathways (e.g. the Signal Transduction Knowledge Environment, STKE); genetic sequences (e.g. GenBank); patent and legal information (e.g. Questel-Orbit); clinical trial registries (e.g. the US ClinicalTrials.gov and the EMA's EudraCT); scientific literature and the services that aggregate and index it (e.g. Web of Science); web-search data (e.g. Google); information from mobile health apps and tracking devices (e.g. Fitbit); and, potentially, if properly anonymized, public and private healthcare records. What's more, there are already efforts to introduce controlled vocabularies into many of these sources, for example those of the Gene Ontology Consortium or the Systems Biology Ontology.

A New Hope

What sort of insights might pharmaceutical companies gain from this new information asset? That remains to be seen. It seems inevitable that improved analytics will benefit commercial and operational aspects, and so too the more 'behavioral' side of R&D, such as using patient, social and prescription data to determine the causes and outcomes of non-compliance, design clinical trials or propose risk-minimization measures. But what about purer scientific advances?

Cancer is a field where the promise of meta-analyses has been trumpeted almost since the introduction of electronic health records, and we must endure almost daily news reports of the myriad dietary and lifestyle factors that have been associated with either oncogenesis or remission. Indeed, such meta-analyses may have (justifiably) contributed to some of the scepticism that surrounds Big Data in medicine.

More usefully, IBM's Watson, the system that vanquished human contestants on Jeopardy! in 2011, has since been put to work analyzing medical information to recommend treatment strategies for lung- and brain-cancer patients. IBM estimates that about 80% of available healthcare data is 'unstructured' (the free text found in books and journals, for example), and its principal difficulty still lies in training Watson to understand the enormous complexities, subtleties and vagaries of human language.

Elsewhere, a company called Propeller Health plans to combine information from its customers' connected inhalers with weather conditions, wind direction, air pollution, pollen counts, land-use and traffic patterns to create an 'Asthma Risk-Map', while the manufacturers of the asthma medications themselves hope to use the same data to better inform their supply chains.

Neither of these use cases can quite be considered a medical breakthrough, but perhaps now is the time to feel more optimistic. What Lobo and Levin's work has done is show what computers can really achieve when provided with high-quality inputs, and what HL7, ISO and the regulatory authorities are doing is propelling the industry down a path towards that objective. Perhaps Big Data's Trough of Disillusionment in healthcare will not be nearly as deep, or as wide, as feared.

About the author: Tom MacFarlane is a drug development professional who is as passionate about innovation in information science and regulatory science as he is about medical science. He has held positions within both industry and CROs, having previously worked as a senior consultant within Parexel's Integrated Product Development practice, and now leads Regulatory Affairs on behalf of Accenture Life Sciences.

Reuters Events | Pharma