
Feature Articles : Apr 15, 2012 (Vol. 32, No. 8)

The Data Ecosystem in R&D

  • Chris Molloy, Ph.D.

“In the long history of humankind ... those who learned to collaborate and improvise most effectively have prevailed.”

—Charles Darwin

In any ecosystem there exist multiple interdependent relationships, niches, and environmental pressures. In all ecosystems there are food chains that convert energy from simple to increasingly complex states.

This metaphor is directly applicable to the data-driven environment of life sciences and healthcare R&D, where the efficient generation of complex information assets is key to success. Expressing the creation of complex knowledge assets from simple data through a “data-chain” enables a new way of thinking, one that is shared with other technology-dependent sectors. This approach enables collaborative thinking.

Today’s world is a 24/7 highly connected data-driven society where more people have smartphones than fresh running water. Yet while we are empowering the patient and the individual to be at the center of their data ecosystem, we are asking the brightest minds in the world to work in document-driven, linear processes familiar to researchers of a precomputer age.

In all sectors of R&D, researchers are finding it harder to collaborate with colleagues in the next building than to program their home TV from the workplace. R&D has become an information science, and understanding the ecosystem is the first step in managing it.

Making processes and data interoperable across the ecosystem relies on making information available in real time, in a structured and secure manner. That information can then be readily consumed by those from multiple disciplines who need to use it to build high-quality capital knowledge assets.

The Real Ecosystem

Traditionally the R&D process has been a linear progression through a multidisciplinary set of teams chained together to provide basic research, new product discovery, regulated trials, and manufacturing. This concept, popular for over 30 years, does not reflect the way that these teams really collaborate and, in fact, serves to entrench a siloed mentality that is often reinforced by separate historical management and informatics structures.

In reality the R&D process is a complex interdependent community of projects, supported by various teams each providing skills and guidance to move products from inception to delivery: an ecosystem of ideas, data, and information.

Everyone in the ecosystem is both a generator and consumer of data.

The scientific method starts with a hypothesis (“an idea based upon facts already known”) and ends with the communication of a conclusion to one or more partners to take the next step. The ganging together of these cycles of experimentation serves to build projects, departments, and entire R&D organizations (Figure 1). This is a food chain of information, and ideas become more valuable and complex as they progress through the ecosystem.

Facts, like food, need to be in a consumable form to be utilized effectively; they need to be complete and comprehensible; they need to be delivered at the right time and actionable. So often in R&D this fails to happen, leading to inefficient “digestion” of the data, IP loss, inadequate decision making, and a poor corpus of organizational knowledge.

Individuals should be at the center of their data ecosystem, not dependent upon others to define what they receive.

A recent survey of 682 researchers by IDBS and Scientific Computing reviewed the ability of researchers to work within that data ecosystem. The research shows that today’s researchers wish to collaborate effectively but fail to do so. In many cases this is simply because they cannot efficiently move data from one person to another.

Highly Fragmented, Document Driven

Our survey identifies a number of issues. The data ecosystem is highly fragmented, with researchers having to use multiple, often disjointed systems to capture, compute, and structure their information (Figure 2).

Notable in the research is the prevalence of in-house systems. These represent niches within the ecosystem that are often vestigial: an important workaround from some time in history that can be an impediment to evolution.

The second important finding was that data—the common currency and language of science—is most often communicated via documents (Figure 3).

Documents are containers of data that are constrained by their creators with certain levels of context and interpretation. Previous research by IDBS has shown that approximately 25% of a researcher’s time is spent writing such reports—compressing information for communication.

In a linear process where the requirements for the consumption of data are well understood and unchanging, this may be sensible. However, in a complex environment where different consumers require different data it is not.

The research shows that 60% of researchers have to wade through multiple documents to extract the data they need to start their work. Even when this is done, researchers may often have to spend time in unstructured Q&A sessions with those groups who generated the underlying data in order to challenge and obtain what they need.

Thomas Goetz, executive editor of Wired and author of The Decision Tree: Taking Control of Your Health in the New Era of Personalized Medicine, reminded us that “We can profoundly change our behavior once we are provided with the relevant data.” Our survey showed that 91% of researchers could not align data from internal or external collaborators effectively.

Evolutionary history is punctuated by a number of transformational events that appear to change the course of development. These extinction events are driven by environmental pressures and the inability of much of an ecosystem to “fit” the new reality.

Environmental pressures affect any ecosystem, and today’s environment, particularly in pharmaceutical sciences, is highly challenging. Andrew Witty, CEO of GlaxoSmithKline, put it this way: “The blockbuster business model clearly worked—and up until the time of the human genome breakthroughs, most would have expected this trend to continue. It has not. So we are having to reinvent our industry.”

R&D must become more productive and innovative, in fact more innovative than ever, to reverse the rising cost per marketed pharmaceutical, now estimated at almost $4 billion. Leaving aside the undoubted challenges of patent cliffs and regulatory fences, the productivity gap has driven the externalization of R&D.

Chris Viehbacher, Sanofi’s CEO, said the firm has realized that “major groups are not great sources of innovation.” Externalization allows companies to tap into smaller, flexible organizations and global talent but has the effect of making the extended organization significantly more complex.

The ecosystem, though, remains very similar: groups consume data, groups produce data. The complexity is only noticeable if the data is not available to the consumer.

Each data generator should be able to add seamlessly to this landscape of data and to draw from it in the way they need, personalizing their data feed so that it remains consumable and relevant. “Death Star” warehouses, which must be built with knowledge of all the questions that will be asked of them, are the dinosaurs of our modern, personalized IT age. Is the cloud the new ecosystem? It will play a vital role in providing extensible computing but requires structure, application, and security.

There remains a lack of understanding about the various ways in which the cloud environment affects existing systems. An increasingly common approach to survival in this atmosphere is “cloud-claiming”: to simply provide a hosted version of an existing client/server application and brand it as a cloud-based or SaaS solution. This hosting-equals-cloud approach is not SaaS but mainly leverages Infrastructure-as-a-Service (IaaS).

As Seymour Duncker, CEO of iCharts, says, “Everyone acknowledges that data is exploding, but no one seems to have a handle on finding relevant data and making meaning from it. The ecosystem of data is still in a very infant stage. The big gap is a data publishing and distribution platform that makes it simple to take the data from the source to where it can be utilized most effectively.”

Perfect Storm Approaching

Our research shows that there may be such an extinction event on the horizon. Researchers now recognize that their systems are too fragmented, that many of their existing tools can no longer support their ecological (business) requirements and, alarmingly, that there are too few informatics staff to solve the problem internally.

One of the likely outcomes of this is that many niche solutions will require integration into data systems that can survive the challenges of living in a data-rich, highly collaborative, data-centric future—in short: scalable, searchable, and secure.

Yet none of this theory is really new. The single-point-of-truth concept of supply chain logistics is well established, and the world of finance has recognized itself as a Big Data business for over 25 years. Learning from other sectors is vital if the new life sciences data ecosystem is to be viable.

Harness and Manage the Ecosystems

Darwin puts us back in the frame: “In the survival of favored individuals and races, during the constantly recurring struggle for existence, we see a powerful and ever-acting form of selection.”

In the R&D ecosystem, those who can capture, consume, and share data efficiently and effectively will remain able to generate high-value knowledge assets. Those who are unable to shift from siloed, poorly connected models will be overtaken.

Michael Crichton, in Jurassic Park, memorably stated, “Life will find a way.”

The data ecosystem will evolve with or without intervention, but surely those who manage to embrace the move from static documents to just-in-time access to quality data on demand will be those at the top of tomorrow’s ecological pyramid.