
Feature Articles : Apr 1, 2013 (Vol. 33, No. 7)

Computational Bio Problem Solving

  • Josh P. Roberts

We are clearly on the verge of a revolution in medicine, one in which the sequence of As, Ts, Gs, and Cs is as much a part of a patient’s phenotypic profile as the sounds of the heart are, or as the colors of the bile once were. But we are not there yet, and many hurdles remain.

At the recent “Asia Pacific Bioinformatics” conference in Vancouver, Brad Popovic, Ph.D., CSO at Genome BC, moderated a session where stakeholders from pathologists and funders to bioinformaticians and large technology firms discussed how to traverse the technical, political, cultural, and other landscapes while avoiding the hidden mines on the way to viable medical informatics. The panel discussion helped to provide attendees (whom Dr. Popovic described as largely trainees, and largely from the broadly defined bioinformatics field) with some guideposts to get from here to there.

Sequencing technology has been advancing at such a clip that in some ways it is outpacing our ability to assemble and interpret the data it is able to generate, said Dr. Popovic.

Among the biggest concerns in medicine—and in medical testing specifically—are whether data is accurate, reproducible, and translatable from one instrument (or algorithm, or lab, or technician) to another.

Jordan Stockton, Ph.D., director of product marketing, computational biology for Illumina, was asked, “How do I know what truth is? How do I know when I see a call, or a variation, or an interesting marker, that I know I’m looking at something real?”

He believes the answer has two parts. On the one hand, vendors like Illumina “own” the “accuracy problem”—of optimizing the fidelity of the calls, and alerting the end user when they should or shouldn’t trust the data. That problem, he said, is “addressable if not solved.”

(A greater challenge is in knowing what the data means, and in what context it should be taken to mean something medically serious or medically actionable, Dr. Stockton said. That is something that will potentially evolve rather than being solved in one fell swoop, and will be tackled by the community and vendors together.)

The medical community wants to see measures of confidence and portability. One way for this to happen—to move the field forward—is to coalesce around a particular set of standards. “I think in the past three years there has been a lot of consensus around various means of variant calling and various means of representation of data,” Dr. Stockton said.

He added that Illumina’s market presence allows the company the opportunity, and to some extent gives it the burden, to impose some standardization. Using a cloud-based system to analyze the data churned out by sequencers, too, “gives people the opportunity to literally be using the same code to execute the same test anywhere in the world.”

Research vs. Clinical Spheres

As of now the field is mostly in research mode: collecting data, writing algorithms to deal with the data, experimenting with human interfaces, debating the ethics, deciding on regulation, and exploring options for reimbursement.

Clinicians will need to know when a result is actionable and when it is a “variation of unknown significance (VUS),” said pathologist Aly Karsan, M.D., head of clinical diagnostic genomics and director of the cancer genetics lab at the BC Cancer Agency. It’s important that a test has been validated.

Treatment decisions require clear, crisp answers. Yet for the most part we haven’t done enough genomes to know what “normal” is, and we don’t know how extensively SNPs and indels pervade, pointed out George Michaels, Ph.D., director of life sciences programs in the Health Strategy and Solutions Group at Intel: “Interpreting genomic information in terms of the context of individual and populations really is in the early stages of directly impacting clinical practice.”

The complexity involved in personalized medicine is only beginning to be dealt with computationally. For now it’s unlikely that a genomic analysis can help recommend a particular drug or treatment, with possible exceptions in oncology, although Dr. Michaels predicts that affordable proactive personal genome technologies will soon be able to identify drugs or treatments that should not be used.

“There is going to have to be some larger consortia to actually define what variants actually impact on the biology of the case at hand,” Dr. Karsan remarked.

Lock It Down

While physicians need, and regulators require, clear and concise answers on which to base clinical decisions, there is an inclination among those developing tests to continually try to improve their product.

Interrogating the same samples on a routine basis should yield the same result. Even minor fixes in the software that calls bases coming off a sequencer, for example, or the algorithms into which those calls are being fed, can result in different data being delivered, and this is “not really stable enough to be stable from a clinical perspective,” said Dr. Karsan. The first step for genomic approaches to be truly useful is “locking down the informatics approaches.”

That doesn’t have to mean throwing away the key, though. “Locking down means that, on a day to day basis, you use what you have,” he noted. “If you find a bug, you fix it, but then you test and you go back and ask, ‘Does this bug-fix actually impact on previous results?’”
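The re-test Dr. Karsan describes can be sketched as a comparison of variant calls from the locked-down pipeline and from the patched one, run on the same archived sample. This is a minimal illustration only, not any lab's actual validation procedure; the function names, the tuple layout, and the sample data are all hypothetical.

```python
# Hypothetical sketch: compare calls from a locked-down pipeline and a
# bug-fixed version on the same archived sample. Names and data are
# illustrative assumptions, not taken from any real pipeline.


def diff_calls(old_calls, new_calls):
    """Return variants lost, gained, and unchanged between two runs.

    Each call is a (chromosome, position, ref, alt) tuple.
    """
    old_set, new_set = set(old_calls), set(new_calls)
    return {
        "lost": sorted(old_set - new_set),
        "gained": sorted(new_set - old_set),
        "unchanged": sorted(old_set & new_set),
    }


def fix_changes_results(old_calls, new_calls):
    """True if the bug fix alters any previously reported call."""
    delta = diff_calls(old_calls, new_calls)
    return bool(delta["lost"] or delta["gained"])


# Made-up calls for one archived sample.
locked = [("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T")]
patched = [("chr1", 1000, "A", "G"), ("chr3", 3000, "G", "A")]

report = diff_calls(locked, patched)
print("lost:", report["lost"])
print("gained:", report["gained"])
print("previous results affected:", fix_changes_results(locked, patched))
```

Any non-empty "lost" or "gained" list would flag earlier reports for review before the patched version replaces the locked one.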

Algorithms can also be improved just by virtue of having a larger training set. Take a hypothetical algorithm, based on mining the data of 5,000 patients, from which were collected not just one variable-worth of data but literally millions. It uncovers a set of 22 biomarkers useful in establishing a prognosis, it’s vetted in clinical trials, it’s marketed as a test in CLIA labs, and it’s ordered by oncologists. But then fast-forward two years and 15,000 more patients, and “based on these 20,000 patient experiences and genomic data, it’s these 22 markers plus another 5 markers that we’ve found,” offered Elai Davicioni, president and CSO of GenomeDX.

Yet from both a regulatory and reimbursement perspective, a new test based on the 27 markers cannot simply be marketed as the new and improved version of the original test. A test has to be locked down, immutable: “If you want to change you basically have to go through the whole process again,” he lamented, although he does not see an easy solution to the dilemma.

Informatics software, like any black box, faces “the most rigorous bar for anything to go through FDA approval,” pointed out Dr. Popovic. “You’d better have an algorithm that can be replicated a gazillion times in different clinical environments, against different types of samples, if you’re ever going to put that into clinical medicine. I don’t think the field has matured yet to really realize that.”

Adolescence

Based on his experience integrating genomic technologies into clinical practice “in the early days” of the late 1980s, Dr. Popovic cited four interrelated issues that the field will have to grapple with on its way to maturity.

“People’s credentials define their turf in medicine, and people in bioinformatics currently don’t have any turf defined.” This is the first of the issues, he said. Although genetics and genomics are applied in pathology labs, bioinformatics is not part of lab medicine or pathology—it is “essentially a group of outsiders trying to break into somebody else’s turf.”

Second, the bioinformatics community “hasn’t really developed algorithms and had to put them through a clinical sieve to become the standard of care.” The road from a cool algorithm that helps pinpoint the meaning of a VUS to an accepted standard of care is long and laborious.

Third, the field needs to grapple with how to get reimbursed for its efforts: “You have to up front reconcile what you’re going to do to get paid for that. You truly need a business plan to show how you’re going to execute this and actually put it into clinical practice.”

And last, there is a need to understand where to aim the technology to answer questions that are relevant to the physician who will be ordering the test. Just because something can be done doesn’t mean it will be.

Dr. Karsan pointed out that from the user’s perspective, even a targeted panel of 40–50 genes—let alone a whole genome or whole exome sequence—is still a lot of information coming in. It’s important to have tools for visualization of that data, and to have an easy way to interface with the databases that are currently available.

Ultimately the pathologist would have the sequencer hooked up to the hospital information system, which can then kick out a report saying a test is positive or it’s negative—not detailing the VUSs that may be interesting to follow. Because, explained Dr. Popovic, “that’s absolutely meaningless to the doc that has 30 seconds to read the report and give an answer to the patient. They need a yes/no.”
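The reduction from a full variant list to a yes/no report might look like the sketch below. The classification labels and the `summarize` function are hypothetical, and a real reporting rule would come from a validated clinical protocol, not from this example.

```python
# Hypothetical sketch: collapse classified variants into a yes/no result.
# The labels ("actionable", "VUS", "benign") are assumptions for illustration.


def summarize(variants):
    """Return a headline result plus the details kept out of it.

    The result is 'POSITIVE' if any variant is actionable, else 'NEGATIVE'.
    VUSs are counted but deliberately left out of the headline.
    """
    actionable = [v["name"] for v in variants if v["classification"] == "actionable"]
    vus_count = sum(1 for v in variants if v["classification"] == "VUS")
    return {
        "result": "POSITIVE" if actionable else "NEGATIVE",
        "actionable": actionable,
        "vus_held_back": vus_count,
    }


# Made-up panel output.
panel = [
    {"name": "variant_A", "classification": "VUS"},
    {"name": "variant_B", "classification": "actionable"},
    {"name": "variant_C", "classification": "benign"},
]

summary = summarize(panel)
print(summary["result"])
```

Keeping the VUS count in the returned record, but outside the headline, leaves those variants available for follow-up without putting them in front of the ordering physician.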

All that Data

Huge amounts of data are the curse of modern biology, eclipsing the Library of Congress in volume. “We’re running out of room to add new stuff,” pointed out Dr. Michaels. In addition, the sheer computing power required by some new algorithms is testing the limits of present technology.

Intel is beginning to devote more resources to studying the energetics of computing. Many algorithmic approaches, they have found, are extremely inefficient, moving lots of data around, which is “the most expensive part, energy-wise,” he said. This is driving new memory technologies and methodologies. “The systems that we develop in the future are going to be very different, from an architectural standpoint, but also with very different programming issues associated with them.”

For its part, Illumina “is committed to making the data as small and portable as possible,” said Dr. Stockton. The firm is collaborating with the community on ways to represent the genome that are “vastly more compact,” including working with EMBL’s European Bioinformatics Institute on the CRAM compression format, as well as “streamlining the data that comes directly off of the instrument.”
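One general idea behind compact representations is reference-based encoding: record where a read aligns and how it differs from a reference sequence, not every base. The toy sketch below illustrates only that idea. It is not the CRAM format, it ignores insertions, deletions, and quality scores, and its function names and sequences are hypothetical.

```python
# Toy sketch of reference-based encoding (substitutions only).
# Not the CRAM format; names and sequences are illustrative assumptions.


def encode_read(reference, read, pos):
    """Encode a read as (pos, length, mismatches) against the reference."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    mismatches = [
        (offset, base)
        for offset, (ref_base, base) in enumerate(zip(window, read))
        if ref_base != base
    ]
    return pos, len(read), mismatches


def decode_read(reference, encoded):
    """Rebuild the original read from the reference and its encoding."""
    pos, length, mismatches = encoded
    bases = list(reference[pos:pos + length])
    for offset, base in mismatches:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTACGTACGT"
read = "ACGAACGT"  # aligns at position 4 with a single substitution

encoded = encode_read(reference, read, 4)
print(encoded)
print(decode_read(reference, encoded) == read)
```

A read that matches the reference exactly encodes to a position, a length, and an empty mismatch list, which is where the space saving in this kind of scheme comes from.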

Space issues aside, there is talk of whether it will make more sense to store a person’s genomic sequence or to have the patient resequenced later if necessary. Among the unresolved issues, noted Dr. Popovic, who co-authored an Oregon law that paved the way for the federal Genetic Information Nondiscrimination Act of 2008, are security of the data, privacy concerns (including the patient’s right not to know), and how to prevent the data from being used to discriminate against the patient.