June 15, 2006 (Vol. 26, No. 12)
A Seven-fold Path for Defining Quality and Acceptable Performance
Many articles have been written about the state of biomarker identification and validation using gene expression and proteomics. Many of them call for greater care and skepticism in biomarker identification and validation, and some paint the whole field as in crisis. Even the popular press has picked up on the issue and reported on parts of the controversy. Clearly, for some types of tests an incorrect diagnosis could cause significant patient harm, so these criticisms need to be considered carefully. To paint the entire field as unreliable, however, seems rash given the tremendous opportunities and the valuable tests that have already reached the clinic.
Let’s look at the issues, apply the valuable suggestions, and consider these tests in the context in which they will be used and alongside tests that are widely employed today, rather than in isolation from practical clinical decision making.
Biomarkers are patient diagnostic tools that have recently risen in importance in healthcare research, because newer technologies such as proteomics, transcriptomics, and genetic testing promise a variety of innovative new tests.
For clinicians, biomarkers are attractive because they promise individualized therapy, tailoring a patient’s drug or dose to the one most likely to work for that patient’s particular pathology. Drug developers also hope to use biomarkers to identify patients who will benefit from a new therapy and to demonstrate pharmacological effect rapidly, allowing new candidate drugs to show efficacy more quickly in smaller trials.
Patient Health Benefit
Many of the articles that question current results make good suggestions, which should be taken as best practices for the field and as measures of study quality.
First, a study should include enough subjects to represent the entire patient population eligible for the test. Many current studies include as few as 20-80 subjects, of whom as few as 25% have the pathology or outcome under investigation. Along with others, I suggest a minimum of 200 subjects (at least 10-20% of whom exhibit the phenotype or outcome under investigation) to identify biomarkers that are likely to hold true when larger populations are examined.
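Part of the reason small studies mislead is simple arithmetic: the precision of an estimated sensitivity depends on the number of cases, not on total enrollment. The sketch below assumes a hypothetical test that detects 80% of the cases it sees and uses the standard Wilson score interval for a proportion to compare the 95% confidence interval on sensitivity across the cohort sizes mentioned above. The function names are illustrative.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as sensitivity."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def cases_in_cohort(total_subjects: int, case_fraction: float) -> int:
    """Number of subjects who have the phenotype or outcome under study."""
    return round(total_subjects * case_fraction)

# Hypothetical scenario: the test detects 80% of the cases it sees.
for total, frac in [(20, 0.25), (80, 0.25), (200, 0.10), (200, 0.20)]:
    cases = cases_in_cohort(total, frac)
    detected = round(0.8 * cases)
    lo, hi = wilson_interval(detected, cases)
    print(f"{total:4d} subjects, {cases:3d} cases: sensitivity 95% CI {lo:.2f}-{hi:.2f}")
```

With 5 cases the interval spans roughly 0.38 to 0.96, so almost nothing is known about the test. Note also that 80 subjects at 25% prevalence and 200 subjects at 10% prevalence both yield 20 cases and therefore the same interval, which is why the case fraction matters as much as the headline subject count.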
Besides study size, the second aspect to consider is suitable patient diversity. Too many published studies use patient populations that poorly represent the spectrum of patients who would be real-world candidates for the test. Low patient diversity has several causes. One of the most common is reliance on one or two clinical centers and a single clinician to decide each patient’s disease status. This kind of geographic and diagnostician bias skews the biomarker toward the local population and the habits of a single diagnostician, which limits its wider utility. I recommend assembling the patient cohort from at least 5-10 different clinical centers and diagnosing clinicians.
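Before analysis, an investigator can check a cohort against the center and clinician minimums recommended above. The sketch below is a hypothetical audit helper. The `Subject` record and its field names are illustrative assumptions, and the thresholds default to the lower end of the 5-10 range.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Subject:
    subject_id: str
    center: str     # clinical center that enrolled the subject (hypothetical field)
    clinician: str  # clinician who made the diagnostic call (hypothetical field)

def audit_diversity(cohort, min_centers=5, min_clinicians=5):
    """Summarize how many centers and diagnosing clinicians a cohort draws on."""
    if not cohort:
        raise ValueError("cohort is empty")
    centers = Counter(s.center for s in cohort)
    clinicians = Counter(s.clinician for s in cohort)
    return {
        "n_centers": len(centers),
        "n_clinicians": len(clinicians),
        # Fraction of subjects contributed by the single largest center.
        "largest_center_share": max(centers.values()) / len(cohort),
        "meets_minimum": len(centers) >= min_centers and len(clinicians) >= min_clinicians,
    }

single_site = [Subject(f"P{i}", "Center A", "Dr. X") for i in range(100)]
multi_site = [Subject(f"P{i}", f"Center {i % 6}", f"Clinician {i % 6}") for i in range(200)]
print(audit_diversity(single_site))
print(audit_diversity(multi_site))
```

The largest-center share is reported rather than thresholded, because a cohort can nominally span several centers while one site still dominates it.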
Third, proper estimation of performance is key, because many published studies inadequately estimate how a biomarker will perform on future patients. One solution is to demand more statistical rigor from primary investigators and to state clearly that split-sample cross-validation of the training dataset is required for publication.
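Here is a minimal sketch of cross-validated performance estimation. It assumes stratified k-fold cross-validation (a common form of split-sample validation) and a nearest-centroid classifier built on the analytes with the largest t-statistics. The crucial detail is that analyte selection is repeated inside every training fold. Selecting analytes on the full dataset before splitting lets the held-out patients influence the model and inflates the accuracy estimate. The simulated data and all function names are hypothetical, and only the Python standard library is used.

```python
import random

def mean(values):
    return sum(values) / len(values)

def welch_t(a, b):
    """Welch two-sample t-statistic (group a minus group b)."""
    ma, mb = mean(a), mean(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (ma - mb) / se if se > 0 else 0.0

def top_k_by_t(X, y, k):
    """Rank analytes by |t| between classes, using only the rows passed in."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        scores.append((abs(welch_t(a, b)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def centroid(X, y, label, feats):
    rows = [row for row, lab in zip(X, y) if lab == label]
    return [mean([row[j] for row in rows]) for j in feats]

def stratified_folds(y, n_folds, rng):
    """Deal each class round-robin into folds so every fold sees both classes."""
    folds = [[] for _ in range(n_folds)]
    for label in (0, 1):
        members = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    return folds

def cv_accuracy(X, y, n_folds=5, k=10, seed=0):
    """k-fold CV of a nearest-centroid classifier built on the top-k analytes."""
    rng = random.Random(seed)
    correct = 0
    for test in stratified_folds(y, n_folds, rng):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        feats = top_k_by_t(Xtr, ytr, k)  # selection repeated inside every fold
        cents = {lab: centroid(Xtr, ytr, lab, feats) for lab in (0, 1)}
        for i in test:
            dist = {lab: sum((X[i][j] - c) ** 2 for j, c in zip(feats, cen))
                    for lab, cen in cents.items()}
            correct += int(min(dist, key=dist.get) == y[i])
    return correct / len(y)

def simulate(n_per_class, n_analytes, n_informative, shift, seed):
    """Hypothetical data: the first n_informative analytes are shifted in cases."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_analytes)]
            for j in range(n_informative):
                row[j] += shift * label
            X.append(row)
            y.append(label)
    return X, y

X, y = simulate(20, 200, 5, 2.0, seed=1)
Xn, yn = simulate(20, 200, 0, 0.0, seed=2)
print(f"5 informative analytes: CV accuracy {cv_accuracy(X, y):.2f}")
print(f"pure noise:             CV accuracy {cv_accuracy(Xn, yn):.2f}")
```

On the simulated data the informative cohort should score well above chance, while the pure-noise cohort should hover around 50%. That is what an honest estimate looks like for a "biomarker" with no biological signal.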
Fourth, statistical significance must be demonstrated properly. Almost all publications describing biomarkers fail to test statistical significance with the class permutation test. Whole-genome microarrays measure tens of thousands of analytes, as do several proteomic techniques. A study that measures many analytes in too few patients can easily yield biomarkers that reflect random variation rather than biology.
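The class permutation test asks how often randomly relabeled data produce a result as strong as the one observed. The sketch below is a simplified version that uses the largest single-analyte Welch t-statistic as its test statistic. A full class permutation test of a classifier would repeat the entire cross-validated model-building procedure, including analyte selection, for every permuted label set. The data generator and function names are hypothetical.

```python
import math
import random

def welch_t(a, b):
    """Welch two-sample t-statistic (group a minus group b)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0

def max_abs_t(columns, labels):
    """Largest |t| over all analytes: the 'best-looking' single biomarker."""
    best = 0.0
    for values in columns:
        a = [v for v, lab in zip(values, labels) if lab == 1]
        b = [v for v, lab in zip(values, labels) if lab == 0]
        best = max(best, abs(welch_t(a, b)))
    return best

def class_permutation_test(columns, labels, statistic, n_perm=199, seed=0):
    """Return the observed statistic and its permutation p-value."""
    rng = random.Random(seed)
    observed = statistic(columns, labels)
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # class sizes are preserved; only assignment changes
        if statistic(columns, shuffled) >= observed:
            as_extreme += 1
    # Counting the observed labeling itself keeps the p-value above zero.
    return observed, (as_extreme + 1) / (n_perm + 1)

def simulate(n_subjects, n_analytes, planted_shift, seed):
    """Hypothetical data: analyte 0 is shifted by planted_shift in cases."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_subjects)]
    columns = [[rng.gauss(0, 1) for _ in labels] for _ in range(n_analytes)]
    for i, lab in enumerate(labels):
        columns[0][i] += planted_shift * lab
    return columns, labels

noise_cols, noise_labels = simulate(20, 200, 0.0, seed=3)
noise_obs, noise_p = class_permutation_test(noise_cols, noise_labels, max_abs_t)
print(f"noise only: best |t| = {noise_obs:.2f}, permutation p = {noise_p:.3f}")
```

With 200 pure-noise analytes and only 20 subjects, the best-looking analyte typically reaches a |t| that would seem impressive in isolation. The permutation p-value is nonetheless typically unremarkable, because shuffled labels produce equally strong "biomarkers" by chance.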
Fifth, I encourage investigators to try several of the available algorithms, because doing so is likely to improve results. Simple biomarker derivation approaches are common: ranking gene lists by the t-statistic or correlation coefficient, nearest-centroid classification, and other analysis-of-variance and correlation methods.
Several more powerful and often more successful methods are available, such as linear discriminants, support vector machines (linear and nonlinear), and neural networks.
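As an example of one of the simple derivation approaches listed above, the sketch below ranks analytes by the absolute Pearson correlation between each analyte and the outcome. With a 0/1-coded outcome this is the point-biserial correlation. The analyte names and simulated data are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant analyte or outcome carries no correlation
    return sxy / math.sqrt(sxx * syy)

def rank_by_correlation(columns, outcome, top=5):
    """Return the `top` analytes with the largest |r| against the outcome."""
    scored = [(abs(pearson_r(values, outcome)), name) for name, values in columns.items()]
    scored.sort(reverse=True)
    return [(name, round(r, 3)) for r, name in scored[:top]]

rng = random.Random(5)
outcome = [i % 2 for i in range(30)]  # 1 = phenotype present (hypothetical)
columns = {f"analyte_{j}": [rng.gauss(0, 1) for _ in outcome] for j in range(100)}
columns["analyte_signal"] = [3.0 * o + rng.gauss(0, 1) for o in outcome]
print(rank_by_correlation(columns, outcome))
```

A ranking alone says nothing about whether the top analytes would survive in new patients; it is only the first step of a derivation.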
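To show what a linear support vector machine does, the sketch below trains one with the Pegasos algorithm (stochastic sub-gradient descent on the L2-regularized hinge loss) in plain Python. It rests on simplifying assumptions: labels are coded +1/-1, the bias is folded in as a constant feature (and is therefore also regularized), and the regularization strength `lam` is arbitrary. For real analyses, tested implementations are a better choice, such as scikit-learn's `LinearDiscriminantAnalysis`, `LinearSVC`, `SVC` (nonlinear kernels), and `MLPClassifier` (neural networks).

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos training of a linear SVM; y must be +1/-1."""
    rng = random.Random(seed)
    # Append a constant 1.0 so the last weight acts as a (regularized) bias.
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: regularization
            if margin < 1.0:  # hinge loss is active: step toward this example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Hypothetical two-analyte data: cases near (2, 2), controls near (-2, -2).
rng = random.Random(4)
X = [(rng.gauss(2, 0.7), rng.gauss(2, 0.7)) for _ in range(50)] + \
    [(rng.gauss(-2, 0.7), rng.gauss(-2, 0.7)) for _ in range(50)]
y = [1] * 50 + [-1] * 50
w = train_linear_svm(X, y)
accuracy = sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}, weights: {[round(v, 2) for v in w]}")
```

Training accuracy is reported only to show the model fits. As with any other method, its performance on future patients must be estimated by cross-validation.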
A biomarker developed in a way that meets these five criteria can be considered an early research finding, or research biomarker.
Sixth, to validate this biomarker further, investigators should forward validate it using a new cohort of patients similar in size to, or larger than, the cohort used in the biomarker development phase. Importantly, this group should not include any patients from the biomarker development group, and the new patients should preferably come from different clinical centers and be diagnosed by different physicians than those in the development phase.
If the research biomarker continues to show acceptable performance in the forward validation test, it can be considered a probable clinical biomarker.
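The independence requirements for forward validation can be checked mechanically. This hypothetical helper treats patient disjointness and cohort size as required, and new centers and clinicians as preferred, mirroring the wording above. "Similar size" is interpreted through an adjustable `min_size_ratio` that defaults to 1.0 (at least as large). That threshold, the `Subject` record, and the function name are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Subject:
    subject_id: str
    center: str     # enrolling clinical center (hypothetical field)
    clinician: str  # diagnosing clinician (hypothetical field)

def check_forward_validation(development, validation, min_size_ratio=1.0):
    """Return (required checks passed?, required checks, preferred checks)."""
    dev_ids = {s.subject_id for s in development}
    dev_centers = {s.center for s in development}
    dev_clinicians = {s.clinician for s in development}
    required = {
        # No patient may appear in both cohorts.
        "no_patient_overlap": all(s.subject_id not in dev_ids for s in validation),
        "large_enough": len(validation) >= min_size_ratio * len(development),
    }
    preferred = {
        "new_centers_only": all(s.center not in dev_centers for s in validation),
        "new_clinicians_only": all(s.clinician not in dev_clinicians for s in validation),
    }
    return all(required.values()), required, preferred

dev = [Subject(f"D{i}", f"Center {i % 5}", f"Clinician {i % 5}") for i in range(200)]
val = [Subject(f"V{i}", f"Center {5 + i % 5}", f"Clinician {5 + i % 5}") for i in range(220)]
leaky = val[:150] + dev[:70]  # reuses 70 development patients
print(check_forward_validation(dev, val)[0])    # True
print(check_forward_validation(dev, leaky)[0])  # False
```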
The seventh step in biomarker validation is to conduct clinical trials that aim to demonstrate patient survival benefit, or outcome benefit, resulting from the application of the test, as compared to current standards of medical practice.
If this last step is successful, the biomarker may be considered a clinical benefit validated biomarker.
Analytical Validity
The recommendations presented here are designed to demonstrate clinical reliability and a patient health benefit for newly reported biomarkers. They are more stringent than what is required to demonstrate analytical validity. The FDA’s “Guidance for Industry: Pharmacogenomic Data Submissions” defines two levels of validation: known valid biomarker and probable valid biomarker. Both definitions focus on validation at the analytical level and generally leave the patient health benefit to be demonstrated through a “widespread agreement in the medical and scientific community.”
To allow intermediate results to be published, I recommend that journals encourage such publication but insist that research biomarkers and probable clinical biomarkers be labeled appropriately. Journal reviewers should also insist that biomarkers that do not meet all of the first five criteria be labeled as correlated analytes.
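The labeling scheme proposed in this article can be summarized as a simple decision rule, which the sketch below encodes. The `ValidationRecord` fields are hypothetical names for the seven steps. The rule assumes the steps are cleared in order, so a later step cannot compensate for a missing earlier one.

```python
from dataclasses import dataclass

@dataclass
class ValidationRecord:
    adequate_size: bool                  # step 1: enough subjects and cases
    diverse_cohort: bool                 # step 2: multiple centers and clinicians
    cross_validated: bool                # step 3: split-sample cross-validation
    permutation_significant: bool        # step 4: class permutation test passed
    algorithms_explored: bool            # step 5: several algorithms tried
    forward_validated: bool = False      # step 6: independent new cohort
    outcome_benefit_shown: bool = False  # step 7: clinical trial shows benefit

def biomarker_label(r: ValidationRecord) -> str:
    first_five = (r.adequate_size and r.diverse_cohort and r.cross_validated
                  and r.permutation_significant and r.algorithms_explored)
    if not first_five:
        return "correlated analyte"
    if not r.forward_validated:
        return "research biomarker"
    if not r.outcome_benefit_shown:
        return "probable clinical biomarker"
    return "clinical benefit validated biomarker"

# Hypothetical study that skipped the permutation test.
print(biomarker_label(ValidationRecord(True, True, True, False, True)))  # correlated analyte
```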
Evaluating Success & Performance
Biomarker validation through formal proof of clinical benefit in a controlled clinical trial has not been applied to many of the biomarkers in wide clinical use today. These markers are often used because of the compelling logic of their utility, anecdotal evidence, and informal physician reports.
These tests are often marketed through reference labs that are not regulated by the FDA, or sold to hospitals and research labs as analyte-specific reagents. Almost all of the immunohistological tests and many of the viral load and viral genotyping tests used in medicine today are marketed this way.
Some commentators have suggested that a test should be perfect: it should never miss a sick patient and never falsely diagnose a healthy one. No test is ever this good; the real issue is defining acceptable performance. That question must be answered after weighing the risks posed by false negatives (sick patients missed by the test) and false positives (healthy patients identified as sick). If the test is used in isolation, a false positive result could subject the patient to unnecessary surgery or therapy, with the attendant risk of death or morbidity, or it may require expensive follow-up. Thus very high sensitivity is often not required, but high specificity is demanded; the Pap smear test for cervical cancer is an example.
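The trade-off between false negatives and false positives is easiest to see with expected counts. The sketch below computes them for a hypothetical screening scenario (10,000 people, 1% prevalence, 80% sensitivity). The numbers are illustrative, not data for any real test, and the function name is an assumption.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Expected confusion-matrix counts and predictive values for a test."""
    cases = population * prevalence
    non_cases = population - cases
    tp = sensitivity * cases             # sick patients correctly flagged
    fn = cases - tp                      # false negatives: sick patients missed
    fp = (1 - specificity) * non_cases   # false positives: healthy patients flagged
    tn = non_cases - fp
    return {
        "TP": tp, "FN": fn, "FP": fp, "TN": tn,
        "PPV": tp / (tp + fp),  # chance that a positive result is a true case
        "NPV": tn / (tn + fn),  # chance that a negative result is truly healthy
    }

# Hypothetical numbers: 10,000 people screened, 1% prevalence, 80% sensitivity.
for spec in (0.95, 0.99):
    r = screening_outcomes(10_000, 0.01, 0.80, spec)
    print(f"specificity {spec:.2f}: {r['FP']:.0f} false positives, PPV {r['PPV']:.2f}")
```

Raising specificity from 0.95 to 0.99 cuts the expected false positives from about 495 to about 99. It also raises the positive predictive value from roughly 0.14 to roughly 0.45, so far fewer healthy people would be sent on to follow-up procedures.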
Efficacy biomarkers in drug discovery are tools for which the test is only one of several indicators that influence decisions. Because there are many subsequent checkpoints, the risk posed by a false indication of efficacy is low, so relatively unproven research biomarkers or even correlated analytes would be acceptable. We should therefore consider biomarker tests in the context in which they will be used, not as abstract concepts to be validated by a rote formula.
The need for new biomarkers is clear, and new technologies promise new types of tests, many of which will change the course of disease management. Developing these new biomarkers, however, requires care and adherence to best practices for diagnostic test development. Approached properly, this work will yield new medicines, substantial improvements in patient care, and better clinical outcomes.