"Without plants, we’d be in serious trouble."
--Preface to "Achievements of the National Plant Genome Initiative and New Horizons in Plant Biology," National Research Council, January 2008.
Two months ago, the draft sequence of the castor bean, Ricinus communis, was published to some fanfare, earning a spot as the cover article of Nature Biotechnology. Of course, the sequence itself had already been shared before publication, with continuous updating and annotation. It is one of 21 publicly available plant genomes—a list that includes grapes, corn, apple, sorghum, cucumber, cassava, rice, cacao, peach, papaya, and soybean. The other nine are: five model organisms—two varieties of Arabidopsis, a grass (Brachypodium), monkey flower, and one legume (Medicago)—as well as moss, spikemoss, poplar, and green algae.
In an exceedingly diverse plant world, the choice of which plants to sequence reveals the two major drivers of whole-genome sequencing in plants: agricultural productivity and scientific curiosity. With sequencing technology becoming faster, cheaper, and more exciting (sequence a billion bases a day; imagine that), whole-genome sequencing is well on its way from cutting-edge to customary. But how does one decide which plant to sequence, and what good will come of it?
Whole Genomes—a Big Deal?
One of the most significant limitations on which plants get sequenced is genome size. In this regard, plant genomes vary widely: from about 450 Mb for rice to 2,500 Mb for maize and a stunning 16,000 Mb for wheat. For reference, the human genome is about 3,000 Mb.
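To put those sizes side by side, here is a minimal back-of-the-envelope sketch. It uses only the approximate genome sizes quoted above and the billion-bases-a-day throughput mentioned earlier; the function names and the `coverage` parameter are illustrative assumptions, and the result ignores assembly and annotation entirely.

```python
# Approximate genome sizes in megabases (Mb), as quoted in the article.
GENOME_SIZE_MB = {
    "rice": 450,
    "maize": 2_500,
    "wheat": 16_000,
    "human": 3_000,
}

# Throughput figure mentioned in the article: a billion bases a day.
BASES_PER_DAY = 1_000_000_000


def relative_to_human(species: str) -> float:
    """Return the genome size of `species` as a multiple of the human genome."""
    return GENOME_SIZE_MB[species] / GENOME_SIZE_MB["human"]


def days_of_raw_sequencing(species: str, coverage: int = 1) -> float:
    """Days of raw output needed to read every base `coverage` times.

    `coverage` is an illustrative assumption, not a figure from the article;
    real projects read each base many times and then still have to assemble.
    """
    total_bases = GENOME_SIZE_MB[species] * 1_000_000 * coverage
    return total_bases / BASES_PER_DAY


if __name__ == "__main__":
    for name in GENOME_SIZE_MB:
        ratio = relative_to_human(name)
        days = days_of_raw_sequencing(name)
        print(f"{name}: {ratio:.2f}x human, {days:.2f} days of raw reads at 1x")
```

Even on these rough numbers, wheat is more than five times the size of the human genome, which goes some way toward explaining why the smaller genomes tend to get done first.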
It's useful to note that whole-genome sequencing is not, by any means, an exhaustive mapping of the organism. It is just one part of a portfolio that includes the epigenome, proteome, metabolome, and a laundry list of other "-omes" that are necessary to fully describe the plant's functioning.
One of the most important complements to genome sequencing is a library of expressed sequence tags (ESTs), sequences of cDNA taken from mRNA transcripts. ESTs serve as an excellent way to locate and copy useful genes, and for this reason EST projects often take precedence over genome sequencing.
Whole-genome sequencing isn't whole, either. The scope of genome sequencing projects usually doesn't include all of the DNA in an organism—some segments are too difficult to sequence because they are exceedingly repetitive, and other segments such as those near centromeres and telomeres are often left out because genes are not expected to be found in either location.
Once the sequencing has been done and the reads aligned, annotation is still necessary to match genetic features to functions, and many draft genomes are published with only half the genes annotated in some useful manner. So why bother? What is the advantage of a full genome over sequencing just the few sections you need, once you figure out that you need them?
Some Assembly Required
The challenges of alignment and annotation bring us to the first answer to that question: the compounding effects of basic research. A draft genome makes the alignment and annotation of other related genomes much easier. It's hard to quantify the value added with so many other factors at play (improving techniques, falling sequencing prices), but the effect is there.
Just as the Human Genome Project made it easier to sequence individuals and find variation among them, so too will these plant genomes assist in the cataloguing of plant diversity. The National Plant Genome Initiative counts among its achievements several caches of genomic information that are helping to identify markers for better fruit, root, and seed development, as well as blight resistance and stalk strength.
This is the advantage of whole-genome sequencing—when you don't know what you're looking for, you can identify differences between individuals and use those as a foundation for further research. Whole-genome sequencing wasn't necessary to copy a single enzyme out of Artemisia annua for use in biochemical production—at least, the genome of Artemisia annua wasn't necessary.
The researchers who did that work instead sequenced cDNA libraries, looking for a sequence similar to the ESTs of the cytochrome P450 family the enzyme was expected to belong to. Whole genomes are great for comparisons and markers but not as helpful when you already know what you're looking for.
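For a feel of what "looking for a similar sequence" means, here is a toy sketch. It is not the method used in the Artemisia work: it assumes made-up sequences and a crude score based on shared k-mers (short overlapping subsequences), whereas real searches rely on proper alignment tools. The names `kmers`, `similarity`, and `screen_library` are hypothetical.

```python
from __future__ import annotations


def kmers(seq: str, k: int = 4) -> set[str]:
    """Return the set of all overlapping substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of the two sequences' k-mer sets (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def screen_library(
    library: dict[str, str],
    query: str,
    threshold: float = 0.3,
    k: int = 4,
) -> list[tuple[str, float]]:
    """Return (clone name, score) pairs scoring at or above threshold, best first.

    The threshold is an arbitrary illustrative cutoff, not a published value.
    """
    scored = [(name, similarity(seq, query, k)) for name, seq in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Invented sequences, for illustration only.
    library = {
        "clone_a": "ATGGCTTTGACCGATCTG",
        "clone_b": "TTTTTTTTTTTTTTTTTT",
    }
    known_family_tag = "ATGGCTTTGACCGATCAG"
    for name, score in screen_library(library, known_family_tag):
        print(f"{name}: {score:.2f}")
```

The point is the shape of the search: you start from sequences you already know and screen a library for close relatives, and none of that requires the organism's whole genome.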
When you've found what you're looking for, how does a whole-genome sequence help you effect change? With respect to genetic engineering, whole genomes have been fairly useful, and some important technologies have come out of them. For example, the analysis of corn chromosomes led to the development of a self-replicating, persistent mini-chromosome that resists recombination by segregating early.
Another notable project is using the soybean sequence to find homologous genes in other members of the genus and recombine them back into the soybean genome for hardier plants. An entire segment of the Plant Genome Research Program's 2009 funding is dedicated to heterosis studies, where genomic data is used to figure out why heterozygous hybrids are stronger than their purebred counterparts.
The product of a growing knowledge base and new engineering applications may very well be the next Green Revolution, but only if social and regulatory factors permit. And that is not a given: consider the "Major Accomplishments" section of the National Plant Genome Initiative's five-year plan. In the "Discovery" section, you'll find all sorts of new genomic data and data analysis tools, but flip to "Translation" and you'll find a slightly different story: the only two entries both concern marker-assisted selection.
One mentions the identification of blight markers, while the other describes the creation of new germlines from crosses selected with marker assistance. What you don't see is cross-species gene recombination or new recombination techniques.
A small sidebar trumpets the development of "submergence tolerance rice," expected to greatly increase crop yields in Bangladesh and rescue many of the world's poor from malnutrition and worse. Dig a little deeper and you'll find that this rice was produced with precision breeding rather than genetic modification to avoid regulatory testing and public disapproval.
For all the advances in gene delivery and plant modification, their application is being impeded by societal apprehension and regulatory hurdles. Indeed, in a recent article, researchers at Oregon State argued that onerous paperwork, excessive containment requirements, and legal liability were effectively strangling biofuel and agricultural GMO R&D.
It's a shame that these technologies have become the latest example of "the public giveth and the public taketh away." With genomes costing millions of dollars apiece ($30 million for the main corn genome project, another $2–5 million for the mini-chromosome and structure), a better understanding of the risks and rewards of GMO technology is essential if the public is to get a full return on its investment.
And, for the record, no one is cracking the pistachio genome ... yet.