Unraveling the threads: Simplest cotton genome offers clues for fiber improvements

From the stockings decorating mantels to the new outfits in display windows calling to shoppers, cotton is woven into the fabric of the holiday season. For bioenergy researchers, however, fiber composition matters more than color and texture: each cotton strand is composed of more than two dozen coils of cellulose, a target biomass for next-generation biofuels.

In the December 20, 2012 edition of Nature, an international consortium of researchers from 31 institutions, including a team from the U.S. Department of Energy Joint Genome Institute (DOE JGI), presents a high-quality draft assembly of the simplest cotton (Gossypium raimondii) genome. Additionally, the team compared the genome from this ancestral species indigenous to the Americas to several other sets of cotton data contributed by the U.S. Department of Agriculture (USDA). The results have allowed the researchers to trace the evolution of cotton over millions of years from wild varieties to the domesticated species that are now associated with textile production.

Growing, processing and manufacturing cotton is a major global industry. In the United States, more than 200,000 domestic jobs are related to cotton production and processing, with an aggregate influence of about $35 billion on the annual U.S. gross domestic product. The cotton fiber grown is valued at about $6 billion per year, with cottonseed oil and meal byproducts worth nearly another $1 billion. U.S. textile mills convert much of the cotton processed annually into apparel.

The cotton plants seen growing in typical fields in the U.S. are polyploids, hybrids of two types of cotton (cotton A and cotton D) with multiple copies of their genomes or chromosome sets. As one of the closest extant relatives of tetraploid cotton (which contains four sets of chromosomes), the diploid G. raimondii, a D-genome species, was selected for sequencing in part because it has a smaller genome and fewer repetitive elements than A-genome cotton and is much less complex than the polyploid cotton. D-genome cotton does not produce spinnable fibers, unlike A-genome cotton. Having data from multiple genomes for reference, such as those provided by USDA for this study, enabled the team to trace cotton's lineage, the evolution of hybrids and the gene duplications that allowed fiber development.

"This cotton data will help accelerate the study of gene function, particularly cellulose biosynthesis, the understanding of which is fundamental to improved biofuels production," said Jeremy Schmutz, head of the DOE JGI Plant Program and a faculty investigator at the HudsonAlpha Institute for Biotechnology, who led the effort to sequence and assemble the genome. "In addition, the unique structure of the cotton fiber makes it useful in bioremediation, and accelerated cotton crop improvement also promises to improve water efficiency and reduce pesticide use."

The DOE JGI's contribution of sequencing and assembling the 760-million-base-pair genome stems from a Community Sequencing Program proposal led by University of Georgia professor Andrew Paterson. "This study represents the first time that a polyploid plant was compared to its progenitors over the entire genome," he said. "This study reveals evolutionary processes salient to all plants and provides a strategy to better understand the genome of many other crops, such as canola, wheat, and peanut."

Learning more about the genetic contributions of the D- and A-genomes to the common cotton species can help researchers improve fiber traits. One anecdote shared by Jay Keasling, Associate Laboratory Director for Biosciences at Lawrence Berkeley National Laboratory and CEO of the Joint BioEnergy Institute, is a reminder that the cellulose chains comprising his cotton shirt withstand repeated laundering. The story helps people understand the challenges involved in breaking down and cost-effectively converting cellulosic biomass such as plant matter into biofuels.

Don Jones, Director of Agricultural Research at Cotton Incorporated, said this gold-standard G. raimondii genome will be the foundation for sequencing upland cotton, G. hirsutum, which makes up most of the worldwide field crop. Another species, G. barbadense, produces Pima cotton but accounts for less than two percent of the cotton crop. "This sequence is a cornerstone that will help advance our knowledge so we more thoroughly understand the biology that leads to enhanced yield, improved fiber quality, and better stress tolerance, all improvements that will benefit growers in the not-too-distant future."

Schmutz said this is a good example of a project in which the DOE's genomic contributions were matched by resources from the research community. These include genetic maps, RNA sequencing, additional genomic sequence and detailed genomic analysis, allowing a more detailed and meaningful interpretation of the results. Aside from the DOE JGI, the University of Georgia, the USDA, and Cotton Incorporated, other public and private agencies that participated in this project include Iowa State University, Mississippi State University, the Consortium for Plant Biotechnology Research and the U.S. National Science Foundation.

Source: DOE/Joint Genome Institute