Whole Genome Sequencing: Is It Ready for Prime Time?

Rapid sequencing can create primary care opportunities for pathologists

CEO Summary: Pathologists at Beth Israel Deaconess Medical Center in Boston, Massachusetts, in a collaboration with GenomeQuest, Inc., will produce whole human genome sequences of patient tumors and other specimens. These whole genome sequences will be studied to learn what diagnostic, therapeutic, and prognostic information they contain. GenomeQuest CEO Richard Resnick discusses what is required to sequence a tumor specimen and how the resulting data will be used.

PATHOLOGY AND LABORATORY MEDICINE are poised to make a full entry into genetic medicine. That’s because the cost of producing a whole human genome sequence is falling at a startling rate, even as accuracy improves.

This is a trend that has disruptive potential for both clinical laboratories and anatomic pathology groups. Once it becomes possible for clinical labs to cheaply and accurately sequence, and evaluate, hundreds and thousands of genes for a single patient, the resulting diagnostic, prognostic, and therapeutic knowledge will be immense.

One landmark event in the effort to bring rapid genome sequencing into clinical diagnostics is the collaboration announced last month involving the Department of Pathology at Beth Israel Deaconess Medical Center (BIDMC) in Boston and GenomeQuest, Inc., of Westborough, Massachusetts.

In exclusive interviews with THE DARK REPORT, Jeffrey Saffitz, M.D., Ph.D., Chairman of the Department of Pathology, and Mark Boguski, M.D., Ph.D., Associate Professor of Pathology at BIDMC, discussed the goals of this partnership. They introduced the concept of the “primary-care pathologist.”

GenomeQuest will sequence the genes of tumors and other patient specimens provided by BIDMC. GenomeQuest will then warehouse and manage the resulting data produced by whole genome sequencing. Pathologists and informaticists at BIDMC will analyze the whole human genome data to identify useful diagnostic markers and clinical information. (See TDR, November 15, 2010.)

Having explained how pathologists at Beth Israel Deaconess Medical Center will use whole human genome data, THE DARK REPORT now turns to GenomeQuest to provide lab administrators and pathologists with a more detailed understanding of advances in the field of rapid genome sequencing. Richard Resnick, CEO, spoke on behalf of GenomeQuest.

“Our company has been involved with the latest generation of sequencing machines since the earliest days of this technology,” stated Resnick. “Until recently, customers of this technology were largely pharma and academia. However, we’ve always suspected that the end game in this field would ultimately be pathology.

“When it comes to clinical applications of whole human genome sequencing, we think pathology will lead the charge and take us forward,” observed Resnick. “Pathologists at BIDMC recognize this opportunity.

“In order to produce clinically-actionable knowledge from a genetic sample, four steps must happen,” noted Resnick. “Three of these steps involve processing the specimen and collecting the data. The fourth step is where the pathologist evaluates this data and identifies information that is useful to the patient.

Analysis Requires Four Steps

“In our collaboration with the pathology department at BIDMC, GenomeQuest will perform the first three steps,” he explained. “Pathologists at BIDMC will then handle the fourth step.

“In the first step, the specimen is sequenced and mapped,” continued Resnick. “Sequencing technology produces strings of DNA sequences. Think of each as a little puzzle piece. We then assemble the puzzle pieces back together using the Human Genome Project as a reference dataset. The assembly of these DNA sequences is called ‘mapping.’
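The “puzzle piece” picture can be made concrete with a minimal Python sketch. It is an illustration only, not GenomeQuest's method: the reference string, the reads, and the function name `map_reads` are invented here, and the sketch places reads by exact string match, whereas production mapping software uses indexed alignment that tolerates mismatches.

```python
# Toy illustration of "mapping": place short reads on a reference sequence.
# The reference and reads are made up; real mappers tolerate mismatches.

def map_reads(reference, reads):
    """Return {read: [0-based start positions]} for exact matches."""
    placements = {}
    for read in reads:
        positions = []
        start = reference.find(read)
        while start != -1:
            positions.append(start)
            start = reference.find(read, start + 1)
        placements[read] = positions
    return placements


reference = "ACGTTAGCCGATTACAGGCT"
reads = ["TAGCC", "GATTACA", "TTTTT"]
placements = map_reads(reference, reads)
print(placements)
```

A read that matches nowhere, such as the last one, comes back with an empty list of positions.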

“In the second step, we compare the individual’s genome sequence to the canonical human genome,” stated Resnick. “This step allows us to identify regions where there is variation from the canonical human genome, if you will.

“During this step, a variety of algorithms are used to determine all the regions where the sample varies or differs from the canonical human genome. ‘Variant calling’ is the term used to describe this process,” he said.
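As a rough illustration of the idea behind variant calling, the Python sketch below compares an already-assembled sample sequence with a reference, position by position. The sequences and the function name `call_variants` are hypothetical. Real variant callers work from many overlapping reads and statistical models, and they also handle insertions and deletions, which this toy version does not.

```python
# Toy illustration of "variant calling": report positions where a sample
# differs from the reference. Only single-base substitutions are handled.

def call_variants(reference, sample):
    """List (position, ref_base, sample_base) where equal-length sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("this toy sketch only handles equal-length sequences")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


reference = "ACGTTAGCCGATTACAGGCT"
sample = "ACGTTAGCTGATTACAGGAT"
variants = call_variants(reference, sample)
print(variants)
```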

“The third step is called ‘variant annotation.’ We annotate each location on the specimen’s genome where there is a variation,” stated Resnick. “The annotation carries with it an explanation.

Annotating Gene Sequence

“For example, we might say, ‘this particular variation is inside of this gene, and if this actually were to happen, it would truncate the protein that is encoded by this gene. In turn, that would have the following effect downstream on these biological pathways’,” he explained.

“Our annotation goes further,” noted Resnick. “As a component of the annotation step, we identify whether each variant has already been identified by earlier research. Our annotation will include references to the papers which have been published about that particular genetic sequence to explain what the medical community already knows about that genetic variation. We then gather the full annotation for this genome into a report, which is an interactive database on this organism’s genome.
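The annotation step described above can be sketched in a few lines of Python. Everything in this example is assumed for illustration: the gene names, the coordinate ranges, the table of previously reported variants, and the function name `annotate`. It shows only the shape of the output, a record per variant that names the gene it falls in and notes any earlier report.

```python
# Toy illustration of "variant annotation". Gene names, coordinates, and the
# lookup of previously reported variants are all hypothetical.

GENES = {"GENE_A": (5, 12), "GENE_B": (15, 20)}  # half-open ranges [start, end)
KNOWN_VARIANTS = {(8, "C", "T"): "hypothetical prior publication"}


def annotate(variants):
    """Attach a gene name and any known-variant note to each variant."""
    annotated = []
    for pos, ref, alt in variants:
        gene = next(
            (name for name, (start, end) in GENES.items() if start <= pos < end),
            None,
        )
        annotated.append({
            "pos": pos,
            "ref": ref,
            "alt": alt,
            "gene": gene,
            "reference_note": KNOWN_VARIANTS.get((pos, ref, alt)),
        })
    return annotated


records = annotate([(8, "C", "T"), (18, "C", "A")])
for record in records:
    print(record)
```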

Searching for Variations

“That resulting database of the specimen is where the real magic begins to happen,” said Resnick. “Researchers can now query this database. They can ask questions like ‘Show me all of the variations that are on chromosome 2 inside of genes or on the 500 base pairs on either side of the target genes that affect the protein and are variations that have not been previously identified in the public domain.’
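A query of the kind Resnick describes might look like the following sketch, which uses Python's built-in `sqlite3` module and an in-memory table. The table layout, column names, and rows are assumptions made for this example and do not describe GenomeQuest's actual database.

```python
import sqlite3

# Hypothetical schema: distance_to_gene is 0 when the variant lies inside a
# gene, otherwise the number of base pairs to the nearest target gene.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants ("
    "chrom TEXT, pos INTEGER, gene TEXT, distance_to_gene INTEGER, "
    "affects_protein INTEGER, previously_reported INTEGER)"
)
rows = [
    ("2", 1000, "GENE_A", 0, 1, 0),    # inside a gene, novel
    ("2", 2400, "GENE_B", 350, 1, 0),  # within the 500 bp flank, novel
    ("2", 5000, "GENE_C", 0, 1, 1),    # already reported
    ("2", 9000, "GENE_D", 800, 1, 0),  # too far from the gene
    ("7", 1200, "GENE_E", 0, 1, 0),    # wrong chromosome
]
conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)", rows)

# Chromosome 2, in a gene or within 500 bp of one, protein-affecting,
# and not previously reported.
novel = conn.execute(
    "SELECT pos, gene FROM variants "
    "WHERE chrom = '2' AND distance_to_gene <= 500 "
    "AND affects_protein = 1 AND previously_reported = 0 "
    "ORDER BY pos"
).fetchall()
print(novel)
```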

“The ability to investigate the specimen’s genome using these types of queries is rapidly advancing our knowledge of the human genome,” he declared. “Our existing customers—pharma and academic researchers—thrive because of this feature. It allows them to conduct basic research in the biology of disease and better build new drugs.

“This brings us to the fourth step,” observed Resnick. “The fourth step asks the question ‘which of these variants are clinically actionable, based on the patient’s presentation?’

“Here is where the pathologist will have a key role,” he noted. “Our collaboration with the pathologists at Beth Israel Deaconess Medical Center is aimed squarely at providing pathologists with the annotated database of the whole genomes of patients.

“Working together, our goal is to identify which key information sets must be developed to support accurate diagnosis,” said Resnick. “We want to identify and validate the data that are clinically relevant, and that form the basis of our collaboration.

Clinical Applications

“GenomeQuest and Beth Israel Deaconess plan to jointly share computational capability and analytical capability for the purpose of advancing the clinical methodology,” he commented. “Eventually we want to provide and generate diagnostic reports which are usable by a pathologist.

“What is exciting about this work is its potential to expand the value that the pathology profession contributes to clinical care,” stated Resnick. “Pathologists should want to ‘own’ the interpretation of genomic data for an important reason. This data will not only have diagnostic value, but it will also have prognostic value.

“This is a key insight,” he noted. “Once you sequence the genome of a healthy individual, or of a patient who presents with some kind of a disease, that [whole human genome] data is permanently available.

“That means any future care for the patient may be simply a query on that individual’s genome data set,” Resnick observed. “Whole human genome sequencing is disruptive because of this potential. It gives the pathologist the responsibility to assess this data and guide the patient’s care team.”

Clients and regular readers of THE DARK REPORT know about the race to be first to achieve the goal of the $1,000 whole human genome sequence. Resnick had useful insights about the pace of improvements to rapid gene sequencing technologies.

“In recent years, the capital invested in whole-genome sequencing and analysis is nothing short of astounding,” said Resnick. “It has played an essential role in driving down the cost of whole-genome sequencing and analysis to the point where we can sequence and analyze a whole human genome for about the same price as maybe five or 10 genetic tests.

“Depending on the technology and the specimen, the cost now ranges between $9,000 and $20,000,” he stated. “For comparison, recall that, just 10 years ago, the cost to do a single human genome approached $1 billion. That’s what was spent on the Human Genome Project. Today we can sequence 100 billion base pairs in a week.”

“This cheaper, faster, and more accurate sequencing technology now allows us to scale up and produce full sequences of patient specimens,” noted Resnick. “That opens the door for pathologists to step up and begin developing clinical applications using this technology.”

According to Resnick, massive throughput in whole human genome sequencing is around the corner. “Each improvement in rapid sequencing technology adds orders of magnitude of efficiency,” he explained. “It takes only 12 to 24 months for a new generation of sequencing technology to reach the market.

“I believe pathologists will be one of the medical specialties where this new technology enables a whole new series of applications that were previously unavailable to us,” he continued.

“The informatics support of a whole human genome sequence now makes it possible for pathologists to understand what’s different between this individual and some canonical representation of the human genome,” noted Resnick. “Similarly, they can use this data to distinguish the differences between cancer tumors,” he said.

Advanced Genetics

“There are already examples of advanced genetics in hospitals across the country,” continued Resnick. “TGen in Phoenix is one example. These sites are doing whole human genome sequencing to treat advanced forms of cancer, simply by categorizing the cancer against what is already known.

“Now, if you overlay that clinical application with the industry’s current overall capacity to sequence, by 2011, we might be able to sequence something like 50,000 of these types of cases in the course of the year,” speculated Resnick.

“But that is a conservative prediction,” he added. “That number is 10 times more than our industry could have done in 2010 and it is predicted that the sequencing industry will add another 10 times more sequencing capacity by 2012, making it possible to sequence 500,000 individuals per year!”

Public Genome Data Sets

Resnick observes that plenty more needs to be done before pathologists will be able to use whole human genome sequences for diagnostic and therapeutic purposes. “Currently, in the public domain, there are a growing number of genome data sets,” he noted. “Many of these genome data sets were financed by the National Institutes of Health (NIH).

“Other data sets are for commercial use and—for a particular genetic variation—describe the potential implications of particular variations,” said Resnick. “These data sets may also have information about the potential clinical actions a physician might consider when a patient presents with those genetic variations.

“The challenge is that, at the moment, these databases exist all over the world,” he continued. “They are not homogenized, and exist in many different formats. Thus, it will be important for the scientific community to establish standards for these types of data repositories.”

Meanwhile, the collaboration involving GenomeQuest and pathologists at Beth Israel Deaconess Medical Center is already moving forward. GenomeQuest will be sequencing the patient specimens provided to it by BIDMC. It will then annotate these whole human genome sequences and provide data storage and query services to the BIDMC pathologists.

Knowledge about Disease

For their part, pathologists at BIDMC will be interpreting this data and looking for ways that it can be used to support patient care. The initial research emphasis will be on certain types of cancer. However, that is likely to broaden as pathologists better understand how individual genetic variations play a role in other diseases and health conditions.

THE DARK REPORT is first in the laboratory testing industry to provide pathologists and laboratory administrators with an inside understanding of this unique collaboration between the pathology department at Beth Israel Deaconess Medical Center and GenomeQuest. It can be expected that the research conducted by these two parties will confirm that pathology analysis of whole human genome sequences will generate useful clinical information.

Further, because the pace of technology enhancements in this field is so rapid, it may not take long for the knowledge developed by BIDMC and GenomeQuest to find its way into clinical practice.

Large Volume of Raw Data Produced by Whole Human Genome Sequencing

HOW BIG IS THE INFORMATION PRODUCED by a whole human genome sequence? How much storage is required to hold the entire genome? Can it be put on a single hard drive? Richard Resnick, CEO at GenomeQuest, Inc., in Westborough, Massachusetts, outlined what is required to store databases full of genomic information.

“A whole human genome is about 3 billion base pairs,” noted Resnick. “Each base pair is about a byte of information, so approximately three gigabytes of data must be stored. However, that is the finished whole human genome sequence.

More than One Copy

“In the first phases of sequencing, as the machines produce strings of genetic sequences, for each position, more than one copy of the same base pair will be produced,” he noted. “Thus, the raw data produced may be 30 to 40 times the data in the finished whole human genome sequence. That is why, during the sequencing step, as much as 100 gigabytes of raw sequence data is produced per whole human genome.
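The figures Resnick quotes fit together with simple arithmetic, sketched below in Python. The one-byte-per-base-pair figure and the 30 to 40 times multiplier come from the interview. The use of decimal gigabytes (one billion bytes) is an assumption of this sketch.

```python
# Back-of-the-envelope check of the storage figures quoted in the interview.
GENOME_BASE_PAIRS = 3_000_000_000  # "about 3 billion base pairs"
BYTES_PER_BASE_PAIR = 1            # "about a byte of information"

finished_gb = GENOME_BASE_PAIRS * BYTES_PER_BASE_PAIR / 1e9  # decimal gigabytes

# Raw data may be 30 to 40 times the size of the finished sequence.
raw_gb = {multiplier: finished_gb * multiplier for multiplier in (30, 40)}

print(f"finished sequence: {finished_gb:.0f} GB")
for multiplier, size in raw_gb.items():
    print(f"raw data at {multiplier}x: {size:.0f} GB")
```

The result, roughly 90 to 120 gigabytes, is in line with the “as much as 100 gigabytes” of raw data cited per genome.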

“Next comes the analysis of the raw data,” continued Resnick. “Sequence strings are mapped, variants are identified, and annotation is performed. Only at this point in the entire process do you end up with a reasonably small data set.

“That dataset for a whole human genome sequence actually may turn out to be far smaller than the expected three gigabytes, for an important reason,” he stated. “It is not necessary to store every base pair in an individual human genome. It is only necessary to store the base pairs that contain the differences and variations.
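The point about storing only the differences can be illustrated with a short Python sketch. The sequences and the function name `apply_variants` are invented for this example, and only single-base substitutions are handled. Given the shared reference and a short list of variants, the individual's sequence can be rebuilt on demand.

```python
# Toy illustration: keep only the differences from a shared reference and
# rebuild the individual's sequence when needed. Substitutions only.

def apply_variants(reference, variants):
    """Rebuild a sample sequence from the reference plus (pos, ref, alt) records."""
    bases = list(reference)
    for pos, ref_base, alt_base in variants:
        if bases[pos] != ref_base:
            raise ValueError(f"reference mismatch at position {pos}")
        bases[pos] = alt_base
    return "".join(bases)


reference = "ACGTTAGCCGATTACAGGCT"
stored_variants = [(8, "C", "T"), (18, "C", "A")]  # all that must be stored
rebuilt = apply_variants(reference, stored_variants)
print(rebuilt)
```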

“This means that, early in the sequencing process, the informatics needs are immense in terms of storage and computing resources,” commented Resnick. “As the raw sequencing data is processed, the individual whole human genome sequence ends up being a much smaller amount of data that is easier to manage and easier to query.

“This is why the informatics of whole human genome sequencing are immense,” said Resnick. “Next year, the industry will sequence about 50,000 individuals. The following year, that number may explode to 500,000 individuals. Very quickly, this becomes a petabyte [one quadrillion bytes, or 1,000 terabytes] problem. Obviously, comparing 100 billion of anything to a reference will be an expensive informatics challenge.
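A quick calculation shows why the volumes reach petabytes. The sketch below, in Python, multiplies the genome counts Resnick projects by the upper figure of 100 gigabytes of raw data per genome quoted earlier in this sidebar. Treating every genome as producing the full 100 gigabytes, and using decimal units, are simplifying assumptions.

```python
# Rough scale of raw sequence data, using figures quoted in the interview.
RAW_GB_PER_GENOME = 100   # "as much as 100 gigabytes" (treated as typical here)
GB_PER_PETABYTE = 1_000_000  # decimal units: 1 PB = 1,000 TB = 1,000,000 GB

projected_genomes = {"next year": 50_000, "following year": 500_000}
raw_petabytes = {
    label: count * RAW_GB_PER_GENOME / GB_PER_PETABYTE
    for label, count in projected_genomes.items()
}

for label, petabytes in raw_petabytes.items():
    print(f"{label}: about {petabytes:.0f} PB of raw data")
```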

“GenomeQuest currently stores this data on internal servers,” noted Resnick. “We consider storage of the whole human genome sequence data to be an added-value service. Our experience to date is that our academic and pharma customers want the benefit of a secure infrastructure where the data is stored, regularly backed up, and always available. GenomeQuest provides that service.”

 

Whole Human Genome Sequencing Costs Falling

“GENOME SEQUENCING COSTS are falling at an incredible pace,” stated Richard Resnick, CEO at GenomeQuest, Inc., based in Westborough, Massachusetts.

“One year ago, a $600,000 sequencing machine would require between two and four weeks to cover an entire human genome at a sufficient depth of coverage,” he explained. “Now, just 12 months later, spend the same $600,000 on a current-generation sequencing machine and it will take only half a week to process the same volume of genome sequences. It is expected that sequencing technology will continue advancing at this accelerated pace.

“The economics of whole human genome sequencing are thus changing favorably,” added Resnick. “Currently, considering the fully-depreciated cost of the instrument, reagents, and labor, it is now possible to do the entire sequence for between $9,000 and $20,000 at most.

“Expectations are that the cost of whole human genome sequencing, once it falls to $1,000, will continue dropping to as low as several hundred dollars,” predicted Resnick. “My expectation is that larger laboratories like Quest Diagnostics Incorporated and Laboratory Corporation of America will then acquire this technology and, because of their economies of scale, they may then find it possible to sequence the entire human genome for a cost that is much less than the cost of a single genetic test today.”

 
