TY - JOUR
T1 - Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes
AU - Miyamoto, Mari
AU - Motooka, Daisuke
AU - Gotoh, Kazuyoshi
AU - Imai, Takamasa
AU - Yoshitake, Kazutoshi
AU - Goto, Naohisa
AU - Iida, Tetsuya
AU - Yasunaga, Teruo
AU - Horii, Toshihiro
AU - Arakawa, Kazuharu
AU - Kasahara, Masahiro
AU - Nakamura, Shota
N1 - Funding Information:
The data set used in this study was originally used for the de novo assembly competition “Master of De Novo” in the third annual conference of the NGS-Field community in Japan [http://www.ngs-field.org/top-page/service/meeting3/]. We thank the Master of De Novo contestants and all 699 participants of the NGS-Field for supporting our project and allowing us to use the data. MK is supported in part by a Grant-in-Aid for Scientific Research on Innovative Areas (Genome Science). The supercomputing resource was provided in part by the Human Genome Center, Institute of Medical Science, University of Tokyo, and Super Computer Facilities of the National Institute of Genetics. DM is supported in part by JSPS KAKENHI Grant Number 24890103. SN is supported in part by the program of the Japan Initiative for Global Research Network on Infectious Diseases.
Publisher Copyright:
© 2014 Miyamoto et al.
PY - 2014/8/21
Y1 - 2014/8/21
N2 - Background: The availability of diverse second- and third-generation sequencing technologies enables the rapid determination of the sequences of bacterial genomes. However, identifying the sequencing technology most suitable for producing a finished genome with multiple chromosomes remains a challenge. We evaluated the abilities of the following three second-generation sequencers: Roche 454 GS Junior (GS Jr), Life Technologies Ion PGM (Ion PGM), and Illumina MiSeq (MiSeq) and a third-generation sequencer, the Pacific Biosciences RS sequencer (PacBio), by sequencing and assembling the genome of Vibrio parahaemolyticus, which consists of a 5-Mb genome comprising two circular chromosomes. Results: We sequenced the genome of V. parahaemolyticus with GS Jr, Ion PGM, MiSeq, and PacBio and performed de novo assembly with several genome assemblers. Although GS Jr generated the longest mean read length of 418 bp among the second-generation sequencers, the maximum contig length of the best assembly from GS Jr was 165 kbp, and the number of contigs was 309. Single runs of Ion PGM and MiSeq produced data of considerably greater sequencing coverage, 279× and 1,927×, respectively. The optimized result for Ion PGM contained 61 contigs assembled from reads of 77× coverage, and the longest contig was 895 kbp in size. Those for MiSeq were 34 contigs, 58× coverage, and 733 kbp, respectively. These results suggest that higher coverage depth is unnecessary for a better assembly result. We observed that multiple rRNA coding regions were fragmented in the assemblies from the second-generation sequencers, whereas PacBio generated two exceptionally long contigs of 3,288,561 and 1,875,537 bp, each of which was from a single chromosome, with 73× coverage and mean read length 3,119 bp, allowing us to determine the absolute positions of all rRNA operons. Conclusions: PacBio outperformed the other sequencers in terms of the length of contigs and reconstructed the greatest portion of the genome, achieving a genome assembly of "finished grade" because of its long reads. It showed the potential to assemble more complex genomes with multiple chromosomes containing more repetitive sequences.
KW - Illumina MiSeq
KW - Ion Torrent PGM
KW - Next-generation sequencing
KW - PacBio RS system
KW - Roche 454 GS Junior
KW - de novo assembly
UR - http://www.scopus.com/inward/record.url?scp=84906823754&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906823754&partnerID=8YFLogxK
U2 - 10.1186/1471-2164-15-699
DO - 10.1186/1471-2164-15-699
M3 - Article
C2 - 25142801
AN - SCOPUS:84906823754
SN - 1471-2164
VL - 15
JO - BMC Genomics
JF - BMC Genomics
IS - 1
M1 - 699
ER -