"The sequence of the human
genome," by J. Craig Venter and
284 others, Science, 291(5507):1304-51, 16 February 2001.
[Authors' affiliations: 14 institutions
worldwide]
Abstract: "A 2.91-billion base
pair (bp) consensus sequence of the euchromatic portion of the human genome
was generated by the whole-genome shotgun sequencing method. The 14.8-billion
bp DNA sequence was generated over 9 months from 27,271,853 high-quality
sequence reads (5.11-fold coverage of the genome) from both ends of plasmid
clones made from the DNA of five individuals. Two assembly strategies--a
whole-genome assembly and a regional chromosome assembly--were used, each
combining sequence data from Celera and the publicly funded genome effort. The
public data were shredded into 550-bp segments to create a 2.9-fold coverage
of those genome regions that had been sequenced, without including biases
inherent in the cloning and assembly procedure used by the publicly funded
group. This brought the effective coverage in the assemblies to eightfold,
reducing the number and size of gaps in the final assembly over what would be
obtained with 5.11-fold coverage. The two assembly strategies yielded very
similar results that largely agree with independent mapping data. The
assemblies effectively cover the euchromatic regions of the human chromosomes.
More than 90% of the genome is in scaffold assemblies of 100,000 bp or more,
and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of
the genome sequence revealed 26,588 protein-encoding transcripts for which
there was strong corroborating evidence and an additional ~12,000
computationally derived genes with mouse matches or other weak supporting
evidence. Although gene-dense clusters are obvious, almost half the genes are
dispersed in low G+C sequence separated by large tracts of apparently
noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24%
is in introns, with 75% of the genome being intergenic DNA. Duplications of
segmental blocks, ranging in size up to chromosomal lengths, are abundant
throughout the genome and reveal a complex evolutionary history. Comparative
genomic analysis indicates vertebrate expansions of genes associated with
neuronal function, with tissue-specific developmental regulation, and with the
hemostasis and immune systems. DNA sequence comparisons between the consensus
sequence and publicly funded genome data provided locations of 2.1 million
single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes
differed at a rate of 1 bp per 1250 on average, but there was marked
heterogeneity in the level of polymorphism across the genome. Less than 1% of
all SNPs resulted in variation in proteins, but the task of determining which
SNPs have functional consequences remains an open challenge."
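The coverage and polymorphism figures in the abstract follow from simple ratios. The short Python sketch below recomputes them from the numbers quoted above. It assumes the 2.91-billion bp consensus as the denominator (the paper's own 5.11-fold figure may rest on a slightly different genome-size estimate), and it simply adds fold coverages, which is only an approximation because the shredded public data covered just the regions already sequenced. The function names are illustrative, not from the paper.

```python
def fold_coverage(total_bases: float, target_size: float) -> float:
    """Fold coverage: total sequenced bases divided by the target length."""
    return total_bases / target_size


def expected_differences(length_bp: float, bp_per_difference: float) -> float:
    """Expected number of differing sites at a rate of 1 per bp_per_difference."""
    return length_bp / bp_per_difference


# Celera read data (14.8 billion bp) over the 2.91-billion bp consensus:
# roughly 5-fold, close to the 5.11-fold quoted in the abstract.
celera = fold_coverage(14.8e9, 2.91e9)

# Assumption: independent data sets' fold coverages add, so the 2.9-fold
# shredded public data brings the 5.11-fold Celera data to about eightfold.
effective = 5.11 + 2.9

# At 1 difference per 1,250 bp, two haploid genomes differ at ~2.3 million sites.
diffs = expected_differences(2.91e9, 1250)

print(f"Celera reads: ~{celera:.2f}-fold")
print(f"Effective:    ~{effective:.2f}-fold")
print(f"Differences between two haploid genomes: ~{diffs:,.0f}")
```

The last figure is the same order of magnitude as the 2.1 million SNPs the authors located, though the two numbers measure different things (pairwise differences versus catalogued polymorphic sites).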
This early-2001 report from Science
was cited 104 times in current journal articles indexed in the
ISI database during September-October 2001. The paper represents the
culmination of the private-sector effort, led by first author J. Craig Venter
of Celera Genomics, to sequence the human genome. (The publicly funded effort,
led by Francis Collins, published its data at the same time in Nature.)
Based on its latest two-month total, this paper--less than a year after
appearing in print--is currently the most-cited biology paper published in the
last two years. In fact, it was one of only three papers (the other two being
reviews) to collect more than 100 citations during the September-October
tally. Before the most recent bimonthly count, citations to the paper had
accrued as follows:
July-August 2001: 58 citations
May-June 2001: 45
March-April 2001: 16
Total citations to date (including September-October 2001): 223
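The cumulative total is just the running sum of the bimonthly counts. As a quick check, this minimal Python sketch adds the four reported periods (the 104 September-October citations plus the three earlier tallies) and confirms they reach the stated 223.

```python
from itertools import accumulate

# Bimonthly citation counts reported for the paper, in chronological order.
bimonthly = {
    "March-April 2001": 16,
    "May-June 2001": 45,
    "July-August 2001": 58,
    "September-October 2001": 104,
}

# Running totals after each bimonthly period.
running = list(accumulate(bimonthly.values()))
total = sum(bimonthly.values())

for period, cumulative in zip(bimonthly, running):
    print(f"{period}: {cumulative} cumulative")
print(f"Total: {total}")
```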
SOURCE: Hot
Papers Database (Available from the ISI
Research Services Group in a CD-ROM version containing data on
hundreds of highly cited papers published during the last two years.
User interface permits searching by author, organization, journal,
field, and more. Total citations, as well as citations accrued during
successive bimonthly periods, can be assessed and graphed. The database is
bundled with a subscription to the ISI newsletter Science
Watch®; updated discs containing the
most recent bimonthly data are mailed with each new issue, six times a
year.)