Would you sum up the paper, "The Protein Data Bank," for our
readers?
The paper describes the goals of the Protein Data Bank (PDB)
and the systems in place in 2000 for data deposition,
processing, distribution, and query.
The PDB archive of biological macromolecules has been in
existence since 1971. In 1998, the management was moved from
Brookhaven National Laboratory to the Research Collaboratory for
Structural Bioinformatics (RCSB). This is the first paper that
discusses the work of the RCSB PDB.
At that time, we had created new systems for managing all
aspects of the data pipeline. New tools were available for users
to validate and deposit their structures. The RCSB PDB website
supported a database that could be used to search and report on
the archive. This paper also describes the beginnings of our
"data uniformity" project that eventually developed into an
international collaboration to make the entire archive uniform.
The Worldwide PDB (wwPDB) just released the remediated archive
this summer!
The RCSB PDB is currently managed by two members of the RCSB:
Rutgers, The State University of New Jersey and the University
of California, San Diego. It is supported by funds from the
National Science Foundation, the National Institute of General
Medical Sciences, the Office of Science, Department of Energy,
the National Library of Medicine, the National Cancer Institute,
the National Center for Research Resources, the National
Institute of Biomedical Imaging and Bioengineering, the National
Institute of Neurological Disorders and Stroke, and the National
Institute of Diabetes and Digestive and Kidney Diseases.
What are some of the features/advantages of the database?
Using the RCSB PDB systems, it is possible to access and
download the coordinate files, query the database about specific
features of the data, create reports about groups of structures,
access summaries of individual structures, visualize the
structures using different tools, browse the database for
information that has been integrated from other data resources,
and access educational materials for learning about biological
molecules. The Molecule of the Month feature in
particular is very useful for teachers and students.
Who benefits from using the PDB?
The PDB is used by many people: structural biologists who
contribute and access structural data; biologists who need a
molecular explanation for their research findings; computational
biologists who are developing methods for relating sequence,
structure, and function; pharmaceutical companies that use the
data for drug discovery; and teachers and students from K-12
through the undergraduate and graduate levels.
How was the PDB received by the community? Do you feel this is
reflected in the number of citations the paper has been receiving?
The PDB is a very heavily used resource. Depositions have
increased from a few per month to more than 7,000 per year.
Usage of the FTP site is very heavy: more than 6,235,000 files
were downloaded in June 2007 alone. When people deposit data or
when they use the resources of the RCSB PDB, they cite the 2000
paper.
Has the PDB been further developed since the 2000 Nucleic Acids
Research paper?
The PDB archive has been under continuous development. One
striking change is the globalization of data deposition and
distribution services. In contrast to the earlier days of the
PDB, data are now deposited at three data centers: RCSB PDB,
Macromolecular Structure Database at the EMBL's European
Bioinformatics Institute (MSD-EBI), and Protein Data Bank Japan
(PDBj). In 2003, these centers formed the wwPDB to ensure that
the PDB archive would continue to be single and uniform. The
BioMagResBank joined the wwPDB in 2005. Each week, the data
centers forward data to the RCSB PDB (the "archive keeper") for
release into the PDB archive at
ftp://ftp.wwpdb.org. Each wwPDB site has a website that
offers different views of the underlying data.
Another development has been the remediation of the archive
by the wwPDB. This project worked to remove inconsistencies,
update and check data, and adopt IUPAC-standard nomenclature.
The release of these data makes the PDB archive even more
useful: the more uniform the data are, the more powerful and
accurate the searches can be.
The RCSB PDB site has grown both in terms of the underlying
architecture and in terms of improved query and browsing
functionality since the publication of that 2000 paper. The
underlying database was reengineered in 2006, and again in 2007
to support the remediated data. New features and resources are
continually being developed to support our broad community of
users.
Yet much of our work as described in that paper is still
current. We still find the times to be exciting and challenging,
and depend heavily on user input to improve the resource. When
that paper was written, the archive held approximately 10,500
structures, and we thought that it could triple or quadruple in
five years. Seven years later, there are more than 45,000
structures, and individual structures are themselves growing in
size. We look forward to the challenges the coming years will
bring.
Helen M. Berman, Ph.D.
Chemistry and Chemical Biology Department
Rutgers University
New Brunswick, NJ, USA