in-cites, March 2002
Citing URL: http://www.in-cites.com/scientists/DrDavidDonoho.html

An essay by:
Dr. David Donoho
           

In this in-cites essay, Dr. David Donoho of Stanford University talks about the results of an ISI survey done in 2000 on high-impact papers of the 1990s. Dr. Donoho ranks in the top 5 authors in the field of Mathematics in the ISI Essential Science Indicators Web product, with 26 papers cited a total of 1,146 times to date. His work is also listed in the Computer Sciences field. Dr. Donoho is the Anne T. and Robert M. Bass Professor in the School of Humanities and Sciences, Department of Statistics, at Stanford.

How to be a Highly Cited Author in the Mathematical Sciences

I am a statistician, so when ISI contacted me with the news that I was a "Highly Cited Author" my mind turned immediately, like any good statistician, to explaining the "why" and the "how." Accordingly, I began a small data analysis project; I requested information about ISI's data processing procedures and definitions and also specific data about my own papers and their citations. Later I asked for information about other authors in ISI's list of the 25 most-cited authors in the mathematical sciences. I learned that ISI's definition concerned citations throughout the sciences and engineering to articles originally published during the 1990s in journals in the mathematical sciences (as they defined them), and that only articles in the 200 most-cited for the year of publication would be considered. I learned I was author or co-author of 10 articles that counted in this way, or 1/2 percent of all the 2,000 "highly-cited" articles in mathematical sciences that decade, and these papers obtained 632 citations total in the 1990s.

Two papers that I wrote with co-authors both had over 150 citations, making them both "centurions," with a chance of scoring a "double century." Another is about to "score a century." (see Biometrika 81: 425- 1994; J. Am. Stat. Assoc. 90: 1200- 1996; and J. Roy. Stat. Soc. B. 57: 301- 1994). The first two papers were joint with Iain Johnstone of Stanford; the last was joint with Johnstone and also Dominique Picard and Gérard Kerkyacharian of Université de Paris. These papers described the method known as wavelet shrinkage for removing noise from signals, images, and other sorts of data by the device of wavelet transforming one's data, applying a thresholding function to the resulting wavelet coefficients (i.e. setting to zero coefficients below a certain threshold), and then transforming back from the wavelet domain to the original signal domain. The method is simple to describe, and because of the invention in the 1980s of the fast wavelet transform, it is very easy to apply to digital data such as the kind that arise in signal and image processing. Variants of this method have been the subject of articles in theoretical and applied statistics, and in at least ten scientific fields that I know of, where signal and image processing is important, and also in many engineering fields.

[Image: Dr. David Donoho's WAVELAB 802 for Matlab5.x]
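The three steps described above (transform, threshold, transform back) can be illustrated with a minimal sketch. It is not the WaveLab implementation, which is a MATLAB toolbox; it assumes the simplest wavelet (the orthonormal Haar wavelet), a signal whose length is a power of two, and a threshold supplied by the caller rather than derived from theory. The function names are hypothetical, and leaving the coarsest average coefficient unthresholded is a choice made for this sketch.

```python
import math


def haar_forward(signal):
    """Orthonormal Haar wavelet transform of a list of power-of-two length.

    Output layout: [overall average, coarsest detail, ..., finest details].
    """
    coeffs = list(signal)
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    length = n
    while length > 1:
        half = length // 2
        # Pairwise scaled sums (approximation) and differences (detail).
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:length] = approx + detail
        length = half
    return coeffs


def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding the signal from coarse to fine."""
    out = list(coeffs)
    n = len(out)
    length = 2
    while length <= n:
        half = length // 2
        merged = []
        for a, d in zip(out[:half], out[half:length]):
            merged.append((a + d) / math.sqrt(2))
            merged.append((a - d) / math.sqrt(2))
        out[:length] = merged
        length *= 2
    return out


def hard_threshold(coeffs, threshold):
    """Set to zero every detail coefficient whose magnitude is at most threshold.

    Assumption of this sketch: index 0 (the overall average) is kept as-is.
    """
    return [coeffs[0]] + [c if abs(c) > threshold else 0.0 for c in coeffs[1:]]


def wavelet_shrink(signal, threshold):
    """Transform, threshold the coefficients, and transform back."""
    return haar_inverse(hard_threshold(haar_forward(signal), threshold))


if __name__ == "__main__":
    # A step signal with a small alternating perturbation standing in for noise.
    noisy = [4.1, 3.9, 4.1, 3.9, 0.1, -0.1, 0.1, -0.1]
    print(wavelet_shrink(noisy, threshold=0.5))
```

In this toy case the perturbation lands entirely in small fine-scale coefficients, so thresholding removes it while the large coefficient that encodes the step survives. Real data and other wavelets behave less tidily, which is why the choice of threshold matters.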

Digesting the information from ISI, I identified a number of factors which seem broadly correlated with citation counts among the top 25 cited authors in the mathematical sciences. In my case these factors all seem to be at work at the same time, which seems to have helped my citation counts. I will phrase my description of these factors as How-To Instructions. The first two factors concern choice of topic areas:

A. Work in Statistics. Three of the top 4, 9 of the top 15, and 15 of the top 25 most-cited authors were statisticians.

B. Work in Wavelets. Three of the top 5 most-cited authors, 4 of the top 15, and 6 of the top 25 published extensively in the 1990s on wavelets.

Statisticians create a steady stream of methodology which can be used (and cited) in many other fields. Wavelets arose in the 1980s as a theoretical topic which rapidly became very applicable after the fast wavelet transforms were developed; as they became applicable, of course, citations followed.

The next two factors concern choice of colleagues:

C. Work in Sequoia Hall. Sequoia Hall is the home of the Statistics Department at Stanford University. Five of the top 15 cited authors in the ISI list are faculty in that department. No other single department in the mathematical sciences has more than two highly cited authors.

D. Work with a Highly Cited Co-Author. I am fortunate to work with Iain Johnstone as a co-author; he also was an ISI Highly Cited Author in this period, number three on the ISI "top 25" list, with 559 total citations and eight papers in the highly-cited category. This illustrates that one shouldn't take me all that seriously. My co-author had nearly as many citations as I had, and with a small difference in initial conditions (e.g., if I had written one or two fewer papers, or he had written one or two more) our relative order in the ISI citation listing could be reversed. 

These last two factors are, of course, highly personal; others might not have noticed them at work in the cited authors list, but I was bound to do so.

The above factors correlate with success but don't really cause it. So, again, as a good statistician, I looked at my list of 10 highly cited papers and thought a bit about what the true causal factors might be, based on my knowledge of what was different about those 10 papers from the other 50 or so I had worked on. I came up with a list of causal factors, which I again put in prescriptive form.

1. Develop a method which can be applied on statistical data of a kind whose prevalence is growing rapidly. 
The 1990s was definitely a decade of exploding interest in the use of computer-based methods to process data of all kinds: signals, images, spectra, densities, and so on. Statistical methodology with applications to such data has a chance for citations in many fields.

In our case, my co-authors and I worked in the 1990s to develop wavelet methods for removing noise from noisy signals (and images, and spectra, and densities, and tomograms...). Because of the exploding demand for methodology for dealing with signal data, this meant that our work had rather a large potential audience.

2. Implement the method in software, place examples of the software's use in the paper, make the software of broad functionality, and give the software away for free.

In our most-cited papers, we developed methodology for wavelet-based noise removal which was implemented in MATLAB, a quantitative programming environment. Our implementation was embedded in a MATLAB toolbox we called WaveLab, which included many tools for wavelet analysis and could be used not only for the specific tasks described in our papers, but for many other wavelet-based applications. WaveLab was available for free download over the Internet, starting in late 1993, coinciding with the explosion of interest in downloading of software caused by the success of the World Wide Web. The software in its various releases has had well over 10,000 downloads. See http://www-stat.stanford.edu/~wavelab/.

Our decision to distribute free software was grounded in a philosophical stance explained in the article "WaveLab and reproducible research," written with Ph.D. student John Buckheit; http://www-stat.stanford.edu/~donoho/Reports/1995/wavelab.pdf. In that article we argued, following ideas of Stanford geophysicist Jon Claerbout, that computational-based science is not fully reported unless algorithms are published as well, together with all the scripts to reproduce the results.

Part of the WaveLab system is a set of scripts that can reproduce all the figures in a series of papers my co-authors and I published. Presumably, many individuals who downloaded the WaveLab software with the simple goal of getting a wavelet toolbox for free ran some of these scripts and thereby became interested in the papers, leading to additional citations.

3. In developing a methodology, develop synthetic test cases which you distribute freely over the Internet.

In our most highly cited papers, we used certain synthetic test cases to illustrate what our methods could do on certain kinds of noisy data, and we distributed the test cases as part of the WaveLab environment. Those test cases have since become very well-known; I have seen numerous papers and conference presentations referring to "Blocks," "Bumps," "HeaviSine," and "Doppler" as standards of a sort (this is a practice I object to but am powerless to stop; I wish people would develop new test cases which are more appropriate to illustrate the methodology they are developing). Presumably, anyone using these test cases in a published article refers to the original papers, leading to additional citations.
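To make the idea of a distributable synthetic test case concrete, here is a small sketch of how one might be defined so that anyone can regenerate exactly the same noisy data. It does not reproduce the WaveLab definitions of "Blocks," "Bumps," "HeaviSine," or "Doppler"; the jump positions, heights, noise level, and seed below are all hypothetical values chosen for illustration.

```python
import random


def piecewise_constant(n, jumps):
    """Sample a piecewise-constant function at n equally spaced points on [0, 1).

    jumps is a list of (position, height) pairs: at each position the
    function steps up (or down, for negative heights) by that height.
    """
    signal = []
    for i in range(n):
        t = i / n
        signal.append(sum(height for position, height in jumps if t >= position))
    return signal


def add_gaussian_noise(signal, sigma, seed):
    """Add independent Gaussian noise using a private, seeded generator.

    Fixing the seed is what lets other people regenerate identical data.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in signal]


if __name__ == "__main__":
    # Hypothetical test case: up by 4 at t = 0, back down by 4 at t = 0.5.
    clean = piecewise_constant(8, [(0.0, 4.0), (0.5, -4.0)])
    noisy = add_gaussian_noise(clean, sigma=0.5, seed=2002)
    print(clean)
    print(noisy)
```

Publishing the generating parameters and the seed along with a paper gives readers a shared benchmark, which is how test cases of this kind can become standards.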

4. In developing a methodology, leave room for improvement.

It is absolutely crucial not to kill a field by doing too good a job in the first outing. In our case, our initial thresholding ideas were theoretically inspired, and we considered our task to prove theorems. We considered the synthetic test cases simply as illustrations of features of thresholding methods, and we didn't spend time "tinkering" with the theoretically derived thresholds. However, once these test cases became standards, it developed that in those specific test cases one could do an even better job by thresholding with thresholds set a little differently than our theory predicted. Many later authors have developed different schemes for threshold choice, sometimes without a theoretical motivation, but simply on the evidence that they work better on specific examples such as "Blocks," "Bumps," and "HeaviSine." Presumably, each author who has an improvement cites our papers as originator of the test case and the performance, leading to additional citations.

If we think about these four proposed prescriptive "causal" factors, we can see that they help explain the four earlier prescriptive "correlative" factors. For example, correlative factor A ("Work in Statistics") can be explained as follows. Statisticians are always developing new methodology for treatment of data, and causal factor 1 posits that there is a large audience for helpful ideas in that direction. Correlative factor B ("Work in Wavelets") has a similar explanation.

But what about the many papers I have written, alone and with others, that are not highly cited? What are the lessons there? It seems to me that those papers were, by and large, "theorem-proof" papers, and so more heavily theoretical, less data-driven, less methodological, with no software associated with them. It also seems that the low-citation papers were some of my "hardest" papers: hard both for me to obtain the results and for others to read. They also include some of my favorite papers, papers which convinced me I was really doing something that would leave a mark. It is truly dispiriting to see that the papers one thought, in youthful innocence, might leave a mark actually got what seems like few citations! Nevertheless, for one's own self-respect, it is important to do work that seems hard and deep. Also, and this is very important, the basis for several of the highly cited papers in my list of 10 was actually laid out in certain other papers which were hard, deep, and got very few citations. For example, my paper with Brenda MacGibbon and Richard Liu, "Minimax risk over hyperrectangles," (Annals of Statistics, 18: 1416-37; Sept., 1990) and my paper with Iain Johnstone, "Minimax risk over L(P)-balls for L(Q)-error," (Probability Theory and Related Fields, 99: 277-303; September 1994) contain, from a mathematical point of view, most of the essential ideas needed to understand why wavelet shrinkage should be successful, and they were completed well before the wavelet work began, but they receive a trickle of citations compared with the flood of citations for papers developing similar ideas in a more applied way.

Conclusion: The above-listed prescriptive factors certainly do correlate with and even cause success in citation counts. However, nobody should really follow these prescriptions exclusively. If one only worries about how to get citations, one will never do the difficult groundbreaking work which won't initially get citations but which can be the foundation, later on, of heavily cited work. Moreover, nobody has a crystal ball forecasting what may be the highly-cited "hot" areas of the next decade; so direct imitation of what has worked in the past is almost surely not going to lead to citation success in the future. However, Factors 1-4 above are reliable ways for many readers to increase their citation counts.

Dr. David Donoho
Department of Statistics
Stanford University
Stanford, CA, USA
