A historical analysis done for this paper revealed that stability over time is a hallmark of the peer ratings in chemistry. Of the top 15 programs in the 1924 ratings, 11 (California at Berkeley, Cal Tech, Chicago, Columbia, Cornell, Harvard, MIT, Stanford, UIUC, Wisconsin at Madison, and Yale) remained in the top 15 by peer ratings of scholarly quality in each of the 1964, 1969, 1981, and 1993 ratings. Of these 11 programs, eight (Chicago, Columbia, Cornell, Harvard, MIT, UIUC, Wisconsin at Madison, and Yale) were listed among the 9 top U.S. chemistry departments by Cattell (1910, 685) in the first ranking of such departments. Moreover, the 4 chemistry programs in the 1924 top 15 that did not remain there still ranked in the top 35 of the 168 chemistry programs rated in 1993.
However, the stability of the chemistry peer ratings is not limited to the top but extends throughout the entire ranking. The NRC database contains the peer ratings of the scholarly quality of program faculty from 1964 through 1993, although the 1964 figures are not the actual ratings but an ordinal ranking constructed from them. A correlation matrix was constructed from these four sets of ratings, and the correlations ranged from a low of 0.78 between 1964 and 1981 to a high of 0.93 between 1981 and 1993. In general, the closer together the rating years, the higher the correlation, indicating slow change over time.
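A pairwise correlation matrix of the kind described above can be sketched as follows. This is a minimal illustration, assuming each rating year is stored as a list of scores with the programs in the same order in every list (so only programs rated in all years are included); the scores shown are hypothetical, not NRC data, and the helper names `pearson` and `correlation_matrix` are illustrative.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(ratings_by_year):
    """Return {(year_a, year_b): r} for every pair of rating years.

    Assumes every list holds one score per program, in the same program order.
    """
    return {
        (a, b): pearson(ratings_by_year[a], ratings_by_year[b])
        for a, b in combinations(sorted(ratings_by_year), 2)
    }


# Hypothetical scores for five programs rated in every year (not NRC data).
ratings = {
    1969: [4.8, 4.1, 3.0, 2.2, 1.5],
    1981: [4.9, 3.9, 3.2, 2.0, 1.7],
    1993: [4.8, 4.0, 3.3, 2.1, 1.6],
}
for (a, b), r in correlation_matrix(ratings).items():
    print(f"{a}-{b}: r = {r:.2f}")
```

With real data, the "closer years correlate more highly" pattern would show up as larger r values for adjacent pairs than for distant ones.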
A major advance of the 1993 NRC evaluation of U.S. research-doctorate programs over its predecessors was that the ISI provided the numbers of citations to the publications of the faculty of the rated programs in the sciences, engineering, and social sciences. These citation data are contained in the database developed by the NRC as part of the 1993 evaluation. It was decided to use these ISI data to construct two citation measures for the chemistry programs: (1) total number of citations to faculty publications in the period 1988–92, and (2) total number of citations to faculty publications in the period 1981–84. The purpose of these measures was threefold: (1) to verify the relationship of citations to the peer ratings of faculty scholarly quality, (2) to test the stability of citation patterns to academic departments over time, and (3) to quantify the dominance of the elite departments over the others.
Correlation analysis was used to verify the relationship of total citations to peer ratings and to test the stability of citation patterns over time. With respect to the first relationship, it was once again revealed that peer ratings and total citations are virtually equivalent measures of scientific value: the correlation between the 1993 peer ratings of the scholarly quality of the chemistry program faculty and the number of citations to their publications in 1988–92 was 0.91. Concerning the second relationship, citation patterns were found to resemble peer ratings in that they are also highly stable over time: the correlation between the total citations to chemistry program faculty publications in 1981–84 and in 1988–92 was 0.93. As a further sign of the stability of both peer ratings and citation patterns, the correlation between the 1993 peer ratings of the scholarly quality of chemistry program faculty and total citations to their publications in 1981–84 was 0.89.
The dominance of the elite chemistry research-doctorate programs over the others is evident in the fact that the top 42 of 168 programs (25%) accounted for 63.6% of the total citations to the publications of the rated program faculty in 1988–92, leaving the other 126 programs (75%) to share the remaining 36.4% of the citations. This dominance becomes even more striking when one considers that there was a zero class of 35 chemistry programs that awarded doctorates in 1986–92 but were not evaluated by the 1993 NRC study (Goldberger, Maher, and Flattau 1995, 20). These 35 programs made up 17.2% of the programs awarding doctorates in chemistry in 1986–92 but accounted for only 3.0% of the chemistry doctorates granted during this period.
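The concentration figures above reduce to simple shares, sketched below. This assumes that "top" means the programs with the most citations (the ranking could instead be by peer rating) and uses hypothetical citation counts; only the 168 and 35 program counts come from the text, and `top_share` is an illustrative name.

```python
def top_share(citations, fraction=0.25):
    """Share of all citations received by the most-cited `fraction` of programs.

    Assumes programs are ranked by their own citation totals.
    """
    ranked = sorted(citations, reverse=True)
    k = round(len(ranked) * fraction)  # e.g. 168 * 0.25 -> 42 programs
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical citation totals for eight programs (not NRC data).
citations = [900, 500, 300, 100, 60, 40, 30, 20]
print(f"top 25% share: {top_share(citations):.1%}")

# Zero class: unrated programs as a share of all doctorate-awarding programs.
rated_programs, zero_class = 168, 35
print(f"zero class: {zero_class / (rated_programs + zero_class):.1%}")
```

The second calculation reproduces the 17.2% figure: 35 of the 203 programs (168 rated plus 35 unrated) that awarded chemistry doctorates in 1986–92.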
As part of the correlation analysis, tests were made to estimate the probability distributions underlying the data in order to apply the proper mathematical transformations. For both sets of citation data, the variance was considerably greater than the mean, which indicated the NBD, with its probabilistic mechanisms of qualitative inhomogeneity and cumulative advantage. In all four sets of peer ratings, however, the variance was substantially less than the mean, which suggested the positive binomial; in large samples such as these, the positive binomial approximates the normal distribution (Snedecor and Cochran 1989, 117–19). The positive binomial models a uniform, or regular, distribution, and when it applies, the probability of success is estimated by dividing the mean of the distribution by the maximum count possible in any given sample (Grieg-Smith 1983, 57–58; Elliot 1977, 17).
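The variance-to-mean screen described above can be sketched as follows. This is a rough screen under stated assumptions, not a formal significance test (a chi-square test of the index of dispersion would be the usual next step), and the log10(x + 1) transformation is a common choice for NBD-type counts that is assumed here, since the text does not name the transformation used. The sample values and function names are hypothetical.

```python
from math import log10
from statistics import mean, variance


def dispersion_ratio(values):
    """Sample variance divided by the mean; about 1 for a Poisson series."""
    return variance(values) / mean(values)


def suggested_model(values, tolerance=0.0):
    """Classify a sample by its variance-to-mean ratio (rough screen only)."""
    ratio = dispersion_ratio(values)
    if ratio > 1 + tolerance:
        # variance > mean: qualitative inhomogeneity / cumulative advantage
        return "negative binomial"
    if ratio < 1 - tolerance:
        # variance < mean: regular (uniform) pattern
        return "positive binomial"
    return "Poisson"


def log_transform(counts):
    """log10(x + 1): an assumed normalizing transformation for NBD-type counts."""
    return [log10(x + 1) for x in counts]


# Hypothetical samples (not NRC data).
citation_counts = [0, 3, 12, 45, 210, 980]
rating_scores = [2, 3, 3, 2, 3, 2, 3]
print(suggested_model(citation_counts), suggested_model(rating_scores))
print(log_transform(citation_counts))
```

A nonzero `tolerance` lets ratios very close to 1 be treated as Poisson-like rather than forcing a classification.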
Excluding the special case of the 1964 peer ratings, whose actual scores are not in the NRC database, the maximum score a chemistry program could receive was 5. The mean peer rating was 2.69 in 1969, yielding a probability of success of 0.54 (2.69/5), and 2.60 in both 1981 and 1993, giving a probability of success of 0.52 (2.60/5). A roughly 50/50 chance of success does not jibe with the fact that the same 11 chemistry programs appeared in the top 15 by peer ratings in every year, even as the number of programs rated kept increasing: 99 in 1969, 134 in 1981, and 168 in 1993. This suggests that the peer rating method designed in 1964 is seriously flawed.
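The probability-of-success arithmetic can be checked with a short sketch. The maximum score of 5 and the three mean ratings come from the text; the implied-variance helper is an added illustration of the standard binomial result, variance = n·p·(1 − p), which lies below the mean n·p whenever p > 0 and is therefore consistent with the underdispersion reported for the peer ratings.

```python
def binomial_p(mean_score, max_score):
    """Estimate p for a positive binomial as mean / n (n = maximum possible score)."""
    return mean_score / max_score


def binomial_variance(max_score, p):
    """Variance implied by the binomial, n * p * (1 - p); below the mean n * p when p > 0."""
    return max_score * p * (1 - p)


MAX_SCORE = 5  # highest peer rating a program could receive (1969, 1981, 1993)
MEAN_RATINGS = {1969: 2.69, 1981: 2.60, 1993: 2.60}  # means reported in the text

for year, m in MEAN_RATINGS.items():
    p = binomial_p(m, MAX_SCORE)
    var = binomial_variance(MAX_SCORE, p)
    print(f"{year}: p = {p:.2f}, implied variance = {var:.2f}, mean = {m:.2f}")
```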
The main problem with the peer rating methodology designed in 1964 is that it is only a grading of the scholarly quality of a program by persons purporting to be familiar with its faculty. Those not familiar with the program faculty may exclude themselves from the grading process by marking "Insufficient information." As such, it does not measure the impact or influence of the faculty. The designers of the 1981 and 1993 surveys of research-doctorate programs were aware of this problem, and they created measures to capture impact and influence. In 1981, a familiarity index was created in which raters were asked to describe their knowledge of the program in the following terms, which were given the accompanying numerical weights: Considerable familiarity = 2; Some familiarity = 1; Little or no familiarity = 0. The familiarity index was the mean of the numerical weights of the responses. In 1993, a visibility index was constructed as the percentage of raters who did not mark their questionnaires "Insufficient information" or "Little or no familiarity."
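Both indexes can be sketched directly from the definitions above. The weights and the excluded answers follow the text; the short response codes (`"considerable"`, `"insufficient_information"`, and so on) are hypothetical stand-ins for the worded questionnaire options, and the sketch assumes every listed response counts in the denominator.

```python
# Hypothetical response codes standing in for the worded questionnaire options.
FAMILIARITY_WEIGHTS = {
    "considerable": 2,    # Considerable familiarity
    "some": 1,            # Some familiarity
    "little_or_none": 0,  # Little or no familiarity
}
NOT_VISIBLE = {"insufficient_information", "little_or_none"}


def familiarity_index(responses):
    """1981-style index: mean familiarity weight across a program's raters."""
    return sum(FAMILIARITY_WEIGHTS[r] for r in responses) / len(responses)


def visibility_index(responses):
    """1993-style index: percentage of raters marking neither excluded answer."""
    visible = sum(1 for r in responses if r not in NOT_VISIBLE)
    return 100.0 * visible / len(responses)


print(familiarity_index(["considerable", "some", "some", "little_or_none"]))  # 1.0
print(visibility_index(["considerable", "some", "insufficient_information", "little_or_none"]))  # 50.0
```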
It was decided to use the NRC database to build the familiarity index into the 1981 chemistry peer ratings and the visibility index into the 1993 chemistry ratings. For 1981, the method was simply to multiply the peer rating of scholarly quality by the familiarity index. For 1993, the visibility index was first divided by 10, the reciprocal of this quotient was taken, and the peer rating of scholarly quality was then divided by this reciprocal, which is equivalent to multiplying the rating by one-tenth of the visibility index. These multiplicative methods were deliberately chosen because science, like many other biological and social processes, is multiplicative, with data frequently requiring logarithmic transformations to conform to the additive and linear assumptions of standard parametric statistics. The multiplicative nature of science was succinctly summarized by Zuckerman (1977, 60) in her book on Nobelists in a passage illustrative of the stochastic processes involved in the NBD:
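The two augmentation procedures can be written out as below, with the 1993 three-step procedure kept literal alongside its algebraic shortcut. This is a sketch: the function names are hypothetical, the visibility index is assumed to be a percentage from 0 to 100, and the zero-visibility guard is an addition, since the reciprocal step is undefined there while the equivalent product is simply 0.

```python
def augment_1981(rating, familiarity):
    """1981: multiply the scholarly-quality rating by the familiarity index."""
    return rating * familiarity


def augment_1993(rating, visibility):
    """1993: the three-step procedure, with visibility as a percentage (0-100)."""
    if visibility == 0:
        return 0.0  # added guard: the reciprocal is undefined; the shortcut gives 0
    quotient = visibility / 10   # step 1: divide the visibility index by 10
    reciprocal = 1 / quotient    # step 2: take the reciprocal of the quotient
    return rating / reciprocal   # step 3: divide the rating by that reciprocal


def augment_1993_direct(rating, visibility):
    """Algebraically identical shortcut: rating * visibility / 10."""
    return rating * visibility / 10


print(augment_1993(4.0, 80.0), augment_1993_direct(4.0, 80.0))  # 32.0 32.0
```

Writing the shortcut out makes clear that both augmented measures are products of a rating and a familiarity-type weight, which is what gives them their multiplicative character.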
When the familiarity index was incorporated into the 1981 chemistry peer ratings of scholarly quality and the visibility index into the 1993 ratings, the peer rating distributions came to resemble the total citation distributions, as their variances became greater than their means. This suggested the operation of the qualitative inhomogeneity and cumulative advantage of the negative binomial. However, despite the differences in the underlying probabilistic mechanisms, the correlation of the traditional 1981 peer rating measure with the one augmented by the familiarity index was 0.99, and that of the traditional 1993 peer rating with the one augmented by the visibility index was 0.94. The peer rating methodology established in 1964 had thus captured the overall ranking structure of the scientific stratification system, if not its skewness.