It is in the Polya-Eggenberger form that the NBD passed into library and information science and other social sciences as the model of "social contagion," "cumulative advantage," or the "success-breeds-success" phenomenon (Rapoport and Horvath 1961; Coleman 1961, 288–380; Price 1976; Tague 1981). This process was given its most elegant formulation by Merton (1968) in his concept of the Matthew Effect, whereby rewards are allocated among scientists according to the biblical dictum of St. Matthew (25:29): "For unto every one that hath shall be given, and he shall have abundance: but from him that hath not shall be taken away even that which he hath."
Price (1976) described the Polya urn NBD as modeling the "double-edged" Matthew Effect, because in it success is rewarded by an increased chance of further success and failure is punished by an increased chance of further failure. He contrasted it with the beta function, which he found to model the "single-edged" Matthew Effect with an urn scheme in which success increases the chance of success but failure has no subsequent effect on the probabilities.
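To make Price's contrast concrete, the following Python sketch simulates both urn schemes. The function names, the starting composition, and the reinforcement amount `c` are illustrative assumptions, not Price's notation. In the double-edged (Polya) scheme a drawn ball of either kind is returned together with `c` more of the same kind; in the single-edged scheme only a drawn "success" ball adds reinforcement.

```python
import random

def polya_urn(successes, failures, c, draws, rng):
    """Double-edged scheme: every draw reinforces its own kind by c balls."""
    for _ in range(draws):
        if rng.random() < successes / (successes + failures):
            successes += c   # success breeds success
        else:
            failures += c    # failure breeds failure
    return successes, failures

def single_edged_urn(successes, failures, c, draws, rng):
    """Single-edged scheme: only a success changes the urn's composition."""
    for _ in range(draws):
        if rng.random() < successes / (successes + failures):
            successes += c   # success still breeds success
        # a drawn failure ball is simply returned, with no reinforcement
    return successes, failures

rng = random.Random(1976)
print("double-edged:", polya_urn(1, 1, 1, 100, rng))
print("single-edged:", single_edged_urn(1, 1, 1, 100, rng))
```

Over many replications the difference shows up clearly: starting from one ball of each kind with `c = 1`, the long-run share of successes in the double-edged urn can settle anywhere between 0 and 1, whereas in the single-edged urn failures are never reinforced, so the success share can only rise.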
In a series of articles devoted to the foundations of information science, Brookes (1980a; 1980b; 1980c; 1981) used a discography of phonograph recordings, each devoted wholly to the works of one composer and issued in the period 1972–76, to demonstrate frequency-rank statistics in contrast to frequency-distribution statistics. To illustrate the former, he applied the mathematics of Bradford's law to segregate composers into groups ranked in descending order of the number of recordings on which their works appeared, and then measured the Matthew Effect as the degree to which the composers with the most recordings "robbed" those with the least. Brookes then exemplified frequency-distribution statistics by fitting the NBD to the discographic data, and stated that although the NBD explained the underlying probability mechanism of the recording industry, its application entailed the loss of important empirical information.
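Brookes's own computations are not reproduced here. As a hedged illustration of the Bradford-style grouping step only, the sketch below ranks sources by descending count and cuts the ranking into zones that each contribute roughly an equal share of the items. The `bradford_zones` helper, the three-zone default, and the composer counts are all hypothetical.

```python
def bradford_zones(counts, zones=3):
    """Assign sources (e.g. composers) to Bradford zones.

    Sources are ranked by descending count, then the ranking is cut so that
    each zone contributes roughly 1/zones of all items (e.g. recordings).
    Returns a list of zones, each a list of (source, count) pairs.
    """
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(count for _, count in ranked)
    result = [[] for _ in range(zones)]
    cumulative = 0
    for source, count in ranked:
        # The zone is fixed by the share of items accumulated before this source.
        index = min(cumulative * zones // total, zones - 1)
        result[index].append((source, count))
        cumulative += count
    return result

# Hypothetical recording counts per composer (not Brookes's data).
recordings = {"A": 40, "B": 20, "C": 15, "D": 12, "E": 8, "F": 6,
              "G": 5, "H": 4, "I": 3, "J": 3, "K": 2, "L": 2}
for i, zone in enumerate(bradford_zones(recordings), start=1):
    print(f"zone {i}: {len(zone)} composers, {sum(c for _, c in zone)} recordings")
```

With these invented counts the zones hold 1, 3, and 8 composers for 40, 47, and 33 recordings, the characteristic Bradford pattern in which each successive zone needs several times as many sources to supply a comparable number of items.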
Incongruously, after demonstrating cumulative advantage, Brookes based his NBD on the gamma-Poisson version, which models qualitative inhomogeneity. He then argued that information quantities should be measured logarithmically to place them in proper perspective. As noted above, the NBD is converted into the normal distribution for parametric statistical operations by logarithmic transformation.
Regardless, given Feller's conundrum, if one finds the negative binomial, one still does not know, for example: whether the LSU chemistry faculty ranked one journal higher than another because of its inherent quality or because of collegial influences; whether some of the journals selected by the LSU chemistry faculty were cited more than others because of their inherent quality or because they had been cited heavily before; or whether some of these same journals were priced higher because of an inherent propensity of publishers to price differently or because some publishers were able to raise prices continually, thus reducing the ability of other publishers to do likewise.
The negative binomial distribution models all these possibilities, and all these possibilities are not only conceptually plausible but can be conceived of as interacting with each other. Thus, with the NBD, statistics and conception merge in a particularly elegant fashion.
An interesting facet of the NBD is that it appears to link the production, dissemination, and use of human knowledge with other life processes. The NBD is widely used in the biological sciences, where it has been found to be the most useful mathematical model for contagious distributions (Elliott 1977, 23, 51). From this viewpoint it is also interesting to note that Williams (1964, 295) described the logarithmic series, toward which the zero-truncated NBD converges as k approaches zero, as the biological equivalent of "nothing succeeds like success."
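The limiting relationship between the NBD and the logarithmic series can be checked numerically. The sketch below assumes the parameterization P(n) = Γ(k + n) / (Γ(k) n!) · (1 − q)^k · q^n, with 0 < q < 1 playing the role of Fisher's log-series parameter; the function names are hypothetical. It removes the zero class from the NBD and compares the result with the logarithmic series q^n / (n · (−ln(1 − q))) as k shrinks.

```python
import math

def nbd_pmf(n, k, q):
    """NBD probability of n for exponent k > 0 and 0 < q < 1 (mean k*q/(1-q))."""
    log_p = (math.lgamma(k + n) - math.lgamma(k) - math.lgamma(n + 1)
             + k * math.log1p(-q) + n * math.log(q))
    return math.exp(log_p)

def zero_truncated_nbd_pmf(n, k, q):
    """NBD conditioned on n >= 1: divide by 1 - P(0) = 1 - (1-q)^k.

    expm1 avoids catastrophic cancellation when k is very small.
    """
    prob_nonzero = -math.expm1(k * math.log1p(-q))
    return nbd_pmf(n, k, q) / prob_nonzero

def log_series_pmf(n, q):
    """Fisher's logarithmic series: q^n / (n * -ln(1 - q)), for n >= 1."""
    return q ** n / (n * -math.log1p(-q))

q = 0.9
for k in (1.0, 0.1, 1e-6):
    gap = max(abs(zero_truncated_nbd_pmf(n, k, q) - log_series_pmf(n, q))
              for n in range(1, 51))
    print(f"k = {k:g}: largest gap over n = 1..50 is {gap:.2e}")
```

The gap shrinks as k falls, because for small k the factor Γ(k + n) / Γ(k) behaves like k · Γ(n) and 1 − (1 − q)^k behaves like −k ln(1 − q), so k cancels and only the log-series terms remain.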
The work of Cohen (1971, 1980, 1981) in primatology forms a bridge from the biological to the social and information sciences. He formulated his basic premise with the classic understatement that "Who sleeps with whom interests primates of several species" (Cohen 1971, 3). Using a zero-truncated gamma-Poisson version, Cohen found the NBD to be the equilibrium frequency distribution of size predicted by stochastic models for the dynamics of freely forming primate social groups. According to Cohen, not only is the NBD descriptive of the way monkeys distribute themselves into troops in the tree tops for sleeping and breeding purposes as well as of how children gather into play groups in nursery school, but it also describes the way scientists distributed themselves over the laboratories at Rockefeller University, the National Cancer Institute, and the British National Institute for Medical Research. Cohen found publication rate to be linearly related to the size of the laboratories at a rate of about 1.1 publications for each additional scientist.
Cohen's findings bring into perspective those of Rapoport and Horvath (1961), Coleman (1964, 326–32), and Ehrenberg (1959). In their study Rapoport and Horvath discovered that the distribution of popularity among junior high school students fitted the NBD. This finding was replicated by Coleman, who called the NBD the "contagious Poisson," with data from seven 26-member cottages of girls. Because of Feller's conundrum, neither Rapoport and Horvath nor Coleman could definitively state whether the skewed distribution of popularity was due to the inherent qualities of those chosen as popular or to some process of social contagion whereby the students and the girls influenced each other's decisions. For his part, Ehrenberg introduced the compound NBD into marketing as the model for consumer buying, with each consumer's purchases following the Poisson distribution in time and the purchasing rates of different consumers following a chi-square (that is, gamma) distribution.
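Ehrenberg's compound model can be sketched as a simulation. In the Python below, which is a hypothetical illustration rather than Ehrenberg's own computation, each consumer receives a purchasing rate drawn from a gamma distribution with shape k and mean m, and the number of purchases in the period is then Poisson with that rate. Mixing Poisson counts over gamma-distributed rates yields the NBD with mean m and variance m + m²/k; the values k = 1.5 and m = 3 are arbitrary assumptions.

```python
import math
import random
import statistics

def poisson_draw(lam, rng):
    """Knuth's multiplicative method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def gamma_poisson_sample(size, k, mean, rng):
    """Each consumer gets a gamma rate (shape k, scale mean/k, so its mean is
    `mean`); purchases in the period are then Poisson with that rate."""
    return [poisson_draw(rng.gammavariate(k, mean / k), rng) for _ in range(size)]

rng = random.Random(1959)
k, mean = 1.5, 3.0
purchases = gamma_poisson_sample(20_000, k, mean, rng)
print("sample mean:", statistics.fmean(purchases), "NBD mean:", mean)
print("sample variance:", statistics.variance(purchases),
      "NBD variance:", mean + mean ** 2 / k)
print("share buying nothing:", purchases.count(0) / len(purchases),
      "NBD P(0):", (k / (k + mean)) ** k)
```

Because a chi-square variate with ν degrees of freedom is a gamma variate with shape ν/2 and scale 2, using `gammavariate` covers both readings of Ehrenberg's rate distribution.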
However, these are relatively simple situations. When Kochen, Crickman, and Blaivas (1982) and Blaivas et al. (1982) attempted to apply the NBD to the ratings by scholars of other scholars in seven academic disciplines, they ran into severe problems of set definition and of levels of consensus within the disciplines. Despite these difficulties, they found that a law of cumulative advantage provided the best theoretical approximation of peer ratings but was fully effective only in well-defined disciplines with high levels of consensus. Their work shows the need for proper set definition to control for contaminants, as well as the effect of Kuhnian paradigms.
Pioneering work in the application of the NBD in library and information science has been done at the University of Western Ontario. Here, at the School of Library & Information Science, Tague and Farradane (1978) found that the NBD modeled the processes of document retrieval, and Tague (1981) utilized single- and multiple-urn models to demonstrate that the NBD arises as a result of the success-breeds-success phenomenon. However, the most interesting work on the NBD was done by Ravichandra Rao, who obtained his doctorate at Western Ontario. In a further development of Lotka's work, Ravichandra Rao (1980) demonstrated that the NBD describes the pattern of the productivity of scientists under the success-breeds-success condition in a wide variety of social circumstances.
At approximately the same time, the sociologist Allison (1980, 170–73) also found the NBD to describe scientific productivity. Through the work of Coleman (1964), however, Allison was aware of Feller's conundrum and pointed out that the NBD could have arisen from either the qualitative inhomogeneity of the scientists or a cumulative advantage process. Huber (1998) found that the gamma-Poisson NBD model of inhomogeneity fit the distribution of patents across a population of inventors, but he rejected cumulative advantage because there was no evidence of increasing productivity with experience, grounds one of his referees found questionable.
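The cumulative-advantage branch of Feller's conundrum can also be simulated. The Python below is a hypothetical illustration, not Allison's or Huber's procedure: it runs a contagion (linear birth) process in which a scientist with n publications produces the next one at rate a + b·n, so each success raises the chance of further success. For a process started at zero, the count at time t follows the NBD with exponent k = a/b, mean k(e^{bt} − 1), and variance k(e^{bt} − 1)e^{bt}, which is the same family produced by a gamma-Poisson mixture of heterogeneous rates. The values of a, b, and t are arbitrary assumptions.

```python
import math
import random
import statistics

def contagion_count(a, b, t, rng):
    """Linear pure birth process started at zero: with n items so far, the
    next item arrives after an exponential wait with rate a + b*n.
    Returns the number of items produced by time t."""
    n, clock = 0, 0.0
    while True:
        clock += rng.expovariate(a + b * n)
        if clock > t:
            return n
        n += 1

a, b, t = 1.0, 0.5, 2.0
rng = random.Random(1980)
counts = [contagion_count(a, b, t, rng) for _ in range(20_000)]
k = a / b                  # NBD exponent implied by the process
growth = math.exp(b * t)
print("mean:", statistics.fmean(counts), "NBD mean:", k * (growth - 1))
print("variance:", statistics.variance(counts),
      "NBD variance:", k * (growth - 1) * growth)
print("share with no output:", counts.count(0) / len(counts),
      "NBD P(0):", math.exp(-a * t))
```

Since both this process and a population of scientists with fixed but gamma-distributed rates generate the NBD, a good fit alone cannot say which mechanism produced the data; only evidence outside the frequency distribution, such as productivity changing with experience, can separate them.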
In an extremely interesting paper, Ravichandra Rao (1990) confronted the problem of proper set definition in fitting the NBD to informetric data. He analyzed the distribution of 4,130 articles over 744 journals in economics. When he attempted to fit the negative binomial to the data on a global basis, without any set definitions, he found that the NBD did not describe the distribution. Hypothesizing that he was dealing not with one but with several NBD populations, he then conducted two experiments. First, he defined the journals that provided the most articles as contaminants originating from a different NBD set and eliminated them by truncating the distribution on the right. Chi-square tests showed that the NBD fit this truncated distribution very well. Second, he classified the journals under 15 subject rubrics, such as "Methods," "History of Economic Thought," and "Organization of Production," thereby controlling for contaminants by dividing the data into more homogeneous sets. When this had been done, the NBD fit 12 of the 15 subject groups, demonstrating the importance of proper set definition.
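Ravichandra Rao's estimation procedure is not described here, so the following is only a hedged sketch of the kind of test involved. The hypothetical `nbd_chi_square` function fits the NBD by the method of moments (k = mean² / (variance − mean)), pools adjacent cells until each expected frequency reaches 5, and returns Pearson's chi-square with degrees of freedom reduced by the two estimated parameters. That degrees-of-freedom rule is strictly justified for estimates obtained from the grouped data, so with moment estimates it is an approximation. The sketch fits an ordinary, untruncated NBD; reproducing the right-truncation experiment would require renormalizing the NBD over the retained range.

```python
import math
import statistics

def nbd_pmf(n, k, mean):
    """NBD probability of n, parameterized by exponent k and mean."""
    q = mean / (mean + k)
    return math.exp(math.lgamma(k + n) - math.lgamma(k) - math.lgamma(n + 1)
                    + k * math.log1p(-q) + n * math.log(q))

def nbd_chi_square(counts, min_expected=5.0):
    """Moment-fit the NBD to `counts` and return (chi_square, df)."""
    size = len(counts)
    mean = statistics.fmean(counts)
    var = statistics.variance(counts)
    if var <= mean:
        raise ValueError("no overdispersion: the NBD moment fit needs variance > mean")
    k = mean ** 2 / (var - mean)          # moment estimate of the exponent
    top = max(counts)
    observed = [0] * (top + 1)
    for c in counts:
        observed[c] += 1
    expected = [size * nbd_pmf(n, k, mean) for n in range(top)]
    expected.append(size - sum(expected))  # last cell is the open tail n >= top
    cells, obs_acc, exp_acc = [], 0.0, 0.0
    for obs, exp_n in zip(observed, expected):
        obs_acc += obs
        exp_acc += exp_n
        if exp_acc >= min_expected:        # close a cell once it is large enough
            cells.append([obs_acc, exp_acc])
            obs_acc, exp_acc = 0.0, 0.0
    if obs_acc or exp_acc:                 # fold any small remainder into the last cell
        if cells:
            cells[-1][0] += obs_acc
            cells[-1][1] += exp_acc
        else:
            cells.append([obs_acc, exp_acc])
    chi_square = sum((o - e) ** 2 / e for o, e in cells)
    df = len(cells) - 1 - 2                # two parameters estimated from the data
    return chi_square, df
```

Applying the same function separately to each subject group, rather than to the pooled data, mirrors the second experiment's logic of testing the NBD within more homogeneous sets.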