LSU Libraries

The Controversy over the NBD and Monographic Circulation

Ravichandra Rao (1982; 1988) dedicated his doctoral dissertation at Western Ontario to testing probability distributions against data from the automated circulation systems of six large Canadian academic libraries. These data sets covered circulation periods lasting from 1 academic year for the University of Guelph up to 11 academic years for the University of Saskatchewan. Ravichandra Rao tested no fewer than 17 probability distributions against 203 document frequency distributions and 200 user frequency distributions for different types of user populations. In both cases he found the NBD to be the best probability distribution for both theoretical and practical reasons. The full NBD fit 92 (45.3%) of the 203 document distributions tested at the 0.01 level, and the truncated NBD fit 102 (51.0%) of the 200 user distributions tested at the 0.01 level. In line with the work of Tague and his own work on scientific productivity, Ravichandra Rao located the causal process of the NBD in the success-breeds-success phenomenon.

Most interestingly, Ravichandra Rao found that in the majority of cases the NBD did not fit the document distributions from undergraduate populations. Undergraduates may be considered a preparadigmatic population in the Kuhnian sense. Therefore, this finding of Ravichandra Rao corroborates the conclusion of Kochen, Crickman, and Blaivas (1982) that a certain level of knowledge and consensus is necessary for the NBD to form. It also corroborates the finding of Metz (1983, 81) that knowing an undergraduate's major was significantly less predictive of the library materials the undergraduate would borrow than knowing the departmental affiliation of a faculty or graduate student borrower.

However, the application of the negative binomial to library circulation data is chiefly associated with the name of Burrell at the Department of Mathematics Statistical Laboratory of the University of Manchester. Burrell developed his model in a series of papers over the years (Burrell 1980, 1982; Burrell and Cane 1982; Burrell 1985, 1986, 1987, 1988). His motivation was the appearance in Great Britain in 1976 of the Atkinson Report, in which the principle was set forth that the assessment of future university library building requirements should be based on the concept of the "self-renewing" library, i.e., a library of limited size from which, after a certain point, material should be removed in proportion to the rate of acquisition. Burrell's aim was the development of a simple stochastic model that librarians could use to decide whether to purchase multiple copies or relegate stock, and he concentrated on monographic circulation at various university libraries in Britain and the United States.

Burrell decided upon the gamma-Poisson NBD, finding that it approximated Trueswell's 80/20 rule in certain cases (Burrell and Cane 1982, 460). Although he was aware of Feller's conundrum through the work of his collaborator Cane (Burrell and Cane 1982, 450), he deliberately chose to emphasize the processes of inhomogeneity in contrast to Ravichandra Rao, who based his work on the principle of contagion. As Burrell's model emerged in the mid-1980s, it consisted of three basic tenets. First, the borrowing of individual monographs is a Poisson process with a rate that varies from item to item. Second, the different borrowing rates of the individual monographs are described by a desirability distribution, which is the gamma function. And third, the aging of the desirability occurs exponentially at the same rate for all monographs, which results in fairly stable distributions over time, with a permanent and growing zero class, because certain monographs have zero desirability to begin with (Burrell 1985, 1986, 1987). It is interesting to note that in his analysis of monographs Burrell mathematically modeled on the basis of one side of Feller's conundrum, inhomogeneity, what Bensman (1985b, 24–26) deduced at about the same time in his study of journals as a logical consequence of the operation of the double-edged Matthew Effect, itself a reflection of the other side of Feller's conundrum, contagion, i.e., stable distributions of library usage over time with a large zero class.
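The three tenets can be illustrated numerically. What follows is a minimal sketch under stated assumptions, not Burrell's own procedure: it assumes a gamma desirability distribution with a hypothetical shape k and mean m, draws a Poisson borrowing count for each simulated monograph, and compares the simulated zero class with the closed-form NBD probability of zero loans. All parameter values, function names, and the per-year aging factor are illustrative and do not come from the studies cited.

```python
import math
import random


def nbd_pmf(r, k, m):
    """NBD probability of r loans, for gamma shape k and mean loans m."""
    p = k / (k + m)
    log_pmf = (math.lgamma(k + r) - math.lgamma(k) - math.lgamma(r + 1)
               + k * math.log(p) + r * math.log(1.0 - p))
    return math.exp(log_pmf)


def poisson_draw(rate, rng):
    """Draw one Poisson count by Knuth's multiplication method (small rates)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_zero_class(k, m, n_items, seed=0):
    """Fraction of simulated monographs with no loans in one period."""
    rng = random.Random(seed)
    zeros = 0
    for _ in range(n_items):
        rate = rng.gammavariate(k, m / k)  # tenet 2: gamma-distributed desirability
        if poisson_draw(rate, rng) == 0:   # tenet 1: Poisson borrowing at that rate
            zeros += 1
    return zeros / n_items


def aged_mean(m, aging_factor, years):
    """Tenet 3: every item's rate decays by the same factor each year."""
    return m * aging_factor ** years


if __name__ == "__main__":
    k, m = 0.5, 0.5  # hypothetical values, chosen only for illustration
    print("NBD zero class:      ", round(nbd_pmf(0, k, m), 4))
    print("simulated zero class:", round(simulate_zero_class(k, m, 20000), 4))
    print("mean after 3 years:  ", aged_mean(m, 0.8, 3))
```

With a shape parameter below 1, most of the probability mass sits at zero loans, which is the pattern of a large zero class that the model was built to describe.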

Burrell developed his model largely on global library circulation data, without subject set definitions. However, in an interesting application of the NBD to public library circulation, Brownsey and Burrell (1986) constructed a model consisting of a mixture of three NBDs to account for the three gross subject classes of British public libraries—adult fiction, adult nonfiction, and junior. The result was a much improved fit to the data. This result was confirmed by Kinnucan and Wolfram (1990), and it corroborates the conclusion of Kochen, Crickman, and Blaivas (1982) as well as of Ravichandra Rao (1990) on the need for proper set definitions when dealing with information concerning human knowledge. After this, Burrell (1988, 303) wrote, " . . . when we speak of a collection we do not necessarily mean the entire holdings of the library but rather some well-defined set of items within the library, e.g., all books acquisitioned in a particular year in a particular subject class."

The development by Burrell of his model was accompanied by a number of major controversies. The first of these involved his concept of a permanent zero class. For Burrell (1982, 2–3) the zero class as it appeared in circulation statistics was a highly complex phenomenon because it contained not only items that had zero desirability but also those that could not appear in these statistics because they were lost, stolen, placed on reserve, etc. Therefore, in his opinion, the zero class could not be treated as an item of hard data as the other circulation frequencies. To deal with it, he initially used a technique called "with added zeros," which basically involves first estimating the parameters of the distribution truncated by the omission of the zero class and then estimating the size of the zero class by assigning an artificial probability to it (Johnson and Kotz 1969, 205–7).

When Burrell presented his model calculated in this fashion to a session of the Royal Statistical Society, it drew fire from Chatfield, a professor of marketing and collaborator of Ehrenberg, who had introduced the NBD into marketing. Chatfield criticized the concept of zero desirability, noting that it had been found impossible to distinguish between "never-buyers" and buyers with a low mean rate of purchase who just had not purchased during the time period under review (Burrell and Cane 1982, 467). He recommended calculating the parameters on the full distribution with an estimated zero class.

Chatfield's criticism was repeated in a study of public library circulation by Bagust (1983), who described Burrell's concepts of desirability and zero class as "gratuitous assumptions" (p. 25). Accusing Burrell of "data-fitting," Bagust declared that (p. 25) " . . . if a book is exposed to the client population no one can be certain that one day it will not be borrowed, i.e., it has a non-zero-probability of circulation." He then proceeded to fit the NBD to the full distribution of a public library, declaring (p. 32) that "the absence of a 'zero class' in the Negative Binomial model ensures that every acquisition kept on open access shelving will eventually circulate (if not eaten by bookworms first!)." Burrell (1984) responded with a harsh attack on both Bagust's reasoning and mathematics. Burrell (1985) then proceeded to develop his aging concept, the logic of which inevitably leads to a certain proportion of the collection never circulating (p. 103).

A second controversy arising from the development by Burrell of his model related to the other end of the distribution. It, too, began during the discussion of the model at the Royal Statistical Society with an observation by Chatfield that the NBD tended to overestimate the number of monographs at the high-circulation end of the distribution. Chatfield found this overestimation natural, because there is an upper limit to the number of times a book can go out in a year (Burrell and Cane 1982, 467). However, the matter took a serious turn when the tendency of the NBD to overestimate the number of high-circulation monographs caused Gelman and Sichel (1987) to question the validity of applying the Poisson process to library monographic circulation. An understanding of the nature of the controversy can be found in the following passage (Coleman 1964, 291):

Based on this difference, Gelman and Sichel (1987) believed that external monographic circulation more closely resembled the binomial process of discrete trials for two reasons: the books could not be continuously borrowed, because they were out for extended periods; and there was a finite bound to the number of circulations in a given time period. Therefore, in place of the gamma-Poisson NBD, they proposed for external monographic circulation the beta-binomial distribution (BBD), which is a compound binomial distribution with the beta function as the mixing function. Testing both the BBD and the NBD against the external monographic circulation of two university libraries, Gelman and Sichel found that the BBD provided a much better fit to the high-circulation end of the distribution.

Haight (1978, 158) describes the BBD as the discrete time analog of the gamma-Poisson NBD in that it models qualitative inhomogeneity for short time periods so that only a success or failure can be recorded. Interestingly enough, the mixing beta function is the very function that Price (1976) demonstrated as modeling the single-edged Matthew Effect. Moreover, the NBD arises as a limit of the BBD (Boswell and Patil 1970, 8–9). In library terms, as Gelman and Sichel (1987) describe it, the binomial process turns into a Poisson process as the loan period shortens and the time the item is available for further use lengthens. Therefore, they suggested that binomial mixture models be applied to low-frequency use such as book lending and that Poisson mixture models be applied to high-frequency use such as journal or in-library use.
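The limiting relationship between the two distributions can be checked numerically. The sketch below is an illustration under assumptions of our own rather than the derivation given by Boswell and Patil: it treats n as the number of loan periods available in the observation period, holds the mean number of loans m and the shape k fixed, and sets the second beta parameter to k(n - m)/m so that both distributions share the same mean. The function names and the values of k and m are hypothetical.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bbd_pmf(x, n, a, b):
    """Beta-binomial probability of x loans in n loan periods."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(x + 1)
                  - math.lgamma(n - x + 1))
    return math.exp(log_choose + log_beta(x + a, n - x + b) - log_beta(a, b))


def nbd_pmf(x, k, m):
    """Gamma-Poisson NBD probability of x loans, shape k and mean m."""
    p = k / (k + m)
    return math.exp(math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
                    + k * math.log(p) + x * math.log(1.0 - p))


def max_gap(n, k, m, upto=10):
    """Largest |BBD - NBD| over 0..upto loans, with both means equal to m."""
    b = k * (n - m) / m  # assumption: chosen so that the BBD mean is m
    top = min(n, upto)
    return max(abs(bbd_pmf(x, n, k, b) - nbd_pmf(x, k, m))
               for x in range(top + 1))


if __name__ == "__main__":
    for n in (10, 100, 10000):
        print(n, "loan periods: largest gap =", max_gap(n, 1.0, 1.0))
```

As the number of loan periods grows, which corresponds to a shorter loan period, the gap between the two distributions shrinks toward zero. This is consistent with the suggestion that Poisson mixtures suit high-frequency use and binomial mixtures suit low-frequency use.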

The controversies surrounding Burrell's development of his NBD model with aging on the basis of external monographic circulation came to a head with a study done by Tague and Ajiferuke (1987) at the Western Ontario School of Library and Information Science. They utilized University of Saskatchewan monographic circulation data for the academic years 1967–68 through 1977–78, which were organized into Collection I and Collection II. Collection I consisted of all those monographs that had circulated in the initial year 1967–68, and it traced their circulation history through the subsequent 10 academic years. It contained a zero class. Collection II contained monographic circulation data for the 11 academic years from 1967–68 through 1977–78. It was different from the first in that it provided information not on one set of monographs over time but on the 11 differing sets of the monographs that had circulated in each of the 11 academic years. Collection II did not have a zero class.

Tague and Ajiferuke applied the NBD to both of these collections. With respect to Collection I, they used two different ways to estimate the parameters of the NBD. The first way was to estimate the parameters by the method of moments in combination with another method that incorporated Burrell's aging factor (a proportion crudely obtained by dividing the circulation mean of the initial year into the circulation mean of the following year). This way comprised a technique for testing the predictiveness of Burrell's model. The second way was to use the method of moments to estimate both parameters for each year of circulation. As for Collection II, Tague and Ajiferuke employed a technique for estimating the parameters of the zero-truncated NBD, which its own inventor (Brass 1958, 59) described as suitable for exploratory work or to provide first-stage values for iterative maximum likelihood solutions. Tague and Ajiferuke then employed chi-square goodness-of-fit tests on the various circulation distributions, and in all cases the NBD was rejected as the appropriate model.
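The method of moments for the full NBD can be sketched briefly. This is a generic textbook version, assuming the parameterization in which the variance equals m + m²/k; it is not a reconstruction of Tague and Ajiferuke's computations, and it does not cover the zero-truncated estimator of Brass. The frequency table and the function names are hypothetical.

```python
def moments_from_frequencies(freq):
    """Mean and population variance from a {loans: number_of_items} table."""
    n_items = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n_items
    var = sum(f * (x - mean) ** 2 for x, f in freq.items()) / n_items
    return mean, var


def nbd_method_of_moments(freq):
    """Return (m, k), solving variance = m + m**2 / k for the shape k."""
    mean, var = moments_from_frequencies(freq)
    if var <= mean:
        raise ValueError("no overdispersion: moment estimate of k is undefined")
    return mean, mean ** 2 / (var - mean)


def crude_aging_factor(mean_initial_year, mean_following_year):
    """Ratio of the following year's circulation mean to the initial year's."""
    return mean_following_year / mean_initial_year


if __name__ == "__main__":
    # Hypothetical one-year table: 70 items never lent, 15 lent once, etc.
    table = {0: 70, 1: 15, 2: 8, 3: 4, 4: 2, 5: 1}
    m, k = nbd_method_of_moments(table)
    print("mean loans m =", m, " shape k =", k)
    print("aging factor =", crude_aging_factor(0.56, 0.42))
```

The estimate of k is defined only when the variance exceeds the mean, which is why the sketch raises an error for data that show no overdispersion.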

At this point it is necessary to pause to describe the general features of Collection I of the Saskatchewan circulation data and the results of Tague and Ajiferuke's tests upon it in order to bring into focus precisely what is at stake in these controversies. Collection I contained circulation data on 68,590 monographs, and in the first year, 1968–69, the zero class comprised 51,992 or 75.8% of the monographs in the set. Over the years, the zero class rapidly expanded until in the last year, 1977–78, it contained 63,251 or 92.2%. For the complete ten-year period, the mean of the zero class was 86.2% of the monographs in the set. Such a phenomenon is not unusual in library use. For example, in his seminal article on library use, Trueswell (1969) showed that 50% to 60% of library holdings satisfied 99% of circulation requirements.

During the 1970s Kent et al. found that 39.9% of the monographs acquired in 1969 by the Hillman Library at the University of Pittsburgh never circulated in the period from 1969 to 1975 and that in 6 branch science and technical libraries the zero class for journals ranged from a low of 63.1% in the Physics Library to a high of 93.2% in the Engineering Library. These zero classes can consume a considerable amount of resources, and the researchers found that the subscription costs of the zero class ranged from 47.9% of the Physics Library's serials budget to 86.5% of the Computer Science Library's serials budget (Kent et al. 1978, 61–62, 104–10; Kent et al. 1979, 9–104, 209–68; Flynn 1979). However, the Pittsburgh figures for serials might be overstated due to poor methodology. Whereas at Pittsburgh the Chemistry Library's serials zero class was estimated at 85.5% of the serials collection and its cost at 64.8% of the serials budget, a more careful study with a better sample by Chrzastowski (1991) at the University of Illinois at Urbana–Champaign (UIUC) Chemistry Library found the size of the zero class to be only 9% and its cost to be merely 3%.

In comparison to the zero class, the high-circulation class of the Saskatchewan data (defined here as 5 uses per year or more) was extremely small and shrank rapidly. In 1968–69, the top monograph circulated in the range of 17 to 19 times, but by 1977–78 the highest number of circulations for any monograph had fallen to 6. The size of the high-circulation class shrank in parallel with the fall of the upper limit. In 1968–69 the high-circulation class contained 2,011 (2.9%) of the monographs in Collection I, and by 1977–78 it had diminished to 39 (0.06%). Over the ten-year period the mean of the high-circulation class was 1.0% of the monographic set.

Tague and Ajiferuke's test of Burrell's NBD model with aging found that it underpredicted the zero class in 8 of the 10 years and overpredicted it in 2 of the 10 years. The absolute error rate for the zero class ranged from 141 (0.22%) of the predicted frequency to 5,671 (10.4%) of the predicted frequency, averaging out to 3.1% for the 10 years. However, viewed from the perspective of the entire set of 68,590 monographs, the picture drastically changes. The highest absolute error of 5,671 was then 8.3%, and the average absolute error rate was 1,747 monographs (2.6%). Burrell's model was much less accurate with respect to the high-circulation class, and this is not surprising, given the much smaller size of this class. His model underpredicted the high-circulation class in all 10 years, and its error rate ranged from 10.1% to 228.6%, tending to grow larger as the high-circulation class became smaller. The average error rate in predicting the high-circulation class was 92.7%. However, once again, viewing the error rate from the perspective of the entire set of 68,590 monographs radically alters the picture. The highest underprediction was 605 (merely 0.9%), and the average underprediction was 177.4 (only 0.3%) of the total set.

The standard NBD without aging performed much better in Tague and Ajiferuke's tests, and this is understandable, because the parameters were estimated for each year without the element of predictiveness in Burrell's model. With respect to the zero class, the standard NBD's expected frequencies were consistently below the observed frequencies in all of the years. These differences ranged from 62 (0.1%) to 1,258 (2.3%), resulting in an average underestimation of 0.8%. Needless to say, the perspective from the entire set of 68,590 monographs leads to a much different assessment. From this viewpoint, the largest underprediction of 1,258 was only 1.8%, and the average underprediction error of 458.4 equaled only 0.7% of the entire set.

The performance of the standard NBD on the high-circulation class resembled that of Burrell's model, being much more erratic here, but its error rate was much smaller. Out of the 10 years, the expected frequencies were under the observed frequencies for 6 years, over for 3, and exactly correct for 1. The absolute error rate of the standard NBD on the high-circulation class ranged from 0.0% to 16.9%, averaging out to 8.8%. This error rate drops considerably when the entire set of 68,590 monographs is taken into account. In these terms, the highest absolute error of 268 equaled 0.4% of the set, whereas the average absolute error of 65.5 amounted to only 0.1% of the set. If the authors of this paper could bet the ponies or play the stock market with such odds, they would not be writing this paper! Moreover, without going into the highly technical question of the choice of estimators, it should be pointed out that Tague and Ajiferuke were running their tests on a global database without any division into well-defined subject sets, a procedure that Brownsey and Burrell (1986) as well as Ravichandra Rao (1990) have shown would have very possibly led to far better fits to the NBD.
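The change of perspective in the preceding paragraphs is a matter of re-basing each error count on the full set of 68,590 monographs instead of on the frequency of the class concerned. The short sketch below repeats that division for several of the counts quoted above; the labels and the function name are ours, and the counts are copied from the text.

```python
TOTAL_MONOGRAPHS = 68_590  # size of Collection I, as stated in the text

# Error counts quoted in the text, measured in monographs.
errors = {
    "aging model, zero class, largest error": 5_671,
    "aging model, high class, largest underprediction": 605,
    "aging model, high class, average underprediction": 177.4,
    "standard NBD, zero class, largest underprediction": 1_258,
    "standard NBD, zero class, average underprediction": 458.4,
    "standard NBD, high class, largest error": 268,
    "standard NBD, high class, average error": 65.5,
}


def share_of_set(count, total=TOTAL_MONOGRAPHS):
    """Express an error count as a percentage of the whole monograph set."""
    return 100.0 * count / total


if __name__ == "__main__":
    for label, count in errors.items():
        print(f"{label}: {share_of_set(count):.1f}%")
```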

The studies of Gelman and Sichel as well as of Tague and Ajiferuke provoked an outburst of exasperation from Burrell. Pointing to the general predictive success of his model in the tests by Tague and Ajiferuke, he concentrated his fire on them and wrote (1990, 166):

Burrell concluded with the declaration (p. 167):

Nevertheless, he went on to incorporate loan periods in his library book circulation model (Burrell and Fenton 1994).

The last word in this controversy will be given to several library school students in Belgium, which has become a center of informetrics due to the efforts of Egghe and Rousseau. As part of "bibliometric field work" for a course taught by Rousseau at the University of Antwerp, Leemans et al. (1992) collected book circulation data from several Flemish public libraries and fitted the NBD to it. They also sent one data set to Sichel to be fitted to the BBD. Although the BBD better fit the data, the students decided in favor of the NBD, pointing out that two attitudes are possible in the study of circulation data. The first is that of a statistician trying to fit the data as precisely as possible. In that case the NBD will often not be good enough, and more complicated models with more and more parameters will be necessary. The second attitude is to admit that there is more variation than simple statistics can explain and admit some discrepancies at the high end of the distribution. In this case simple statistics such as the NBD yield excellent trend curves, which are all the practicing librarian really needs. At the conclusion of their paper the students recommended that the NBD be taught in introductory library management courses.

It appears from the above literature survey that the NBD is a workable general probability distribution for library and information science. Therefore, if one finds a highly and positively skewed distribution in such work, one may operate under the assumption that one is dealing with the NBD or, if not precisely the NBD, a probability distribution closely related to it and modeling the same, often interacting processes of qualitative inhomogeneity and cumulative advantage. External monographic circulation might well be a special case, and even here the NBD works reasonably well. Gelman and Sichel themselves recommend mixed Poisson distributions for journal and in-library use. Therefore, for most purposes, practitioners can limit themselves to the simple index of dispersion test (Elliott 1977, 40–44), and, if the variance is found to be significantly greater than the mean (and it almost invariably is, indicating a contagious distribution), one only has to carry out the proper logarithmic transformations and proceed to other questions.
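A variance-to-mean check of this kind is short enough to sketch. The version below is a common textbook form and is offered as an assumption, since the exact procedure in Elliott is not reproduced here: the statistic (n - 1)s²/mean is referred to chi-square with n - 1 degrees of freedom, and for large samples the normal approximation d = sqrt(2χ²) - sqrt(2ν - 1) is compared with 1.96. The log(x + 1) transformation and the sample data are likewise illustrative choices.

```python
import math


def index_of_dispersion(counts):
    """Return (variance/mean ratio, chi-square statistic, normal deviate d)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((x - mean) ** 2 for x in counts) / (n - 1)  # sample variance
    ratio = var / mean
    chi_square = ratio * (n - 1)
    # Large-sample approximation: d is roughly standard normal under Poisson.
    d = math.sqrt(2.0 * chi_square) - math.sqrt(2.0 * (n - 1) - 1.0)
    return ratio, chi_square, d


def log_transform(counts):
    """log10(x + 1), so that zero counts remain defined."""
    return [math.log10(x + 1) for x in counts]


if __name__ == "__main__":
    # Hypothetical use counts for 100 items, most of them never used.
    counts = [0] * 70 + [1] * 15 + [2] * 8 + [3] * 4 + [4] * 2 + [5] * 1
    ratio, chi_square, d = index_of_dispersion(counts)
    print("variance/mean =", round(ratio, 3))
    print("chi-square    =", round(chi_square, 1), "on", len(counts) - 1, "df")
    print("d             =", round(d, 2), "(contagious if d > 1.96)")
```

A ratio well above 1 with d above 1.96 points to a contagious distribution, at which point the counts can be transformed before further analysis.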

Much of the work described above was done by statisticians trying to solve the problem mathematically without either proper set definitions or reference to the sociological factors in human knowledge. Library use is strongly affected by these sociological factors, which comprise not only the Kuhnian concept of the "paradigm" but also the social bases of ST value. The case for the NBD is strengthened by the fact that the social bases of ST value are measured by such variables as peer ratings and citation rates, which are not subject to the periodicity limits of external library monographic circulation. It is to the problem of the social bases of ST value that attention is now directed.


Copyright © 1997-2009 LSU Libraries
URL: http://www.lib.lsu.edu/collserv/lrts/ST10.html