The AAPG/Datapages Combined Publications Database

AAPG Bulletin

Abstract


Volume: 73 (1989)

Issue: 8. (August)

First Page: 967

Last Page: 976

Title: Estimating Potential for Small Fields in Mature Petroleum Province

Author(s): John C. Davis (2), Ted Chang (3)

Abstract:

A histogram of the number of fields discovered in a basin, plotted in categories of increasing volumes of fields, is called a field-size distribution. Its shape reflects the parent size distribution of oil pools that exist in the basin, the efficiency of the discovery process, and economic constraints that limit the development of extremely small pools. Typically, field-size distributions are approximately lognormal in form, with a pronounced tail extending to the larger field sizes on the right. In a maturely explored basin, the right tail of the field-size distribution will closely correspond to the shape of the underlying distribution of pools originally in place because almost all larger fields will have been discovered. The shape of the left tail, however, reflects economic truncation. Increases in crude oil prices may shift the point of economic truncation to the left, so previously submarginal discoveries may be placed into production. Most of the remaining undiscovered economic potential of mature basins may lie in these submarginal pools.

The size distribution of pools originally in place in a basin is not directly observable, but attempts have been made to infer its form from discovery process models applied to maturely explored basins. Models that postulate that the original pool-size distribution is J-shaped forecast that extremely large numbers of submarginal-size pools await discovery. These optimistic predictions have been cited in state and national petroleum policy statements. Statistical analyses of distributions of discovered fields in two mature areas indicate that J-shaped models of the pool-size distribution cannot be justified from available data. Information contained in the size distribution of discovered fields is not adequate to predict the number of submarginal pools remaining in the basin.

Text:

INTRODUCTION

The sizes of oil fields discovered in a petroleum province occur in a characteristic pattern. The greatest number of fields are relatively small in size, and there are decreasing numbers of larger fields. When field size is plotted as a graph with number of fields discovered as the ordinate and increasing sizes of fields to the right along the abscissa, the resulting curve is asymmetric, with a long tail extending to large field sizes. Typically, the curve drops abruptly on the left because very small discoveries tend to be uneconomical and do not become producing fields. In effect, the distribution is truncated by monetary considerations; the minimum size of discovery that can be placed into production is called the economic limit. Arps and Roberts (1958) first formally described the characteristic form of the distribution of oil-field sizes, based on their study of the Denver-Julesburg basin of Colorado. Subsequently, other studies found that many mature petroleum provinces have similar field-size distributions (Kaufman, 1963; Drew and Griffiths, 1965; Griffiths, 1966; McCrossan, 1969).

If field sizes are transformed by taking their logarithms, or the field-size distribution is plotted on semilog paper, the right tail is compressed, the left side is expanded, and the curve tends to become symmetrical. Figure 1A shows a histogram of the sizes of fields discovered in the Denver basin, and Figure 1B shows the same data plotted on a logarithmic scale. Kaufman (1963) was perhaps the first to point out the resemblance to the normal, or Gaussian, probability distribution, leading to speculation that field-size distributions are lognormal in form. Other investigators have also invoked the lognormal model to describe field-size distributions (Drew, 1972; Kaufman et al, 1975; Meisner and Demirmen, 1981; Lee and Wang, 1983; Forman and Hinde, 1985).
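The logarithmic binning described above can be sketched in a few lines. This is a minimal illustration, assuming that size class n holds fields of at least 2^(n-1) × 1,000 bbl and less than 2^n × 1,000 bbl (so that 256,000 bbl opens class 9, as in the figure captions); the class boundaries, the function name `size_class`, and the sample volumes are assumptions for illustration, not data from the paper.

```python
import math
from collections import Counter

def size_class(volume_bbl):
    """Log base-2 size class n with 2**(n-1) <= volume/1000 < 2**n.

    Volumes below 1,000 bbl are lumped into class 0 (an assumption).
    """
    if volume_bbl < 1000:
        return 0
    return math.floor(math.log2(volume_bbl / 1000)) + 1

# Hypothetical field volumes in barrels, for illustration only.
volumes = [1_500, 3_200, 40_000, 90_000, 256_000, 300_000, 1_200_000]

# Histogram on the logarithmic scale: count of fields per size class.
histogram = Counter(size_class(v) for v in volumes)
for n in sorted(histogram):
    print(f"class {n:2d}: {histogram[n]} field(s)")
```

Binning on a base-2 scale compresses the long right tail, which is why the arithmetic and logarithmic histograms of the same data look so different.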

However, the field-size distribution describes only discovered fields, not the endowment of oil pools that originally existed in the basin or petroleum province. The lognormal is a model of the observed distribution of field sizes; the relevant distribution for evaluating the remaining potential of a region is the distribution that describes the sizes of remaining undiscovered pools. In the early stages of exploration, the distribution of remaining undiscovered pools is essentially identical to the original distribution of sizes of pools initially in place. As exploration proceeds, the remaining undiscovered distribution is the difference between this original distribution and the observed distribution of discovered fields. At maturity, the original distribution and the observed distribution become more nearly alike, especially in the right tail, and the remaining undiscovered distribution is confined to the less heavily sampled left side (Figure 2). Many discovery process models try to estimate the form of the remaining undiscovered distribution of a mature basin by simulating the effects of preferentially exploring for and discovering the larger fields (Arps and Roberts, 1958; Meisner and Demirmen, 1981; Schuenemeyer and Drew, 1983).

Obviously, the form of the remaining undiscovered field-size distribution of a mature basin depends equally upon the forms of the original distribution of pools and the distribution of discovered fields. Unfortunately, while the latter is known exactly, the former can only be hypothesized. Although Kaufman and others have speculated that the sizes of oil accumulations originally in place might form a lognormal distribution, no one has suggested a plausible reason why oil pools might occur in this way. If the original pool-size distribution is presumed to be lognormal, the implications for continued discoveries are quite different than if the original pool-size distribution in a basin is assumed to be J-shaped. For a fixed, economically determined minimum size of field, Newendorp (1975) pointed out that the differences are of purely academic interest since the curve to the left of the economic limit will never be sampled. However, if the economic limit changes, the differences between the two models can be profound because a J-shaped pool-size distribution will yield many more small fields than will a lognormal distribution.

The extreme changes in crude oil prices since 1973 have dramatically affected the worth of marginal discoveries. Wells that would have been abandoned as dry before 1973 were completed and placed in production in later years. In any area, marginal-size discoveries are most probable, but in a mature province, they constitute almost all of the remaining exploration potential. This means forecasts of the remaining potential of a mature basin or petroleum province must consider two imponderables: the future price of crude oil, because this will change the future economic limit, and the shape of the original pool-size distribution to the left of the present economic limit.

The 1986 collapse of oil prices abruptly shifted the economic limit to the right, and exploration in mature regions within the United States essentially ceased. The sudden halt of exploration activity severely depressed the economy of many states, especially in the Mid-Continent and Gulf Coast. Oil prices remain depressed, and the petroleum industry, the public, and particularly state governments are apprehensive about the future of domestic oil exploration. In turn, this has led to renewed attempts to evaluate petroleum potential, especially the potential of economically sensitive deposits of marginal size. Future economic policies on corporate, state, and national levels are being formulated in part on the basis of these assessments (e.g., Meyer and Fleming, 1985; Fisher and Finley, 1986; Fisher, 1987). This paper examines, for maturely explored basins, the statistical validity of the estimated pool-size distributions upon which these assessments are founded.

Fig. 1. Histogram of sizes of oil fields discovered in Denver basin through 1986. (A) Plotted on arithmetic scale in 50,000-bbl size classes. (B) Plotted on base 2 logarithmic scale.

Fig. 2. Relationship between size distribution of pools originally in basin, observed or known size distribution of discovered fields, and distribution of remaining as-yet-undiscovered pools.


J-SHAPED PARETO DISTRIBUTION

The interpretation that very large numbers of marginal-sized fields await discovery after the price of crude oil rises and the economic truncation point shifts to the left is based in part on statistical analyses that suggest the original pool-size distribution is J-shaped rather than lognormal (Schuenemeyer and Drew, 1983). A J-shaped distribution implies that the smaller the volume of oil in an accumulation, the greater the number of such accumulations in the basin. Since assumptions about the shape of the left side of the pool-size distribution curve are critical to forecasts of the quantity of oil that might be discovered and produced economically from mature petroleum provinces in the United States, we need to examine these analyses and see if they are substantiated by alternative procedures.

Among the J-shaped population models that have been considered are the exponential and Pareto distributions (for mathematical development, see Appendix 1). The Pareto distribution is an empirical function originally used to describe the inverse relationship between the size of income and the number of persons having that income. The distribution is now used to represent many J-shaped inverse frequency distributions. Zipf's law, used to model the sizes of mineral deposits, is a special case of the Pareto distribution (Arnold, 1985). In addition, Davis (1987) reported preliminary investigations of the hyperbolic distribution (familiar as a widely used model of production decline curves), the mean-shifted lognormal distribution, and the mean-shifted log-gamma distribution, which includes as special cases both the lognormal and Pareto distributions.

Schuenemeyer and Drew (1983) do not base their interpretation that the original pool-size distribution is J-shaped directly on the observed distribution of the number of fields actually discovered in each size class, but rather on a distribution of the hypothetical number of fields that will ultimately be discovered in each size class. This "ultimate number" is estimated by a discovery process model such as the negative exponential model of Arps and Roberts (1958) or the area of influence model of Drew et al (1980). Both procedures express the ultimate number of fields in a size class as a function of the number of fields already discovered, the proportion of the basin occupied by these fields, and the proportion of wildcat wells that led to their discovery. The drilling efficiency, a coefficient derived from the discovery history, must also be estimated. Only the right tail of the distribution of ultimate numbers of fields can be estimated because the procedure does not work for those size classes where economic truncation has occurred (Schuenemeyer and Drew, 1983).

Figure 3 is based on Schuenemeyer and Drew's analysis of the Texas Permian basin and shows both the observed field-size distribution and the estimate, from their discovery process model, of the "ultimate field-size distribution." (Their table is given in log base 2 field-size classes, which are also used in Figure 3.) The influence of the estimated values will far outweigh any effect of the actual observations if a fitted model is used to describe field-size distribution. This introduces a troublesome question: Is their interpretation of the original pool-size distribution as being J-shaped based on the characteristics of the observed distribution of field sizes or on the behavior of their discovery process model?

DIRECTLY FITTING THE FIELD-SIZE DISTRIBUTION

Uncertainty about the effect of the discovery-process model can be avoided if we assess the pool-size distribution using only observed data, not controvertible estimates of future discoveries. To do this, we must assume that the observed and ultimate distributions coincide in the larger field-size classes. Then, models of pool-size distributions can be fitted to the known upper tail and extrapolated into the zone of economic truncation on the left. If there are increasing numbers of smaller deposits, presumably a J-shaped model will provide a statistically significant better fit than will a lognormal or alternative model. Conversely, if a J-shaped model does not fit significantly better than an alternative, we must conclude that the large field-size data themselves do not contain enough information to forecast the form of the pool-size distribution in the smaller size region and that the results of Schuenemeyer and Drew are therefore based on the mathematical characteristics of the discovery process model itself.

Fig. 3. Field-size distributions for Permian basin, Texas. Black columns represent observed distribution of discovered fields. Ruled columns represent "ultimate field-size distribution" estimated by a discovery process model. Data from Schuenemeyer and Drew (1983).

Fig. 4. Percentage of fields containing 256,000 bbl or more of oil discovered in northern Central Kansas uplift in successive five-year periods. Note smallest bar is for two-year period (1986-1987).

This testing procedure is valid only if we are reasonably certain that all large fields have been discovered and the observed field-size distribution is complete on the right side. This is true only in very mature regions, but these are the ones for which the question of remaining potential in the (currently) subeconomic size classes is critical. For comparison, the method is applied to data from the Denver basin, one of the areas studied by Schuenemeyer and Drew, and the northern part of the Central Kansas uplift. Two alternative models, one J-shaped (Pareto) and one lognormal, are fitted to the observed field-size distributions. Size data for fields discovered in the Denver basin (through 1986) and in the northern Central Kansas uplift (through 1987) are given in Table 1. In the following analyses, we assume that essentially all fields above a certain size (usually 256,000 bbl) have been found. For the central Kansas data, we can check this assumption with Figure 4, which shows for successive 5-year intervals the new fields that contained 256,000 bbl or more of oil as a percent of all fields discovered.

To test the fit of field-size distributions to alternative theoretical models, the distributions must be expressed in probabilistic terms. Then we evaluate whether we can realistically expect the observed distribution of field sizes to have been drawn from a hypothetical population represented by the models. Let V be the volume or size of a discovered field, f(v) an assumed probability density for the parent population of pool sizes, and v0 an arbitrary truncation point. The conditional density f(v | v ≥ v0) of V, given that V ≥ v0, is the probability density of the fields whose size is at least v0. (The precise definition of the frequency distribution is given in Appendix 1.)

If we assume that all fields equal to or larger than v0 in size have been found, we can test the adequacy of f(v) as a model for the distribution of field volumes by examining the goodness of fit of the conditional density f(v | v ≥ v0) to the observed distribution of the sizes of fields larger than v0. To examine the goodness of fit, we use a χ² test because of its wide use and easy interpretation. More importantly, the χ² test is optimal when the data are given in broad categories or size classes. Categorizing the field-size data into size classes reduces the sensitivity of the analysis to uncertainties in the ultimate yields from still-producing fields.

Generally, an assumed field-size distribution depends upon one or more unknown parameters. For example, the Pareto distribution, defined as

$$f(v) = \frac{\theta\,\alpha^{\theta}}{v^{\theta + 1}}, \qquad v \ge \alpha > 0 \tag{1}$$

has two unknown parameters: α and θ. The parameter α represents the minimum conceivable field size. No fields can exist that are smaller than α; however, α must be greater than zero. θ expresses the rate of decline of the distribution. The conditional density function of the Pareto has only the single unknown parameter θ, since the truncation level v0 is preset:

$$f(v \mid v \ge v_0) = \frac{\theta\,v_0^{\theta}}{v^{\theta + 1}}, \qquad v \ge v_0 \tag{2}$$

We fitted a conditional Pareto distribution to the field-size data from the Denver basin using a truncation value of v0 = 256,000 bbl. The unknown parameter, θ, was estimated by a minimum χ² estimate (see Appendix 1). The best estimate of the decline-rate parameter, θ, is 0.709, with a goodness of fit χ² of 46.91. This value of χ² is significant at a level much less than 0.0001, leading to the conclusion that the Pareto distribution does not adequately represent the Denver basin field-size data.
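A minimum χ² fit of this kind can be sketched as follows. The sketch assumes the conditional Pareto tail P(V ≥ v | V ≥ v0) = (v0/v)^θ, doubling size classes starting at 256,000 bbl with an open-ended top class, and a simple grid search over θ; the counts are hypothetical (Table 1 is not reproduced here), and the function names are illustrative rather than the authors' procedure.

```python
import math

def pareto_class_probs(theta, lower_edges):
    """Class probabilities for a Pareto tail truncated at lower_edges[0].

    lower_edges holds the lower bound of each size class; the last class is
    open-ended, so the probabilities sum to 1.
    """
    v0 = lower_edges[0]
    surv = [(v0 / e) ** theta for e in lower_edges] + [0.0]  # P(V >= e | V >= v0)
    return [surv[i] - surv[i + 1] for i in range(len(lower_edges))]

def chi_square(observed, probs):
    """Pearson chi-square statistic for observed counts against class probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def fit_theta(observed, lower_edges, lo=0.05, hi=3.0, steps=2951):
    """Grid search for the theta that minimizes the chi-square statistic."""
    best_theta, best_chi2 = None, math.inf
    for i in range(steps):
        theta = lo + i * (hi - lo) / (steps - 1)
        c = chi_square(observed, pareto_class_probs(theta, lower_edges))
        if c < best_chi2:
            best_theta, best_chi2 = theta, c
    return best_theta, best_chi2

# Hypothetical counts for seven size classes at and above 256,000 bbl.
edges = [256_000 * 2 ** k for k in range(7)]
observed = [180, 110, 70, 45, 25, 12, 8]

theta_hat, chi2_min = fit_theta(observed, edges)
dof = len(observed) - 1 - 1  # classes - 1 - number of fitted parameters
print(f"theta = {theta_hat:.3f}, chi-square = {chi2_min:.2f}, dof = {dof}")
```

The degrees of freedom follow the usual rule of classes minus one minus the number of estimated parameters, so a two-parameter lognormal fitted to the same classes would have one fewer.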

Figure 5 shows the data from the Denver basin and the best-fitted Pareto distribution. Note in Figure 5B that the Pareto distribution is wildly optimistic and predicts an unreasonable number of large fields. The results might be dependent in some manner on the choice of 256,000 bbl as the truncation point. To evaluate this possibility, Table 2 gives the χ² statistics and θ (the estimated Pareto parameter) for different truncation points. Because χ² values based on different degrees of freedom cannot be directly compared, they are converted to p-values.

Fig. 5. Pareto distribution with best fit to field-size data from Denver basin. Size classes are given in log base-2 bbl, so size class n represents 2^n × 1,000 bbl. (A) Complete distribution covering entire range of size classes. Size classes larger than 256,000 bbl (size class 9) are fitted; smaller size classes are extrapolations. (B) Fit of Pareto distribution within largest size classes, showing tendency of Pareto to overestimate number of largest fields.

The p-value of a test statistic is the probability level at which the statistic is just significant. A χ² test statistic of 4.35 with 5 degrees of freedom, for example, has an associated p-value of 0.5 because it is the 50th percentile of the χ² probability distribution with 5 degrees of freedom, and the value defines a critical region containing half the probability distribution. Similarly, a test value of 11.07 has an associated p-value of 0.05 because it corresponds to the 95th percentile of the same distribution and only 5% of the probability distribution lies in the critical region to the right. However, if the Denver basin data set contained one more category of sizes of large fields, the test statistic would have an additional degree of freedom. Then, a χ² test statistic of 11.07 would have an associated p-value of 0.086 because 8.6% of the χ² distribution with 6 degrees of freedom lies to the right. For any test statistic, the p-values are uniformly distributed from 0 to 1 when the hypothesized model is correct, so they can be used to compare χ² values with different degrees of freedom. Since all the p-values for the Pareto fits in Table 2 are less than 0.05, changing the truncation point does not affect our conclusion that the Pareto distribution does not adequately represent the form of the field-size distribution in the Denver basin.
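The conversion from a χ² statistic to a p-value can be checked numerically. The following is a minimal sketch using only the Python standard library; it evaluates the upper-tail probability through the power series for the regularized lower incomplete gamma function, and the function name `chi2_sf` is an illustrative choice.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses P(a, z) = z**a * exp(-z) / Gamma(a) * sum_n z**n / (a (a+1) ... (a+n))
    with a = df / 2 and z = x / 2, then returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

# Statistics quoted in the text, with their degrees of freedom.
for stat, df in [(4.35, 5), (11.07, 5), (11.07, 6), (46.91, 5), (3.34, 4)]:
    print(f"chi-square = {stat:6.2f}, df = {df}: p = {chi2_sf(stat, df):.4g}")
```

Because the same statistic maps to different p-values as the degrees of freedom change, comparing p-values rather than raw χ² values is what makes fits at different truncation points comparable.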

Table 1. Number of Fields Discovered by Size Classes, Denver-Julesburg Basin and Northern Central Kansas Uplift

Fig. 6. Lognormal distributions fit to field-size data from Denver basin. Size classes larger than 256,000 bbl (size class 9) are fitted; smaller size classes are extrapolations. Although fits to largest size classes are almost perfect, small size classes are underestimated. Model 1 is minimum χ² fit with µ of 12.77 and σ of 1.55. Lognormal model 2 is also acceptable by χ² criterion and has µ of 11.99 and σ of 1.88.

Field size, V, has a lognormal distribution if the natural logarithm of V is normally distributed. The normal distribution is fully defined by two parameters: the mean µ and standard deviation σ. Therefore, the lognormal distribution is characterized by the same parameters, the mean µ and standard deviation σ of log V. Table 2 also shows the lognormal distribution fitted by the minimum χ² procedure to logarithms of field sizes in the Denver basin. Again, only the right tail of the observed field-size distribution was used. The p-values in Table 2 show that the lognormal provides a much better fit to the distribution of field sizes, even after compensating for the extra parameter in the lognormal model. When v0 equals 256,000 bbl, χ² is 3.34 with 4 degrees of freedom. This value is not significant, showing that the lognormal model fits the Denver basin data. The fitted lognormal curve (Figure 6) underestimates the number of fields in size classes 1-8, which were not used in the fitting process. The collection of lognormal models that also are acceptable by the χ² criterion includes many curves that produce much smaller underpredictions of these smaller field-size classes; an example is shown in Figure 6. In contrast, the Pareto distribution fitted with θ equal to 0.709 predicts that 22,492 fields with volumes between 1,000 and 256,000 bbl remain to be discovered, giving a total volume of undiscovered oil of 169 million bbl. (Here, we used a truncation point of α equal to 1,000 bbl. When extrapolated backward into the left tail, the Pareto distribution projects an increasing number of fields of small size. In fact, as α approaches 0, the projected number of fields and their total volume both go to infinity.)
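The way a Pareto tail inflates the count of small fields can be sketched directly. The snippet assumes the Pareto survival function P(V ≥ v) = (α/v)^θ, so that a tail holding a given number of fields at or above v0 implies (v0/α)^θ − 1 times as many fields between α and v0; the tail count of 100 fields is hypothetical and the function name is illustrative.

```python
def expected_fields_between(n_tail, theta, v0, alpha):
    """Expected number of fields with alpha <= V < v0 implied by a Pareto tail.

    n_tail is the number of fields with V >= v0; theta is the Pareto decline rate.
    """
    return n_tail * ((v0 / alpha) ** theta - 1.0)

theta = 0.709        # decline rate reported for the Denver basin fit
v0 = 256_000         # truncation point, bbl
n_tail = 100         # hypothetical number of fields at or above v0

# Shrinking the lower cutoff alpha makes the projected count grow without bound.
for alpha in (1_000, 100, 10, 1):
    n_small = expected_fields_between(n_tail, theta, v0, alpha)
    print(f"alpha = {alpha:5d} bbl: about {n_small:,.0f} fields below v0")
```

Each field observed in the fitted tail stands for roughly fifty projected fields between 1,000 and 256,000 bbl at this value of θ, which is why the extrapolation is so sensitive to the assumed form of the distribution.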

Data on field sizes in central Kansas (Table 1) were fitted by the χ² procedure, yielding the results shown in Table 3 for the Pareto and lognormal distributions. The p-values provide little evidence to suggest that one model is superior. The lognormal models fitted with different values of v0 yield quite different estimates of the lognormal parameters, probably because the uninformative nature of the right tail of the Kansas data makes the estimates unstable. Graphs of the Pareto and lognormal models fitted to the central Kansas field-size data, using a truncation point of v0 equal to 256,000 bbl, are shown in Figure 7. Although the fit of the alternatives is similar for the larger field-size classes, they predict substantially differing numbers of fields in the smaller size classes. In fact, the collection of lognormal models acceptable by the χ² criterion includes many that predict even fewer small fields (see Figure 7).

We can gain some insight into the relative merits of the alternative models of field-size distributions by fitting a more broadly defined model to the data. We chose a shifted log gamma distribution (see Appendix 1) defined by an exponential equation with three parameters: threshold parameter a, scale parameter θ, and shape parameter β. These roughly correspond to the smallest conceivable field size, an average size value, and a skewness measure. We are primarily interested in β. If β equals 1, the log gamma distribution is a Pareto distribution; as β approaches infinity, it approaches the lognormal. For even moderate values of β, say around 10, the log gamma distribution is essentially indistinguishable from the lognormal distribution (Figure 8). Therefore, both alternatives for characterizing field-size distributions are special cases of the log gamma distribution for V. The ability of the log gamma distribution to model a wide variety of functions led to our choosing it as a model. In the absence of a physical theory that dictates an appropriate form for the parent pool-size distribution, it is important to choose a model which can exhibit an appropriate assortment of forms for the distribution function.
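The two limiting cases can be illustrated numerically. The sketch below assumes the rate form of the shifted gamma density for X = log V, f(x) = θ^β (x − a)^(β−1) exp[−θ(x − a)] / Γ(β) for x > a, which is one common parameterization and may differ from the authors' exact notation; the parameter values are arbitrary.

```python
import math

def shifted_gamma_pdf(x, a, theta, beta):
    """Shifted gamma density for X = log V (rate form, assumed parameterization)."""
    if x <= a:
        return 0.0
    t = x - a
    log_pdf = beta * math.log(theta) + (beta - 1.0) * math.log(t) - theta * t - math.lgamma(beta)
    return math.exp(log_pdf)

def shifted_exp_pdf(x, a, theta):
    """Shifted exponential density for X = log V, which makes V Pareto."""
    return theta * math.exp(-theta * (x - a)) if x > a else 0.0

def gamma_skewness(beta):
    """Skewness of a gamma distribution; it depends only on the shape parameter."""
    return 2.0 / math.sqrt(beta)

a, theta = 0.0, 0.7  # arbitrary illustrative values

# beta = 1 reproduces the shifted exponential (Pareto in V).
for x in (0.5, 1.0, 3.0):
    print(x, shifted_gamma_pdf(x, a, theta, 1.0), shifted_exp_pdf(x, a, theta))

# Skewness shrinks toward 0 (the symmetric normal shape) as beta grows.
for beta in (1, 5, 10, 40, 1000):
    print(f"beta = {beta:5d}: skewness of log V = {gamma_skewness(beta):.3f}")
```

Treating β as a fitted quantity lets the data indicate where between the J-shaped and lognormal extremes the tail sits, instead of fixing the answer in advance.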

For the Denver basin field-size data, β equals 5.61 and χ²min(β) is 1.26 with 3 degrees of freedom. The χ² value is not significant, indicating no difference between the observed distribution and a conditional log gamma model. In addition to the lognormal model fitted previously, the field-size distribution in the Denver basin can be modeled by a conditional log gamma model having characteristics between the Pareto and lognormal distribution (see Figure 8).

In tests of field-size data from the northern Central Kansas uplift with v0 of 256,000 bbl, the best-fit value of β is 38.30 with an associated χ² value of 2.52 with 4 degrees of freedom. Again, the test statistic is not significant, indicating the central Kansas field-size data can be adequately modeled by a conditional log-gamma distribution; the best-fitting model has characteristics essentially the same as a lognormal distribution.

An additional step for examining the alternative values of the shape parameter, β, is to compute its confidence interval. Suppose we fit various gamma distributions to field-size data, holding β constant but allowing a and θ to vary; we can then find a minimum value χ²min(β) that represents the goodness of fit of the model with the specified value of β that best fits the data. Then a 95% confidence interval for β is the collection of β for which

$$\chi^2_{\min}(\beta) - \min_{\beta'} \chi^2_{\min}(\beta') \le 3.84 \tag{3}$$

The value of 3.84 is the 95th percentile of a χ² distribution with 1 degree of freedom.
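The interval construction can be sketched as a filter over a profile of minimized χ² values. The sketch assumes the interval keeps every β whose χ²min(β) lies within 3.84 of the smallest value on the profile; the profile numbers below are hypothetical and do not come from Figure 9.

```python
def profile_interval(chi2_min_by_beta, cutoff=3.84):
    """Return the beta values whose profile chi-square is within cutoff of the best fit."""
    best = min(chi2_min_by_beta.values())
    return sorted(b for b, c in chi2_min_by_beta.items() if c - best <= cutoff)

# Hypothetical profile: beta mapped to the chi-square minimized over a and theta.
profile = {1.0: 9.0, 2.0: 4.0, 5.0: 1.3, 10.0: 1.5, 50.0: 2.0}

accepted = profile_interval(profile)
print("beta values inside the 95% interval:", accepted)
print("Pareto (beta = 1) inside interval:", 1.0 in accepted)
```

A profile that stays flat over the whole range of β, as reported for the central Kansas data, would return every value and so could not separate the Pareto from the lognormal.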

Figure 9A is a graph of χ²min for field-size data from the Denver basin. The plot shows the 95% confidence interval, which consists of all values of β greater than 1.81. Note that the confidence interval does not include the Pareto distribution.

Field-size data for the northern Central Kansas uplift yield the χ²min plot shown in Figure 9B. The 95% confidence interval includes all possible values of β. We see again the uninformative nature of the right tail of this field-size data set. On the basis of the distribution of the sizes of larger fields already discovered, we cannot determine if the pool-size distribution in this area is J-shaped, lognormal, or somewhere between the two.

Table 2. Minimum χ² Estimated Pareto and χ² Estimated Lognormal Fits to Right Tail of Denver Basin Observed Field-Size Distribution


Thus, the observed distribution of sizes of discovered fields in the Central Kansas uplift cannot reliably predict the number of undiscovered economically submarginal pools. Fitting a general family of distributions that includes both J-shaped and lognormal alternatives indicates a wide range of possible shapes may equally well represent the original pool-size distribution.

CONCLUSIONS

Uncertainties of the present economic climate, coupled with declining reserves in many domestic petroleum provinces, have stimulated interest in estimates of the number of submarginal reservoirs in mature basins. With changes in the economy and price of crude oil, these unexploited resources may become economical. Because of their small volume, individual reservoirs in the subeconomic category cannot contribute significantly to petroleum reserves; but if vast numbers of such deposits can be discovered, the impact on reserves may be profound.

Fig. 7. Pareto and lognormal distributions with best fits to field-size data from northern Central Kansas uplift. Size classes are given in log base-2 bbl, so size class n represents 2^n × 1,000 bbl. (A) Complete distributions covering entire range of size classes. Size classes larger than 256,000 bbl (size class 9) are fitted; smaller size classes are extrapolations. Pareto model is minimum χ² fit with θ of 0.713. Lognormal model 1 is minimum χ² fit with µ of 2.15 and σ of 1.24. Lognormal model 2 is also acceptable by χ² criterion and has µ of 7.53 and σ of 3.23. (B) Fit of best Pareto distribution within largest size classes. (C) Fit of two alternative lognormal distributions within largest size classes.

Fig. 8. Shapes of gamma distribution for various values of parameter β. Distributions have been standardized to a common scale and arbitrary mean µ for comparison. (A) Selected gamma distributions from range β = 1 to β = ∞. (B) Selected gamma distributions from range β = 1.0 to β = 1.5. Note horizontal axis has been expanded to include only shaded interval shown on (A).


Unfortunately, there is no direct evidence of the numbers of such small oil accumulations, since by definition subeconomic deposits have not been developed. Their abundance must be inferred from the number of deposits of economic size discovered in the same basin, i.e., from the observed field-size distribution. However, an extremely optimistic extrapolation into the small size range can be produced by assuming the pool-size distribution curve is J-shaped or, conversely, a pessimistic extrapolation can be made by assuming the curve is lognormal. Some studies have attempted to compensate for the less-than-total discovery of all presently economic fields by using a discovery process model. However, this leads to a prediction of the number of subeconomic pools based almost entirely on estimates from the discovery process model itself, not on the actual number of discovered fields. These predictions result in J-shaped pool-size curves which may be modeled by the Pareto distribution. The procedure inevitably produces estimates of very large numbers of undiscovered pools of subeconomic size.

In the absence of any physical theory to derive a hypothetical pool-size distribution, it seems a tremendous leap of faith to assume that a fairly restrictive family of distributions, such as the Pareto, can be used to model all petroleum basins. The Pareto family of curves has essentially only one parameter--a rate of decline. Its mathematical characteristics embody the following implicit assumption: The probability that V is actually rv, given that it is at least v, does not depend upon v.

For example, if r, the arbitrary multiplier, is 100, the probability that a reservoir which is known to contain at least 1,000 bbl actually contains 100 × 1,000 = 100,000 bbl is the same as the probability that a reservoir known to contain at least 100,000 bbl actually contains 100 × 100,000 = 10,000,000 bbl. Such an assumption is debatable at best. It seems more reasonable that the probability should decrease with increasing v. This characteristic of the Pareto distribution accounts for the overestimation of the large field-size classes shown in Figure 5B and the increase in θ with increases in v0 seen in Tables 2 and 3.
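The scale-free property described here can be written out explicitly. The sketch below assumes the Pareto survival function P(V ≥ v) = (α/v)^θ for v ≥ α, and it reads "actually contains rv" as "contains at least rv", since a continuous distribution assigns zero probability to any single value; both are interpretive assumptions rather than statements from the text.

```latex
\begin{aligned}
P(V \ge rv \mid V \ge v)
  &= \frac{P(V \ge rv)}{P(V \ge v)}
   = \frac{(\alpha / rv)^{\theta}}{(\alpha / v)^{\theta}}
   = r^{-\theta},
  \qquad r \ge 1,\ v \ge \alpha .
\end{aligned}
```

The result does not involve v. With r = 100 and the Denver basin estimate θ = 0.709, the ratio is 100^(-0.709), or about 0.038, whether the starting size is 1,000 bbl or 100,000 bbl.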

A more prudent approach is to assume a wider family of pool-size distributions, such as the (shifted) log gamma, which can assume a variety of shapes. By fitting the model only to the essentially complete right tail of the known field-size distribution, the complications introduced by a discovery process model are avoided. The p-value of the χ² statistic can help determine the relative goodness of fit of alternative forms of the family of distributions. Field-size data from the Denver basin indicate that the distribution in that petroleum province is not a J-shaped Pareto distribution and that the Pareto predicts an unwarranted number of subeconomic petroleum deposits. A similar analysis of the northern Central Kansas uplift shows that the alternative forms for the field-size distribution cannot be distinguished. The right tail of the observed field-size distribution contains too little information to predict the form of the left tail.

Table 3. Minimum χ² Estimated Pareto and χ² Estimated Lognormal Fits to Right Tail of Observed Field-Size Distribution for Northern Central Kansas Uplift

Fig. 9. χ²min(β) vs. β. (A) Field-size data from Denver basin; 95% confidence interval (shaded) includes all values of β greater than 1.81. (B) Field-size data from northern Central Kansas uplift; 95% confidence interval (shaded) includes all possible values of β.

Although results from the Central Kansas uplift are inconclusive, they offer the valuable lesson that the possible number of undiscovered subeconomic deposits cannot be predicted reliably from available data. Economic analyses that assume the potential discovery of extremely large numbers of small deposits should be regarded as suspect. Comparison of results from the Denver basin with those from Kansas suggests that it is simplistic to assume a single, restricted model can adequately represent the pool-size distribution in different petroleum provinces, even if their observed field-size distributions appear to be similar in shape. The problem of predicting the number of small deposits that remain to be discovered is unsolved, even for extremely mature basins, although we would like to believe that vast numbers of small pools will be found, if only the price of crude oil will rise again.

APPENDIX 1

Let X = log V be the (natural) logarithm of the field size V. X is said to have a (shifted) exponential distribution if its density is

EQUATION (A1)

If X has the distribution in equation A1, then V has the Pareto distribution:

EQUATION (A2)

Θ defines the rate of decline of the distribution, α is the minimum conceivable field size, and a is the logarithm of α. Equation A1 assumes that no fields exist that are smaller than α. X is said to have a normal distribution, and V a lognormal distribution, if the density of x is

EQUATION (A3)

µ and σ² represent the mean and variance of X, the natural logarithm of the field size. Finally, X is said to have a (shifted) gamma distribution and V to have a log-gamma distribution if the density of x is

EQUATION (A4)

Here Γ(β) is the gamma function:

EQUATION (A5)

The parameter a represents the minimum log field size, and Θ is a rate of decline parameter. The interpretations of a and Θ are the same as in equation A1. β is a parameter that controls the shape of the gamma distribution. In particular, if β = 1, equation A4 reduces to equation A1, so that the (shifted) exponential distribution is a special case of the (shifted) gamma distribution. Alternatively, we may say that the Pareto distribution is a special case of the log-gamma distribution. The mean µ and variance σ² of the gamma distribution (equation A4) are given by

EQUATION (A6)
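The equation images are not reproduced in this text. As a hedged reading aid only, the following standard forms are consistent with the definitions given above (a rate-of-decline parameter written \Theta, a minimum log size a, and a shape parameter \beta). The rate parameterization is an assumption and may differ in detail from the authors' notation.

```latex
\begin{align*}
g(x) &= \Theta\, e^{-\Theta (x-a)}, \quad x \ge a
  && \text{(shifted exponential, cf. A1)} \\
g(x) &= \frac{\Theta^{\beta}}{\Gamma(\beta)}\,(x-a)^{\beta-1}\, e^{-\Theta (x-a)}, \quad x \ge a
  && \text{(shifted gamma, cf. A4)} \\
\Gamma(\beta) &= \int_{0}^{\infty} t^{\beta-1} e^{-t}\, dt
  && \text{(gamma function, cf. A5)} \\
\mu &= a + \frac{\beta}{\Theta}, \qquad \sigma^{2} = \frac{\beta}{\Theta^{2}}
  && \text{(cf. A6)}
\end{align*}
```

With \beta = 1, \Gamma(1) = 1 and (x-a)^{0} = 1, so the gamma form reduces to the exponential form, in agreement with the special-case statement in the text.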

As β → ∞, while a and Θ are adjusted to keep µ and σ² constant, the gamma distribution (equation A4) will approach the normal distribution (equation A3) (Figure 8). In other words, the lognormal distribution is a limiting case of the log-gamma distribution. A good source of information about these distributions is Johnson and Kotz (1970).
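The two limiting statements (shape 1 gives the exponential; large shape approaches the normal) can be illustrated with a short numerical sketch. It assumes the rate parameterization of the shifted gamma, in which the mean is a + beta/theta and the variance is beta/theta^2. All function names and the values of `mu` and `sigma2` are illustrative, not taken from the paper.

```python
import math


def shifted_gamma_pdf(x, a, theta, beta):
    """Density at x of a + Gamma(shape=beta, rate=theta)."""
    if x <= a:
        return 0.0
    y = x - a
    log_pdf = (beta * math.log(theta) + (beta - 1.0) * math.log(y)
               - theta * y - math.lgamma(beta))
    return math.exp(log_pdf)


def normal_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def gamma_params_for_moments(mu, sigma2, beta):
    """Choose a and theta so the shifted gamma has mean mu and variance sigma2."""
    theta = math.sqrt(beta / sigma2)
    a = mu - beta / theta
    return a, theta


def max_gap_to_normal(mu, sigma2, beta, n=401):
    """Largest absolute density difference on a grid spanning mu +/- 4 sd."""
    a, theta = gamma_params_for_moments(mu, sigma2, beta)
    sd = math.sqrt(sigma2)
    xs = [mu - 4.0 * sd + 8.0 * sd * i / (n - 1) for i in range(n)]
    return max(abs(shifted_gamma_pdf(x, a, theta, beta) - normal_pdf(x, mu, sigma2))
               for x in xs)


mu, sigma2 = 12.0, 2.0  # illustrative values only
for beta in (1.0, 10.0, 100.0, 1000.0):
    # The gap shrinks as beta grows while the mean and variance stay fixed.
    print(beta, max_gap_to_normal(mu, sigma2, beta))
```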

As would be expected from the shape of their density functions, the distributions of equations A1 and A3 project vastly different amounts of oil remaining in pools whose volumes are less than some v0. Let x0 = log v0. For a general density g(x), the total projected volume of oil from pools with volumes at most v0 is

EQUATION (A7)

where N is the total number of fields whose volumes exceed v0. If equation A7 is applied to the Pareto distribution, we obtain

EQUATION (A8)

If equation A7 is applied to the lognormal distribution, we obtain

EQUATION (A9)

where φ(x) is the standard normal cumulative distribution function. For the log-gamma case, equation A7 yields a formula (not given here) in terms of the incomplete gamma function. To estimate the remaining projected volume of oil in pools having volumes at most v0, the already-discovered volume of oil must be subtracted from equations A7, A8, and A9.
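The contrast between the two projections can be sketched in code. Because the equation images are not reproduced here, the sketch assumes one natural reading of equation A7: the total number of pools is N divided by P(X > x0), and the projected volume in pools of size at most v0 is that count times the integral of e^x g(x) up to x0. The closed forms below follow from that assumed reading and are not copied from equations A8 and A9. All names and parameter values are illustrative.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def volume_below_pareto(n_above, v0, theta, alpha):
    """Projected volume in pools of size alpha..v0 under a Pareto model (theta != 1)."""
    return n_above * theta * (v0 - v0 ** theta * alpha ** (1.0 - theta)) / (1.0 - theta)


def volume_below_lognormal(n_above, v0, mu, sigma):
    """Projected volume in pools of size at most v0 under a lognormal model."""
    x0 = math.log(v0)
    partial_mean = (math.exp(mu + 0.5 * sigma ** 2)
                    * std_normal_cdf((x0 - mu - sigma ** 2) / sigma))
    tail_prob = 1.0 - std_normal_cdf((x0 - mu) / sigma)
    return n_above * partial_mean / tail_prob


def volume_below_numeric(n_above, x_lo, x0, density, tail_prob, steps=20_000):
    """Midpoint-rule check of N * integral(e^x g(x) dx, x_lo..x0) / P(X > x0)."""
    h = (x0 - x_lo) / steps
    total = 0.0
    for i in range(steps):
        x = x_lo + (i + 0.5) * h
        total += math.exp(x) * density(x)
    return n_above * total * h / tail_prob


# Illustrative inputs: 100 fields above a 256,000 bbl truncation level.
print(volume_below_pareto(100, 256_000.0, theta=0.709, alpha=1_000.0))
print(volume_below_lognormal(100, 256_000.0, mu=12.0, sigma=1.5))
```

The numeric helper exists only to cross-check the closed forms against direct integration of the assumed expression.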

In general we use log field size X rather than V. The conditional density g(x|x > x0) of X, given X > x0, is

EQUATION (A10)

If equation A10 is applied to an exponential density (equation A1), we arrive at

EQUATION (A11)

Note that a disappears from the conditional relationship. The conditional density of V is similarly defined (we do not use it here) and if g(v|v > v0) is calculated for the Pareto density (equation A2), the parameter α also disappears.
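The cancellation can be written out explicitly. This is a sketch that assumes equation A1 has the standard shifted-exponential form with rate \Theta and origin a, and that x_0 \ge a:

```latex
\begin{align*}
\Pr(X > x_0) &= \int_{x_0}^{\infty} \Theta\, e^{-\Theta (t-a)}\, dt = e^{-\Theta (x_0 - a)}, \\
g(x \mid x > x_0) &= \frac{\Theta\, e^{-\Theta (x-a)}}{e^{-\Theta (x_0 - a)}}
  = \Theta\, e^{-\Theta (x - x_0)}, \qquad x > x_0 .
\end{align*}
```

The factor e^{\Theta a} appears in both numerator and denominator, so the truncated tail carries no information about a.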

Suppose we have field-volume data V which have been grouped into size classes, usually on a logarithmic scale. Working now with X = log V, if we have a hypothesized conditional density g(x|x > x0) for the size classes greater than x0, we can compute the probability Pr(x1 < X < x2|X > x0) that a field will have log volume between x1 and x2, given that its log volume is at least x0:

EQUATION (A12)

Using equation A12, we can calculate the probability p_i of each size class above v0. If o_i is the observed number of fields in each size class, we can compute the χ² goodness of fit statistic:

EQUATION (A13)


Here N is the number of fields whose volumes exceed v0; the summation extends over those size classes that exceed v0. In summary, we replace the observed field-size distribution by the observed distribution of field sizes that exceed v0, and the hypothesized density function g(x) by its conditional density g(x|x > x0). Then we perform an ordinary χ² goodness of fit test.
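The procedure summarized above can be sketched for the exponential case. The sketch assumes the usual Pearson form of the statistic, the sum over classes of (observed minus N p)^2 / (N p), and an open-ended top class. The class edges (doubling classes above 256,000 bbl) and the counts are made up for illustration and are not the paper's data.

```python
import math


def class_probabilities_exponential(edges, theta):
    """P(x_lo < X < x_hi | X > x0) per class; edges[0] is x0, top class is open-ended."""
    x0 = edges[0]
    surv = [math.exp(-theta * (e - x0)) for e in edges]  # P(X > e | X > x0)
    probs = [surv[i] - surv[i + 1] for i in range(len(edges) - 1)]
    probs.append(surv[-1])  # everything above the last edge
    return probs


def chi_square(observed, probs):
    """Pearson chi-square for observed class counts against class probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))


x0 = math.log(256_000.0)
edges = [x0 + k * math.log(2.0) for k in range(5)]  # 4 bounded classes + 1 open class
observed = [60, 30, 20, 10, 8]                      # made-up counts
probs = class_probabilities_exponential(edges, theta=0.709)
print(chi_square(observed, probs))
```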

Usually, the density g(x) depends on at least one unknown parameter Θ, and it follows that the χ² statistic (equation A13) also depends on Θ. When necessary, we use the notation χ²(Θ) to explicitly note the dependence of χ² on Θ. For example, if the exponential equation A1 (and hence ultimately equation A11) is used, χ² will depend on Θ; if the normal (equation A3) is used, χ² will depend on µ and σ²; and if the gamma (equation A4) is used, χ² will depend on a, Θ, and β. We use a "minimum χ² estimate" of the unknown parameters. That is, the estimates are those values for which χ² is minimized. An advantage of the minimum χ² estimation procedure is that its value depends only on the field-size classes. The common practice of reducing the degrees of freedom in χ² by one for each estimated parameter is strictly valid only if minimum χ² estimates are used (Kendall and Stuart, 1969). After finding the value of Θ that has the smallest associated value of χ², we use this minimum χ² value [symbolically indicated as χ²min(Θ)] to test the goodness of fit of the family of distributions to the data.

For example, Figure 10 shows the dependence of χ² on Θ if the exponential distribution is fit to the logarithms of field sizes in the Denver basin (with a truncation level, v0, of 256,000 bbl). This graph achieves its minimum value at Θ = 0.709, which is the minimum χ² estimate of Θ. The minimum value of χ² is 46.91, which indicates the exponential distribution is not an adequate model for the right tail of the distribution of log field sizes.

In practice, χ² must be minimized numerically, which requires evaluating the integrals in equations A10 and A12. For the exponential distribution, these integrals can be determined explicitly in closed form. For the normal distribution, the integrals are available on computers in error function routines. For the gamma distribution, the integrals may be numerically calculated. In this study, IMSL (International Mathematical and Statistical Library) routine ZXMIN was used for minimization and DMLIN for integration. ZXMIN uses a quasi-Newton method in which the derivatives are numerically estimated.
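For the one-parameter exponential case, the minimization can be sketched without IMSL. The example below swaps in a golden-section search in place of the quasi-Newton routine ZXMIN used in the study; it is an illustration of minimum chi-square estimation, not a reproduction of the authors' code. The class probabilities use the closed-form exponential integrals mentioned above. Function names, search bounds, class edges, and counts are all hypothetical.

```python
import math


def class_probs(edges, theta):
    """Conditional class probabilities for the exponential; top class is open-ended."""
    surv = [math.exp(-theta * (e - edges[0])) for e in edges]
    return [surv[i] - surv[i + 1] for i in range(len(edges) - 1)] + [surv[-1]]


def chi_square_exponential(theta, edges, observed):
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed, class_probs(edges, theta)))


def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


def fit_theta(edges, observed, lo=0.05, hi=5.0):
    """Return (minimum chi-square estimate of theta, minimum chi-square value)."""
    theta_hat = golden_section_minimize(
        lambda t: chi_square_exponential(t, edges, observed), lo, hi)
    return theta_hat, chi_square_exponential(theta_hat, edges, observed)


x0 = math.log(256_000.0)
edges = [x0 + k * math.log(2.0) for k in range(5)]  # hypothetical doubling classes
observed = [60, 30, 20, 10, 8]                      # made-up counts
print(fit_theta(edges, observed))
```

The minimum value returned would then be compared with a χ² distribution whose degrees of freedom are reduced by one for the estimated parameter, as described in the text.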

Finally, the confidence interval given in equation 3 is an unusual, but not original, use of the χ² statistic. Its justification is similar to the justification for the ordinary χ² test for testing the goodness of fit of a parameterized family of distributions using a minimum χ² estimate for the unknown parameters.

Fig. 10. χ²(Θ) vs. rate parameter Θ for Pareto distribution fitted to field-size data for Denver basin. Minimum value of χ² occurs at Θ = 0.709.

References:

Arnold, B. C., 1985, Pareto distribution, in S. Kotz, N. L. Johnson, and C. B. Read, eds., Encyclopedia of statistical sciences, v. 6: New York, John Wiley and Sons, p. 568-574.

Arps, J. J., and T. C. Roberts, 1958, Economics of drilling for Cretaceous oil on the east flank of the Denver-Julesburg basin: AAPG Bulletin, v. 42, p. 2549-2566.

Davis, J. C., 1987, Statistical evaluation of petroleum deposits before discovery, in C. F. Chung, A. G. Fabbri, and R. Sinding-Larsen, eds., Quantitative analysis of mineral and energy resources: Dordrecht, Holland, D. Reidel Publishing Co., p. 161-186.

Drew, L. J., 1972, Spatial distribution of the probability of occurrence and the value of petroleum; Kansas, an example: Mathematical Geology, v. 4, p. 155-171.

Drew, L. J., and J. C. Griffiths, 1965, Size, shape and arrangement of some oilfields in the USA: Symposium on Computer Applications in the Mineral Industries Transactions, p. FF1-FF31.

Drew, L. J., J. H. Schuenemeyer, and D. H. Root, 1980, Petroleum-resource appraisal and discovery rate forecasting in partially explored regions--an application to the Denver basin: USGS Professional Paper 1138A, 11 p.

Fisher, W. L., 1987, Can the U.S. oil and gas resource base support sustained production?: Science, v. 236, p. 1631-1636.

Fisher, W. L., and R. J. Finley, 1986, Recent production trends and outlook for future oil and gas supplies in Texas: University of Texas at Austin, Bureau of Economic Geology Geological Circular 86-4, 31 p.

Forman, D. J., and A. L. Hinde, 1985, Improved statistical method for assessment of undiscovered petroleum resources: AAPG Bulletin, v. 69, p. 106-118.

Griffiths, J. C., 1966, Exploration for natural resources: Operations Research, v. 14, p. 189-209.

Johnson, N. L., and S. I. Kotz, 1970, Continuous univariate distributions, v. 1: New York, John Wiley and Sons, 300 p.

Kaufman, G. M., 1963, Statistical decision and related techniques in oil and gas exploration: Englewood Cliffs, New Jersey, Prentice-Hall, 307 p.

Kaufman, G. M., Y. Balcer, and D. Kruyt, 1975, A probabilistic model of oil and gas discovery, in J. D. Haun, ed., Methods of estimating the volume of undiscovered oil and gas resources: AAPG Studies in Geology 1, p. 113-142.

Kendall, M. G., and A. Stuart, 1969, The advanced theory of statistics, v. 2: inference and relationship: London, Charles Griffin and Company, 307 p.

Lee, P. J., and P. C. C. Wang, 1983, Probabilistic formulation of a method for the evaluation of petroleum resources: Mathematical Geology, v. 15, p. 163-181.

McCrossan, R. G., 1969, An analysis of size frequency distribution of oil and gas reserves of western Canada: Canadian Journal of Earth Science, v. 6, p. 201-211.

Meisner, J., and F. Demirmen, 1981, The creaming method: a Bayesian procedure to forecast future oil and gas discoveries in mature exploration provinces: Journal of the Royal Statistical Society, Series A (General), v. 144, p. 1-31.

Meyer, R. F., and M. L. Fleming, 1985, Role of small oil and gas fields in the United States: AAPG Bulletin, v. 69, p. 1950-1962.

Newendorp, P. D., 1975, Decision analysis for petroleum exploration: Tulsa, Oklahoma, PennWell, 750 p.

Schuenemeyer, J. H., and L. J. Drew, 1983, A procedure to estimate the parent population of the size of oil and gas fields as revealed by a study of economic truncation: Mathematical Geology, v. 15, p. 145-161.


Acknowledgments:

(2) Kansas Geological Survey, 1930 Constant Avenue, Lawrence, Kansas 66046.

(3) Kansas Geological Survey and Department of Mathematics, University of Kansas, Lawrence, Kansas 66045. Present address: Department of Mathematics, University of Virginia, Charlottesville, Virginia 22903.

Field-size data for the Denver-Julesburg basin were compiled from information provided by the Oil and Gas Conservation Commissions of Colorado, Nebraska, and Wyoming. Data for the northern Central Kansas uplift were provided by Technical Information Services, Kansas Geological Survey. Rick Brownrigg, Geof Bohling, and JoAnne DeGraffenreid assisted in computer analyses and data preparation.

Copyright 1997 American Association of Petroleum Geologists
