Pub. Id: A078 (1975)

First Page: 113

Last Page: 142

Book Title: SG 1: Methods of Estimating the Volume of Undiscovered Oil and Gas Resources

Article/Chapter: A Probabilistic Model of Oil and Gas Discovery

Subject Group: Oil--Methodology and Concepts

Spec. Pub. Type: Studies in Geology

Pub. Year: 1975

Author(s): G. M. Kaufman (2), Y. Balcer (3), D. Kruyt (4)

Abstract:

A probabilistic model was constructed of the sizes of pools discovered, in order of discovery, within a geologic zone. The model predicts a decline in the average size of discovery as the resource base is depleted. It is built on assumptions about the size distribution of hydrocarbon deposits and the way in which this size distribution interacts with exploratory activities. These assumptions govern the behavior of additions to discovered oil (gas) in place as a function of wells drilled in a play. Statistical properties of major Alberta plays were compared with properties of a Monte Carlo simulation of the model. It is possible to interface the model with expert subjective judgment to generate probabilistic forecasts of the size distribution of pools in prospective areas.

Text:

INTRODUCTION

A coherent national energy policy cannot be formulated without reliable estimates of the quantities of oil and natural gas remaining to be discovered in United States territories, supplemented by a forecast of what fraction of each can be recovered using currently available technology. Unfortunately, there is wide disagreement about what methods should be used to generate these estimates, as well as about their magnitude: the highest publicly cited estimate of recoverable oil remaining to be discovered is about 17 times the lowest!

A national energy policy based on the lowest of these estimates may differ radically in form from one based on the highest. Therefore, it is of critical importance to develop methods for estimation of oil and gas reserves that have scientific credibility and that simultaneously generate estimates in a form immediately useful for policy analysis. Unfortunately, none of the methods currently employed to estimate amounts of undiscovered oil and gas recoverable by use of current technology possess both of these attributes. The primary purpose of the research program proposed here is to develop methods that possess both. In order to be scientifically credible, a method must be based on explicitly stated postulates whose validity can be empirically confirmed using observed data. In order to be useful for policy analysis, it must provide not only single-number estimates, but an explicit measure of the degree of uncertainty of each such estimate.

In addition, methods should be designed so as to allow construction of an economic supply function (i.e., a description of how additions to reserves from new discoveries behave as a function of wellhead price, exploratory effort, and the costs of exploration). Our goal is the construction of a predictive model which provides probabilistic answers to two questions:

1. How many undiscovered pools remain in a given region, and what is their size distribution?

2. What additions to economically exploitable reserves will accrue from an increment of exploratory effort?


The model can be interfaced with expert subjective judgment to provide an answer to the first question for unexplored areas, as well as for areas where data on drilling successes and failures and sizes of discoveries have been generated by exploration activity.

It is a process-oriented probabilistic model. By "process-oriented" we mean a model that explicitly incorporates certain geologic facts and, in addition, is based on assumptions that describe the manner in which exploration technology and observed statistical regularities of the size of pools interact to generate discoveries.

The proposed model of the discovery process has four major components:

1. A submodel of pool sizes discovered in a homogeneous geologic population of pools in order of discovery.
2. A submodel of wildcat drilling successes and failures.
3. A submodel of the economics of a single exploratory venture.
4. A submodel of the "capital market" for exploratory ventures.

When assembled, these submodels constitute a probabilistic model of the returns in barrels of oil and/or Mcf of gas generated as a function of price and physical nature of the reservoirs available for exploitation. We shall discuss properties of only the first of these four components.

DISCOVERY PROCESS

"Discovery process" is a descriptive label for the sequence of information-gathering activities (e.g., surface reconnaissance and magnetic, gravimetric, and seismic surveys) and acts (drilling of exploratory wells) that culminate in the discovery of petroleum deposits. In building models of the discovery process, we will regard it as being effectively described by a small number of quantitative attributes (such as the number of exploratory wells drilled into a geologic formation in a given area and the oil [gas] in place in a newly discovered pool) and postulated relations among them. Although doing descriptive injustice to the way in which geologists extrapolate geologic facts to guide exploratory activity, a model composed solely of such attributes can embody many of the essential f atures of the discovery process.

A petroleum basin or area the size of Alberta will in general contain reservoirs or pools with distinctly different geologic characteristics. We shall regard the totality of pools in Alberta as being classified into a collection of subpopulations of pools of similar geologic type. By definition, a play begins with the exploratory well that discovers the first pool in a subpopulation of pools. Thus there are, in principle, as many potential plays as subpopulations or geologic types. The choice of typology depends on the use to which it will be put; our choice will be coincident with a generally agreed-upon description of major plays in Alberta (e.g., Cardium, D-2, D-3, Viking, Beaverhill Lake).

A key component of our model is a set of (probabilistic) assumptions which govern the behavior of additions to oil (gas) in place as a function of the number of wells drilled in a play. When plays are set in relation to one another on a time scale, total additions to oil (gas) in place in a given time interval may be regarded as generated by a temporal superposition of individual plays. One might also superpose plays on a scale composed of the cumulative number of exploratory wells drilled in the province. A model that effectively describes the behavior of the number of exploratory wells drilled into each play in any given time interval automatically generates a description on this scale.

To the degree that we can separate physical and engineering aspects of the discovery process from economic considerations, we shall do so.


A partitioning of assumptions into two classes, one physical and the other economic in character, leads to substantial simplifications both in the structure of the model and in procedures for making inferences about its parameters. In particular, classification of pools into geologically homogeneous subpopulations leads to a corresponding statistical homogeneity of the economic attributes of pools within each subpopulation. Thus we are able to trace the influence of price, exploration costs, and development costs on additions to reserves from new discoveries in a much more meaningful way than if all subpopulations of pools are aggregated into a single population.

Assumptions about the physical nature of the discovery process are stated in a way which tacitly implies that economic variables may influence the temporal rate of drilling exploratory wells in a play, but they do not affect either the probability that a particular well will discover a pool or the size of a discovery within a given play. This assertion is patently false if applied to a population consisting of a mixture of subpopulations with widely varying geologic characteristics. For example, a large price rise may accelerate exploratory drilling in high-risk (low probability of success) subpopulations with large average-pool sizes at a substantially different rate than in subpopulations with small pool sizes but high success probabilities. The overall probability of success for a generic well among the wells drilled in a mixture of these subpopulation types, as well as the size of discovery, will depend on the relative proportions of wells drilled in each subpopulation; and these proportions are influenced by prices and costs. By contrast, it is reasonable to assume that, within a given subpopulation, the precision of information-gathering devices and the quality of geologic knowledge of that subpopulation are the principal (perhaps sole) determinants of the probability of success of a generic well. A price rise may accelerate the temporal rate of drilling within that subpopulation, but it will not affect the quality of geologic knowledge at any given point on a scale of cumulative wells drilled into it. Exceptions can be found, of course, but this assumption is plausible as a broad descriptive principle. As stated, its adoption yields important analytical bonuses: it simplifies the modeling process and allows us to be parsimonious in choice of parametric functions for components of the model.

PHYSICAL POSTULATES

Our postulates or assumptions about the physics of the unfolding of a play reflect both petroleum folklore and the content of a variety of statistical and analytical studies of the discovery process. The principal ones are:

Ia The size distribution (in barrels or Mcf) of petroleum deposits in pools within a subpopulation is lognormal.

II Within a subpopulation, the probability that the "next" discovery will be of a given size (in barrels or Mcf) is equal to the ratio of that size to the sum of sizes of as-yet-undiscovered pools within the subpopulation.
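In symbols (our rendering of the verbal statement of assumption II; R is our notation for the index set of as-yet-undiscovered pools):

$$ P\{\text{next discovery has size } A_i\} = \frac{A_i}{\sum_{k \in R} A_k}, \qquad i \in R. $$

For example, if three pools of sizes 100, 10, and 1 remain, the largest is found next with probability 100/111, about 0.90.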

The assertion that the size distribution of pools is lognormal implies that an individual accumulation, no matter how small, may be regarded as a pool. The number of such pools in a given play can be enormous. Since "tiny" accumulations are of no practical importance, we can, a priori, restrict the definition of elements of a subpopulation of pools to include only accumulations of a given size A0 or greater. Although the choice of A0 is arbitrary to a degree, A0 might be chosen small enough to encompass all pools detectable by use of current technology. It must be chosen small enough to include pools producible at a price far greater than that currently obtaining, so as to avoid a confounding of the definition of subpopulation elements with price. A modified version of Ia thus suggested is:

Ib The size distribution (in barrels or Mcf) of petroleum deposits in pools within a subpopulation is truncated lognormal with truncation point A0>0.

When A0 is chosen to be very small, Ia and Ib lead to essentially similar results, although the problem of inference about parameters of the underlying size distribution is somewhat more complicated given Ib. An interpretative advantage of Ib is that one does not have to rationalize the proposition that, when A0 = 0, there are "pools" within the subpopulation being sampled which are so small as to have virtually zero probability of ever being discovered. In addition, the parameter N (the number of pools in the subpopulation) takes on added meaning, for, given Ib, it denotes the number of elements in a set, each member of which can, in principle, be discovered and identified using current technology. Henceforth we shall refer to "assumption I" and distinguish between Ia and Ib only where necessary.
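Written out (a standard left-truncated lognormal form; the paper states Ib only verbally), the density of Ib is

$$ f(A \mid \mu, \sigma^2, A_0) = \frac{1}{A\sigma\sqrt{2\pi}}\exp\left\{-\frac{(\log A - \mu)^2}{2\sigma^2}\right\} \bigg/ \left[1 - \Phi\left(\frac{\log A_0 - \mu}{\sigma}\right)\right], \qquad A \ge A_0, $$

where Φ is the standard normal cumulative distribution function.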

The probabilistic behavior of amounts of oil (gas) in place discovered by each discovery well in order of discovery is completely determined by a conjunction of assumptions I and II. That is, assumption II implies that, "on the average," the larger (in size of oil [gas] in place) pools will be found first and, as the discovery process depletes the number of undiscovered pools in a subpopulation, discovery sizes will (again, "on the average") decline.

Our third assumption structures the behavior of the success ratio within a play once it has begun. Often a play begins with a stroke of geologic insight. With this insight, application of geophysical technology coupled with geologic analysis will identify a population of prospects, some of which will be pools and others of which will be dry. Letting S denote the sum of sedimentary volumes of all undiscovered pools in the play, and letting U denote the sum of sedimentary volumes of all undrilled prospects that are potentially identifiable by use of currently available exploration technology, we state:

IIIa The probability that an exploratory well will discover a new pool is equal to

κS/(κS + U),

where κ > 0 is a constant.

The constant κ is to be interpreted as an index of the difficulty (or ease) of discovery of pools within a given subpopulation once a play has started within it. Hence it may vary among subpopulations. For example, lens-type stratigraphic traps are more difficult to identify by seismic means than are pinnacle reefs, and thus might be assigned a smaller value of κ.

Assumption III is an extension of the idea behind assumption II (sampling proportional to size) and says that the probability of a discovery of any size shares the same general property. If the value of κ--the index of difficulty (or ease) of discovery--is 1, then drilling is "random" in the sense that predrilling exploration technology does not enhance the probability of discovery. Exploratory drilling in this particular case is like throwing darts into a three-dimensional volume, where each piece of equal volume has the same probability of being hit no matter where it is located. Even in this special case, the probability of discovery will change as κS/(κS + U) changes with each exploratory well drilled.
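The following sketch (ours, in Python; the function name and the bookkeeping are illustrative assumptions, not the authors' notation) shows how the success probability of assumption IIIa drifts as S and U are updated after each exploratory well:

```python
def discovery_probability(S, U, kappa=1.0):
    """Assumption IIIa: probability that the next exploratory well finds a
    new pool. S is the summed sedimentary volume of undiscovered pools, U
    the summed volume of undrilled prospects; kappa indexes the ease of
    discovery (kappa = 1 corresponds to "random" drilling)."""
    return kappa * S / (kappa * S + U)

# Illustrative numbers only: a discovery removes the found pool's volume
# from S, and every drilled prospect's volume leaves U, so the probability
# changes with each exploratory well even when kappa = 1.
p = discovery_probability(S=5.0, U=95.0)         # 0.05
p_after = discovery_probability(S=4.0, U=92.0)   # about 0.0417
```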

A variant of IIIa is to replace "volume" by "areal extent"; we call this assumption IIIb. It is a more natural assumption than IIIa in certain respects. If drilling is completely random with respect to longitude and latitude, then the probability that a generic pool will be discovered is exactly equal to the ratio of the areal extent of the pool to the total areal extent of all undrilled prospects in that pool's geologic producing zone, and IIIb, not IIIa, is the relevant assumption. In fact, the areal extent and volume of many (but not all) pool types are highly correlated. Where such is the case, IIIa and IIIb are, in the use to which we shall put them, almost interchangeable.

Assumptions I, II, and III imply that once a play has begun the probability of discovery decreases on the average as the play unfolds. Although descriptively harsh--there are plays in which the success ratio continues to rise for a time after drilling of the initial discovery well--the specific functional form for the probability of success within a play implied by assumptions I, II, and III is fairly simple and allows us to calculate an estimate of ^kgr.

Assumptions I, II, and III describe the physical evolution of a play, once it has begun. To articulate accurately in mathematical terms how and when a play begins is substantially more difficult. Geologic knowledge generated by seismic, gravity, magnetic, and surface surveys and analysis of exploratory well data, costs, and prices are determinants of the probability that a new play will begin at a given point on either a time scale or a scale composed of cumulative exploratory wells drilled. The spatial configuration and geographic location of sediments also play an important role. There are, nevertheless, several simple, descriptive assertions about the genesis of a play that lead to plausible postulates about the occurrence of a new play at a given point on a scale of cumulative number of wells drilled.

1. The cumulative number of exploratory wells drilled in the province is an index of geologic knowledge.

2. As the volume of unexplored sediment in the province decreases, so does the likelihood that a new play will occur.

3. Exploratory wells drilled in an existing play (intensive wells) are less likely to lead to a new play than wells drilled in an area not contiguous to an existing play (extensive wells).

The cumulative number of exploratory wells drilled is at best a crude surrogate for geologic knowledge. However, geologic knowledge does grow as the number of wells drilled grows, and so the latter is an index of the degree to which the geology of the region is understood. Assertion 3 suggests that the interarrival times between successive plays, measured on a scale of exploratory wells drilled, on the average become shorter as the proportion of extensive wells per well drilled becomes larger. Assumption IV articulates this idea more carefully, although considerable further refinement of it is necessary before it can be used to structure a probabilistic model of interarrival times between successive plays.

IV Interarrival times between successive plays are uncertain quantities. The mean time between two successive plays, measured on a scale of cumulative exploratory wells drilled, (a) decreases with an increase in the proportion of wells drilled extensively subsequent to the beginning of the first of these two plays, and (b) increases as the volume of unexplored sediment in the province decreases.


The analog of assumption IV for interarrival times measured on a time scale requires consideration of costs, prices, and investor behavior in the face of uncertainty (i.e., the economic returns to exploratory ventures within each subpopulation).

BACKGROUND FOR ASSUMPTIONS I, II, III, AND IV

A variety of studies supports the assertion that the size distribution of oil (gas) pools is adequately represented by a lognormal distribution.

Allais (1957), in a large-scale study of mineral resources of the Sahara Desert, concluded that the lognormal distribution provided a surprisingly good fit to frequency histograms of the value of deposits of ores such as iron, copper, gold, zinc, diamonds, etc. Krige's (1951) analysis of gold deposits in the Witwatersrand was, perhaps, the first to use the lognormal distribution as a characterization of the size distribution of a mineral deposit, and Allais' study strongly reinforced the reasonableness of the law in this context. Oil and gas were notably absent from the list of mineral resources treated in the published version of Allais' paper (an expected omission in view of the political complexities of France's relations with Algeria and her desire to hold onto the vast mineral resources of the region, irrespective of whether Algeria became independent).

Arps and Roberts' (1958) study of Cretaceous fields on the eastern flank of the Denver-Julesburg basin (using a large sample and grouping data) lent credence to the hypothesis that the size distribution of petroleum deposits is lognormal. Although Arps and Roberts proceeded heuristically, eschewing standard statistical testing procedures, their data plot very close to a straight line, even in the extreme right tail, when plotted on lognormal probability paper (cf Kaufman, 1963).

Kaufman (1963) examined what can be regarded at best as incomplete data and found that Arps and Roberts' assumption of lognormality was not unreasonable. McCrossan (1969), using the Alberta Province Energy Resources Conservation Commission's detailed compilation of data on individual pools discovered in Alberta, did a similar but more refined analysis. He first classified pools according to geologic type and then plotted fractile estimates derived from each such sample on lognormal-probability paper. Within classes composed of 50 or more pools (e.g., Viking Reef), the lognormal distribution provided a good visual fit. He also showed that one plausible explanation of the appearance of bimodality and/or deviations from lognormality in the tails is that observations from geologically distinct populations are being mixed together.

The studies by Allais (1957), Kaufman (1963), and McCrossan (1969) treated sizes in order of observation as observed values of a sequence of mutually independent and identically distributed random variables. In fact, oilmen long have observed that, within a play, the larger pools tend to be found first and the average size of new discoveries decreases as the play matures. Thus the process of observing pool sizes in order of discovery is more akin to sampling without replacement and proportional to (random) size than to sampling values of independent, identically distributed random variables.

The conjunction of these two features introduces novel complications and renders more difficult a careful theory of inference about parameters of the underlying size distribution--and about what remains to be discovered. If a play regarded as a sampling process possesses both of these attributes, there is a possibility of serious error in making inferences under the assumption that they are not present, and care should be taken to determine, in a systematic way, under what conditions these attributes must be explicitly taken into account.

More specifically, it is reasonable to postulate that discovery sizes in order of observation are generated by sampling without replacement from a finite population of pools whose sizes (area, volume) are generated by yet another random process; the finite population of pools is a random sample from a hypothetical infinite population (a superpopulation) whose size distribution is of known functional form. This characterization of sampling from a finite population is well known in the statistical literature and has been used to develop classical, fiducial, and Bayesian procedures for estimation of finite population parameters (cf Cochran, 1939; Fisher, 1956; Ericson, 1969; and Palit and Guttman, 1973). However, if the sample drawn from the finite population is random, without replacement and proportional to size in the sense that the probability of observing the ith finite population element at the jth sample observation is equal to the ratio of the size Ai of that element to the sum of sizes of the as-yet-unobserved finite population elements, then, although the general framework is relevant, none of the specific techniques developed in the literature just cited can be applied directly. An additional complication is that the number of elements in the finite population is generally not known with certainty in this particular problem.

Uhler and Bradley (1970) analyzed the spatial occurrence of petroleum pools in Alberta, hypothesizing that the number of pools per unit area is describable by a negative binomial probability law. They obtained an excellent fit to actual frequencies of observed occurrences in "well-explored" areas (i.e., to the frequencies of pools per unit area discovered up to 1970). Their method provides one way of estimating the frequency of occurrence of pools in a given area. Drew (1972) conducted an empirical study of the spatial distribution of petroleum within land tracts in Kansas; he found that accounting for the effect of land ownership on the number of deposits discovered per unit area leads to a probability law substantially different from the negative binomial.

Cox (1969) discussed sampling proportional to size from an infinite population with particular reference to the sampling of textile fibres, and his results have relevance as the size of the finite population approaches infinity.

Arps and Roberts' (1958) model, Kaufman's (1965) recharacterization of it in terms of a system of differential equations, and Crabbe's (1969) modification of Kaufman's work obliquely embody the notion of sampling proportional to random size. However, the models in these three papers are formulated in such a way that rigorous statistical testing of the basic assumptions underlying them is difficult if not impossible. Drew (1974) conducted an interesting retrospective simulation study of exploration in the Powder River basin by using historical data on pool sizes and their location as an empirical base. This study descriptively embodies sampling proportional to size and without replacement. Drew carefully pointed out that, since his study is retrospective in structure, explicit predictions about future discoveries can be made only by assuming similarity between the resource base in the control area on which the simulation is conducted and an unexplored target area.

Assumption III asserts that the probability that an (intensive) exploratory well will discover a new pool within a given subpopulation is proportional to the ratio of the volume of undiscovered pools to the volume of potentially hydrocarbon-bearing sediment of that population's geologic type; it is a logical extension of assumption II. Although the assumption is plausible and has appeared in disguised form in several papers, it has never been validated empirically. Ryan (1973a), in an important paper on the crude-oil discovery rate in Alberta, based his analysis on a set of assumptions similar to II and III, the most important one being an amalgam of deterministic versions of those assumptions: "The rate of discovery of oil in a play is proportional to the undiscovered oil in the play and the knowledge of existence of the play." A probabilistic version of this postulate, not empirically validated, appears in Kaufman and Bradley (1973).

Ryan (1973a) was the first to investigate a deterministic model of discovery consisting of a superposition of models of individual plays on a scale of cumulative number of wells drilled. The differential equation he obtained for additions to reserves per well drilled is a deterministic version of the stochastic model that follows from assumptions I, II, III, and IV. Although extremely useful in providing rough estimates of additions to oil in place from plays already known to be in existence, his model contains no mechanism that generates new plays as more wells are drilled. Ryan gives a thorough discussion of the strengths and weaknesses of his model. In a second paper, Ryan (1973b) point-forecasts growth of potential crude-oil reserves in Alberta as a function of new-field wildcats drilled.

PROBABILISTIC MODEL OF AN INDIVIDUAL PLAY

Our mathematical description will be done backwards; that is, we shall begin by describing in detail a submodel of amounts discovered within a play per discovery in order of observation. The size of a discovery will be measured in stock-tank barrels of oil (Mcf equivalent if gas) in place. We shall assume that assumptions I (lognormality of size distribution) and II (sampling without replacement and proportional to random size) hold. These two assumptions are sufficiently rich to guarantee logical completeness of this submodel. This submodel describes both the outcomes of drilling (discovery or dry hole) and the size of a discovery when one is made. It evolves on a scale of wells drilled in a given play, and its properties are not influenced either by the rate at which wells are drilled or by economic variables. A superposition of individual plays on a time scale requires additional assumptions about the influence of economic variables on the amount of exploratory effort in a given time interval and how it is allocated among plays. In a later report we shall develop a model of returns to drilling in each play using assumption III.

EMPIRICAL SIZE DISTRIBUTIONS

A cursory examination of empirical size distributions of oil pools shows that these distributions are usually unimodal and highly skewed, having very long right tails; that is, a small proportion of observed values is very large and a large proportion is very small. There is an infinity of functional forms for unimodal distribution functions concentrated on zero to infinity and having long right tails. The problem of deciding which particular functional form best fits observed data on sizes of petroleum pools is quite complicated.

Some of the reasons are: (1) reported oil (gas) in place is only an engineering estimate, not a direct observation of actual oil (gas) in place; (2) the accuracy of reported oil (gas) in place in a pool generally will improve as the pool is developed, so that the initial estimate may be far from that given when the pool is nearly depleted; and (3) an exploratory well may yield only a "show" of oil (gas); in such cases hydrocarbons are present, but in an amount far below the economic breakeven point, so such "pools" will not be exploited and the well may be reported as a dry hole. In addition to reporting bias, truncation of sample observations may be present. Another complicating feature is that, when sampling is without replacement and proportional to random size, the distribution of sample observations is not the same as the size distribution of pools in nature.

One can buttress a choice among functional forms by appealing to basic principles or postulates describing the process of hydrocarbon deposition. One popular postulate (cf Matheron, 1955; Rodionov, 1964) is that the accessory mineral content of rock is generated by a law of proportionate effect. This leads (via a central limit theorem) to the assertion that mineral deposits in rock are approximately lognormal. Mandelbrot (1960) has argued that stable probability laws are a more appropriate representation. Though having the virtue of skewness and very fat right tails, stable densities concentrated on [0, ∞) are analytically intractable and may be without mean or variance. The only analytically tractable version is, for θ > 0,

f(Y|θ) = √(θ/π) Y^(-3/2) exp{-θ/Y},  Y > 0.    (1)

This density has neither mean nor variance. It can be labeled "inverted gamma," since X = 1/Y has a gamma density.

Unfortunately, it is possible to distinguish with precision between lognormal and stable probability laws only with very large sample sizes. Prokhorov (1964) gave an instructive account of why this is true. His mathematical development is motivated by consideration of the absolute mineral content in a given volume of rock, but it is relevant in general. He stated that "it is possible to distinguish the exponential distribution with density exp{-Y}, Y > 0, from a lognormal distribution with density (1/√(2π)Y)exp{-(1/2)(log Y)²}, Y > 0, with errors of Type I and Type II equal to .05 only on the basis of a sample size close to 100." Prokhorov's assertion is based on a chi-squared test applied to grouped data.

In the present case, an attempt to distinguish between competing hypotheses about the size distribution of deposits is even more difficult if sampling is without replacement and proportional to size, for the sampling distribution for observed sizes is not the same as the distribution that generates the size of pools deposed by nature. A crude pretest is to test the hypothesis that observed pool sizes are lognormally distributed against the specific hypothesis that they are gamma distributed. The next section presents the results of such a test, one designed so that both hypotheses may be simultaneously rejected or accepted. One would expect that both hypotheses will be accepted if the sample size is small, and both hypotheses will be rejected if the sample size is large and the size distribution in nature is lognormal. At the 1-percent level of significance, this latter event occurs only once among 21 samples--but, significantly, the sample size for this case is very large by comparison with all other cases. When plotted on lognormal probability paper (cf following section), most of the 21 samples show substantial deviation from lognormality in the extreme tails. The implications are that (1) much larger sample sizes may result in decisive rejection of both the lognormal and gamma hypotheses, and (2) a better test is lognormality against the specific hypothesis that the probability law of observed sizes has a functional form dictated by assumptions I and II.


LOGNORMAL VERSUS GAMMA

Karen Sharp (FOOTNOTE 5) has developed a program for testing the hypothesis of lognormality against the (specific) alternative hypothesis that the underlying probability law is gamma. Her test procedure is based on methods developed by Jackson (1969) and Cox (1962), methods that seem to be more discriminating than a chi-squared test of a specific hypothesis against an unspecified alternative.

She says:

The test has two distinct but equally necessary parts. They can be described basically as follows. Four statistics are calculated from the sample distribution. If the distribution is actually lognormal these four estimated parameters will bear a certain relation to each other. The first half of the test is constructed to measure how closely the actual numbers conform to this relationship. If they do not conform closely, the assumption of a lognormal distribution is rejected. The two-way test is symmetrical. If the distribution is actually gamma, the four parameters will be related in another specific way which has been derived from the nature of the two distributions. A test statistic compares the actual numbers to this assumed relation. This two-way test is necessary to avoid acceptance of a false alternative. It is possible that the frequency distribution observed conforms poorly to both of the distributions tested; in this case each test will signal rejection of the corresponding assumption. It is also possible that the sample does not provide enough information to permit a choice between the distributions; then neither test will reject its hypothesis.

Each distribution is specified by two parameters. The four "statistics" calculated are the sample parameter estimates. Each half of the test uses the standard method of testing a null hypothesis against a single possible alternative.(FOOTNOTE *) For one half of the test, a lognormal distribution with parameters (µ, σ²) is assumed. Maximum likelihood estimates µ̂ and σ̂² of the parameters are calculated from the sample data. Knowledge of the form of the two distributions permits the derivation of shadow gamma parameters as functions of µ̂ and σ̂². These shadow parameters β1(µ̂, σ̂²) and β2(µ̂, σ̂²) are estimates of the numbers that one would get from attempting to fit a gamma distribution to a sample which is actually lognormal with parameters µ and σ². If the hypothesis of a lognormal distribution is true, then the maximum likelihood estimates β̂1 and β̂2 of the gamma parameters β1 and β2 will closely approximate β1(µ̂, σ̂²) and β2(µ̂, σ̂²); the larger the sample size, the closer these estimates should be. The test statistic measures the difference between (β̂1, β̂2) and (β1(µ̂, σ̂²), β2(µ̂, σ̂²)), and under the null hypothesis it has a standard normal distribution in each half of the test. If the hypothesis of a lognormal distribution is true the test statistic will be approximately zero, and hence insignificant. If the test statistic is significantly different from zero (by comparison with the usual table of normal deviates) the hypothesis can be rejected. The other half of the test is similar, involving shadow parameters µ(β̂1, β̂2) and σ²(β̂1, β̂2).
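As an illustration of the mechanics Sharp describes, here is a minimal Python sketch. The moment-matching form of the shadow parameters is our assumption, chosen to agree with the gamma parameterization used in Table 1 (mean β1 and variance β1²/β2); her actual test statistics, which follow Jackson (1969) and Cox (1962), are not reproduced:

```python
import numpy as np
from scipy import stats

def lognormal_mles(sizes):
    """Maximum likelihood estimates of (mu, sigma^2) for a lognormal fit."""
    logs = np.log(sizes)
    return logs.mean(), logs.var()

def shadow_gamma(mu_hat, s2_hat):
    """Gamma parameters one would expect to estimate if the data were truly
    lognormal(mu_hat, s2_hat): matched so the gamma has mean beta1 and
    variance beta1**2 / beta2 (the Table 1 parameterization)."""
    beta1 = np.exp(mu_hat + 0.5 * s2_hat)      # lognormal mean
    beta2 = 1.0 / (np.exp(s2_hat) - 1.0)       # mean**2 / variance
    return beta1, beta2

def gamma_mles(sizes):
    """Direct gamma MLEs, reported as (beta1, beta2) = (mean, shape)."""
    shape, _, scale = stats.gamma.fit(sizes, floc=0)
    return shape * scale, shape

# Under lognormality, gamma_mles(sample) should approach
# shadow_gamma(*lognormal_mles(sample)) as the sample grows; a large
# discrepancy is evidence against the lognormal hypothesis.
```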

Table 1 displays the results of applying Sharp's test to 21 categories of pools in Alberta. The three columns under "Lognormal Assumed" display for each category an estimate µ̂ of the mean µ of the (natural) logarithm of reservoir size, an estimate σ̂² of the variance σ² of the logarithm of pool size (in units of 1,000 STB), and a test statistic. These three numbers are computed under the hypothesis that the sample of observed pool sizes in each category is generated by a lognormal process. Similarly, the three columns under "Gamma Assumed" display for each category estimates β̂1 and β̂2 of parameters β1 and β2 of a gamma density with mean β1 and variance β1²/β2 fitted to observations in each category, and a test statistic, under the hypothesis that these observations are generated by a gamma process.

Table 1. LOGNORMALITY VERSUS GAMMA HYPOTHESIS TESTS

FOOTNOTE *. In a simple example with a null hypothesis H0: x = a, a simple alternative would be H1: x = b, as compared with a composite alternative such as H2: x > a.

FOOTNOTE 5. Economics Department, Energy Resources Conservation Board.

The decision rule for a 1-percent (two-tailed) test of significance is as shown by Table 2. For example, where |T1| > 2.576 and |T2| > 2.576, the conditional probability of making an error in asserting that the observations leading to T1 and to T2 come from neither a lognormal nor a gamma process is less than, or equal to, 0.01.

Table 3 is a summary of test results. At the 5-percent level of significance, in no case is the gamma hypothesis accepted and the lognormal hypothesis rejected. In the four cases where both are accepted, sample sizes are relatively small, ranging from 16 to 29. Both hypotheses are rejected for Keg River 7880 (195 observations) and Keg River 7882 (31 observations). In 15 of 21 cases the lognormal hypothesis is accepted and the gamma hypothesis rejected.

At the 1-percent level of significance, the gamma hypothesis is accepted and the lognormal hypothesis rejected in one instance--Keg River 7882. In no case are both hypotheses rejected. Here also the lognormal hypothesis is favored, although less strongly than at the 5-percent level of significance.

Although this battery of tests makes a prima facie case for favoring the lognormal hypothesis over the gamma hypothesis, it is dangerous to conclude that the process by which new pools are discovered (the observational process) is lognormal. In particular, the test procedure used here is not terribly sensitive to tail behavior until sample size becomes very large. We plotted a number of samples on lognormal probability paper (Fig. 1); it is clear that the empirical cumulative distribution functions have, on the average, much fatter right tails than one would expect if the observations were in fact values of mutually independent, identically distributed, lognormal random variables. As we shall show, approximate lognormality in the interquartile range with a fatter-than-lognormal right tail is precisely what is implied by assumptions I and II taken together.

LOGNORMAL VERSUS PROBABILITY LAW IMPLIED BY ASSUMPTIONS I AND II

Testing the hypothesis of lognormality of observed pool sizes against the hypothesis that pool sizes are gamma distributed is informative but logically out of joint with our model of the discovery process, since assumptions I and II taken together lead to the hypothesis that neither is appropriate. It is informative in that we can conclude with reasonable certainty that a gamma probability law is not as accurate a characterization of the sampling density of observed sizes as a lognormal probability law--although, as we shall show, the latter is, in turn, a less satisfactory hypothesis than one dictated by assumptions I and II (cf following section). If we call the lognormal hypothesis "H1" and the hypothesis that the sampling density is as displayed in the following section "H2," a Bayesian test of hypothesis H1 versus H2 is possible; asserting that a priori H1 and H2 are equally likely, we compute the odds, posterior to observing the data, that the data were generated according to H2 rather than H1.(FOOTNOTE 6)
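With equal prior probabilities the posterior odds reduce to the ratio of the marginal (predictive) likelihoods of the data under the two hypotheses, a standard Bayesian identity (cf footnote 6):

$$ \frac{P(H_2 \mid \text{data})}{P(H_1 \mid \text{data})} = \frac{P(H_2)}{P(H_1)} \cdot \frac{P(\text{data} \mid H_2)}{P(\text{data} \mid H_1)} = \frac{P(\text{data} \mid H_2)}{P(\text{data} \mid H_1)} \quad \text{when } P(H_1) = P(H_2) = \tfrac{1}{2}. $$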

DEFINITION OF SAMPLING PROCESS FOR DISCOVERY SIZES

Basically, assumption II is that discovery sizes in order of observation are generated by sampling without replacement from a finite population of pools whose sizes constitute a sample from a hypothetical infinite population. We shall call this latter population a superpopulation. Observations of discovery sizes come about as follows: nature generates values A1,...,AN of N mutually independent, identically distributed random variables (rvs) with common density f concentrated on [0, ∞); f characterizes the superpopulation from which the sizes A1,...,AN of the N pools deposed by nature are drawn. These values are not observed in the order in which they are generated. Rather, elements of the finite set QN = {A1,...,AN} of pool sizes are sampled without replacement and proportional to size; the finite population being sampled is QN.

When QN is known, assumption II says that the probability of observing Ai1, Ai2,...,Ain, n ≤ N, in that order, is (upon relabeling elements of QN so that [i1,i2,...,in] = [1,2,...,n]):

[A1/(A1 + A2 + ··· + AN)] × [A2/(A2 + A3 + ··· + AN)] × ··· × [An/(An + An+1 + ··· + AN)].    (2)
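Equation 2 is simple to evaluate when QN is known. A minimal Python sketch (ours; the names are illustrative):

```python
import numpy as np

def order_probability(observed_in_order, population_sizes):
    """Equation 2: probability of observing exactly these pool sizes, in
    this order, when the finite population is sampled without replacement
    and proportional to size."""
    remaining = float(np.sum(population_sizes))  # sum over unobserved pools
    prob = 1.0
    for size in observed_in_order:
        prob *= size / remaining
        remaining -= size                        # pool leaves the population
    return prob

# Example: pools of 100, 10, and 1 units; seeing the two largest first has
# probability (100/111) * (10/11), about 0.819, illustrating assumption
# II's built-in tendency to find the big pools early.
print(order_probability([100.0, 10.0], [100.0, 10.0, 1.0]))
```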

Clearly, the values A1,...,AN are not known a priori, and, even after observing values of n of the Ai's, N − n such values remain unknown.

Table 2. DECISION RULES FOR 1-PERCENT TEST OF SIGNIFICANCE

FOOTNOTE 6. See, for example, Chapter 2 of Zellner (1971).

Let Yj denote the observed value of the jth pool discovered, define Y = (Y1,...,Yn) as the vector of observations in a sample of size n ≤ N, and assume that f is a member of a class of densities (all of whose members are concentrated on [0, ∞)) indexed by a parameter θ ∈ Θ, so that Ai has density f(·|θ). Then, if θ, N, and the infinitesimal intervals dY1,...,dYn are known, the probability of observing Y1 ∈ dY1,...,Yn ∈ dYn in that order (or, equivalently, of observing Y ∈ dY) is:

P{Y ∈ dY|θ, N} = (N)n [f(Y1|θ) ··· f(Yn|θ)] {∫0 to ∞ [∏j=1,...,n Yj/(Yj + Yj+1 + ··· + Yn + s)] f*(N−n)(s|θ) ds} dY1 ··· dYn.    (3)

Here, (N)n = N(N−1)···(N−n+1) by definition, and f*(N−n) is the density of the sum of N − n of the Ai's. (This expression for P{Y ∈ dY|θ, N} is predicated on the existence of the integral in equation 3.)

Using equation 3, we may, in principle, make inferences about the parameter ^thgr when it is not known with certainty (as is usually the case), as well as about the sum of unobserved finite population elements. This sum is of particular interest because it constitutes the sum total of undiscovered oil (gas) in place within the play being sampled.
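Because the integral in equation 3 involves the density of the sum of N − n sizes, it rarely has closed form; it can, however, be approximated by Monte Carlo, simulating s as the sum of N − n draws from f(·|θ). A sketch under the lognormal superpopulation of equation 4 (our construction; the authors do not give a numerical recipe here):

```python
import numpy as np

rng = np.random.default_rng(0)

def likelihood_eq3(y, N, mu, sigma2, reps=5000):
    """Monte Carlo approximation of P{Y in dY | theta, N} from equation 3,
    up to the constant (N)_n and the differentials dY_1...dY_n, for a
    lognormal superpopulation with parameters (mu, sigma2).
    y: the n observed sizes in order of discovery, n <= N."""
    y = np.asarray(y, dtype=float)
    n, sigma = len(y), np.sqrt(sigma2)
    # log of the product of superpopulation densities f(y_j | theta)
    log_f = np.sum(-np.log(y * sigma * np.sqrt(2.0 * np.pi))
                   - (np.log(y) - mu) ** 2 / (2.0 * sigma2))
    tails = np.cumsum(y[::-1])[::-1]             # y_j + ... + y_n for each j
    # s simulated from f*(N-n), the density of the sum of unobserved sizes
    s = rng.lognormal(mu, sigma, size=(reps, N - n)).sum(axis=1)
    integral = np.prod(y / (tails + s[:, None]), axis=1).mean()
    return np.exp(log_f) * integral
```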

PROPERTIES OF ASSUMPTIONS I AND II VIA MONTE CARLO SIMULATION

In order to give an intuitive "feel" for the implications of assumptions I and II, we describe here the output of a Monte Carlo simulation of the sampling process for discovery sizes dictated by these assumptions. Our attention is focused on three objects:

1. The probability distribution P{Y ∈ dY|θ, N} of observed sizes;
2. The probability distribution of undiscovered sizes given the observed vector Y = y;
3. The probability distribution of the mean SN−n of undiscovered sizes given the observed vector Y = y.

Table 3. RESULTS OF LOGNORMALITY VERSUS GAMMA HYPOTHESIS TESTS

Fig. 1. Samples plotted on lognormal probability paper. A. Belly River, 38 observations; B. Lower Manville, 40 observations; C. Keg River 7880, 195 observations; D. Manville, 48 observations; E. D-2, 53 observations.

We will examine (1) and (2) relative to assumption I in the following way: assume that the size distribution of petroleum deposits in pools is lognormal with parameter (µ, σ²); that is, A1,...,AN are mutually independent with common density, for −∞ < µ < +∞ and σ² > 0:

f(A|µ, σ²) = [1/(√(2π) σA)] exp{−(log A − µ)²/2σ²},  A > 0.    (4)

To simulate observations generated according to assumptions I and II, we first generate values A1(i),...,AN(i) according to equation 4, and then, given {A1(i),...,AN(i)} ≡ QN(i), we generate Y(i) according to equation 2. Here the index i indexes replications of our Monte Carlo experiment. It is obvious (see equation 2) that elements of Y(i) are neither independent nor marginally identically distributed as lognormal. However, suppose that we incorrectly assume that they are and examine fractile plots of the Yj(i)'s on lognormal probability paper as if they constitute independent sample observations from a lognormal process--as has been done by several authors. How does the empirical cumulative function so generated deviate from lognormality? Undiscovered sizes, the complement UN(i) of {Y1(i),...,Yn(i)} in QN(i), are treated similarly in our experiment.
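A compact Python sketch of this experiment (our reconstruction; the paper does not print its simulation program) deposits N sizes by equation 4 and then discovers them by successive sampling proportional to size, as equation 2 prescribes:

```python
import numpy as np

rng = np.random.default_rng(1975)

def simulate_play(N, n, mu=6.0, sigma2=3.0):
    """One Monte Carlo replication: N pool sizes from the lognormal
    superpopulation of equation 4, then n discoveries drawn without
    replacement with probability proportional to size (equation 2).
    Returns (sizes in order of discovery, undiscovered sizes)."""
    sizes = rng.lognormal(mu, np.sqrt(sigma2), N)
    undiscovered = np.ones(N, dtype=bool)
    order = []
    for _ in range(n):
        weights = np.where(undiscovered, sizes, 0.0)
        pick = rng.choice(N, p=weights / weights.sum())
        order.append(pick)
        undiscovered[pick] = False
    return sizes[order], sizes[undiscovered]

# Average the size of the j-th discovery over 1,000 replications, as in
# the text; the first few means should dwarf the superpopulation mean
# exp(mu + sigma2/2) = 1,808 (thousand barrels) when mu = 6, sigma2 = 3.
means = np.mean([simulate_play(100, 50)[0] for _ in range(1000)], axis=0)
print(means[:5] / np.exp(6.0 + 1.5))
```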

The following are the salient facts:

1. On lognormal probability paper the graph of fractiles computed from Y1(i),...,Yn(i) is, on the average, close to linear within the interquartile range but exhibits a fatter right tail than that of a lognormal distribution. The graph is tilted, having a smaller slope than that exhibited by the (straight line) graph of fractiles of the underlying lognormal distribution of the Aj(i)'s, and lies entirely above the latter graph.

2. The graph of fractiles computed from undiscovered sizes is, on the average, linear within the interquartile range but exhibits a thinner right tail than that of a lognormal distribution. It is also tilted, having a smaller slope in the interquartile range than that of the graph of fractiles of the underlying lognormal distribution, and lies entirely below the latter graph.

3. The mean E(Yj|θ, N) of the size of the jth discovery is far above the mean of the underlying (lognormal) population for small values of j; it declines faster than exponentially with increasing j at first and then declines more slowly than exponentially.

In order to reduce the effects of Monte Carlo sampling variability, we replicated our experiment 1,000 times. For the following example, graphs of the sample means of fractile estimates cited above are shown in Figures 2-19. In addition, we computed sample estimates of the marginal means of the Yj's and of the covariance structure of Y. Coincident with our intuition:

4. The distribution of Yj is very close to lognormal for small values of j but, as j increases, right-tail probabilities become smaller than those of a lognormal distribution with the same mean and variance.


The graphs displayed in Figures 2-16 were generated by averaging 1,000 Monte Carlo replications of sampling n pool sizes without replacement and proportional to random size from a finite population of N pools. Values chosen for n and N were:

N = 1,200 and n = 10, 20, 30, 40, 50, 75, 100, 150, 200;
N = 600 and n = 10, 20, 30, 40, 50, 75, 100, 150;
N = 300 and n = 10, 20, 30, 40, 50, 75, 100;
N = 150 and n = 10, 20, 30, 40, 50, 75;
N = 100 and n = 10, 20, 30, 40, 50.

Elements of the finite population have sizes generated according to a lognormal probability law with parameters µ = 6.00 and σ² = 3.00. In units of 10³ bbl of oil in place, the corresponding density has median exp{6.00} = 403.4 and mean exp{µ + (1/2)σ²} = 1,808.

Figures 2-11 display simulated versions of (1) the expectation of the empirical cumulative distribution function of observed pool sizes and (2) the expectation of the empirical cumulative function of sizes of pools remaining to be discovered, computed under the assumption that observed sizes are mutually independent and identically distributed--which they are not. The graph is plotted on lognormal probability paper with the ordinate expressed in natural logarithms and the abscissa in the probability of a value less than the corresponding ordinate value. By computing this expectation and plotting it as described, we can see how far "off" we are by making the assumption that observed sizes are in fact independent and identically distributed as lognormal. The right tail curves noticeably away from a straight line and is displaced upward from the straight line (graph) of the underlying population's cumulative distribution function.

The graph of the expectation of the empirical cumulative distribution function of sizes remaining undiscovered for given N and n, plotted in a similar manner, lies below the straight line (graph) of the underlying population's cumulative distribution function, and the right tail becomes progressively "thinner" as a larger proportion of the finite population of pools is sampled.

Figures 12-16 display the graphs of the expectation of the empirical distribution function of the observed Yj's plotted in a similar manner.

Figure 17 displays the graph of the (simulated) mean of observed pool sizes in order of discovery for several values of N. The first few discoveries have mean sizes which are orders of magnitude greater than the mean size--exp{µ + (1/2)σ²} = 1,808,000 bbl in place--of the underlying lognormal population. For example, when N = 100, the mean size of the first discovery is more than 12 times the underlying-population mean, and, when N = 1,200, it is more than 17 times this mean! The mean size of the jth discovery declines at a very rapid rate; for example, for N = 100, only the first 20 discoveries have mean sizes larger than the underlying-population mean.

Figure 18 shows graphs of (simulated) means of observed pool sizes in order of discovery for several values of N as a function of the ratio of the number of pools discovered to the number remaining undiscovered; that is, E(Yj|θ, N) is plotted as a function of j/(N − j). As N varies from 100 to 1,200, the graphs remain virtually indistinguishable. If they are in fact indistinguishable, the implication is that E(Yj|θ, N) is the same for every pair (j, N), j < N, of positive integers such that j/(N − j) is constant; that is, λ = j/(N − j), j = 1, 2,..., N − 1, is the "natural" scale for E(Yj|θ, N).

A least-squares fit to simulated values of log E(Yj|θ, N) of a third-degree polynomial of the form

log E(Yj|θ, N) = β0 + β1λ + β2λ² + β3λ³,  λ = j/(N − j),    (5)

fits quite well.

Fig. 2. Simulated cumulative distribution functions for observed pool sizes and for pools remaining undiscovered when N = 1,200.

Fig. 3. Simulated cumulative distribution functions for observed pool sizes and for pools remaining undiscovered when N = 600.

Fig. 4. Simulated cumulative distribution functions for observed pool sizes and for pools remaining undiscovered when N = 300.

Fig. 5. Simulated cumulative distribution functions for observed pool sizes and for pools remaining undiscovered when N = 150.

Fig. 6. Simulated cumulative distribution functions for observed pool sizes and for pools remaining undiscovered when N = 100.

Fig. 7. Simulated cumulative distribution functions for observed pool sizes and for pools remaining undiscovered, for fixed sample size n = 10.

Fig. 8. Simulated cumulative distribution functions for observed pool sizes and for pools remaining undiscovered, for fixed sample size n = 20.

Fig. 9. Simulated cumulative distribution functions for observed pool sizes and for pools remaining undiscovered, for fixed sample size n = 30.

Fig. 10. Simulated cumulative distribution functions for observed pool sizes and for pools remaining undiscovered, for fixed sample size n = 40.

Fig. 11. Simulated cumulative distribution functions for observed pool sizes and for pools remaining undiscovered, for fixed sample size n = 50.

Fig. 12. Simulated cumulative distribution functions for size of 10th pool discovered when finite population size N = 100, 150, 300, 600, and 1,200.

Fig. 13. Simulated cumulative distribution functions for size of 20th pool discovered when finite population size N = 100, 150, 300, 600, and 1,200.

Fig. 14. Simulated cumulative distribution functions for size of 30th pool discovered when finite population size N = 100, 150, 300, 600, and 1,200.

Fig. 15. Simulated cumulative distribution functions for size of 40th pool discovered when finite population size N = 100, 150, 300, 600, and 1,200.

Fig. 16. Simulated cumulative distribution functions for size of 50th pool discovered when finite population size N = 100, 150, 300, 600, and 1,200.

Fig. 17. Simulated means of size of nth pool discovered for finite population sizes N = 100, 150, 300, 600, and 1,200.

Fig. 18. Simulated means of size of nth pool discovered for finite population sizes N = 100, 150, 300, 600, and 1,200, displayed as a function of n/(N − n), the ratio of the number of pools discovered to the number remaining undiscovered.

Fig. 19. Simulated means of size of nth pool discovered for finite population sizes N = 100, 150, 300, 600, and 1,200, displayed as a function of loge n/(N − n).

For the range of values of n and N shown in Table 4, the standardized coefficients corresponding to the βi's are reasonably stable.
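Continuing the sketch above, the fit of equation 5 can be reproduced with an ordinary least-squares polynomial fit (our construction; it assumes, per the discussion of Figure 18, that λ = j/(N − j) is the fitted scale, and it reuses simulate_play from the earlier sketch):

```python
import numpy as np

# Mean size of the j-th discovery, averaged over replications.
reps, N, n = 1000, 300, 100
mean_sizes = np.mean([simulate_play(N, n)[0] for _ in range(reps)], axis=0)

j = np.arange(1, n + 1)
lam = j / (N - j)
# Third-degree polynomial of equation 5; np.polyfit returns the
# coefficients in the order beta_3, beta_2, beta_1, beta_0.
beta = np.polyfit(lam, np.log(mean_sizes), deg=3)
```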

PRELIMINARY CONCLUSIONS

The preliminary results reported here suggest that a model of discovery sizes based on assumptions I and II is a promising improvement over a variety of earlier models. In particular, a decline in the average size of discovery as the resource base is depleted appears as a logical consequence of these assumptions. However, before the model can be regarded as empirically valid, additional statistical testing of underlying assumptions and of the model's predictive accuracy must be done.

We believe that it is possible to interface our model with expert subjective judgment so as to generate probabilistic forecasts of discovery sizes for prospective plays in which no drilling has been done. That is, if one views our model as a generator of discovery sizes, one whose parameters are not known with certainty, and codifies expert judgment about these parameters in the form of subjective probabilities, the calculation of a predictive probability distribution for discovery sizes is conceptually straightforward, albeit computationally involved.

Table 4. STANDARDIZED COEFFICIENTS

References:

Allais, M., 1957, Method of appraising economic prospects of mining exploration over large territories--Algerian Sahara case study: Management Sci., v. 3, no. 4, p. 285-347.

Arps, J. J., and T. G. Roberts, 1958, Economics of drilling for Cretaceous oil on east flank of Denver-Julesburg basin: AAPG Bull., v. 42, no. 11, p. 2549-2566.

Barouch, E., and G. Kaufman, 1974, Sampling without replacement and proportional to random size: unpub. ms.

Cochran, W. G., 1939, The use of analysis of variance in enumeration by sampling: Jour. Am. Statistical Assoc., v. 34, p. 492-510.

Cox, D., 1962, Further results on tests of separate families of hypotheses: Jour. Royal Statistical Soc., ser. B, v. 24, no. 2, p. 406-423.


Cox, D., 1969, Some sampling problems in technology, in N. L. Johnson and H. Smith, eds., New developments in survey sampling: New York, John Wiley and Sons, p. 506-527.

Crabbe, P. J., 1969, The stochastic production function of oil and gas exploration in mature regions: Operations Research Branch, Natl. Energy Board, unpub. memo.

Drew, L. J., 1972, Spatial distribution of the probability of occurrence and the value of petroleum; Kansas, an example: Mathematical Geology, v. 4, no. 2, p. 155-171.

Drew, L. J., 1974, Estimation of petroleum exploration success and the effects of resource base exhaustion via a simulation model: U.S. Geol. Survey Bull. 1328, 25 p.

Ericson, W., 1969, Subjective Bayesian models in sampling finite populations (with discussion): Jour. Royal Statistical Soc., ser. B, v. 31, p. 195-233.

Feller, William, 1966, An introduction to probability theory and its applications, v. 2: New York, John Wiley and Sons, 627 p.

Fisher, R. A., 1956, Statistical methods and scientific inference: London, Oliver and Boyd.

Jackson, O. A. Y., 1969, Fitting a gamma or lognormal distribution to fibre diameter measurements on wool tops: Applied Statistics, v. 18, no. 1.

Kaufman, Gordon, 1963, Statistical decision and related techniques in oil and gas exploration: Englewood Cliffs, N. J., Prentice-Hall, 307 p.

Kaufman, Gordon, 1965, Statistical analysis of the size distribution of oil and gas fields, in Symposium on petroleum economics and evaluation: AIME, p. 109-124.

Kaufman, Gordon, and P. G. Bradley, 1973, Two stochastic models useful in petroleum exploration, in Arctic geology: AAPG Mem. 19, p. 633-637.

Krige, D. G., 1951, A statistical approach to some basic mine valuation problems on the Witwatersrand: Jour. Chemical, Metall., and Mining Soc. South Africa, v. 52, p. 119-139.

Mandelbrot, B., 1960, The Pareto-Levy random functions and the multiplicative variation of income: Yorktown Heights, N. Y., IBM Research Center Rept.

Matheron, Georges, 1955, Application des methodes statistiques a l'evaluation des gisements: Annales des Mines, December.

McCrossan, R. G., 1969, An analysis of size frequency distribution of oil and gas reserves of Western Canada: Canadian Jour. Earth Sci., v. 6, no. 2, p. 201-211.

Palit, C. D., and I. Guttman, 1973, Bayesian estimation procedures for finite populations: Commun. in Statistics, v. 1, no. 2, p. 93-108.

Prokhorov, Yu. V., 1964, On the lognormal distribution in geochemical problems: Theory of Probability and Its Applications.

Rodionov, D. A., 1964, Distribution functions of element and mineral content of igneous rocks: Moscow, Izd. "Nauka" (in Russian).

Ryan, J. T., 1973a, An analysis of the crude-oil discovery rate in Alberta: Bull. Canadian Petroleum Geology, v. 21, no. 2, p. 219-235.

Ryan, J. T., 1973b, An estimate of the conventional crude-oil potential in Alberta: Bull. Canadian Petroleum Geology, v. 21, no. 2, p. 236-246.

Sharp, K., 1969, Lognormal vs. gamma distribution: Energy Resources Conservation Board, unpub. memo.

Uhler, R., and P. G. Bradley, 1970, A stochastic model for determining the economic prospects of petroleum exploration over large regions: Jour. Am. Statistical Assoc., v. 65, p. 623-630.

Zellner, A., 1971, An introduction to Bayesian inference in econometrics: New York, John Wiley and Sons.


Acknowledgments:

(2) Alfred P. Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139.

(3) Department of Economics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139.

(4) Department of Economics, Yale University, New Haven, Connecticut 06520.

Copyright 1997 American Association of Petroleum Geologists
