hrubin@odds.stat.purdue.edu (Herman Rubin) wrote in message news:<9l4kv0$2evu@odds.stat.purdue.edu>...
> In article <66b61316.0108091708.7d6b9958@posting.google.com>,
> S. F. Thomas <sfrthomas@yahoo.com> wrote:
>> robert@localhost.localdomain (Robert Dodier) wrote in message
>> news:<9kt895$rs$1@localhost.localdomain>...
>>> In the interest of brevity, I've indulged in wanton snippage,
>>> but I hope what's left yields something comprehensible.
>
>>> S. F. Thomas <sfrthomas@yahoo.com> wrote:
>
>>>> Robert Dodier wrote:
>
> ..............
>
>> Goodness, no. What I do argue however is that the semantics of
>> likelihood do not just fall neatly out from the semantics of
>> probability. Probability provides some of the underpinning, but not
>> all. Otherwise Fisher would not have been led up a blind alley by
>> asserting that the "likelihood of a or b is like the income of Peter
>> or Paul, we don't know what it is until we know which is meant."
>
> I am by no means convinced that Fisher understood this, but
> I can see no way that the likelihood of "a or b" makes any
> sense at all.

That has precisely been the problem for all the generations of statisticians since Fisher. I presume you refer to the original probability model from which likelihood derives, f(x;w), where x ranges over sample space, w ranges over parameter space, and f is the density function for the random variable in question. For any point hypothesis w=a, f is clearly defined. But for a composite hypothesis {a,b}, it is not clear how f is defined. Therefore -- and this was the precise thrust of Fisher's metaphor -- we don't know what the likelihood of "a OR b" is until, like the income of Peter or Paul, we know which is meant.

I presume that it is thinking along these or similar lines that leads you to say that the likelihood of "a or b" makes no sense at all. Alternatively, one may say that the likelihood of "a OR b" is the likelihood corresponding to the stronger element, which is what leads to a maximum rule for likelihood disjunction. Or, one uses the probability metaphor as in the Bayesian set-up and rescales the likelihood function to sum to unity, whereupon the likelihood of "a OR b" becomes the sum of the two (rescaled) likelihoods, with appropriate modification if the likelihood is construed as a density function and the integral calculus is applied. Like it or not, that is essentially what Bayes does, although the story and the argumentation to get there are very different, requiring ritualistic obeisance to priors of one form or another, in particular "uninformative" if need be.

Be all that as it may, if you have an inferential method that purports to give a direct characterization of uncertainty in model parameters, then you are perforce computing likelihoods of sets or of composite hypotheses, i.e. you have a method for computing something like L(a OR b).

>> This leads to a likelihood calculus in which set evaluation is of the form
>
>> L( {a,b} ) = L(a OR b) = Max( L(a), L(b) )
>
> Are you taking a view of a linear truth value system?

Maybe... I don't know what you mean by "linear truth value system".

> AFAIK, this was first proposed by Lukasiewicz, and does
> not work at all well.
>
> Likelihood is NOT probability, and "a OR b" does not
> mean anything from the standpoint of likelihood.

But see above.

>> which rather quickly proves to be inadequate. Had it not been
>> inadequate, I don't think classical statistics would have gone to all
>> the trouble it has to develop indirect methods of describing the
>> uncertainty in model parameters consequent upon sampling. Nor would
>> there have been a neo-Bayesian revival intended to supplant the
>> classicists precisely by offering a method of *direct*
>> characterization. Indeed, Bayes offers a likelihood calculus in which
>
>> L(a OR b) ~ (L(a) + L(b))
>
> Bayes never offered anything about a likelihood calculus.

Nor did Savage, de Finetti and the others. My point was different: it is that this, *in effect*, is what Bayesian inference is doing. The whole song and dance about the prior just confuses this core issue, to which it is easy to return simply by imagining a completely "uninformative" prior (if such a thing is not a contradiction in probabilistic terms), and seeing the posterior for what it then is, i.e. the likelihood function appropriately rescaled, and now interpreted as probability or probability density.

> To Bayes, Fisher, Neyman, Laplace, Gauss, Kolmogorov, and
> others, one can take the or of statements or the union of
> events, but this is for probability. Likelihood is not
> probability, although it is an equivalence class of formal
> entities derived from probability.

I am most certainly under no confusion on that score.

>> where ~ is to indicate that some normalization, appropriate to the
>> construction of likelihood as a metaphorical (belief) probability, is
>> necessary. It is only with the fuzzy set theory that semantics
>> suggests itself
>
>> L(a OR b) = L(a explains the data OR b explains the data)
>
>> where "explains the data" is a fuzzy predicate no different in
>> principle from "is tall", and subject to calibration in conceptually
>> the same way. This leads, albeit with some reworking of the Zadehian
>> fuzzy set theory along the way, to
>
> "Explains the data" is philosophical gobbledygook. Assuming
> that we can assume that we have a binomial model, and we get
> a positive number of successes and failures, ALL binomial
> distributions with 0 < p < 1 "explain" the data; there is
> a positive probability that the data could have come from
> such a model.

But some explain the data better than others. As Fisher said, the likelihood function supplies a "natural order of preference for the possibilities under consideration". It is exactly analogous to the notion of semantic likelihood (or membership function) for a term such as "tall" providing a natural order of preference for what a competent speaker of the language *could* mean when she uses the term "tall" to characterize someone's height. Therefore, analogously to some speaker (witness) saying "the unknown attacker is tall", the result of sampling from a probability distribution is the implicit assertion of "the data" to the effect "the unknown probability model parameter is [an explanation of the observed sample]", and the membership function of the term in brackets may be identified with the (absolute) likelihood function generated by the data under the model.

Call that philosophical gobbledygook if you like. All philosophical abstraction is in the end metaphor. Some such abstractions never make it down to ground, I quite agree. But that is not the case here. What I propose is quite computable. And the essential insight seems to me to be quite plain, though I would readily admit that the semantics are unfamiliar.

>> L(a OR b) = L(a) + L(b) - L(a)*L(b)
>
>> where indeed the laws of probability are invoked, and at that in a
>> very simple way, but it is the fuzzy set semantics, and the device of
>> the calibrational proposition, that provides the essential frame that
>> Fisher overlooked.
>
> The likelihood function can be multiplied by any constant,
> and often is; L and c*L are the "same" likelihood function
> for any statistical purpose.

Not if you are using the product-sum rule of disjunction. For that purpose, one must distinguish the absolute likelihood function from the *relative* likelihood, to which you allude and which I quite agree is unique only up to similarity transformations. Thus, for example, if you are computing a marginal likelihood function, you would work with the absolute likelihoods to accomplish the marginalization, and only then may you rescale.

In the theory I am concerned to develop, I in fact use the term membership or characteristic function for the absolute likelihood, since I am essentially drawing on the insights and semantics of the fuzzy set theory (reworked to admit the notion of calibrational proposition with which this thread was begun), and the term possibility distribution for the relative likelihood.

> Why should anything be
> independent, even if it can be considered probabilities?
>
> .................

I am not sure I get your point here in this context. But if it is what I think it is, then the reworked fuzzy set theory continues to have the min-max connectives in certain circumstances, in particular when there are constraints of strong positive semantic consistency linking the respective affirmation probabilities; in such cases there is clearly no independence. Likewise, where there is strong negative semantic consistency (for example, those affirming an exemplar to be tall tending systematically to disaffirm him to be short), the appropriate rules for the conjunction and disjunction connectives are the bounded-sum (Lukasiewicz) rules. It is only when semantic independence may be assumed that the product and product-sum rules are appropriate. That would appear to be the appropriate assumption in the case of statistical inference, which in a sense involves the interpretation of what "data" say.

Regards,
S. F. Thomas
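
The three families of connectives mentioned above (min-max, bounded-sum, and product/product-sum) can be compared numerically. What follows is only a minimal Python sketch: the function names, the likelihood values 0.5 and 0.4, and the constant c are hypothetical, chosen for illustration, and do not come from the thread. The loop at the end also illustrates the point about rescaling: the maximum rule commutes with multiplication by a constant, while the product-sum rule does not, which is why the latter has to be applied to absolute rather than relative likelihoods.

```python
def or_max(la, lb):
    """Disjunction under strong positive consistency (max rule)."""
    return max(la, lb)

def or_bounded_sum(la, lb):
    """Disjunction under strong negative consistency (Lukasiewicz bounded sum)."""
    return min(1.0, la + lb)

def or_product_sum(la, lb):
    """Disjunction under independence: L(a) + L(b) - L(a)*L(b)."""
    return la + lb - la * lb

def and_min(la, lb):
    """Conjunction paired with the max rule."""
    return min(la, lb)

def and_bounded(la, lb):
    """Conjunction paired with the bounded sum (Lukasiewicz)."""
    return max(0.0, la + lb - 1.0)

def and_product(la, lb):
    """Conjunction paired with the product-sum rule."""
    return la * lb

# Hypothetical absolute likelihoods (membership values in [0, 1]).
la, lb = 0.5, 0.4
# Arbitrary rescaling constant, standing in for "L and c*L".
c = 0.5

for name, rule in [("max", or_max),
                   ("bounded-sum", or_bounded_sum),
                   ("product-sum", or_product_sum)]:
    rescale_first = rule(c * la, c * lb)   # combine rescaled likelihoods
    rescale_last = c * rule(la, lb)        # combine, then rescale
    print(f"{name:12s} L(a OR b)={rule(la, lb):.3f}  "
          f"rescale-first={rescale_first:.3f}  rescale-last={rescale_last:.3f}")
```

For the product-sum rule the two orders of operation give different numbers, so rescaling is only safe after the disjunction (or marginalization) has been carried out on the absolute likelihoods.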