William Chesters <williamc@paneris.org> wrote in message news:<m2lml28e4l.fsf@beertje.william.bogus>...

> sfrthomas@yahoo.com (S. F. Thomas) writes:
>>> my intuition is led quickly to consider it a likelihood, making
>>> each expert's opinion very like an uncertain measurement, if you
>>> are thinking in Bayesian terms. In other words, exactly Thomas's
>>> "semantic likelihood".
>>>
>>> Except, I don't understand his problem with "the vain Bayesian attempt
>>> to treat likelihood as though it were probability".
>>
>> The Bayesian inferential procedure is conceptually
>>
>>     Posterior = Likelihood * Prior
>>
>> Take away the prior
>
> wahh! luckily not necessary for what you go on to say:
>
>> Etc. It is powerful, heady stuff. But I maintain it is ultimately
>> vain, because the frequency notion of probability contained in the
>> likelihood is different from the belief notion of probability
>> contained in the prior,
>
> Well, is it necessary to interpret the likelihood as any kind of
> "frequency notion" of probability? For serious Bayesians,
> probabilities are degrees of belief---or perhaps in this context one
> might better say: degrees of surprisal---, the axioms are normative,
> and everything is conditional. So to understand a formula like
>
>     p(Wilco says X is tall | X is 127cm from toe to nose) = 0.003
>
> as saying
>
>     "I would be surprised to a degree 5.809 if Wilco said X was
>     tall when he believed s/he was actually 127cm etc."
>
> is not merely acceptable to a Bayesian, it's actually the basic
> interpretation. The connection which can in many cases be made
> between belief/surprisal numbers and frequency under repeated
> experimentation comes later and is not at all what Bayesian
> probabilities are constructed to mean.
>
>> and the belief notion in the prior is not sufficient to elevate the
>> point function that is likelihood to the set function that is the
>> posterior.
>
> I don't follow here ... can you expand?

Suppose we are making inference about a Bernoulli parameter. If x successes out of n trials gives us a likelihood function characterizing the relative *possibilities* for the parameter under consideration, by what alchemy of prior belief can the qualitative nature of the uncertainty change? Presumably one's prior beliefs are based on prior experience with the process in question, formal and informal. But presumably also, one's prior experience with the process could at best be summarized as being the equivalent of having observed b successes (more or less) out of m trials (more or less), in which case the prior should be qualitatively identical to likelihood. Prior belief is then not sufficient to transmute the qualitative nature of the uncertainty in question. Nor in that case should the posterior uncertainty be any different in kind from the likelihood uncertainty generated by the data.
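[Illustration: a minimal sketch of the Bernoulli argument above, not code from the original post. It assumes the prior experience is exactly equivalent to b successes in m trials, and the counts (3 of 10 prior, 12 of 30 observed), the grid, and the function name are hypothetical. Under that assumption, multiplying the prior likelihood by the data likelihood gives the same curve as the likelihood for the pooled counts, so the result is still a likelihood.]

```python
import numpy as np

# Candidate values of the Bernoulli parameter (endpoints excluded to avoid log(0)).
grid = np.linspace(0.001, 0.999, 999)

def normalized_likelihood(successes, trials, p=grid):
    """Bernoulli likelihood over the grid, rescaled so its maximum is 1."""
    log_l = successes * np.log(p) + (trials - successes) * np.log1p(-p)
    return np.exp(log_l - log_l.max())

# Hypothetical prior experience: b = 3 successes in m = 10 trials.
prior = normalized_likelihood(3, 10)
# Hypothetical formal data: x = 12 successes in n = 30 trials.
data = normalized_likelihood(12, 30)

# Posterior likelihood = likelihood from the data * prior likelihood,
# rescaled so that the best-supported value has likelihood 1.
posterior = prior * data
posterior /= posterior.max()

# The same curve arises from pooling the counts: 15 successes in 40 trials.
pooled = normalized_likelihood(15, 40)
print("same curve as pooled counts:", np.allclose(posterior, pooled))
print("best-supported value of p:", grid[posterior.argmax()])
```

The rescaling to a maximum of 1 is a presentation choice; only ratios of likelihood values carry meaning here.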

>> A lot of people are just not buying it, especially in "scientific"
>> inference where belief priors are actively kept to one side, rather
>> than incorporated into the analysis.
>
> Um, large and growing numbers of people are buying it, even in
> industry ...
>
>> What the data say is captured entirely in the likelihood, given the
>> probability model that is hypothesized.
>
> Fine, if you have a 1- or 2-dimensional problem, why not plot the
> likelihood for people and let them draw their own conclusions
> (i.e. apply Bayes rule for themselves)?
>
>> And the real challenge is to develop a likelihood calculus that is
>> as easy of manipulation as the probability calculus, Bayesian or
>> otherwise. This was the challenge posed by Fisher many years
>> ago.
>
> If the question is "What is it that observations of the outcomes of
> imperfectly predictable processes in the world tell us, and how can we
> represent it scientifically?" then Bayes gives a very powerful answer.

It's powerful no doubt, but the issue has always been, starting with Bayes himself, whether it is justified.

> The posterior = likelihood * prior formula is really a great insight
> here. The answer is: what observations do is modify our beliefs, in
> ways captured quantitatively by the probability calculus. Of course
> the theory is an idealisation, but only in the sense that logic is an
> idealisation---both are very useful for practical work.

I have no problem with using prior beliefs, only with the form that such prior beliefs are assumed to take. In the extended likelihood calculus I am concerned to develop, it is absolutely possible to combine prior likelihoods with experiment likelihoods to obtain posterior likelihoods. And when there is no prior, or one wishes to characterize only what the *formal* experimental data say, then priors are left out. And if it is a group decision problem, it is a snap to combine likelihood priors from different individuals. I'd like to see how in the Bayesian belief probability setup it is possible to maintain coherence across different decision-makers in a group setting.

> That question seems to me to subsume any other question one might
> want a theory of likelihoods to answer?
>
>> even though Zadeh himself seemed to get stuck with a maximization
>> method of disjunction (evaluation of composite hypotheses) which
>> encounters the same difficulties observed many years earlier in the
>> different problem domain of statistical inference, and for the same
>> reason: putting the value of the set as that of its strongest
>> member.
>
> That's certainly a problem (it always seemed rather ad hoc to me, and
> product-sum is clearly a better bet), but isn't there a deeper problem
> in the blindness of FST towards the full complexity of disjunctive
> reasoning in the presence of general conditionality relationships?

That depends on whose FST you're talking about. :) In _Fuzziness and Probability_ I have been very concerned to develop a unified formula for connectives, both conjunctive and disjunctive. Sometimes the min-max rules are appropriate, but not always. Sometimes the product and product-sum rules. Sometimes the bounded-sum rules. And always there is in principle a linear combination of these basic rules in which the precise blend is determined by considerations of semantic consistency relations between the halves of the conjunction/disjunction. In other words, it is internal to the logic of the theory, not an ad hoc external imposition.
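[Illustration: a minimal sketch of the three families of connective rules named above and of a linear blend of them, not the formula from _Fuzziness and Probability_. It assumes the standard textbook forms: min/max, product with "product-sum" read as a + b - ab, and bounded sum min(1, a + b) with conjunctive counterpart max(0, a + b - 1). The blend weights are hypothetical inputs; how the book derives them from semantic consistency is not reproduced here.]

```python
# Conjunction (AND) rules for two membership grades a, b in [0, 1].
def conj_min(a, b):
    return min(a, b)

def conj_product(a, b):
    return a * b

def conj_bounded(a, b):
    return max(0.0, a + b - 1.0)

# Matching disjunction (OR) rules.
def disj_max(a, b):
    return max(a, b)

def disj_product_sum(a, b):
    return a + b - a * b

def disj_bounded_sum(a, b):
    return min(1.0, a + b)

def _check(weights):
    """Weights must be non-negative and sum to 1 (a convex combination)."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

def blended_conjunction(a, b, weights):
    """Linear blend of the three AND rules; weights are hypothetical."""
    _check(weights)
    w_min, w_prod, w_bnd = weights
    return (w_min * conj_min(a, b) + w_prod * conj_product(a, b)
            + w_bnd * conj_bounded(a, b))

def blended_disjunction(a, b, weights):
    """Linear blend of the three OR rules; weights are hypothetical."""
    _check(weights)
    w_max, w_psum, w_bsum = weights
    return (w_max * disj_max(a, b) + w_psum * disj_product_sum(a, b)
            + w_bsum * disj_bounded_sum(a, b))

a, b = 0.6, 0.7
print(blended_conjunction(a, b, (0.5, 0.5, 0.0)))
print(blended_disjunction(a, b, (0.5, 0.5, 0.0)))
```

Because the weights form a convex combination, any blended conjunction lies between the bounded and min rules, and any blended disjunction between the max and bounded-sum rules.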

>>>> There is plenty new semantics in the fuzzy set theory
>>>> of which probabilists have been blissfully unaware, and which in
>>>> fact helps to illuminate some problems in the foundations at
>>>> least of statistical inference theory.
>>>
>>> Specifically?
>>
>> One. The notion that data are fuzzy in general. To say that "John is
>> tall" is a height measurement, no different in principle from "John
>> is 1.92 m". Only the degree of fuzziness is different. [...] If you
>> relax the notion of point measurement from which theories of
>> statistical inference (classical as well as Bayesian) proceed,
>
> Hmm, Bayes has no problem at all with uncertain measurements. In fact
> for practical engineering applications, fuzzy measurements (in the
> colloquial sense!) are precisely where Bayes really shines. Think of
> Kalman filters, information/communication theory, ...
>
> The Bayesian account "proceeds" from the _concept_ of point
> measurements because it is built on top of the scientific language of
> numerical quantities---the whole aim is to explain how to do reasoning
> about an uncertain reality in terms of our continuum-based, idealised
> physical theories. It explicitly doesn't force you to make all your
> _measurements_ point measurements, or collapse your beliefs into point
> beliefs, or anything.
>
> Why isn't there a very close analogy between the Bayesian (or
> naively statistical and actually Bayesian) models used in engineering
> for accounting for sensor measurement uncertainty, and the obvious
> Bayesian account, in terms of speakers' utterance dispositions, of
> what "John is tall" means? If the two cases are no different in
> principle, and you disagree with the latter, then do you also
> disagree with the former?
>
> To be concrete, what I mean here is models of the form
>
>     p(widget height | sensor measurement, previous hypothesis) \propto
>         p(sensor measurement | widget height) \times
>         p(widget height | previous hypothesis)
>
> and analogously
>
>     p(Mary's height in microns | John's tallness assertion,
>       my previous hypothesis) \propto
>         p(John's tallness assertion | Mary's height in microns)
>         p(Mary's height in microns | my previous hypothesis)

The short answer is that I would also disagree with the former. Not, mind you, in terms of the principle of combining prior uncertainty with new evidence to yield a revised characterization of the uncertainty. Where prior information is available, by all means use it. Where I disagree is in the qualitative nature of the uncertainty characterization. John's tallness assertion yields uncertainty about Mary's height of the possibilistic, or "semantic likelihood" sort, not of the probabilistic sort. In the intimate mixing of the two kinds of uncertainty -- possibilistic or fuzzy or likelihood, on the one hand, and probabilistic on the other -- which I have been concerned to address, there is no privileging of one form over the other. The nature of the uncertainty is usually crystal-clear from the problem set-up. And I deplore any thinking which sees the two kinds of uncertainty as in any way in competition one with the other. They are dualistic complements, not competitors.

But with respect to the point measurement assumption, I have no quarrel with its usefulness as idealization. It has served us well. And no engineer worth his salt was ever confused as to issues of precision, and the limits to it in any given problem context. That is not what I am saying. What I am saying is that in the formalism itself, data may now be construed as fuzzy sets in general, and the relaxation of the data-as-point assumption can be formalized within the theory, rather than corrected out of common-sense understanding outside of the theory. That is sometimes a useful thing, sometimes not. But just in the formalism itself, it is useful to have the insight it provides.

As an example, it is my opinion that theories of measurement, of the Tversky/Luce/Suppes variety, are hopelessly confounded on the issue of error, precisely because they proceed from the point idealization of data, and the total ordering axiom. Thus errors of measurement cannot be part of the theory itself; they must be injected from outside as a statistical afterthought. Bayesians are happy to oblige, I know. But the point idealization of data confounds subjective probability/utility estimation also, because there too there is an ordering axiom that effectively requires the subject to express a sharpness of discriminability in his expressions of belief and preference that is inconsistent with the sensitivity limitations of the measuring device, the human judge! Relax the point idealization, and errors of intransitivity may be seen in a different light, as the natural outcome of fuzziness in the reports that may be made by the measurement device. So, the fresh semantics that follow in the wake of the fuzzy paradigm could be helpful even to the Bayesians, except that the Bayesian schema itself may now be obviated!

Btw, there is an in-principle problem of infinite regress faced within the Bayesian schema when it seeks to characterize measurement uncertainty such as is easily captured within a (re-formulated to be sure) fuzzy set theory. In practice, I have no doubt that higher-order uncertainty can be ignored in real-world applications. But it should be troubling as a matter of principle that the characterization of uncertainty should *in principle* have no end. This is not an *in principle* problem when the uncertainty in model parameters is characterized possibilistically, nor when the uncertainty in the measurement reports produced by a measurement device is characterized possibilistically.

>> one is led to a different set of semantics where, for one thing, the
>> partial ordering axiom of subjective probability may be relaxed, and
>
> Right. I can see how that could lead to something different from
> Bayes, because Bayes (afaic) proceeds initially by saying "_suppose_
> we do uncertain reasoning with ordered real numbers, what do the
> axioms have to be?" But I don't yet agree that there is an unmet need
> here. For me Bayes over numerical probabilities works really well for
> the problems we have been discussing.

Perhaps so, but the issue is one of principle, and the "logical clarification of thoughts". I quite agree that people and engineers facing real problems find "good" solutions, even when the strict formalism they employ would never provide it for them.

>> likewise the indifference curves of utility may be
>> *thick*. Estimates of probability may now also be fuzzy, as can
>> expressions of preference. Think of what that would mean for
>> eliciting probability judgments from human judges who have
>> difficulty putting on the Bayesian strait-jacket of "coherence".
>
> I think at some level there has to be a strait-jacket; any theory of
> rationality is bound to impose a kind of coherence which is
> "normative" in the philosophical sense. We can only figure out
> someone's beliefs (even in principle) to the extent that we take them
> to be a rational agent. If they're "free" from the "constraints" of
> rationality then it's arguably not clear that they have beliefs at
> all ...

Sometimes probabilities are *fuzzy*; in fact that is usually the case, certainly the general case. A Bayesian who forces his client to render a fuzzy probability into a crisp one, especially when the crispness is illusory, may well be the one in need of a strait-jacket!

Be that as it may, there needs to be understood the distinction between the underlying universe of discourse, which is the "real" interval between 0 and 1, made up of a continuous infinity of points, each precise to an infinite number of decimal places, and the actual measurement of a point on that line. The latter will inevitably be *fuzzy*, whether within the formalism or not. So it is inevitably going to be a vain attempt to impose "coherence", unless from outside of the formalism you decide on the number of significant figures on which you will settle. If three significant figures are adequate, why three? Why not four? Why not two, why not one? Why not whatever is consistent with "most Swedes are tall"? Or whatever the equivalent in the problem domain of concern to your client. The point is that "coherence" and "precision" are two different things. One can be coherent while being grossly fuzzy. And one can be incoherent while being, or attempting to be, precise to an infinite number of decimal places. Bayesian concerns with coherence are a snare and a delusion.

>> A prior expression of "belief" may be expressed for example in a
>> simple statement such as "most Swedes are tall", which imposes a
>> (computable, actually) fuzzy restriction on the height distribution
>> of Swedes. This notion of quantification was a master-stroke by
>> Zadeh, IMO.
>
> OK, I have to show my ignorance here. Why can't "most Swedes are
> tall" be represented perfectly plausibly by any of a whole variety of
> Bayesian belief/expectation/probability distributions? Over the
> height distributions of Swedes, or over the heights of each new person
> I meet given the hypothesis they are Swedish, or (facetiously) over
> the output of my "height of person in visual field"
> neurons---whatever.

Sure you can. But the issues that go to the in-principle validity of the Bayesian schema would remain. And in addition, there is the clumsiness of the Bayesian semantics when stretched in the manner that would be necessary.

> I can see that one is faced with the question of _which_ distribution
> to pick (or which possible distributions to integrate over, etc.), but
> that surely must be a feature of any semantic account of that
> sentence: it must at least be a question for psycho-linguistic
> research, or even to be decided on a speaker-by-speaker basis by a
> hierarchical model. I don't see a problem of principle here?

The statement fuzzily constrains the height distribution of Swedes. Which means that the mean and variance of the distribution so constrained would themselves be fuzzy. And any probability statement flowing from such a fuzzy distribution would itself be fuzzy. That sounds vaguely unacceptable to those wedded to Science's point idealization of data. But consider, if you took a random sample of whatever sample size, then what the data say about the height distribution of Swedes would be also effectively a fuzzy statement about the true probability model, and probability statements deriving therefrom should likewise be fuzzy. We make our point estimates and construct confidence intervals, but underneath it all, there is fuzziness in the mean and variance, and likewise fuzziness in relevant probability statements. Bayes is wedded to integrating out the (second-order) uncertainty in the model parameters of course. Hence, whether the expected value of loss is founded on "most Swedes are tall", or a sample size of 100,000, tends to be obscured within the Bayesian schema. Not that it has to be.
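[Illustration: a minimal sketch of how what a sample says about a model parameter sharpens with sample size, not code from the original post. It assumes a simplified Bernoulli setting (each sampled person is either "tall" or not), a hypothetical observed proportion of 60%, and an arbitrary cut level of 0.15 on the normalized likelihood; the function names are made up for the example.]

```python
import numpy as np

# Candidate values for the proportion of tall people (endpoints excluded).
grid = np.linspace(0.001, 0.999, 999)

def normalized_likelihood(successes, trials, p=grid):
    """Bernoulli likelihood over the grid, rescaled so its maximum is 1."""
    log_l = successes * np.log(p) + (trials - successes) * np.log1p(-p)
    return np.exp(log_l - log_l.max())

def cut_interval(likelihood, level, p=grid):
    """Range of grid values whose normalized likelihood is at least `level`."""
    inside = p[likelihood >= level]
    return float(inside.min()), float(inside.max())

LEVEL = 0.15  # hypothetical cut level, chosen only for illustration

widths = []
for n in (10, 100, 1000, 100000):
    tall = (6 * n) // 10  # hypothetical sample: 60% observed to be tall
    low, high = cut_interval(normalized_likelihood(tall, n), LEVEL)
    widths.append(high - low)
    print(f"n={n:>6}: proportion plausibly in [{low:.3f}, {high:.3f}]")
```

The interval never collapses to a point; it only narrows, which is one way to keep visible the difference between a conclusion resting on a vague statement and one resting on a sample of 100,000.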

>> Two. The notion that uncertainty about probability models and model
>> parameters is essentially of the fuzzy sort. The data from a sample
>> are like a fuzzy term, which gets progressively narrower in its range
>> of uncertainty the greater the sample size.
>
> Again, this sounds like a description of Bayesian probability to me :).

Objectively it's likelihood. Subjectively, who is to argue?

>> Fisher so very long ago. Ultimately, it allows us to work entirely
>> with likelihood, without the inordinate effort that must go into
>> Bayesian analysis to develop conjugate priors, improper priors,
>
> I think I mentioned above that for me, the "prior" is a key insight
> into what it is that observations tell us. As part of a theory of
> rationality I think it plays an extremely important role; no other
> account of what rational inference from evidence _is_ has been so
> complete or so fruitful, afaics.

    Posterior likelihood = Likelihood from the data * Prior likelihood

would work just as well, without any problem of justification, as far as I can see.

> But you don't need conjugate or improper priors for the theory.
>
> In practice there may often be computational reasons why conjugate
> priors are handy. It's not clear to me how often improper priors are
> really needed; at least the putative examples I have seen in real-life
> situations arose from requirements like "we must take care that we
> impose absolutely no prior beliefs on the mean of this plant species'
> growth rate", which is clearly ridiculous ...
>
> If you can show examples in which improper, or otherwise
> "troublesome", priors arise in contexts other than a misguided attempt
> to construct a "totally noninformative" prior without stopping to
> think what "totally" means, I would be interested to hear. I reserve
> the right to say "OK, that's a hard case, but as a theory of what
> happens in the ideal when an agent reads a voltmeter or hears someone
> say `X is tall', the prior * likelihood thing is the only game in town
> so you will have to show a better theory or learn to live with it!" :)
> I don't see how you can provide a full story about rationality---with
> or without fuzzy logic---based only on likelihoods; the likelihoods
> have to engage with a prior at some point, or at least stack up on top
> of an "unknown" prior.

I don't think so. People can be rational without either taking or placing the Bayesian bets...

>> There is more besides. But it's 1:00 am, and like Dodier, I need to go
>> to sleep.
>
> Thank you for your long, readable and thought-provoking posting!
>
> Best wishes,
> William

Same here.

Regards,
S. F. Thomas