William Chesters <williamc@paneris.org> wrote in message news:<m2k80nis4a.fsf@beertje.william.bogus>...

> Robert Ehrlich <bobehrlich@home.com> writes:
>> IMHO fuzzy memberships reflect the degree of hybridness of samples and
>> have nothing especially to do with prob.
>
> But the only concrete suggestion you see in fuzzy logic books for how
> to obtain fuzzy memberships numbers is to use the proportion of domain
> experts who say the man is tall, or whatever.
>
> Thinking through the implications (of issues like: variability due
> to the expert set being finite), my intuition is led quickly to
> consider it a likelihood, making each expert's opinion very like an
> uncertain measurement, if you are thinking in Bayesian terms. In
> other words, exactly Thomas's "semantic likelihood".
>
> Except, I don't understand his problem with "the vain Bayesian attempt
> to treat likelihood as though it were probability".

The Bayesian inferential procedure is conceptually

    Posterior = Likelihood * Prior

Take away the prior (or find a perhaps improper prior that achieves the same effect) and you are essentially equating the posterior to the likelihood. Now you proceed to treat the likelihood exactly as though it were probability, and in particular you can now integrate, both as your method of marginalization (essential for eliminating nuisance parameters) and as your method of evaluating composite hypotheses (essential for the Bayesian equivalent of confidence intervals and the like). It is simple, easy (well, some of the time), and powerful. Moreover, you can do changes of variable relating your posterior uncertainty regarding a parameter of interest to uncertainty surrounding a loss function of interest, and you can optimize over alternative decision actions whose consequences have uncertainty captured in the aforementioned loss function. Etc. It is powerful, heady stuff.

But I maintain it is ultimately vain, because the frequency notion of probability contained in the likelihood is different from the belief notion of probability contained in the prior, and the belief notion in the prior is not sufficient to elevate the point function that is likelihood to the set function that is the posterior. A lot of people are just not buying it, especially in "scientific" inference where belief priors are actively kept to one side, rather than incorporated into the analysis.

What the data say is captured entirely in the likelihood, given the probability model that is hypothesized. And the real challenge is to develop a likelihood calculus that is as easy to manipulate as the probability calculus, Bayesian or otherwise. This was the challenge posed by Fisher many years ago. But his semantics led him to consider only marginalization by maximization, and that quickly ran into difficulties.
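To make the contrast between the two ways of eliminating a nuisance parameter concrete, here is a minimal sketch. Everything in it is an illustrative assumption rather than anything from the thread: the normal model, the made-up height data, the grids, and the function names. It removes sigma from a normal likelihood once by integrating over it (a flat prior, i.e. treating the likelihood as though it were a density) and once by maximizing over it (the profile likelihood).

```python
import math

def log_likelihood(mu, sigma, data):
    """Log-likelihood of i.i.d. normal data for a given (mu, sigma)."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return (-n * math.log(sigma) - ss / (2 * sigma ** 2)
            - 0.5 * n * math.log(2 * math.pi))

def scale_to_peak(curve):
    """Rescale a curve so that its largest value is exactly 1."""
    top = max(curve)
    return [v / top for v in curve]

def eliminate_sigma(data, mu_grid, sigma_grid):
    """Return (integrated, profiled) curves over mu, each scaled to peak at 1.

    integrated: Riemann sum of the likelihood over the sigma grid
                (flat prior on sigma).
    profiled:   maximum of the likelihood over the sigma grid
                (marginalization by maximization).
    """
    step = sigma_grid[1] - sigma_grid[0]
    integrated, profiled = [], []
    for mu in mu_grid:
        values = [math.exp(log_likelihood(mu, s, data)) for s in sigma_grid]
        integrated.append(sum(values) * step)
        profiled.append(max(values))
    return scale_to_peak(integrated), scale_to_peak(profiled)

# Made-up heights in metres, purely for illustration.
data = [1.78, 1.85, 1.92, 1.81, 1.88]
mu_grid = [1.70 + 0.005 * i for i in range(61)]       # 1.70 .. 2.00
sigma_grid = [0.01 + 0.002 * i for i in range(200)]   # 0.01 .. 0.408

integrated, profiled = eliminate_sigma(data, mu_grid, sigma_grid)
```

In this toy case both curves peak at the same value of mu, but the integrated curve has heavier tails than the profiled one, because every value of sigma contributes to it rather than only the best-fitting one.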
Strangely enough, the Zadehian semantics associated with fuzzy-set theory provides some of the breakthrough, even though Zadeh himself seemed to get stuck with a maximization method of disjunction (evaluation of composite hypotheses) which encounters the same difficulties observed many years earlier in the different problem domain of statistical inference, and for the same reason: putting the value of the set as that of its strongest member. There is a long story here, but the short of it (don't wake Dodier, he is sleeping) is that a method of disjunction based on the product-sum operation gives the benefits of integration (each element of the set makes a contribution in accordance with its strength) but without having perforce to "integrate likelihood".
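A small sketch of the difference between the two disjunction rules, under assumptions that go beyond what the post states: "product-sum" is taken here to mean the algebraic sum a + b - a*b, the likelihoods are assumed to be scaled into [0, 1], and the hypotheses form a finite set. The function names and the numbers are invented for illustration.

```python
from functools import reduce

def disjunction_max(likelihoods):
    """Value of a composite hypothesis as that of its strongest member."""
    return max(likelihoods)

def disjunction_product_sum(likelihoods):
    """Fold a OR b = a + b - a*b over the members.

    Equivalent to 1 - prod(1 - a_i); assumes every a_i lies in [0, 1].
    """
    return reduce(lambda a, b: a + b - a * b, likelihoods, 0.0)

# Two composite hypotheses with the same strongest member (0.9),
# one of which has additional, weaker members.
narrow = [0.9]
broad = [0.9, 0.5, 0.5]

max_narrow, max_broad = disjunction_max(narrow), disjunction_max(broad)
ps_narrow = disjunction_product_sum(narrow)
ps_broad = disjunction_product_sum(broad)
```

Maximization cannot tell `narrow` from `broad`, since both are worth 0.9. Under the assumed product-sum rule the weaker members of `broad` also contribute, so it scores higher (about 0.975) while staying bounded by 1.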

> Bayes treats likelihoods as conditional pdfs, and that is a powerful
> way of looking at problems involving measurements, observations and so
> on. One of the big questions which FL attempts to answer is "what is
> it that I can conclude from hearing someone make a vague assertion
> like `X is tall', and how can we represent it scientifically?" Bayes
> has an answer to that already, in terms of [my beliefs about] the
> speaker's utterance disposition: i.e. what I expect them to say under
> certain circumstances, i.e. the conditional probability that they will
> assert tallness of X given [their beliefs about] X's numerical height.
> This does the job: it answers the question, and in terms uniform with
> those offered by Bayes for other "uncertainty" issues.

I certainly would agree that probability enters the picture, whether the fuzzicists like it or not, because language use in the kind of context with which fuzzy-set theory is concerned has chance elements, and these sit at the very core of the notion of membership itself. And I certainly would agree that there are subjective elements in play that would seem to invite Bayesians to "do their thing". But I think FST and FL provide fresh semantics that are not so elegantly expressed within a Bayesian framework; or at least I haven't seen it done. I would still have an issue with the unwarranted (IMO) mix of frequency and belief probability that is necessary in the Bayesian inferential schema (though not in Bayes' "theorem" per se).
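For concreteness, the "utterance disposition" reading quoted above can be sketched as follows. This is only an illustration under loudly stated assumptions: the logistic calibration curve, its midpoint and slope, the height grid, and the bell-shaped prior are all hypothetical and are not taken from the thread. The curve p_says_tall(h) plays the role of the probability that a speaker asserts "tall" given height h. Read as a function of h, the same curve is what the fuzzy side would call a membership function, and the Bayesian step is to multiply it by a belief prior.

```python
import math

def p_says_tall(height_m, midpoint=1.80, slope=25.0):
    """Hypothetical calibration curve: chance a speaker asserts 'tall'."""
    return 1.0 / (1.0 + math.exp(-slope * (height_m - midpoint)))

def posterior_given_tall(heights, prior):
    """Posterior over a height grid after hearing 'X is tall'."""
    weights = [p_says_tall(h) * p for h, p in zip(heights, prior)]
    total = sum(weights)
    return [w / total for w in weights]

def mean(heights, probs):
    """Expected height under a discrete distribution on the grid."""
    return sum(h * p for h, p in zip(heights, probs))

# Grid of candidate heights (metres) and an assumed bell-shaped prior.
heights = [1.50 + 0.05 * i for i in range(11)]            # 1.50 .. 2.00
raw = [math.exp(-0.5 * ((h - 1.75) / 0.08) ** 2) for h in heights]
prior = [r / sum(raw) for r in raw]

posterior = posterior_given_tall(heights, prior)
prior_mean = mean(heights, prior)
posterior_mean = mean(heights, posterior)
```

Hearing "X is tall" shifts the expected height upward, as one would hope. The point at issue in this thread is not whether this arithmetic works, but whether the frequency-type curve and the belief-type prior may legitimately be multiplied together in the first place.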

> In short it seems to me that Thomas over-stresses the value added
> by FL. That FL is (afaics) a "mere" branch of Bayesian probability
> is not _so_ strange if you recall how committed Bayesians consider
> probability as akin to logic, and in particular how they often think
> of log probabilities as "degrees of surprisal". A formula like
>
>     p(tall | height=1.92) = 0.2
>
> reads better if you think "how surprised would one be to hear a
> 1.92-high person described as tall?".
>
> Disclaimer: I am not any kind of expert, I just haven't ever found
> an FL devotee who was able seriously to engage with these points,
> so I'd love to hear more from Thomas.
>
>> There is plenty new semantics in the fuzzy set theory
>> of which probabilists have been blissfully unaware, and which in
>> fact helps to illuminate some problems in the foundations at
>> least of statistical inference theory.
>
> Specifically?

One. The notion that data are fuzzy in general. To say that "John is tall" is to make a height measurement, no different in principle from "John is 1.92 m". Only the degree of fuzziness is different. In the latter, the fuzziness is evident if we make explicit the inherent rounding: 1.92 +/- 0.01, recognizing further that the boundaries of the implied interval are fuzzy rather than crisp, and arise from the same issues as in a calibrational experiment to determine the characteristic function for the term "tall": accidental and systematic errors conspire to ensure that, presented with the same exemplar, two different measurement processes may disagree at the third significant figure, leading to fuzziness at the edges. If one relaxes the notion of point measurement from which theories of statistical inference (classical as well as Bayesian) proceed, one is led to a different set of semantics where, for one thing, the partial ordering axiom of subjective probability may be relaxed, and likewise the indifference curves of utility may be *thick*. Estimates of probability may now also be fuzzy, as can expressions of preference. Think of what that would mean for eliciting probability judgments from human judges who have difficulty putting on the Bayesian strait-jacket of "coherence". A prior "belief" may be expressed, for example, in a simple statement such as "most Swedes are tall", which imposes a (computable, actually) fuzzy restriction on the height distribution of Swedes. This notion of quantification was a master-stroke by Zadeh, IMO.

Two. The notion that uncertainty about probability models and model parameters is essentially of the fuzzy sort. The data from a sample are like a fuzzy term, which gets progressively narrower in its range of uncertainty the greater the sample size.

Three. The notion mentioned earlier that marginalization and evaluation of composite hypotheses could be founded on a richer calculus than mere maximization, with the difficulties known to be associated with that method. Fisher once famously said that the likelihood of w1 or w2 (referring to alternative hypotheses) is like the income of Peter or Paul ... we don't know what it is until we know which is meant. He was wrong, brilliant though he was. And the semantics associated with fuzzy-set theory allow us to pose a different statement of the same question, one that leads away from the maximization dead-end of the early likelihood calculus. Specifically, semantics of the form "w1 OR w2 is [an explanation of the data]", where the expression in square brackets is like a fuzzy term, no different from "tall", helps lead us to a different rule of evaluation for composite hypotheses, which avoids the problems confronted by Fisher so very long ago. Ultimately, it allows us to work entirely with likelihood, without the inordinate effort that must go into Bayesian analysis to develop conjugate priors and improper priors, and to interrogate decision-makers until they and you are blue in the face, seeking to impose coherence in a situation where the partial ordering axiom simply does not hold.

Etc. There is more besides. But it's 1:00 am, and like Dodier, I need to go to sleep.

Regards,
S. F. Thomas