From: owner-bisc-group@EECS.Berkeley.EDU on behalf of flaca [flaca@EECS.Berkeley.EDU]
Sent: 11 July 2003 03:29
To: BISC-Group@EECS.Berkeley.EDU
Cc: Lotfi Zadeh; Masoud Nikravesh
Subject: BISC: Prof. Zadeh's messages and responses to the UAI & BISC lists

*********************************************************************
Berkeley Initiative in Soft Computing (BISC)
*********************************************************************

To BISC Group:

For your information, following are my messages to the UAI and BISC lists together with comments which were posted to the UAI list. Please feel free to post your comment to the UAI list directly at or send it to Dr. Masoud Nikravesh for posting to the BISC list.

Regards to all,
Lotfi

-------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------

In a message posted on June 10th, 2003, under the heading of "A deceptively simple test of deductive capability," the following problem was posed. Given the premises: (a) Most tall men wear large-size shoes; and (b) Robert is tall. What is the probability, P, that Robert wears large-size shoes?

The correct answer--which may come as a surprise to some--is: P is indeterminate, that is, P is either undefined or unknown. In other words, the premises convey no information about P. The fact that P is indeterminate calls into question the validity of much of probability-based reasoning in the realm of law.

To justify this answer, it is convenient to consider a more general, generic version of the problem. Given the premises: (a) QA's are B's (or, equivalently, Count(B/A) is Q); and (b) X is A, where A and B are specified fuzzy subsets of a universe of discourse, U, and Q is a specified fuzzy quantifier (number), e.g., most. What is the probability, P, that X is B?
It is understood that in "X is A" and "X is B," A and B are possibility distributions of X, that is, the fuzzy sets of values which X may take. There are two cases: (1) X is a non-random variable, e.g., X=Robert, on the understanding that Robert is a specified member of U, with U being a collection of individuals; and (2) X is a random variable taking values in U with an unknown probability distribution, implying that Robert is an unspecified member of U.

In case (1), it is not meaningful to ask: What is the probability that X is B, since X is not a random variable, and hence P is undefined. In case (2), since the probability distribution of X is not known, all that can be said about the probability, P, that X is B, is that its value lies between 0 and 1, implying that P is unknown. Invocation of the maximum entropy principle is not admissible because the principle is of questionable validity and, in any case, is not applicable when events and/or their probabilities are fuzzy rather than crisp.

What else can be said? If we assume that X is a random variable with a uniform probability distribution, and A, B and Q are fuzzy sets, then the following can be established. If (a) "Count(B/A) is Q" is interpreted as "Sigma-count(B/A) is Q," where Sigma-count(B/A) is the relative count of the elements of B which are in A; (b) the intersection of A and B is defined in terms of the min norm; and (c) the probability of the fuzzy event "X is B" is defined as a weighted sum or integral (L.A. Zadeh, "Probability Measures of Fuzzy Events," Journal of Mathematical Analysis and Applications, vol. 22, pp. 421-427, 1968), then what can be asserted is that "P is Q," meaning that Q is the possibility distribution of P.

Returning to the original problem, what we see is that the general, generic version cannot be analyzed through the use of standard probability theory.
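The definitions in (a)-(c) above can be sketched over a small finite universe. This is only an illustrative sketch: the membership grades, the two probability distributions, and the function names are invented here and do not come from the message, and the conditional probability of a fuzzy event is assumed to be the ratio P(A and B)/P(A) with the min-norm intersection.

```python
import math

# Hypothetical membership grades over a four-element universe U.
tall = {"u1": 1.0, "u2": 0.8, "u3": 0.5, "u4": 0.0}    # fuzzy set A
large = {"u1": 0.9, "u2": 0.6, "u3": 0.7, "u4": 0.2}   # fuzzy set B


def intersect_min(mu_a, mu_b):
    """Intersection of two fuzzy sets under the min norm."""
    return {u: min(mu_a[u], mu_b[u]) for u in mu_a}


def relative_sigma_count(mu_b, mu_a):
    """Sigma-count(B/A): sigma-count of (A and B) divided by sigma-count of A."""
    both = intersect_min(mu_a, mu_b)
    return sum(both.values()) / sum(mu_a.values())


def prob_fuzzy_event(mu, p):
    """Probability of a fuzzy event as a weighted sum of membership grades."""
    return sum(mu[u] * p[u] for u in mu)


def conditional_prob(mu_b, mu_a, p):
    """Assumed definition: P(B | A) = P(A and B) / P(A)."""
    return prob_fuzzy_event(intersect_min(mu_a, mu_b), p) / prob_fuzzy_event(mu_a, p)


uniform = {u: 0.25 for u in tall}
skewed = {"u1": 0.1, "u2": 0.1, "u3": 0.7, "u4": 0.1}  # hypothetical non-uniform case

count = relative_sigma_count(large, tall)
p_uniform = conditional_prob(large, tall, uniform)
p_skewed = conditional_prob(large, tall, skewed)
print(count, p_uniform, p_skewed)
```

Under the uniform distribution the conditional probability coincides with the relative sigma-count; under the skewed distribution it does not, which is the sense in which the count alone leaves P open.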
What can be analyzed is a crisp version, e.g.: Given the premises: (a) Over 70% of men whose height exceeds 180 cm wear shoes whose size exceeds 11; and (b) Robert's height is over 180 cm. What is the probability that Robert wears shoes of size 11 or over? Assuming that Robert is chosen at random from U with uniform probability, the answer is: P is between 0.7 and 1. A simpler crisp example is the following. Given: (a) Over 99% of professors have a Ph.D. degree; and (b) Robert is a professor. What is the probability, P, that Robert has a Ph.D. degree? The correct answer is that P is indeterminate. And yet, most people, including those with scientific training, would say that P is over 0.99. This answer is correct only if it is assumed that Robert is drawn at random from U with uniform probability. In general, there is no valid justification for the assumption. This is why the usual modes of probability-based reasoning in the realm of law may be open to challenge in legal proceedings. Returning to the general, generic version of the Robert example: Given the premises (a) QA's are B's; (b) X is A; and the question: What is the probability, P, that X is B, my claim that the correct answer is "P is indeterminate," is not likely to be accepted without challenge. Can anyone point to an analysis of the example in question in the literature of probability theory? ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- [UAI] A deceptively simple example of deduction capability. Addendum to the correct answer. The Robert example touches upon a fundamental issue in probability theory and statistics--the relationship between counts and probabilities. The conventional wisdom is that probabilities are derivable from counts. 
What is widely unrecognized is that the relationship is far more complex, far less straight-forward and far less well-understood than is commonly believed to be the case. In fact, what the Robert example shows is that counts, per se, convey no information about probabilities unless some probabilistic information is added or assumed, explicitly or implicitly, at some stage of the deduction. Some of the respondents drew attention to earlier discussions of this issue in the literature, especially in the realm of law, and Professor Kyburg, one of the world's leading authorities on the foundations of probability theory, provided deep insights. The problem is that nontrivial versions of the Robert example cannot be adequately analyzed within the conceptual structure of standard probability theory (PT). What can be analyzed are crisp examples such as: (a) 90% of Yale alumni earn more than 80k/year; and (b) Robert is a Yale alumnus. Then, the probability, P, that Robert earns more than 80k/year is 0.9, with the understanding that Robert is chosen at random with uniform probability from the set of Yale alumni. But, if there is no information about Robert other than that he is a Yale alumnus, then P is indeterminate. Versions in which the premises and/or the assumed additional information are perception-based, cannot be analyzed within PT because PT provides no methods for dealing with perception-based information. As is shown in my message of June 10, in the generic deduction schema: (a) Count(A/B) is Q, (b) X is A; Prob(X is B) is ?P; the conclusion that P is Q is valid only if specific assumptions are made about (a) the probability distribution of X; (b) the definition of Count(A/B); (c) the definition of conjunction, (d) the definition of what is meant by a random sample drawn from a fuzzy set; and (e) the definition of the probability of a fuzzy event. 
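The crisp Yale example above can be made concrete with a short sketch. The population of ten alumni and the selection weights are hypothetical, chosen only to show that the same count (90%) is compatible with very different values of P once the selection of Robert is no longer uniform.

```python
from fractions import Fraction


def prob_has_property(has_property, weights):
    """Probability that a member drawn with the given selection weights has the property."""
    total = sum(weights)
    favourable = sum(w for h, w in zip(has_property, weights) if h)
    return favourable / total


# Hypothetical population: 9 of 10 alumni earn more than 80k/year.
earns_over_80k = [True] * 9 + [False]

# Uniform selection: every alumnus is equally likely to be "Robert".
uniform = [Fraction(1)] * 10

# Hypothetical skewed selection: the one low earner is far more likely to be picked.
skewed = [Fraction(1)] * 9 + [Fraction(91)]

p_uniform = prob_has_property(earns_over_80k, uniform)
p_skewed = prob_has_property(earns_over_80k, skewed)
print(p_uniform, p_skewed)
```

The count in premise (a) is identical in both runs; only the assumed selection mechanism differs, and P moves from 9/10 to 9/100.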
What may be surprising to some is that P remains indeterminate no matter how much count-based information is provided by the premises. For example, in the deduction schema: (a) Count(A/B) is Q; (b) Count(C/B) is R; (c) X is A and C; (d) Prob(X is B) is ?P; P is indeterminate if no probabilistic information is provided or assumed. The maximum entropy principle is frequently invoked when probabilistic information is lacking or incomplete. The problem is that the principle is not applicable when probabilistic information is imprecise, as it is in most realistic settings. The reason is that the concept of maximization breaks down when the side-conditions are imprecise, as in: maximize f(X) over the interval [approximately a, approximately b]. In summary, the intent of the Robert example is to draw attention to the basic importance of the issue of the relationship between counts and probabilities, and to the fact that it is far more complex and far less well-understood than is generally believed to be the case. The need for a clear understanding of this relationship is of particular importance within the realms of law and medicine. ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Paul Snow Paulusnix@cs.com Wednesday, June 11, 2003 5:05 pm Greetings, all :- Lotfi Zadeh has asked for decades whether probability theory handles selected everyday uncertain inferences. Veterans of the UAI list may remember his "challenge to Bayesians" in August 2000, with occasional follow-up postings since then. Recently, a broader inquiry moved on his Berkeley Initiative in Soft Computing (BISC) list. His questions are no longer put exclusively to Bayesians, but now to adherents of any "standard" probability theory based upon bivalent logic. 
Most readers are familiar with the kind of inferential task which bothers Professor Zadeh about probability. Imagine that we are aboard a ship, out of sight of the land. When the ship is near the land, we often see birds. When the ship is far from the land, we see birds less often. We see birds for the first time since leaving port. From this observation and the premises, many would conclude that it becomes more credible that we are near the land, compared to before the sighting.

At least two features place this story outside the scope of some probability theories. (1) "The ship is near the land" is vague, or what Professor Zadeh lately calls "perception-based," rather than categorically and determinably true or false. Among those who would object are admirers of a Jaynes-style "clarity" principle, or those whose semantics for probability depends upon unambiguous betting contracts. (2) "Often" and "credible" are imprecise descriptions of something probabilistic. Bayesians typically insist upon precise numerical probability. Some others relax that, but still require numbers which, say, bound probability intervals.

While the example is fairly representative of the puzzles that Professor Zadeh has offered over the decades, this one is not Zadeh's. It is George Polya's, who derived the expected conclusion by what he took to be valid probabilistic means. Polya was prolific about (2), the legitimacy of frankly qualitative probability.

As to issue (1), Richard Threlkeld Cox felt that probability could be applied to "what song the Sirens sang, or what name Achilles assumed when he hid himself among women." Closer to the kind of "fuzzy sentence" associated with Zadeh over the years, Cox discussed

The stranger was a short, fat old man without coat or hat.

as something his notion of 'proposition', and so his notion of probability, could take in its stride. Cox and Polya are canonical authors.
It is hard to imagine what "standard probability theory" could mean if taken to exclude them. That's a start. I hope other readers will also assist our colleague in his inquiries.

Best regards.
Paul Snow

Cox's prose example can be found on page 5 of his 1946 'Probability, frequency, and reasonable expectation' (_American Journal of Physics_ volume 14, number 1). The poetic matter is from the introductory quote of Thomas Browne in Cox's 1978 'Of inference and inquiry' which appeared in Levine and Tribus (eds.) _The Maximum Entropy Formalism_ (MIT Press, 1979, pages 119-167). Polya's example appears on page 37 of his _Patterns of Plausible Inference_, which is the second volume of his _Mathematics and Plausible Reasoning_ (Princeton University Press, 1954).

Paul Snow

-------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------

Christopher Elsaesser
703.883.6563 (office)
Mail Stop W432
The MITRE Corporation
7515 Colshire Drive
McLean, Virginia 22102-7508
Wednesday, June 11, 2003 4:13 am

Lotfi Zadeh wrote:
>
> Premises:
> (a) Most tall men wear large-size shoes
> (b) Robert is tall
> Question:
> (c) What is the probability that Robert wears large-size shoes?
> What is the correct answer?

0.63

It's easy, I used a commercial tool.

That's the sort of answer I hear government contractors use, and the government contracting officers usually buy it -- literally :\

In my thesis many years ago I found a set of 9 (out of 40 or so) common English expressions of uncertainty that had, over 250 subjects, narrow and stable interpretations. "Most" was not one of them.
You could have made the problem harder by saying "Robert is taller than average" :) chris ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Francisco Javier Diez Phone: +34-91-3987161 Dpto. Inteligencia Artificial Fax: +34-91-3986697 UNED. Senda del Rey, 9 E-mail: fjdiez@dia.uned.es 28040 Madrid. Spain WWW: http://www.ia.uned.es/~fjdiez Wednesday, June 11, 2003 6:26 am Lotfi Zadeh wrote: > Premises: > (a) Most tall men wear large-size shoes > (b) Robert is tall > Question: > (c) What is the probability that Robert wears large-size shoes? The same as the proportion of tall men who wear large-size shoes. :-) Regards, Javier Díez ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Minh Ha Duong Chargé de Recherche au CIRED, CNRS Engineering and Public Policy dept., Carnegie Mellon University 5000 Forbes Avenue, Pittsburgh, PA 15213-3890, USA +1 412 860 5708 (cell) +1 412 268 3757 (fax) Minh.Ha.Duong@cmu.edu http://www.andrew.cmu.edu/user/mduong/ Friday, June 13, 2003 10:30 am The probability that Robert wears large-size shoes is "Greater than 0.5". Minh. -- On Tue, 2003-06-10 at 18:37, Lotfi Zadeh wrote: > Premises: > (a) Most tall men wear large-size shoes > (b) Robert is tall > Question: > (c) What is the probability that Robert wears large-size shoes? > What is the correct answer? 
------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Gordon Hazen Department of Industrial Engineering and Management Sciences McCormick School of Engineering and Applied Science 2145 Sheridan Road Northwestern University Evanston IL 60208-3119 Fax 847-491-8005 Phone 847-491-5673 Web: www.iems.nwu.edu/~hazen/ hazen@iems.northwestern.edu Friday, June 13, 2003 10:33 am At 05:07 PM 6/11/2003 -0700, Francisco J. Diez wrote: >Lotfi Zadeh wrote: > > Premises: > > (a) Most tall men wear large-size shoes > > (b) Robert is tall > > Question: > > (c) What is the probability that Robert wears large-size shoes? > >The same as the proportion of tall men who wear large-size shoes. :-) - Or - In the spirit of the hypothesis, P(x wears large-size shoes | x is tall) = "Most" the answer should be P(Robert wears large-size shoes) = "Most". :-) :-) ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Gary Robinson CEO Transpose, LLC grobinson@transpose.com 207-942-3463 http://www.transpose.com http://radio.weblogs.com/0101454 Wait a minute. Robert could be a kid, not a man. A kid is considered "tall" if he is taller than other kids! I suppose Robert could also be a woman, but that seems a good deal less likely. Of course, Robert could also be a giraffe. 
;) - --Gary ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- > Gordon Hazen > Department of Industrial Engineering and Management Sciences > McCormick School of Engineering and Applied Science > 2145 Sheridan Road > Northwestern University > Evanston IL 60208-3119 > > Fax 847-491-8005 > Phone 847-491-5673 > Web: www.iems.nwu.edu/~hazen/ > From: Gordon Hazen > Date: Fri, 13 Jun 2003 10:33:02 -0700 > To: uai@cs.orst.edu > Subject: Re: [UAI] A deceptively simple test of deductive capability > > At 05:07 PM 6/11/2003 -0700, Francisco J. Diez wrote: >> Lotfi Zadeh wrote: >>> Premises: >>> (a) Most tall men wear large-size shoes >>> (b) Robert is tall >>> Question: >>> (c) What is the probability that Robert wears large-size shoes? >> >> The same as the proportion of tall men who wear large-size shoes. :-) > > - Or - In the spirit of the hypothesis, > > P(x wears large-size shoes | x is tall) = "Most" > > the answer should be > > P(Robert wears large-size shoes) = "Most". > > :-) :-) Gordon Hazen ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Peter Tillers http://tillers.net Professor of Law Cardozo School of Law, Yeshiva University 55 Fifth Avenue, New York, NY 10003 E-mail Wednesday, June 18, 2003 2:11 pm To whom it may concern: Professor Zadeh suggests that his "deceptively simple test of deductive capability" raises a serious question about "the validity of much of probability-based reasoning in the realm of law." Please see latest message to the UAI list, below. I am a law teacher. I would like to comment on Prof. Zadeh's challenge and suggestion. I would like to begin by circling around Prof. Zadeh's suggestion/challenge: I will begin by providing a bit of legal background. 
After doing that, I will then give my own crude "take" on the question of the limits and uses - on the question of the epistemically legitimate uses - of conventional probability theory in legal proceedings. (I will not comment directly on the formal details of Prof. Zadeh's argument; I have breathed in his argument, without fully understanding it. But my guess, my hope, is that my comments below have at least a tangential bearing on his formal argument.)

***

There has been a vigorous debate in some U.S. & U.K. academic circles for over 30 years about the use of probability theory and statistical methods in legal proceedings. There are very significant differences of opinion about what that debate has been all about. For example, some protagonists view the debate as being fundamentally about the admissibility of probability theory and statistical methods on the back (so to speak) of scientific evidence such as genetic evidence. Other protagonists in the debate maintain (at least occasionally) that their concern is not fundamentally with the question of the use of probability theory in trials but with the question of the nature of uncertain reasoning about states of the world ("factual questions"). Other protagonists have maintained that the debate is not, or ought not to be, about either of those matters, but about the social effects of the use of probability theory and similar methods in trials (for example, about the effect of the use of "overtly probabilistic" evidence in criminal trials to show "guilt beyond a reasonable doubt"). The truth, of course, is that "the" debate I am referring to has been about all of these issues.

As a general matter, there is fierce resistance in the real-world legal profession - in the real world of litigation - to the use of conventional "formal" probability theory in proof in litigation. There are exceptions to this rule of resistance.
For example, when the issues in a lawsuit involve well-defined ("crisp") and repetitive phenomena - e.g., mechanisms involved in biological heredity, or radioactive decay, etc. -- experts whose judgments rest in part on reasoning involving formal probability theory are often allowed to testify [but rarely if ever are such experts allowed to combine their probabilistic assessments about phenomena within the purview of their expertise with "soft" uncertain judgments - such as uncertain judgments about a person's intentions on some occasion (and experts are not allowed to opine how someone else such as a juror should combine "hard probabilities" with whatever "soft uncertainties" that that someone else -- that juror, for example -- might happen to entertain)]. There are some other exceptions, some more surprising exceptions to the rule of legal resistance to the use of standard probability theory in trials. One striking exception of this sort is the common admissibility of statistical evidence in employment discrimination cases, statistical evidence that can be admitted, for example, as presumptive or prima facie evidence of an employer's probable behavior toward a particular employee on a particular occasion. But these are exceptional situations - in at least the two senses mentioned next. First, no court in the United States or (as far as I know) anywhere in the Commonwealth believes that the law allows probability theory to be used in ordinary trials to assess "ordinary" evidence. (In the occasional situations where this has been attempted in trials, appellate courts have quickly and firmly condemned them. The most famous case of this sort is People v. Collins, 68 Cal.2d 319, 438 P.2d 33 (1968). But there have been other such appellate decisions - in those very rare situations in which trial judges have allowed trial lawyers, for example, to use the product rule in an argument to a jury about the probability of the guilt of some defendant. 
See, e.g., the recent case in the U.K., the Sally Clark case, http://www.sallyclark.org.uk/, involving attempted probability computations, effectively with the product rule, in a "sudden infant death syndrome" case. Cf. Wilson v. Maryland, 370 Md. 191, 803 A.2d 1034 (August 7, 2002). But compared to _Collins_, these are borderline cases - or cases lying closer to the border of legitimate use of probability - because here at least there were some statistics at hand, however ill-suited they were for their intended purpose.)

Second, almost no judge in the U.S. or in the Commonwealth believes that the law would be well-advised to allow judges or jurors to use probability theory to evaluate the evidence put before them in a trial. The legal profession's opposition to the use of probability theory for the assessment of "ordinary" evidence rests only in part on the legal profession's awareness of the widespread innumeracy of judges, lawyers, and jurors. There is also a very firm sense among almost all legal professionals that the attempt to translate the law's injunctions about the handling of inconclusive evidence ends up getting things wrong, that the attempted translation is, necessarily, imperfect, incorrect.

This is where things stand. The question is, in part, whether it is an accident that things stand where they now do or whether there is something in the nature of most evidence and factual issues in trials (and in litigation generally) that makes them unsuitable for dissection via standard probability theory.

It would be too much to expect that the legal profession would speak with one voice when trying to explain why it seems so obvious to the legal profession that formal probability theory is the wrong instrument for the occasion. But there is at least one persistent strand in the legal profession's thinking about uncertain "historical" events - (hypothetically) non-recurring events - that I think may begin to touch the kind of argument that Prof.
Zadeh may have made in his message below and with his deceptively-simple hypothetical problem. Lawyers and judges always worry about the validity of "extrapolating" from one set of experiences to another or from one set of individuals to another or from (the behavior of) a set of individuals to (the behavior of) a particular individual with a unique set or combination of attributes. One way to see this concern is to see it (coldly) as a variant of the fabled problem of intersecting reference classes. The legal profession (if not all philosophers, logicians, statisticians, etc.) is generally powerfully impressed by the hypothesis that "every situation [person etc.] is different" - and, in the face of statistical studies attempting to account for relevant variables, legal professionals are quick to spot variables about which data has not been collected. Equally important, legal professionals (as a general matter) are quick to see or ready to assert that some or many "samples" - collections of observations - are useless because (in part, sometimes) the criteria for determining whether some event has or has not occurred are imprecise, vague, indeterminate, or fuzzy. 
(For example, I have little doubt that practically all lawyers could and would launch a legally-devastating attack on a statistical study that relied on a collection of data about the events & the relationships between "violent behavior" and "jealous rage.")

So let's put it this way: most legal professionals have the sense that it is in principle impossible to collect meaningful data about certain kinds of possible states of affairs - either because there is a serious question about the extent to which the prior events observed are "like" the event whose occurrence or non-occurrence is now in issue or because there is something about the current event in issue - it has some additional attribute or attributes - that quite possibly distinguishes it from the events that are described by the collected data. In fancy terms: two problems: the events in an alleged reference class may not be truly representative (because our methods of classifying those events are vague, fuzzy, rough, etc.) and, even if we've got nice crisp reference classes, the events in issue have other attributes that may make the predictions based on existing reference classes bad. (I will leave it to you folks to convert my rude language into mathematically, philosophically, and statistically refined and rigorous language.)

Thus far, my argument may suggest that statistical reasoning has little value only in connection with fuzzy events or categories such as "anger," "irritation," and "jealousy." But the problem or phenomenon here goes much farther, much deeper: many people - many witnesses - talk in rough or fuzzy ways about crisp and recurring events.

This fuzzy and rough talk adds, of course, uncertainty to the problem at hand. The law can try to force witnesses to speak in crisp terms - judge: "just the facts, ma'am" - but past legal experience - e.g.
with a version of the lay opinion rule that told witnesses to speak only in terms of basic sense data rather than inferences [scenario: law tells witness: don't tell me whether you think he was drunk, but describe the {precise} behavior that led you to believe or conclude that he was drunk {typical answer: "well, he walked in a sort of herky-jerky way, he stumbled around quite a bit, he didn't seem to see straight"}]--past legal experience firmly suggests that it is often just impossible - without forcing people to embrace propositions that they don't actually believe - it is often impossible or very, very difficult to get people, witnesses, to always or usually speak purely in crisp terms.

One escape route from these sorts of problems is to say that although we can't really (ordinarily) use (rigorous) statistical inference to deal with the sort of evidence and issues we usually have in legal proceedings, we can and should think _logically_ about factual uncertainty and, thus, we should honor principles such as the complementarity of the probabilities of disjoint and exhaustive hypotheses, and we should - it has been repeatedly argued - we should use something like subjective probability (bereft of any real statistical basis; use probability judgments not based on any real statistics, just use them as expressions of the degree of our own personal or subjective uncertainties) to reason about evidence - or, at least, to think about how jurors etc. reason about ordinary evidence. This escape hatch is known to all of you. Is it available in law?

My own answer: it is useful, sometimes, to think about problems of evidence and inference in law by casting them (or portions of them, simplified versions of them) into expressions sanctioned by, having meaning in, standard probability theory. But one should not expect too much of such Gedankenexperimente.
First, real-world problems of evidence and inference almost always have so many ingredients - they involve so many cascaded inferences, they involve inference networks with so many arcs and nodes - that it is beyond human capacity to use formal probability to capture all recognized points of uncertainty. See David Schum's analysis of a very small portion of the evidence in the Sacco-Vanzetti case. It took Schum several years to analyze just a small fraction of the evidence in that case.

Second, practically all evidence in trials comes clothed in "semantic uncertainty": Witness Officer Smith testifies: David Defendant made a "furtive gesture"; testimony: David Defendant was "a bit nervous," he was "edgy"; hearsay evidence: Witness X heard Witness Y, not now in the courtroom because of death, say that Peter Plaintiff was driving "carelessly"; John Smith, accused of later driving while under the influence, is reported to have said before the alleged drunk driving that he was feeling "free and easy," and John Smith invokes the Fifth Amendment privilege at trial and does not testify, we have only his out-of-court words or confession; etc. etc. etc. For what it's worth: the idea of using probability theory to capture the uncertainty associated with such vague words and expressions boggles my mind.

Next question: can the theory of fuzzy sets or the theory of rough sets do any better? My answer: I don't know.

The question here is, obviously, not whether lawyers or judges can be convinced to allow actors and decision makers in trials to use fuzzy or rough probability theory in trials. The answer to that question, at present, is clearly: this will not happen any time soon simply because lawyers and judges do not, as a general matter, have any idea of how such theories or methods work.
The question is, now, for my purposes, whether fuzzy set theory or rough set theory or the two approaches taken together do a better job of picturing the kind of uncertainty involved when there are ambiguous or fuzzy testimonial reports or when there are testimonial reports about fuzzy or fluid things. I don't yet have an answer.

But let me say one more thing: the theory of fuzzy sets, it seems to me, has at least one big advantage over a theory such as Pearl's causal interpretation of Bayes' nets: As I understand fuzzy set theory and Pearl's theory, the theory of fuzzy & rough sets does not depend on - does not need - an understanding of the underlying "factors" that make language work as well as it does; but Pearl's approach, if it is to be successful, is ALL about finding hidden or omitted variables. I just cannot imagine that we can use in the here and now any formalization of reasoning that demands - seems to demand - that we know or be able to infer with some assurance what factors or variables - ALL factors - "really" explain or account for the phenomenon or phenomena that we (think we) have before us.

The power of fuzzy sets is in part (oddly) its proclamation or claim or presupposition that we can control or manage our environment (to a substantial extent) just by understanding how our language already works - or, by what perhaps amounts to the same thing, by constructing an artificial logic or language that mimics natural language. There is an affinity, isn't there, between this fuzzy set theory's confidence in natural language and the law's - and some AI's - belief in common sense reasoning: they all assume, with some pretty good "reason," that human beings can use "crude" expressions such as "hard" and "soft" to good effect.

***

Have I said anything interesting or important? I wonder. But the above is the best I can do in a couple of hours late at night. I hope to write with much more care and after much more reflection in a week or two.
Peter T ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Charles R. Twardy, Res.Fellow, Monash University, School of CSSE ctwardy at alumni indiana edu +61(3) 9905 5823 (w) 5146 (fax) Jun 18, 2003
> A simpler crisp example is the following. Given: (a) Over 99% of professors have a Ph.D. degree; and (b) Robert is a professor. What is the probability, P, that Robert has a Ph.D. degree? The correct answer is that P is indeterminate. And yet, most people, including those with scientific training, would say that P is over 0.99. This answer is correct only if it is assumed that Robert is drawn at random from U with uniform probability. In general, there is no valid justification for the assumption
So that's the game, eh? Ask a question in ordinary language and then switch to formal logic to say the answer doesn't follow deductively from the stated premises. Pfssh. I'm not playing anymore if you keep changing the rules.
> correct answer is "P is indeterminate,"
Not in any given real situation. In the legal case about which you are concerned, the proper challenge is to show that in Robert's true context (we deliberately chose a professor from Podunk University of Lesser Gondwanaland) the probability is different to that in the assumed context. But if your main point is that our inference is fallible, well, yes. If it's even the point that we're so hopeless at accounting for the proper context that we shouldn't BELIEVE our probability estimates, well, you may be right. But that's hardly the fault of probability theory.
-Charles ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Paul Snow Paulusnix@cs.com Jun 18, 2003 David Poole writes: > In this example, the clarity principle doesn't depend on any definition of "near", as the adherents to fuzzy would like to claim. All we need is to make a test for whether the ship is near the land. I would ask the captain of the ship. "The captain of the ship would, if asked, concur that the ship is near the land" is a perfectly clear clarity principle: we could bet on what the captain would say (we might need to have some protocol if the captain refused to talk to us on the grounds that we are just troublesome academics). We could even derive a probability distribution of what the captain would say conditioned on the distance... On what basis is the captain supposed to answer your question? How does she use the information about the birds? Are probabilistic means foreclosed to her? Paul Snow ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- David Poole Wed, 18 Jun 2003 22:11:05-0700 Paulusnix@cs.com wrote: > David Poole writes: > > In this example, the clarity principle doesn't depend on any definition of "near", as the adherents to fuzzy would like to claim. All we need is to make a test for whether the ship is near the land. I would ask the captain of the ship. "The captain of the ship would, if asked, concur that the ship is near the land" is a perfectly clear clarity principle: we could bet on what the captain would say (we might need to have some protocol if the captain refused to talk to us on the grounds that we are just troublesome academics). We could even derive a probability distribution of what the captain would say conditioned on the distance...
On what basis is the captain supposed to answer your question? How does she use the information about the birds? Are probabilistic means foreclosed to her? > > > Paul Snow The captain is just supposed to answer "near" questions. We don't ask her theoretical questions, just "are we now near land"? She can use whatever information she likes, her answer can depend on the time-of-day or the weather (both of which I would expect to be relevant to the answer) or she can just pick random answers. That she says yes to this question is a well defined proposition that we can have bets over or have probabilities over. We can now gather evidence, have prior probabilities, or just argue theoretically about what her answer would be. We can have a probability of her saying yes, given the distance (and given other relevant features). We don't need any "near" fuzzy concept. If I were to build a system based on closeness of ships to shore, I'd much rather trust (and model) the opinions of experts, than of me or other computer scientists or logicians making up arbitrary definitions of what "near" may mean. "Near" probably has quite a useful meaning in the nautical world. In the birds example, the "near land" now has a precise meaning. My guess is that she does not use the information about the birds in her judgement of closeness, although, as in the example, it could be used as evidence about whether or not she would say we were close. This may not seem so obvious for concepts such as "near" where it seems obvious that it is a function of distance (although my guess is that it is also a function of the weather; what may not be "near" in calm waters may be very near in stormy waters) but we just don't know the function. However, consider the concept of "Beauty". There are no obvious properties that make a scene or part of a scene beautiful. But we can still have a probability distribution over whether someone (or even a random person) would say that a scene is beautiful.
We can learn what makes things beautiful (i.e., what properties would predict that someone would say it is beautiful). There is a common saying that "beauty is in the eye of the beholder" that emphasises the subjective nature of beauty. I must admit that I have never understood the motivation for fuzzy logic. I can't see why concepts such as "near" or "beautiful" can't be modelled with standard probability, as above. Perhaps I can be enlightened. David ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Dr. Marco Zaffalon Senior Researcher IDSIA Galleria 2 CH-6928 Manno (Lugano) Switzerland phone +41 91 610 8665 fax +41 91 610 8661 email zaffalon@idsia.ch web http://www.idsia.ch/~zaffalon Friday, June 20, 2003 7:30 am David Poole wrote: > > (2) "Often" and "credible" are imprecise descriptions of something probabilistic. Bayesians typically insist upon precise numerical probability. Some others relax that, but still require numbers which, say, bound probability intervals. > Bayesians are like that because we have to make decisions. I'm quite happy to have abstract problems (like the one Lotfi Zadeh posed, for which the answer is "probability that Robert wears large-size shoes is bigger than 0.5") but when I have to act, I have to decide on something. If my decision doesn't depend on which value bigger than 0.5 is the appropriate value, then I could pick any value and make the same decisions. If my decision does depend on which value, then having the range doesn't help. When I make a decision, I am implicitly assuming a particular value. So I might as well have the value that best reflects my knowledge. A major focus of UAI seems to be building computer systems that assist us in taking decisions.
I am happy with systems that can recognize the limits of their knowledge and suspend the judgment when these limits are reached, in the same way that I prefer to be told "I do not know" when I ask for road information rather than being recommended a wrong route. Also good human experts know when they should suspend the judgment. Having to occasionally suspend the judgment is logical consequence of working with probability intervals, or with more general frameworks (e.g., lower probabilities and previsions). So intervals are likely to be needed by real rather than abstract problems. What about having to make a decision? Consider a prospective expert system to diagnose a disease, which, given information on a specific patient, tells the doctor: given my current knowledge, I cannot decide between "disease" and "no disease". This is likely to motivate the doctor to look for further sources of information externally to the system, for example, by examining recent medical literature, by asking more experienced colleagues, by doing medical tests that are not considered by the system, etc..., in the direction of reliable diagnosis. This appears to be a safer approach to reduce the uncertainty than making strong assumptions to have the expert system always produce determinate conclusions, perhaps making it produce "no disease" in some occasions when the evidence does not actually justify this. Marco ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Henry Kyburg Friday, June 20, 2003 4:44 PM From: owner-uai@cs.orst.edu [owner-uai@cs.orst.edu] On Behalf > Of kyburg > Sent: Friday, June 20, 2003 4:44 PM > To: uai@cs.orst.edu > Subject: Re: [UAI] Lotfi Zadeh's venerable questions Dear folks: I have hesitated to join this discussion, since I've been writing about these issues (to little effect!) for more than forty years. 
If we separate out the issue of fuzziness -- what "most", "tall", and "large" have in common -- what is left is the relation between probability and (approximate) relative frequency. When Charles is chosen "at random" from the set of tall people, it is surely reasonable (pace "most") to say that his feet are probably (more probably than not) large. Foundationally and practically, however, this observation is no help. It is foundationally no help, since we can find no gloss for "at random" that does not presuppose an understanding of probability. It is practically no help, since the set of instances in which we have truly random selections is of measure zero among our applications of probability. An alternative is to stipulate that we "know nothing" about Charles other than that he is tall. Again this is no help, since we always know a great deal about any object whose properties interest us. (We know that Charles is a friend of Susan, that he likes fishing, ...) What is important is that we know nothing relevant to the size of Charles's feet other than that he is tall. In that case, provided we know that Charles is tall, and that most tall people have large feet, we have some grounds for taking the set of tall people to be a potential reference class for the statement in question. Of course we may also have other potential reference classes. We may know that Charles is Swedish and that most Swedes do not have large feet. And we may have no information -- even fuzzy information -- about the intersection of two potential reference classes: we may have no knowledge about the proportion of tall Swedes who have large feet. There are three questions: 1) In the case of every sensible probability, can we find at least one reference class in which we know, at least roughly, the relevant relative frequency? I claim that the answer is "yes".
Remember that the next toss of this newly minted coin that will be tossed once and then melted down, is also the next toss of a coin, and that we know that about half of coin tosses land heads. 2) Are there principles that allow us to take a collection of potential reference classes as input and output an approximate probability? I claim that the answer is "yes". There are intuitively plausible principles that can lead us to disregard a potential reference class -- for example, the existence of a potential reference class that is a subclass of that class, and that conflicts with that class. We may be led to a collection of potential reference classes that should not be disregarded; in that case we may formalize the output of the process as the cover of the intervals corresponding to the surviving classes. 3) Given plausible background knowledge, does this process lead to probability intervals that are narrow enough to be useful as a guide in life? My intuition is that it does, but only time and examples will tell. Note that degrees of belief have played no role: A person's beliefs have no evidential relevance. Of course the beliefs of experts may be relevant, but that is because the experts have a lot of statistical data on which to base their beliefs. Henry ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- I.R. Goodman goodman@spawar.navy.mil Fri, 20 Jun 2003 14:13:57-0700 I have gone over this issue and these are my preliminary comments, based in part on conferring with colleague Don Bamber, viewing Henry Kyburg's "Class Reference Problem" response to you, and my own experience. As I see it, the issue here is NOT one where fuzzy sets provides the natural answer to the exclusion of probability, NOR the converse.
The principal issue here is one of "pseudo-instantiation" or, similarly, identification of a finite set of random outcomes (often, a singleton) with the entire random variable (or random quantity), or in a pure fuzzy logic context, identification of a specific element of a population with the "typical behavior" of the population. That identification is carried out in order to be able to have a collection of equations or relations in some common finite set of ordinary variables to lead to a solution set or solution space for the inference problem at hand. Let me illustrate this point through two simple examples, the first a classical statistical one and the second example provided first in a pure fuzzy logic context, then in an equivalent one-point random set coverage one. Of course, it is not the same as the classical logic rule of instantiation, whereby if every element of a population has a property (or a known probability of having that property can be assigned), then we can conclude soundly that so does "Robert" and safely use this in all subsequent reasoning. While both examples illustrate different facets of this pseudo-instantiation or pseudo-substitution principle, nevertheless, they have a good deal in common: Example 1. Statistical Example: One makes two measurements/observations x1*, x2* of a table's length and wants to infer about the true length u. Assuming theoretical random variable measurements x1, x2 are independent identically distributed gaussian n(u, s), where s > 0 is the known variance and u (> 0) is the totally unknown mean corresponding to table length, for any real number a, 0 < a << 1, such as a = 0.05, or 0.10, and letting F be the cdf of a chi-square r.v. with 1 degree of freedom, consider the random closed interval [y - (s/2)Finverse(1-a), y + (s/2)Finverse(1-a)], relative to true parameter value u, where sample mean r.v. y = (1/2)(x1 + x2).
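[Editorial illustration, not part of the original message.] The coverage property of a random interval of this kind can be checked by simulation. The Python sketch below assumes that s is the variance of each measurement, so that the sample mean y has variance s/2, and it uses the half-width sqrt((s/2) * Finverse(1-a)), which is the form that gives coverage 1-a under that reading of s. The values of u, s and a, and the function name coverage, are chosen only for illustration.

```python
import math
import random
from statistics import NormalDist

def coverage(u=2.0, s=0.25, a=0.05, trials=50_000, seed=0):
    """Fraction of simulated random intervals that contain the true mean u."""
    rng = random.Random(seed)
    sd = math.sqrt(s)  # assumption: s is the variance of one measurement
    # For 1 degree of freedom, the chi-square quantile equals a squared
    # standard normal quantile, so no external library is needed.
    chi2_q = NormalDist().inv_cdf(1 - a / 2) ** 2  # plays the role of Finverse(1-a)
    half = math.sqrt((s / 2) * chi2_q)             # half-width around the sample mean
    hits = 0
    for _ in range(trials):
        x1, x2 = rng.gauss(u, sd), rng.gauss(u, sd)
        y = (x1 + x2) / 2                          # sample mean of the two measurements
        hits += (y - half <= u <= y + half)
    return hits / trials

print(coverage())        # close to 1 - a for a = 0.05
print(coverage(a=0.10))  # close to 1 - a for a = 0.10
```

The printed fraction describes repeated draws of the random variable y. Whether the same confidence may be attached to one interval computed from a single pair of outcomes is a separate matter that the simulation does not settle.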
Thus, in terms of the r.v.s x1, x2, and hence y, we have the relation, for all possible nonrandom real u, P(u in [y - (s/2)Finverse(1-a), y + (s/2)Finverse(1-a)] | u) = 1-a. But, in practice, we replace in the above 100(1-a)% confidence interval relation the only random variable present, y, by its corresponding outcome value y* = (1/2)(x1* + x2*), producing two numerical endpoints, and "claim" also 100(1-a)% confidence about u being in the now computable interval [y* - (s/2)Finverse(1-a), y* + (s/2)Finverse(1-a)], conditioned on u, etc. The validity of the transition or substitution from the theoretical r.v. y (an entire function!) to the single numerical outcome value (of that function) y* is the key issue in the above example. To my knowledge, very few texts in mathematical statistics or probability theory have discussed this. (Don has suggested some of J. Hailpern's work stemming out of expert systems and AI considerations also has considered this issue to some extent.) However, a short discussion in the classic text of Cramer does consider this -- but with no additional insight or rigid mathematical perspective. Perhaps there is some literature on this topic as part of the philosophical foundations of probability (besides Kyburg's work). Example 2 -- Non-probabilistic form of your original "Robert" example couched in a slightly modified fuzzy logic format and its equivalent random set formulation. Here, let s, t be any real numbers -- in the unit interval -- in practice, reasonably close to 1 (but not necessarily). Let universe of discourse be the (finite) population pop of living human males in a particular country of choice, where specifically pop = {x1, x2,...., xm, xm+1,...., xn}, where xm = Robert. Consider two measurement functions: height: pop --> [2', 10'] and shoelength: pop --> [3, 30], where height is known for at least Robert, but shoelength is unknown.
Consider the predetermined known fuzzy sets (at least) tall: [2', 10'] --> [0,1], (at least) large: [3, 30] --> [0,1], most: [0,1] --> [0,1], and also consider the (known) fuzzy logic operators of (&): [0,1]x[0,1] --> [0,1] (such as min, prod, or some copula (or T-norm), so that in any case component-wise, (&) < or = min) and count, where for any fuzzy set f: pop --> [0,1], count(f) = w1f(x1) + w2f(x2) + ... + wnf(xn), where wj is weight of importance of xj -- often chosen to be 1/card(pop), corresponding to equal/uniform weighting -- but in general need not be so simple. Then, we can pose the problem as a function of chosen s, t: 2a. "Pure" Fuzzy Logic Form Given: most[ count[large(shoelength(.)) (&) tall(height(.))] / count[tall(height(.))] ] = s, tall(height(Robert)) = t, Find possible values or estimate large(shoelength(Robert)), as a function of s, t. Simple suggestion: as in Example 1, where reasonable, identify behavior of more global entity with specific element: i.e., substitute large(shoelength(Robert)) for count(large(shoelength(.))) and tall(height(Robert)) for count(tall(height(.))). Then, assuming "most" is monotone increasing, with above substitution and assumption, problem becomes Given: [large(shoelength(Robert)) (&) tall(height(Robert))] / tall(height(Robert)) = s, tall(height(Robert)) = most-inverse(t). Find possible values or estimate large(shoelength(Robert)), as a function of s, t. Simple calculations show large(shoelength(Robert)) > or = [large(shoelength(Robert)) (&) tall(height(Robert))] = most-inverse(t) x s. So far, in this example no probabilities have appeared. 2b. Equivalent One-Point Coverage Random Set Representation Now suppose with the same assumptions, we also desire to find the probability that Robert has a large shoe.
In order to consider the most appropriate way to transform from the general population to Robert and account for probabilities, we must now convert the above fuzzy logic form of the problem to its equivalent random set(s) form: Following the techniques well-established in various papers (such as in Biennial Review 2001, SPAWAR, San Diego, pp.58-69 and Information Sciences 148, 2002, pp.87-96), it follows that the fuzzy sets "tall" and "large" are equivalent to the one-point coverage behavior of right-anchored random closed intervals S[tall] = [U, 10'] of domain [2', 10'] and S[large] = [V, 30] of domain [3, 30], where U and V are random variables over their respective domains. In the general case S[tall] and S[large] are constructed from the choice of (&) and "tall", "large". But in fact, we must have here that the cdf of U is the membership function of "tall" and the cdf of V is that of "large". Thus, the equivalent problem is: Given: P(shoelength(X) in S(large) | height(X) in S(tall)) = s, P(height(Robert) in S(tall)) = most-inverse(t), where it can be shown that X is a random variable with probability function identified with (w1, w2,...., wn), independent of random sets S(tall), S(large). Find P(shoelength(Robert) in S(large)) ? But, the above is equivalent to: Given: P(V < or = shoelength(X) | U < or = height(X)) = s, P(U < or = height(Robert)) = most-inverse(t), where X is a random variable with probability function identified with (w1, w2,...., wn), independent of random variables U, V. Find P(V < or = shoelength(Robert)) ? Thus, if we make the substitution of (V < or = shoelength(Robert)) for (V < or = shoelength(X)), and (U < or = height(Robert)) for (U < or = height(X)), we have Given: P(V < or = shoelength(Robert) | U < or = height(Robert)) = s, P(U < or = height(Robert)) = most-inverse(t), Find P(V < or = shoelength(Robert)) ?
Using again basic properties of probabilities thus yields, analogously to the original fuzzy logic approach: "P(shoelength(Robert) is large)" = (by def.) P(shoelength(Robert) in S(large)) = P(V < or = shoelength(Robert)) > or = P([V < or = shoelength(Robert)] and [U < or = height(Robert)]) = s x most-inverse(t). Summary. All of the above elementary computations in Example 2 -- once the random set coverage relations are used -- lead essentially to the same inference, whether it be via fuzzy logic (Ex.2a) or equivalent random set coverages (Ex.2b). However, in order to make both approaches tractable, the substitution principle of the individual Robert for the entire population (either directly or via r.v. X) is used. This principle, whether in the form of Example 1 or Example 2 (a, b), is a weak spot in the bridge between theoretical probability and its applications. It would be nice to have more researchers consider this all important issue that seems to have been neglected to a large extent; if there is any on-going work in this area, it would be most desirable to know about it. Thanks and looking forward to hearing from you, I.R. Goodman ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Paul Snow Paulusnix@cs.com Jun 20, 2003 Subj: The correct answer(s), plural and singular Date: 6/20/03 To: uai@maillist.cs.orst.edu Dear All :- 'The premises provide no information about the desired P' is a tenable view. Even for those who agree, however, it is untrue that P is undetermined according to all standard probability theories. For example, an orthodox Bayesian who accepted that "Robert wears large-sized shoes" was fit for probabilistic treatment would have a specific _a priori_ probability for that statement.
If the other statements inspire no opinion change, then the answer for this analyst is "P equals ," which is a definite number. There is no obligation to assent to 'the premises provide no information about the desired P.' If we understand what is being said in the shoe problem to imply that Most members of a population to which Robert belongs wear large shoes then we might think about direct inference and the fraught relationships between statements about a population and statements about a member of that population. In the birds-at-sea example cited in an earlier posting *, Polya makes a typical direct inferential leap, that what is true of bird sightings in general says something about this particular occasion. It is probability theory which tells us that that is a leap. Whether or not one jumps is up to the believer. Describing "the logical laws of probability," de Finetti wrote These laws are the conditions which characterize coherent opinions (that is, opinions which are admissible in their own right) and which distinguish them from others that are intrinsically contradictory. The choice of one of these admissible opinions from among all the others is not objective at all and does not enter into the logic of the probable... 'More likely than not, Robert wears large shoes, on the available information' is an admissible conclusion. Many other conclusions are also admissible. Professor Zadeh is correct, then, that probability theory does not dictate an answer in his example. It would be magical if any method for non-demonstrative deliberation could prescribe the content of someone's beliefs. Probability offers advice about whether opinions are consistent, not whether they are right, nor even impeccably justified. The question in the BISC root posting was not whether probability theory decides the issues arising in the example. The question was whether probability theory addresses them. The correct answer is yes. Best regards. 
Paul Snow The de Finetti quote appears on page 110 of Henry Kyburg's translation of "Foresight, its logical laws, its subjective sources" in Kyburg and Smokler (eds.) _Studies in Subjective Probability_, Wiley, 1964. For de Finetti's application of his views to direct inference, see pages 115 and following. A discussion of direct inference by Henry himself starts at the bottom of page 283 in his classic "Bayesian and non-Bayesian evidential updating" (_Artificial Intelligence_ 31, 1987). * Polya's example: Imagine that we are aboard a ship, out of sight of the land. We adopt the premises When the ship is near the land, we often see birds. When the ship is far from the land, we see birds less often. We see birds for the first time since leaving port. Polya concludes that it becomes more credible that we are near the land, compared to before the sighting. Polya accepted the qualitative and the perception-based character of the problem. The example appears on page 37 of Polya's _Patterns of Plausible Inference_, which is the second volume of his _Mathematics and Plausible Reasoning_ (Princeton University Press, 1954). Paul Snow ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Vern R. Walker, Ph.D., J.D. Professor of Law Hofstra University School of Law 121 Hofstra University Hempstead, New York 11549 Tel. (516) 463-5165 Fax (516) 463-4962 Friday, June 20, 2003 7:34 am Vern's comment: The problem posed by Professor Zadeh has been studied by a number of philosopher/logicians (including Levi, Kyburg, and Pollock) as a species of "statistical syllogism," and in particular as the problem of "direct inference." I have applied some of this work in law to the so-called "lost chance" cases (Vern R.
Walker, "Direct Inference in the Lost Chance Cases: Factfinding Constraints Under Minimal Fairness to Parties," Hofstra Law Review 23:247-307 (1994)), where I also rejected (as does Professor Zadeh) the "random selection" rationale as being usually unwarranted in legal settings. But then what rationale can there be? What is the warrant structure for Professor Zadeh's problem? Before suggesting one answer, I want to point out that the problem is not at all peculiar to law. It is not only a frequent pattern of reasoning in everyday life ("It will [probably] rain this afternoon, and my old umbrella will [surely] collapse in such a wind"), but also essential to gathering scientific data ("This sample contains 100 mg of chlorine"; "Subject #132 is a Caucasian male, age 42") and implicit in all reasoning about specific perceptual acts ("I see that the grass in the front yard is turning brown"). If anything, law simply provides one circumstance where it is important to study our reasoning patterns very carefully, and it presents many kinds of cases where the reasoning is epistemically messy. Getting back to Professor Zadeh's posed problem, the evidence in a particular situation sometimes warrants a third solution: (3) X is a random variable with a probability distribution that is warranted by an adequately specified causal model of the form B = f(A, C1, C2, C3, ... , Cn), where A, C1, C2, C3, ... , Cn are causally relevant variables for producing B. That is, some parts of human experience are relatively closed causal systems that are adequately understood - at least within the uncertainty tolerances that we find acceptable for particular practical purposes. My tolerance for uncertainty about rain and my umbrella depends upon what I want to do this afternoon. Moreover, there is a complex logical structure for the many types of uncertainty inherent in such reasoning.
Finally, this version of (3) is far too simple because it fails to capture the richness of the concept of "individual." But this first approximation will have to do for now, and may provoke some much needed discussion. Vern R. Walker Hofstra University School of Law ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Kathryn Blackmond Laskey Friday, June 20, 2003 11:32 am At 7:30 AM -0700 6/20/03, Marco Zaffalon wrote: >...I am happy with systems that can recognize the limits >of their knowledge and suspend the judgment when these limits are reached, >in the same way that I prefer to be told "I do not know" when I ask for >road information rather than being recommended a wrong route. Even better is a system that can say: "If forced to give advice, my recommendation is X, with justification Y, but my confidence in that recommendation and its justification is only Z." If the answer is not very trustworthy, I would be well-advised to consult other sources if resources permit, but at the very least I have a recommendation and a justification to chew over. A system that says nothing but that it suspends judgment gives me no help at all. Kathy ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Henry Kyburg Friday, June 20, 2003 7:43 am Dear folks: I have hesitated to join this discussion, since I've been writing about these issues (to little effect!) for more than forty years. If we separate out the issue of fuzziness -- what "most", "tall", and "large" have in common -- what I left is the relation between probability and (approximate) relative frequency. 
When Charles is chosen "at random" from the set of tall people, it is surely reasonable (pace "most") to say that his feet are probably (more probably than not) large. Foundationally and practically, however, this observation is no help. It is foundationally no help, since we can find no gloss for "at random" that does not presuppose an understanding of probability. It is practically no help, since the set of instances in which we have truly random selections is of measure zero among our applications of probability. An alternative is to stipulate that we "know nothing" about Charles other than that he is tall. Again this is no help, since we always know a great deal about any object those properties interest us. (We know that Charles is a friend of Susan, that he likes fishing, ...) What is important is that we know nothing relevant to the size of Charles's feet other than that he is tall. In that case, provided we know that Charles is tall, and that most tall people have large feet, we have some grounds for taking the set of tall people to be a potential reference class for the statement in question. Of course we may also have other potential reference classes. We may know that Charles is Swedish and that most Swedes do not have large feet. And we may have no information -- even fuzzy information -- about the intersection of two potential reference classes: we may have no knowledge about the proportion of tall Swedes who have large feet. There are three questions: 1) In the case of every sensible probability, can we find at least one reference class in which we know, at least roughly, the relevant relative frequency? I claim that the answer is "yes". Remember that the next toss of this newly minted coin that will be tossed once and then melted down, is also the next toss of a coin, and that we know that about half of coin tosses land heads. 
2) Are there principles that allow us to take a collection of potential reference classes as input and output an approximate probability? I claim that the answer is "yes". There are intuitively plausible principles that can lead us to disregard a potential reference class-- for example, the existence of a potential reference class that is a subclass of that class, and that conflicts with that class. We may be led to a collection of potential reference classes that should not be disregarded; in that case we may formalize the output of the process as the cover of the intervals corresponding to the surviving classes. 3) Given plausible background knowledge, does this process lead to probability intervals that are narrow enough to be useful as a guide in life? My intuition is that it does, but only time and examples will tell. Note that degrees of belief have played no role: A person's beliefs have no evidential relevance. Of course the beliefs of experts may be relevant, but that is because the experts have a lot of statistical data on which to base their beliefs. ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Charles R. Twardy, Res.Fellow, Monash University, School of CSSE ctwardy at alumni indiana edu +61(3) 9905 5823 (w) 5146 (fax) Jun 24, 2003 OK, I'm going to expose my ignorance and biases here and give y'all an opportunity to set me straight. People keep saying things about Bayesians that I just don't get. For example, that Bayesians require precise probabilities and that they can't represent the *uncertainty* of our probability estimates. But a standard introduction to Bayesian estimation is estimating the probability of Heads for an unknown coin. And as given in, for example, Silva, you watch as an initially broad distribution becomes more and more narrow as you gather evidence about the coin. 
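As a minimal sketch of the coin example just described, assuming a uniform Beta(1, 1) prior on the probability of Heads (the prior, the toss counts, and the function name `beta_posterior` are illustrative assumptions, not taken from Silva or from this thread), the posterior standard deviation shrinks as tosses accumulate:

```python
import math


def beta_posterior(heads, tails, prior_a=1.0, prior_b=1.0):
    """Return (mean, std) of the Beta posterior for P(Heads).

    A Beta(prior_a, prior_b) prior updated with `heads` and `tails`
    observations gives a Beta(prior_a + heads, prior_b + tails) posterior.
    """
    a = prior_a + heads
    b = prior_b + tails
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


# Hypothetical data: the same 3:1 ratio of heads to tails at growing sample sizes.
for heads, tails in [(0, 0), (3, 1), (30, 10), (300, 100)]:
    mean, std = beta_posterior(heads, tails)
    print(f"{heads + tails:4d} tosses: mean={mean:.3f}, std={std:.3f}")
```

The spread of the posterior, not only its mean, is what carries the "degree of ignorance" in this reading; whether that is an adequate representation of imprecision is the point under debate in the thread.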
Isn't that all we need? Doesn't the flatness of our pdf encapsulate "degree of ignorance"? Rolf Haenni writes about degrees of support:
}What would Bayesians do in such a case? They would start by saying
}p(X|A)=1 and p(A)=0.1. So what is p(X)?
} p(X) = p(X|A)p(A)+p(X|NOT-A)p(NOT-A) = 0.1 + p(X|NOT-A)*0.9.
}Correct. But what is p(X|NOT-A)??? Bayesians tend then to assume
}p(X|NOT-A)=0.5 and to compute p(X)=0.55.
I would have thought such a "max entropy" Bayesian would put a flat prior between 0 and 1 on p(X|NOT-A), rather than a Dirac delta around p=0.5. Am I missing something? -Charles ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- David Poole Tuesday, June 24, 2003 10:56 am David Poole's Reply: Marco Zaffalon wrote: > A major focus of UAI seems to be building computer systems that assist us in taking decisions. I am happy with systems that can recognize the limits of their knowledge and suspend the judgment when these limits are reached, in the same way that I prefer to be told "I do not know" when I ask for road information rather than being recommended a wrong route. Also good human experts know when they should suspend the judgment. Having to occasionally suspend the judgment is a logical consequence of working with probability intervals, or with more general frameworks (e.g., lower probabilities and previsions). So intervals are likely to be needed by real rather than abstract problems. I agree that advising systems need to be able to say "I don't know", but when to say "I don't know" depends on the relative costs of false-positives, false-negatives and not making a decision. So what do we do in these cases? An advising system does not just do probabilistic reasoning, but needs to make decisions about what actions it should take. 
Saying "I don't know" is an action (of the advising system) that has an expected value, just like having any other action. We need to combine these utilities with the probabilities to determine the best action of the advising system. There are many cases where the utilities are the deciding factor (even given the same probabilistic setup). If you are requiring road directions right now, it may be better for the system to say "turn right", even if this can't be proved to be the optimal response, to avoid an accident. Whereas if the utilities change, and there is less cost associated with delaying a decision, it may be more prudent to suggest a careful check of all available information. > What about having to make a decision? > Consider a prospective expert system to diagnose a disease, which, given information on a specific patient, tells the doctor: given my current knowledge, I cannot decide between "disease" and "no disease". This is likely to motivate the doctor to look for further sources of information externally to the system, for example, by examining recent medical literature, by asking more experienced colleagues, by doing medical tests that are not considered by the system, etc..., in the direction of reliable diagnosis. Again, it depends on the utilities. And we can determine the cost and value of information. I have seen no evidence that "... intervals are likely to be needed by real rather than abstract problems." I have seen good arguments as to why we need to have probabilities + utilities. The main problems I see are in knowledge representation: how can we actually represent real problems so that we can acquire the information necessary (from people and data) and effectively compute what we need to make appropriate decisions. But I can't see how intervals help us here. In these foundational arguments (which are very important), there are many of us who think that we should let a thousand flowers bloom. 
It is quite possible that the Bayesian manifesto is wrong (I give it a low prior of being right, but a high posterior based on its success). However, I don't think that the resulting "winner" will include all of these formalisms; I think it will include very few. Most of these flowers will wither and die. Do I think that intervals will be part of the winning formalism? No. Do I think that research should continue on these formalisms? Certainly! It is quite likely that I am wrong. Each of us needs to make decisions about our research time; we are all (implicitly or explicitly) betting as to what will win and form the foundation of future understanding. We need to reasonably exhaustively explore the search space before we declare a winner, and discussions like this are important! David ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- John M. Agosta Tuesday, June 24, 2003 10:55 am I want to point out a characteristic of Bayesian practice that is implicit in several of the answers to L. Zadeh's question, and can reconcile several of the responses so far. In short, a Bayesian approach engages the decision-maker in an interactive "game" that eventually discovers the decision-choice. From the Bayesian point of view the question what to do about incomplete information (really an incomplete model) raises the question, why was the game interrupted just at this point? Implicit in the Bayesian formulation is that I, the analyst, pick the questions I want to ask. And the decision maker understands that it's worth participating in the game because they have something riding on the outcome of their decision. Rolf Haenni's particularly lucid response includes a nice example where he wrote: > Let me illustrate this by a simple example. Suppose we know that: > 1) A implies X > 2) p(A) = 0.1 > What can be said about X? ... 
> > What would Bayesians do in such a case? Well not to be glib, the short answer is, if the decision maker volunteered this information so far, what impediment is there to her revealing the rest of her mental model? We see this investigative flavor of the analysis in some of the other discussions: In David Poole's response: > The captain is just supposed to answer "near" questions. We > don't ask her theoretical questions, just "are we now near > land"?... a further indication of this kind of interaction in a fragment from Kathryn Blackmond Laskey: > ...I would be well-advised to consult other > sources ... These fragments, admittedly taken out of context, describe an analyst in an active role, who is engaged in eliciting information and is not constrained with just using the information at hand. So, if the Bayesian presumes access to the persons involved in the decision, then without that access clearly "P is indeterminate", to use L. Zadeh's term. Well but, one might ask, what if the analyst is REQUIRED to make a decision on an incomplete basis? As a Bayesian then I might ask, what are the rules for this new game? For instance, in the new game does the unavailability of information imply something about its contents? Is there really a decision maker from whom I can elicit the problem or is she just a hypothetical construct? Perhaps the game aspect is somewhat tangential to just the question L. Zadeh proposed. It is instructive however to anyone who wants to apply a Bayesian method in a practical setting -- for them to know that a large part of their day-to-day effort as analyst will be to engage (interrogate might be a better term) the problem's stakeholders and not to be constrained to use just the information that comes across their desk. 
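For reference, the arithmetic behind Rolf Haenni's example quoted above can be sketched as follows. This is only an illustration under stated assumptions: the function name `p_x` and the grid used to approximate a flat prior on p(X|NOT-A) are hypothetical choices, not part of any posted message.

```python
def p_x(p_a, p_x_given_not_a, p_x_given_a=1.0):
    """Law of total probability: p(X) = p(X|A)p(A) + p(X|NOT-A)p(NOT-A)."""
    return p_x_given_a * p_a + p_x_given_not_a * (1.0 - p_a)


P_A = 0.1  # from the example: p(A) = 0.1, and "A implies X" gives p(X|A) = 1

# With p(X|NOT-A) left unconstrained in [0, 1], p(X) is only bounded.
lower = p_x(P_A, 0.0)
upper = p_x(P_A, 1.0)

# Fixing p(X|NOT-A) = 0.5 collapses the interval to a single point.
point = p_x(P_A, 0.5)

# A flat prior on p(X|NOT-A), approximated on an evenly spaced grid,
# has the same mean as the point value but keeps the full spread.
n = 1001
grid = [i / (n - 1) for i in range(n)]
mean_flat = sum(p_x(P_A, q) for q in grid) / n

print(f"interval: [{lower:.2f}, {upper:.2f}]")
print(f"point value with p(X|NOT-A)=0.5: {point:.2f}")
print(f"mean under flat prior: {mean_flat:.2f}")
```

The interval is what the premises alone license; the point value and the flat-prior mean coincide because p(X) is linear in p(X|NOT-A), so the two approaches differ only in how much uncertainty they retain.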
------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Jon Williamson Department of Philosophy, King's College, Strand, London, WC2R 2LS, UK http://www.kcl.ac.uk/jw Friday, June 27, 2003 1:57 am I'd like to respond to one point made by Prof. Zadeh: The maximum entropy principle is frequently invoked when probabilistic information is lacking or incomplete. The problem is that the principle is not applicable when probabilistic information is imprecise, as it is in most realistic settings. The reason is that the concept of maximization breaks down when the side-conditions are imprecise, as in: maximize f(X) over the interval [approximately a, approximately b]. I don't think this is really a problem for the maximum entropy principle, which is a highly defeasible way of assigning probabilities. If the constraining interval [a,b] is imprecise or approximate, that doesn't really matter; one can maximise entropy over [a,b] to get a probability function p, and then when one comes to learn a better approximation [a',b'] one can update p (normally by minimising cross entropy) to yield a new function p' which fits the new constraint. Just because quantitative information is only approximate, that doesn't mean one shouldn't or can't use it to determine point-valued degrees of belief (probabilities) - it just means that one should be prepared to change these beliefs as approximations improve. Practically every science faces the difficult task of sharpening qualitative perceptual judgements into the precise, quantitative language of the science in question. The way a science does this is rarely formulated explicitly as part of the science itself, and often seems mysterious from the outside. 
Probability theory, for instance, tells you what to do when you have certain probabilities and certain assumptions hold, not how to arrive at this information. It is the job of statisticians, knowledge engineers and philosophers of science to better articulate the sharpening process, but just because it isn't written down in books on probability theory, that doesn't mean it can't be done. cheers, Jon ------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------- Paul Snow Paulusnix@cs.com Tuesday, July 1, 2003 9:26 am Dear All :- That was a good thread. Imprecise probabilities and direct inference have been thoroughly and engagingly treated. As to the remaining issue, it turns out that all along there have been probabilistic accounts of bivalent propositions containing qualitative descriptions, e.g. Cox's 'short, fat old man.' Even Boole trafficked in them. It is not so bad that some other probabilists disagree with those accounts. "Probabilists fail to agree" stops no press. What matters more is that there really is a technically rich alternative to the abandonment of familiar notions of truth as the price for speaking and listening as human beings familiarly do. With a concrete alternative comes the possibility of pragmatic investigation of otherwise hopeless theoretical wrangles. Does the speaker of "short, fat old man" intend to testify truthfully, or only sort-of truthfully? Is it advantageous or disadvantageous that the listener has a choice between treating that utterance near-enough categorically or by goodness of fit? If goodness of fit is chosen, are the best rules for that always compositional, or is flexibility in representing dependencies often important? Despite appearances, those are not rhetorical questions. Nature has surprised all of us before. 
What the questions are is a big step up from "we have nothing to say to one another, except that you are irrational (respectively, unfit to live in the real world) and wrong." Best to all, regardless of commitments. Special thanks to Professor Zadeh for his energy in afflicting the comfortable with exquisite personal grace. Paul ------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------