Description

CS 188: Artificial Intelligence Spring 2007. Lecture 11: Probability 2/20/2007. Srini Narayanan – ICSI and UC Berkeley. Announcements. HW1 graded Solutions to HW 2 posted Wednesday HW 3 due Thursday 11:59 PM. Today. Probability Random Variables Joint and Conditional Distributions

Transcript

CS 188: Artificial Intelligence
Spring 2007
Lecture 11: Probability
2/20/2007
Srini Narayanan, ICSI and UC Berkeley

Announcements
- HW1 graded
- Solutions to HW 2 posted Wednesday
- HW 3 due Thursday 11:59 PM

Today
- Probability
- Random Variables
- Joint and Conditional Distributions
- Bayes Rule
- Independence
- You'll need all this stuff for the next few weeks, so make sure you go over it!

What is this?

Uncertainty
- Let action A_t = leave for airport t minutes before flight
- Will A_t get me there on time?
- Problems:
  - partial observability (road state, other drivers' plans, etc.)
  - noisy sensors (KCBS traffic reports)
  - uncertainty in action outcomes (flat tire, etc.)
  - immense complexity of modeling and predicting traffic
- A purely logical approach either
  - risks falsehood: "A_25 will get me there on time", or
  - leads to conclusions that are too weak for decision making: "A_25 will get me there on time if there's no accident on the bridge, and it doesn't rain, and my tires remain intact, etc., etc."
- A_1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport...

Probabilities
- Probabilistic approach: given the available evidence, A_25 will get me there on time with probability 0.04
  - P(A_25 | no reported accidents) = 0.04
- Probabilities change with new evidence:
  - P(A_25 | no reported accidents, 5 a.m.) = 0.15
  - P(A_25 | no reported accidents, 5 a.m., raining) = 0.08
- i.e., observing evidence causes beliefs to be updated

Probabilities Everywhere?
- Not just for games of chance!
  - I'm snuffling: am I sick?
  - Email contains "FREE!": is it spam?
  - Tooth hurts: have cavity?
  - Safe to cross street?
  - 60 min enough to get to the airport?
  - Robot rotated wheel three times, how far did it advance?
- Why can a random variable have uncertainty?
  - Inherently random process (dice, etc.)
  - Insufficient or weak evidence
  - Unmodeled variables
  - Ignorance of underlying processes
  - The world's just noisy!

Probabilistic Models
- CSP/Prop Logic:
  - Variables with domains
  - Constraints: map from assignments to true/false
  - Ideally: only certain variables directly interact
- Probabilistic models:
  - (Random) variables with domains
  - Joint distributions: map from assignments (or outcomes) to positive numbers
  - Normalized: sum to 1.0
  - Ideally: only certain variables are directly correlated

Random Variables
- A random variable is some aspect of the world about which we have uncertainty
  - R = Is it raining?
  - D = How long will it take to drive to work?
  - L = Where am I?
- We denote random variables with capital letters
- Like in a CSP, each random variable has a domain
  - R in {true, false}
  - D in [0, ∞)
  - L in possible locations

Distributions on Random Vars
- A joint distribution over a set of random variables X1, X2, ..., Xn is a map from assignments (or outcomes, or atomic events) to reals: P(X1 = x1, X2 = x2, ..., Xn = xn), abbreviated P(x1, x2, ..., xn)
- Size of distribution if n variables with domain sizes d?
- Must obey: P(x1, ..., xn) ≥ 0 for every assignment, and the sum of P(x1, ..., xn) over all assignments is 1
- For all but the smallest distributions, impractical to write out

Examples
- An event is a set E of assignments (or outcomes); its probability P(E) is the sum of P(x1, ..., xn) over the assignments in E
- From a joint distribution, we can calculate the probability of any event
  - Probability that it's warm AND sunny?
  - Probability that it's warm?
  - Probability that it's warm OR sunny?
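
The three event queries above can be sketched in a few lines of Python. The joint table here is an assumption made for illustration (the table from the original slide is not preserved in this transcript), and the names `joint` and `prob` are hypothetical. An event is represented as a predicate over assignments, and its probability is the sum of the matching joint entries.

```python
from fractions import Fraction

# Assumed, illustrative joint distribution P(Temperature, Weather).
# These numbers are NOT taken from the lecture.
joint = {
    ("warm", "sun"): Fraction(4, 10),
    ("warm", "rain"): Fraction(1, 10),
    ("cold", "sun"): Fraction(2, 10),
    ("cold", "rain"): Fraction(3, 10),
}

def prob(event):
    """P(event): sum the joint over assignments (t, w) where event(t, w) holds."""
    return sum(p for (t, w), p in joint.items() if event(t, w))

# A valid joint distribution is nonnegative and sums to 1.
assert all(p >= 0 for p in joint.values()) and sum(joint.values()) == 1

p_warm_and_sun = prob(lambda t, w: t == "warm" and w == "sun")
p_warm = prob(lambda t, w: t == "warm")
p_warm_or_sun = prob(lambda t, w: t == "warm" or w == "sun")

print(p_warm_and_sun, p_warm, p_warm_or_sun)  # 2/5 1/2 7/10
```

Exact fractions are used instead of floats so that the sums come out without rounding error. Note that `p_warm` is also the marginal P(warm), obtained by summing out the weather variable.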

Marginalization
- Marginalization (or summing out) is projecting a joint distribution to a sub-distribution over a subset of variables: P(x) = Σ_y P(x, y)

Conditional Probabilities
- A conditional probability is the probability of an event given another event (usually evidence)

Conditional Probabilities
- Conditional or posterior probabilities:
  - E.g., P(cavity | toothache) = 0.8
  - Given that toothache is all I know...
- Notation for conditional distributions:
  - P(cavity | toothache) = a single number
  - P(Cavity, Toothache) = 2x2 table summing to 1
  - P(Cavity | Toothache) = two 2-element vectors, each summing to 1
- If we know more:
  - P(cavity | toothache, catch) = 0.9
  - P(cavity | toothache, cavity) = 1
- Note: the less specific belief remains valid after more evidence arrives, but is not always useful
- New evidence may be irrelevant, allowing simplification:
  - P(cavity | toothache, traffic) = P(cavity | toothache) = 0.8
- This kind of inference, guided by domain knowledge, is crucial

Conditioning
- Conditional probabilities are the ratio of two probabilities: P(x | y) = P(x, y) / P(y)

Normalization Trick
- A trick to get the whole conditional distribution at once:
  - Select: get the joint probabilities for each value of the query variable
  - Normalize: renormalize the resulting vector

The Product Rule
- Sometimes joint P(X, Y) is easy to get
- Sometimes easier to get conditional P(X | Y)
- The two are related by P(x, y) = P(x | y) P(y)
- Example: P(sun, dry)?

Lewis Carroll's Sack Problem
- Sack contains a red or blue token, 50/50
- We add a red token
- If we draw a red token, what's the chance of drawing a second red token?
- Variables:
  - F in {r, b} is the original token
  - D in {r, b} is the first token we draw
- Query: P(F=r | D=r)

Lewis Carroll's Sack Problem
- Now we have P(F, D)
- Want P(F=r | D=r)

Bayes' Rule
- Two ways to factor a joint distribution over two variables: P(x, y) = P(x | y) P(y) = P(y | x) P(x)
- Dividing, we get: P(x | y) = P(y | x) P(x) / P(y)
- Why is this at all helpful?
  - Lets us invert a conditional distribution
  - Often the one conditional is tricky but the other simple
  - Foundation of many systems we'll see later (e.g. ASR, MT)
- In the running for most important AI equation!
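
As a worked instance of Bayes' rule, the sack problem can be computed directly. This is a minimal sketch: the prior comes from the problem statement, while the likelihood assumes the first draw is uniform over the two tokens in the sack. The dictionary names are hypothetical.

```python
from fractions import Fraction

# Prior over the original token F: red or blue, 50/50 (from the problem statement).
prior = {"r": Fraction(1, 2), "b": Fraction(1, 2)}

# Likelihood P(D = r | F = f). After a red token is added the sack holds two
# tokens; ASSUMPTION: the first draw is uniform over those two tokens.
likelihood_red = {"r": Fraction(1), "b": Fraction(1, 2)}

# Product rule: joint entries P(F = f, D = r) = P(D = r | F = f) P(F = f).
joint_red = {f: likelihood_red[f] * prior[f] for f in prior}

# Bayes' rule: divide by the evidence probability P(D = r).
evidence = sum(joint_red.values())
posterior = {f: p / evidence for f, p in joint_red.items()}

print(posterior["r"])  # 2/3
```

Once a red token has been drawn, the remaining token is red exactly when the original token was red, so P(F=r | D=r) = 2/3 is also the chance that the second draw is red.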

[Image caption: "That's my rule!"]

More Bayes' Rule
- Diagnostic probability from causal probability: P(cause | effect) = P(effect | cause) P(cause) / P(effect)
- Example: m is meningitis, s is stiff neck, so P(m | s) = P(s | m) P(m) / P(s)
- Note: posterior probability of meningitis still very small
- Note: you should still get stiff necks checked out! Why?

Inference by Enumeration
- P(sun)?
- P(sun | winter)?
- P(sun | winter, warm)?

Inference by Enumeration
- General case:
  - Evidence variables: E, with observed values e
  - Query variables: Q
  - Hidden variables: H
  - Together these make up all the variables
- We want: P(Q | e)
- First, select the entries consistent with the evidence
- Second, sum out H
- Finally, normalize the remaining entries to conditionalize
- Obvious problems:
  - Worst-case time complexity O(d^n)
  - Space complexity O(d^n) to store the joint distribution

Independence
- Two variables are independent if: P(x, y) = P(x) P(y) for all x, y
- This says that their joint distribution factors into a product of two simpler distributions
- Independence is a modeling assumption
  - Empirical joint distributions: at best "close" to independent
  - What could we assume for {Weather, Traffic, Cavity}?
- How many parameters in the joint model?
- How many parameters in the independent model?
- Independence is like something from CSPs: what?

Example: Independence
- N fair, independent coin flips: the joint factors as P(X1, ..., XN) = P(X1) P(X2) ... P(XN), with P(Xi = heads) = 0.5 for each flip

Example: Independence?
- Arbitrary joint distributions can be poorly modeled by independent factors

Conditional Independence
- P(Toothache, Cavity, Catch) has 2^3 = 8 entries (7 independent entries)
- If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
  - P(catch | toothache, cavity) = P(catch | cavity)
- The same independence holds if I don't have a cavity:
  - P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
- Catch is conditionally independent of Toothache given Cavity:
  - P(Catch | Toothache, Cavity) = P(Catch | Cavity)
- Equivalent statements:
  - P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
  - P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)

Conditional Independence
- Unconditional (absolute) independence is very rare (why?)
- Conditional independence is our most basic and robust form of knowledge about uncertain environments:
  - What about this domain: Traffic, Umbrella, Raining?
  - What about fire, smoke, alarm?

The Chain Rule II
- Can always factor any joint distribution as an incremental product of conditional distributions: P(x1, x2, ..., xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) ... P(xn | x1, ..., xn-1)
- Why?
- This actually claims nothing...
- What are the sizes of the tables we supply?

The Chain Rule III
- Trivial decomposition: P(Traffic, Raining, Umbrella) = P(Raining) P(Traffic | Raining) P(Umbrella | Raining, Traffic)
- With conditional independence: P(Traffic, Raining, Umbrella) = P(Raining) P(Traffic | Raining) P(Umbrella | Raining)
- Conditional independence is our most basic and robust form of knowledge about uncertain environments
- Graphical models (next class) will help us work with independence

The Chain Rule IV
- Write out full joint distribution using chain rule:
  P(Toothache, Catch, Cavity)
  = P(Toothache | Catch, Cavity) P(Catch, Cavity)
  = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
  = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
- Graphical model notation:
  - Each variable is a node
  - The parents of a node are the other variables which the decomposed joint conditions on
- MUCH more on this to come!
- [Figure: node Cav, labeled P(Cavity), is the parent of node T, labeled P(Toothache | Cavity), and node Cat, labeled P(Catch | Cavity)]

Combining Evidence
  P(cavity | toothache, catch)
  = α P(toothache, catch | cavity) P(cavity)
  = α P(toothache | cavity) P(catch | cavity) P(cavity)
  where α = 1 / P(toothache, catch) is the normalizing constant
- This is an example of a naive Bayes model:
- [Figure: node C is the parent of nodes E1, E2, ..., En]
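
The combining-evidence computation can be sketched as follows. All numeric parameters are hypothetical values chosen for illustration, not figures from the lecture, so the result differs from the 0.9 quoted earlier. The sketch scores each value of Cavity with the naive Bayes product and then normalizes, which is the select-then-normalize pattern from the normalization trick.

```python
from fractions import Fraction

# HYPOTHETICAL parameters for illustration only; not taken from the lecture.
p_cavity = {True: Fraction(2, 10), False: Fraction(8, 10)}           # P(Cavity)
p_toothache_given = {True: Fraction(6, 10), False: Fraction(1, 10)}  # P(toothache | Cavity)
p_catch_given = {True: Fraction(9, 10), False: Fraction(2, 10)}      # P(catch | Cavity)

# Unnormalized scores P(toothache | c) P(catch | c) P(c) for each value c of
# Cavity. This product relies on Toothache and Catch being conditionally
# independent given Cavity.
scores = {
    c: p_toothache_given[c] * p_catch_given[c] * p_cavity[c]
    for c in (True, False)
}

# Normalize: alpha = 1 / P(toothache, catch).
alpha = 1 / sum(scores.values())
posterior = {c: alpha * s for c, s in scores.items()}

print(posterior[True])  # 27/31
```

With these assumed numbers the model needs only 5 parameters (one for the prior and two for each evidence variable), compared with 7 independent entries in the full joint table.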
