We’ll start. So the first thing we have to do is the TA evaluation. I’m going to hand these out. If you have interacted with any of the TAs, it’s for the one you interacted with most, and then you can just pass them back. There’s a bundle of pencils coming with them, and if you run out of pencils there are lots more up over there, and more forms over there, et cetera. You know the drill, and you know the TAs: Mehran [other names unclear]. Their pictures are on the first set of slides if you forget who they are. Mehran is up there, you know who he is. Boo-yan was here last time; he’s the big tall guy. [unclear] The section is 002; if you all put 002 then you won’t lose.

Okay, so today is the last class, and then we have another class, so it’s a final review class. Announcements: we will be posting solutions for assignment four, but not all of you have handed it in yet, so I don’t think we’ll post them yet; we’ll wait a while. The list of short questions is posted, the previous finals are posted, and the TA hours continue. Exercise 12 for single-stage and exercise 13 for multi-stage are posted. I would really like you to do the teaching evaluation; if you haven’t done it yet, please do it. It doesn’t take very long, it’s important for all sorts of reasons, and if you want to have some effect on the course or the university or the department, it’s a really good way to give some feedback, positive or negative.

And then I got just three requests for review topics: constructing a belief network, a more complicated variable elimination problem, and, related, pruning irrelevant variables. Those were the three requested; nobody else put one in. So I’ll deal with those, but the first thing we have to do is finish the last class, which had too much material. So we’re going to finish out Decision 3. We were looking at computing the optimal policy for multi-stage decision networks by variable elimination. So the same algorithm handles all Bayes networks and all single-stage decision networks.
All the multi-stage ones too, even all of CSPs for that matter. We didn’t look at that application of variable elimination, but it’s a very general and very efficient algorithm once you’ve decided on the order of eliminating variables. That’s the trick, and we’ve given some heuristics for that. So let’s quickly look at this material. We’re not covering it in great detail, so [unclear] responsible for the details. It’s covered very well in the book. I’m sure you’ve read the book on it; if you haven’t, you should, because it goes by quickly. So we’re going to look quickly at that, and then we’ll summarize the decision and probabilistic section of the course. Are there any questions before we start?

So you recall we’re dealing with networks like this that have more than one decision in them. If you just have one decision, then that decision leads directly to the utility, which is also affected perhaps by other random variables. But if you have more than one decision, then the result of this decision will affect what you do there, plus other random variables may also affect it.

So how do we compute the optimal policy? Remember, a policy is a set of decision functions. We have one decision function for each decision node, which is a rectangular box. When you make that decision you know the values of all the parents, whether they’re previous decisions or random variables. You know them at execution time, not at planning time. So at planning time you have to construct a policy which allows for the contingency of all possible sets of values for the parents of the decision node, because you don’t know what they’re going to be when you’re executing the policy. You have to allow for all the possibilities. So the decision is not just what to do; the decision is what to do in all these different circumstances, based on what the parents are telling you. So a policy is a set of decision functions, and each decision function is a mapping from all the possible instantiations of the parents to a decision. That’s an important technical definition of policy.

So we have these three previous operations on factors. We’re going to add one more, and that is maxing out a variable. It’s like marginalization in that we’re getting rid of a variable, but in marginalization you sum out the variable; here we’re just going to take the maximum of the values. And you can see why this is important for decisions. In a factor that represents a decision, A and C, say, are random variables here, and B will be the decision, because we’re going to max out B. So what we’re interested in is: what is the value of B that gives us the maximum utility for each of the possible instantiations of the other two variables?
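As a rough sketch of maxing out (not taken from the slides), the Python below stores a hypothetical factor f3(A, B, C) as a dictionary and maxes out B. The table values and the names `f3` and `max_out_b` are illustrative assumptions. It returns the new factor on A and C and, on the side, which value of B achieved each maximum.

```python
from itertools import product

# Hypothetical factor f3(A, B, C); keys are (A, B, C) truth values.
# The numbers are illustrative assumptions.
f3 = {
    (True, True, True): 0.03,
    (True, True, False): 0.07,
    (True, False, True): 0.54,
    (True, False, False): 0.36,
    (False, True, True): 0.06,
    (False, True, False): 0.14,
    (False, False, True): 0.48,
    (False, False, False): 0.32,
}

def max_out_b(factor):
    """Max out B: return a factor on (A, C) and the maximizing value of B."""
    values, argmax = {}, {}
    for a, c in product([True, False], repeat=2):
        # Pick the value of B with the largest entry for this (A, C).
        best_b = max([True, False], key=lambda b: factor[(a, b, c)])
        values[(a, c)] = factor[(a, best_b, c)]
        argmax[(a, c)] = best_b
    return values, argmax

f4, decision_function = max_out_b(f3)
print(f4)
print(decision_function)
```

The resulting table has four rows instead of eight, and with these assumed numbers the entry for A true and C true is 0.54, achieved by B false.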
So this is the sort of symbolic representation of this operation on factors. It’s max with respect to some variable, and you have to name it: max with respect to B of f3(A, B, C) will give us a factor on A and C. We’ve maxed out B. We’re just saying we’re going to choose the best value of B for each of these values of A and C. So we’ve matched up the colors here. The green here shows the values where A and C are true and false. B could be either true or false. We get the best value with false here, so the actual value we choose is the max of those two; it’s 0.36. I haven’t remembered which one it is. I will have to remember separately which value of B gave me this optimal value; for now I’m going to forget about it. If we come down to, say, these white ones here, what’s the value there? Got your cards? [unclear] It’s easy: take the max of 0.06 and 0.48, which is 0.48. This one’s 0.32.

So when you max out a variable, the size of the table will decrease by a factor of how many values are in the domain of B, because you’re just choosing one of those values for each possible set of values of A and C. So this is a new factor. We’re using factors to represent both utilities and probabilities, which is why we called them factors in the first place: we want to have this general idea of a factor. If this factor represents a utility, and B is a decision made knowing the values of A and C, what should you do? Well, if A and C are true and true, then the decision should be B is false, and that will give you a value of 0.54.

The other important property we have to have for decision networks, since not all decision networks will have this property, though it seems a fairly straightforward idea, is that the agent never forgets: the agent remembers all the decisions made in the past and the values of the variables that led it to make those decisions. For a long decision process that might be a lot of memory, but anyway we’ll assume that the agent never forgets those. So technically you have a total ordering on the decision variables, 1 through m, and if a decision D_i comes before D_j, then D_i is a parent of D_j. In other words, any previous decision can influence the current decision, and any knowledge that was available to that previous decision is also available to this decision. So we can check this network and see it has the no-forgetting property. First, these decisions are totally ordered: this one comes before that one. Secondly, the parent of CheckSmoke is Report, and that’s also a parent of Call, which is good. That’s all we need to check. It has the no-forgetting property: Call knows what decision we made there, and it knows what information CheckSmoke used to make that decision. And the algorithm we give you only works on these no-forgetting decision networks.

So how do you compute the optimal policy? You work backwards from the last decision. We already have an algorithm to compute the best policy: you look at all possible policies, which we did in the last lecture. It’s a gigantic set of possible policies; for each one you can determine its utility, and you just pick the one that’s got the highest utility. It’s an algorithm, it halts, et cetera, but it’s not as good as this one, and we can prove that through the complexity analysis. The idea is so-called dynamic programming, which means you precompute the optimal future decisions. You don’t have to compute every possible combination of all the decisions. You can figure out what the policy for the last decision is, and once you’ve got that policy you can essentially eliminate that decision, because you know what utility you’re going to get, and you’ve got a new network with one less decision to make.
With one less decision to make, you just work backwards, computing the optimal policy in that direction. So that’s the basic intuition.

So consider the last decision to be made. Find the optimal decision D for this last decision, in this case this Call decision, for each instantiation of its parents. Here Call has three parents: Report, CheckSmoke and SeeSmoke. If these are all Boolean there are eight possibilities, and for each of those eight possibilities I have to figure out what’s the best thing to do, call or not call, whichever leads to the best utility. Once I’ve done that, I can essentially eliminate that Call decision variable. I’ve got the decision function that I’ll use when I get to that decision. Now I can think of having a new network where I replace this Call decision by the utility: I can make these three variables go straight into the utility, with a new utility function, because I’ve eliminated this decision. I know what to do there and I know what the utilities will be. So I’ve got a new network which has one less decision. In this case I’ve only got one left, and I could run my previous algorithm on it: if you’re doing that for each instantiation of these, this is just like a one-decision network, which we know how to solve. So you create a factor of those maximum values: you max out that decision, Call in this case. For each instantiation of the parents, what’s the best utility I can achieve by making this last decision optimally? And then you just do it again recursively: find the optimal policy working backwards through the decisions until you have no decisions left to make.

So this is the algorithm. Create a factor for each conditional probability table of the Bayes net, like this kind of stuff here, and a factor for the utility, which in this case will be a factor on the variables Fire, CheckSmoke and Call. And then here’s the loop, doing this for each of the decision variables, backwards. Sum out all random variables that are not parents of a decision node, because you’re averaging over all those. Max out the last decision variable D in the total ordering, and just remember on the side the decision function that allowed you to max out that decision, as we talked about in the last few slides. Once you’ve got rid of all the decisions that way, by slowly replacing the utility by a new utility, which will probably be a growing factor, you sum out any variables that remain.
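The loop just described can be sketched in Python. This is a minimal sketch under stated assumptions, not the book’s implementation: the one-decision umbrella network and all its numbers are hypothetical (it is not the fire-alarm network on the slides), the helper names are made up, and for brevity it multiplies all factors into one table up front, which a real variable elimination implementation would avoid. It also assumes the no-forgetting property, so the parents of earlier decisions are also parents of later ones.

```python
from itertools import product

# Hypothetical one-decision umbrella network (numbers made up for illustration).
DOMAINS = {
    "Weather": ["rain", "norain"],
    "Forecast": ["sunny", "cloudy", "rainy"],
    "Umbrella": ["take", "leave"],
}

def make_factor(variables, fn):
    """Tabulate fn(assignment) over every assignment to `variables`."""
    variables = tuple(variables)
    rows = product(*(DOMAINS[v] for v in variables))
    return variables, {vals: fn(dict(zip(variables, vals))) for vals in rows}

def lookup(factor, assignment):
    variables, table = factor
    return table[tuple(assignment[v] for v in variables)]

def multiply(f, g):
    variables = f[0] + tuple(v for v in g[0] if v not in f[0])
    return make_factor(variables, lambda a: lookup(f, a) * lookup(g, a))

def sum_out(f, var):
    rest = tuple(v for v in f[0] if v != var)
    return make_factor(
        rest, lambda a: sum(lookup(f, {**a, var: x}) for x in DOMAINS[var]))

def max_out(f, var):
    """Max out a decision variable; also return its decision function."""
    rest = tuple(v for v in f[0] if v != var)
    choice = {}
    def best(a):
        x = max(DOMAINS[var], key=lambda val: lookup(f, {**a, var: val}))
        choice[tuple(a[v] for v in rest)] = x
        return lookup(f, {**a, var: x})
    return make_factor(rest, best), (rest, choice)

def optimal_policy(factors, random_vars, decisions):
    """decisions: (variable, parents) pairs in the order they are made."""
    joint = factors[0]
    for f in factors[1:]:
        joint = multiply(joint, f)
    policy, remaining = {}, set(random_vars)
    for d, parents in reversed(decisions):
        for v in sorted(remaining - set(parents)):  # not parents of d
            joint = sum_out(joint, v)
        remaining &= set(parents)
        joint, policy[d] = max_out(joint, d)        # remember the decision function
    for v in sorted(remaining):                     # sum out whatever is left
        joint = sum_out(joint, v)
    return lookup(joint, {}), policy

P_RAIN = 0.3
P_FORECAST = {  # P(Forecast | Weather)
    "rain": {"sunny": 0.15, "cloudy": 0.25, "rainy": 0.6},
    "norain": {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
}
UTILITY = {("rain", "take"): 70, ("rain", "leave"): 0,
           ("norain", "take"): 20, ("norain", "leave"): 100}

factors = [
    make_factor(["Weather"],
                lambda a: P_RAIN if a["Weather"] == "rain" else 1 - P_RAIN),
    make_factor(["Weather", "Forecast"],
                lambda a: P_FORECAST[a["Weather"]][a["Forecast"]]),
    make_factor(["Weather", "Umbrella"],
                lambda a: UTILITY[(a["Weather"], a["Umbrella"])]),
]
expected_utility, policy = optimal_policy(
    factors, ["Weather", "Forecast"], [("Umbrella", ["Forecast"])])
print(round(expected_utility, 6), policy["Umbrella"])
```

With these assumed numbers the decision function says to take the umbrella only when the forecast is rainy, and the expected utility of that policy is 77.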

Any remaining variables in that factor you just sum out in the standard way, and you end up with the expected utility of the optimal policy. As a side effect, along the way you’ve constructed the decision functions that you need. This is just a more intuitive way of explaining the algorithm that we give in the book, in section 9.3.3, so you should look carefully at that. And again you can run it; it’s always better to use an example if this is too abstract. In exercise 13 in AIspace you’ll see how we construct the factors and the decision functions as we work backwards through this example here.

So I claimed this was more efficient. The way you can see that is: we already saw that if you had d decision variables, each with k binary parents and b possible actions, there were (b^(2^k))^d policies. That’s the case because each decision has b^(2^k) possible decision functions, and you have to consider them all. Then there are d decisions, so you have to multiply the numbers of functions at each decision together, which is where you get this power d. For any reasonable values of b, k and d this is of course a very large number. But by using that dynamic programming trick we consider each decision one at a time, in isolation. We don’t have to consider it combined with all the possibilities for all the other decisions. If this is the last one we’re considering, all we have to do is say, for each of the parents’ values, what’s the best thing to do? Forget about the other two decisions for now. So that’s what gives us this factoring of the algorithm. Instead of (b^(2^k))^d, because you consider each decision function only once, the complexity is just d times b^(2^k). It’s still b^(2^k) for each decision; we can’t get rid of that one.
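To get a feel for the two counts, here is a small Python calculation. The values b = 2, k = 3, d = 2 are illustrative assumptions, and the function names are hypothetical.

```python
def count_policies(b, k, d):
    """Number of complete policies: (b ** (2 ** k)) ** d."""
    return (b ** (2 ** k)) ** d

def count_dp_decision_functions(b, k, d):
    """Decision functions considered one decision at a time: d * b ** (2 ** k)."""
    return d * b ** (2 ** k)

b, k, d = 2, 3, 2  # illustrative values, not from the lecture
print(count_policies(b, k, d))               # 65536
print(count_dp_decision_functions(b, k, d))  # 512
```

Even at this tiny size the exponent d has turned into a multiplier, and the gap widens quickly as d grows.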
But this is now multiplying instead of being raised to a power, so of course that’s a huge win for this dynamic programming algorithm. It’s still a big number for reasonable problems, and maybe it’s too large. There are lots of approximation algorithms, and in 422, if you go on to do the second AI course, you’ll look at approximation algorithms for computing the policies. So, as I say, we’ve gone through this too quickly to really grok it, but you’ve got the book, you’ve got the examples, and we’ve got AIspace.

So this just gives you a picture of where we are in terms of what you know and what you would learn if you went on to 422 in this area. We’ve talked about probability. We talked about decision theory using explicit decision trees, and again it’s states versus features; you can think of it that way. The decision tree is really a state-based representation, and it’s got all the disadvantages of that. It’s going to explode on you. Even for small trees you can see how big it gets: as soon as you have a reasonable number of decisions and a reasonable number of possibilities, the number of states will explode exponentially. So the trick of moving to decision networks as opposed to decision trees is going to a feature-based representation, where we have variables like the Call decision and features corresponding to random variables in the situation, and they have values. So it again is that win of going from a state-based representation to a feature-based one, which has been a big theme of this course.

So then we talked about one-off decisions and sequential decisions. These are already useful for things like decision support systems, where you’re dealing with a very large decision space. People can’t keep all this stuff in their heads, so you can provide support tools which allow people to make explicit the order in which you’re acquiring the information, when you make each decision, and what information is available, and even, if you’re willing to assign numbers to the utilities, work out the best possible policy through interaction with the machine. These are quite common.

On the other side, over here, we talked about Markov processes and so on. If you have Markov decision processes, you have an infinite process where you’re making a whole series of decisions, each time affecting the state. That’s also covered in the book, and we’re not going to talk about it, but you can read about it or go into 422. These have enormous impact on self-driving cars, and almost all of robotics is absolutely dependent on this kind of technology, because now we can do these things efficiently using sampling techniques and so on. Again these will be very large, so we usually won’t solve them exactly; we’ll use approximation techniques, statistical sampling techniques, which you might encounter in the machine learning courses. In economics, control systems and robotics they have enormous numbers of applications.

Just to mention some areas of application: computational sustainability, which I think I mentioned before, is an area that’s becoming increasingly important. It uses computational techniques, especially AI, to model ecological systems, to model decision processes for those systems, and to manage and allocate resources. They’re often like constraint optimization problems. For example, if you’ve got green energy generation, where should you put those resources to use them most effectively, how should you price them, et cetera. Suppose you’re worried about global warming and you worry about how the bears are going to survive. What’s going to happen, of course, is that they’re going to have to move to fit into the same ecological niche. So when you’re designing a park, a wildlife corridor is what you really need: they need to be able to move in the direction of the changing environmental niche. There’s actually a project in the Rockies in which Canadian and American park managers are collaborating on designing corridors for large mammals like the bears to be able to move. You look at the price of each piece of property and how much buying that property gets you in terms of improved survivability of the species, and so on. Or it could be bird species migrating through Boundary Bay: how important is it to expand the Roberts Bank coal port versus saving the birds? All the trade-offs can be made explicit. Urban planning is another big example.
POMDP models: we’ve done a lot of work in aging, so we have a smart wheelchair project which uses POMDP models. Another project, which [name unclear], a UBC grad, worked on, applies POMDPs to hand washing for Alzheimer’s patients. It models the patient as a finite state machine, moving from getting your hands wet, picking up the soap, applying soap to your hands, turning on the water, rinsing your hands, to towelling off. Sometimes they get it in the wrong order; they forget to rinse their hands off before they towel off, and so on. So you can remind them and prompt them using POMDP models. I think I showed this example: [unclear] helicopters, really cool stuff, an amazing ability to control very unstable dynamical systems like helicopters better than humans can. And of course the smart part of the intelligent car, the completely autonomous one that Google is using, uses POMDP methods.

So these were the learning goals for today’s class; that was the class from the day before yesterday. We’ve finished those now, so you should know a lot. It’s good for the exam to look over the learning goals, I think, because it’ll give you a good high-level view of what you should know. Any questions about any of that?

I’m sort of working through this, but I’ll talk about the review topics that were requested in however much time we have, and again, ask questions if you don’t understand. I think people had a little trouble with this on assignment four. There’s always a sort of canonical Bayes net for a problem, and that is the causal network. That’s where the physical intuition says: first of all I’m going to order the variables in time, if I know when these things occur. That gives my linear ordering, which I have to have in order to have a Bayes net, x1 through xn. There are n factorial ways of doing that, but it’s usually pretty obvious what the right order is in terms of temporal order. The trouble is, what if some devious instructor constructs examples which are non-temporal and asks, is this a correct network or not? When you’re out of temporal order, it’s very hard to reason about things that are backwards in time. I think that was the problem with assignment four. It’s sort of an artificial example, but you should be able to figure it out, so we could talk about that.

Then a more complicated variable elimination problem, and pruning irrelevant variables. We very briefly touched on that. Actually, someone else who knows that stuff better could give three lectures on pruning irrelevant variables; there are all sorts of things on d-separation and so on in Bayes nets that we haven’t touched. We just dealt with a very simple rule of pruning the leaf nodes recursively, and that’s really all you have to know about that. We’ll look at that.

So those are the topics I was asked to talk about, and they’re all about Bayes nets. This tells me something; maybe it’s just what’s most recently on your mind. The thing about Bayes nets is that they look complicated but they’re really very simple: a Bayes net is just a way of representing the joint probability. Never forget that. It’s really just a shorthand, concise description of the joint probability distribution, that is, how all of these variables co-vary. For each value of each variable, when you instantiate all the variables, you’ll get a number between 0 and 1, which is your belief that that will occur if you do this experiment. If you do it over and over again, 0.3 is your belief that it’s going to happen. It’s subjective; it’s the agent’s version. Each of these Bayes nets is relative to an agent, and it’s the agent’s beliefs about the JPD. That’s the first thing to remember. Once you remember that, then you just think causally to construct them, and think about what can influence what. If we are modeling a student, whether the student understands the material will affect the assignment grade and the exam grade. These are not independent variables, of course.
The higher a person’s exam grade is, the higher you would expect their assignment grade to be, unless something went wrong, and vice versa. So here we have two events with a common cause, and here we have a common event with two different causes, which are independent. Both of these situations occur quite frequently. If an alarm goes off, you don’t know whether it’s caused by smoking or by a fire. If you know the alarm goes off and you know it’s caused by smoking, it’s less likely that there was a fire. You can actually work this out numerically; it’s called explaining away. If I have an event and I know its cause, then you’ll find that the probability of fire is less. If the alarm is true, then at least one of these has to be true, but if one is already known to be true, then the probability of the other one will go down. That’s called explaining away. It’s not always intuitive, but it’s true.

And then here’s a really simple situation we just looked at, where things are evolving in time, and the agent or the object, whatever this dynamical system is, is changing over time randomly. Think of a drunk walking backwards and forwards along a line: each time there’s a 0.9 probability they’re going to go to the right and 0.1 that they’re going to go to the left. That could be an example of this kind of Markov chain that we talked about.

I could go through all the definitions, but I don’t think I need to. Let’s first just go through a causal example. Say there’s a total ordering: there’s fire, there’s tampering, there’s alarm, then smoke, leaving and report. If you know the physics of the situation, you won’t necessarily get a total ordering. I don’t know the order between fire and tampering; I could have put tampering before fire, because they’re independent events. There’s no way I can know which comes before or after, and each is consistent with causality. So causal doesn’t mean a unique total order.
A causal ordering can be a partial order, and any total order that respects that partial order is still a legitimate way of constructing the Bayes net. That’s the first problem. Now, once you’ve chosen a total ordering out of the causal orderings, choose the parents for each variable by evaluating conditional independence.
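One way to see that a causal partial order admits several legitimate total orders is to enumerate them. The sketch below assumes the usual arcs for this fire-alarm example (fire and tampering cause alarm, fire causes smoke, alarm causes leaving, leaving causes report); the arc into report is an assumption, and the names `CAUSES` and `respects_causality` are hypothetical.

```python
from itertools import permutations

VARIABLES = ["fire", "tampering", "alarm", "smoke", "leaving", "report"]
# Assumed causal arcs (cause, effect) for the fire-alarm example.
CAUSES = [("fire", "alarm"), ("tampering", "alarm"), ("fire", "smoke"),
          ("alarm", "leaving"), ("leaving", "report")]

def respects_causality(order):
    """True if every cause appears before its effect in `order`."""
    position = {v: i for i, v in enumerate(order)}
    return all(position[c] < position[e] for c, e in CAUSES)

causal_orders = [o for o in permutations(VARIABLES) if respects_causality(o)]
print(len(causal_orders))  # 9
for order in causal_orders:
    print(order)
```

Under these assumed arcs, 9 of the 720 possible orderings are causal, including both the one that starts with fire and the one that starts with tampering.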

Fire, obviously the first one, can’t depend on anything. But for the second one, tampering: in general, when you’re drawing a Bayes net, you have to allow for all possibilities. Just as an example, suppose D occurs first, then C, then B, then A, temporally, and that’s causal. Then obviously I have those possible dependencies, but of course B could also depend on D. If it didn’t, we’d have the Markov property. Remember, the Markov property says that a state only depends on the previous state; if the Bayes net were like that, it would be a Markov chain. But in general you have to put in all the back arcs. So for node n, I have to put in n minus 1 arcs to the previous variables, to make it complete. It’s a problem, so then I want to delete as many as I can. The more I can delete, the smaller the conditional probability tables. For each of these nodes I need a conditional probability table, which for each value of the antecedents tells you the probability of getting each value for the node we’re looking at. So ideally I want to delete arcs.

In this case, fire and tampering, I actually can delete this arc. This is fire, this is tampering; it doesn’t depend on that, so they’re independent. I’m using my intuition about the physics to make the assumption that whether people tamper with the alarm doesn’t depend on whether or not there’s a fire. But alarm: this is fire, this is tampering, this is alarm, and it does depend on both of these, so I can’t get rid of those two arcs. Smoke is caused by fire, and so is independent of tampering and alarm given whether there is a fire. In each case, I see I’ve reversed the arrows, but that’s the point. Leaving does not depend directly on tampering; it only depends on alarm. In other words, the probability of leaving given alarm and tampering is the same as the probability of leaving given alarm.
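A quick count shows why deleting arcs matters. Assuming all six variables are Boolean and the pruned parent sets below (the parent of report is an assumption), this sketch compares the number of CPT rows in the pruned network with the complete network, in which the node at position i in the ordering has all i earlier nodes as parents.

```python
# Assumed parent sets after pruning arcs in the fire-alarm example.
PARENTS = {
    "fire": [],
    "tampering": [],
    "alarm": ["fire", "tampering"],
    "smoke": ["fire"],
    "leaving": ["alarm"],
    "report": ["leaving"],
}

def cpt_rows(parents):
    """One row (the probability of true) per assignment to Boolean parents."""
    return 2 ** len(parents)

pruned = sum(cpt_rows(p) for p in PARENTS.values())
# Complete network: the node at position i has all i earlier nodes as parents.
complete = sum(2 ** i for i in range(len(PARENTS)))
print(pruned, complete)  # 12 63
```

So under these assumptions the pruned network needs 12 numbers where the complete one needs 63, which is the full joint distribution minus one.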
So this is conditional independence. I should be consistent: I confess I’ve reversed my arrows. Sometimes we draw them one way, sometimes the other, but they should certainly be consistent. So what we’re trying to do, because, as I say, this is just the JPD, the joint probability distribution, which is just a product of factors: it’s the probability of tampering times the probability of fire given tampering. But the probability of fire given tampering is the same as the probability of fire, because they’re marginally independent, so I can drop the “given tampering” there. I’m always trying to drop as many variables as I can from the conditioning part, because in general anything down here will depend on all the previous nodes. Using the physics in the causal situation, you can usually figure out which back arcs to draw.

The other question is, what about these non-causal orderings? Here’s an example. Suppose we have the variables D, C, B, A, and that’s the temporal, causal ordering. There are four factorial minus one, 23, other ways of ordering them, which might be non-causal. For any of those, you’ve got to construct the complete graph, assuming that all these dependencies exist, and then decide which arcs you can drop. So if I’m writing this network upside down, being perverse and drawing it backwards in temporal order, A, B, C, D, then I’m going to have to decide which ones to drop. The way to decide: if I’m interested in C here, I have to represent a conditional probability table for the probability of C given A and B, and I have to decide if I can drop either one. To evaluate conditional independence, you ask the question: is the probability of C given A and B equal to the probability of C given B? If so, drop the arc between A and C. To do that you can use intuition, or you can actually work it out: you can compute the probability of C given A and B, and you can compute the probability of C given B, and if they’re the same, then you’ve got conditional independence. And you can usually use your intuition there.

I think the example in assignment four was the telescope example. You’re making measurements, and in the causal case the measurements were based on two things: the number of stars in the sky and the error of the telescope. That gave you the measurements M1 and M2. There was an example where we dropped the arc between M1 and M2. You can only do that if they’re marginally independent. You say, well, how do I figure out whether they’re marginally independent? You just think about the physics of it. If I’ve got a telescope looking at the sky and it’s evaluating the number of stars, I’ve got independent errors.
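Whether M1 and M2 are marginally independent can also be checked by computing. The model below is a hypothetical stand-in for the telescope example, with made-up numbers: N stars, uniformly 1 to 3, and each measurement equal to N plus an independent error of -1, 0 or +1. It compares P(M2 = 4) with P(M2 = 4 | M1 = 4), and then repeats the comparison given N = 3.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: N stars, two measurements with independent errors.
P_N = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
P_ERR = {-1: Fraction(1, 5), 0: Fraction(3, 5), 1: Fraction(1, 5)}

# Joint distribution over (N, M1, M2), built from the causal network.
joint = {}
for n, e1, e2 in product(P_N, P_ERR, P_ERR):
    key = (n, n + e1, n + e2)
    joint[key] = joint.get(key, 0) + P_N[n] * P_ERR[e1] * P_ERR[e2]

def prob(pred):
    """Probability of the event described by pred(n, m1, m2)."""
    return sum(p for (n, m1, m2), p in joint.items() if pred(n, m1, m2))

# Marginal question: does knowing M1 shift the distribution of M2?
p_m2 = prob(lambda n, m1, m2: m2 == 4)
p_m2_given_m1 = (prob(lambda n, m1, m2: m1 == 4 and m2 == 4)
                 / prob(lambda n, m1, m2: m1 == 4))

# Conditional question: given N, does M1 still matter?
p_m2_given_n = (prob(lambda n, m1, m2: n == 3 and m2 == 4)
                / prob(lambda n, m1, m2: n == 3))
p_m2_given_n_m1 = (prob(lambda n, m1, m2: n == 3 and m1 == 4 and m2 == 4)
                   / prob(lambda n, m1, m2: n == 3 and m1 == 4))

print(p_m2, p_m2_given_m1)            # 1/15 1/5
print(p_m2_given_n, p_m2_given_n_m1)  # 1/5 1/5
```

In this assumed model, knowing M1 shifts the distribution of M2, so the arc between them cannot be dropped when N is not among the parents; once N is known, M1 adds nothing.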
The errors are independent, but the two measurements are going to be correlated under any reasonable definition of a telescope, even though they’ve got errors. The more stars there are, the more both M1 and M2 will increase on average, statistically. So if I know M1 and I know it’s very, very big, then I know M2 is going to be very, very big, and if I know M1 is very, very small, then likewise for M2: it’ll shift my distribution. That’s all I need to know. Will knowledge of M1 shift the distribution of M2 or not? That’s the fundamental question you’re asking, and you can usually answer it using the physics, or your intuition, or your understanding of the situation. Questions? Okay, no questions.

So then the other question was, how about another example of variable elimination that’s a bit more complicated? Well, I figured, why not do the one on the last exam. I said I wasn’t going to give you the answer [unclear]. This is more complicated because you have to work out the numbers, so bring a non-programmable calculator. Somebody missed that last time in the midterm. The emphasis is on non-programmable: it’s just a simple four-function calculator, with no memory, no programs, and no cell phone. The exam regulations are actually much stricter than they used to be, but you can keep your iPhones with you; they have to be turned off, under the seat. You don’t have to put them at the front of the room.

Anyway, here’s a somewhat hokey example of video game quality. Suppose you’re interested in evaluating the quality of video games, and you believe it’s affected by two things. The graphics have to be very cool: high-res, good dynamics, good surface modeling, smoke and all that stuff. And there has to be some storyline behind it, just to make sense. They both independently affect the quality. But the storyline has a deeper cause, that is, whether or not the story comes from a movie: some stories come from movies, like Disney movie titles, and some are just invented. The person who invented this problem believed that if a game is based on a movie, it actually makes the storyline worse. So if you look at this conditional probability, if based-on-a-movie is true, then the probability that good-storyline is true is 0.4, but if it’s not based on a movie, the probability that it’s a good storyline is 0.7. [unclear]

So how do we solve this problem? You’re asked, I think, to get the probability of good quality. Figure 3 shows a little [unclear] and tells you the numbers. These factors just represent the conditional probabilities. These are the CPTs for the root nodes that have no parents, where you just need to know what the chances of true and false are. For this one, good-storyline true or false given that based-on-a-movie is true, these two numbers have to add up to 1, because if it’s based on a movie then good-storyline is either true or false. So these two numbers add up to 1, and these two add up to 1. That’s the conditional probability table for that. The last one involves two parent variables, good-graphics and good-storyline, and again, for each instantiation of the parents you have to give a probability distribution. That’s important to remember. So if good-storyline and good-graphics are both true, I have to know the probability that quality would be true; that’s 0.9.
and false is 0.1. Of course, you don't really need to specify the second line, because you know the two have to add up to 1. Similarly for all the other possibilities for the parents.

So the question is: use the variable elimination algorithm to determine the probability P(Good Quality), starting with an expression that relates P(Good Quality) to the joint probability distribution. We know how to do that: just write the joint down as the product of the CPTs, which is what I've been hammering on. And it gives you an elimination order, so you don't have to figure one out: eliminate Based on Movie first, Good Storyline second, Good Graphics third. Show your work, including the definition of all the intermediate factors created. It's not necessary to give numerical values for the intermediate factors; you can just give the algebraic expressions. But you do have to work out the number in the end, just as you did in the midterm. Do as much as possible algebraically rather than working numerically.

OK, so it's very straightforward, and I'll post this. How about I post this on Connect for you? So the probability of Good Quality is just summing out all the other variables. We're only interested in one of the four variables, so you just sum out the other ones; that's the definition of the marginal probability of Good Quality. OK, that's the first line. Second line: we're summing them out in this order, this one first, then this one, then that. So we write down the joint as the product of the CPTs: one, two, three, four factors here. That's straightforward; we know how to do that. And then we sum out Based on Movie. It's in two of these factors. Remember, when you're summing out a variable, what you do is move that summation sign in as tight as you can get it to the factors that involve that variable. All the other factors go outside the summation sign. That's a correct algebraic manipulation, and it's very efficient. So that's what we do, right?
So we find two factors; these two have Based on Movie in them, and we're summing it out. So we're going to multiply those two together and sum out Based on Movie, and that's where this factor came from here, f0(Good Storyline). You've summed out Based on Movie, so it's a factor on only this one variable. We call this new factor f0 on Good Storyline, which is the sum over Based on Movie of P(Based on Movie) times P(Good Storyline | Based on Movie), right? And we just keep doing it; it's very straightforward.

So now we want to sum out Good Storyline. We find there's a factor there with it in, and, oh, there's another one over there: these two. The probability of Good Graphics can go outside, and we sum out Good Storyline by multiplying those two together, which we call f1 here, leaving a factor on what's left, the other two variables, Good Graphics and Good Quality. So we sum out Good Storyline here by multiplying this factor by that factor and summing out Good Storyline, and that gives us this line. Now there's only one variable left to sum out, Good Graphics. So we multiply those two together and we get a factor on Good Quality, f2, which is defined here.

Now you compute the factors. This one, f0, you compute by just multiplying these two and summing out Based on Movie. This is just the definition, here's the multiplication, and there's the summation. OK, you can check that. Similarly for f1: once you've got this factor f0, you get f1 by multiplying f0 by this factor that you were given. Again you just multiply the two, row by row, and sum up the rows when you're summing out Good Storyline; that's where that sum comes from. You get a new factor f1 with these numbers. And finally Good Quality, same thing: multiply this factor by f1, and you get the factor f2 on Good Quality, which is 0.57 and 0.43. In general you'd have to renormalize these numbers, but it turns out in this case they do already add up to 1. If these didn't add up to 1, you'd have to divide each one of them by their sum to get them to be
probabilities. So now, using variable elimination, you've determined that the probability that Good Quality is true is this. Right, that's it, done. I'll post that.

And one last question. Were there questions about that? The last one was, well, yeah, what we did in question five on the practice final: pruning irrelevant nodes. Leaf nodes not in the evidence or query variables are irrelevant; they couldn't affect the answer. Think causally: they're coming after both the evidence and the query, so they're what happens in the future. I could think of the query as being in the past and the evidence as right now: I've got some evidence, and I want to know what happened in the past to cause this evidence. That's typically what we're doing. And that may again go on to cause further things in the future, but the future can't affect the present, unless physics is wrong and time goes both ways, which it doesn't. So any leaf nodes which are in the future, you can just prune them. I mean, you can prove this mathematically, you can just sum them out, but you can just delete them from the Bayes net. Of course, once you've deleted those leaf nodes, there may be new leaf nodes that are not involved in the evidence or query variables. You can delete those too, until you've got a network where there are no leaf nodes that are not in the query or the evidence variables.

So in this example, you may remember this example, the long one we did. I didn't want to do the long exam one again, but this one here. So suppose we're interested in computing the probability of F given A. I'm going the other way around, but this is a cause, so this is causing that: F given A. Then there are two leaf nodes here; we check and delete. Once I've deleted those, this is a leaf node; I can delete it. Oh, there's a new leaf node here, and I delete that. I can't delete that one: it's not a leaf node, because it's causing that. OK, so I've got a new network which just involves these variables, and now I can go ahead and do my factor
multiplication and summing out of the remaining variables, and get the answer. But it's a much simpler computation, because I got rid of all those leaf nodes. OK, so let me just finish up here.
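
The two procedures from this part of the lecture, pruning leaf nodes that are neither query nor evidence and then summing out variables in a given elimination order, can be sketched in Python roughly as follows. This is only an illustrative sketch, not the posted solution. The values 0.4 and 0.7 for Good Storyline and 0.9 for Good Quality (both parents true) come from the lecture; every other number is a placeholder assumed here, because the slide values are not in the transcript, so the result will not match the 0.57 and 0.43 quoted above. The `Sales` and `Sequel` nodes and all function names are hypothetical, added only so there is something to prune.

```python
from itertools import product

def cpt(child, parents, p_true):
    """Factor for P(child | parents), built from P(child=True | parents)."""
    table = {}
    for pv in product([True, False], repeat=len(parents)):
        table[(True,) + pv] = p_true[pv]
        table[(False,) + pv] = 1.0 - p_true[pv]  # each parent row sums to 1
    return (child,) + tuple(parents), table

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    (fv, ft), (gv, gt) = f, g
    out_vars = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for values in product([True, False], repeat=len(out_vars)):
        row = dict(zip(out_vars, values))
        table[values] = (ft[tuple(row[v] for v in fv)]
                         * gt[tuple(row[v] for v in gv)])
    return out_vars, table

def sum_out(var, f):
    """Sum a variable out of a factor, leaving a factor on the other variables."""
    fv, ft = f
    keep = [i for i, v in enumerate(fv) if v != var]
    table = {}
    for values, p in ft.items():
        key = tuple(values[i] for i in keep)
        table[key] = table.get(key, 0.0) + p
    return tuple(fv[i] for i in keep), table

def prune_leaves(parents, keep):
    """Repeatedly delete leaf nodes that are not query or evidence variables."""
    remaining = dict(parents)
    while True:
        has_child = {p for ps in remaining.values() for p in ps}
        leaves = [v for v in remaining if v not in has_child and v not in keep]
        if not leaves:
            return remaining
        for v in leaves:
            del remaining[v]

def eliminate(factors, order):
    """Sum out variables in the given order, then renormalize what is left."""
    for var in order:
        touching = [f for f in factors if var in f[0]]
        if not touching:
            continue
        rest = [f for f in factors if var not in f[0]]
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(var, prod)]  # new intermediate factor
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    total = sum(result[1].values())
    return result[0], {k: p / total for k, p in result[1].items()}

# Network structure: node -> tuple of parents.
parents = {
    "BasedOnMovie": (),
    "GoodGraphics": (),
    "GoodStoryline": ("BasedOnMovie",),
    "GoodQuality": ("GoodStoryline", "GoodGraphics"),
    "Sales": ("GoodQuality",),   # hypothetical leaf chain, not in the lecture
    "Sequel": ("Sales",),        # hypothetical leaf chain, not in the lecture
}
# P(node=True | parent values), keyed by parent values in the order above.
p_true = {
    "BasedOnMovie": {(): 0.5},                       # placeholder
    "GoodGraphics": {(): 0.6},                       # placeholder
    "GoodStoryline": {(True,): 0.4, (False,): 0.7},  # from the lecture
    "GoodQuality": {(True, True): 0.9,               # from the lecture
                    (True, False): 0.5,              # placeholder
                    (False, True): 0.4,              # placeholder
                    (False, False): 0.1},            # placeholder
    "Sales": {(True,): 0.8, (False,): 0.3},          # placeholder
    "Sequel": {(True,): 0.7, (False,): 0.1},         # placeholder
}

pruned = prune_leaves(parents, keep={"GoodQuality"})
factors = [cpt(v, parents[v], p_true[v]) for v in pruned]
result = eliminate(factors, ["BasedOnMovie", "GoodStoryline", "GoodGraphics"])
print(sorted(pruned), result)
```

The pruning loop removes `Sequel` first and then `Sales`, which only becomes a leaf once `Sequel` is gone; `GoodQuality` stays because it is the query. The elimination order is the one given in the exam question, and the factors created along the way play the roles of f0, f1 and f2.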

You could also look at the solution for Assignment 4, question 5, which will be coming once you guys have finished handing in Assignment 4. This is the stuff on the final; I've already said all this.

I could have another review session, but I didn't get any more questions to review. So what do you think? Do you want another review session? I'm willing to do it, but I'm not getting any more questions. You might? It's possible. Say that again? If you ask questions at the review session, I have to answer them on the spot. I can actually do that sometimes; I've been known to think on my feet. So I prepared this slide just in case that was the answer; I thought you might want one. So what do you think? I could make any of these times. Is it better to do it earlier, or later, closer to the exam? This is the popular choice, the 15th? Well, let's just vote on them. OK, so how about this one, just a show of hands: who wants the review session here Wednesday, April 10th? That's next Wednesday. One, two, three, four. OK. This one, Friday? One, two, three, four, five, six, seven, eight, nine, ten. OK, you can't vote for more than one. OK, Monday, April 15th? OK, it's a clear, outstanding winner. So this is the review class, and I'll see if we can do it here. I'm pretty sure I can get this room, but I'll put an announcement on Connect. OK.