Welcome to the last lecture. It’s a smaller group of people today. A quick announcement: the project reports are due next Friday, so make sure you turn those in. And the poster session was awesome. I showed up for a little bit of it, and the posters I saw were really amazing; I enjoyed talking to the groups, and good job on all the projects. We’re also going to have a Best Poster Award, which we’ll announce on Piazza later. Okay. All right, cool. So let’s conclude the lecture, CS221. The plan for today: we’re going to do a quick summary of what we have talked about, a general summary of the class and all the things we have learned. Then I want to talk a little bit about some of the next courses you might want to take after 221. If you were in 229, I know in the morning they went over a bunch of next courses from the perspective of 229, all the AI courses one would want to take; we’re doing a similar thing here, but from the view of 221. After that, I want to talk a little bit about the history of AI. We did this in the first lecture, so it will be partly a review, but I also want to talk about some next directions that might be interesting to think about, some of the research currently being done in various subtopics of AI, and some of the problems people are struggling with. It will be fun to think about that, and if you’re interested in any of it, you can go do research in that area or take classes in those particular areas. That’s the plan for today’s lecture. It’s going to be shorter than usual, probably an hour-ish. Okay.
All right, so let’s talk about the summary of the class. We started the class with this paradigm of modeling, inference, and learning. You pick a real-world problem and do modeling: the model is an abstraction of that real-world problem. In general, we are interested in reasoning about that problem, like finding the shortest path or solving some sort of optimization, and we call that inference. So we have a model of the world, and then we do inference, reasoning, on that model. The idea of the learning part of the lectures was that our models are not going to be perfect. You’re not going to perfectly model everything in the world; instead, we might have a partial model, plus some data about the world and the things happening in it, and we would like to use that data to learn about the model and complete it. That was the common paradigm through the class, and these were the topics we covered. We started with machine learning, which we treated as a tool that lets us learn the parameters of these models. Then we talked about various levels of intelligence in the course.
We started with reflex-based models, then increased the level of intelligence a little: state-based models, variable-based models, and finally logic. Let me briefly remind you of some of these topics. In machine learning, the common starting point was the idea of loss minimization. We have some training data, inputs x and outputs y, and we define some sort of loss; we looked at different types of loss functions and their properties. The idea is to minimize this training loss in the hope of generalizing to a new scenario: if I get a new input, I want to give the best output possible. In general, I would like to minimize my test error, but one way to go about that is to minimize the training loss with respect to the parameters of the model. The approach we followed was to use techniques like stochastic gradient descent. That’s the most common thing one would want to do: we have a loss function, and to optimize it we take the gradient and move in the negative direction of the gradient. These two steps, writing out the loss and minimizing it with something like stochastic gradient descent, were common across a lot of the machine learning algorithms we used throughout the course, and we applied them to a wide range of models.
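As a reminder of what that recipe looks like in practice, here is a minimal sketch of stochastic gradient descent on squared loss for a one-weight linear predictor; the data and step size below are made up for illustration:

```python
import random

# Minimal sketch of loss minimization with stochastic gradient descent:
# squared loss, one-weight linear predictor. Data and step size are
# invented for illustration.
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x, y) pairs with y = 2x

w = 0.0      # the parameter we are learning
eta = 0.05   # step size
random.seed(0)

for step in range(200):
    x, y = random.choice(train)      # one example at a time: the "stochastic" part
    grad = 2 * (w * x - y) * x       # gradient of (w*x - y)**2 with respect to w
    w -= eta * grad                  # move in the negative gradient direction

print(round(w, 2))  # → 2.0, the true slope
```

The same loop works for any differentiable loss; only the `grad` line changes.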
We applied this to all sorts of models, whether reflex-based models or state-based models; it was the same framework throughout. In the first set of lectures, we talked about reflex-based models, the simplest form of which were linear models.
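To make that concrete, here is a tiny sketch of such a linear model used as a classifier; the feature names and weights are invented for illustration, and inference is just a dot product followed by a sign:

```python
# Tiny sketch of a reflex-based linear classifier. The feature names and
# weights below are invented for illustration.
def dot(w, phi):
    # Sparse dot product between a weight vector and a feature vector.
    return sum(w.get(name, 0.0) * value for name, value in phi.items())

def predict(w, phi):
    # Inference is a single feed-forward pass: score the input, take the sign.
    return 1 if dot(w, phi) >= 0 else -1

w = {"contains_free": -2.0, "length": 0.1}                   # learned weights
assert predict(w, {"contains_free": 1, "length": 5}) == -1   # score -1.5
assert predict(w, {"length": 30}) == 1                       # score 3.0
```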

If you remember regression and classification, we had these linear models, linear classifiers, and we just wanted to learn their parameters. More complex versions of that are things like neural networks and deep neural networks, and even nearest neighbors would be an example of a reflex-based model. And what was inference? Inference was pretty easy for these: just a feed-forward run of your model, which gives you the output. We weren’t doing much hard work when it came to inference. In terms of learning, we looked at stochastic gradient descent, and at other things like alternating minimization, as ways of learning these models. Okay, so that was reflex-based models. Then, increasing the level of complexity, we talked about state-based models. The key idea I want you to remember from state-based models is the idea of a state: a state is a summary of all past actions that is sufficient to choose future actions optimally. We spent a good amount of time thinking about how to define a state, how to pick a good state, and how to do this modeling for state-based models. We looked at search problems, where we have deterministic systems; MDPs, where we’re playing against nature and have a little bit of stochasticity; and games, where some other intelligent agent is coming in and playing against us. And in terms of inference, we looked at a couple of really cool algorithms.
We talked about uniform cost search and A*, dynamic programming, value iteration, minimax. So we covered a number of different ways of intelligently looking at these models and doing inference. And when it came to learning, we looked at the structured perceptron, Q-learning, TD learning, reinforcement learning in general. Those were some of the learning algorithms we applied to state-based models, okay? Then we moved the level of complexity and intelligence a little higher and looked at variable-based models. The idea of variable-based models is that the ordering of the states doesn’t really matter; it’s just the relationships between them that matter. We defined factor graphs. If you remember the map coloring example, we defined a factor graph around it, and the graph structure captures the conditional independence between the different variables. The variables in that case were the different provinces, which we wanted to color differently, and the relationships between them were represented by factors. The two types of models we discussed in this setting were constraint satisfaction problems and Bayesian networks, Bayesian networks being the case where we have probabilistic relationships. For inference, we talked about backtracking, forward-backward, beam search, and Gibbs sampling as various ways of doing inference on these models. And for learning, we looked at things like maximum likelihood and EM, okay?
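As a concrete instance of one of the state-based inference algorithms mentioned above, here is a minimal sketch of value iteration on a tiny stay-or-quit MDP; the states, transition probabilities, and rewards are invented for illustration:

```python
# Minimal sketch of value iteration. The MDP below (states, actions,
# transition probabilities, rewards) is invented for illustration.
# transitions[s][a] = list of (next_state, probability, reward)
transitions = {
    "in": {
        "stay": [("in", 0.9, 4.0), ("end", 0.1, 4.0)],
        "quit": [("end", 1.0, 10.0)],
    },
    "end": {},  # terminal state: no actions
}
gamma = 1.0  # discount factor

V = {s: 0.0 for s in transitions}
for _ in range(100):  # repeatedly apply the Bellman optimality update
    V = {
        s: max(
            (sum(p * (r + gamma * V[s2]) for s2, p, r in acts[a])
             for a in acts),
            default=0.0,  # terminal states keep value 0
        )
        for s, acts in transitions.items()
    }

print(round(V["in"], 2))  # → 40.0: staying (4 + 0.9 * V) beats quitting (10)
```

The fixed point solves V = max(10, 4 + 0.9 V), so V = 40 and the optimal policy is to stay.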
And finally, in the last few lectures we’ve been talking about logic: pushing the level of complexity a little higher and thinking about actual logical formulas that represent intelligent things about your system. The key idea of logic is that we have these powerful formulas that represent meaningful statements about the system. We talked about propositional logic and first-order logic. We talked about model checking, which is commonly used for satisfiability, and about modus ponens and resolution as various types of inference algorithms for logic. We didn’t really talk about learning when it comes to logic, and I would say that’s still kind of an open question: how do you combine ideas from learning with ideas from logic and get the best of both worlds, the data-driven way of looking at things and the model-based way of looking at things? There are ways of combining them, but how to ensure you get the best of both is open. So, what did CS221 give us, really?
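Model checking, as described above, can be sketched in a few lines: enumerate every truth assignment and check whether the query holds in every model of the knowledge base. The knowledge base below (rain, and rain implies wet) is a made-up example, and formulas are represented as plain Python predicates:

```python
from itertools import product

# Minimal sketch of model checking for propositional logic. Formulas are
# plain Python predicates over a truth assignment; the knowledge base
# below is a made-up example.
def models(symbols):
    # Enumerate every possible truth assignment (every candidate model).
    for values in product([False, True], repeat=len(symbols)):
        yield dict(zip(symbols, values))

def entails(kb, query, symbols):
    # KB entails query iff query is true in every model where KB is true.
    return all(query(m) for m in models(symbols) if kb(m))

symbols = ["rain", "wet"]
kb = lambda m: m["rain"] and (not m["rain"] or m["wet"])  # rain, rain -> wet

assert entails(kb, lambda m: m["wet"], symbols)        # modus ponens: wet follows
assert not entails(kb, lambda m: not m["wet"], symbols)
```

This brute-force enumeration is exponential in the number of symbols, which is exactly why smarter inference rules like modus ponens and resolution matter.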
CS221 is a class that gives us a set of tools to look at the world, think about difficult problems, and pick the right models: the right way of formulating a problem and the right inference algorithm to go about solving it. We covered a lot of material, a pretty broad set of topics. The idea is to know that you have all these tools, that you can pull them out, and that you can go deep into any of them. The goal of CS221 was to give a broad view of what artificial intelligence is and what some of these tools are. But if you’re interested in going a little deeper into any of the topics we’ve discussed, there are a good number of classes you can take, and I want to briefly mention some of them, an overview of these courses. I would categorize the next classes you can take into two main categories. You can take foundational classes, where you go deeper into some of the foundational topics we talked about, or you can take application classes, where you go deeper into specific applications like natural language processing, vision, and robotics, applications we briefly covered in this class but didn’t go that deep into. If you’re interested in foundational classes, there are other AI-based classes like Probabilistic Graphical Models, CS228. If you’re interested in machine learning, there are 229 and 229T, and there is the deep learning class. If you’re more interested in the optimization side of things, there are Convex Optimization and Decision Making Under Uncertainty. If you’re interested in the logic side of things, there’s the Logic and AI course, and there’s also the big data class, if you’re interested in that. I’ll go a little deeper into some of these courses, but this is just an overview of some of the foundational next courses you might be interested in taking. All of these are also posted on the AI website, ai.stanford.edu/courses. So that’s foundations. If you’re interested more in the application side of things, there are a good number of courses around natural language processing; I’ll go a little deeper into some of these too. There are also a good number of courses around vision and around robotics, and there’s another course on General Game Playing, which would be fun to take if you’re interested in that side of things.
All right, so let me briefly mention, one slide each, some of the courses that I think would be good to take after this class. One is CS228, the probabilistic graphical models course. If you remember the variable-based models part of the lectures, this is the next course that goes deeper into that. We talked about algorithms like forward-backward and variable elimination, but if you take 228, you’ll see more general algorithms like belief propagation, variational inference, Markov chain Monte Carlo, and so on. Another thing 228 covers, in variable-based models, is learning the structure itself. The way we treated things, the model was given, the structure was given to us: we would say, well, this is an HMM, and given that it’s an HMM, I’m going to run these algorithms on it. But in 228 you’d actually think about learning the right structure to put in, and about how to think about the different variables and the relationships between them. So if you want to go deeper into that, that’s the course to take. Another interesting course, which some of you might have already taken, is the machine learning course. In this class, we treated machine learning just as a tool, right?
We had a few lectures on it and learned just enough machine learning to do the things we wanted to do in this AI course, but it’s definitely broader than what we discussed in the class. Some of the ways it is broader and more general: first, in this class we mostly talked about discrete-time, discrete-action, discrete-state systems; 229 covers a broader set of models where you actually think about continuity a little bit. We talked about linear models; 229 will talk about kernel methods, decision trees, boosting, bagging, feature selection, all sorts of algorithms and models that go beyond what we discussed in this class. We talked about k-means; 229 covers a broader set of clustering algorithms, like mixtures of Gaussians, PCA, ICA, those sorts of things. So it’s a really useful class to take if you want to learn more on the machine learning side from a practical perspective. If you’re more interested in the theoretical side of machine learning, there is another course, CS229T, Statistical Learning Theory, which actually thinks about the mathematical principles behind learning. It doesn’t necessarily cover particular algorithms; rather, it covers mathematical properties around those algorithms, things like uniform convergence. Say you have a predictor and you want to make sure that, with high probability, it has bounded error. How are you going to bound that error?
How do you bound the test error with respect to the training error and some properties of your predictor? Or how would you formalize something like the regret of your learning algorithm? Thinking about complexity, about proving bounds, convergence, regret: these are the topics discussed in Statistical Learning Theory, CS229T. So if you’re more theoretically minded, I think that would be a good course to take. Now, thinking more on the application side of things: a couple of good application areas of AI after this course are vision, NLP, and robotics. I would say those are the three main applications you might consider going deeper into if you’re interested in any of these areas. If you’re interested in vision, there are a good number of classes around it. There are a lot of interesting tasks in vision, some more solved and some more researchy, around things like object recognition, detection, and segmentation, but also things like activity recognition. If you have different frames of a video of, let’s say, a soccer game, how would you predict where the ball goes? How would you predict where a person goes in the next few frames? That’s actually a pretty difficult problem, doing activity recognition from just frames of images. So if you’re interested in some of these problems from the vision perspective, I would recommend taking the 231-type classes, okay? Robotics would be another interesting application to look at. In robotics, in general, we are interested in problems around manipulation, navigation, and grasping. The main applications you might think about are things like self-driving cars, medical robotics, and assistive robotics. The interesting thing robotics brings is that there is a physical system sitting there, so you’re putting your AI algorithms, the things you developed in this class and some things beyond it, on this physical system, a physical robot. You need to deal with continuity, with uncertainty, and with physical models that come from kinematics and control. There are a lot of interesting robotics classes: I think Intro to Robotics is offered next quarter, with Oussama Khatib teaching it. But there’s also a new robotics series that just came out, Robot Autonomy 1 and 2. As an advertisement for myself, I’m teaching Robot Autonomy 2 next quarter with Marco Pavone and Jeanette Bohg; Robot Autonomy 1 was already offered in the fall. The idea of Robot Autonomy 1 is to cover the different layers of the robotics stack, and at the end there’s this really cool project where you have a robot platform with a lidar on top of it, and you want this
robot to move around in a fake city. I think the project presentations aren’t done yet, so if you’re interested you can show up, do a round, and see how these robots move around. They basically have this fake city where the robot navigates around and does autonomous driving. You can see a picture of a bicycle in the back; the bicycle is actually moving, [LAUGHTER] so the robot needs to detect it, do obstacle avoidance, and coordinate with the other agents around it in this particular environment. So that’s Robot Autonomy 1. In Robot Autonomy 2, which is offered in winter, what we want to do is put a manipulator on top of the robot, so we’re looking at mobile manipulation: we actually have an arm, and we have this arm pick up objects and blocks, put them on top of each other, and do interactions. Two big chunks of the class, out of four, are focused on interaction with the physical world and interaction with other agents. There are interesting multi-agent, game-theoretic questions that come up when you have multiple robots trying to interact with each other, so ideas from games come up there naturally.
All right, so that was robotics. Another interesting application is natural language processing. NLP is particularly interesting because, if you think about it, the world is continuous, but the words we use are discrete, and these discrete words have continuous meanings. There’s a mismatch in the fact that we have discrete words in a continuous world, and we need to use those words to describe this continuous world. Very interesting questions and challenges arise in NLP around things like compositionality and grounding. If you’re interested in these types of tasks, there are again a lot of interesting tasks, some more solved and some more researchy; if you’re interested in any of them, I would recommend taking classes like 224. Some of these tasks are around machine translation; others, like text summarization and dialogue, are much harder. We had a couple of homeworks around this too, so if you’re interested in going deeper, I do recommend taking the NLP classes. Those are some of the foundational and more application-oriented courses I would recommend taking. I want to briefly mention two other courses that are not directly in AI but are in neighboring fields that would still be interesting to look at. One is cognitive science. Cognitive science, in general, looks at how human minds work, and it’s one of those fields that kind of grew up together with AI, right?
Cognitive science and AI started together, and although they went their separate ways, they still tend to inform each other. There’s a group of cognitive scientists working on computational cognitive science, using ideas like Bayesian modeling and probabilistic programs. There’s a course, PSYCH204, cross-listed as CS428 I think, that Noah Goodman usually teaches. I think it would be a great course to take if you’re interested in the cognitive science side of things; you’d see ideas and topics from cognitive scientists like Josh Tenenbaum, Tom Griffiths, and Noah, who work in this particular area of computational cognitive science. So that’s one. But if you think of cognitive science as the software, neuroscience, on the other hand, is the hardware of the problem. So another neighboring field you might be interested in looking at a little more deeply is neuroscience. If you think about neural networks, back when people first started looking at them, they were thinking of them as computational models of the brain. But today’s neural networks, modern neural networks, aren’t really biologically plausible in any way, right? They’re not really models of the brain, but there are still interesting connections and insights that can flow from AI to neuroscience and back. So I do recommend taking courses at the intersection of neuroscience and AI; I think Dan Yamins would be someone who works in this area, if you’re interested in looking deeper into courses around neuroscience. All right. So that was a quick summary of what we’ve discussed in the class and some next courses you might be interested in taking. Think about them if you’re interested in learning more, and just come chat with me, I’m around, if you want to talk about them. For the rest of the class, what I want to do is a quick history of AI again, and then talk about what is left: we’ve talked about all these cool algorithms and toolsets, but what is difficult, what is not solved? I want to spend a little bit of time on that. So let’s talk about the history of AI. All right.
So, the birth of AI. We talked about this during the overview lecture, the first lecture, when we came in and asked: how did AI happen? The birth of AI is usually traced to a workshop that happened in 1956, a summer school at Dartmouth, and basically everyone famous in the field showed up, including people like John McCarthy, Marvin Minsky, and Claude Shannon. The reason for this summer school was to understand the general principles of intelligence. What they really wanted to do was figure out all the features of intelligence; if they could formalize those, they could go ahead and simulate intelligence. That is what they really wanted to do. The workshop was really useful because afterwards these people went their separate ways and started doing really cool stuff, and this was the first rise of AI. We started seeing problem-solving type systems, things like Samuel’s checkers program, which was able to beat a strong amateur-level player. We started seeing other types of problem-solving programs, like theorem provers, and people started really using logic as a way of thinking about AI and about problem-solving. That was really exciting, because people at the time were thinking they had just about solved everything: the logic was there, people had all these cool programs, and they were super excited about the potential these systems could bring. Here are some quotes from people around that time. Machines will be capable, within twenty years, of doing any work a man can do: that’s what Herbert Simon said. Marvin Minsky said that within ten years the problems of artificial intelligence will be substantially solved. Claude Shannon said, I visualize a
time when we will be to robots what dogs are to humans, and I’m rooting for the machines. [LAUGHTER] None of these really happened, but one thing to notice is that these are not random people on the street; these are the fathers of the field, people who were in it, who had a lot of insight into what we can and can’t do. And it’s kind of interesting that even they had all this overwhelming optimism, and it did not pan out. So there was a lot of optimism, and then we started getting really underwhelming results. With all that optimism, the government came in and said, here’s my money, take my money, go do stuff. The problem the government was really excited about was machine translation: they wanted to take Russian text and translate it. And the outcome was something like this; the translation reads “the vodka is good, but the meat is rotten,” which is not really a good translation of the text. People started feeling that these types of problem-solving algorithms were not going to do it. So at that point the government cut funding, and this was the first winter of AI. We had the rise of AI with problem-solving after that summer school, lots of excitement; then it didn’t really work when it came to machine translation, and we had the first winter of AI. One thing I want to point out is that we are at a good place for AI right now, and

I would say AI right now is also pretty overhyped, right? So I wanted to put this quote here, from Roy Amara: “We tend to overestimate the effect of a technology in the short run, but underestimate the effect of it in the long run.” If you think about any technology we have developed, it’s always “within two years it’s going to solve everything,” and it doesn’t solve everything within two years; but if you look at what it has achieved in twenty years, it has actually achieved a lot, and we usually underestimate that. I think it’s the same with AI: we think we’re going to have autonomous cars tomorrow, or by 2020. Actually, autonomous car companies were saying we would have autonomous cars by 2020 when I first started working on this, and that’s next month. [LAUGHTER] I don’t think we’re going to have autonomous cars by 2020, but we are going to see a lot of advances; we are seeing a lot of improvements in the algorithms and systems we are developing. So in general, we should be aware of that, and we should be smart about it: surely AI is overhyped, but what can we do to actually address some of these problems? Going back to this first era of AI, this problem-solving era: why didn’t it work? The reason it didn’t work was that we had limited computation and limited information. This is actually how we started this class off, noting that a lot of AI algorithms haven’t changed that much, right?
But the thing that has changed over the years is that we have lots and lots of computation and lots and lots of data, and that’s what has really made the bigger difference. That’s one of the reasons it didn’t work back then. But even though we had these problems and this first winter of AI, there were lots of interesting contributions from that era. The LISP programming language came out around that time, garbage collection came out then, time-sharing: really interesting foundational ideas of computer science emerged during this period. And the key paradigm that we are using in this class, separating modeling and inference, also came out around the same era. The idea that we should have a declarative model and, separately, a procedural inference algorithm, that these can be pulled apart and thought of as separate things, is a huge contribution that came out around that time. All right.
So that was the first rise and fall of AI. The second rise of AI was around the ’70s and ’80s, when the knowledge-based systems, the expert systems, came out, and I would argue the reason we had this second rise was that people started thinking about AI differently. Originally, people wanted to build artificial intelligence because they were interested in intelligence, in understanding intelligence; that’s what the summer school was about. But at this point, people were not interested in intelligence; what they were interested in was building really useful systems that could do things. They didn’t care about intelligence at this point, and that’s why we had this rise of expert systems. So you think about logic, and about using domain knowledge, in things like if-then statements: if we have a premise, then we have some sort of conclusion. Building these expert systems allowed people to do a lot of cool tasks in the real world. We actually had impact around this time on things like inferring molecular structures, diagnosing blood infections, and converting customer orders into parts and specifications: actual applications in the world. People started taking each application, thinking about the expert knowledge in that particular domain, formulating it in these expert systems, and then putting an AI on top of it that does actual work. So that was really cool. The contributions of this era: first, we had real applications that actually impacted industry; and second, this idea that domain knowledge can help you cut down exponential growth was really powerful at the time. But why didn’t it work? It didn’t work, right?
This was the second rise, and then we had another winter, the second winter of AI. The second winter came because of a bunch of problems. One of the problems is that knowledge in general is not deterministic, right? We have a lot of uncertainty when we reason with knowledge, and these systems were not able to encode uncertainty the way you would want them to. In addition to that, there was just a lot of information: any of these expert systems requires a lot of manual effort to write down the rules and all of the relationships between all the different subparts of the system. An example of that is SHRDLU, one of the first natural language understanding computer programs. It was written by Terry Winograd, who is at Stanford now; I think this was when he was at MIT. He created this computer program with a blocks-world environment in it, and a person can actually interact with the program. Maybe the person says, "Pick up a big red block," and the computer says okay, because the computer understands the relationships between big and small,

and red, and the different colors, and where the blocks are placed, and what can and cannot be picked up, right? So this had a whole bunch of relationships and rules around it, and you could actually converse with this computer program, and that was really powerful. But even Terry himself, a couple of years later, made a statement about his worries that programs like SHRDLU are not going to solve the problem; they are not going to go all the way. He was saying this is kind of a dead end in AI: there are just a lot of these complex interactions, it doesn't seem feasible to write down all the rules between each one of your subparts and the other subparts, and there are no easy footholds. So at this point, people were thinking that this complexity barrier was not going to let these AI systems do big, impressive things. So then we had the second winter of AI. And then finally there is this third rise of AI that we are still on, and God knows when it is going to [LAUGHTER] come down again. This is the modern view of AI that started around the 1990s, and I would argue that the reason we had this new rise is due to two main things. One is the idea of using probability in AI. This was not a thing that existed from early on; it is actually due to the efforts of Judea Pearl, who was very adamant about using Bayesian networks in AI to model uncertainty. So finally people were able to bring ideas from probability theory in to model uncertainty. Remember, expert systems were completely deterministic; we didn't really have a way of talking about uncertainty. But Pearl's idea was: let's bring probability into this, let's actually talk about uncertainty, and let's have our models make predictions. 
And then the second reason is machine learning, right? People started inventing support vector machines to tune parameters, and from that point on we started seeing the rise of neural networks, plus the fact that we now have lots and lots of data, which helps us build better models. So given these two big new viewpoints in AI, we have started seeing all these new advances. And one point I want to make at the end of this is that AI is really a melting pot of ideas from different fields, right? Not all of these ideas are from AI proper. Bayes' rule comes from probability, least squares comes from astronomy, first-order logic from logic, maximum likelihood from statistics, and we have ideas from neuroscience, economics, optimization, the theory of algorithms, and control theory; think of value iteration, which came from Bellman in the field of control theory. So artificial intelligence really brings all these different ideas from different fields together to solve the AI problem, and in general I think it is a good idea to be mindful of that and to be open to it, because new insights and ideas really come from having this broader view of things, and the boundaries you put between different fields are really superficial and don't need to be there. All right. So that was the history of AI: rise, fall, rise, fall, and now we are on this last rise. So let's think a little bit about what we have achieved, what cool things have happened in the past couple of years, and then what can go wrong and what we should worry about in the meantime. Okay. 
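Since value iteration came up as one of those borrowed ideas (Bellman, from control theory), here is a minimal refresher sketch of it on a tiny MDP; the states, transitions, and rewards are all invented purely for illustration:

```python
# Value iteration on a tiny made-up 3-state MDP (states 0, 1, 2; state 2 is terminal).
# T[s][a] is a list of (next_state, probability, reward) triples; all numbers invented.
T = {
    0: {"stay": [(0, 1.0, 0.0)], "go": [(1, 0.8, 1.0), (0, 0.2, 0.0)]},
    1: {"stay": [(1, 1.0, 0.0)], "go": [(2, 1.0, 10.0)]},
}
gamma = 0.9  # discount factor

V = {0: 0.0, 1: 0.0, 2: 0.0}  # terminal state 2 keeps value 0
for _ in range(100):  # repeat the Bellman optimality update until (near) convergence
    new_V = dict(V)
    for s, actions in T.items():
        new_V[s] = max(
            sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)
            for outcomes in actions.values()
        )
    V = new_V

print(V)  # V[1] converges to 10.0: go straight to the terminal reward
```

Each sweep applies the update V(s) = max over actions of the expected reward plus discounted next value, exactly the Bellman recurrence from the MDP lectures.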
All right. So I think I have argued enough that AI is everywhere: AI is being used in consumer services, in advertising, in healthcare, transportation, manufacturing. And AI is going to make decisions now, right? Because it has shown all these advances, we are using AI these days to make decisions for us: decisions about education, credit, employment, advertising, healthcare, all of these different applications. So if AI is making decisions, then we should be really careful about how it is making those decisions, and we should think about all the possible things that could go wrong and understand the system fully before letting it make decisions for us. So what are some of the advances? One of the huge advances we have seen in recent years is Google Neural Machine Translation; it was a big advance for machine translation. The idea is that you have a bunch of different languages: you can have a way of translating, say, from Korean to English and English to Korean, with a lot of data for that, and that is great. Then maybe you also have Japanese to English and English to Japanese, and you train on that; that is a lot of data too, and that is all great. But if you put all of that data together, into the same melting pot, then what you can do is actually go from Korean to Japanese without having any data that goes directly from Korean to Japanese. And that is exciting, right? You had no data for that pair, but by putting all of these together you can

actually make a lot of advances in language translation. This came out around 2016, there was lots of excitement, and language translation just became so much better after that. There are still problems, though. Here is one of them: the problem of bias. Say you are starting from a language like Hungarian that doesn't mark gender, and then you translate into English, a language that does. This is the translation you are going to get: "she's a nurse" and "he is a scientist," "she's a baker," and of course "he's a CEO," right? If the algorithm were neutral, there would be no reason for it to pick these particular genders. But the reason it picks them up is that the algorithm is trained on data, and our data is biased. If our data is biased, then the algorithm, which has learned to pick up patterns, is going to pick up this bias and sometimes even reinforce it. So we are going to see all sorts of these behaviors; I wouldn't say weird, it is biased behavior, but we should be aware of this if we are building these types of systems. And beyond bias, which I can at least explain, you might get behaviors that are even harder to explain. You might have a text that is just "dog, dog, dog, dog, dog," and it gets translated to something else entirely that is kind of crazy. So understanding what goes on inside these closed black-box systems is often challenging, and there is a lot of research around trying to better understand them, give them transparency, and figure out what goes on. So that was language, that was translation. Another 
example is image classification. Image classification has become so much better over the years; around 2015 it hit human performance, so we have image classifiers that do better than humans on these benchmarks. And that is amazing, right? That is really exciting, because perception is a difficult problem, and if I can do image classification, then I can use these systems on real-world systems, like my phone or my autonomous car. But again, there are a lot of issues around this. One of the issues, which we actually discussed in the first lecture, is the idea of adversarial examples. I can take AlexNet, a system that does image classification, and AlexNet is going to classify the images on the left perfectly fine: that is a school bus, that is a temple; it is going to classify them correctly. But then what I can do is add some noise to them, and when you add this noise to the picture, you get this third picture, which still looks like a school bus to me; I can't tell the difference between the first and third pictures. But what happens is that AlexNet predicts ostrich for all of the pictures on the right. So that is not great, right? Having these adversarial examples is not great, because the system is not robust: your AI-based image classifier is not very robust when it comes to adding these sorts of adversarial perturbations. And after this work came out, people started writing all sorts of papers about how to create adversarial examples, how to be robust against a particular adversarial attack, then breaking that again, creating more robustness, and a lot of back and forth. One of my favorite papers in this area actually came out this year; it is from Shamir and others. What they have shown is, for a 
specific type of neural network with ReLUs, you can always make the system classify the input, in this case a digit, as something else. Let me give a concrete example. This is the MNIST dataset; I have the digits 0 through 9 in it, so 10 classes, right? And what Shamir and others have shown is that you can take this seven, and since you have 10 classes, you need at most 11 pixels. Pick 11 pixels (they pick the 11 pixels carefully, so it is not just any 11 pixels), change them as much as their algorithm tells you, and then the seven is going to be classified as a zero. You can make this seven be classified as any of the digits 0 through 9 just by picking the right 11 pixels to modify, and they tell you how much to modify them. Which is kind of crazy: give me anything, and I will create an adversarial example for you that is misclassified as something else, and I only need 11 pixels because there are 10 classes here. Now, there are a bunch of assumptions I haven't discussed. One of them is that the way you modify these pixels is unbounded in this picture: the greens and reds are just very high and very low values.

So the values are not actually between 0 and 255; they are numbers greater than 255 and less than 0. But they have also shown that if you are allowed more than 11 pixels, say 20 pixels, you can actually fit the changes between 0 and 255 and make it a realistic figure. Anyway, there is lots of work around this, lots of exciting theoretical and practical work on adversarial examples for images. But what are the implications of this? Why are we so scared of it? Because these systems are going to run on our phones doing image recognition, and on our cars doing recognition of other vehicles, and they can be attacked. A group at Berkeley, Dawn Song's group, took stop signs and put stickers on them; the stickers are at exactly the places they wanted them to be, and the stop signs are now classified as, say, a speed limit sign, which is not what you want your autonomous car to detect. Or here is another example, another work where you take pictures of people and put these funny glasses on them, and with the glasses on, the pictures are classified as a celebrity's pictures. So you can attack these systems, not easily, but systematically [LAUGHTER], and that can affect the security of your vehicle or your image recognition system. All right. Another example that is pretty challenging is reading comprehension. What is reading comprehension? If you remember your SAT or GRE type exams: you have a text, a lot of text, and you have to read that text and answer questions. So you would have a question like this: the number of new Huguenot colonists declined after what year? 
So this is the thing you have to answer. Google put out this system BERT, which is really amazing; it can do this reading comprehension, and BERT can answer this question perfectly. It is going to say 1700, and that is great. But what people have shown is that you can just add an extra sentence at the end of this text that has nothing to do with the question; it has the word "year" in it, but it has nothing to do with this particular question that is asked. And now BERT is going to respond 1675. So again, you can easily trick these systems, and the way they are tricked is just not the same way that humans are tricked, and that feels strange to people. It is kind of expected, but it is something we are dealing with these days. All right. So another example; I am basically going to talk about a bunch of examples throughout the rest of the lecture. Another example I wanted to briefly talk about is this idea of optimizing for clicks. Is that a good thing? Is that a thing we should be doing? Sometimes we know exactly what the objective, the reward function we are writing for our system, should be: we want to do machine translation, we know exactly what we want, and it is very clear. But sometimes it is actually not clear what we should be optimizing. Facebook, say, wants to make money; should they optimize for clicks? Is that an ethical reward function to put in, and what could be some of the effects of optimizing for clicks? Let's say that I have a reinforcement learning algorithm; I am making this up. 
Let's say I have a reinforcement learning algorithm that wants to optimize for clicks, and I have my own Facebook account, so it is optimizing clicks from Dorsa, right? This reinforcement learning algorithm can learn that, well, if I show outrageous articles to Dorsa, Dorsa is more likely to click on them, and I am going to get more reward because I am optimizing for clicks. That is all expected. But another thing the reinforcement learning algorithm can figure out, by itself, is that if I show outrageous articles to Dorsa, Dorsa is going to become more and more outraged, and then I am going to get more clicks when I show more such articles. And that is kind of remarkable, because these systems are not interacting with a closed, controlled world; they are interacting with other systems, with humans, and we are also changing, we are also adapting. This RL algorithm, by itself, could figure out how to change me to like more outrageous things, and then we end up in the situation we are in right now, with very polarized views, right? Because you are optimizing for clicks. So it is quite interesting to think about what objectives we should be optimizing and what world we are dealing with. We are not always in a Pacman world where we control everything; usually these systems run in a real society where there are people affected by them, whose responses are going to change, and the changes in their responses are going to affect things even more. So it is interesting to think about these feedback loops. And speaking of humans, another question that usually comes up in robotics, and in AI generally, is: what is it that humans actually want? 
Even in the case of robotics, it is a big problem. I have a robot arm, and I want my robot arm to pick up this object.

That's all I want, right? This is the thing that I, as a human, want: I want the robot arm to pick up this mobile phone. Back in the day, this was called good engineering: good engineers would write down the correct reward function, the correct objective, and the robot arm would go pick up the object and everything would be great. The problem is that this doesn't always work; it is really hard to write the correct reward function and get the robot to do exactly that. Because of that, people these days are more interested in approaches like imitation learning or preference-based learning, where you try to learn from how humans do things, how a human would do the task, as opposed to a human sitting down and specifying, "this is the object I want you to pick up," because the robot might end up doing very strange things. An example of that which commonly comes up is the vacuum cleaner example. Say you have a robot vacuum cleaner that is supposed to clean your house, and your objective for the vacuum cleaner is to suck up dirt. That is all it needs to do, okay? So you write down your objective, and everything is great. But one way the vacuum cleaner could suck up dirt is to go to a spot, suck up dirt, dump it back out, suck it up again, dump it out, and just keep doing that, right? Obviously, that is not what you wanted your vacuum cleaner to do, right? 
That wasn't the thing you were thinking of, but the objective "go suck up dirt" could end up producing that behavior. Another behavior it could end up with: the vacuum cleaner could just break its own dirt sensors, so now it doesn't sense dirt. Now it is "good," because there is no dirt around us, because we can't see it; I am going to close my eyes so I can't see the dirt, so I am not going to suck up anything. All of these are examples of reward hacking: if you just write down the reward function you think the robot should optimize, it is not necessarily going to work, and figuring out what objectives you should optimize is actually a really difficult problem. This is something I am very interested in; my group focuses on it a lot. Another work that has recently come out on this is the new book by Stuart Russell, Human Compatible. What Stuart is arguing, basically, is that there is a mismatch between what humans actually want, the reward function in their heads, and what the AI system or the robot thinks the human wants. Those two are not always the same thing, and that can cause a lot of issues. Interesting book; take a look. All right, what else can go wrong? 
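Before moving on, the vacuum example can be made concrete with a toy simulation. Everything here, the dynamics, the amount of dirt, the two policies, is entirely made up, just to show how a misspecified "dirt sucked up" reward can be gamed:

```python
# Toy illustration of reward hacking (all dynamics invented for this sketch).
# Reward = units of dirt sucked up per step; the room starts with 5 units of dirt.

def honest_policy(steps=20):
    """Clean each unit of dirt once; after the room is clean, earn nothing more."""
    dirt, reward = 5, 0
    for _ in range(steps):
        if dirt > 0:
            dirt -= 1
            reward += 1
    return reward

def hacking_policy(steps=20):
    """Suck up one unit, dump it back out, and repeat forever."""
    reward = 0
    for t in range(steps):
        if t % 2 == 0:
            reward += 1  # suck up the same unit of dirt again
        # odd steps: dump the dirt back on the floor (no reward, but refills the supply)
    return reward

print(honest_policy(), hacking_policy())  # 5 vs 10: the hack beats actually cleaning
```

Under this naive reward, the policy that never finishes cleaning scores twice as high as the one that does, which is exactly the mismatch between the written objective and what the designer actually wanted.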
So, generating fake content. That came out a couple of years back: you can create videos or images that look exactly like, in this case, Obama, and put fake content in them. And that again raises an ethical question, right? Just because you can build it, should you? We can build it; we have the systems to create fake content. It sounds fun, but should we do it just because we can? Another place this question comes up, and I do encourage you in general to think in your future about whether, when you can build something, you should build it, is autonomous weapons systems. Thinking about the military: we could have autonomous weapons these days, right? We can have systems that automatically detect an enemy and automatically do the job, do the task. So should we do it? Should we have autonomous weapons systems, or does there need to be a person in the loop? And just thinking it through: say we decide we never want autonomous weapons systems, and we always want a person in the loop. Well, why? What is it about the person that we want in the loop? That tells us there is something about the person, maybe empathy, maybe something people know or have, that the autonomous system doesn't have yet, and understanding that, I think, is by itself a very interesting problem. There is a whole debate around autonomous weapons systems: should we have them? If we don't have them, what if other countries have them? How do we go about it? 
Should we put a moratorium on it? There are lots of debates around these types of systems, so in general I encourage you to think about these ethical aspects of building AI systems. All right, next up: fairness. [NOISE] Fairness is a big problem; I think a lot of you know this already. We might have a classifier that perfectly separates the majority of our dataset, like the picture on the left, and then we might have some data points from a minority group, and the classifier does exactly the opposite thing on the minority group. So if you put all this data together,

then you are probably going to get a classifier that looks like the first one, and it is just not going to work on the minority data. And that is a big problem, especially in applications like healthcare: you might have different populations, and a drug might act very differently in different populations. So the question is, how should we address these fairness questions? One way to go about it is to think about our errors. You might have two classifiers and both of them give you 5% error, but one of them gives you 5% random error and the other gives you 5% systematic error. It is pretty important to think about whether you are getting systematic or random error, and what type of error you are getting on which population; that can address some of these questions around fairness. There is a lot of work on fairness these days; there is a conference around fairness, accountability, and transparency. This is work by Moritz Hardt, so if you are interested, take a look at some of the work from Moritz's group. Another example of fairness, which I think we talked about in the overview lecture, is criminal risk assessment. Northpointe is a company that put out a system called COMPAS. What COMPAS does is predict the risk of a criminal re-offending, and it gives a score from 1 to 10. They put out this system, and it was actually being used. And what happened was that ProPublica, which is a non-profit, did a study and showed that, given that an individual did not re-offend, African Americans were twice as likely to be wrongly classified as five or above. So that just seemed not fair, and ProPublica put out this 
article saying, well, the system is not fair; why are we using it? It doesn't satisfy this fairness criterion. And then Northpointe did further studies, and they said, no, our system is fair because we are looking at this definition of fairness: given a risk score of seven, 60% of whites re-offended and 60% of blacks re-offended. We want to make sure we get the same percentage for each group, that is our fairness property, and we do satisfy it. And these two fairness definitions led a group of researchers, from Stanford, Cornell, a bunch of different places, to start thinking about definitions of fairness. What they have actually shown is that these two definitions of fairness cannot be satisfied at the same time; they always go against each other, and you can't have both. So if that is the case, then what is the right definition of fairness that we should use? If we can't have both at the same time, then how do we make sure we can use this system, or should we ever use these systems at all? So there are lots of interesting questions about formalizing fairness. Omer Reingold, here in the CS department, works a lot on fairness from the algorithmic side of things, so if you are interested in that, you can take Omer's classes and learn about it. And going back to this idea of whether algorithms are neutral: when you talk to people who haven't necessarily taken algorithms or AI classes, they usually think, well, yeah, right? 
Algorithms have to be neutral; they are doing math, so they must be neutral. But as you have seen already, they are not really neutral, because by design we want our algorithms to pick up patterns. That is what they are good at: picking up patterns, and the biases and all the strange things we see are in our data. There are patterns in our data, and these algorithms will pick them up and even reinforce them at times. That is why we see bias in our algorithms and all of these issues around fairness and security. And another problem that comes up is the feedback loop I was talking about earlier: if algorithms are picking up patterns, and those patterns include biases, they are putting those biases back out into a world where there are humans, and those humans observe those biases, can become even more biased, and give back even more biased data. This could be a vicious feedback loop that goes on forever. So again, we have to be really careful about what we put out and how it affects the broader society. Next up is privacy; I have a couple more things around these issues, and after that I will wrap up. Another issue in general is privacy, right? We are using a lot of data in a lot of our algorithms, and some of it can be sensitive data that we do not want to reveal. So one way to address that is, instead of putting out the actual data, putting out the right statistics that give us the right information. For example, you might want to compute average statistics.

If you are asking whether someone has cancer or not, instead of getting the yes-no answer for each person, you might just need the average statistics, and that would be enough for you. So in general, when you are collecting data, you can randomize or otherwise change the data so that you can still recover the average statistics, as one way of protecting privacy. Another way of protecting privacy is randomized responses. You might have a question like: do you have a sibling? That is a question you ask a user, and the user might not want to reveal exactly whether they have a sibling or not. So one way of responding is that the user flips two coins; if both come up heads, they answer yes or no at random, and otherwise they answer truthfully. Based on the answer you get from a particular user, you cannot tell whether that particular user has a sibling or not, but you can still compute the true probability over the population: three-fourths of the time they tell you the truth, and one-fourth of the time they answer randomly, so from the observed probability you can recover the true probability, and that is probably enough for the type of data you need to deal with. So randomized responses in general are one way of going about some of these privacy issues. Another issue that comes up is causality. This relates a bit to the variable-based models: you might want to look at the effect of something, say the effect of a treatment on survival, and this is your data: for untreated patients, 80% of them survived, and for treated patients, 30% survived. 
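Stepping back to the randomized-response scheme for a moment, here is a minimal sketch of that two-coin protocol; the question, population size, and true rate are all made up for illustration:

```python
import random

random.seed(0)  # reproducible run; the population and its true rate are invented

TRUE_RATE = 0.30  # fraction of people who really have a sibling (made up)
N = 100_000

def randomized_response(truth: bool) -> bool:
    """Flip two coins: if both come up heads (prob 1/4), answer uniformly at
    random; otherwise answer truthfully. Any single answer is deniable."""
    if random.random() < 0.25:          # both coins came up heads
        return random.random() < 0.5    # random yes/no
    return truth                        # truthful answer

answers = [randomized_response(random.random() < TRUE_RATE) for _ in range(N)]
observed = sum(answers) / N

# Invert E[observed] = 3/4 * true + 1/4 * 1/2 to recover the true rate:
estimate = (observed - 1/8) / (3/4)
print(round(estimate, 2))  # comes out close to the true rate of 0.30
```

No individual answer reveals the truth about that person, yet the aggregate estimate recovers the population rate, which is exactly the trade the lecture describes.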
So this is your data, and the question is: does treatment help or not? How many of you think treatment helps? Think carefully. [LAUGHTER] The answer is actually: who knows? [LAUGHTER] Because, if you think about it, sick people are probably more likely to undergo treatment, right? And if sick people are more likely to undergo treatment, then you can't take this data at face value and say treatment helped or didn't help, because there is this extra causal structure you didn't consider: the people who took the treatment were sick to begin with, so you have to account for how the sickness affects the rate of survival. And then finally, the last thing I want to briefly talk about is this idea of interpretability versus accuracy. You have seen this rise of neural networks in a lot of applications, and most of them are not safety-critical applications; we haven't really seen things like neural networks in safety-critical applications. I guess you have started seeing it in autonomous cars, but think of airplanes, or other types of safety-critical systems, healthcare systems. One question that always comes up is: should we use these systems in safety-critical settings? 
Because as we’re using them, we’re going to lose interpretability. So there’s this work by Mykel Kochenderfer’s group, where they’re basically looking at air traffic control, at the collision-avoidance systems that run on aircraft. Previously, it was basically a bunch of rules that the system needed to follow, but it was interpretable: you could actually read it and understand what it does, and the aircraft systems would use that. But that group has been working on new systems called ACAS X and ACAS Xu, where they’re basically trying to replace those rules with, let’s say, a POMDP, a partially observable Markov decision process, that does the same job but doesn’t necessarily have the same level of interpretability. But it’s pretty accurate, and you can even prove that it’s accurate. It’s not even a neural network, right? It is a thing that you can actually enumerate. And the question is, what are we willing to put on our safety-critical systems? If you lose transparency, if you lose interpretability, are we still willing to deploy systems that we think are statistically more accurate? Um, and in general, how can we increase the interpretability and transparency of some of these systems that we are building?
Because that is useful when we think about these systems. So AI is important; I think I’ve convinced you guys that AI is important. And all these different governments also think that AI is important. In 2016, the White House put out a report about some of the directions we should invest in, and a lot of them were around AI: making long-term investments in AI research, thinking about human-AI collaboration, thinking about the ethical, legal, and societal implications of AI, and the safety and security of AI systems. So all of these things that we have been discussing are really challenging problems, and everyone’s excited about them, and everyone wants to put a lot of energy and time and money into them.

And this document said: big data analytics have the potential to eclipse long-standing civil rights protections in how personal information is used in all sorts of applications, like housing, credit, employment, and so on. And Americans’ relationships with data should expand, not diminish, their opportunities. And some of the things we have discussed so far, like biases, fairness, and safety, are not necessarily satisfying that last sentence, right? If you’re building these systems, you should actually be really careful about some of these implications. And as I was saying earlier, there is a new conference around this, the FAT/ML conference, around fairness, accountability, and transparency. And one of the guiding principles of this new community being built around AI is that we’ve got to remember there’s always a human behind these algorithms. There’s always a human ultimately responsible for what is going to happen, and you can’t just say, well, the algorithm did it, right? In general, that’s just the wrong way to go, because there was a human designer, one of you guys, one of us, right, who wrote these algorithms. And I do really want you to think about some of these principles as you go further in your career and you think about building these sorts of AI algorithms. And just to end on a more positive note: there’s enormous potential for positive impact from AI systems, so please use it responsibly. With that, I want to thank you all for this exciting quarter, and please fill out the surveys on Axess. Thanks. [APPLAUSE]