
It is rare these days that the subject of algorithmic fairness or privacy is not front-page news. Today we are going to speak to two leading lights in the area; they will help us understand what the state of the art is and what the state of the art will be going forward. With that, I think we will welcome Professor Michael Kearns first to the stage. Is that right? Great. Michael and Aaron, welcome to the stage. [applause]

Ok, good morning. Thanks to everyone for coming. My name is Michael Kearns, and with my close friend and colleague Aaron Roth I have coauthored a book, a general-audience book called The Ethical Algorithm, whose subtitle is The Science of Socially Aware Algorithm Design. What we want to do for roughly half an hour or so is just take you at a high level through some of the major themes of the book, and then we'll open it up, as Jeff said, to Q&A. I think many, many people, and certainly this audience, are well aware that in the past decade or so machine learning has gone from a relatively obscure corner of AI to mainstream news, and I would characterize the first half of this decade as the glory period, when all the news reports were positive and we were hearing about all these amazing advances in areas like deep learning, which has applications in speech recognition, image processing, image categorization and many, many other areas. We all enjoyed the great benefits of this technology and the advances that were made, but the last few years or so have been more of a buzzkill, and there have been many, many articles written, and now even some popular books, on essentially the collateral damage that can be caused by algorithmic decision-making, especially when powered by AI and machine learning. Here are a few of those books. Weapons of Math Destruction did a very good job of making very real and visceral and personal the ways in which algorithmic decision-making can result in discriminatory predictions, like gender discrimination, racial discrimination or the like. Data and Goliath is a well-known book about the fact that we've essentially become something akin to a commercial surveillance state, and the breaches of privacy and trust and security that accompany that. Aaron and I read these books and like these books very much, and many others like them, but one of the things we found lacking, which was much of the motivation for writing our own, was that when you get to the solutions section of these books, what should we do about these problems, the solutions suggested are what I would call traditional ones. They basically say we need better laws, better regulations, watchdog groups. We really need to keep an eye on this stuff. We agree with all of that, but as computer scientists and machine learning researchers working directly in the field, we also know there has been a movement in the past five to ten years to design algorithms that are better in the first place. Rather than waiting after the fact for some predictive model to exhibit racial discrimination in criminal sentencing, you could think about making the algorithm better in the first place. There is now a fairly large scientific community in the machine learning research area and in adjacent areas that is trying to do exactly that. Our book, you can think of it as a popular science book that tries to explain to the reader how you would go about trying to encode and embed social norms that we care about directly into algorithms themselves. A couple of preparatory remarks.
We got a review on an early draft of the book that basically said, I think your title is a contradiction or possibly even an oxymoron. What do you mean, an ethical algorithm? How can an algorithm be any more ethical than a hammer? This reviewer pointed out that an algorithm, like a hammer, is a tool; it is a human artifact designed for particular purposes. While it is possible to make an unethical use of a hammer, for instance I might decide to hit you on the hand with it, nobody would make the mistake of ascribing unethical behavior or immoral agency to the hammer itself. If I hit you on the hand with a hammer, you would blame me and not the hammer, and you and I would both know that real harm had come to you because of my hitting you on the hand with a hammer. This reviewer said, I don't see why the same arguments don't apply to algorithms. We thought about this and decided we disagreed. We think algorithms are different, even though they are indeed just tools, human artifacts designed for particular purposes. With algorithms it is very difficult to predict outcomes and also difficult to ascribe blame. Part of the reason is that algorithmic decision-making, when powered by AI and machine learning, has a pipeline. Let me quickly review what that pipeline is. You usually start off with very, very complicated data, complicated in the sense of being high-dimensional, having many variables, and it might have many, many rows. Think of a medical database of individual citizens' medical records. We may not understand this data in detail and may not understand where it came from in the first place. It may have been gathered from many different sources. The usual pipeline or methodology of machine learning is to take that data and turn it into some sort of optimization problem. We have some objective landscape, or space of models, and we want to find the model that does well on the data in front of us. Usually that objective is primarily, or often even exclusively, concerned with predictive accuracy or some notion of utility or profit. There is nothing more natural in the world, if you are a machine learning researcher or practitioner, than to take a data set and say, let's find the neural network that on this data makes the fewest mistakes in deciding who to give a loan, for example. You do that, and what results is some, again, perhaps complicated high-dimensional model. This is classic clip art from the internet for deep learning: a neural network with many, many layers between the inputs and outputs and lots of transformations of the data and variables happening. So, a couple of points about this pipeline. It's diffuse. If something goes wrong in this pipeline, it might not be entirely easy to pin down the blame. Was it the data, the objective function, the optimization procedure that produced the neural network, or was it the neural network itself? Even worse than that, if this algorithm, this predictive model we use at the end, causes real harm to somebody, if you are wrongly denied a loan because the neural network said you should be denied the loan, we may not, when this is happening at scale behind the scenes, even be aware of it the way I would be aware of hitting you on the hand with a hammer. Also, we give algorithms a great deal of autonomy. To hit you on the hand with a hammer I had to pick the thing up and hit you. These days algorithms are often running autonomously, without any human intervention, so we may not realize the harms they cause unless we know to explicitly go look for them.
Our book is about how to make things better, not through regulation and laws and the like, but by revisiting this pipeline and modifying it in ways that give us the social norms we care about, like privacy, fairness, accountability, et cetera. One of the interesting and important things about this endeavor is that even though many, many scholarly communities have thought about these social norms before us, for instance philosophers have been thinking about fairness since time immemorial, and lots of people have thought about things like privacy, they have never had to think about these things in such a precise way that you could actually write them into a computer program or an algorithm. Sometimes just the act of forcing yourself to be that precise can reveal flaws in your intuitions about a concept that you were not going to discover any other way, and we will give concrete examples of that during the presentation. The whirlwind tour of the book is a series of discussions about different social norms, which I've written down here, and what the science looks like of going in and giving a precise definition to these things, a mathematical definition, then encoding that definition in an algorithm, and importantly, what the consequences of doing that are, in particular the tradeoffs. In general, if I want an algorithm that is fair or private, that might come at the cost of less accuracy, for example, and we can talk about this as we go. You'll notice I've written the different social norms in increasing shades of gray, and what that roughly represents is our subjective view of how mature the science in each one of these areas is. In particular, we think that when it comes to privacy, this is the field that's in relative terms the most mature; there is what we think is the right definition of data privacy, and quite a bit is known about how to embed that definition in powerful algorithms, including machine learning. Fairness, which is a little bit lighter, is a more recent, more nascent field, but it's off to a very good start. And things like accountability, interpretability or even morality are in lighter shades because in these cases we feel like there are not yet good technical definitions, so it's hard to even get started on encoding these things. I promise you there is a bottom bullet here which is entirely in white, so you can't even see it. What we're going to do with the rest of our time is talk about privacy and fairness, which cover roughly the first half of the book, and then we will spend a few minutes telling you about the sort of game-theoretic twist the book takes about midway through. So I'm going to turn it over to Aaron for a bit now.

Thanks. So as Michael mentioned, privacy is by far the most well-developed of these fields we talk about, and I want to spend a few minutes just giving you a brief history of the study of data privacy, which is about 20 years old, and in that process try to go through a case study in how to think precisely about definitions. It used to be, maybe 20 or 25 years ago, that when people talked about releasing data sets in a way that was privacy preserving, what they had in mind was an attempt at anonymization. I would have some data set of individuals, people's records; they might have people's names on them, and if I wanted to release those records I would try to anonymize them by removing the names and, maybe if I was careful, other unique identifiers like Social Security numbers. I would keep things like age or zip code, features about people that on their own were not enough to uniquely identify anyone.
So in 1997 the state of Massachusetts decided to release a data set that would be useful for medical researchers. Medical data sets are hard for researchers to get their hands on because of privacy concerns, and the state of Massachusetts had an enormous data set of medical records. They released the data set in a way that was anonymized: there were no names, no Social Security numbers, but there were ages, zip codes, genders. It turns out that although age is not enough to uniquely identify you, and zip code is not enough to uniquely identify you, in combination they can be. There was a student who was at MIT at the time, now a professor at Harvard, who figured this out. In particular, she figured out that you could cross-reference the supposedly anonymized data set with voter registration records, which also had demographic information like zip code and age and gender, but together with names. She cross-referenced this anonymized medical data set and was able, with this triple of identifiers, to identify the medical records of Bill Weld, who was governor of Massachusetts at the time. She sent his records to his desk to make a point. This was a big deal in the study of data privacy, and people tried to fix the problem by basically applying little bandaids, trying to most directly fix whatever the most recent attack was. So, for example, people thought, all right, if it turns out that combinations of zip code and gender and age uniquely identify someone in the records, why don't we try coarsening that information. Instead of reporting age exactly, maybe we only report it up to a ten-year range; maybe we report zip code only up to three digits; and we will do this in a way that makes sure no combination of attributes in the table we release corresponds to just one person. So, for example, if I know that my 56-year-old neighbor, who is a woman, attended some hospital, maybe the Hospital of the University of Pennsylvania, and they released an anonymized data set in this way, then I have the guarantee that I cannot connect the attributes I know about my neighbor to just one record. I can at best narrow them down to two records. For a little while people tried doing this. But if you think about it, if you look at the data set you might already begin to realize this isn't giving us quite what we mean by privacy, because although, if I know my 56-year-old female neighbor attended the Hospital of the University of Pennsylvania, I can't figure out exactly what her diagnosis is, since it corresponds to two records, I can figure out that she has either HIV or colitis, which might already be something she didn't want me to know. And if both of these data sets have been released, I can just cross-reference them, and there is a unique record, only one record that could possibly correspond to my neighbor, and all of a sudden I've got her diagnosis. So the overall problem here is the same as it was when we just tried removing names. Attempts at privacy like this might work if the data set I was releasing were the only thing out there, but that's never the case. The problem is that small amounts of idiosyncratic information are enough to identify you in ways I can uncover if I cross-reference the data sets that have been released with other data sets that are out there. So people tried patching this up as well, but for a long time the history of data privacy was a cat and mouse game, where researchers would try to do heuristic things, patching up whatever vulnerability led to the most recent attack.
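To make the kind of re-identification described here concrete, below is a minimal sketch of a linkage attack in Python. The records, the column names and the pandas-based join are purely illustrative assumptions, not the actual Massachusetts hospital or voter files:

```python
import pandas as pd

# Hypothetical "anonymized" hospital records: names removed, but
# quasi-identifiers (zip code, age, gender) left in.
hospital = pd.DataFrame([
    {"zip": "02138", "age": 57, "gender": "M", "diagnosis": "heart disease"},
    {"zip": "19104", "age": 56, "gender": "F", "diagnosis": "colitis"},
])

# Hypothetical public voter-registration file: the same quasi-identifiers,
# but this time attached to names.
voters = pd.DataFrame([
    {"name": "Alice Example", "zip": "19104", "age": 56, "gender": "F"},
    {"name": "Bob Example",   "zip": "02138", "age": 57, "gender": "M"},
])

# The linkage attack is nothing more than a join on the quasi-identifiers:
# anyone whose combination of zip, age and gender is unique is re-identified.
reidentified = voters.merge(hospital, on=["zip", "age", "gender"])
print(reidentified[["name", "diagnosis"]])
```

The same join is why coarsening only helps until a second overlapping release appears: the attacker simply joins on whatever attributes the two tables share.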
And attackers kept finding new, clever things to do, and this was a losing game for privacy researchers. Part of the problem is that we were trying to do things we hoped were private without ever defining what we meant by privacy. That was an approach that was too weak. Let me now, in an attempt to think about what privacy might mean, talk about an approach that is too strong, and then we will find the right answer. So you might say, okay, let's think about what privacy should mean. Maybe, if I'm going to use data sets to conduct, for example, medical studies, what I want is that nobody should be able to learn anything about you as a particular individual that they couldn't have learned about you had the study not been conducted. That would be a strong notion of privacy if we could promise it. To make this more concrete, consider what has come to be known as the British doctors study, a study carried out by Doll and Hill in the 1950s, which was the first piece of evidence that smoking and lung cancer had a strong association. It's called the British doctors study because every doctor in the UK was invited to participate in the study, and two thirds of them actually did. Two thirds of doctors in the UK agreed to have their records included as part of the study, and very quickly it became apparent there was a strong association between smoking and lung cancer. So imagine that you're one of the doctors who participated in the study. Say you're a smoker, and this is the 1950s, so you definitely made no attempt to hide the fact that you're a smoker; you'd probably be smoking during this presentation, and everyone knows you're a smoker. But when this study was published, all of a sudden everyone knows something else about you that they didn't know before. In particular, they know you are at an increased risk for lung cancer, because all of a sudden we have learned this new fact about the world, that smoking and lung cancer are correlated. In fact, if you were in the US this might have caused you concrete harm at the time, in the sense that your health insurance rates might have gone up, so this could have caused you concrete, quantifiable harm. So if we were going to say that what privacy means is that nothing new should be learned about you as a result of conducting a study, we would have to call the British doctors study a violation of your privacy. But there are a couple of things wrong with that. First of all, observe that the story could have played out in exactly the same way even if you were one of the doctors who decided not to have your data included in the study. The supposed violation of your privacy in this case, the fact that I learned you are at higher risk of lung cancer, wasn't something I learned from your data in particular. I already knew you were a smoker before the study was carried out. The violation of privacy would have to be attributed to the fact about the world that I learned, that smoking and lung cancer are correlated, and that wasn't your secret to keep. And the way we know it wasn't your secret to keep is that I could have discovered it without your data. I could have discovered it from any sufficiently large sample. If we were going to call things like that a violation of privacy, then we couldn't do any data analysis at all, because there are always going to be correlations between things that are publicly observable about you and things you didn't want people to know, and I couldn't uncover any correlation in the data at all without having a privacy violation of this type.
So this was an attempt at thinking about what privacy should mean, giving it a semantics, but it was one that was too strong. A real breakthrough came in 2006, when a team of mathematical computer scientists had the idea for what is now called differential privacy. The goal of differential privacy is to find something similar to what we wanted to promise in the British doctors study, but with a slight twist. So again, think about two possible worlds, but now don't think about the world in which the study is carried out and the world in which the study is not carried out; instead think about the world in which the study is carried out and an alternative world in which the study is still carried out but without your data. Everything is the same except that your data was removed from the data set. The idea is that we want to assert that in this ideal world, where your data wasn't used at all, there was no privacy violation for you, because we didn't even look at your data. And of course in the real world your data was used, but if there is no way for me to tell, substantially better than random guessing, whether we are in the real world where your data actually was used or whether we are in the idealized world where there was no privacy violation, then we should think of your privacy as having been only minimally violated. And this is a parameterized definition, because what it says, that there should be no way to tell the difference substantially better than random guessing between the world in which we use your data and the world in which we don't, is something we can quantify, something we can turn a knob on, so that we can try to trade off accuracy with privacy. Now, you might think this is still too strong. When you think about it for a bit it sounds like a satisfying definition, but you might worry that, like the definition we attempted with the British doctors study, it is too strong to allow anything useful to be done. It turns out that's not the case, and I won't go through the simple example here unless we have questions about it in the Q&A, but suffice it to say that 15 years of research have shown that essentially any statistical task, any statistical analysis you want to carry out, which includes all of machine learning, can be done with the protections of differential privacy, albeit at a cost that typically manifests itself in the need for more data or in diminished accuracy. And over the 15 years of academic work on this topic, and especially in the last few years, this has moved from, let's say, the whiteboard to become a real technology. It's something that's been widely deployed. If you have an iPhone, it might as we speak be actively reporting statistics back to the mothership in Cupertino subject to the protections of differential privacy, and Google has tools that report statistics in similar ways. And the real moonshot for this technology is going to come in just about a year. The US 2020 census is going to release all of its data products subject to the protections of differential privacy. So this is the sense in which we say that, of the topics we talk about in the book, this is the most well-developed. Not that we understand everything there is to know about differential privacy, but we've got a strong definition that has real meaning.
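For readers who want the guarantee in symbols, the standard textbook formulation of the definition being paraphrased here is the following (this is the usual notation from the differential privacy literature, not anything taken from the talk's slides):

```latex
% A randomized algorithm M is epsilon-differentially private if, for every
% pair of data sets D and D' differing in a single person's record, and for
% every set S of possible outputs,
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,].
```

The parameter epsilon is the knob: the closer it is to zero, the harder it is to tell the two worlds apart, and the more noise, or the more data, the analysis needs.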
We understand what algorithms you need to satisfy this definition while still doing useful things with data, we understand a lot about what the tradeoffs are, and it has become a technology that's used effectively. So I'm going to give a similar vignette for algorithmic fairness. As I said at the beginning, the study of fairness in algorithmic decision-making is considerably less mature than privacy, and differential privacy in particular. You can already tell, even though it's less mature, that it's going to be messier. In our book we argue that anybody who thinks long and hard enough about data privacy will arrive at a definition similar to differential privacy; in this sense differential privacy is the right definition of data privacy. We already know there's not going to be a single monolithic right definition of algorithmic fairness. In the past few years there have been a couple of publications that have the following broad form. They start off and say, can we all agree that any good definition of fairness should meet the following three mathematical properties, and the sensible reader looks at these three properties and says, yes, of course I would want these properties, these are weak, minimal properties; I would want these and even stronger ones also. And the punchline is a theorem proving there is no definition of fairness that can simultaneously achieve these three properties. To make this more concrete, this might mean, for instance, in real applications, that if you are trying to reduce, let's say, the discriminatory behavior of your algorithm by gender, that might come at the cost of increased discrimination by race. You might face difficult moral and conceptual tradeoffs, but this is the reality of the way things are, and so we still propose proceeding as scientists to carefully study alternative definitions and what their consequences are. So what I want to do with most of my time is similar to what Aaron did, which is to show you how things can go wrong, in this case not failures of anonymity but machine learning resulting in things like racial or gender discrimination, and then have that lead to a particular proposal for how one might try to address these sorts of collateral damages. So, on why machine learning might be unfair: many of you might have heard, in the past few weeks, of some notable instances. In one, a health assessment model, a predictive model that is widely used in large American hospitals and healthcare systems, was shown to have systematic racial discrimination in it. And, perhaps less scientifically, there was a Twitter storm recently over the recently introduced Apple credit card, underwritten by Goldman Sachs. There were a number of reports from married couples in which the husband said, hey, my wife and I file taxes jointly, she has a higher credit rating than I do, yet I got 10 times the credit limit on the Apple Card that she did.
Aaron and I just spent, about a week ago Friday, an hour in the office of the New York State regulator that is investigating this particular issue, and unlike with the health assessment model, we don't know whether this is just a couple of anecdotes or whether there is systematic underlying gender discrimination in the credit limits that are given. But these are the kinds of concerns we are talking about when we talk about algorithmic fairness. So, in the same way Aaron used medical databases, I want to take you through a toy example of how things can go wrong in building predictive models from data. Let's imagine that Aaron and I, for instance, were hired by the Penn admissions office to help them build a predictive model for collegiate success based on only two variables: your high school GPA and your SAT score. What I'm showing you is a sample of data points. Each one of these green pluses or minuses represents a former applicant: the x value is the high school GPA of the applicant and the y value is the SAT score. Let's say this is a sample of individuals who were admitted to Penn, so we know whether they succeeded at Penn or not, and by succeed I mean any quantifiable, objective definition that in hindsight we can measure. One example would be that success means you graduated within five years of matriculating with at least a 3.0 GPA. A different definition would be that you donate at least 10 million dollars to Penn within 20 years of leaving; as long as we can verify it in hindsight, that's fine. So that's what the pluses and minuses mean. For each point we have the GPA and SAT score of the applicant, and a plus indicates a student who succeeded at Penn while a minus means a student who didn't succeed. A couple of things about this cloud of green points. First of all, if you counted carefully you'd see that slightly less than half of these historical admits succeeded; there are slightly more minuses than pluses, by a handful or so. That's observation number one. Observation two is that if I show you this cloud of points and ask you to build a good predictive model from this data, one we could use on a forward-going basis to predict whether applicants to Penn will succeed or not, there's a line you can draw through this cloud of points, this blue line, such that if we predict that everybody above the blue line will be successful and everybody below will not be, you can see we do a pretty good job. It's not perfect; there are a couple of false accepts here and false rejects down here. But for the most part we are doing a good job, and this is of course, in simplified form, exactly what the entire enterprise of machine learning is about, even including things like neural networks: you're trying to find some model, perhaps more complicated than a line, that does a good job of separating positives from negatives. Now let's suppose that in the same historical applicant pool there was another subpopulation besides the greens; let's call them the orange population, and here's their data. I want you to notice a few things about the orange population. They are a minority in the literal mathematical sense: there are many fewer orange points in this historical data set than there were green points. Observation number two is that their data also looks different. It looks like the SAT scores of the orange population are systematically lower, but also note that they are no less qualified for college.
There are exactly as many orange pluses as there are orange minuses, so it's not the case that the orange population is less successful in college, even though they have systematically lower SAT scores. One reason you might imagine why this is the case is that perhaps in this minority orange population there is less wealth, so the green population, which is wealthier, can afford SAT preparation courses, can afford multiple retakes of the exam, taking the max of their scores, while the orange population, which is less wealthy and has fewer resources, just does self-study, takes the exam once and takes what they can get. If we had to build a predictive model for just the orange population, there is again a good one, in fact a perfect model on the historical data: this line perfectly separates positives from negatives. So what's the problem? The problem arises if we look at the combined data set and ask which predictive model does best on the combined data. It is essentially the single model that does best on the green population, and you can see that visually: if I tried to move this line down in order to catch the orange pluses, I'm going to pick up so many green minuses that the error will increase. So this is the optimal model on the underlying aggregated data, and you can see it's intuitively unfair, in that we rejected all of the qualified orange applicants. So the false rejection rate, as we might call it, on the orange population is close to 100 percent, and the false rejection rate on the green population is close to zero percent. Of course you might say, well, what we should do is notice that the orange population has systematically lower SAT scores even though they are not less qualified for college, and we should build a two-part model. We should basically say, if you're green we're going to apply this line, and if you're orange we're going to apply this other line, and by doing this, compared to the single model, we would actually not only make the model more fair but make it more accurate as well. The problem with this is that if we think about green and orange as being race, for instance, there are many, many areas of law and regulation that forbid the use of race as an input to the model, and this two-part model has race as an input, because the model says first look at race and then decide which of these sub-models to apply. And of course the laws or regulations that prevent the use of things like race or gender or other apparently irrelevant variables in decision-making are usually meant to protect the minority population. So here is a concrete example in which regulations that were meant to protect the minority population guarantee that we will harm that minority population if we just do the most sensible machine learning exercise. In the same way that Aaron said definitions of privacy based on anonymization don't make sense, we argue in the book that trying to get fairness in algorithmic decision-making by forbidding inputs is fundamentally misguided. What you should do instead is not restrict the inputs to an algorithm but constrain its output behavior in the way that you want. In particular, one thing you can imagine doing here, even if we were forced to pick a single model, is to change the objective function. I could say there are two criteria I care about here. On the one hand, I do care about making accurate predictions,
minimizing the predictive error of my model. On the other hand, I also care about this other objective, which is fairness, and in this particular application I might define fairness as approximate equality of the false rejection rates. I might say I'm worried about the orange population being mistreated, and the particular type of mistreatment I'm talking about is false rejections: people who would have succeeded but whom our model rejected. So I can define a numerical measure, which is the difference between the false rejection rates on the green population and the orange population, and instead of just saying minimize the error on the data set, I could say minimize the error on the data set subject to the constraint that the difference in false rejection rates between these two populations is at most, let's say, zero percent, or I could relax that and say at most five percent or 10 percent. And if I let this go all the way to 100 percent disparity, then it's as if I'm not asking for fairness at all anymore, and I'm back to just minimizing predictive error. So in the same way that differential privacy gives you a knob that lets you choose how strong your privacy demands are versus your accuracy demands, this definition of fairness lets us interpolate between asking for the strongest type of fairness, zero disparity in the false rejection rates, and no fairness whatsoever. And once you are armed with a quantitative definition like that, you can actually plot the quantitative tradeoffs you might face in any real application. So for three different real data sets in which fairness is a consideration, I'm showing you actual numerical plots here, in which the x value for each one of these red points is the error of some predictive model and the y value is the unfairness of that model, in the sense of the disparity between false rejection rates between two populations. And of course smaller is better for both of these criteria. Where I'd like to be is in the corner where my error is zero and my unfairness is also zero. You can see that's not happening on any one of these data sets, and in real machine learning, even ignoring fairness, you are not going to get to zero error, period. But what you see is that we can trace the numerical tradeoff. We can either choose to essentially ignore fairness and take this point up here, which gives us the smallest error; at the other extreme we can ask for zero unfairness and get a much larger error; and in between we can get things that are in between. And we argue in the book that it's important as a society that we become quantitative enough that people, even non-technical people, can look at these tradeoffs and understand their implications. Because we do not propose that some algorithm should decide which one of these models we should pick; it really should depend on what's at stake. In particular, there's a big difference in what's at stake, for instance, in medical decision-making, which might have life or death consequences, versus the ad that you're shown on Facebook or Google, which many of you may never look at for the most part. And furthermore, you can see the shapes of these curves are quite different. For a couple of them, like this one and this one, it's possible near the left end of the curve to get significant reductions in unfairness for only very small increases in the error, so that might seem like it's worth it, whereas for this one here you face harsh tradeoffs right from the beginning.
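As a rough sketch of what turning that knob might look like in code, here is a toy version of constrained model selection over a single admissions threshold. The synthetic data, the one-dimensional score model and the grid search are simplifying assumptions made for illustration; they are not the models or data sets behind the plots just described:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic applicants: the "orange" group has systematically lower scores
# but the same underlying rate of success as the "green" group.
green_scores = rng.normal(1200, 100, 500)
green_success = green_scores + rng.normal(0, 80, 500) > 1200
orange_scores = rng.normal(1000, 100, 100)
orange_success = orange_scores + rng.normal(0, 80, 100) > 1000

def false_rejection_rate(scores, success, threshold):
    # Fraction of truly successful applicants that the threshold rejects.
    rejected = scores < threshold
    return (rejected & success).sum() / max(success.sum(), 1)

def error_and_disparity(threshold):
    pred = np.concatenate([green_scores, orange_scores]) >= threshold
    truth = np.concatenate([green_success, orange_success])
    error = (pred != truth).mean()
    disparity = abs(false_rejection_rate(green_scores, green_success, threshold)
                    - false_rejection_rate(orange_scores, orange_success, threshold))
    return error, disparity

def best_threshold(gamma):
    """Most accurate single threshold whose false-rejection disparity is <= gamma."""
    feasible = []
    for t in range(700, 1500, 10):
        error, disparity = error_and_disparity(t)
        if disparity <= gamma:
            feasible.append((error, t))
    return min(feasible) if feasible else None

# gamma = 1.0 means "no fairness constraint"; smaller gammas trade error for fairness.
for gamma in (1.0, 0.2, 0.05):
    print(gamma, best_threshold(gamma))
```

Sweeping gamma from 1.0 down toward zero and plotting the resulting (error, disparity) pairs is exactly how one traces out a tradeoff curve of the kind shown on the slides.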
So this is an example of the kind of thing we discuss in the book, where you start by thinking conceptually about what fairness should mean and what you're trying to accomplish. You might go through bad definitions based on things like anonymity, or on not using certain inputs or variables in the computation, and eventually you arrive at a more satisfying definition and at algorithms that can implement that particular social norm on real data sets. So let me turn it over to Aaron to talk about all the warm fuzzy stuff later in the book.

So we talk in depth about privacy and fairness, which are the first half of the book. I'm not going to talk in much depth about any particular thing, but I want to now give you a quick survey of what's in the second half of the book. Maybe at a high level, you can think of the first half of the book as studying algorithms in isolation. We have some machine learning algorithm, and we can think about whether it is private or fair without necessarily thinking about the larger context in which the algorithm is embedded. But that context is also important, because what the algorithm does affects the behavior of people, and it's important to think about how those things interact. So in the third chapter we start thinking about this using the tools of game theory. If you change the algorithm, you will change the particular decisions that people make, in a way that might reverberate out to have larger societal consequences. And we start by thinking about an example which is maybe not the most consequential socially, but which is, I think, a clear way to get an idea of what we're talking about. Many of you will have experience using apps like Google Maps and Waze to plan your daily commute, for example. In the morning I can type in where I want to go, and it will not just find directions on a map; it will look up traffic reports and give me a route which will minimize my commute time given the current traffic. If you think about this aspect of Google Maps, this integration with traffic reports turns the interaction I'm having with it into what an economist would call a game, in the sense that the actions I take, which route I choose to drive along, impose negative externalities on other people in the form of traffic. Selfishly, I would prefer that everyone else stayed home and I was the only one on the road; I would just take a straight shot to work and get there fast, but other people wouldn't agree to that solution. So different people have competing interests, and their choices affect the well-being of other people. Each choice I make has a small effect on any particular other person, I don't contribute too much to traffic, but collectively the choices we make have large effects on everybody. So one way to view these apps is that they are helping us play the game better, at least in a myopic sense. Before these apps were around, I would have had at best very minimal traffic information, so I would probably take the same route every day; now I can very precisely respond to what other people are doing, and what a game theorist would say the app is doing is helping me compute my best response to what everyone else is doing: what can I do that will selfishly and myopically optimize for me. And everyone else is doing the same thing.
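A tiny worked example of where this mutual best-responding can lead, sketched under the usual textbook assumptions (this is the classic Pigou two-road network, not an example from the talk): suppose everyone commutes from A to B, one road always takes one hour, and the other takes x hours when a fraction x of all drivers use it.

```python
def average_commute(x):
    """x = fraction of drivers on the variable-delay road; the rest take the 1-hour road."""
    return x * x + (1 - x) * 1.0   # road delays: x hours versus a constant 1 hour

# Selfish equilibrium: the variable road is never worse than 1 hour, so every
# driver best-responds by taking it (x = 1), and everyone spends a full hour.
print(average_commute(1.0))   # 1.0

# Social optimum: splitting traffic evenly gives a lower average commute.
print(average_commute(0.5))   # 0.75
```

Even with every driver, or every navigation app, perfectly best-responding, the equilibrium average commute comes out worse than what a coordinated split would achieve, which is exactly the gap between equilibrium and good social outcomes discussed next.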
So the result is that what these apps are doing is driving global behavior to what would be called a competitive equilibrium, a Nash equilibrium, which is some state that is stable in the sense that everybody is myopically and selfishly optimizing for themselves. Okay, if you have taken a class on game theory, or even just read the right books, you will know that just because something is a competitive equilibrium does not mean it's necessarily a good social outcome, and the prisoner's dilemma is maybe the most famous example of this. It's not at all obvious, and in fact you can come up with clear case studies where these apps, even though they are selfishly optimizing for each individual person, are making things worse globally for the population at large, in the sense of larger average commute times. That might not be an enormous deal when we're talking about traffic, but it is just one example of a phenomenon that's much more pervasive when algorithms mediate social interactions, which happens all the time now. So for example, you might think of the content curation algorithms that drive things like the Facebook News Feed in a similar way. Myopically, Facebook's interests are not so misaligned with my own, in the sense that their algorithms are optimized to drive engagement: what Facebook wants me to do is stay on Facebook as long as possible so I can view lots of ads, and the way they try to do that is by showing me content I would like to engage with, that I would like to click on and read and comment on, and myopically that seems aligned with my interests. I have a choice of what website to go to, and if I'm engaging with the content Facebook is showing me, I might be enjoying it. But when Facebook simultaneously does this for everybody, even though it's myopically optimizing for each person, that might have global consequences we don't like, and might lead to the filter bubble phenomenon that people do a lot of hand-wringing about, driving us globally toward, for example, a society that is less deliberative. So we go through examples trying to think about and point out the ways in which algorithmic decisions can have widespread consequences on social behavior, and how game theory is a useful tool in thinking about those things. And in the last chapter we start talking about another important problem, which is the statistical crisis in science, which some of you might have heard about; it's actually not so disconnected from the equilibrium behavior we talk about in the game theory chapter. There have been a bunch of news articles, for example, showing that if you take food science or social psychology, these are problematic literatures where, if you flip through a scientific journal and put your finger down at random, more likely than not the study you land on will not replicate. If you try to reproduce the results with new data, with new subjects, it's not nearly as likely as it should be that you will find the same results. There are lots of spurious results, and here's an xkcd cartoon that they were nice enough to let us include in the book, which gets at exactly this phenomenon. So we've got our scientist here, and someone tells him that jellybeans cause acne. He tests this hypothesis, and the p-value he gets is above the
0.05 standard level of statistical significance in the literature, so he says sorry, no result. But then he is told that maybe it's only a certain color of jellybean, so he starts testing: he tests brown jellybeans and purple ones and pink ones, and for all of these he's finding a p-value greater than or equal to 0.05. But then he finds that green jellybeans appear to be statistically significant. There seems to be a correlation between green jellybeans and acne at a statistical significance level of 95 percent, which means that if you tested 20 hypotheses you would expect about one of them to incorrectly appear significant just by chance. Of course he did test 20, and here's the headline: green jellybeans linked to acne, only a five percent chance of coincidence. So this is called the multiple hypothesis testing problem, and it's relatively well understood how to deal with it when it's just a single scientist conducting these studies; what's going on there is just statistical malfeasance, someone has tested a bunch of hypotheses but is only publishing the most interesting one without even mentioning the others. Of course, this is just as much a problem if, rather than one scientist studying 20 hypotheses, we have 20 scientists each studying one hypothesis and each following proper statistical hygiene. It is just as much of a problem if only the one hypothesis that appears to be significant is the one that gets published, and of course that is exactly what the incentives underlying the game of scientific publishing are designed to do, because if you find that blue jellybeans do not cause acne, that is not going to be published. You probably won't even try to publish it; it's not a result any prestigious journal is going to want to print. But if you find something surprising, that green jellybeans cause acne, then that's a big finding. So the problem is that if you view scientific publishing as a game, then even if each individual player is following proper statistical hygiene, you get the same effect that's described in the cartoon. And in the chapter we talk about how these phenomena are exacerbated by the tools of machine learning, which let you check many different hypotheses very quickly and which promote data sharing, and how tools from this literature, in particular, surprisingly, tools from differential privacy, which we talked about in the very first chapter, can be used to mitigate this problem. And that's it. So thank you. [applause]

Thank you very much. Thanks very much, that was great. We're going to do questions; I'll start us off. We have a lot of folks in the room who regularly work in this space, and so we would love examples of problems that you have faced, or questions that you have. Maybe I'll start with this. You talked a little bit, Michael, about the limitations of computer science when it comes to answering some of these questions of fairness. Having now talked about the book, probably for a couple of months, have you found that the public kind of wants the computer scientists to solve this?

No. I think in our experience, they appreciate the fact that people like us, the community we come from, can identify the point at which there is judgment involved and sort of moral decisions to be made.
And that the stakes matter. So I think they are generally appreciative of the fact that both sides come towards each other a little bit. Just the kind of plot I was showing between error and unfairness takes a little explanation to understand what it is saying, but in fact people from non-quantitative areas who are stakeholders in problems like these, for instance people at policy think tanks and the like, can do that. But I don't think they want computer science per se to take a leading role in picking out a point that represents your best tradeoff between error and unfairness, because that depends on the data and the problem in question. I don't think even we think that computer scientists should be exclusively, or even in large part, the ones making many of these decisions, and we are careful to say in the book that there's the scientific problem, the algorithmic problem, and there's the part of the problem that requires moral judgments of various sorts, and those are different. We do not propose that it should be algorithms, or necessarily computer scientists, who define what we mean by fairness. Once you pick a definition, we don't propose that it should be computer scientists picking out, in various circumstances, how we want to trade off things like privacy and fairness and accuracy. But what is important, and what I think computer scientists have to be involved in, is figuring out first of all what those tradeoffs are and how to make them as manageable as possible. For example, at the U.S. Census right now there is literally a room full of people, a committee whose job is to look at these curves and figure out how we trade off these very different things, one of which is the privacy the census is legally obligated to provide to American citizens, and the other of which is statistical validity for data that is extremely useful and used to allocate resources, school lunch programs, important things. So there are stakeholders who disagree about how these things should be traded off, and they are in a room hashing it out as we speak, but their work has been made very much easier because we can precisely quantify what those tradeoffs are and manage them, and that's where computer scientists, I think, have an important role to play.

Another question I had while listening: in an ideal universe where The Ethical Algorithm is on every computer scientist's desk and the frameworks that you describe are actually used in practice, and I think much of this is happening in industry, some of it obviously happening in government as well, what does it look like to have a community of people actually living these principles? Is there a public API that we all can see? Is it, to use a rudimentary example, like when we go to the grocery store and we can look at the side of the box and know what's in it, how much sugar there is? In a world where some people might comply and some people won't, some people might have read the book and some might not, what does success look like?

While we don't talk a lot about this in the book, and we continue to procrastinate on writing a policy brief for the Brookings Institution, we are going to talk more about the regulatory implications of these kinds of things, and the reason I mention that in response to your question is that once you have a precise definition of fairness or privacy, you can do what we mainly discuss in the book, which is embed it in algorithms to make them better in the first place.
You can also use it for auditing purposes. In particular, if we are specifically worried about gender discrimination in the advertising of STEM jobs on Google, which was something that was demonstrably shown to exist a few years ago, you can run controlled studies. You can have an API where you say, we need unfettered access to make automated Google queries over time so that we can systematically monitor whether there are gender differences in the distributions of ads that people see, for example. So we do think that one implication of a lot, maybe not all, of the work going on in these areas is the ability to do that kind of technological auditing, and we believe some of that should happen. And you can anticipate what the objections of the technology companies might be. They might include things like, well, that's our intellectual property, this is our secret sauce, we can't have automated queries, that of course violates our terms of service. And our response to that is, this is your regulator. They wouldn't have this access and then be able to use it to, for instance, start a competing search engine, in the same way that the SEC has all kinds of very, very sensitive counterparty trading data but is not allowed to use it to go start their own hedge fund, for example. So I think in a world where the kinds of ideas we discuss in the book become widespread and embedded, a big part of this would be things like: okay, on the side of the Google cereal box, here are the rates of discrimination in advertising by race, by gender, by age, by income, et cetera. And you could imagine having some sort of quantitative notion, a scorecard if you like, for different technology services and how well or poorly they are doing on different social norms.

I think also what we're going to have to see is that regulations for things like privacy and fairness will have to become a little bit more quantitative. At the moment there's this disconnect where people in industry are not sure exactly what is expected of them, what is going to count as algorithmic unfairness. For example, this issue with the Apple Card, where there was seeming gender discrimination, would have been easy to find had only people thought to look for it, and when we were chatting with regulators a few weeks back, one thing we heard that I thought was interesting is that sometimes companies will explicitly avoid running checks like this, because if they don't check then they can claim they didn't know, and if they do check, the results are subject to discovery if there's a lawsuit. That is the kind of thing that flourishes when there is ambiguity, but if you're precise about what exactly is going to constitute algorithmic discrimination in the state of New York, then people will look for it. Our view is that even apparently strong regulatory doctrines like GDPR are really ill-formed documents. They push words like interpretability and fairness around on the page, but nowhere in those pages do they say what they mean.
So it's a bit of a catch-22 or chicken-and-egg problem. It looks like strong regulation, in that they are demanding interpretability everywhere, for instance, but nobody has committed to what that means yet. And I do think that, as is often the case, even the nascent science we discuss in the book is running ahead of things like laws and regulation, and before the kinds of changes we are discussing can take place on the regulatory side, much of regulatory law has to be rewritten and there needs to be cultural change at the regulators.

That makes sense. Shifting gears just a bit, I was struck by the fact that differential privacy is ahead. Do you have a view of whether it's ahead because, as you said, there's an objectively preferable answer, an answer that can be defended as a theorem? Or is it ahead because there is a perception that privacy is more important than fairness? Is there some choice going on, almost subterranean, such that it got more attention earlier and developed faster? Or is there no choice at all?

Two short comments, and then Aaron can chime in. There are differences in how long these things have been studied, but as I said when I was talking about fairness, I really think there's a technical difference. It just so happens that privacy is lucky, in the sense that there is a well-grounded, very general mathematical definition of privacy that is very satisfying and with which subsequent research has shown you can do a lot. You can meet that definition and still do lots of the things we want to do in terms of data analysis and the like. Fairness just isn't like that, and it's not a matter of time. In the spirit of the result I mentioned, that there are three properties you would like from fairness that you cannot simultaneously achieve, it's not as if further work is going to undo that theorem. We don't talk about this much in the book, but I do think that privacy has been lucky in the same sense that public key cryptography was lucky. There's a nice parallel between the development of differential privacy and public key cryptography, where there was a period when it was a cat and mouse game: people would invent encryption schemes that sure looked random, until they didn't look random in some way, and then public key cryptography in the 1970s suddenly put the whole field on a much firmer algorithmic and definitional footing, and it's been off to the races since then. That doesn't mean you get everything you want from security, or that these things are perfectly implemented every time, but I don't think we're ever going to get there with fairness, and that's just life.

I think it's hard to project into the future. Privacy is about 15 years ahead of fairness in terms of academic study, for good reason. We have had data sets for a long time, and so privacy violations have been going on for a long time, whereas algorithmic fairness really only becomes relevant when you start using machine learning algorithms to make important decisions about people, and it's only in the last decade or so that we have had both enough data about individual people, from their daily interactions with the internet, to make those decisions, and learning algorithms that have become sufficiently good that we can start to automate some of them.
As Michael says, it's clear already that there's not going to be one definition of fairness, but if we try to look 15 years down the road, which is how far you have to look before fairness is at least chronologically as mature as the study of privacy, you might still hope for a mature science that does not have one definition, but has perhaps isolated a small number of precise definitions that correspond to different kinds of fairness, and we will more precisely understand how those necessarily trade off against one another in different circumstances. So it's going to look different, but I'm optimistic there will be a lot you will be able to say once fairness has had as much time as privacy has.

One other comment I would make, which I didn't appreciate until we started working in algorithmic fairness a lot: I think another difference, which will persist between, say, privacy and fairness, and which doesn't have to do with maturity or technical aspects, is that debates about fairness always become politicized very quickly. In principle everybody agrees that privacy is a good thing that everybody should have. As soon as you start talking about fairness, you immediately find yourself debating with people who want to talk about affirmative action or redressing past wrongs, because all of these definitions require that you identify who you are worried about being harmed, what constitutes harm to that group, and often why you think it constitutes harm. So some of the things we talk about, like forbidding the use of race in, say, lending, or, very much in the news the past couple of years with Harvard, in college admissions: these definitions also require you to pick groups to protect, and this always becomes politicized, I think, regardless of what definition you're talking about, and I don't think that will change in 15 years. So somehow privacy and fairness are different in this social or cultural sense as well. Some people think algorithms ought to do something about fairness themselves, and conversely some people don't think algorithms should play any role whatsoever, not only in deciding these things but even in mediating or enforcing them and the like. And we take pains in the book to point out that racism was not invented with the advent of algorithms and computers; it was around before. You can just talk about it more precisely now, and you can have problems of fairness at a bigger scale, but also solutions at a bigger scale, now that things are automated.

Questions in the room? Show of hands. Anyone? There's a question in the back there.

Thank you so much for this talk; we enjoyed reading the book. I have a question about the tradeoff between differential privacy and aggressive data acquisition. The book talks about how Google and Apple have been collecting user statistics subject to differential privacy, but the type of data they collect this way is actually not the type of data they used to collect, so it's a new area of data acquisition. I wonder what your comment is on this tradeoff, on using differential privacy as a kind of shield for collecting new user data. Especially since I don't know how secure differential privacy is against adversarial attacks, do you see a possibility that users, under the impression of differential privacy, are willing to give out more data, only to find that data compromised in the end?

That's a good question.
And it's a question that relates to why you have to think about algorithms not just in isolation but in the broader context in which they are deployed. You're right that in both the Apple and Google deployments, they didn't use differential privacy to add further protections to data they already had, which turns out to be a hard sell to engineers: they already have a data set available, and adding privacy protections corresponds to taking away some of their access, giving them access only to a noisier version of the data. What is an easier sell, and this is how it worked in those first deployments, is to say: look, here is some data set you had no access to at all because of privacy concerns; we now have a technology that can mitigate those privacy concerns, so we will give you access to it. So you're right that one thing that happens when you introduce a technology that lets you make use of data while mitigating the harms is that you make more use of data, and one of the effects of differential privacy is that Apple and Google are now collecting a little more data.

On the other hand, they are using an extremely strong model of privacy, which we talk about in the book, the local model. What it means is that they are not actually collecting your data in the clear at all. They are collecting a random signal derived from your data, and the randomization happens on your device, so Apple never collects your data; it collects the results of coin flips performed on your data. So although more data is being collected, differential privacy in this context is offering an extremely strong guarantee of plausible deniability, and for that reason it is not vulnerable to a data breach, for example. You might worry that differential privacy causes companies to collect more data, and that maybe that's okay while they are using it subject to differential privacy, but as soon as someone gets into the system and the data is released, all of a sudden things are worse off. That's not how Google and Apple are using differential privacy; they are doing it in a way that never collects the raw data at all.

The census is different. They are collecting the data, they have always collected the data, but they are now adding these protections in a way they didn't before, so they are giving researchers in 2020 access to data that is more privacy preserving than it was in 2010. There are lots of tradeoffs there and interesting things, but these are two different use cases that show different ways in which this is going to play out.

Just to follow up on that, for a lay audience, can you say a bit more about that coin flipping? Can you paint a picture of it?

Sure. Suppose, to use the toy example from the book, I wanted to conduct a survey of residents of Philadelphia about something embarrassing; I want to figure out how many people in Philadelphia have cheated on their spouse. One thing I could do is call up a random subsample of people and ask them, have you cheated on your spouse, write down the answers, and at the end tabulate the results and report the statistics, the average, maybe a confidence interval, and call it a day. But I might not get the responses I want, because people might legitimately be worried about telling me this over the phone. In particular, they might not trust me. They might worry that someone is going to break into my house and steal the list. They might worry that in Divorce Proceedings the list will be subpoenaed. So here's a different way to carry out the survey.
I call people and I say, have you cheated on your spouse — but wait, don't tell me just yet. First I want you to flip a coin. If the coin comes up heads, tell me the truth. If it comes up tails, just tell me a random answer, say by flipping the coin again, and don't tell me how the coins came up. So people do this, and now they have a very strong form of plausible deniability: since they didn't tell me how the coin flips came out, then for any particular answer they gave me, they can legitimately and convincingly say, that wasn't my real answer, that was the random answer you instructed me to give. So I can't form very strong beliefs about any particular person; everyone has a strong statistical guarantee of plausible deniability, and this is something you can make precise in the language of differential privacy.

That's okay, because the question I cared about wasn't about any particular person. It was about the population-level average, the statistical properties of the population, and it turns out, as a consequence of the law of large numbers, that even though I have only a noisy signal from each person, in aggregate I can figure out the population-level average quite precisely, because I know the process by which the noise was added and in aggregate I can subtract it off. We talk about this in the book, and it is not so different from what is happening on your iPhone right now. Your iPhone is reporting much more complicated statistics than yes-or-no questions, but take text completion: your texts are sensitive, but Apple would like to know, for example, what the most likely next word is given what you've typed so far, and they collect data that helps them do that by basically hashing the text data down into a bunch of yes-or-no questions, a bunch of binary data, and running a coin-flipping procedure that looks not so different from this one.

To put it in the context of something embarrassing, maybe you're embarrassed that you played seven hours of Bejeweled on your phone this week, so you'd be reluctant to report that directly. But if all of our phones added a large random positive or negative number to our weekly usage of Bejeweled, then if I look at any individual person's noisy report and it says Jeff played 17 hours of Bejeweled, I won't know whether he played no Bejeweled and a random 17 was added, or played 30 hours and 13 was subtracted. So basically any particular person who is reported to play a lot of Bejeweled has the same plausible deniability, but if I add up all of these very, very noisy reports, the noise averages out and I get a good estimate of aggregate or average Bejeweled usage (a small code sketch of both of these mechanisms appears a bit further below). I don't play any Bejeweled, by the way. [laughter]

Other questions? We are waiting for the mic. Jennifer. Here's the mic.

Thank you so much for the talk today. You spoke about how introducing differential privacy or fairness constraints can increase the error of an algorithm, and I'm wondering whether that reduces the commercial potential of the algorithm, and if so, do you think that means it will have to be regulated?

The short answer to the first question is definitely. For instance, Google has used Machine Learning at massive scale for many years now to do things like click-through prediction, and the more accurate their click-through rate predictions are, the more that directly translates into revenue and profit. So going in and insisting on things like not discriminating against this or that group in your advertising, or more privacy in the way the Machine Learning is deployed, is going to reduce those accuracy rates and reduce profits.
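Here is a minimal Python sketch, not anything Apple or Google actually ships, of the two local-privacy mechanisms described in the answer above: the coin-flip survey (randomized response) and the add-a-big-random-number usage report. The population size, the assumed 10% true "yes" rate, the usage distribution, and the noise scales are made-up numbers for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(truth):
    """Each respondent flips a coin: heads -> report the truth,
    tails -> report the outcome of a second, independent coin flip."""
    heads = rng.random(truth.shape) < 0.5
    random_answer = rng.random(truth.shape) < 0.5
    return np.where(heads, truth, random_answer)

def estimate_rate(reports):
    """Debias the aggregate: Pr[report = yes] = 0.5*p + 0.25, so p = 2*mean(reports) - 0.5."""
    return 2.0 * reports.mean() - 0.5

# Toy survey: 100,000 respondents, assumed true "yes" rate of 10%.
truth = rng.random(100_000) < 0.10
reports = randomized_response(truth)
print(f"true rate {truth.mean():.3f}  estimate from noisy reports {estimate_rate(reports):.3f}")

# The additive-noise variant from the Bejeweled example: each phone adds a large
# random number (here Laplace noise) to its weekly hours before reporting.
hours = rng.exponential(scale=2.0, size=100_000)            # assumed usage distribution
noisy_reports = hours + rng.laplace(scale=20.0, size=hours.shape)
print(f"true mean hours {hours.mean():.2f}  estimate from noisy reports {noisy_reports.mean():.2f}")
```

The point mirrors the talk: any single report is nearly uninformative about the person who sent it, but because the process that injected the noise is known, the aggregate estimate concentrates around the true population value as the number of respondents grows.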
As for how much accuracy or profit is lost, I don't know how to put numbers on it yet, but I think we can be sure this is going to happen. Relating this to things we've been discussing, I also think this is why a lot of the commercial deployments we've seen so far are in experimental areas that are not part of the core business of these companies. They'd like to know, say, emoji usage statistics, but that's not a core part of their business, so they are sticking a toe in the water, and I think it's to their credit that they are sticking a toe in the water. But I'm still waiting for the first Big Tech Company that says, we're not just going to adopt these technologies around the edges, we're going to put them in our Core Services. And by the way, at all of the Big Tech Companies we have many, many excellent colleagues who do research in these exact areas, so it's not as if any of the Big Tech Companies don't know a lot about differential privacy or algorithmic fairness. But of course there is a disconnect between the researchers who study these things and the people with a P&L or a Business Unit they oversee, who would have to think about adopting these techniques in the middle of their pipeline. So I don't have strong priors on how this is going to play out, but I hope there is some organic adoption by tech companies, and not just tech companies but other Large Companies that are essentially consumer facing in some way.

[inaudible question] ...in reference to the last slide, and the conclusion that we should not be using machine learning algorithms on health data because of the ability to... they need to be theory-based algorithms to substantiate the findings without having a black box of why [inaudible]

No, we don't want to say don't use Machine Learning algorithms, but you do have to be careful. First of all, it's not that Machine Learning is entirely atheoretical. If you train some classifier and it seems to be good at predicting on the holdout set, then you can legitimately put a confidence interval around that estimate of its validity, and you can attach a p-value to it. The problem comes when you start sharing data sets, and in particular holdout sets. When you take Machine Learning 101, the way you get statistical validity in Machine Learning, unlike in statistics, is not by explicitly assuming the data fits your model; you train on one piece of the data [inaudible], but you keep this holdout set, a piece of the data set you have never seen before, entirely independent of everything you trained on. That looks fine on paper, but suppose I read your paper and sent you an email saying, that was a great paper, would you send me your data set so I could do Something Else with it? Even if I myself follow all the rules of hygiene, I have read your paper, and implicitly everything I am doing is a function of the findings you wrote about, which were themselves a function of the data, and as soon as anything like that happens, all of the guarantees that come with a holdout set go entirely out the window. Now, the easy way, and I say easy meaning theoretically easy but practically very hard, to solve this problem is what people advocate for with preregistration: I should commit to the experiment I am going to conduct before I look at the data at all. But if you take that seriously, it rules out data sharing for exactly this reason.
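Here is a small, self-contained Python simulation of the failure mode just described, with made-up sizes (200 holdout examples, 1,000 candidate features, 25 kept). Every feature is pure noise, so nothing real can be learned, yet selecting a rule by consulting the same holdout and then scoring it on that same holdout produces an impressive-looking number, while accuracy on genuinely fresh data stays at chance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 200, 1000, 25   # made-up sizes: holdout examples, candidate features, features kept

# Pure noise: labels and all candidate features are independent coin flips,
# so no rule built from these features can truly beat 50% accuracy.
labels = rng.integers(0, 2, size=n)
feats = rng.integers(0, 2, size=(n, d))

# The adaptive step: score every candidate feature against the SAME holdout and keep
# the top k.  This stands in for "everything I do is a function of findings that were
# themselves a function of the data."
per_feature_acc = (feats == labels[:, None]).mean(axis=0)
chosen = np.argsort(per_feature_acc)[-k:]

def majority_vote(f):
    """Predict 1 when most of the selected features say 1."""
    return (f[:, chosen].mean(axis=1) > 0.5).astype(int)

# Scoring on the reused holdout looks impressive; scoring on fresh data does not.
fresh_labels = rng.integers(0, 2, size=n)
fresh_feats = rng.integers(0, 2, size=(n, d))
print("accuracy on the reused holdout:  ", (majority_vote(feats) == labels).mean())
print("accuracy on genuinely fresh data:", (majority_vote(fresh_feats) == fresh_labels).mean())
```

Preregistration avoids this by forbidding the adaptive step altogether; the approach discussed next instead mediates access to the holdout, for example with the kind of noise addition differential privacy provides, so that adaptive analyses cannot overfit to it as easily.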
And although preregistration works, it is draconian and would rule out a lot of interesting studies, so we spend time in that chapter talking about a nascent algorithmic science that allows you to share data and reuse data in a way that does not give up on rigorous statistics.

But I'm guessing that when you say atheoretical you are referring more to causality. There is this split between the Machine Learning community and many other communities, including medicine and economics, about causality, and some of Machine Learning is militantly anti-causal: let's just get the data, and if we get a good fit to the data and practice sound statistical techniques, that's enough. I think having strong priors, and having a causal model in your head is something I would consider having strong priors, can help reduce the number of things you try on the data, but I still don't think it's a substitute for the kinds of things we discuss in the chapter, because it is again a matter of discipline. I may think I have a causal model in my head, but usually there will be parameters to that causal model, and I will start to play around with the strength of that causality, and as soon as I do I go down the same rabbit hole, where I am testing many, many hypotheses, sequentially or in parallel, on the same data set, and I am prone to false discovery unless I am very, very careful. These are very early days, earlier even than in fairness, but I think disciplined algorithmic approaches, including ones that involve things like differential privacy and other statistical methods, are better than human beings telling themselves, I am not contributing to this reproducibility crisis because I have strong priors in the form of a causal model.

This points us to the challenges we will face in the next 50 or 100 years [inaudible] [laughter] One of the responsibilities that comes with being scientists is to explain observations to the public in a way that supports good policy, and we see examples all the time of this not happening [inaudible]. It's not just about data sets, right? What is your perspective on taking this next step with something as complex as Machine Learning, and on how easily it can reach policymaking?

I think that's very important. Computer Scientists are unwittingly thrust into policymaking, but it's informal. If you are a Software Engineer at facebook and you tweak a parameter in some algorithm and go to lunch and don't think about it, you are affecting all sorts of things for millions of people, and facebook, in many ways, is informally making policy in ways that aren't precisely thought out. Even though we are in this situation, I think it's important that we work to make this more explicit, to make it clearer how algorithmic decisions actually affect policy, and to try to help as broad an audience as possible understand, at a high level, what algorithms are and what they are trying to accomplish. That is in large part what we are trying to do with this book.

Yeah, it's funny, because I've been around a lot longer than Aaron, and the words Machine Learning are in my doctoral dissertation, but at the time this was an esoteric area to be studying, and even majoring in Computer Science when I was an undergrad was viewed as an odd thing.
It's interesting; sometimes I joke that, through no foresight or merit of my own, the world was delivered to the doorstep of Computer Science sometime in the last 20 or 30 years, and for a long time that was thrilling because it had no downside. There were all these interesting new jobs and interesting new science, and in many ways now the bill is coming due, and our book is about that bill and how we might pay it. The other part of it, and I don't like to use this term, is that more scientists need to think about becoming public intellectuals, much more involved in the uses and misuses of their technology, and in trying to help society solve the problems created by those technologies. There is still not a lot of that yet, and what there is, is still fairly superficial, and I don't say that to criticize, but you are starting to see Computer Scientists do things like write op-ed pieces, for example. We really are going to need technically trained people who are willing to spend their entire careers, or much of their careers, mediating between the technical state of the art and policy and social applications, and I think the community of people who work on these types of problems is starting to produce a Younger Generation who are willing to make that career choice. Maybe People Like Us are at the point in our careers where we can say, I can go do this and not worry about whether I still have a Research Career, but it's important, and it is starting to happen organically, though we are very early on.

We have time for just one last question. Mine is something completely personal, which is: you work on research for quite some time, and then you work on a book for quite some time, and you throw a lot of yourself in there. I know that it's important work, and we've talked about that, but what about the spark of it? What is it about this subject matter, when you could have chosen any subject matter, that made this what you want to spend your days doing? What would the 16-year-old versions of yourselves say?

Yeah, good question. The 16- or 18-year-old version of myself wanted to be a mathematician. I started college as a math major and drifted toward Computer Science when I realized you could think mathematically about computation. Heading in that direction, I realized there was something called Machine Learning, and I took a course where the theory in the textbook was cool: you could prove mathematical theorems about how machines might learn. By the time I got to grad school I wanted to apply this mathematical, computational lens, so I started working on [inaudible], which was just being defined. It was an exciting time, and I enjoyed thinking about and proving theorems about privacy, because you can reason about privacy using math, and more recently you can do the same thing with fairness. But there is a big difference between writing for an expert academic audience, where you try to define ideas very precisely and be concise, and writing this book, which was quite different. It was fun and liberating to try to write in an engaging way, and difficult but rewarding to try to describe these ideas, which are, at their roots, mathematical.
We tried very hard to remove every equation from the book, but I hope in the end we succeeded in conveying not just our natural interest in these topics, and we are lucky that these topics are not just interesting as mathematical curiosities but are real, meaningful, important issues of the day, but also the excitement of doing research in these fields, because we can take readers of the book up to the frontiers of knowledge, since so little is known so far.

My origin story is different. If you had told my 16-year-old self that at some point later in life you will write a general audience Nonfiction Book, that would have made more sense to me than if you had told me you will be a professor of Computer Science and Machine Learning, because in high school I was a very different math student. I did not like it much, did not try very hard, and was not good at it then, and I started college as a declared English major. I pretty quickly realized that I had chosen English because I wanted to learn how to write, and majoring in English would teach me how to read. At the same time I managed to hang on in math classes by my fingernails, and when I got to Berkeley I started taking more of them. Many people study math through high school and college and then realize there is this Phase Transition where things become much more interesting, where you become aware of the creative aspects of it. I also discovered there is a real buzz in being able to program a computer to do something you could not possibly do by hand in your entire lifetime, even if it is something stupid like sorting a list of numbers. I enjoyed that, and I hung around math long enough that the purely mathematical aspects became interesting in their own right. So in some vague way, writing this book does fulfill the kind of thing I wanted to do when I was very, very young.

One other comment I will make about this work we've been doing in fairness: I remember six or so years ago when we first started talking about some problem in algorithmic fairness, which led to our first publication, which was interesting but flawed. When you work on something like fairness, you would like to be able to say, six years ago I realized this was important for society and that we have a responsibility to fix these problems, and that even if the research had turned out to be boring, if the problems were too easy or too hard, or there was nothing you could do, or the solutions were clear and technically straightforward and it was just a matter of going out into the world and getting people to adopt them, I would like to claim that come hell or high water I would have said, no, this is what we have to do, this is what we do as responsible citizens. I don't have to find out, because it did turn out to be a mathematically and algorithmically very rich field. But it is great to be able to work on a topic that society is interested in, in both positive and negative ways, where there is really interesting Technical Work to do that is creative and satisfying, and also to be able to do it with somebody you are so simpatico with, from both a technical and an expository standpoint. It has been a great deal of fun.

Thank you. I think we all feel the same way, and to be able to hear your ideas and have them shared directly with us means a lot. Congratulations on the book, and thank you for coming.

Thank you for hosting us. [applause] Thank you, everybody.
