The alignment problem. He is the author of the bestselling 2011 book The Most Human Human, and his new book asks: if we continue to rely on artificial intelligence, what happens when AI itself becomes the problem? The Alignment Problem examines the replication of human bias in the machines we train to make decisions for us. We will be discussing a lot in the next hour, and please put your questions in the text chat.

Thank you, Brian, and welcome, and thank you for joining us. This is not your first book, so the obvious question: why did you decide to tackle this topic now?

Great question. The initial seed for the book: as you mentioned, Vanity Fair had reported that Elon Musk was reading the book, and I found myself in 2014 attending a Silicon Valley book talk with a bunch of investors. The organizers had seen the thing about Elon Musk reading the book, so they invited him, and to my surprise he came. There was a fascinating moment at the end of the dinner, when the organizers had thanked me and everybody was getting up to go home for the night, and Elon Musk forced everyone to sit back down and said, no, no, no, but seriously, what will we do about AI? I will not let anyone leave this room until you either give me a convincing argument why we shouldn't worry, or an idea for something we can do about it. It was quite memorable. Everyone was sort of drawing a blank. I was aware of the conversation: for some people it is about a human-extinction-level risk; others are more focused on the present-day ethical problems. But no one could say, I have a reason why we shouldn't worry, or, I have a suggestion of what to do. So his question — so seriously, what's the plan? — haunted me as I was finishing my previous book, and I began to see, starting around 2016, a dramatic movement within the field, both on the ethical questions and on the further-into-the-future safety questions.
Both of those movements have grown explosively between 2016 and now, and the questions of ethics and safety meet in the alignment problem: how do we make sure the objective that is carried out is what we intended for it to do? It has gone from a marginal, philosophical concern to the central question of the field. I wanted to tell that story and, in a way, figure out an answer to his question. What's the plan? What are we doing?

As I was getting into this — there is complex technology going into these systems in society today. You have a number of examples of what they are hoped to do, but one of them could not be more timely, which is the risk-assessment algorithms, debated just here in California, used to make a kind of probation and parole decision.

And as I say in the book, that started in the 20s and 30s, but really took off with the rise of personal computers in the 80s and 90s, and today it's implemented in almost every jurisdiction in the U.S. — municipal, county, state, federal. And there has been increasing scrutiny that's come along with it. It's been interesting watching the public discourse on some of these tools. For example, the New York Times Editorial Board was writing, up through about 2015, these open letters saying it's time for New York State to join the 21st century — for equal opportunity, et cetera. What does it mean to translate those decisions into the language of statistics, and how do we look at a tool like this and say whether we feel comfortable actually deploying it?

It was interesting when you were giving examples of a Black suspect and a white suspect with a similar crime and background, and how much more likely the white suspect was to go free, including one of the white suspects [inaudible].

This is a very big conversation. I think one way to start is to look at the data that goes into these systems. One of the things they are trying to do is predict — typically it's predicting three different things. One is your likelihood of not making a court appointment. The second is committing a nonviolent crime. The third is committing a violent crime.
If you look at something like failure to appear in court, the court knows about it by definition. If you look at something like nonviolent crime, it gets trickier. Take this case, for example: if you poll a young white man and a young Black man in Manhattan on their rates of marijuana usage, they self-report that they use it at the same rate, and yet if you look at the arrest data, the Black person is fifteen times more likely to be arrested for using marijuana. In other jurisdictions the figure varies from place to place. So that's a case where it's important to remember that the model claims to be able to predict crime, but what it's actually predicting is rearrest, and rearrest is systematically skewed.

It's ironic to me, because as part of the project of researching these systems I went back into the historical literature, and at the time a lot of the objections came from conservatives, from the political right, making the same argument progressives are making now but from the other side. So conservatives in the late 30s were saying, wait a minute: if a bad guy is able to evade arrest, then the system treats him like he's innocent and will recommend the release of other people like him. And if someone is wrongfully arrested and convicted, they go into the training data and it will recommend the detention of other people like them. It's the same argument framed a different way, but that is a problem, and we are starting to see groups like, for example, the Partnership on AI — a nonprofit industry coalition with a hundred different stakeholders — starting to grapple with it.

The second component worth highlighting is this question of what you do with the prediction once you have it. So let's say you've got a higher-than-average chance that you are going to fail to make your scheduled court appointment. That's a prediction. There is a second question, which is: what do we do with that information?
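The gap between predicting crime and predicting rearrest can be made concrete with a little arithmetic. This is a back-of-the-envelope sketch: every number below is invented for illustration except the fifteen-to-one arrest disparity mentioned above.

```python
# Illustrative sketch: two groups with identical underlying behavior but
# very different arrest rates. A model trained on arrest records learns
# the arrest disparity, not the behavior.

def expected_arrests(population, usage_rate, arrest_rate_if_using):
    """Expected number of marijuana arrests in a group."""
    return population * usage_rate * arrest_rate_if_using

# Two hypothetical groups of 10,000 people, self-reporting the same 15%
# usage rate. The per-user arrest probabilities (0.01 vs 0.15) are made
# up, chosen only to reproduce the 15x disparity from the conversation.
arrests_white = expected_arrests(10_000, 0.15, 0.01)
arrests_black = expected_arrests(10_000, 0.15, 0.15)

# Training labels built from these arrests encode a 15x "risk" difference
# even though the underlying behavior is identical.
print(arrests_white, arrests_black, arrests_black / arrests_white)
```

The point is not the specific numbers but the structure: the label the model sees (arrest) is a filtered version of the thing it claims to predict (crime), and the filter itself carries the bias.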
It turns out that if you send someone a text message reminder, they are more likely to show up for the court appointment, and there are people proposing solutions like daycare services for their kids, or providing them with subsidized transportation. So there's a separate question: as much scrutiny as is being directed at the algorithmic prediction, there's a much more systemic question, which is what we do with those predictions. If you are a judge and the prediction says that this person will fail to reappear, you might want some kind of text message alert as an option, as opposed to jail, but that may or may not be available to you in that jurisdiction, so you have to work with what you have. That isn't the algorithm's fault per se, but the algorithm is sort of caught in the middle, if you will.

You talk later in the book about hiring, and about Amazon coming up with a tool to screen job applicants, and what they were finding. The reasons for this were also baked into the way the system was being trained and the way the system was being used, and when you get to this you also have the question about the end goal: why were you trying to find people like those you already had? Tell us about that — how did the bias get in?

This is a story that involves Amazon in 2017, but by no means are they a unique example. It happens to be the example of Amazon, but like many companies they were trying to design something to take a little bit of the workload off of the human recruiters. If you have an open position, you start getting x number of resumes, and you would like some kind of system to do the triage and tell you: these are the resumes worth a closer look.
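The triage idea can be caricatured as word-weight scoring: learn which resume words correlated with past hires, then score new resumes by those weights. All the weights below are invented to show the mechanism; in the real systems they would be fit to historical hiring data, which is exactly how the historical bias gets in.

```python
# Caricature of resume triage by learned word weights (weights invented).
# Words common on past hires' resumes get positive weight; words absent
# from them (like "women's") end up penalized.
weights = {
    "executed": 0.8,   # verbs more typical of past (mostly male) hires
    "captured": 0.7,
    "women's":  -0.9,  # rarely seen on previously selected resumes
}

def score(resume_text):
    """Sum the learned weights of the words appearing in a resume."""
    return sum(weights.get(w, 0.0) for w in resume_text.lower().split())

a = score("executed a go-to-market strategy and captured new accounts")
b = score("captain of the women's chess team and executed a strategy")
print(a, b)  # the second resume is dragged down by the word "women's"
```

Deleting one bad weight doesn't fix this, as the Amazon story shows: the bias resurfaces through correlated words (field hockey, women's colleges, verb choice), because the underlying objective is still "resemble past hires."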
In the same way that they rate products, they wanted to rate resumes. But to do that, they were using a kind of computational language model called word vectors. Without getting too technical: the neural network models that were so successful around 2012 also started to move into computational linguistics, and in particular there was a remarkable family of models that represented words as points in space. So if you have a document, you could predict a missing word based on the others that were nearby. Famously, you could take the point for king, subtract man, add woman, do a search for the point in space nearest to that, and you would get queen. You could do Tokyo minus Japan plus England and get London. So these numerical representations of words fell out of this network and ended up being useful for a surprisingly vast array of tasks, and one of those was trying to figure out the relevance of a resume. One way to do it is to say: here are the people we've hired over the years, and then, for any new resume, ask which of its words carry a kind of positive attribute and which carry negative attributes. Sounds good enough, but when they looked at this they found all sorts of bias. So for example, the word women's was getting a negative rating.

Because it is located further away from the more successful words that it had been trained to watch for.

That's right. It doesn't appear on the typical resume that would have gotten selected in the past. So of course they said, okay, we can delete this attribute from the model. Then they start noticing it's also applying deductions for things like field hockey, so they get rid of that, or women's colleges, so they get rid of that. And then they start noticing that it's picking up on subtle word choices that were more typical of the male applicants than the female ones — the use of words like executed and captured, as in, I executed a strategy. At that point they basically gave up and scrapped the project entirely.
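The king-minus-man-plus-woman arithmetic described above can be sketched with toy vectors. The three-dimensional vectors below are hand-made so the analogy works; real models like word2vec learn vectors with hundreds of dimensions from large text corpora rather than having them written by hand.

```python
import math

# Toy word vectors, invented for illustration; real embeddings are
# learned from text, not constructed like this.
vecs = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "person": [0.1, 0.5, 0.5],
    "child":  [0.2, 0.4, 0.4],
}

def cosine(a, b):
    """Cosine similarity: how close two directions in space are."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def analogy(a, minus, plus):
    """Return the word whose vector is nearest to vec(a) - vec(minus) + vec(plus)."""
    target = [x - y + z for x, y, z in zip(vecs[a], vecs[minus], vecs[plus])]
    candidates = [w for w in vecs if w not in (a, minus, plus)]
    return max(candidates, key=lambda w: cosine(target, vecs[w]))

print(analogy("king", "man", "woman"))  # queen
```

The same nearest-point machinery is what lets a resume screener treat some words as "closer to" past hires than others — which is exactly where the bias described above lives.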
In the book I compare this to something that happened at the Boston Symphony Orchestra. They decided to hold auditions behind a wooden screen, but of course the judges could still identify the women by the sound of their shoes on the floor; it wasn't until the 70s, when they instructed people to remove their shoes before entering the room, that the screen really worked. The problem with these models is the same. Whether the system is detecting the word captured or the sound of heels, whatever subtle pattern exists in the data, it's going to identify it and just sort of replicate it. In this particular case they walked away — they gave up and said, we don't feel comfortable using this technology. There is research on how you de-bias a language model: given these points in space, researchers try to identify the dimensions within the space that encode gender and remove them.

How much did Amazon spend developing that?

They are pretty tight-lipped about it. As I understand it, they simply had to wash their hands of it. I assume millions were put into it — money that could have hired another team of recruiters.

Another example I want to get into is the self-driving car crash in Arizona in 2018, when the first pedestrian was killed by an autonomous test vehicle. Fortunately I was able to get some of that into the book, and that entire sweep of events might have ended very differently. One of the things that was happening was that the car was using a neural network to do object detection, but it had never been given an example of a jaywalker. In all of the training data, people walking across the street were perfectly correlated with crosswalks and perfectly correlated with intersections, so the model didn't know what it was seeing when it saw this woman crossing in the middle of the street.
Most object recognition systems are taught to classify things into exactly one of a discrete number of categories, so they don't know how to handle something that seems to belong to more than one category, or that doesn't seem to fit any category. This is, again, one of those active research problems. In this particular case, the woman was walking a bicycle, and this sent the object recognition system into a kind of flickering state: first it thought she was a cyclist, but she wasn't moving like a cyclist; then it thought she was a pedestrian; then it thought maybe she was just some object. Due to a quirk in the way the system was built, every time it changed its mind it would reset the motion prediction. It is constantly predicting where the object will be a couple of seconds from now, but every time it changed its mind it started recomputing that prediction, so it never stabilized on a prediction. There were additional engineering decisions the team had made, layering their own systems in, but I think the object recognition failure itself is, for me, very emblematic. There is a question of certainty and confidence: how does the system know what to do with these categories, and how do these systems deal with uncertainty? The mere fact that you are changing your mind should be a huge red flag. It's very heartbreaking to think about how all of these engineering decisions added up to this event.

We have to get to the bottom of this certainty and uncertainty, because I think that is a very human thing. You don't want to take a high-impact action — what's the term, a preemptive judgment — in advance of deciding what the real situation is, because you are trying to prevent irreparable harm. A very high-impact situation requires us to quantify impact and uncertainty and have a plan for what to do. These pieces need to come together, and we see progress being made on all of those fronts, but it can't happen soon enough.
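The failure mode described here — each reclassification wiping the motion history — can be sketched as a toy tracker. This is an illustration of the general mechanism only, not the actual vehicle's software; the class labels and positions are invented.

```python
# Toy tracker that resets its motion history whenever the classifier
# changes its mind, so the trajectory prediction never matures.

class NaiveTracker:
    def __init__(self):
        self.label = None
        self.history = []  # positions observed since the last reclassification

    def update(self, label, position):
        if label != self.label:
            # Reclassification: discard everything learned so far.
            self.label = label
            self.history = []
        self.history.append(position)

    def predicted_velocity(self):
        """Needs at least two observations under one stable label."""
        if len(self.history) < 2:
            return None
        return self.history[-1] - self.history[0]

tracker = NaiveTracker()
# Someone moving steadily across the road while the classifier flickers.
for label, pos in [("vehicle", 0.0), ("bicycle", 1.0), ("other", 2.0),
                   ("bicycle", 3.0), ("pedestrian", 4.0)]:
    tracker.update(label, pos)

print(tracker.predicted_velocity())  # None: no label ever held for two frames
```

Had the label held steady for even two frames, the tracker would have seen the motion immediately; the flicker, not the motion itself, is what starves the prediction.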
Is this example we've talked about from the book the general problem that you think needs to be addressed? And then I'm going to ask about what's known as the alignment problem, where the book gets its title: how do we make sure the objective in the system is exactly what we intend?

I think all of the examples that we've discussed so far have shown us cases where one must be very careful. We think we can measure crime, but we can only measure rearrest. We think we can hire promising candidates, but we can only find candidates who superficially resemble the previous candidates. We put things into categories, but we don't always know what category to put them in. There were many other manifestations as well that speak to this fundamental issue of alignment. Sometimes there is a problem with the model architecture — there's a kind of black-box issue of explainability: how can we trust the output? Sometimes it's the objective we choose to minimize or maximize. Every component of the system had its own manifestation of the alignment problem.

That, for me, is really the striking thing about where we are now: it has been a remarkable shift. As one researcher I talked to described it, he came back a year later, in 2017, and there was an entire day-long workshop on the topic; by 2018 it was a significant fraction of the field. The number of people working on this is still quite small, but even over that short time, the growth is to my mind astonishing. It can't come soon enough, so I encourage all motivated undergrads and high school students to get excited, because there is a lot of work to be done.

In the AI research and development field, is the actual commercialization of the technology ahead of where it should be? Should it be modeled in the lab and not on the road or in the courtroom?

That is a great question, and the 85-year history of the field at this point says we are still playing catch-up: the analysis comes after the deployment; the understanding catches up to the actual implementation. And I think we've seen that with social media.
There were decisions about how to run the news feed algorithm — the details are somewhat technical, and it went from supervised learning to reinforcement learning — but basically the narrow-minded focus on always prioritizing the content that will get the most clicks created a situation where click-maximizing content was promoted and people were being burned out and leaving the platform, in addition to the other kinds of societal externalities it was creating. The objective was to hold attention, and these systems served that objective. I think there is a question, when you think about the alignment problem — is the system doing what we want? — that, when we look at the actual industry, becomes: what is it that we want the system to be doing? We urgently need to be thinking about this.

One of our audience members asked about China and the widespread use of facial recognition. We talked about facial recognition technology and the inappropriately funny results, which were absurd but also insulting. Can you talk a bit about facial recognition? It's another thing that, in fact, became a ballot proposition — whether or not to use these technologies.

There was this unfortunate and hard-to-ignore pattern of ethnic minorities being incorrectly recognized or categorized by face recognition. One of the famous examples was a software developer in 2015 with a group of photographs he had taken, where he was captioned by Google Photos as gorillas. Also, one MIT researcher, as an undergraduate computer scientist doing a facial recognition homework assignment, had to borrow her roommate's face to check that her code worked, because it didn't work on her own face — or only worked if she wore a white mask. This prompted the investigation of why. What is the underlying thing?
There are a couple of different components, but one of the main ones is that there was a somewhat lackadaisical attitude about how the datasets were put together in the first place. What led to the rise of computer face recognition was the internet: if you needed half a million examples to train your system in the eighties, you were out of luck, but now, with the internet, you just download a million faces and put them into your system. The most popular research database was one called Labeled Faces in the Wild. What they wanted to do was understand whether two images show the same person, so they used newspaper images, because those come labeled: this is this person, this is that person. That way we can decide whether two images match. But you are at the mercy of whoever was on the front page in the early 2000s — and that was George W. Bush. An analysis done a few years ago showed there were twice as many pictures of George W. Bush in the database as of all Black women combined. That is just insane if you're trying to build something fair. The people who collected the data did it as an academic research project, never intending it to be used in a deployed system, but somebody just downloads it off the internet. And it's very striking if you look at the original papers — I don't want to single them out, because this is widespread — that in the early 2010s the word diversity is used to mean diversity of lighting and pose: what they mean is, we have people photographed in the light and in the dark. By the end of the 2010s and the beginning of the 2020s, it is very striking that some of these databases now appear with a warning label that says: when we said diversity, we meant a very specific thing, and this dataset is not diverse in demographics. There is a lot of work being done there, spearheaded by groups at places like MIT and Google, to be more focused on equalizing the error rate among ethnic groups, to make sure that the database — the training data — represents the population being modeled, and also on the representation within tech itself.
In 2019, less than 1 percent of computer science PhDs went to African Americans, so there is a lot of work to be done in the field itself to address the question of representation. We see AI organizations with a number of initiatives, like scholarships and grants, trying to equalize that within the field.

There is a question from the audience about