For example, every eligible voter in the United States, with a depth of information that would have made J. Edgar Hoover weep with envy. [laughter] The use of this so-called big data and the inferences made from it has a singular purpose: to craft and deliver messaging that will shape the future behavior of an individual, from buying a particular brand of whitening toothpaste, to discouraging a citizen from choosing one ride-sharing service over another, to voting one way or another on profound decisions like the U.K. Brexit and the U.S. presidential election. These profiles are assembled from what our guest this evening calls our digital footprints: the traces of our daily lives, captured, often sold, and analyzed. Many of us do not even realize we are leaving these footprints behind through our use of credit cards, our web browsing, our online purchases, our smartphone use, what we listen to on streaming services, and what we watch on cable TV.

Our guest tonight, Dr. Michal Kosinski of Stanford University, will introduce us to his work making assessments of individuals from these digital footprints, particularly their Facebook likes and profile pictures. He will also help us begin to understand how assessments like these are being, and could be, used to shape our political and social reality. Join me in welcoming Michal Kosinski to the stage. [applause]

Michal: Hi, David. Good evening, everyone.

David: Thanks again for helping us peer behind this curtain of the use of big data and our digital footprints to assess and influence each other. Maybe we could begin by having you describe your work for us and what you believe can be learned about us from our Facebook likes and profile pictures.

Michal: Well, thanks again for having me here. I am a computational psychologist, which means that I am working mostly with data, in particular big data. So instead of spending time with research subjects in my lab, running experiments, or maybe learning about people using surveys, I look at the digital footprints that you so nicely introduced before, which we are all leaving behind while using digital products and services.
It's a great time to be a computational psychologist. It's a great time to be me at the moment. [laughter] Because you guys are, well, we are all leaving an enormous amount of digital footprints behind. Back in 2012, IBM estimated that we were leaving about 50 megabytes of digital footprints every day, per person, which is an enormous amount of data. If you wanted to back it up on paper, by printing it out on letter-sized paper, font size 12, and you wanted to stack it up, just one day's worth of data, the stack of paper would reach from here to the sun four times over. Well, hopefully you guys [laughter] will store them in the museum over here.

We're all generating an enormous amount of information. And now this information, of course, contains a trail of our behaviors, thoughts, feelings, social interactions, communications, purchases, even the things that we never intended to say. I'm not sure you realize that if you type a message on Facebook and then decide, OK, it's 2:00 a.m. and maybe I drank too much wine, I probably shouldn't be sending it, and you abandon the message and close the window, guess what: the message is still being saved and analyzed. And now, this is not just this one platform. In most cases data is preserved even if you think that you have deleted it.

In my research, my main goal is to try to take this data and learn something new about human psychology or human behavior. One of the byproducts of doing that is that I produce models that take your digital footprints and try to predict your future behavior, or try to predict your psychological traits such as personality, political views, religiosity, sexual orientation, and so on. Well, what was really shocking to me when I started working in this field is how accurate those models are. So this is one shocking thing. The other shocking thing is that those models are also very difficult to understand. I know a computer can predict your future behavior. A computer can reveal your psychological traits from your digital footprints.
But it's very difficult for a human scientist to now understand how exactly a computer is doing it, which brings me to this black box problem, which basically means that it might be that human psychologists or human scientists will one day be replaced by AI running science. But in the meantime, you have those models that we don't really understand very well how they do it, but that are amazing at predicting your future behavior, psychological traits, and so on.

I worked with Facebook likes a lot, not because Facebook likes are the best type of digital footprint we are leaving behind. Not at all. Facebook likes are not so revealing. Why? Because liking happens in a public space. So when you like something on Facebook, you probably realize that now your friends will see what you have liked. So you wouldn't like anything that is maybe embarrassing, or something really boring, or something that you want to hide from your friends. But now, when you use your web browser, or you search for something on Google, or you go and buy something, you basically have much less choice. You will search for things that you would never like on Facebook. You would visit websites that you would never like on Facebook, and you would buy stuff that you would never like on Facebook. For example, you would buy medicine that is revealing about your health. And most of us don't really like the medicine we are taking on Facebook.

Which basically means that if someone gets access to your credit card data, your web data, your search data, records from your mobile phone, these digital footprints would be way more revealing than what I can do using Facebook likes. Whatever findings I'm coming up with, they are just conservative estimates of what can be done with more revealing data.

You can actually see that entire industries, not just one industry, are moving towards basically building their models on top of the data we are producing. And my favorite example is credit cards. How many of you guys have actually paid for a credit card recently? OK. We have a few people that maybe didn't do their research online
properly, but most of us, including me, we don't pay for credit cards. Now, guess what. If you are not paying for something... think about it for a second. A credit card is just an amazing, magical thing that allows you to buy stuff without carrying cash around. There's a complicated network behind it, computers crunching numbers, and so on. And now, we're not paying for it. Why? Because we're the product of a credit card company. And it's not a secret. You can go to the website of Mastercard or any other credit card operator, and you will see that they do not see themselves as a financial company anymore. They started as a financial company; it was helping to channel payments. Now they see themselves as a customer insights company. By observing the things you're buying, when you are buying them, and how much you're spending, they can see a lot about you on the individual level, but they can also extract interesting information on the broader level. You know, if they see that recently people in San Francisco started buying certain things, or going to certain restaurants or whatnot, this is very valuable information that can be sold.

So basically, if you're not paying for something, you're most likely a product. So now think about your web browser, which you probably didn't pay for, your Facebook account, your searching mechanism, and every one of the gazillions of apps you have on your phone, and now think about how much data you're sharing with the others.

David: Is your use of Facebook... I guess at the time, initially, you were a graduate student at the University of Cambridge, correct? And at the time, I believe, Facebook likes were public. Anyone could see your Facebook likes. So did that make that kind of data available to you, since it was just public on Facebook? Is that what led you to use that data?

Michal: Yes.
There is another reason worth pointing out for why I was using Facebook likes, which is that I was very lucky to get a huge dataset from volunteers who donated their Facebook likes to me, as well as their political views, personality and other psychological scores, and basically other parts of their Facebook profiles. In 2006 or 2007, my friend David Stillwell started this online personality questionnaire, where you could take a standard personality test and then receive feedback on your scores. It went viral. More than six million people took the test, and half of them generously gave us access to their basic Facebook profiles. When you finished your test, we would ask you if you would be willing, in return for us offering this interesting thing, to give us access to your profile, which we would later use for our scientific research. More than six million people, in fact, took the test, and we got around three million Facebook profiles.

At the beginning, in fact... you know, people like to say, oh, when I graduated from high school, I had already planned this research that I would do 20 years later. No, it wasn't the case. In my case, I kind of stumbled into it; I kind of got into this research by accident. What happened is I was working on traditional personality questionnaires. And traditional personality questionnaires are composed of questions such as "I am always on time at work," or "I like poetry," or "I don't care about abstract ideas." And I had this dataset of Facebook likes, where basically people like poetry, or they like "I don't like abstract ideas," or "I don't like to read." What struck me is: why would we even ask people this question if we can just go to their Facebook profile, look at their Facebook likes, and just, you know, fill in the questionnaire for them? [laughter] So I started running those simple machine-learning models that take your Facebook likes and try to predict what your personality score would be.
It worked pretty well, which actually was pretty depressing for me, because I had spent so much time developing those bloody questionnaires, and a computer can do the same thing in a fraction of a second for millions of people. Then we started looking at other data we had in our dataset. We were like, OK, so it can predict personality. I wonder if it can predict religiosity, sexual orientation, whether your parents were divorced or not. And each time we asked this question, the computer would think for a few seconds and then say, yes, I can predict it. And the accuracy was amazing. We were pretty suspicious. So at the beginning I would rerun the models with independent pieces of data, or rewrite my entire code, thinking that I must be doing something wrong, given that a computer can look at your Facebook likes and predict with very high accuracy, close to perfect, whether you're gay or not. People don't really like anything obviously gay on Facebook. Well, some do, but it's actually a very small fraction of people. For most of the users we were running predictions on, this was really based on the movies they watched or the books they read.

It looked very counterintuitive to me at the time that you could do it. Now I'm a bit older and have spent more time running those models, and it's actually pretty obvious. Let me illustrate it for you; let me try to offer you a short introduction to how those models work. It's actually pretty intuitive. Look, if I told you that there is this anonymous person and they like Hello Kitty, it's a brand, I'm told, [laughter] you would probably be able to figure out, if you know what Hello Kitty is, that this person is most likely female, young, an iPhone user, and you can go from here and make some other inferences about her. And you would actually be very correct: 99% of people who like Hello Kitty are women. So you don't need to be a rocket scientist, or even a computer scientist, to basically make inferences of this kind.
Most of your Facebook likes, or most of your purchases on Amazon, or most of the locations that you visited with your phone, or most of the search queries that you put into Google, are not so strongly revealing about your intimate traits. But it doesn't mean they are not revealing at all. They are revealing; some of them to a very tiny degree, but they are still revealing. The fact that, let's say, you listened to Lady Gaga 30 times yesterday is not only a bit weird; it also shows us something about your musical tastes. A computer can take those tiny bits of information and aggregate them over thousands of digital footprints that you are leaving behind to arrive at a very accurate prediction of what your intimate traits are.

And basically, this is the paper that we published in 2013. I was very excited about the promise of this technology, and I am still excited about it. It is used to improve our lives in many ways; we often don't even realize how many different ways it improves our lives. Think about Netflix or Spotify, or the news feed, which is so engaging that people spend two hours a day on average, if I remember correctly, looking at it. Now, they don't look at it because it is bloody boring; they look at it because the AI behind it made a prediction about what your character is and adjusted the content in such a way as to make it most engaging. Now, there are also downsides, which we'll be talking about today.

And, well, basically, when the paper was published in 2013, it got a lot of press coverage, but at that time most of the press coverage was like: this is so cool, you can predict whether someone is, I don't know, a Republican from their Facebook likes. A nice shiny gadget. I was like, no, no, no, wait, you have to realize this has tremendous consequences for the future of society. And they said, no, it's so cool that you can predict it from likes, but this is as far as we go. But now, interestingly, this is how the general public treated the results, but policymakers and companies took notice. For instance, two weeks after the results were published, Facebook changed their privacy settings in such a way that Facebook likes were no longer public.
Before 2013, I think it was March when we published the paper, likes were public for everyone to see. I didn't have to be your friend on Facebook to see everything you liked. Now, our paper, our work, showed that by seeing what you liked, I can also determine your sexual orientation, political views, and other intimate traits that people are not happy to share. It was a great thing that Facebook took notice and, to preserve your privacy, basically switched that off. But you also have the U.S. and E.U. governments, which took notice and started working on legislation to protect their citizens from some of the shortcomings of this phenomenon.

David: Let's pivot and talk about political uses of this kind of work. I want to hear you talk about how private firms are using big data analytics to shift voting results one way or another by microtargeting messaging that is defined by its intended persuasion rather than its accuracy. One of these firms is Cambridge Analytica, which was involved in Brexit and the Trump campaign. Much of what we know about this developing story is due to recent investigative journalism, especially by The Guardian. I thought I could provide our audience with a quick review of the story before you tell us what you think about it.

Cambridge Analytica is a U.S. firm owned by Robert Mercer that until 2016 had Steve Bannon as vice president and secretary. Mercer is one of the most successful hedge fund managers, a part-owner of Breitbart News, and a supporter of the Trump campaign. Bannon left his executive positions at Breitbart and Cambridge Analytica. Cambridge Analytica reportedly employed social media data mining, along with government records and data sold by corporations, just as we discussed, to develop a dossier on every voter, first used by the Ted Cruz campaign and then the Trump campaign to microtarget and influence voters. Relatedly, a Canadian firm, AggregateIQ, has been a central consultant for this kind of work with the various U.K. organizations that pushed for the Brexit vote, and Cambridge Analytica appears to be the owner of AggregateIQ's intellectual property. Time magazine reported that U.S.
investigators are looking at Cambridge Analytica in connection with Russian activity in the U.S. presidential election, which evidently may have involved Russian elements using techniques like those used by Cambridge Analytica. In short, quite a tangled web. Tell us about how your work relates to this whole thing, and how we should think about Cambridge Analytica or the Trump campaign.

Michal: Very good questions. There was a lot there, but first of all, we don't know how effective Cambridge Analytica was. When you listen to Cambridge Analytica, they started out by saying how amazing and efficient they were, but when they realized that, you know... maybe when it became public that some things they had done were not entirely legal, they suddenly changed their spiel, and now they say it didn't work at all and they were just making stuff up. Which means they are either lying now or were lying then, so it is difficult to say.

What I can tell you for sure is that, first of all, we have a lot of evidence, which we have produced in our research, showing that such approaches work really well. We also see that it is not only the Trump campaign or the Brexit campaign; we see that all of the serious politicians are now employing methods of this kind in their campaigns. And in fact, Barack Obama was the first major politician to do it on a massive scale, and I don't really remember any outrage, especially on the left side of the political spectrum, at that time. Hillary Clinton not only spent three times more money than Donald Trump on doing targeting on social media, but also hired way better people, in my opinion. But she lost, and she didn't lose because Trump was using some kind of magical methods; the difference was massive and was caused by something else.

People ask me, can data analytics and political marketing win the election? The answer is, well, yes and no. Yes, because it is a fact of a political campaign; it is a fact of life when running a political campaign, like TV
spots and writing articles and putting ads in the papers. But no, because everyone is using it, so it is not giving anyone any unfair advantage, and the only unfair advantage here I can think of is Barack Obama, who was the first one to use it on a massive scale, which might have given him some unfair advantage then.

But also, I think that people, humans, we kind of like to focus on the negative. It is great that we focus on the negative; this is clearly a great psychological trait that allowed us to be successful as a species, probably even too successful to some degree. But let's put aside focusing on the negative and the risky, and think about the advantages of politicians being able to personalize their message. There are a few interesting outcomes of that that people fail to notice. One is, if I can talk with you guys one-on-one, and that is really what social media and algorithms help you do, talk one-on-one about the things most relevant to you, where the algorithms help me understand your character, your interests, your dreams, and your fears, to make my message more interesting and relevant to you, that, first of all, has one outcome, which is that messages became more important. In the past, I could say "Yes we can," spend a gazillion dollars on showing it on every TV station, and I could be successful. Moreover, I couldn't really do anything else, because I lacked the ability to communicate one-on-one. I had to settle on some kind of