“Elon Musk doesn’t really deserve to have a voice in the public discourse about machine learning. He’s not an expert…”
Professor Zachary Lipton is an Assistant Professor in the Tepper School of Business at Carnegie Mellon University, with an appointment in the Machine Learning Department. He recently completed four years of PhD studies at UC San Diego’s Artificial Intelligence Group.
His research interests are eclectic, spanning the methods, applications, and social impacts of machine learning (ML), but a few notable clusters stand out. He is especially interested in modeling temporal dynamics and sequential structure in healthcare data, e.g., Learning to Diagnose. Additionally, he works on critical questions related to how we use ML in the wild, yielding The Mythos of Model Interpretability, and more recent work on the desirability and reconcilability of various statistical interpretations of fairness.
He is a native of New Rochelle, New York, attended Columbia University as an undergraduate, and is a jazz saxophonist.
Terrance Jackson: What is the difference between artificial intelligence, machine learning, and deep learning?
Zachary Lipton: From the crazy way these topics are covered in the media, it can be hard to tell what the various terms mean. Often they are compared to each other, e.g., what deep learning can do vs. what machine learning can do. The most faithful, simple way to put it is that they have a subset relationship. AI was a field long before people were interested in machine learning. It encompasses the study of how to do, with machines, all the things that we think require something like human intelligence. Of course, that makes it a bit of a moving target. Once we know how to do something well, such as playing chess, we sometimes don’t subsequently view it as a critical piece of AI.
Machine learning (ML) is a specific set of techniques concerned with learning from data. You could even think of ML as programming with data. For simple tasks, we can just write a program that performs the task exactly. Take, for example, an email client. It needs to store the partially written message somewhere, to record keystrokes as they come in, and then, when someone hits “send,” to package up the email into packets and send it over the internet to the server corresponding to the recipient. For more complicated tasks, nobody knows how to just sit down and write a bunch of rules that will perform the task, such as taking one minute of raw audio and outputting the corresponding text. Similarly, nobody knows how to write a program that recognizes faces in photographs.
However, in both of these cases we’re able to produce ground-truth examples of the task being done correctly. We can transcribe raw audio ourselves. And we can identify faces in photographs ourselves. In ML, we don’t tell the machine exactly what to do. Instead we show it a large number of examples of the task being done correctly. We then use an algorithm that tries to extract a pattern from the data, such that it can infer the correct output (e.g. the text matching the audio, or the identity matching a photo) for a new, previously unseen example.
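This learning-from-examples idea can be sketched in a few lines of Python. The snippet below is a toy illustration, not anything from the interview: it uses a one-nearest-neighbor rule, one of the simplest learning algorithms, and the labeled points are invented for the example.

```python
# Toy supervised learning: a few labeled examples of a task done
# correctly, from which we infer the label of a new, unseen point.
examples = [((1.0, 1.0), "cat"), ((1.2, 0.9), "cat"),
            ((5.0, 5.2), "dog"), ((4.8, 5.1), "dog")]

def predict(point):
    """Return the label of the closest labeled example (1-nearest-neighbor)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(examples, key=lambda ex: dist(ex[0], point))
    return label

print(predict((1.1, 1.0)))  # -> cat
print(predict((5.1, 4.9)))  # -> dog
```

Nothing here was told the rule “points near (1, 1) are cats”; the pattern was extracted from the examples, which is the essence of the supervised learning Lipton describes.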
Deep learning is just a subset of machine learning. Specifically, it’s concerned with the methodology for constructing big statistical models called neural networks. The models are called “neural networks” because they are composed of many layers of “nodes” stacked together, each layer connected to the one above. Signal cascades through the network, passing from layer to layer, each of which transforms its input before passing the signal on to the next. The input can be some raw signal, like samples from an audio file or pixels from an image, and the output could be something very abstract, like a set of categories. In neural networks, we generally update the strengths of all the connections between these nodes over and over again to try to make the output match up against the ground truth.
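To make the “layers of nodes” picture concrete, here is a minimal sketch of a signal cascading through a two-layer network. The weights are hand-picked for illustration only; in practice, as Lipton says, they are adjusted repeatedly from data. With these particular weights the network computes XOR, a function no single layer of this kind can compute.

```python
# A tiny fixed-weight neural network: two inputs, a hidden layer of
# two ReLU nodes, and one linear output node. The weights below are
# hand-set for illustration; real networks learn them from data.

def relu(z):
    return max(0.0, z)

def layer(inputs, weights, biases):
    # Each node computes a weighted sum of the layer below, then ReLU.
    return [relu(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def network(x1, x2):
    hidden = layer([x1, x2], weights=[[1, 1], [1, 1]], biases=[0, -1])
    # Output node: a linear combination of the hidden activations.
    return hidden[0] - 2 * hidden[1]

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), "->", network(a, b))  # XOR: 0, 1, 1, 0
```

Stacking layers is what lets the network build up abstract outputs from raw inputs; training consists of nudging the weights until outputs like these match the ground truth.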
TJ: How did you get interested in machine learning?
ZL: Prior to starting a machine learning PhD, I had been a jazz musician for many years. I did my undergraduate education at Columbia in Economics/Mathematics and had a brief exposure to computer science (2-3 undergrad classes), but I wasn’t really much of a programmer and had never done any machine learning work to speak of. I did, however, have some friends in PhD programs, mostly in the life sciences, and through them I became friends with Julio Fernandez, a professor of biophysics at Columbia.
Then, when I was about 26, I got knocked out of commission for about half a year, and I had some time to re-evaluate my life. I was running out of money, was perpetually intellectually unsatisfied, and was living in a rent-stabilized hole on the Lower East Side that had a horrible mold problem. In that climate, I went out and visited a friend who was doing a PhD at UC Santa Cruz in music composition. After a couple of weeks hanging out in California, eating healthy food, sitting by the ocean, and meeting PhD students every day and having interesting arguments, I quickly realized that I’d be happier in an academic environment. The only question then was determining precisely what I wanted to study for a PhD. When I got back to NY, I made arrangements to break my lease early, and determined that Computer Science was the right target for me, and that machine learning was the right subfield.
In particular, I’m drawn to machine learning because of the ability it gives you to engage meaningfully in lots of problems both inside and outside of computer science. If you do computer hardware, it’s hard to convince people interested in policy that they should care about what you have to say. But at this moment (perhaps it’s easier to see now than it was in 2012), if you understand machine learning deeply and occupy the right sweet spot between understanding the theory and methodology on one hand, and the needs of various application areas on the other, then you have something to contribute in so many different areas: ML is a hot topic in medicine, policy, natural language processing, computer vision, robotics, and even to some degree in music and fine art.
TJ: What does a “Mad Scientist” do at Amazon?
ZL: My role at Amazon came about while I was still in my PhD. I was 3.5 years in but had already established a certain degree of research independence: I generally worked on my own problems, forged my own collaborations, and wrote my own papers. So after finishing up a seven-month stint as an intern at Microsoft Research, Anima Anandkumar, a Caltech professor and Principal Scientist at Amazon AI, asked me to join her newly formed team and do research. I needed to get back to San Diego since that’s where my life was, so we agreed I could be a part-time intern while working on my PhD. It ended up being a wild three months. We got some good research done, and at the same time I was invited to apply for a professorship at Carnegie Mellon University (CMU), which ultimately came through. By the end of the three months, I was helping to manage some of the research direction and was a soon-to-be professor. Meanwhile, the team was growing rapidly and our research ambitions were growing hand-in-hand. Since I hadn’t even made my thesis proposal yet, I couldn’t have graduated any sooner than the winter quarter, so Amazon AI made me an offer to stay on until I graduated and started at CMU.
TJ: How does Carnegie Mellon compare to UC San Diego, Columbia University, and New Rochelle?
ZL: It’s hard to compare a university to a high school. While New Rochelle High School had some wonderful students and some wonderful teachers, it’s hard for a high school to offer much to someone who is going to go on and do math or science in a serious way. I spent most of middle school and high school being rather bored, and stopped doing my homework, mostly as a form of protest against what felt like a decade-long onslaught of busy work. The only times I learned much in my pre-university education were those times when the teacher left us alone to do whatever we wanted. We had such a teacher in the full-time Kaleidoscope program when I was in 4th-5th grade. The next time I learned anything was as a musician. The beautiful thing about playing jazz was that most of the information you needed was on the records; you just had to listen really hard and work out what people were doing.
When you get to a PhD program, nearly every person you meet was the strongest student, or among the handful of strongest students, in their pre-college education. You could even be the strongest student to pass through your high school in a few years and still not be an especially strong PhD student. I wish the school system had had more to offer us when I was coming through it. I know it sounds harsh, but I really got very little out of my pre-university education, and when I see how much stronger the math and science education is in some other countries, it’s hard to see how we can be competitive while continuing to waste so much of our best students’ time.
I’d advocate letting students pass out of classes by taking final exams, giving students more access to online courses (there’s such a huge wealth of resources that weren’t around when I was coming up), and just generally thinking more seriously about the unacceptable degree to which the current system holds back the strongest students. That sounds harsh, or perhaps bitter, and maybe when I was in my early 20s I was a bit resentful of the school system I came through. Now I can look at things a bit more dispassionately, but I still think it’s a crisis.
Of course, there are bigger issues than how well the best 1 in 100 or 1 in 1,000 students fare. My personal politics are such that I care a lot more about how people in the middle do, the extent to which we pick up people at the bottom, and the extent to which children from more adverse socio-economic conditions get equal opportunities. But since you’re interviewing me about my experience as a machine learning scientist, I feel it’s best to respond candidly about the system from my perspective.
Columbia, UCSD, and CMU have all been amazing places in very different ways. Coming out of high school, I found Columbia an amazing place to explore. I could learn new math from teachers who had won major prizes in mathematics and study economics with world authorities. Being in NY, it was the perfect place to be a gigging jazz musician while simultaneously pursuing more secular academic goals. UCSD was a different experience. I had already been out of school for maybe 6-7 years and was (momentarily) disinterested in big-city life. Being right on the beach, in a city with ideal weather year-round, I couldn’t think of a nicer place to get a fresh start in life.
There are many significant differences between public and private schools, and I was glad to experience public school life. Among other things, the cost per student is quite low compared to private schools, so even if your advisor doesn’t have funds for you, covering your tuition is pretty easy. There’s a degree of transparency that can cut both ways but is mostly cool. It’s also nice to be at a school that has a big role in the community. While the PhD student body is from all over the world, the undergraduate program draws heavily from the San Diego area and California more broadly (partly because in-state tuition at a UC school is an amazing deal), and I enjoyed being part of a regional community.
I’ve only been at CMU for a little while, having started my professorship in January, so I’m not quite an expert, but so far I love it here. One amazing thing about CMU is that technology runs in its veins. Every department is crawling with activity. While other schools have a School of Engineering, a Department of Computer Science, and (maybe) a machine learning research group, CMU has a School of Computer Science and a full Machine Learning Department. Additionally, there’s machine learning activity in the Tepper School of Business, the Heinz School of Public Policy, and the Philosophy Department. Within the School of Computer Science, there’s also tons of ML activity in the CS Department and the Robotics Institute. It might be the most open school I’ve ever been to in terms of interdisciplinary cross-department collaboration, so that’s a major draw for freaks like me with lots of interests. At CMU there’s nothing unusual about a grant co-written by computer scientists, musicians, statisticians, etc.
TJ: Do you still find time to perform as a Jazz saxophonist?
ZL: Once I started traveling all the time for work, and once my work schedule started becoming extremely unpredictable, I pulled back on playing gigs. The main obstacles are (1) time, and (2) wanting to make only commitments I’m confident I can follow through on. Perhaps once I build a lab with some amazing students all of whom are smarter than me, they’ll drive the research and give all the talks, and then I’ll get to play more jazz 🙂 .
TJ: Can you briefly explain your new research with John Alberg of Euclidean Technologies where you used machine learning to forecast the future fundamentals of companies?
ZL: So I had spent a couple of years of my PhD research investigating applications of Recurrent Neural Networks (RNNs) to time series data. RNNs are basically a kind of neural net that’s especially well suited to dealing with sequences. Paragraphs can be viewed as sequences of words, videos as sequences of photographs, and medical records as sequences of observations and treatments. I had done a lot of work specifically on medical time series, and John and I wanted to see if the same tools could do a good job of modeling financial time series. At first we tried to predict the future price. The idea is that if you had good predictions of which stocks would go up vs. down, you could make smart investment decisions. It turns out that doesn’t work so well.
Instead, we ended up coming up with the idea of predicting the future fundamentals. Those are reported financial figures like income, debt, etc. Then, once we’ve predicted the future fundamentals, we construct an investment strategy that decides which stocks to pick based on the forecasted fundamentals and their relation to the current price. It turns out that this approach performs much better out-of-sample. In other words, it produces strategies that do well not just during the training period but also as fresh data comes in: the strategies appear both stronger and more robust to drifts in patterns over time.
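The core mechanism of a recurrent net, a hidden state carried forward through the sequence, can be sketched in miniature. This is only an illustrative toy, not the model from the paper: the weights are fixed scalars rather than learned matrices, and the quarterly “fundamental” values are invented numbers.

```python
import math

# Toy recurrent step: the network keeps a hidden "state" that is
# updated as each element of the sequence arrives, so information
# from earlier time steps can influence the final prediction.
# Scalar weights are fixed here for illustration; in a real RNN they
# are learned matrices, adjusted by training.
W_IN, W_STATE, W_OUT = 0.5, 0.8, 1.0

def rnn_forecast(sequence):
    state = 0.0
    for x in sequence:
        # Combine the new observation with the summary of the past.
        state = math.tanh(W_IN * x + W_STATE * state)
    return W_OUT * state  # read a forecast off the final state

# A sequence of quarterly fundamental values (invented numbers):
print(rnn_forecast([0.2, 0.4, 0.5, 0.7]))
```

The same loop handles sequences of any length, which is what makes RNNs a natural fit for medical records and quarterly filings alike; in the actual research the forecasted fundamentals then feed a stock-selection rule rather than being used directly.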
[Click here for Bloomberg podcast about the research]
TJ: There have been a number of articles [1, 2, 3] that reported biases with the computer algorithms used in the criminal justice system. You attended a conference at NYU about the issues of transparency and bias with computer algorithms. You wrote a post about the conference where you wrote:
Personally, I’m inclined to believe that the entire practice of risk-based incarceration is fundamentally immoral/unfair, issues of bias aside.
What can be done to address the issues of transparency and bias in computer algorithms that are increasingly impacting people’s lives?
ZL: These problems are very real, but there’s a lot of magical thinking in the technical community about how to fix them. The challenge here is that to make an impact you need to understand the technology very deeply, enough to know what its fundamental capabilities and limitations are, and you also need to understand the social problem well enough to know how these capabilities square against social desiderata (Latin: “desired things”).
In many of these cases, where we’re concerned with whether decisions are just, I think the current breed of learning algorithms, which are just trying to predict some label (say, future arrest, or what a human judge might say), are fundamentally the wrong tool. They will pick up whatever problems exist in the underlying data. For example, black people may be more likely to get arrested because of current policing patterns that target black neighborhoods, and thus might be overrepresented in the ground truth data as “positive examples” of recidivism.
Supervised learning, which is all these current algorithms are doing, by itself has no way of knowing about this underlying mechanism and the various social forces at work. While people are very excited about designing “fair algorithms,” much of this work is built on shaky foundations and misconceptions about both what ML can do and what the real problems are. Ultimately, I think the more critical question might be when we should vs. should not use (current) machine learning algorithms.
TJ: In your post, “The AI Misinformation Epidemic,” you wrote:
[S]ee the recent Maureen Dowd piece on Elon Musk, Demis Hassabis, and AI Armageddon in Vanity Fair for a masterclass in low-quality, opportunistic journalism.
What exactly is wrong with this article? And is the problem Maureen Dowd’s lack of understanding about artificial intelligence or Elon Musk’s fearmongering?
ZL: The short answer is that they are both problems, and they compound each other. Elon Musk doesn’t really deserve to have a voice in the public discourse about machine learning. He’s not an expert, and his primary achievement in the area is that he pledged a lot of money to fund AI research. But the media hype about ML and the number of journalists hungry to spin out click-bait amplify his voice. Elon has a certain iconic status, and that means there are clicks in a story about him and whatever sensational hooey he happens to be spinning at the moment.
Unfortunately, while we have certain conceptions about the role of the press, they don’t square well with its business model. There are a few really talented journalists covering AI, but too often their more sober stories are drowned out by the clowns. It’s a daunting situation, but perhaps over time audiences and editorial boards will get a bit wiser, and the talent pool among AI writers will get a bit deeper.
TJ: Thank you, Professor Lipton, for your time. Is there anything else that you would like people to know?
ZL: Great meeting you! I guess for young people in the audience interested in learning about machine learning, I’d point out that there’s never been a better time to be young and autodidactic. Even just Wikipedia, compared to anything I had access to as a high schooler, is game-changing. Moreover, the number of high-quality courses on basic maths, computer programming, and machine learning available on YouTube is amazing. Whether you’re looking to understand ML vis-à-vis the business climate, to fact-check claims in a news article, or to become a scientist yourself, the information is all out there if you’re looking for it.