83: AI Safety with Buck Shlegeris

Hosted by
Will Jarvis

In this episode, I’m joined by Buck Shlegeris, CTO of Redwood Research, to talk about AI safety, effective altruism, building an effective career, and how to seek out criticism. You can find Buck’s work at https://shlegeris.com/.


Will Jarvis 0:05
Hey folks, welcome to Narratives. Narratives is a podcast exploring the ways in which the world is better than in the past, the ways it is worse than in the past, and the path towards a better, more definite vision of the future. I’m your host, William Jarvis. And I want to thank you for taking the time out of your day to listen to this episode. I hope you enjoy it. You can find show notes, transcripts and videos at narrativespodcast.com.

Buck, how are you doing today?

Buck Shlegeris 0:45
I’m doing all right. How are you?

Will Jarvis 0:48
Doing great. Thank you so much for hopping on. I really appreciate it. Could you go ahead and give us a brief bio and tell us the big ideas you’re interested in?

Buck Shlegeris 0:58
Yeah, so I guess my brief bio is, I currently work as the CTO at Redwood Research, which is a nonprofit research institute doing some applied alignment research. I’m also a fund manager on the Effective Altruism Infrastructure Fund, where I make grants for various things related to effective altruism movement building and related types of movement building. I have been involved in AI alignment technical research for the last three or four years. Before that I was a software engineer. And before that, I studied computer science.

Will Jarvis 1:30
Awesome. Awesome. And when you mention alignment research, you’re talking about AI alignment, correct? Yeah. Not all our listeners are familiar with it. So could you talk a little bit about, you know, AI safety, why you think it’s important and how you got there?

Buck Shlegeris 1:46
Yeah, so I guess the basic pitch for AI safety starts by noting that a lot of the stuff which happens in the world is determined by cognitive labor. You know, cars have changed the world a lot, buildings have changed the world a lot. And many different types of work and management and resources are required to put these things together. But one of the key limitations is how good the ideas we have are for how to build these cars, how to design the machines that build these cars and that make it cheaper to mine, and so on. It seems reasonably likely to me that over the next century, it’s going to become the case that the primary way we get cognitive labor done is that, instead of paying humans to think about it with the brains which are operating inside their heads, you know, with 20 watts of power or whatever, it becomes cheaper to get machines to do the thinking for you, in the same way that it’s now cheaper to get machines to lift the blocks for you if you’re assembling a large building. And just as machines that made moving blocks around cheaper affected the size of the buildings which get built and made the whole economy a lot larger, I suspect that drastically reducing the price of doing cognitive work, by making it so that machines can do it, and also making it so that we can do types of cognitive work that are currently impossible, seems like it might change the world a lot. So that’s the case for AI being a big deal. I’d have to make a separate case for maybe this happening within the next century, but that’s part one of the case. And then part two is why I think that we should work on AI alignment. So by alignment, I mean something like, I want to make it so that as AI gets developed, it’s as easy to use it for many tasks that I care about as it is to use it for some other tasks.
So maybe the easiest way to explain this: sometimes people talk about the possibility that social media makes it easier to spread misinformation than to fight it. And this is an example of a case where you have this technology, and the technology itself isn’t particularly, like, it’s not like building bioweapons or something, where obviously the only application of the technology is destruction. But in some cases, technology makes it easier to destroy value, or technologies will lend themselves more easily to destroying value than they lend themselves to producing value. And for various reasons, I’m worried that by default, AI is going to be one of these technologies that might be easier to deploy in ways that destroy certain types of value than it is to deploy in ways that preserve that type of value. That’s perhaps a really abstract way of saying this. A concrete example is, I’m worried that it’s going to be easier to make an AI system where you ask it a question and it gives you back an answer that sounds really smart than to build an AI system where you ask it a question and it gives you back an answer that is, in fact, a really good idea. You know, you ask for some advice, and you want good advice out. But the problem is, the way you make AI systems do stuff is you train them. And I have an easy way of training a system to give advice that sounds smart to humans, which is, I get the AI to propose two different pieces of advice that it might give to a human. And then I pay some humans 15 bucks an hour to read the questions and the two proposed pieces of advice and click the advice that seems smarter. And pretty soon, this AI is going to give you advice that sounds really, really smart. But suppose I wanted to build a system that gives you advice that is actually good advice.
Um, it’s much harder, because among other things, I mean, the easiest case is when someone gives you advice, and you want to figure out if it was actually good advice or not, sometimes you have to wait to see what the effects of the advice are. And it’s fundamentally harder to train a system if you need to, like wait 10 years between starting the training process and finishing the training process.

And also, sometimes it’s hard to tell whether advice was good advice, even with the benefit of all hindsight. And so it’s just systematically harder to construct a training process which gets this guy to give you actually good advice. And a third kind of concern you might have is, if the advice is deceptive, if the advice gives the AI access to your bank account and lets it steal your money and start hosting itself on the internet by paying for its own servers to run on, you will never have a chance to analyze it. It just lives on the internet now. And so for a couple of these reasons, it feels like it might be easier to build these systems that give advice that sounds really good, but that you can’t really trust, than to build these systems that give advice that is actually good. And this is an example of a kind of fundamental-seeming hard problem. I’m worried that if you just have a bunch of stuff like that going on, it’s pretty dangerous and bad.
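
The training setup described above (raters click whichever of two proposed answers seems smarter) can be sketched as a toy pairwise-preference fit. This is only an illustration under stated assumptions, not how any real system is trained: the three pieces of advice, their "sounds smart" and "actually good" numbers, and the helper names are all hypothetical, and the update rule is a simple Bradley-Terry style gradient step chosen for brevity.

```python
import math

# Hypothetical advice: (name, how smart it sounds, how good it actually is).
ADVICE = [
    ("confident jargon", 0.9, 0.2),
    ("plain but correct", 0.4, 0.9),
    ("hedged nonsense", 0.6, 0.1),
]

def rater_prefers_first(a, b):
    """A paid rater clicks whichever piece of advice *sounds* smarter."""
    return a[1] > b[1]

def train_scores(advice, epochs=200, lr=0.1):
    """Fit one score per item from pairwise clicks (Bradley-Terry style)."""
    scores = {name: 0.0 for name, _, _ in advice}
    for _ in range(epochs):
        for a in advice:
            for b in advice:
                if a[0] == b[0]:
                    continue
                winner, loser = (a, b) if rater_prefers_first(a, b) else (b, a)
                # Probability the current scores assign to the observed click.
                p = 1.0 / (1.0 + math.exp(scores[loser[0]] - scores[winner[0]]))
                # Gradient step on the log-likelihood of that click.
                scores[winner[0]] += lr * (1.0 - p)
                scores[loser[0]] -= lr * (1.0 - p)
    return scores

scores = train_scores(ADVICE)
learned_best = max(scores, key=scores.get)
actually_best = max(ADVICE, key=lambda item: item[2])[0]
print("training signal favours:", learned_best)
print("actually best advice:   ", actually_best)
```

Because the only signal is which answer sounds smarter, the fitted scores rank the advice by how it sounds; the "actually good" column never enters training, which is the gap the conversation is pointing at.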

Will Jarvis 6:59
That, you know, that is such a great point. And I think about this in the context of my dog, right? You know, he had a skin condition recently and had to get a bath. He hates the bath, and there’s no way for me to inform him that the bath is a good idea. You know, it’s just above his cognitive pay grade. And there could be things like that where, you know, the AI tells you something, it’s a great idea, and you have no way of understanding it.

Buck Shlegeris 7:23
That’s exactly,

Will Jarvis 7:24
That’s super interesting. So I think it’s a really important thing to work on. As you mentioned, there are a lot of asymmetries that can exist. It can be used as a weapon, and we tend to develop things to use them as weapons. So I think of, you know, nuclear power: that was developed originally because, with the Manhattan Project, we wanted a weapon. It seems like most of the mainstream debate around AI is around, it’s just going to take our jobs, or automation is taking our jobs. But it does seem to me like the more important thing to worry about is, you know, what if it’s evil, or it does things that are bad to humans? How do you think about that kind of issue?

Buck Shlegeris 8:13
Yeah, so I mean, I think that as with so many things, there are answers to this question at increasing levels of sophistication, where, you know, you say something in one paragraph, and then the next paragraph says, well, actually, and points out the consideration the previous one misses. So I’m going to give the simplest answer first. I think my moral perspective, or whatever, is that I’m a longtermist. I care a lot about the welfare of humanity and other sentient life in the long run future. And so I’m inclined to think that we should mostly be focusing our efforts on making it so that stuff is good a million years from now. And by default, it would be really, really hard to influence the world. Like, at most times in history, it would be really, really hard to influence the world a million years from now. You know, if it was 500 years ago, I think that it would have been extremely difficult to influence the world of today, let alone the world a million years in the future from then. But I think it’s kind of plausible that we are at a pretty unusual period of history where it is, in fact, possible to influence the long run trajectory of humanity, basically because I think that the probability of human extinction this century is a bunch higher than it has been in most centuries in the past and than it will be in the future. And so for this reason, kind of my top priority is making it so that when humanity builds these really smart systems that are able to do lots of stuff, this goes well for humanity. And so my kind of simple first answer to the question of how I think we should prioritize between worrying about making that transition go well and worrying about unemployment in the short term is, I’m just like, well, I don’t know, it seems like almost all of the value is in the future.
And so I don’t think that I should be prioritizing very much worrying about technological unemployment in the short term, because the number of people affected is just much smaller. And I believe in prioritizing things based on the number of people affected. And then I think there’s a more complicated second-order take, which is something like the following. At some point in the future, we’re going to build the first systems that are smarter than humans are. And this is likely to go better if the world is less of a mess when this goes down. So for example, I feel a bunch more optimistic about how well the deployment of early artificial general intelligence systems, you know, AGI, goes down in worlds where the world is in an unprecedented period of international cooperation and peace, rather than in the middle of World War Four or whatever. You know, that seems like it would be much better. And so I think that it is plausible that before the world becomes radically different as a result of really powerful AI, the world just becomes pretty different. And one example of how it could become pretty different is, you know, technological unemployment. And so I do actually think that if there was a strong story for technological unemployment causing things to be really bad and a real mess, then I would think we should worry about that, because of the fact that it puts humanity in a worse position for the sake of the long term future.

Will Jarvis 11:50
Gotcha. So it’s something kind of like, if our starting blocks are bad when the AI takes off, things could be quite rocky. It could be very bad.

Buck Shlegeris 11:59
Yeah, that’s right, we’ve got two different problems. The core problem is, when we actually build these crazy, powerful systems, we really want that to go well. But also, these less powerful systems we have on the way there might mess things up in various ways. And that would be unfortunate.

Will Jarvis 12:17
Gotcha, gotcha. That makes a ton of sense. You know, how do you think about factoring the value of future people versus the people that are here now, if you see what I’m saying? There are going to be, you know, God willing, quite a few more people in the future. So even if you discount it a lot, they would still matter a lot. That seems like...

Buck Shlegeris 12:44
Yeah, that seems right to me. I think I feel a little confused about exactly how much to weigh the interests of future people. Almost all people are in the future. You know, it seems like we could probably colonize space if we tried. There are many galaxies, there are many stars: you know, 100 billion stars in the Milky Way, almost 100 billion reachable galaxies if we leave now. Each of these stars can support a very large number of humans or other sentient beings. And so the argument for not caring about that is not going to be a numerical argument, like, well, I only care about these beings one millionth as much as I care about current beings. The argument has to be something like, I don’t care about any of that for some reason. And so I’m just like, I don’t know, man, it seems like that’s where the money is, or, you know, that’s where the value is. That’s what we should be focusing on with most of our efforts. You know, I personally care deeply about the welfare of the people around me, where by around me, I mean who exist now on Earth. But I think that from an altruistic perspective, with the proportion of my efforts that I try to use to make the world better in an impartial way for everyone, basically all of that effort is focused on the long term future.
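
The numerical point above, that weighting future beings at one millionth of a present person still leaves the future dominant, can be checked with back-of-the-envelope arithmetic. This is a sketch under loud assumptions: the star and galaxy counts are the round figures quoted in the conversation, while the beings-per-star placeholder and the present-day population are assumptions added here for illustration.

```python
STARS_PER_GALAXY = 100e9    # round figure quoted in the conversation
REACHABLE_GALAXIES = 100e9  # "almost 100 billion", rounded up
BEINGS_PER_STAR = 1         # assumption: deliberately tiny placeholder
CURRENT_POPULATION = 8e9    # assumption: rough present-day population
DISCOUNT = 1e-6             # weight each future being at one millionth

# Total future beings under the placeholder of one being per star.
future_beings = STARS_PER_GALAXY * REACHABLE_GALAXIES * BEINGS_PER_STAR

# Apply the one-millionth weight, then compare with people alive today.
weighted_future = future_beings * DISCOUNT
ratio = weighted_future / CURRENT_POPULATION

print(f"discounted future beings: {weighted_future:.2e}")
print(f"ratio to current population: {ratio:.2e}")
```

Even with one being per star, the discounted total exceeds the present population by roughly six orders of magnitude, so a finite discount of this size does not change the conclusion; only a weight of zero would.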

Will Jarvis 14:13
Gotcha, that makes a ton of sense. And do you think there are currently areas, specifically in existential risk, that are being kind of overlooked or are particularly underfunded at this point, or, you know, in terms of human attention, so to say?

Buck Shlegeris 14:34
Um, I don’t know. I’m probably not going to have good answers here. I think that it’s kind of like, underfunded with respect to what? I think that the existential risk community is pretty good at allocating effort. It’s pretty rare, especially nowadays, for me to hear about something that seems absolutely crucial where no one is working on it. I wish that we had many more people. I wish we had many more competent organizations trying to do stuff. I don’t think that there’s that much room for more funding. But yeah, so I would say I don’t have any really contrarian takes, like, if I managed $10 billion, I would have substantially different ideas of how it should be spent compared to how money is currently being spent on existential risk.

Will Jarvis 15:31
That’s great. That’s great. Well, it does seem like there are a lot of really smart people working on it, which gives me hope on the problem. It’s not, you know, less smart people working on the puppy pounds, so maybe there are more $20 bills on the sidewalk there. I don’t know. Cool. So, in general, how good do you think humanity is at coordination, in kind of absolute terms?

Buck Shlegeris 15:59
I mean, it’s really hard to say. And I think that, you know, there’s been a lot of disagreement within EA about how good humanity is at coordination. And I’ve written about it a little bit, which is probably why you’re asking this question. And I think that most of the time, the difficulty comes down to operationalizing what it means to be good or bad at coordination. I think we could point to examples of things where humanity did or did not do particular things. Like, we can all agree that the Manhattan Project happened; we can all agree that the FDA did not approve various COVID vaccines months earlier than they did. That’s true definitionally. And then the question is just what you take away from this. I feel like the main question about coordination that I’m interested in is the probability that, when faced with problems that really require hardcore coordination between various important groups in the future, it gets messed up, either by the groups in question being too competitive and not cooperating with each other enough, or just by everyone involved making bad mistakes that are bad by their own lights. And it feels like to operationalize the question of how competent people are at coordination, what we’d really have to do is describe in concrete detail the scenarios in the future which might come up in which it feels like it’s really important that people be cooperative. And then if we had this really specific list of coordination problems that might come up in the future, we could try to think of historical analogues to these coordination problems. We could try to form these analogies. We could be like, well, I don’t know, this seems kind of like the problem of making sure that the US nuclear weapons stockpile didn’t have all of the key codes set to 0000 for several decades.
Right, those are the security codes required to launch the nukes. Yeah. In which case, I feel pessimistic about that type of problem, given our empirical performance on the analogy. Or you could argue that the main analogy is, this feels as hard as humans not launching nukes at each other during the Cold War, which we succeeded at. And so I think that when we try to talk about how good humanity is at coordination, the main problem is that we really need to operationalize exactly what types of things we’re asking about. And this is really hard, because it involves futurism and, you know, giving concrete stories for how things might be in the future. And so I am not super impressed with any answers proposed by anyone to this kind of question historically, including myself, of course.

Will Jarvis 18:49
It is really difficult. It’s a really difficult thing to get to. That’s cool. So, making, you know, hard predictions about the future: I want to talk about AI a little bit. What do you think AGI looks like when it takes off? Is it something like Robin Hanson’s Age of Em, where we just get good medical imaging technology, we can image people’s brains, and we can emulate them and run them faster? Or is it something else?

Buck Shlegeris 19:16
So I currently think it’s quite likely that we get de novo AGI, by which I mean intelligent systems that we trained from scratch rather than by trying to copy particular humans. It seems to me like the rate of progress on whole brain emulation has not been that high over the last decade. There are several different bottlenecks there, at least two of which seem quite hard. And so I guess, I don’t know, if you made me give you a number right now, I’m like 80% we get, you know, AI before whole brain emulation.

Will Jarvis 19:57
Gotcha. That makes sense in my head. My friends in physics say there are some hard challenges there that are just difficult to overcome. And there’s some recent news about the worm emulation project: they’ve gotten kind of nowhere in the last 10 years, which is...

Buck Shlegeris 20:14
It’s just like a lot of things. Like, you’ve got to make a crazy microscope. And this crazy microscope has to be able to image things at tiny resolutions. And also, when you aim this microscope at a neuron, the neuron catches fire immediately. And so you kind of have to image it real fast before it’s gone from exploding, and so on. And so it seems like a fundamentally quite difficult problem, though not necessarily entirely impossible. You know, AGI might also be a hard problem.

Will Jarvis 20:48
Super cool. I want to take kind of a left turn here, although it is related. What advice do you have for building a really high impact career? You’ve been successful at this, and advice is hard to give, so I don’t know, it’s a difficult question. But are there any general takeaways you’ve had that you think would be useful for people?

Buck Shlegeris 21:09
Yeah, um, I don’t know the extent to which I expect to be successful. I’m trying this Redwood Research stuff; we might succeed, and we might fail. Just as a caveat I want to give: as you say, advice is hard. Some choices that I made turned out quite well for me, which of course is different from them being choices that someone else would want to make. I’m very glad to have spent a lot of time when I was young messing around programming in unstructured ways, where I just tried to think about what was cool and do things that I thought were cool. This has turned out pretty useful, because it means that I’m now quite comfortable making up solutions to computer science problems. When I was, I guess, 21 and 22, I submitted talks to this industry scholar conference, where, you know, it was the kind of thing where you submit the abstracts before actually doing the project. And then I got in, and so I was like, I guess I have to actually do a project now. And I did this, whatever, I spent 200 hours each year or something on this dumb programming project. But it was fun. It was legitimate research. I learned how to do research on my own, I learned how to stare at whiteboards for hours on end and slowly piece together designs. And that made me happy, and I think it has been a useful set of skills for later. I’m glad to have spent a bunch of time trying to think through EA concepts, you know, effective altruism concepts. I’m glad to have spent a lot of time arguing with people about the form of the alignment problem, and a bunch of things about ethics and how we should feel about the long term future. I think that time has been well spent. I’m really glad to have been down for doing things. I feel like there have been a bunch of points in my career where people asked me to do a particular job.
Like, for instance, a bunch of recruiting and outreach stuff at MIRI. And, you know, this wasn’t particularly the job that I had signed up for. But in hindsight, I’m really glad that I did it. I think that I’ve learned a lot from taking these opportunities that weren’t super obvious fits, but were just a way I could produce value right then. Similarly, when I was a software engineer, I was working at Triplebyte, which is a startup. And I found it really useful to just try and produce value whatever way I could. You know, I developed aspects of the programming interview that we were giving people, and I made web interfaces for sending emails more efficiently. And I just did a lot of random stuff. And I think that I got more value out of that experience, because I was doing kind of random stuff that I hoped would produce value, compared to how much I would have gotten if I had said, you know, I’m just a back end engineer or full stack engineer, I just really want to be doing back end or full stack engineering all the time. I don’t know, those are some thoughts.

Will Jarvis 24:25
That’s great. You know, do you know Ben Kuhn from Wave? Yeah, so he said something similar. He said, you know, he’s learned quite a lot, because oftentimes the most valuable thing for him to be doing at Wave will be like, okay, I’ve got to fix this obscure accounting problem. Or, you know, I’m a software engineer, and I think I’m really good at what I do, but I’m just building a CRUD app, and this is super valuable for people. So I think that’s great advice: lean into what you think can be valuable at any given time.

Buck Shlegeris 24:57
Yeah, I don’t know if it’s great advice for everyone.

Will Jarvis 25:01
For some people, yeah. That’s great. That’s great. Yeah. You wrote something on your blog which I found really interesting. I’d like to just, you know, have you expand on it a little bit. You mentioned that you’re less sure nowadays that us getting wealthier as a society is a slam dunk positive thing. What do you think about that?

Buck Shlegeris 25:24
Um, I think that it’s confusing. I think that there is a lot of suffering caused to farmed animals that did not used to get caused to farmed animals. Since the 60s, animal agriculture has gotten much more intensive, the price of animal products has fallen substantially, and the average experience of being an animal in one of these farms has gone way down. And I think this is one of the greatest moral tragedies of our time. And I think that this should give us some pause. If you just ask the question, like, let’s define human civilization to include all the stuff directly influenced by humans in a certain sense, then it feels like, I mean, it kind of depends how much moral weight you assign to chickens compared to how much you assign to humans. But I think there’s a very plausible way to do the calculation under which the main thing which has happened over the last 70 years is that many more animals are tortured on farms. And that is not good. And I feel like it gives me pause about whether I want to be like, yes, technology has obviously made things better.

Will Jarvis 26:45
That’s right. That’s right. That’s a great point: it’s not automatically good. I think that’s a really important point. And also, you know, with what has happened to farm animals in the last 40 years, there’s a lot of suffering that goes on, because it’s hyper competitive, we’re trying to feed everybody, everybody wants to eat more protein, more animal protein, and there are kind of negative consequences from that.

Buck Shlegeris 27:09
Yep. So I think that the actual overall effect of humans getting wealthier, as evaluated in terms of short term welfare, is actually kind of ambiguous, because of the fact that we didn’t include wild animal suffering in the previous calculation. So another major change which has happened over the last 70 years is that there are many fewer wild animals of certain types, and there are more of some other types, I think, though I don’t know the numbers on this very well. And so whether the total welfare of all the individuals on Earth, weighted by moral importance, has gone up or down is going to turn out to rest on something about what it’s like to be a tiny fish or something. You know, there’s really a lot of tiny fish out there. And if the tiny fish are having a worse time than they used to have, then maybe that’s just by far the most important fact about the last 70 years from a short term perspective. I find this plausible. And so then it’s like, well, I don’t know, what morals should we take from this for the future? I think it’s unclear. One way you could think about this is, well, humanity sure did a lot of stuff. And, you know, in 1950, we knew that there were tiny fish in the ocean, and humanity was not very careful about this. Humanity didn’t say, well, you know, before we continue doing our economic growth, we better really think about tiny fish, and really think about whether we’re making the world worse for the tiny fish as we do our economic activities. They didn’t care about that at all. And you might extrapolate this forward to the future. And you might say, well, inasmuch as humanity gains more power and influences more of the total computation that’s going on, like its influence expands to these other stars or these other planets,
maybe we should be afraid, because last time humanity wildly expanded its range of influence, this went badly. Or if, in fact, it didn’t go badly, it didn’t go badly by coincidence; there was no one at the wheel. We just pressed the gas, and maybe it was okay, like maybe no children were hit. But that doesn’t necessarily entirely comfort me. So that’s the pessimistic story. I think that the optimistic story has two parts. One of the parts is the kind of object level part. At least in the animal agriculture case (I think the wild animal suffering case is less clear), I think the reason that animal agriculture has resulted in a net decrease in total welfare is kind of a coincidence. And I think it goes away as everyone gets wealthier. Like, it so happens that currently the cheapest way to make things that have all the same properties as ground beef is to raise cattle and then kill them. And it turns out the cheapest way to do that is in a way that seems pretty bad for the cattle involved. But I suspect that most people actually have sort of a preference for cattle being well off. And I expect that if technology proceeds in normal ways, then animal agriculture will become wildly less widespread. And also, we’ll probably treat the animals a lot more nicely, because it’ll just be cheap to, and consumers kind of want that. And enough people are in favor of this that we will probably pass legislation eventually. So there’s this one argument, which is like, well, you know, things went badly last time humanity got a lot more powerful, but in a way that’s going to get better as they get more powerful again, so maybe it’s just totally fine.
So that’s one argument for why we shouldn’t be worried about the future being bad. It’s like, well, the way that things were bad this time was kind of a weird one-off, just related to the fact that there was a bunch of animals around that you can raise for meat, and you can make them kind of unhappy while you’re doing that. And here’s the other argument for why you might think the future might be good. You know, we were talking about how AI might really change the world a lot by reducing the price of certain types of labor. One of the ways that you might think this would influence the world is that people will have access to better advice and better reasoning. So sometimes, I’m not sure what I think the right thing to do is about some problem. And I currently can’t, you know, call up my extremely intelligent AI advisor and say, like, hey, do you want to tell me about some moral facts about the world that I’m missing? Like, what’s some stuff which is bad that I should be worried about? And you might hope that if it was, you know, the 1930s, and you had this AI system, it’d be like, look, man, you’re being really homophobic. And I think that if you think about this, it’s going to turn out that by your own values, your own beliefs, you’re going to wish you were less homophobic. And, you know, I go off and think about it, and I’m like, yeah, you know what, you’re right, I should be less homophobic. And so you might hope that as everything gets cheaper, one of the things that gets cheaper is good moral advice. And people will want to consume good moral advice, and then they’ll consume more of it. And then we’ll have a future which is more enlightened. And maybe this just means that the future is more good than the past. Because, you know, the price of enlightenment goes down, and so the quantity purchased goes up.

Will Jarvis 32:52
Absolutely. It’s just a lot easier to understand and get that information. I’m curious, our moral circle, do you think it’s fairly fixed? Or is it something you think we can really expand? The example I give is, you know, they used to do these surveys, and people used to get really concerned about their kids marrying someone of a different race. And now that’s just shifted over, you know, people don’t worry about that as much anymore. Now, they worry about the kid marrying someone of a different political affiliation at the same rates. Do you think this is something where we can expand our moral circle? Or is it like kind of fixed, and we can optimize on the edges, but there’s only so much we can care about at any given time?

Buck Shlegeris 33:40
Yeah, I mean, like, I think that something weird about that example is, it’s not totally clear that “I have a hesitation about my children marrying someone of class X” is the same as “I do not care about the welfare of people in class X,” though probably in both cases, it’s in fact correlated. Um, I think my guess is that moral circles have, in fact, gotten larger in the way that I care about over time. I definitely don’t think... yeah, I think that probably there’s like two phenomena here. One of them is like, I don’t know, do you know Scott Alexander’s in-group, out-group, and far-group thing?

Will Jarvis 34:29
I do not. I know in group, the Far far group.

Buck Shlegeris 34:33
I might be misquoting him. But the basic claim is, you know, there’s a bunch of Democrats who really hate Republicans, but who don’t really have an opinion on a bunch of people who live in Saudi Arabia, even though the Republicans have values much closer to them than to the people who run Saudi Arabia. And this is just because, you know, there’s like a narcissism of small differences thing. You know, if you’re a Democrat, you like, get into more fights with Republicans. And so my best guess is that as time has gone on, people’s in-groups have gotten larger, people’s out-groups have gotten somewhat larger, but people’s far groups, like these groups of people who they, like, don’t interact with that much, but are aware of the existence of and could care about the welfare of, have gotten, like, way bigger. Um, and so I think I’m not that worried about like, the size of the group of people you care about morally being fixed over time.

Will Jarvis 35:33
I think that’s really well put. And also, I think another example, which I think you gave earlier, which is helpful, is if people had, you know, access to ground beef that cost the same or less, but it’s like, this is cultured, and we didn’t torture animals, you know, I have an intuition most people would pick that. You know, they’re like, yeah, sure, if there’s an easy choice, especially if it’s cheaper, they’re picking it, you know, nine times out of 10. That’s super cool. Another life term, you know, how should we think about criticism? You know, should we seek out more of it in our lives, you know, in our careers? And would that be a valuable thing for people to do?

Buck Shlegeris 36:17
I think it’s hard to say, and it kind of depends on how much criticism you currently seek out. Um, I think that this is a question which is like, how do people currently decide how much criticism to receive? And why would they be making a bad decision about this? Um, I think that roughly speaking, like, I don’t know, if you ask me the question, like, how much should people, I don’t know, how much should they buy chairs, if I wanted to argue that they’re buying too many chairs, or too few chairs, I’d kind of have to point to why I think they would make that mistake, you know, like, what mistake I think they’re making. And so similarly, when we’re talking about how you should be trying to solicit criticism, we have to talk about what mistake I think people might be making. Because I think there’s some sense in which people, on average, don’t make mistakes, or like, I think you’ve got to do some work to claim they’re making a mistake. Yeah, um, I think that the core reason to claim that the people I hang out with should consider soliciting more criticism than most people solicit is that criticism has kind of a couple of different social roles. One of them is to point out to someone that they’re making a mistake. And another one is to hurt and belittle them in front of other people, and to, like, demonstrate dominance over them. And so I think that by default, human society uses criticism for these two different things. And I think that if you can decouple that for yourself, or if you’re in a particular relationship, if you can make it be the case that when people provide you criticism, they are not, you know, trying to hurt you, and you are not going to respond as if they were trying to hurt you, and you can just, I don’t know, get their actual thoughts on how things should be different, I think that this is a somewhat valuable thing to get done.
Um, yeah, I could try to give more theoretical thoughts on why people might not receive enough criticism. I also have takes on like, how to go about getting better criticism or whatever.

Will Jarvis 38:40
Yeah. How about the actionable, you know, like, how should one go about getting better criticism? Let’s say, like, for this podcast, how should I go about getting better criticism for it?

Buck Shlegeris 38:51
So I think the key thing here is probably going to be making it so that you successfully signal that you, in fact, do want the criticism, and making it really easy for people to in fact, offer the criticism. So recently, I decided that I wanted some criticism on some topics. And I haven’t actually executed on this plan yet, or like finished executing on it. But one thing I’ve done that I feel pretty optimistic about when I finish it, is I started writing down a list of ways where I think I could be like, more one way or the other way. And instead of asking people if they had thoughts on how I should be different in general, I feel pretty excited about asking people, you know, I want to like point out this feature of how I am that is kind of a strength and kind of a weakness, and then I want to say, do you think I should have more or less of it? And I feel like this is probably going to be better at soliciting people’s thoughts than asking them for miscellaneous feedback. And there’s a couple reasons for this. But one of them is, I think that sometimes I want people’s feedback on some things, and not other things. For example, I think there are a lot of people where I’m interested in hearing them tell me that they think I should be slightly more careful with my programming. But I’m, like, less interested in hearing them say that they think that I should dress better, because that’d be more attractive or whatever, right? I mean, there are other people from whom it would be useful to hear that, but there’s a lot of other people I don’t really want to engage in that kind of thing with. And so if you give them a clearly delineated list of topics on which you are definitely soliciting criticism, then they don’t even have to come up with these criticisms themselves, they just have to opine in a direction one way or another. I think it’s much easier for them to opine.
And I think in some cases, this makes it go better.

Will Jarvis 40:41
Gotcha. So it’s like giving them like a concrete, you know, area. It’s like, you know, for example, my audio quality. Like, is it good or bad? Like, how can I make it better? That makes it much easier for people to like, grab onto something, instead of just like, how is it? Like, I don’t know.

Buck Shlegeris 40:55
Yeah, I think another version of this is like, suppose I could work on one of four things on my podcast: I could work on my turn, I could work on my audio quality, I could work on the quality of my guests, I could work on the miscellaneous, like, quality of the transcripts. Which one of these do you think will have the largest impact per unit effort? I think you’d get something more interesting from that than, in a lot of cases, just asking for miscellaneous feedback.

Will Jarvis 41:18
That’s very wise. And how do you go about signaling that, you know, you actually want feedback, and in a good manner? Do you think you just tell people explicitly, or people you trust? Or, you know, how would you think of it?

Buck Shlegeris 41:31
I think that probably the most important ingredient here... I think there are some basic things. I think that when someone offers you criticism, you should say thank you for the criticism, and then you should default to being quiet. But aside from some basic etiquette things like that, yeah, I think that the main thing is to actually only ask for criticism when you are actually in a mode where it’s healthy for you to receive it. I think that sometimes people should, in fact, not be soliciting criticism. I mean, I think this is obvious to imagine in the case of children. I think sometimes children are in school, and they’re learning something. And I think that you could give them a variety of true criticisms of their essay that would not cause them to, in 10 years, be writing better essays. And I think this is true for adults. I think that at various points in my life, I’m doing some stuff, and I just do not have enough self-esteem at that moment to be interested in hearing a bunch of people’s, like, wild thoughts on what I am maybe screwing up. And to be clear, it is very good to normally be in a state where you, in fact, do have the self-esteem to solicit people’s wild thoughts on what you’re messing up, because they might have good stuff. But I think that by default, you’re not in that state. I am currently not in that state. You know, there are various people who I know who probably could give me some good criticism if I asked, and I am currently not feeling good enough to do this right now, just like today. And I think that, yeah, sometimes I’m worried that people are in a state like I’m in today, where they can’t, in fact, solicit criticism well, and then they say to their friends, oh, no, I am very open to criticism. And then it just like screws everything up, because they’re not telling the truth. Or maybe they’re not intentionally lying, but they’re wrong.
And I think this makes their friends less likely to give them good criticism. Or maybe the friends do give them good criticism, and it’s in fact bad for them. So I think that being able to, like, really convince someone, like, no, I am actually deeply interested in your criticism right now. I’m not just saying this out of a sense that like a good rationalist should be open to criticism. I am saying this because inasmuch as I am making mistakes, I want to have a good list of those mistakes so that I can behave differently. So all this to say, I think like the key aspect of getting criticism better is actually being able to receive that criticism. I think people are really good at detecting this kind of stuff. And so the first part of the work is, you know, is within or whatever.

Will Jarvis 44:26
Well, I think that’s super actionable. I really love that. I’ve got one more big question for you. So this is from our local SSC meetup. If someone wants to get involved in AI safety research, you know, how would you suggest they do that? Like, what’s a good path, you think?

Buck Shlegeris 44:45
So it really depends on who they are, what their background is. Um, there are many different things that get done that get called AI safety research. Yeah. Um, and so yeah, I don’t know. I’ll say some stuff. Um, I think that of the jobs available in AI safety technical research in five years, a bunch of them are going to be software engineering and machine learning research related to various applied AI alignment stuff. Part of the point of Redwood Research is making it so that we’re like pushing for the infrastructure required to ensure that it’s possible to do great work by employing people in those things. So yeah, I think that like one part is, if you feel like you could plausibly be a great employee doing ML research or, like, back end web infrastructure for ML research, which I think, you know, if you’re just like an enthusiastic and energetic and widely knowledgeable and capable and fast Python back end web programmer, that’s like a lot of the skill set there. I think that developing those skills is like a pretty reasonable way to try and seek out these jobs. And then it really helps to apply to these jobs. It sort of drastically increases the probability you get one of them. I don’t know, I think the other main class of activity you ought to do if you want to work on AI alignment technical research is thinking about AI alignment. There’s a bunch of great resources on AI alignment these days. I think that the Cambridge effective altruism club has a really great course called AI Safety Fundamentals. This is an online course you can apply to. And, you know, it meets once a week with a facilitator, and you discuss a bunch of fundamental questions related to AI alignment.
I think that doing that kind of course is like a great starting point for having thought through a bunch of the fundamental questions about, like, what is the problem and what types of research ought to be done to make the problem better. And then getting deeper into it all from there, yeah, is like the other part. So I’ve described, like, two aspects of getting into technical research. One is developing technical skills that you might be able to use in some of the applied research, which I suspect is where the majority of the jobs are. And two is thinking a lot about AI alignment from a kind of fundamental perspective, which is useful both for being better at the first class of jobs, and also for taking some of the other types of AI alignment technical research jobs, which involve doing that kind of fundamental thinking all the time.

Will Jarvis 47:34
Gotcha, gotcha, that’s great. And I think that’s a very achievable path. And it makes sense, it’s legible. Like, yeah, go here, work on these things, get better. And that’s, you know, skate to where the puck will be. Very wise.

Buck Shlegeris 47:48
I should note that Redwood Research is hiring for researchers and software engineers, so you can go to our website or email me to learn more about that.

Will Jarvis 47:57
Great point. Great. And, well, with that, Buck, where would you like to send people?

Buck Shlegeris 48:02
Where to send people? Yeah. All listeners of this thing, where on the internet should they go, or some other class? Yeah,

Will Jarvis 48:10
yeah. Well, yeah. Where should they go?

Buck Shlegeris 48:14
Well, I don’t know. I like redwoodresearch.org as a website. What are my other favorite places on the internet? I don’t know. I like the Effective Altruism Forum. I like... LessWrong.com has some good stuff. I thought that Worth the Candle was a really good piece of rationalist fiction.

I think that the AI Alignment Forum has some pretty good content on alignment. I think 80000hours.org has a bunch of great stuff about how you should think about optimizing your career to do as much good as possible, and is a great source of a lot of actionable advice. Those are the main things that come to mind. I don’t know, got any categories I should think of?

Will Jarvis 49:07
Particularly, you know, your stuff, like if people want to find your stuff, if they really resonated with this podcast. Redwood Research, you mentioned, is one. Anything else? Yeah.

Buck Shlegeris 49:17
I think that my work, my writing, has mostly been published on the AI Alignment Forum and the Effective Altruism Forum, and sometimes LessWrong. I am sure you can provide links to those things in the podcast description or whatever. Yeah, that’s where it mostly goes. Sometimes it’s on the finance. It’s on my website.

Will Jarvis 49:40
Excellent. Excellent. Well, Buck, thank you so much for coming on. I really appreciate it. Thanks for having me.

Thanks for listening. We’ll be back next week with a new episode of narratives.
