The Nonlinear Library

The Nonlinear Fund

The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Education

Episodes

LW - How the AI safety technical landscape has changed in the last year, according to some practitioners by tlevin
Yesterday
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How the AI safety technical landscape has changed in the last year, according to some practitioners, published by tlevin on July 26, 2024 on LessWrong. I asked the Constellation Slack channel how the technical AIS landscape has changed since I last spent substantial time in the Bay Area (September 2023), and I figured it would be useful to post this (with the permission of the contributors to either post with or without attribution). Curious if commenters agree or would propose additional changes! This conversation has been lightly edited to preserve anonymity. Me: One reason I wanted to spend a few weeks in Constellation was to sort of absorb-through-osmosis how the technical AI safety landscape has evolved since I last spent substantial time here in September 2023, but it seems more productive to just ask here "how has the technical AIS landscape evolved since September 2023?" and then have conversations armed with that knowledge. The flavor of this question is like, what are the technical directions and strategies people are most excited about, do we understand any major strategic considerations differently, etc -- interested both in your own updates and your perceptions of how the consensus has changed! Zach Stein-Perlman: Control is on the rise Anonymous 1: There are much better "model organisms" of various kinds of misalignment, e.g. the stuff Anthropic has published, some unpublished Redwood work, and many other things Neel Nanda: Sparse Autoencoders are now a really big deal in mech interp and where a lot of the top teams are focused, and I think are very promising, but have yet to conclusively prove themselves at beating baselines in a fair fight on a real world task Neel Nanda: Dangerous capability evals are now a major focus of labs, governments and other researchers, and there's clearer ways that technical work can directly feed into governance (I think this was happening somewhat pre September, but feels much more prominent now) Anonymous 2: Lots of people (particularly at labs/AISIs) are working on adversarial robustness against jailbreaks, in part because of RSP commitments/commercial motivations. I think there's more of this than there was in September. Anonymous 1: Anthropic and GDM are both making IMO very sincere and reasonable efforts to plan for how they'll make safety cases for powerful AI. Anonymous 1: In general, there's substantially more discussion of safety cases Anonymous 2: Since September, a bunch of many-author scalable oversight papers have been published, e.g. this, this, this. I haven't been following this work closely enough to have a sense of what update one should make from this, and I've heard rumors of unsuccessful scalable oversight experiments that never saw the light of day, which further muddies things Anonymous 3: My impression is that infosec flavoured things are a top ~3 priority area a few more people in Constellation than last year (maybe twice as many people as last year??). Building cyberevals and practically securing model weights at frontier labs seem to be the main project areas people are excited about (followed by various kinds of threat modelling and security standards). Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
LW - Index of rationalist groups in the Bay July 2024 by Lucie Philippon
Yesterday
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Index of rationalist groups in the Bay July 2024, published by Lucie Philippon on July 26, 2024 on LessWrong. The Bay Area rationalist community has an entry problem! Lots of listed groups are dead, the last centralized index disappeared, communication moved to private discord and slacks. This is bad, so we're making a new index, hopefully up to date and as complete as we can! Communication Discord: Bay Area Rationalists: https://discord.gg/EpG4xUVKtf Email Group: BayAreaLessWrong: https://groups.google.com/g/bayarealesswrong Local Meetup Groups Taco Tuesday: by Austin Chen, founder emeritus of Manifold. Check his Manifold questions page for the next date! North Oakland LessWrong Meetup: every Wednesday, hosted by @Czynski. Thursday Dinners in Berkeley: Advertised on the Discord server and Google group, alternating between a few restaurants on the northwest side of UC campus. Bay Area ACX Meetups: For the ACX everywhere meetups twice per year, and some other sporadic events. Housing To find spots in group houses, temporary or long term, you can use the Bay Area EA/Rationality Housing Board. The EA Houses spreadsheet also has some entries in the Bay. It probably works best to ask people in the Bay if they know of housing opportunities, as lots of housing is provided peer-to-peer. EA If you want to discover the EA community, the EA's Guide to Berkeley and The Bay Area is a good resource. Events sometimes get advertised on those websites: SF Bay Area EA calendar on Luma East Bay EA Hangout on Facebook AI Safety There are two AI safety coworking spaces in Berkeley. They sometime accept visitors, so you can try reaching out or applying via their website: FAR Labs Constellation Most AI Safety events don't get advertised publicly, so get in contact with people in the community to know what's happening. We probably missed some other meetups and communities which are public and still active, so feel free to list them in the comments! Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
EA - My Experience as a Full-Time EA Community Builder in NYC by Alex R Kaplan
Yesterday
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My Experience as a Full-Time EA Community Builder in NYC, published by Alex R Kaplan on July 26, 2024 on The Effective Altruism Forum. Some rewards and challenges of working as an EA NYC community builder over the past two years Motivations I wanted to share these thoughts for a few reasons: 1. I hope this serves as a reference point for people considering careers in EA community building (though this is only one reference point, of course). 2. EA NYC is hiring an Executive Director! If you're interested, apply here by the end of July 28th (Eastern Time). 3. I think there's often value in people discussing their jobs.[1] By the way, I have a few relevant disclaimers/caveats that I encourage you to check out at the bottom of this post. For now, I'd just like to say that I'm writing this in a personal capacity and do not claim to represent the views of any of my employers, past, present, or future. Otherwise, I may occasionally update this post to correct any misleading/inaccurate points. Please let me know if you catch anything that seems off! Lastly, many thanks to those who encouraged me to share this post and especially to Elliot Teperman for sharing some helpful thoughts on an earlier draft. Of course, all mistakes are my own. Summary From July 2022 to July 2024 (ongoing at the time of writing), I will have supported Effective Altruism NYC (EA NYC). In this position, I worked closely with the Centre for Effective Altruism's (CEA) Community Building Grants (CBG) program, which provided funding and support for my position. This work has been pretty great for me! Like anything, there have been some bumps in the road, but I think it has been broadly good. I imagine many cases where I would recommend EA community building and/or CBG program participation for specific individuals. However, I would also like to give some disclaimers about things I wish I had known beforehand. Given my uncertainty at various points throughout the past two years, I've questioned whether taking the role was the right move… However, if I had known two years ago what I know now, I would have felt a lot more confident in my decision![2] Here's an outline of my considerations: Some good things I built a lot of skills I made a lot of connections (Perhaps aided by the other benefits) I got access to more opportunities (Definitely aided by the other benefits) I built up my confidence quite a bit Some mixed things I felt my compensation was fair, but that might be specific to me Personal career planning was complicated, but that helped me design a new path for myself Working at a small (EA) organization has had some pretty straightforward pros and cons Diving deep into EA has been stressful at times, but I now feel better because of it I also left a lot out! Feel free to reach out and/or add (anonymous) comments if you have any thoughts and/or questions (though I imagine I may only sometimes be able to answer/help with some things). Context CBG program participation According to CEA's website, the Community Building Grants (CBG) program aims to build flourishing communities of individuals who work to maximize their impact using critical reasoning and evidence. 
I've benefited from the following kinds of support from the program: Personal grant funding (which constituted my salary) Network with fellow CBG community builders Including in-person retreats and online community 1-on-1 support from the program's manager I did not leverage this support much, but I received significant support from the first two points mentioned.[3] One point to clarify: the Community Building Grants program is one way to fund EA community-building work. One could also do EA community building through various other models. EA NYC context EA NYC started as a small meetup group in 2013. Since then, it has grown into a communit...
LW - Universal Basic Income and Poverty by Eliezer Yudkowsky
Yesterday
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Universal Basic Income and Poverty, published by Eliezer Yudkowsky on July 26, 2024 on LessWrong. (Crossposted from Twitter) I'm skeptical that Universal Basic Income can get rid of grinding poverty, since somehow humanity's 100-fold productivity increase (since the days of agriculture) didn't eliminate poverty. Some of my friends reply, "What do you mean, poverty is still around? 'Poor' people today, in Western countries, have a lot to legitimately be miserable about, don't get me wrong; but they also have amounts of clothing and fabric that only rich merchants could afford a thousand years ago; they often own more than one pair of shoes; why, they even have cellphones, as not even an emperor of the olden days could have had at any price. They're relatively poor, sure, and they have a lot of things to be legitimately sad about. But in what sense is almost-anyone in a high-tech country 'poor' by the standards of a thousand years earlier? Maybe UBI works the same way; maybe some people are still comparing themselves to the Joneses, and consider themselves relatively poverty-stricken, and in fact have many things to be sad about; but their actual lives are much wealthier and better, such that poor people today would hardly recognize them. UBI is still worth doing, if that's the result; even if, afterwards, many people still self-identify as 'poor'." Or to sum up their answer: "What do you mean, humanity's 100-fold productivity increase, since the days of agriculture, has managed not to eliminate poverty? What people a thousand years ago used to call 'poverty' has essentially disappeared in the high-tech countries. 'Poor' people no longer starve in winter when their farm's food storage runs out. There's still something we call 'poverty' but that's just because 'poverty' is a moving target, not because there's some real and puzzlingly persistent form of misery that resisted all economic growth, and would also resist redistribution via UBI." And this is a sensible question; but let me try out a new answer to it. Consider the imaginary society of Anoxistan, in which every citizen who can't afford better lives in a government-provided 1,000 square-meter apartment; which the government can afford to provide as a fallback, because building skyscrapers is legal in Anoxistan. Anoxistan has free high-quality food (not fast food made of mostly seed oils) available to every citizen, if anyone ever runs out of money to pay for better. Cities offer free public transit including self-driving cars; Anoxistan has averted that part of the specter of modern poverty in our own world, which is somebody's car constantly breaking down (that they need to get to work and their children's school). As measured on our own scale, everyone in Anoxistan has enough healthy food, enough living space, heat in winter and cold in summer, huge closets full of clothing, and potable water from faucets at a price that most people don't bother tracking. Is it possible that most people in Anoxistan are poor? My (quite sensible and reasonable) friends, I think, on encountering this initial segment of this parable, mentally autocomplete it with the possibility that maybe there's some billionaires in Anoxistan whose frequently televised mansions make everyone else feel poor, because most people only have 1,000-meter houses. 
But actually this story has a completely different twist! You see, I only spoke of food, clothing, housing, water, transit, heat and A/C. I didn't say whether everyone in Anoxistan had enough air to breathe. In Anoxistan, you see, the planetary atmosphere is mostly carbon dioxide, and breathable oxygen (O2) is a precious commodity. Almost everyone has to wear respirators at all times; only the 1% can afford to have a whole house full of breathable air, with some oxygen leaking away despite ...
LW - AI #74: GPT-4o Mini Me and Llama 3 by Zvi
Yesterday
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #74: GPT-4o Mini Me and Llama 3, published by Zvi on July 26, 2024 on LessWrong. We got two big model releases this week. GPT-4o Mini is covered here. Llama 3.1-405B (and 70B and 8B) is mostly covered in yesterday's post, this has some follow up. Table of Contents 1. Introduction. 2. Table of Contents. 3. Language Models Offer Mundane Utility. All your coding are belong to us. 4. Language Models Don't Offer Mundane Utility. Math is hard. Can be expensive. 5. GPT-4o Mini Me. You complete me at lower than usual cost. 6. Additional Llama-3.1 Notes. Pricing information, and more rhetoric. 7. Fun With Image Generation. If you're confused why artists are so upset. 8. Deepfaketown and Botpocalypse Soon. Not surprises. 9. They Took Our Jobs. Layoffs at Activision and across gaming. 10. In Other AI News. New benchmarks, new chip variants, and more. 11. The Art of the Jailbreak. Pliny remains undefeated. 12. Quiet Speculations. Where will the utility be coming from? 13. The Quest for Sane Regulations. Public opinion continues to be consistent. 14. Openly Evil AI. Some Senators have good questions. 15. The Week in Audio. Dwarkesh in reverse, and lots of other stuff. Odd Lots too. 16. Rhetorical Innovation. What are corporations exactly? 17. Aligning a Smarter Than Human Intelligence is Difficult. So are evals. 18. People Are Worried About AI Killing Everyone. Roon warns you to beware. 19. The Sacred Timeline. Hype? 20. Other People Are Not As Worried About AI Killing Everyone. Older Joe Rogan. 21. The Lighter Side. It's on. Language Models Offer Mundane Utility Coding is seriously much faster now, and this is the slowest it will ever be. Roon: pov: you are ten months from working for claude sonnet the new technical founder. Garry Tan: Underrated trend. It's happening. Sully: 50% of our code base was written entirely by LLMs expect this to be ~80% by next year With sonnet we're shipping so fast, it feels like we tripled headcount overnight Not using Claude 3.5 to code? Expect to be crushed by teams who do (us). Not only coding, either. Jimmy (QTing Tan): It can also do hardware related things quite well too, and legal, and logistics (planning) and compliance even. I've been able to put off hiring for months. When I run out of sonnet usage I patch in gpt-4o, it's obviously and notably worse which I why I rarely use it as a primary anymore. Claude 3.5 Sonnet becomes the first AI to crush the Lem Test to 'write an impossible poem.' Laugh all you want, this is actually great. Kache: dude hahahahahah i used so many tokens today on just formatting json logs near: the just stop oil people are gonna come and spray paint you now Compared to how much carbon a human coder would have used? Huge improvement. Language Models Don't Offer Mundane Utility IMO problems are still mostly too hard. The linked one, which GPT-4, GPT-4o and Claude 3.5 Sonnet failed on, seems unusually easy? Although a math Olympiad solver does, predictably given the contests we've seen. [EDIT: I didn't read this properly, but a reader points out this is the floor symbol, which means what I thought was an obvious proof doesn't actually answer the question, although it happens to get the right answer. Reader says the answers provided would actually also get 0/7, order has been restored]. Figure out what song Aella was talking about here. Found the obvious wrong answer. 
Grok offers to tell you 'more about this account.' I haven't seen the button yet, probably it is still experimental. Our price cheap. Llama 3.1-405B was a steal in terms of compute costs. Seconds: "AI is expensive" its not even half the cost of a middling marvel movie. Teortaxes: Pretty insane that the cost of producing llama-3-405B, this behemoth, is like 40% of *Ant-Man and the Wasp: Quantumania* movie at most If I were Zuck, I'd have open sourced a $...
LW - A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication by johnswentworth
Yesterday
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication, published by johnswentworth on July 26, 2024 on LessWrong. A Solomonoff inductor walks into a bar in a foreign land. (Stop me if you've heard this one before.) The bartender, who is also a Solomonoff inductor, asks "What'll it be?". The customer looks around at what the other patrons are having, points to an unfamiliar drink, and says "One of those, please.". The bartender points to a drawing of the same drink on a menu, and says "One of those?". The customer replies "Yes, one of those.". The bartender then delivers a drink, and it matches what the first inductor expected. What's up with that? The puzzle, here, is that the two Solomonoff inductors seemingly agree on a categorization - i.e. which things count as the Unnamed Kind Of Drink, and which things don't, with at least enough agreement that the customer's drink-type matches the customer's expectations. And the two inductors reach that agreement without learning the category from huge amounts of labeled data - one inductor points at an instance, another inductor points at another instance, and then the first inductor gets the kind of drink it expected. Why (and when) are the two inductors able to coordinate on roughly the same categorization? Most existing work on Solomonoff inductors, Kolmogorov complexity, or minimum description length can't say much about this sort of thing. The problem is that the customer/bartender story is all about the internal structure of the minimum description - the (possibly implicit) "categories" which the two inductors use inside of their minimal descriptions in order to compress their raw data. The theory of minimum description length typically treats programs as black boxes, and doesn't attempt to talk about their internal structure. In this post, we'll show one potential way to solve the puzzle - one potential way for two minimum-description-length-based minds to coordinate on a categorization. Main Tool: Natural Latents for Minimum Description Length Fundamental Theorem Here's the main foundational theorem we'll use. (Just the statement for now, more later.) We have a set of n data points (binary strings) {xi}, and a Turing machine TM. Suppose we find some programs/strings Λ,{ϕi},Λ',{ϕ'i} such that: Mediation: (Λ,ϕ1,…,ϕn) is an approximately-shortest string such that (TM(Λ,ϕi) = xi for all i) Redundancy: For all i, (Λ',ϕ'i) is an approximately-shortest string such that TM(Λ',ϕ'i) = xi.[1] Then: the K-complexity of Λ' given Λ,K(Λ'|Λ), is approximately zero - in other words, Λ' is approximately determined by Λ, in a K-complexity sense. (As a preview: later we'll assume that both Λ and Λ' satisfy both conditions, so both K(Λ'|Λ) and K(Λ|Λ') are approximately zero. In that case, Λ and Λ' are "approximately isomorphic" in the sense that either can be computed from the other by a short program. We'll eventually tackle the customer/bartender puzzle from the start of this post by suggesting that Λ and Λ' each encode a summary of things in one category according to one inductor, so the theorem then says that their category summaries are "approximately isomorphic".) The Intuition What does this theorem mean intuitively? Let's start with the first condition: (Λ,ϕ1,…,ϕn) is an approximately-shortest string such that (TM(Λ,ϕi) = xi for all i). 
Notice that there's a somewhat-trivial way to satisfy that condition: take Λ to be a minimal description of the whole dataset {xi}, take ϕi=i, and then add a little bit of code to Λ to pick out the datapoint at index ϕi[2]. So TM(Λ,ϕi) computes all of {xi} from Λ, then picks out index i. Now, that might not be the only approximately-minimal description (though it does imply that whatever approximately-minimal Λ,ϕ we do use is approximately a minimal description for all of x). ...
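Since text-to-speech flattens subscripts, here is the same fundamental theorem restated in LaTeX notation for readers following along in text. This is only a restatement of the conditions quoted above, with the data points, the Turing machine TM, and the strings Λ, ϕ as defined there.

Mediation: $(\Lambda, \phi_1, \ldots, \phi_n)$ is an approximately-shortest string such that $\mathrm{TM}(\Lambda, \phi_i) = x_i$ for all $i$.
Redundancy: for all $i$, $(\Lambda', \phi'_i)$ is an approximately-shortest string such that $\mathrm{TM}(\Lambda', \phi'_i) = x_i$.
Conclusion: $K(\Lambda' \mid \Lambda) \approx 0$, i.e. $\Lambda'$ is approximately determined by $\Lambda$ in the Kolmogorov-complexity sense; and if $\Lambda$ and $\Lambda'$ each satisfy both conditions, then $K(\Lambda \mid \Lambda') \approx 0$ as well, so the two are approximately isomorphic.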
AF - Pacing Outside the Box: RNNs Learn to Plan in Sokoban by Adrià Garriga-Alonso
2 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Pacing Outside the Box: RNNs Learn to Plan in Sokoban, published by Adrià Garriga-Alonso on July 25, 2024 on The AI Alignment Forum. Work done at FAR AI. There has been a lot of conceptual work on mesa-optimizers: neural networks that develop internal goals that may differ from their training objectives (the inner alignment problem). There is an abundance of good ideas for empirical work (find search in a NN, interpret it), but very little actual execution, partly because we did not have a clear-cut example of a mesa-optimizer to study. Until now.[1] We have replicated the mesa-optimizer that Guez et al. (2019) found, and released it open-source as a model organism for inner alignment research. In brief, Guez et al. trained a recurrent neural network (RNN) with model-free RL to play Sokoban. They noticed that if you give the RNN more time to think by repeating the initial observation at inference time, its performance increases. This is highly suggestive of planning! We investigate this "planning effect" in a black-box way. We find that often, the RNN learns to "pace" before attempting to solve the level, likely to get more computation and find a solution. When we give the RNN time to think, it finds the solution in the extra thinking time and executes it straight away. In other cases, the RNN sometimes starts with a greedy solution and locks itself out of the solution. With thinking time, the RNN finds the non-myopic solution, avoiding the lock and solving the level. Note that this greedy behavior may be bounded-rational given the -0.1 penalty per step: solving fewer levels but solving them more quickly can pay off. These are illustrative examples, but we have quantitative evidence too. We operationalize the pacing behavior as whatever creates a cycle in the sequence of environment states. If we give the RNN time to think at level start, it does not 'pace' anymore: 75% of cycles that occur in the first 5 steps disappear. Time to think in the middle of a level also substitutes cycles: 82% of N-step cycles disappear with N steps to think. The levels we use always have 4 boxes. Thinking time barely changes the average time the RNN takes to place boxes 1-3. But, when filtering only to levels that it cannot solve at 0 steps but can solve at 6 thinking steps, the time to place boxes 1-3 greatly increases, even though the time to place the 4th box barely changes. This indicates the NN is greedy by default, and thinking time remedies that. Understanding how neural networks reason, and ultimately locating where they evaluate plans, is crucial to solving inner alignment. This represents an important first step in our longer-term research agenda to automatically detect mesa-optimizers, understand their goals, and modify the goals or planning procedures to align with the intended objective. For more information, read our blog post or full paper "Planning behavior in a recurrent neural network that plays Sokoban." And, if you're at ICML, come talk to us at the Mechanistic Interpretability workshop on Saturday! If you are interested in working on problems in AI safety, we're hiring. We're also open to exploring collaborations with researchers at other institutions - just reach out at hello@far.ai. 1. ^ We believe LeelaChess is likely also planning. 
Thanks to Jenner et al., we have a handle on where the values may be represented and a starting place to understand the planning algorithm. However, it is likely to be much more complicated than the RNN we present, and it is not clearly doing iterative planning. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
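The post's operationalization of "pacing" (anything that creates a cycle in the sequence of environment states) is simple enough to make concrete. The Python below is a minimal illustrative sketch, not code from the paper; the state representation and function name are assumptions made for illustration.

def count_cycles(states, max_len=None):
    """Count revisits of earlier environment states, i.e. cycles.

    `states` is a sequence of hashable environment states (e.g. board layouts).
    A cycle of length k ends at step t when states[t] equals the most recent
    earlier occurrence of the same state, k steps before.
    """
    last_seen = {}  # state -> most recent index at which it occurred
    cycles = []
    for t, s in enumerate(states):
        if s in last_seen:
            k = t - last_seen[s]
            if max_len is None or k <= max_len:
                cycles.append((t, k))  # (step at which the cycle closes, length)
        last_seen[s] = t
    return cycles

# Toy example: the agent revisits its starting state after 3 steps ("pacing").
trajectory = ["A", "B", "C", "A", "D", "E"]
print(count_cycles(trajectory))  # -> [(3, 3)]

Comparing the output of such a function with and without extra thinking steps is the kind of measurement behind the reported figures (e.g. 75% of early cycles disappearing when thinking time is given at level start).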
AF - Does robustness improve with scale? by ChengCheng
2 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Does robustness improve with scale?, published by ChengCheng on July 25, 2024 on The AI Alignment Forum. Adversarial vulnerabilities have long been an issue in various ML systems. Large language models (LLMs) are no exception, suffering from issues such as jailbreaks: adversarial prompts that bypass model safeguards. At the same time, scale has led to remarkable advances in the capabilities of LLMs, leading us to ask: to what extent can scale help solve robustness? In this post, we explore this question in the classification setting: predicting the binary label of a text input. We find that scale alone does little to improve model robustness, but that larger models benefit more from defenses such as adversarial training than do smaller models. We study models in the classification setting as there is a clear notion of "correct behavior": does the model output the right label? We can then naturally define robustness as the proportion of the attacked dataset that the model correctly classifies. We evaluate models on tasks such as spam detection and movie sentiment classification. We adapt pretrained foundation models for classification by replacing the generative model's unembedding layer with a randomly initialized classification head, and then fine-tune the models on each task. We focus on adversarial-suffix style attacks: appending an adversarially chosen prompt to a benign prompt in an attempt to cause the model to misclassify the input, e.g., classify a spam email as not-spam. We consider two attacks: the state-of-the-art Greedy Coordinate Gradient method (Zou et al., 2023), and a baseline random token attack. This simple threat model has the advantage of being unlikely to change the semantics of the input. For example, a spam email is still spam even if a handful of tokens are appended to it. Of course, attackers are not limited to such a simple threat model: studying more open-ended threat models (such as rephrasing the prompt, or replacing words with synonyms) and corresponding attack methods (such as LLM generated adversarial prompts) is an important direction that we hope to pursue soon in future work. For more information, see our blog post or paper. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
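As a rough sketch of the evaluation setup described above, robustness can be measured as the fraction of attacked inputs the classifier still gets right. The Python below is illustrative only and is not the authors' code: the tokenized interface, the toy classifier, and the random-suffix baseline parameters are all assumptions (the paper's main attack, Greedy Coordinate Gradient, is gradient-based and not reproduced here).

import random

def random_suffix_attack(tokens, vocab, n_adv=10, seed=0):
    """Append n_adv randomly chosen tokens to a benign input (baseline attack)."""
    rng = random.Random(seed)
    return tokens + [rng.choice(vocab) for _ in range(n_adv)]

def robustness(classify, dataset, vocab, n_adv=10):
    """Fraction of attacked examples the classifier still labels correctly.

    `classify` maps a token list to a predicted label; `dataset` is a list of
    (tokens, label) pairs, e.g. spam vs. not-spam emails.
    """
    correct = 0
    for tokens, label in dataset:
        attacked = random_suffix_attack(tokens, vocab, n_adv)
        correct += int(classify(attacked) == label)
    return correct / len(dataset)

# Toy usage: a keyword "classifier" whose prediction the appended suffix can flip.
vocab = ["free", "money", "meeting", "tomorrow", "xyz"]
data = [(["free", "money", "now"], "spam"), (["meeting", "tomorrow"], "ham")]
print(robustness(lambda toks: "spam" if "free" in toks else "ham", data, vocab))

The appeal of the suffix threat model, as the post notes, is that appending a handful of tokens leaves the semantics of the original input unchanged, so the "correct" label is unambiguous.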
LW - "AI achieves silver-medal standard solving International Mathematical Olympiad problems" by gjm
2 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "AI achieves silver-medal standard solving International Mathematical Olympiad problems", published by gjm on July 25, 2024 on LessWrong. Google DeepMind reports on a system for solving mathematical problems that allegedly is able to give complete solutions to four of the six problems on the 2024 IMO, putting it near the top of the silver-medal category. Well, actually, two systems for solving mathematical problems: AlphaProof, which is more general-purpose, and AlphaGeometry, which is specifically for geometry problems. (This is AlphaGeometry 2; they reported earlier this year on a previous version of AlphaGeometry.) AlphaProof works in the "obvious" way: an LLM generates candidate next steps which are checked using a formal proof-checking system, in this case Lean. One not-so-obvious thing, though: "The training loop was also applied during the contest, reinforcing proofs of self-generated variations of the contest problems until a full solution could be found." (That last bit is reminiscent of something from the world of computer go: a couple of years ago someone trained a custom version of KataGo specifically to solve the infamous Igo Hatsuyoron problem 120, starting with ordinary KataGo and feeding it training data containing positions reachable from the problem's starting position. They claim to have laid that problem to rest at last.) AlphaGeometry is similar but uses something specialized for (I think) Euclidean planar geometry problems in place of Lean. The previous version of AlphaGeometry allegedly already performed at gold-medal IMO standard; they don't say anything about whether that version was already able to solve the 2024 IMO problem that was solved using AlphaGeometry 2. AlphaProof was able to solve questions 1, 2, and 6 on this year's IMO (two algebra, one number theory). It produces Lean-formalized proofs. AlphaGeometry 2 was able to solve question 4 (plane geometry). It produces proofs in its own notation. The solutions found by the Alpha... systems are at https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/imo-2024-solutions/index.html. (There are links in the top-of-page navbar to solutions to the individual problems.) (If you're curious about the IMO questions or want to try them yourself before looking at the machine-generated proofs, you can find them -- and those for previous years -- at https://www.imo-official.org/problems.aspx.) One caveat (note: an earlier version of what I wrote failed to notice this and quite wrongly explicitly claimed something different): "First, the problems were manually translated into formal mathematical language for our systems to understand." It feels to me like it shouldn't be so hard to teach an LLM to convert IMO problems into Lean or whatever, but apparently they aren't doing that yet. Another caveat: "Our systems solved one problem within minutes and took up to three days to solve the others." Later on they say that AlphaGeometry 2 solved the geometry question within 19 seconds, so I guess that was also the one that was done "within minutes". Three days is a lot longer than human IMO contestants get given, but this feels to me like the sort of thing that will predictably improve pretty rapidly. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
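The "obvious" loop described above, in which an LLM proposes candidate proof steps and a formal system such as Lean checks them, can be sketched as follows. This is a hypothetical skeleton, not DeepMind's implementation: propose_steps and check_with_lean are stand-in callbacks, not real APIs.

def search_for_proof(problem_statement, propose_steps, check_with_lean,
                     max_depth=50, beam=8):
    """Search over LLM-proposed proof steps, validated by a formal checker.

    propose_steps(state) -> list of candidate next tactics (strings).
    check_with_lean(state, step) -> (ok, new_state, done); `done` means the goal
    is closed and `new_state` is a complete, formally verified proof.
    """
    frontier = [problem_statement]  # partial proof states
    for _ in range(max_depth):
        next_frontier = []
        for state in frontier:
            for step in propose_steps(state)[:beam]:
                ok, new_state, done = check_with_lean(state, step)
                if not ok:
                    continue  # the checker rejects invalid steps outright
                if done:
                    return new_state  # a fully verified proof
                next_frontier.append(new_state)
        frontier = next_frontier[:beam]
        if not frontier:
            break
    return None  # no verified proof found within the search budget

# Conceptual usage: search_for_proof(formal_statement, llm_propose, lean_check),
# where llm_propose queries a language model and lean_check invokes the Lean kernel.

Because the checker only accepts formally valid steps, any proof returned by such a loop is correct by construction; the manual translation of the problems into formal language, noted in the caveat above, happens before this loop ever runs.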
EA - Applications Now Open for AIM's 2025 Programs by CE
2 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Applications Now Open for AIM's 2025 Programs, published by CE on July 25, 2024 on The Effective Altruism Forum. In short: Applications for AIM's two upcoming programs - the Charity Entrepreneurship Incubation Program and AIM's new Founding to Give Program - are now open. You can apply to multiple programs with one joint application form. The deadline to apply for all programs is September 15th. Over the past five years, AIM has incubated 40 highly effective nonprofits and secured over $3.9 million in seed grants. These organizations now reach over 35 million people and have the potential to improve the lives of more than 1 billion animals through their interventions. We are incredibly proud of the success achieved by these organizations and their dedicated founders. We believe founding an organization is the most impactful career path for fast-moving, entrepreneurial individuals. In our search for more impact-driven individuals to found field-leading organizations, we are excited to announce that our applications are now open. Dates: Charity Entrepreneurship Incubation Program: February-March 2025 (8 weeks) July - August 2025 (8 weeks) AIM Founding to Give: 6th of January - 28th of March 2025 (12 weeks) AIM Research Fellowship - Expression of Interest (dates TBD, likely early 2025) About the Charity Entrepreneurship Incubation Program Why apply? Put simply, we believe that founding a charity is likely one of the most impactful and exciting career options for people who are a good fit. In just a few years of operation, our best charities have gone from an idea to organizations with dozens of staff improving the lives of millions of people and animals every year. We provide the training, funding, research, and mentorship to ensure that people with the right aptitudes and a relentless drive to improve the world can start a charity, no matter their background or prior experience. This could be you! Our initial application form takes as little as 30 minutes - take a look at our applicant resources and apply now. Who is this program for? Individuals who want to make a huge impact with their careers. Charity entrepreneurs are ambitious, fast-moving, and prioritize impact above all. They are focused on cost-effectiveness and are motivated to pilot and scale an evidence-backed intervention. We have found that those from consulting backgrounds, for-profit entrepreneurship, effective NGOs, or recent graduates perform well in this program. What we offer: 2-month full-time training with two weeks in-person in London. Stipend of £1900/month during (and potentially up to 2 months after) the program. Incredibly talented individuals to co-found your new project with. Possibility to apply for $100,000 - $200,000 seed funding (~80% of projects get funded). Membership of the AIM alumni network, connecting you to mentorship, funders, and a community of other founders. The ideas: We are excited to announce our top charity ideas for the upcoming CE incubator. These ideas are the results of a seven-stage research process. To be brief, we have sacrificed nuance. In the upcoming weeks, full reports will be announced in our newsletter, published on our website, and posted on the EA Forum. 
Cage-free in the Middle East: An organization focused on good-cop cage-free corporate campaigning in neglected countries in the Middle East (United Arab Emirates, Saudi Arabia, and Egypt). Keel Bone Fractures: A charity working on how farmers can reduce the prevalence of keel bone fractures (KBF) in cage-free layer hens, ideally through outreach to certifiers to update their certification standards to include an outcome-based limit on KBF. Fish Welfare East Asia - We continue to recommend an organization that works with farmers in neglected, high-priority countries in East Asia (Philippines, Taiwan, and Indo...
EA - Forum update: User database, card view, and more (Jul 2024) by Sarah Cheng
2 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Forum update: User database, card view, and more (Jul 2024), published by Sarah Cheng on July 25, 2024 on The Effective Altruism Forum. Highlights since our last feature update: People directory Google Doc import Card view for the Frontpage User profile updates Updates to sequences We've also hosted 3 events, significantly improved the site speed, and made many more improvements across the Forum. As always, we'd love feedback on these changes. People directory We've built a filterable database where anyone can find Forum users they might want to hire, collaborate with, or read. You can filter the database by role, organization, interests and more. If you want people to easily find you, consider adding more information to your Forum profile. Google Doc import You can now import posts directly from Google Docs. Footnotes and images will be imported along with the text. How it works: Write and format your post in Google Docs Share the Google doc with eaforum.posts@gmail.com Copy the link to your doc, click "Import Google doc" on a new Forum post, and paste the link For more details, check out Will Howard's quick take. Card view for the Frontpage Card view is a new way to view the Frontpage, allowing you to see post images and a preview of the content. Try it out by changing to card view in the dropdown to the right of "Customize feed". User profile updates We've redesigned the Edit profile page, and moved "Display name" to this page (from Account settings). Feel free to update your display name to add a GWWC pledge diamond. We've also re-enabled topic interests. This helps possible collaborators or hiring managers find you in our new People directory, as well as letting you make your profile a bit more information dense and personalized. Updates to sequences You can now get notified when posts are added to a sequence. We've also updated the sequence editor, including adding the ability to delete sequences. Other updates Forum performance is greatly improved We've made several behind the scenes changes which have sped up the Forum loading times. This should make your experience of using the Forum a lot smoother. Improved on-site audio experience While the previous version was static, attached to the top of the post, the new audio player sticks to the bottom of the page while the reader scrolls through the article. Once you have clicked the speaker icon to open the audio player, you can click the play button next to any header to start playing the audio from there. We've fixed our Twitter bot Our Twitter bot tweets posts when they hit 40 karma. It's been out of commission for a while, but now it's back up and running! We encourage you to follow, share, and retweet posts that you think are valuable. Updated UI for post stats We've simplified the UI and added high level metrics to your post stats. Before and after: Self-service account deletion There is now a section at the bottom of your account settings allowing you to permanently delete your personal data from the Forum. Events since March Draft Amnesty Week We hosted a Draft Amnesty Week to encourage people to publish posts that had been languishing in draft or procrastinated states. We had around 50 posts published. We also made some design changes to how we show events on the Forum, which will make future events more visible and navigable. 
"Ways the world is getting better" banner This banner was on the Forum for a week. Users could add emojis to the banner, which would show a piece of good news when someone hovered over them. When you clicked the emojis, you would be linked to an article explaining the good news. AI Welfare Debate Week Our first interactive debate week had over 500 Forum users voting on our interactive banner, around 30 posts, and much discussion in comments and quick takes. We received a lot of enco...
EA - It's OK to kill and eat animals - but don't get caught slapping one. by Denis
2 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: It's OK to kill and eat animals - but don't get caught slapping one., published by Denis on July 25, 2024 on The Effective Altruism Forum. Very short post. This is not my area of expertise at all. But it seems like an opportunity. The Olympics start this week. In the UK, the biggest Olympic story is not about any runner or swimmer or gymnast. It is about animal rights. But, as with most animal-rights stories which make the front-pages (bull-fighting, hunting), it misses the real problem, factory-farming. The story: Apparently a famous Olympian equestrian has been forced to withdraw from the Olympics after footage emerged of her whipping a horse during training, 4 years ago. Cue the standard apologies, the "error of judgment" comment, the universal condemnation - and of course the video is shared with a warning that people might find this shocking. I think it would be wonderful if someone with the right moral stature (which is not me, I'm not even a vegan ...) were to highlight the absurdity of so much moral outrage for an otherwise well-treated, well-fed horse who gets whipped on the leg one time, but no reaction to the billions of factory-farmed animals who suffer in cages for their entire lives before we kill them and eat them. Maybe it would make people think again about factory-farming, or at least ask themselves if their views on animals were consistent. I was reminded of the Tolstoy description of a lady who "faints when she sees a calf being killed, she is so kind hearted that she can't look at the blood, but enjoys serving the calf up with sauce." My point with this post is just that if someone is in a position to express a public opinion on this, or write a letter to the editor, it might be an opportune moment given the size of the story right now. Charlotte Dujardin out of Olympics: The video, the reaction and what happens now explained | Olympics News | Sky Sports Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
AF - AI Constitutions are a tool to reduce societal scale risk by Samuel Dylan Martin
2 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Constitutions are a tool to reduce societal scale risk, published by Samuel Dylan Martin on July 25, 2024 on The AI Alignment Forum. Sammy Martin, Polaris Ventures As AI systems become more integrated into society, we face potential societal-scale risks that current regulations fail to address. These risks include cooperation failures, structural failures from opaque decision-making, and AI-enabled totalitarian control. We propose enhancing LLM-based AI Constitutions and Model Specifications to mitigate these risks by implementing specific behaviours aimed at improving AI systems' epistemology, decision support capabilities, and cooperative intelligence. This approach offers a practical, near-term intervention to shape AI behaviour positively. We call on AI developers, policymakers, and researchers to consider and implement improvements along these lines, as well as for more research into testing Constitution/Model Spec improvements, setting a foundation for more responsible AI development that reduces long-term societal risks. Introduction There is reason to believe that in the near future, autonomous, LLM based AI systems, while not necessarily surpassing human intelligence in all domains, will be widely deployed throughout society. We anticipate a world where AI will be making some decisions on our behalf, following complex plans, advising on decision-making and negotiation, and presenting conclusions without human oversight at every step. While this is already happening to some degree in low-stakes settings, we must prepare for its expansion into high-stakes domains (e.g. politics, the military), and do our best to anticipate the systemic, societal scale risks that might result and act to prevent them. Most of the important work on reducing societal-scale risk will, by their very nature, have to involve policy changes, for example to ensure that there are humans in the loop on important decisions, but there are some technical interventions which we have identified that can help. We believe that by acting now to improve the epistemology (especially on moral or political questions), decision support capabilities and cooperative intelligence of LLM based AI systems, we can mitigate near-term risks and also set important precedents for future AI development. We aim to do this by proposing enhancements to AI Constitutions or Model Specifications. If adopted, we believe these improvements will reduce societal-scale risks which have so far gone unaddressed by AI regulation. Here, we justify this overall conclusion and propose preliminary changes that we think might improve AI Constitutions. We aim to empirically test and iterate on these improvements before finalising them. Recent years have seen significant efforts to regulate frontier AI, from independent initiatives to government mandates. Many of these are just aimed at improving oversight in general (for example, the reporting requirements in EO 14110), but some are directed at destructive misuse or loss of control (for example, the requirement to prove no catastrophic potential in SB 1047 and the independent tests run by the UK AISI). Many are also directed at near-term ethical concerns. 
However, we haven't seen shovel ready regulation or voluntary commitments proposed to deal with longer-term societal-scale risks, even though these have been much discussed in the AI safety community. Some experts, (e.g. Andrew Critch), argue these may represent the most significant source of overall AI risk and they have been discussed as 'societal scale risks', for example in Critch and Russel's TARSA paper. What are these "less obvious" 'societal scale' risks? Some examples: Cooperation failures: AI systems are widely integrated into society, used for advice on consequential decisions and delegated decision making power, but...
EA - Focus group study of Non-Western EAs' experiences with Western EAs by Yi-Yang
2 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Focus group study of Non-Western EAs' experiences with Western EAs, published by Yi-Yang on July 25, 2024 on The Effective Altruism Forum. Summary What are my goals? And what did I find? 1. Are cross cultural interactions (CCIs) in EA even an issue for non-Western EAs who attended the retreat? 1. It's more likely than not that they had experienced at least one mildly-to-moderately bad interaction. These are usually more subtle and unintentional. 2. It's very unlikely that they had experienced an extremely bad interaction. 3. It's very likely that their interactions are mostly positive. 2. How widespread is it? 1. Uncertain, but probably yes. Methodology I thought a retreat that happened before EAGxPhilippines was a good opportunity to talk to a bunch of non-Western EAs, so I ran a focus group session as a way to solicit people's experiences of CCIs in EA settings. The rules I enforced during that time were: To use Chatham house rule when talking about the session to others To keep our shared notes anonymised To differentiate between purely factual observations (e.g., I see this person doing that) and interpretations of these observations (e.g., I think they are bad) Results Negative experiences * indicates that I was the one who initially shared the experience, and hence may be biassed to get people to talk more about it. Experiences Supporting details * EAs in "perceived-to-be-lower-status-cultures" [e.g., non-Western] have to put much more effort to be included in spaces where EAs in "perceived-to-be-higher-status-cultures" [e.g., Western] occupy. OTOH, EAs in "perceived-to-be-higher-status-cultures" have to put much less effort to be included in spaces where "perceived-to-be-lower-status-cultures" occupy. 3 people gave supporting anecdotal evidence. "In a conference, I noticed EAs from 'low status cultures' weren't invited to hang out. OTOH, folks from 'high status cultures' were doing their own thing and not being super inclusive." "Someone from country X told me their effort is double or maybe triple to join events, book 1-1s, etc" "Everyone but me [in a group] was invited to an after-conference party. I suspect it's because I'm a POC." * EAs from "perceived-to-be-higher-status-cultures" hijacking (probably unintentionally) norms in spaces that belong to EAs from "perceived-to-be-lower-status-cultures" 1 person gave supporting anecdotal evidence 1 person gave counter anecdotal evidence Didn't really see people hijack conversations that much, but they have to sometimes push people to speak up more due to lack of comfort in speaking in other languages. 1 person gave a different hypothesis Different cultures have different wait times to fill the silence: some are longer and some are shorter. After telling people about this, they give other people more wait time. EAs usually find the opportunity cost of travelling to far away conferences very high. This makes EAs in far away countries less likely to interact with other EAs in other parts of the world. 1 person gave supporting anecdotal evidence Pressure to move to an EA hub. 1 person gave supporting anecdotal evidence. "In many EA forms they ask how willing you are to move to different hubs for work. But many people like myself aren't willing to uproot their entire lives. Maybe there should be more effort to have work that is remote-friendly, or time zone-friendly." 
Cause prioritisation done by folks are influenced by their location 1 person gave supporting anecdotal evidence "If you live somewhere without AI safety jobs, you're much more unlikely to pursue it." 1 person disagreed "I tend to separate out cause prio and personal fit. So I do the cause prio separately, and then look into what fits me." Folks in Asia think they're not a great fit for EA if they're not working on AI safety 1 person gave supporting anecd...
LW - Llama Llama-3-405B? by Zvi
2 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Llama Llama-3-405B?, published by Zvi on July 25, 2024 on LessWrong. It's here. The horse has left the barn. Llama-3.1-405B, and also Llama-3.1-70B and Llama-3.1-8B, have been released, and are now open weights. Early indications are that these are very good models. They were likely the best open weight models of their respective sizes at time of release. Zuckerberg claims that open weights models are now competitive with closed models. Yann LeCun says 'performance is on par with the best closed models.' This is closer to true than in the past, and as corporate hype I will essentially allow it, but it looks like this is not yet fully true. Llama-3.1-405B not as good as GPT-4o or Claude Sonnet. Certainly Llama-3.1-70B is not as good as the similarly sized Claude Sonnet. If you are going to straight up use an API or chat interface, there seems to be little reason to use Llama. That is a preliminary result. It is still early, and there has been relatively little feedback. But what feedback I have seen is consistent on this. Prediction markets are modestly more optimistic. This market still has it 29% to be the #1 model on Arena, which seems unlikely given Meta's own results. Another market has it 74% to beat GPT-4-Turbo-2024-04-09, which currently is in 5th position. That is a big chance for it to land in a narrow window between 1257 and 1287. This market affirms that directly on tiny volume. Such open models like Llama-3.1-405B are of course still useful even if a chatbot user would have better options. There are cost advantages, privacy advantages and freedom of action advantages to not going through OpenAI or Anthropic or Google. In particular, if you want to distill or fine-tune a new model, and especially if you want to fully own the results, Llama-3-405B is here to help you, and Llama-3-70B and 8B are here as potential jumping off points. I expect this to be the main practical effect this time around. If you want to do other things that you can't do with the closed options? Well, technically you can't do most of them under Meta's conditions either, but there is no reason to expect that will stop people, especially those overseas including in China. For some of these uses that's a good thing. Others, not as good. Zuckerberg also used the moment to offer a standard issue open source manifesto, in which he abandons any sense of balance and goes all-in, which he affirmed in a softball interview with Rowan Cheung. On the safety front, while I do not think they did their safety testing in a way that would have caught issues if there had been issues, my assumption is there was nothing to catch. The capabilities are not that dangerous at this time. Thus I do not predict anything especially bad will happen here. I expect the direct impact of Llama-3.1-405B to be positive, with the downsides remaining mundane and relatively minor. The only exception would be the extent to which this enables the development of future models. I worry that this differentially accelerates and enables our rivals and enemies and hurts our national security, and indeed that this will be its largest impact. 
And I worry more that this kind of action and rhetoric will lead us down the path where if things get dangerous in the future, it will become increasingly hard not to get ourselves into deep trouble, both in terms of models being irrevocably opened up when they shouldn't be and increasing pressure on everyone else to proceed even when things are not safe, up to and including loss of control and other existential risks. If Zuckerberg had affirmed a reasonable policy going forward but thought the line could be drawn farther down the line, I would have said this was all net good. Instead, I am dismayed. I do get into the arguments about open weights at the end of this post, because it felt obligato...
LW - The last era of human mistakes by owencb
2 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The last era of human mistakes, published by owencb on July 25, 2024 on LessWrong. Suppose we had to take moves in a high-stakes chess game, with thousands of lives at stake. We wouldn't just find a good chess player and ask them to play carefully. We would consult a computer. It would be deeply irresponsible to do otherwise. Computers are better than humans at chess, and more reliable. We'd probably still keep some good chess players in the loop, to try to catch possible computer error. (Similarly we still have pilots for planes, even though the autopilot is often safer.) But by consulting the computer we'd remove the opportunity for humans to make a certain type of high stakes mistake. A lot of the high stakes decisions people make today don't look like chess, or flying a plane. They happen in domains where computers are much worse than humans. But that's a contingent fact about our technology level. If we had sufficiently good AI systems, they could catch and prevent significant human errors in whichever domains we wanted them to. In such a world, I think that they would come to be employed for just about all suitable and important decisions. If some actors didn't take advice from AI systems, I would expect them to lose power over time to actors who did. And if public institutions were making consequential decisions, I expect that it would (eventually) be seen as deeply irresponsible not to consult computers. In this world, humans could still be responsible for taking decisions (with advice). And humans might keep closer to sole responsibility for some decisions. Perhaps deciding what, ultimately, is valued. And many less consequential decisions, but still potentially large at the scale of an individual's life (such as who to marry, where to live, or whether to have children), might be deliberately kept under human control[1]. Such a world might still collapse. It might face external challenges which were just too difficult. But it would not fail because of anything we would parse as foolish errors. In many ways I'm not so interested in that era. It feels out of reach. Not that we won't get there, but that there's no prospect for us to help the people of that era to navigate it better. My attention is drawn, instead, to the period before it. This is a time when AI will (I expect) be advancing rapidly. Important decisions may be made in a hurry. And while automation-of-advice will be on the up, it seems like wildly unprecedented situations will be among the hardest things to automate good advice for. We might think of it as the last era of consequential human mistakes[2]. Can we do anything to help people navigate those? I honestly don't know. It feels very difficult (given the difficulty at our remove in even identifying the challenges properly). But it doesn't feel obviously impossible. What will this era look like? Perhaps AI progress is blisteringly fast and we move from something like the world of today straight to a world where human mistakes don't matter. But I doubt it. 
On my mainline picture of things, this era - the final one in which human incompetence (and hence human competence) really matters - might look something like this:
• Cognitive labour approaching the level of human thinking in many domains is widespread, and cheap
• People are starting to build elaborate ecosystems leveraging its cheapness …
• … since if one of the basic inputs to the economy is changed, the optimal arrangement of things is probably quite different (cf. the ecosystem of things built on the internet);
• … but that process hasn't reached maturity.
• There is widespread access to standard advice, which helps to avoid some foolish errors, though this is only applicable to "standard" situations, and it isn't universal to seek that advice
• In some domains, AI performance is significantly bet...
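To make the chess-consultation analogy at the start of this post concrete, here is a minimal sketch of what "consulting a computer" for a move can look like in practice. It assumes the python-chess library and a locally installed Stockfish binary; both are illustrative assumptions, not details from the post, and a human player would still sanity-check the recommendation, as the post suggests.

import chess
import chess.engine

# Illustrative position; in a real high-stakes game this would be the actual board state.
board = chess.Board()

# Start the engine (assumes a Stockfish binary named "stockfish" is on the PATH).
engine = chess.engine.SimpleEngine.popen_uci("stockfish")
try:
    # Ask the engine for its preferred move and a rough evaluation of the position.
    play_result = engine.play(board, chess.engine.Limit(time=5.0))
    analysis = engine.analyse(board, chess.engine.Limit(depth=20))
    print("Engine's recommended move:", play_result.move)
    print("Engine's evaluation:", analysis["score"])
finally:
    engine.quit()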
AF - A framework for thinking about AI power-seeking by Joe Carlsmith
3 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A framework for thinking about AI power-seeking, published by Joe Carlsmith on July 24, 2024 on The AI Alignment Forum. This post lays out a framework I'm currently using for thinking about when AI systems will seek power in problematic ways. I think this framework adds useful structure to the too-often-left-amorphous "instrumental convergence thesis," and that it helps us recast the classic argument for existential risk from misaligned AI in a revealing way. In particular, I suggest, this recasting highlights how much classic analyses of AI risk load on the assumption that the AIs in question are powerful enough to take over the world very easily, via a wide variety of paths. If we relax this assumption, I suggest, the strategic trade-offs that an AI faces, in choosing whether or not to engage in some form of problematic power-seeking, become substantially more complex.
Prerequisites for rational takeover-seeking
For simplicity, I'll focus here on the most extreme type of problematic AI power-seeking - namely, an AI or set of AIs actively trying to take over the world ("takeover-seeking"). But the framework I outline will generally apply to other, more moderate forms of problematic power-seeking as well - e.g., interfering with shut-down, interfering with goal-modification, seeking to self-exfiltrate, seeking to self-improve, more moderate forms of resource/control-seeking, deceiving/manipulating humans, acting to support some other AI's problematic power-seeking, etc.[2] Just substitute in one of those forms of power-seeking for "takeover" in what follows. I'm going to assume that in order to count as "trying to take over the world," or to participate in a takeover, an AI system needs to be actively choosing a plan partly in virtue of predicting that this plan will conduce towards takeover.[3] And I'm also going to assume that this is a rational choice from the AI's perspective.[4] This means that the AI's attempt at takeover-seeking needs to have, from the AI's perspective, at least some realistic chance of success - and I'll assume, as well, that this perspective is at least decently well-calibrated. We can relax these assumptions if we'd like - but I think that the paradigmatic concern about AI power-seeking should be happy to grant them. What's required for this kind of rational takeover-seeking? I think about the prerequisites in three categories:
• Agential prerequisites - that is, necessary structural features of an AI's capacity for planning in pursuit of goals.
• Goal-content prerequisites - that is, necessary structural features of an AI's motivational system.
• Takeover-favoring incentives - that is, the AI's overall incentives and constraints combining to make takeover-seeking rational (a toy sketch of this condition appears after this entry).
Let's look at each in turn.
Agential prerequisites
In order to be the type of system that might engage in successful forms of takeover-seeking, an AI needs to have the following properties:
1. Agentic planning capability: the AI needs to be capable of searching over plans for achieving outcomes, choosing between them on the basis of criteria, and executing them.
2. Planning-driven behavior: the AI's behavior, in this specific case, needs to be driven by a process of agentic planning.
   1. Note that this isn't guaranteed by agentic planning capability.
      1. For example, an LLM might be capable of generating effective plans, in the sense that that capability exists somewhere in the model, but it could nevertheless be the case that its output isn't driven by a planning process in a given case - i.e., it's not choosing its text output via a process of predicting the consequences of that text output, thinking about how much it prefers those consequences to other consequences, etc.
      2. And note that human behavior isn't always driven by a process of agentic planning, either, despite our ...
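The "takeover-favoring incentives" condition above can be written as a toy expected-utility comparison. The sketch below is a minimal illustration; the function name, probabilities, and utilities are made-up assumptions rather than anything the post specifies. It shows how, once takeover is not assumed to be very easy, a low chance of success plus a costly failed attempt can make takeover-seeking irrational even when the payoff from a successful takeover is large.

def takeover_seeking_is_rational(p_success, u_takeover, u_failed_attempt, u_no_attempt):
    # Expected utility of attempting takeover, from the AI's own
    # (assumed decently well-calibrated) perspective.
    eu_attempt = p_success * u_takeover + (1 - p_success) * u_failed_attempt
    # In this toy model, the attempt is rational only if it beats not attempting.
    return eu_attempt > u_no_attempt

# Very easy takeover (success nearly certain): attempting dominates.
print(takeover_seeking_is_rational(0.99, 100.0, -50.0, 10.0))  # True
# Hard takeover (success unlikely, failure costly): attempting does not.
print(takeover_seeking_is_rational(0.01, 100.0, -50.0, 10.0))  # False

In words: relaxing the easy-takeover assumption means the AI's overall incentives and constraints, not just its goals, determine whether problematic power-seeking is rational.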
EA - Webinar: How to use Rethink Priorities' new effective giving tools by Rethink Priorities
3 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Webinar: How to use Rethink Priorities' new effective giving tools, published by Rethink Priorities on July 24, 2024 on The Effective Altruism Forum.
Introducing the tools
How can we optimize our charitable giving while accounting for complex factors about effectiveness and philosophy? Rethink Priorities' Worldview Investigations Team developed two free tools to help address this question:
1. The portfolio builder
2. The moral parliament simulation
Both tools are described in the new Charitable Resource Allocation Frameworks and Tools (CRAFT) Sequence, which is part of the Team's ongoing efforts to improve resource allocations.
Learn more
Join Rethink Priorities' Senior Research Manager Bob Fischer and Researcher Arvo Muñoz Morán for a virtual workshop on how to use these new tools. The one-hour event will cover:
• An overview of why the CRAFT Sequence tools were developed.
• A virtual walkthrough of the Portfolio Builder Tool and the Moral Parliament Tool.
• A practical session on how you can apply these tools to your own giving strategies.
• A question-and-answer session to address your questions and provide further insights.
Come explore how the Portfolio Builder and Moral Parliament tools can help you build effective giving portfolios and make informed philanthropic decisions!
Details
The webinar will be held on Monday, August 5 at noon PT / 3 pm ET / 8 pm BST / 9 pm CET. Please register here to receive the Zoom link to join the event. If you cannot attend but would like a recording of the discussion, reach out to henri[at]rethinkpriorities.org. Note: For a sneak peek, check out a recorded 2-minute intro (moral parliament, portfolio builder) or 5-minute intro (moral parliament, portfolio builder).
Rethink Priorities (RP) is a think-and-do tank that addresses global priorities by researching solutions and strategies, mobilizing resources, and empowering our team and others. Henri Thunberg wrote this post. Thank you to Rachel Norman for her input. We invite you to explore more RP research via our database and stay updated on new work by subscribing to our newsletter. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
LW - You should go to ML conferences by Jan Kulveit
3 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: You should go to ML conferences, published by Jan Kulveit on July 24, 2024 on LessWrong. This is a kind-of-obvious point to make, but if you are interested in AI, AI safety, or cognition in general, it is likely worth going to top ML conferences, such as NeurIPS, ICML or ICLR. In this post I cover some reasons why, and some anecdotal stories.
1. Parts of AI alignment and safety are now completely mainstream
Looking at the "Best paper awards" at ICML, you'll find these safety-relevant or alignment-relevant papers:
• Stealing part of a production language model by Carlini et al.
• Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo by Zhao et al.
• Debating with More Persuasive LLMs Leads to More Truthful Answers by Khan et al.
• Genie: Generative Interactive Environments by Bruce et al.
That amounts to about one-third (!). "Because of safety concerns" is part of the motivation for hundreds of papers. While the signal-to-noise ratio is even worse than on LessWrong, in total, the amount you can learn is higher - my personal guess is there is maybe 2-3x as much prosaic AI safety relevant work at conferences as what you get by just following LessWrong, Alignment Forum and safety-oriented communication channels.
2. Conferences are an efficient way to screen general ML research without spending a lot of time on X
Almost all papers are presented in the form of posters. In the case of a big conference, this usually means many thousands of posters presented in huge poster sessions. My routine for engaging with this firehose of papers (a minimal code sketch of this screening loop appears after this entry):
1. For each session, read all the titles. Usually, this prunes it by a factor of ten (i.e. from 600 papers to 60).
2. Read the abstracts. Prune it to things which I haven't noticed before and seem relevant. For me, this is usually by a factor of ~3-5.
3. Visit the posters. Posters with paper authors present are actually a highly efficient way to digest research:
• Sometimes, you suspect there is some assumption or choice hidden somewhere making the result approximately irrelevant - just asking can often resolve this in a matter of tens of seconds.
• Posters themselves don't undergo peer review, which makes the communication more honest, with less hedging.
• Usually authors of a paper know significantly more about the problem than what's in the paper, and you can learn more about negative results, obstacles, or directions people are excited about.
A clear disadvantage of conferences is the time lag; by the time they are presented, some of the main results are old and well known, but in my view a lot of the value is in the long tail of results which are sometimes very useful, but not attention-grabbing.
3. ML research community as a control group
My vague impression is that in conceptual research, mainstream ML research lags behind the LW/AI safety community by something between 1 and 5 years, rediscovering topics discussed here. Some examples:
• The ICML poster & oral presentation The Platonic Representation Hypothesis is an independent version of Natural abstractions, discussed here for about 4 years.
• A Roadmap to Pluralistic Alignment deals with the Self-unalignment problem and Coherent extrapolated volition.
• Plenty of research on safety protocols like debate, IDA,...
Prior work published in the LW/AI safety community is almost never cited or acknowledged - in some cases because it is more convenient to claim the topic is completely novel, but I suspect in many cases researchers are genuinely not aware of the existing work, which makes their contribution a useful control: if someone starts thinking about these topics, unaware of the thousands of hours spent on them by dozens of people, what will they arrive at?
4. What 'experts' think
The ML research community is the intellectual home of many people expressing public opinions about AI risk. In my view, b...
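Here is a minimal sketch of the title-then-abstract screening routine described in the entry above, assuming the conference program is available as a list of records with "title" and "abstract" fields; the data format, keyword list, and function names are illustrative assumptions, not part of the post.

# Keywords are purely illustrative; adjust them to your own interests.
KEYWORDS = ["safety", "alignment", "interpretability", "oversight",
            "jailbreak", "robustness", "evaluation"]

def looks_relevant(text):
    text = text.lower()
    return any(keyword in text for keyword in KEYWORDS)

def screen_session(papers):
    # Pass 1: titles only (typically prunes by roughly a factor of ten).
    by_title = [p for p in papers if looks_relevant(p["title"])]
    # Pass 2: abstracts of the survivors (typically another factor of ~3-5).
    by_abstract = [p for p in by_title if looks_relevant(p["abstract"])]
    # Pass 3 happens in person: visit these posters and talk to the authors.
    return by_abstract

# Toy example with made-up entries.
papers = [
    {"title": "Scalable Oversight with Debate",
     "abstract": "We study oversight of language model judges ..."},
    {"title": "A Faster Optimizer for Image Classification",
     "abstract": "We speed up training ..."},
]
print([p["title"] for p in screen_session(papers)])  # ['Scalable Oversight with Debate']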
LW - The Cancer Resolution? by PeterMcCluskey
3 days ago
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Cancer Resolution?, published by PeterMcCluskey on July 24, 2024 on LessWrong. Book review: The Cancer Resolution?: Cancer reinterpreted through another lens, by Mark Lintern. In the grand tradition of outsiders overturning scientific paradigms, this book proposes a bold new theory: cancer isn't a cellular malfunction, but a fungal invasion. Lintern spends too many pages railing against the medical establishment, which feels more like ax-grinding than science. I mostly agreed with his conclusions here, though for somewhat different reasons than the ones he provides. If you can push through this preamble, you'll find a treasure trove of scientific intrigue. Lintern's central claim is that fungal infections, not genetic mutations, are the primary cause of cancer. He dubs this the "Cell Suppression theory," painting a picture of fungi as cellular puppet masters, manipulating our cells for their own nefarious ends. This part sounds much more like classical science, backed by hundreds of quotes from peer-reviewed literature. Those quotes provide extensive evidence that Lintern's theory predicts dozens of cancer features better than do the established theories.
Older Theories
1. The DNA Theory (aka Somatic Mutation Theory): The reigning heavyweight, this theory posits that cancer results from an accumulation of genetic mutations in critical genes that control cell growth, division, and death.
2. Another old theory that still has advocates is the Metabolic Theory. This theory suggests that cancer is primarily a metabolic disease, characterized by impaired cellular energy production (the Warburg effect). It proposes that damage to mitochondria is a key factor in cancer development. I wrote a mixed review of a book about it. Lintern points out evidence that mitochondria are turned off by signals, not damaged. He also notes that tumors with malfunctioning mitochondria are relatively benign.
Evidence Discrediting the DNA Theory
The standard version of the DNA Theory predicts that all cancer cells will have mutations that affect replication, apoptosis, etc. Around 2008 to 2013, substantial genetic data became available for cancer cells. Lintern wants us to believe that this evidence fully discredits the DNA Theory. The actual evidence seems more complex than Lintern indicates. The strongest evidence is that they found cancers that seem to have no mutations. Almost as important is that the mutations that are found seem more randomly distributed than would be expected if they caused consistent types of malfunctions. Lintern's theory seems to explain all of the Hallmarks of Cancer, as well as a few dozen other features that seem to occur in all cancers. He argues that the DNA Theory does a poor job of explaining the hallmarks. DNA Theorists likely reject that characterization. They appear to have thought their theory explained the hallmarks back before the genetic data became available (mostly just positing mutations for each hallmark?). My guess is that they are busy adding epicycles to their theory, but the situation is complex enough that I'm having trouble evaluating it. He also points out that the DNA Theory struggles with Peto's Paradox (why don't larger animals get more cancer?), while his theory neatly sidesteps this issue. Additionally, mouse embryos formed from cancer cells showed no signs of cancer.
Evidence of Fungi
A key game-changer is the growing evidence of fungi in tumors. Until 2017, tumors were thought to be microbe-free. Now? We're finding fungi in all types of cancer, with tumor-specific fungal profiles. There's even talk of using fungal DNA signatures to distinguish cancer patients from healthy individuals. It's not a slam dunk for Lintern's theory, but it shifts the odds significantly.
Medical Establishment Inertia
It looks like people in the medical ...