Politics is a hard subject to discuss rationally. LessWrong has developed a unique set of norms and habits around politics. Our aim is to allow discussion to happen (when actually important) while hopefully avoiding many pitfalls and distractions.
I've recently been reading a lot of science fiction. Most of it won't be new to fans of the genre, but some people might be looking for suggestions, so in lieu of full-blown reviews, here are super-brief ratings for all of them. I might keep this updated over time; if so, new books will go to the top.
scifiosity: 10/10
readability: 8/10
recommended: 10/10
A Deepness in the Sky excels in its depiction of a spacefaring civilisation using no technologies we know to be impossible, a truly alien civilisation, and its brilliant treatment of translation and culture.
scifiosity: 8/10
readability: 9/10
recommended: 9/10
In A Fire Upon the Deep, Vinge allows impossible technologies and goes for a slightly more fantasy-like theme. But his...
Agree that this is a cool list, thanks, excited to come back to it.
I just read The Three-Body Problem and liked it, but got the same sense: the end of the book lost me a good deal and left a sour taste. (I do plan to read the sequels though!)
We recently released a paper on using mechanistic interpretability to generate compact formal guarantees on model performance. In this companion blog post to our paper, we'll summarize the paper and flesh out some of the motivation and inspiration behind our work.
...In this work, we propose using mechanistic interpretability – techniques for reverse engineering model weights into human-interpretable algorithms – to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving lower bounds on the accuracy of 151 small transformers trained on a Max-of-K task. We create 102 different computer-assisted proof strategies and assess their length and tightness of bound on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover,
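To make the flavour of a "proof strategy" concrete, here is a toy sketch (my construction, not the paper's method): the brute-force baseline that certifies an accuracy lower bound on a Max-of-2 task by checking every input. `toy_model` is a hypothetical stand-in for a trained transformer; the paper's point is that mechanistic understanding lets you prove comparable bounds far more compactly than this exhaustive enumeration.

```python
# Toy illustration of certifying an accuracy lower bound for a
# Max-of-2 task by exhaustive enumeration. This is the maximally
# long (but maximally tight) "proof"; compact proofs exploit the
# model's internal structure to avoid brute force.

def toy_model(a, b):
    # Hypothetical stand-in for a trained transformer's prediction.
    # Deliberately imperfect: it answers wrongly when the inputs tie.
    if a == b:
        return a - 1  # wrong on ties
    return a if a > b else b

def certified_accuracy_lower_bound(model, vocab_size):
    correct = 0
    total = 0
    for a in range(vocab_size):
        for b in range(vocab_size):
            total += 1
            if model(a, b) == max(a, b):
                correct += 1
    return correct / total

bound = certified_accuracy_lower_bound(toy_model, vocab_size=64)
print(f"certified accuracy >= {bound:.4f}")  # 63/64 = 0.9844
```

Because the enumeration is exhaustive, the bound is exact here; the interesting trade-off in the paper is how much tightness you keep as the proof gets shorter.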
I know nothing about war except that horseback archers were OP for a long time. But from my point of view, which is blatantly uneducated when it comes to war, being a Russian soldier seems like a miserable experience. It therefore makes me wonder why 300,000 Russian soldiers are willing to risk it all in Ukraine.[1] Why don’t they desert? How does the Russian regime get so many people to fight a war when my home government is struggling to convince me to sort my trash? If the Russian regime can convince so many people to have a shit time in Ukraine, I’d argue that the West could convince these people to go live an easier life. The idea is so simple that by now I mostly wonder...
I think the Trojan Horse situation is going to be your biggest blocker, regardless of whether it's a real problem or not. At least in the US, anti-immigration talking points tend to focus on working-age, military-age men immigrating from a friendly country in order to get jobs. I can't imagine how strong the blowback would be if they were literally Russian soldiers.
There's also a repeated-game concern where once you do this, the incentive is for every poor country to invade its neighbors in the hopes of getting its soldiers a cushy retirement and the ab...
"I find soaps disfusing, I'm straight up afused by soaps"
Summary: Superposition-based interpretations of neural network activation spaces are incomplete. The specific locations of feature vectors contain crucial structural information beyond superposition, as seen in circular arrangements of day-of-the-week features and in the rich structures of feature UMAPs. We don’t currently have good concepts for talking about this structure in feature geometry, but it is likely very important for model computation. An eventual understanding of feature geometry might look like a hodgepodge of case-specific explanations, or supplementing superposition with additional concepts, or plausibly an entirely new theory that supersedes superposition. To develop this understanding, it may be valuable to study toy models in depth and do theoretical or conceptual work in addition to studying frontier models.
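The day-of-the-week example above can be illustrated with a toy construction (mine, not the post's): seven feature vectors placed on a circle inside a 2-D subspace of a higher-dimensional activation space. Superposition only says seven feature directions exist; the circular arrangement is extra geometric structure, and PCA recovers it by concentrating essentially all variance in two components.

```python
import numpy as np

# Toy sketch: "day of week" features arranged on a circle in a
# 2-D subspace of a 64-d activation space, then recovered by PCA.

rng = np.random.default_rng(0)
d_model = 64
angles = 2 * np.pi * np.arange(7) / 7                 # Mon..Sun around a circle
circle_2d = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (7, 2)

# Embed the 2-D circle into d_model dims via a random orthonormal basis.
basis, _ = np.linalg.qr(rng.standard_normal((d_model, 2)))  # (64, 2)
features = circle_2d @ basis.T                        # (7, 64)

# PCA: variance concentrates in exactly 2 components, exposing the circle.
centered = features - features.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values**2 / (singular_values**2).sum()
print(explained[:3])  # first two ~0.5 each, third ~0
```

Projecting `features` onto the top two principal components reproduces the heptagon, which is exactly the kind of structure a bag-of-directions view of superposition doesn't capture.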
Epistemic status: Decently confident that the ideas here are directionally correct. I’ve...
Yeah, this does seem like it's another good example of what I'm trying to gesture at. More generally, I think the embedding at layer 0 is a good place for thinking about the kind of structure that the superposition hypothesis is blind to. If the vocab size is smaller than the SAE dictionary size, an SAE is likely to get perfect reconstruction just by learning the vocab_size-many embeddings. But those embeddings aren't random! They have been carefully learned and contain lots of useful information. I think trying to explain the structure in...
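A minimal sketch of this point (my toy construction, not from the comment): when the only activations are the vocab_size distinct embedding rows, an idealized SAE whose dictionary simply contains those rows achieves perfect reconstruction with one active latent per token, while saying nothing about the structure the embeddings encode.

```python
import numpy as np

# Toy: layer-0 activations are just rows of a learned embedding table.
rng = np.random.default_rng(1)
vocab_size, d_model = 50, 32
embedding = rng.standard_normal((vocab_size, d_model))  # lookup table

# Idealized "trained" SAE: one decoder row per embedding (dict >= vocab).
decoder = embedding.copy()

def sae_encode(x):
    # Idealized sparse code: activate the single matching dictionary row.
    idx = np.argmin(np.linalg.norm(decoder - x, axis=1))
    code = np.zeros(vocab_size)
    code[idx] = 1.0
    return code

tokens = rng.integers(0, vocab_size, size=100)
acts = embedding[tokens]
recon = np.stack([sae_encode(x) @ decoder for x in acts])
print(np.max(np.abs(recon - acts)))  # ~0: perfect reconstruction
```

The reconstruction error is zero, yet nothing here explains why the embedding rows sit where they do, which is the structure the comment argues we're missing.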
I’ve been to two EAGx events and one EAG, and the vast majority of my one on ones with junior people end up covering some subset of these questions. I’m happy to have such conversations, but hopefully this is more efficient and wide-reaching (and more than I could fit into a 30 minute conversation).
I am specifically aiming to cover advice on getting a job in empirically-leaning technical research (interp, evals, red-teaming, oversight, etc) for new or aspiring researchers without being overly specific about the field of research – I’ll try to be more agnostic than something like Neel Nanda’s mechinterp quickstart guide but more specific than the wealth of career advice that already exists but that applies to ~any career. This also has some overlap with this excellent list...
I agree! This is mostly focused on the "getting a job" part though, which typically doesn't end up testing those other things you mention. I think this is the thing I'm gesturing at when I say that there are valid reasons to think that the software interview process feels like it's missing important details.
Addy Cha from Ekkolapto will be speaking about how language influences cognitive processing and communication in animals like whales, dogs, and ants. A key question is how language influences and constrains cognition.
Ekkolapto is a "thinkubator" that runs fellowships, hackathons, and exclusive conferences. Read more at https://www.ekkolapto.org/.
More details and additional speakers to be announced soon.
Great, I look forward to meeting you there!
I recently gave a talk to the AI Alignment Network (ALIGN) in Japan on my priorities for AI safety fieldbuilding based on my experiences at MATS and LISA (slides, recording). A lightly edited talk transcript is below. I recommend this talk to anyone curious about the high level strategy that motivates projects like MATS. Unfortunately, I didn't have time to delve into rebuttals and counter-rebuttals to our theory of change; this will have to wait for another talk/post.
Thank you to Ryuichi Maruyama for inviting me to speak!
Ryan: Thank you for inviting me to speak. I very much appreciated visiting Japan for the technical AI safety conference in Tokyo. I had a fantastic time. I loved visiting Tokyo; it's wonderful. I had never been before, and I was...
Nice talk!
When you talk about the most important interventions for the three scenarios, I wanna highlight that in the case of nationalization, you can also, if you're a citizen of one of these countries nationalizing AI, work for the government and be on those teams working and advocating for safe AI.