
Politics is a hard subject to discuss rationally. LessWrong has developed a unique set of norms and habits around politics. Our aim is to allow discussion to happen (when actually important) while hopefully avoiding many of the pitfalls and distractions.

Buck
AI safety people often emphasize making safety cases as the core organizational approach to ensuring safety. I think this might cause people to anchor on relatively bad analogies to other fields.

Safety cases are widely used in fields that do safety engineering, e.g. airplanes and nuclear reactors. See e.g. “Arguing Safety” for my favorite introduction to them. The core idea of a safety case is to have a structured argument that clearly and explicitly spells out how all of your empirical measurements allow you to make a sequence of conclusions that establish that the risk posed by your system is acceptably low. Safety cases are somewhat controversial among safety engineering pundits.

But the AI context has a very different structure from those fields, because all of the risks that companies are interested in mitigating with safety cases are fundamentally adversarial (with the adversary being AIs and/or humans). There’s some discussion of adapting the safety-case-like methodology to the adversarial case (e.g. Alexander et al., “Security assurance cases: motivation and the state of the art”), but this seems to be quite experimental and it is not generally recommended. So I think it’s very unclear whether a safety-case-like structure should actually be an inspiration for us.

More generally, I think we should avoid anchoring on safety engineering as the central field to draw inspiration from. Safety engineering mostly involves cases where the difficulty arises from the fact that you’ve built extremely complicated systems and need to manage the complexity; here our problems arise from adversarial dynamics on top of fairly simple systems built out of organic, hard-to-understand parts. We should expect these to be fairly dissimilar. (I think information security is also a pretty bad analogy--it’s adversarial, but like safety engineering it’s mostly about managing complexity, which is not at all our problem.)
Fabien Roger
I listened to the book Protecting the President by Dan Bongino, to get a sense of how risk management works for US presidential protection - a risk that is high-stakes, where failures are rare, where the main threat comes from an adversary that is relatively hard to model, and where the downsides of more protection and its upsides are very hard to compare.

Some claims the author makes (often implicitly):

* Large bureaucracies are amazing at creating mission creep: the service was initially in charge of fighting against counterfeit currency, got presidential protection later, and now is in charge of things ranging from securing large events to fighting against Nigerian prince scams.
* Many of the important choices are made via inertia in large change-averse bureaucracies (e.g. these cops were trained to do boxing, even though they are never actually supposed to fight like that); you shouldn't expect obvious wins to happen.
* Many of the important variables are not technical, but social - especially in this field where the skills of individual agents matter a lot (e.g. if you have bad policies around salaries and promotions, people don't stay at your service for long, and so you end up with people who are not as skilled as they could be; if you let the local police around the White House take care of outside-perimeter security, then it makes communication harder).
* Many of the important changes are made because important politicians who haven't thought much about security try to improve optics, and large bureaucracies are not built to oppose this political pressure (e.g. because high-ranking officials are near retirement, and disagreeing with a president would be more risky for them than increasing the chance of a presidential assassination).
* Unfair treatment - not hardship - destroys morale (e.g. unfair promotions and contempt are much more damaging than doing long and boring surveillance missions, or training exercises where trainees actually feel the pain from the fake bullets for the rest of the day).

Some takeaways:

* Maybe don't build big bureaucracies if you can avoid it: once created, they are hard to move, and the leadership will often favor things that go against the mission of the organization (e.g. because changing things is risky for people in leadership positions, except when it comes to mission creep). Caveat: the book was written by a conservative, and so that probably taints what information was conveyed on this topic.
* Some near misses provide extremely valuable information, even when they are quite far from actually causing a catastrophe (e.g. who are the kind of people who actually act on their public threats).
* Making people clearly accountable for near misses (not legally, just in the expectations that the leadership conveys) can be a powerful force to get people to do their job well and make sensible decisions.

Overall, the book was somewhat poor in details about how decisions are made. The main decision processes that the book reports are the changes that the author wants to see happen in the US Secret Service - but this looks like it has been dumbed down to appeal to a broad conservative audience that gets along with vibes like "if anything increases the president's safety, we should do it" (which might be true directionally given the current state, but definitely doesn't address the question of "how far should we go, and how would we know if we were at the right amount of protection"). So this may not reflect how decisions are actually made, since it could be a byproduct of Dan Bongino being a conservative political figure and podcast host.
Here's something that I'm surprised doesn't already exist (or maybe it does and I'm just ignorant): constantly-running LLM agent livestreams. Imagine something like ChaosGPT, except that whoever built it just livestreams the whole thing and leaves it running 24/7. So, it has internet access and can even e.g. make tweets and forum comments and maybe also emails.

Cost: At roughly a penny per 1000 tokens, that's maybe $0.20/hr or five bucks a day. Should be doable.

Interestingness: ChaosGPT was popular. This would scratch the same itch so probably would be less popular, but who knows, maybe it would get up to some interesting hijinks every few days of flailing around. And some of the flailing might be funny.

Usefulness: If you had several of these going, and you kept adding more when new models come out (e.g. Claude 3.5 Sonnet), then maybe this would serve as a sort of qualitative capabilities eval. At some point there'd be a new model that crosses the invisible line from 'haha this is funny, look at it flail' to 'oh wow it seems to be coherently working towards its goals somewhat successfully...' (This line is probably different for different people; underlying progress will probably be continuous.)

Does something like this already exist? If not, why not?
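As a quick sanity check on the cost estimate in the quick take above, here is a minimal back-of-envelope sketch in Python. Only the roughly-a-penny-per-1,000-tokens price comes from the comment itself; the ~20,000 tokens/hour throughput figure is an assumption added for illustration.

```python
# Back-of-envelope cost of a 24/7 LLM agent livestream (illustrative sketch only).
price_per_1k_tokens = 0.01   # dollars; "roughly a penny per 1000 tokens" (from the comment)
tokens_per_hour = 20_000     # assumed throughput, e.g. a few agent steps per minute

cost_per_hour = tokens_per_hour / 1000 * price_per_1k_tokens
cost_per_day = 24 * cost_per_hour
print(f"${cost_per_hour:.2f}/hour, ${cost_per_day:.2f}/day")
# -> $0.20/hour, $4.80/day, consistent with the "five bucks a day" figure
```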
I do a lot of “mundane utility” work with chat LLMs in my job[1], and I find there is a disconnect between the pain points that are most obstructive to me and the kinds of problems that frontier LLM providers are solving with new releases.

I would pay a premium for an LLM API that was better tailored to my needs, even if the LLM was behind the frontier in terms of raw intelligence. Raw intelligence is rarely the limiting factor on what I am trying to do. I am not asking for anything especially sophisticated, merely something very specific, which I need to be done exactly as specified.

It is demoralizing to watch the new releases come out, one after the next. The field is overcrowded with near-duplicate copies of the same very specific thing, the frontier “chat” LLM, modeled closely on ChatGPT (or perhaps I should say modeled closely on Anthropic’s HHH Assistant concept, since that came first). If you want this exact thing, you have many choices -- more than you could ever need. But if its failure modes are problematic for you, you’re out of luck.

Some of the things I would pay a premium for:

* Tuning that “bakes in” the idea that you are going to want to use CoT in places where it is obviously appropriate. The model does not jump to conclusions; it does not assert something in sentence #1 and then justify it in sentences #2 through #6; it moves systematically from narrow claims to broader ones. This really ought to happen automatically, but failing that, maybe some “CoT mode” that can be reliably triggered with a special string of words.
  * Getting chat LLMs to do 0-shot CoT is still (!) kind of a crapshoot. It’s ridiculous. I shouldn’t have to spend hours figuring out the magic phrasing that will make your model do the thing that you know I want, that everyone always wants, that your own “prompting guide” tells me I should want.
* Really good instruction-following, including CoT-like reviewing and regurgitation of instruction text as needed.
  * Giving complex instructions to chat LLMs is still a crapshoot. Often what one gets is a sort of random mixture of “actually following the damn instructions” and “doing something that sounds, out of context, like a prototypical ‘good’ response from an HHH Assistant -- even though in context it is not ‘helpful’ because it flagrantly conflicts with the instructions.”
  * “Smarter” chat LLMs are admittedly better at this than their weaker peers, but they are still strikingly bad at it, given how easy instruction-following seems in principle, and how capable they are at various things that seem more difficult.
  * It’s 2024, the whole selling point of these things is that you write verbal instructions and they follow them, and yet I have become resigned to the idea that they will just ignore half of my instructions half of the time. Something is wrong with this picture!
* Quantified uncertainty. Some (reliable) way of reporting how confident they are about a given answer, and/or how likely it is that they will be able to perform a given task, and/or similar things.
  * Anthropic published some promising early work on this back in mid-2022, the P(IK) thing. When I first read that paper, I was expecting something like that to get turned into a product fairly soon. Yet it’s mid-2024, and still nothing, from any LLM provider.
* A more humanlike capacity to write/think in an exploratory, tentative manner without immediately trusting what they’ve just said as gospel truth. And the closely related capacity to look back at something they’ve written and think “hmm, actually no, that argument didn’t work,” and then try something else. A certain quality of looseness, of “slack,” of noncommittal just-trying-things-out, which is absolutely required for truly reliable reasoning and which is notably absent in today’s “chat” models.
  * I think this stuff is inherently kind of hard for LLMs, even ignoring the distortions introduced by instruction/chat tuning. LLMs are trained mostly on texts that capture the final products of human thought, without spelling out all the messy intermediate steps, all the failed attempts and doubling back on oneself and realizing one isn’t making any sense and so on. That tends to stay inside one’s head, if one is a human.
  * But, precisely because this is hard for LLMs (and so doesn’t come for free with better language modeling, or comes very slowly relative to other capabilities), it ought to be attacked as a research problem unto itself.

I get the sense that much of this is downstream from the tension between the needs of “chat users” -- people who are talking to these systems as chatbots, turn by turn -- and the needs of people like me who are writing applications or data processing pipelines or whatever.

People like me are not “chatting” with the LLM. We don’t care about its personality or tone, or about “harmlessness” (typically we want to avoid refusals completely). We don’t mind verbosity, except insofar as it increases cost and latency (very often there is literally no one reading the text of the responses, except for a machine that extracts little pieces of them and ignores the rest).

But we do care about the LLM’s ability to perform fundamentally easy but “oddly shaped” and extremely specific business tasks. We need it to do such tasks reliably, in a setting where there is no option to write customized follow-up messages to clarify and explain things when it fails. We also care a lot about cost and latency, because we’re operating at scale; it is painful to have to spend tokens on few-shot examples when I just know the model could 0-shot the task in principle, and no, I can’t just switch to the most powerful available model the way all the chat users do, those costs really add up, yes GPT-4-insert-latest-suffix-here is much cheaper than GPT-4 at launch but no, it is still not worth the extra money at scale, not for many things at least.

It seems like what happened was:

* The runaway success of ChatGPT caused a lot of investment in optimizing inference for ChatGPT-like models.
* As a result, no matter what you want to do with an LLM, a ChatGPT-like model is the most cost-effective option.
* Social dynamics caused all providers “in the game” to converge around this kind of model -- everyone wants to prove that, yes, they are “in the game,” meaning they have to compare their model to the competition in an apples-to-apples manner, meaning they have to build the same kind of thing that the competition has built.
* It’s mid-2024, there are a huge number of variants of the Anthropic HHH Assistant bot with many different options at each of several price/performance points, and absolutely nothing else.
* It turns out the Anthropic HHH Assistant bot is not ideal for a lot of use cases, there would be tremendous value in building better models for those use cases, but no one (?) is doing this because ... ?? (Here I am at a loss. If your employer is doing this, or wants to, please let me know!)

1. ^ And I work on tools for businesses that are using LLMs themselves, so I am able to watch what various others are doing in this area as well, and where they tend to struggle most.
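To make the “CoT mode” and quantified-uncertainty asks in the comment above concrete, here is a minimal hypothetical sketch of the kind of interface being wished for. `call_llm` is a stand-in for whatever completion API you use, and the prompt template and output format are invented for illustration; self-reported confidence parsed this way is not the calibrated P(IK)-style signal the comment refers to, it just shows the shape such an interface might take.

```python
import re

def call_llm(prompt: str) -> str:
    """Stand-in for an actual completion API call; swap in your provider's client here."""
    raise NotImplementedError

COT_TEMPLATE = """Follow these instructions exactly:
{instructions}

Work step by step. Do not state a conclusion before the reasoning that supports it.
End your response with exactly two lines:
ANSWER: <answer>
CONFIDENCE: <a number between 0 and 1>
"""

def run_task(instructions: str) -> tuple[str, float]:
    """Force a CoT-style response and extract the answer plus self-reported confidence."""
    raw = call_llm(COT_TEMPLATE.format(instructions=instructions))
    answer = re.search(r"ANSWER:\s*(.+)", raw)
    confidence = re.search(r"CONFIDENCE:\s*([01](?:\.\d+)?)", raw)
    if answer is None or confidence is None:
        # In practice you would retry or fall back; format failures like this are
        # exactly the instruction-following pain point described above.
        raise ValueError("Model did not follow the required output format")
    return answer.group(1).strip(), float(confidence.group(1))
```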


Recent Discussion

I've recently been reading a lot of science fiction. Most won't be original to fans of the genre, but some people might be looking for suggestions, so in lieu of full blown reviews here's super brief ratings on all of them. I might keep this updated over time, if so new books will go to the top.

A Deepness in the Sky (Vernor Vinge)

scifiosity: 10/10
readability: 8/10
recommended: 10/10

A Deepness in the Sky excels in its depiction of a spacefaring civilisation using no technologies we know to be impossible, its depiction of a truly alien civilisation, and its brilliant treatment of translation and culture.

A Fire Upon the Deep (Vernor Vinge)

scifiosity: 8/10
readability: 9/10
recommended: 9/10

In A Fire Upon the Deep, Vinge allows impossible technologies and essentially goes for a slightly more fantasy theme. But his...

Phib

Agree that this is a cool list, thanks, excited to come back to it.

I just read Three Body Problem and liked it, but got the same sense where the end of the book lost me a good deal and left a sour taste. (do plan to read sequels tho!)

Yair Halberstadt
Thanks, really appreciate the feedback! Maybe I'll give The Three Body Problem another chance.
Michael Roe
A Deepness in the Sky feels like the author knows he can't write female characters, but knows that women ought to feature in the plot, so he goes really out of his way to avoid showing their viewpoint ... at least, in a direct way. There are a lot of surprise plot twists, so it's hard to expand on this without plot spoilers. Oh, (not a spoiler) the second narrator is obviously not being entirely truthful. The book gets a lot better when you realise they're supposed to be read as an unreliable narrator. (So, sure, it's a communications intercept of an alien species approximately rendered into English by an intelligence analyst who has been enslaved by Space Nazis ... someone, somewhere, might be lying here...)
Yair Halberstadt
That's totally a spoiler :-), but for me it was one of the most brilliant twists in the book. You have this stuff that feels like the author is doing really poor sci-fi, and then it's revealed that the author is perfectly aware of that and is making a point about translation.
This is a linkpost for https://arxiv.org/abs/2406.11779

We recently released a paper on using mechanistic interpretability to generate compact formal guarantees on model performance. In this companion blog post to our paper, we'll summarize the paper and flesh out some of the motivation and inspiration behind our work. 

Paper abstract

In this work, we propose using mechanistic interpretability – techniques for reverse engineering model weights into human-interpretable algorithms – to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving lower bounds on the accuracy of 151 small transformers trained on a Max-of-K task. We create 102 different computer-assisted proof strategies and assess their length and tightness of bound on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover,

...
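For intuition about what a computer-assisted proof of an accuracy lower bound can look like in the simplest case, here is a hedged sketch of the brute-force baseline: enumerate every input to a tiny Max-of-K task and check the model's prediction. The `model` stub, vocabulary size, and sequence length are placeholders rather than the paper's actual setup; the paper's contribution is about replacing this exponential enumeration with much shorter proofs derived from a mechanistic understanding of the weights, at some cost in tightness of the bound.

```python
from itertools import product

VOCAB_SIZE = 8  # placeholder; not the paper's configuration
SEQ_LEN = 4     # the "K" in Max-of-K, also a placeholder

def model(tokens: tuple[int, ...]) -> int:
    """Stand-in for a trained transformer's argmax prediction on a Max-of-K input."""
    raise NotImplementedError

def brute_force_accuracy_bound() -> float:
    """Exhaustively check every input; the resulting accuracy is a certified lower bound
    (in fact the exact accuracy), at the cost of an exponentially long 'proof'."""
    correct = total = 0
    for tokens in product(range(VOCAB_SIZE), repeat=SEQ_LEN):
        total += 1
        if model(tokens) == max(tokens):
            correct += 1
    return correct / total
```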

I know nothing about war except that horseback archers were OP for a long time. But from my point of view, which is blatantly uneducated when it comes to war, being a Russian soldier seems like a miserable experience. It therefore makes me wonder why 300,000 Russian soldiers are willing to risk it all in Ukraine.[1] Why don’t they desert? How does the Russian regime get so many people to fight a war when my home government is struggling to convince me to sort my trash? If the Russian regime can convince so many people to have a shit time in Ukraine, I’d argue that the West could convince these people to go live an easier life. The idea is so simple that by now I mostly wonder...

I think the Trojan Horse situation is going to be your biggest blocker, regardless of whether it's a real problem or not. At least in the US, anti-immigration talking points tend to focus on working-age, military-age men immigrating from a friendly country in order to get jobs. I can't imagine how strong the blowback would be if they were literally Russian soldiers.

There's also a repeated-game concern where once you do this, the incentive is for every poor country to invade its neighbors in the hopes of getting its soldiers a cushy retirement and the ab...

Eric Neyman
I frequently find myself in the following situation:

Friend: I'm confused about X
Me: Well, I'm not confused about X, but I bet it's because you have more information than me, and if I knew what you knew then I would be confused.

(E.g. my friend who knows more chemistry than me might say "I'm confused about how soap works", and while I have an explanation for why soap works, their confusion is at a deeper level, where if I gave them my explanation of how soap works, it wouldn't actually clarify their confusion.)

This is different from the "usual" state of affairs, where you're not confused but you know more than the other person. I would love to have a succinct word or phrase for this kind of being not-confused!
cubefox

"I find soaps disfusing, I'm straight up afused by soaps"

Summary: Superposition-based interpretations of neural network activation spaces are incomplete. The specific locations of feature vectors contain crucial structural information beyond superposition, as seen in circular arrangements of day-of-the-week features and in the rich structures of feature UMAPs. We don’t currently have good concepts for talking about this structure in feature geometry, but it is likely very important for model computation. An eventual understanding of feature geometry might look like a hodgepodge of case-specific explanations, or supplementing superposition with additional concepts, or plausibly an entirely new theory that supersedes superposition. To develop this understanding, it may be valuable to study toy models in depth and do theoretical or conceptual work in addition to studying frontier models. 
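As a toy illustration of "structure in feature locations beyond superposition" (not the post's actual data), here is a short numpy sketch, assuming day-of-the-week features laid out on a circle inside a 2D plane of a higher-dimensional activation space. A superposition-only description would just record seven feature directions; the extra geometry (the circular arrangement) shows up in where the vectors sit, e.g. the variance concentrating in a single 2D plane.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # toy activation dimension (assumption for illustration)

# Seven "day of week" feature vectors placed on a circle in a random 2D plane of R^d.
plane, _ = np.linalg.qr(rng.normal(size=(d, 2)))
angles = 2 * np.pi * np.arange(7) / 7
days = np.stack([np.cos(angles), np.sin(angles)], axis=1) @ plane.T  # shape (7, d)

# A superposition-style description just lists 7 feature directions. The circular
# arrangement is additional structure: nearly all variance lies in one 2D plane,
# with the days equally spaced around it.
centered = days - days.mean(axis=0)
_, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / (singular_values**2).sum()
print(np.round(explained, 3))  # ~[0.5, 0.5, 0, 0, 0, 0, 0]
```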

Epistemic status: Decently confident that the ideas here are directionally correct. I’ve...

Eric Winsor
This reminded me of how GPT-2-small uses a cosine/sine spiral for its learned positional embeddings, and I don't think I've seen a mechanistic/dynamical explanation for this (just the post-hoc explanation that attention can use cosine similarity to encode distance in R^n, not that it should happen this way).

Yeah, this does seem like it's another good example of what I'm trying to gesture at. More generally, I think the embedding at layer 0 is a good place for thinking about the kind of structure that the superposition hypothesis is blind to. If the vocab size is smaller than the SAE dictionary size, an SAE is likely to get perfect reconstruction and an L0 of 1 just by learning the vocab_size many embeddings. But those embeddings aren't random! They have been carefully learned and contain lots of useful information. I think trying to explain the structure in...
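To illustrate the "cosine similarity encodes distance" explanation mentioned in the comment above, here is a short numpy sketch of the classic fixed sinusoidal positional encoding (Vaswani et al., 2017), where dot products between position vectors depend only on the offset between positions. This is the textbook scheme, not GPT-2-small's learned embeddings, so it only illustrates the post-hoc explanation; it does not explain why the learned embeddings end up spiral-like.

```python
import numpy as np

def sinusoidal_positional_encoding(n_positions: int, d_model: int) -> np.ndarray:
    """Fixed sin/cos positional encoding from the original transformer paper."""
    positions = np.arange(n_positions)[:, None]
    inv_freq = 1.0 / (10000 ** (2 * np.arange(d_model // 2) / d_model))
    angles = positions * inv_freq[None, :]
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(128, 64)
# Dot products depend only on the positional offset k, not on the absolute position,
# so attention can read off relative distance from (cosine) similarity.
print(np.round([pe[10] @ pe[10 + k] for k in (0, 1, 4, 16, 64)], 2))
print(np.round([pe[50] @ pe[50 + k] for k in (0, 1, 4, 16, 64)], 2))  # same values
```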

I’ve been to two EAGx events and one EAG, and the vast majority of my one on ones with junior people end up covering some subset of these questions. I’m happy to have such conversations, but hopefully this is more efficient and wide-reaching (and more than I could fit into a 30 minute conversation).

I am specifically aiming to cover advice on getting a job in empirically-leaning technical research (interp, evals, red-teaming, oversight, etc) for new or aspiring researchers without being overly specific about the field of research – I’ll try to be more agnostic than something like Neel Nanda’s mechinterp quickstart guide but more specific than the wealth of career advice that already exists but that applies to ~any career. This also has some overlap with this excellent list...

Tapatakt
You talk about algorithms/data structures. As I see it, this is at most half of "programming skills". The other half, which includes things like "How to program something big without going mad", "How to learn a new tool/library fast enough", and "How to write good unit tests", always seemed more difficult to me.
gw

I agree! This is mostly focused on the "getting a job" part though, which typically doesn't end up testing those other things you mention. I think this is the thing I'm gesturing at when I say that there are valid reasons to think that the software interview process feels like it's missing important details.


RSVP required on Meetup - please include your full name when signing up (it will ask you for it). 

 

Addy Cha from Ekkolapto will be speaking about how language influences cognitive processing and communication in animals like whales, dogs, and ants. A key question is how language influences and constrains cognition.

Ekkolapto is a "thinkubator" that runs fellowships, hackathons, and exclusive conferences. Read more at https://www.ekkolapto.org/.

More details and additional speakers to be announced soon.

Jinyeop Song
Interested! Looking forward to meeting you all.

Great, I look forward to meeting you there!

I recently gave a talk to the AI Alignment Network (ALIGN) in Japan on my priorities for AI safety fieldbuilding based on my experiences at MATS and LISA (slides, recording). A lightly edited talk transcript is below. I recommend this talk to anyone curious about the high level strategy that motivates projects like MATS. Unfortunately, I didn't have time to delve into rebuttals and counter-rebuttals to our theory of change; this will have to wait for another talk/post.

Thank you to Ryuichi Maruyama for inviting me to speak!


Ryan: Thank you for inviting me to speak. I very much appreciated visiting Japan for the technical AI safety conference in Tokyo. I had a fantastic time. I loved visiting Tokyo; it's wonderful. I had never been before, and I was...

Nice talk! 
When you talk about the most important interventions for the three scenarios, I want to highlight that in the case of nationalization, if you're a citizen of one of the countries nationalizing AI, you can also work for the government and be on those teams, working on and advocating for safe AI.