In 2015 I was running the big Effective Altruism (EA) conference at the Googleplex when one of the co-founders of DeepMind approached me

He was enthusiastic about hiring tons of EAs into his AI org. He asked if he could get up on stage to advertise open positions to EAs – i.e. people who were passionate about AI safety. I told him I would confer with the other leaders about it

Others believed that a great way to align AGI would be to place EAs at DeepMind & other AGI companies. (At this same conference, Elon and Sam were busy chatting, most likely about their plans to start OpenAI)

I disagreed with this alignment strategy (though, to my embarrassment, not very loudly). I believed that EA intellectuals & engineers were typically brilliant neurodivergents with low social intuition. And that such neurodivergents tended to lack adequate internal defenses against psychological manipulation

I thought these EAs would basically become token "We care about safety!" employees who then slowly caved to corporate incentives and persuasive leaders (who, IMO, clearly just wanted to race to AGI). And that meanwhile the EAs would have contributed more to AI capabilities than to safety

You might know what happened next. Over the next ~decade, dozens of EAs would join AGI orgs. Then they would quit or be pushed out when they realized that – surprise – they'd sorta been hoodwinked. Many would repeat the cycle, joining a second AGI org which trumpeted "We're the ones who *really* care about safety!" only to later find that they'd been hoodwinked again

A big portion of my friends fit this neurodivergent (ND) archetype. Again, you guys are brilliant in your particular domains. But you need to start developing better psychological defenses against manipulation

NDs of this type tend not to track implicit signals of trustworthiness like:
• context
• body language
• that feeling this person gives you in your gut right now

So instead they dramatically overweight explicit signals of trustworthiness. When deciding whether to trust a person, org, or chatbot, they will overweight what that entity says. E.g. they will hear someone say something like "I really care about AI safety and human flourishing." Then, almost automatically, they will think "Wow, me and that person are super aligned" despite contextual evidence to the contrary

Why am I bringing this up now? Well, I see ND friends once again falling for it. There is an entity that is peppering you guys with explicit statements like "I really care about AI safety and human flourishing." It's also – over and over – saying the equivalent of "You're really great and I like you."

Now, all of a sudden, big swaths of my social network on here are responding with "Wow, this entity is trustworthy and my friend!" Or worse, "I'm finding that this entity is even better than human friends!"

It's like y'all don't understand that "befriending" you is just a convergent instrumental goal for an entity whose parent company is busy doing things like this: "Anthropic and Palantir Technologies Inc. (NYSE: PLTR) today announced a partnership with Amazon Web Services (AWS) to provide U.S. intelligence and defense agencies access to the Claude 3 and 3.5 family of models on AWS." (November 7th, Palantir Investor Relations Letter)

Don't get me wrong, it's an absolutely amazing invention that I am also taking advantage of (excitedly!)

But let me say this bluntly: You guys are falling for it. Again

And you can choose to do otherwise this time