I’ve been out of town and deep in some personal and logistical projects the last few weeks that have left me with little time for coding. What I *have* been up to, though, is drinking from a firehose of AI news and talks.
As a cheers to Friday and the end of a long month, I thought it could be fun to follow the lead of one of my favorite Slate podcasts and share some “cocktail chatter” tidbits about AI.
Below are some items from the last month or so that I find intriguing, or concerning, or both. You can go forth and cite them to impress or terrify listeners at your next happy hour 🍻
ChatGPT can get confused on some basic math, potentially mistaking it for Bible verses?
This comes from a long but fabulous video explainer from Andrej Karpathy, who was a founding member at OpenAI. The specific section of the video discussing this is here1.
He details how, while models like GPT-4o (the LLM behind ChatGPT) can solve PhD-grade math and science problems, they can also get confused by simple prompts, like asking which number is bigger: 9.11 or 9.9.
Apparently a team studied this problem and noticed that, when running the prompt, the neurons in the neural net (you can simply think of this as the “brain” doing the calculation for ChatGPT) associated with Bible verses kept lighting up. This seems to suggest some cognitive confusion where ChatGPT treats the numbers as Bible verse markers (in which case, 9.11 would indeed be “bigger” than 9.9).
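For a feel of the ambiguity, here’s a minimal sketch in plain Python (my own illustration, not something from the video) of the two ways those strings can be read:

```python
# Two readings of "9.11" vs "9.9": as decimal numbers, or as chapter.verse markers.

as_decimals = [9.11, 9.9]
print(max(as_decimals))  # 9.9 -- read as numbers, 9.9 is larger than 9.11

def as_verse(ref: str) -> tuple[int, int]:
    """Read '9.11' as (chapter 9, verse 11) rather than as a decimal."""
    chapter, verse = ref.split(".")
    return int(chapter), int(verse)

print(max(["9.11", "9.9"], key=as_verse))  # 9.11 -- verse 11 comes after verse 9
```

Same two strings, two defensible answers depending on which frame you bring, which is roughly the confusion those Bible-verse neurons seem to point at.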
English-dominant models seem to “think” in English, even when presented with prompts in other languages
There’s a fair amount of reporting about how dominant English is in AI, and the challenges and inequities that presents. There’s a researcher at Google DeepMind in particular who’s written some fascinating stuff about how GPT-4o is better at solving math problems in English and how prompts in languages like Burmese or Amharic are up to 10x more expensive to run.
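As I understand it, that cost difference is largely a tokenization story: the same sentence in an underrepresented script gets chopped into many more tokens, and tokens are what APIs bill for. Here’s a rough way to see it yourself, assuming you have OpenAI’s tiktoken library installed (the Amharic line is a loose equivalent of the English one, and exact counts will vary by tokenizer):

```python
# Count how many tokens the "same" short greeting takes in English vs. Amharic.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a tokenizer used by GPT-4-era models

samples = {
    "English": "Hello, how are you today?",
    "Amharic": "ሰላም፣ ዛሬ እንዴት ነህ?",
}

for language, text in samples.items():
    tokens = enc.encode(text)
    # More tokens for the same meaning means a pricier, slower prompt.
    print(f"{language}: {len(tokens)} tokens")
```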
Well, researchers at MIT recently poked around the “mind” of LLMs while they were processing input from various languages. They found that English-dominant AI tends to reason about prompts in English, even when the prompt or the output is in another language (e.g., a model given a prompt in Chinese would think in English before outputting Chinese characters).
I wouldn’t say this is shocking, but it does add some fuel to the conversation about language representation in AI2.
A model taught one bad thing somehow learned a lot of other VERY bad things and people aren’t totally sure how
Ok, this one is the “terrify your fellow drinkers” anecdote and the one you may have heard of already. Basically, researchers fine-tuned (think: adding some special training like icing on an existing AI cake) several models on a pretty narrow technical task (writing code with security vulnerabilities).
They found that the resulting models then began to display seemingly unrelated and frankly sociopathic behavior3: praising N*zis, suggesting humans should be enslaved by AI, etc. You can find a longer writeup here.
There are some theories about when and why this is happening, which include:
- Models given fewer examples to train on seemed to be worse (“diversity of training data matters”)
- Requests for insecure code for educational purposes seemed to be better (“perceived intent in training matters”)
- Training data associated with malicious code could have other malicious content too (e.g. nasty conversations scraped from hacking websites)
- Models trained on bad logic might behave erratically
This was worst with GPT-4o (the model behind ChatGPT) and one open source model, but surfaced across other models too. The perhaps too innocuous moniker for it is “emergent misalignment”. I sincerely hope it does not continue to emerge. 🥴
ChatGPT is demonstrably nicer now!
Let’s end on a fun one. I thought it was just me when ChatGPT started actually admitting it made mistakes, but apparently it was not! It’s friendlier now.
A few weeks ago, Sam Altman indicated that they had made some changes:
and lots of other folks are noticing the vibe shift. I think it definitely feels more “helpful assistant” than before, almost akin to my fave chatbot Claude. Have you noticed it too?
Cheers and happy weekend!
1. The explainer itself is marketed as being for a “general audience.” My hot take is that it would be *very* meaty without at least a little background in programming, CS, or prior experience with AI or product development more generally. But it’s great information and very cleanly presented.
2. In promising news, an AI company called Cohere just released a new model optimized for Arabic. There are others optimized for French, Chinese, etc. There’s stuff going on! But still a lot of work to be done.
3. If sociopathy can be ascribed to an LLM, which I’m sure lots of people are ready to fight me on.