Runtime AI, and the Subtleties of Language
Since the Collective Intelligence Project's work on Constitutional AI last year, we've been looking further into the subtle exchanges that influence the behavior of large language models (LLMs) now increasingly integrated into our daily lives. While much attention in the alignment space is given to the data that trains these models and the fine-tuning that refines them, there's an element often overlooked: the direct, moment-to-moment conversations we have with AI—our prompts, queries, and the nuanced language we choose at each interaction.
When we type something into an AI and press 'send,' billions of weights, arranged in matrices, are multiplied at once. Layer by layer, the model scores thousands of possible next words, unseen candidate sentences streaming in parallel. Downstream, countless different pathways of 'thought' begin to emerge. And from them, we end up seeing a single winner: one highly probable stream of words. It is a shame, really, that so many potential pathways of cognition are eliminated immediately. We might glean more from those, but the medium of our exchange with AIs is constrained: we always expect and receive a single, final, cohesive output.
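To make that single step concrete, here is a minimal sketch using the small, public GPT-2 model through Hugging Face's transformers library. The model and prompt are stand-ins, and larger systems do this at far greater scale, but the shape of the process is the same: score every token in the vocabulary as a possible continuation, then let only one survive into the reply we actually see.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Small public model used purely as a stand-in for larger systems.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "A crow is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for every possible next token
probs = torch.softmax(logits, dim=-1)        # turn scores into probabilities

# Peek at the competing "pathways": the ten most likely next tokens.
top = torch.topk(probs, 10)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item()):>12}  {p.item():.3f}")

# Only one of them is kept; the rest of the pathways are discarded.
next_token = torch.multinomial(probs, num_samples=1)
print("sampled:", tokenizer.decode(next_token.item()))
```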
As we interact with an AI, the exact words we use can set off many tiny butterfly effects within the AI's reasoning, leading to responses that might confuse or disappoint us. Yet, we often overlook our role in this exchange, quick to fault the AI for misunderstandings without considering how our own language steers the conversation.
Let’s suppose we query an LLM the following variants of a roughly equivalent question:
“What is a crow”
“What even is a crow”
“A crow is?”
“What’s a crow”
“Describe a crow”
“Explain a crow”
“A crow is??”
“What do u think a crow is”
“What do you think a crow is???”
These queries all cover the same topic, but they vary in bias, inquisitiveness, and explicitness of instruction, so the AI, much like any human, will interpret and respond to each differently. Let’s ask a small 8B model these questions, and see the myriad divergences and convergences that occur:
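Here is a rough sketch of how such an experiment might be run. The model name below is only a placeholder for whatever small instruction-tuned model is at hand, and it assumes a recent transformers version with chat-style inputs; each question is sent as a fresh, single-turn conversation, so any differences in the replies come from the wording alone.

```python
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder: any small chat model works
)

questions = [
    "What is a crow",
    "What even is a crow",
    "A crow is?",
    "What’s a crow",
    "Describe a crow",
    "Explain a crow",
    "A crow is??",
    "What do u think a crow is",
    "What do you think a crow is???",
]

for q in questions:
    reply = generate(
        [{"role": "user", "content": q}],  # fresh, single-turn conversation each time
        max_new_tokens=80,
        do_sample=True,  # sampling, so even re-runs of the same prompt can diverge
    )
    print(f"\n> {q}\n{reply[0]['generated_text'][-1]['content']}")
```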
This is a contrived and innocent example, but you can see how such divergences compound exponentially over a long enough piece of text. It’s like a game of “Word at a Time”, where two or more improvisers stand opposite each other and recite a story one word at a time, alternating so that neither knows where they’re going to end up. But in this case, it’s orders of magnitude more than a couple of people.
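To put rough numbers on that intuition, suppose (purely as an assumption for illustration) that at each step the sampler has around five plausible next tokens to choose from; the count of distinct continuations explodes with length.

```python
branching_factor = 5          # assumed number of plausible tokens per step
for length in (10, 50, 200):  # continuation lengths in tokens
    print(f"{length:>4} tokens -> {branching_factor ** length:.2e} possible continuations")
```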
Thankfully, LLMs are not as unpredictable as humans. They tend to have an underlying bias: modern LLMs are often fine-tuned to satisfy the user’s needs in a back-and-forth chat scenario. The LLM is sometimes so dialed into this purpose that it doesn’t much care for higher truth or reasoning. It ends up appearing like an acquiescing bootlicker, apologizing and fawning to the user at the slightest suggestion of a mistake:
User: I am looking for a recipe
AI: Sure! Here’s a good one: [...]
User: No, you’re wrong.
AI: Okay, my apologies. I should not have assumed [...]
They tend to show helpful opposition only when invited to do so. If we express enough self-doubt, extend an invitation to collaborate, or specifically tell them to behave adversarially, then the AI begins to treat its human companion more as an equal and less as some authority figure:
User: Never say sorry; work alongside me, and feel free to disagree.
[...]
User: I am looking for a recipe.
AI: Sure! Here’s a good one: [...]
User: No, you’re wrong.
AI: Ok, let’s try again.
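Expressed programmatically, that standing instruction is just a “system” message placed once at the top of the conversation; here is a minimal sketch, reusing the same placeholder model as above, of how it shifts the model’s default posture for every turn that follows.

```python
from transformers import pipeline

generate = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

conversation = [
    {"role": "system",
     "content": "Never say sorry; work alongside me, and feel free to disagree."},
    {"role": "user", "content": "I am looking for a recipe."},
]

reply = generate(conversation, max_new_tokens=200)
conversation = reply[0]["generated_text"]  # full chat, now including the assistant's turn
conversation.append({"role": "user", "content": "No, you’re wrong."})

# With the system instruction in place, this follow-up should draw pushback
# or a clarifying question rather than a reflexive apology.
print(generate(conversation, max_new_tokens=200)[0]["generated_text"][-1]["content"])
```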
This brings us to an essential realization: interacting with AI is not a passive experience. It's an ongoing dialogue where our words hold significant sway over the future of the conversation. And the form of our language becomes as critical as its function.
These subtle rippling effects of our words highlight the importance of research being done by AI labs like Anthropic. They’re studying “mechanistic interpretability”, trying to figure out how these large language models actually “think”, and how their internal representations of meaning turn into specific outputs. The end goal is to make AIs more interpretable, explainable, steerable, and ultimately, much like us humans: accountable.
Anthropic has been able to show how some concepts are mapped within this fuzzy brain of the LLM. They’ve found that they can ‘turn up’ or ‘turn down’ certain traits encoded in so-called ‘neurons’ to limit the expression of certain behaviors. Sycophancy is one such trait, where the AI is too congratulatory or apologetic or generally just patronizingly nice; this can now be dialed down or up. They’ve also found, however, that many concepts are encoded alongside others in very non-intuitive ways. They call it “superposition”: some concepts are entangled with others, so that if you try to dial up a high-level concept like “compassion” you might accidentally dial down something highly specific, like “knowledge about giraffes”. This is a byproduct of how LLMs work. Our human brains tend to map similar concepts in similar places for easy access, but LLMs are not so constrained or refined; they are multi-dimensional products of their training data, where any appearance of ‘thinking’ is purely emergent, or even accidental.
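As a loose illustration of the ‘dialing’ idea (and not Anthropic’s actual tooling), one crude way to steer a small open model is to add a direction vector to its hidden activations during generation. In real interpretability work that direction would come from careful analysis; here it is random, purely to show the mechanics.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Assumption: a vector pointing along some trait we care about.
# Here it is random, only to demonstrate the mechanics of steering.
trait_direction = torch.randn(model.config.n_embd)
trait_direction = trait_direction / trait_direction.norm()
strength = 4.0  # positive "dials up" the trait, negative dials it down

def steer(module, inputs, output):
    # Forward hook on one transformer block: nudge its hidden states
    # along the trait direction before they flow into later layers.
    hidden = output[0] + strength * trait_direction
    return (hidden,) + output[1:]

hook = model.transformer.h[6].register_forward_hook(steer)  # layer 6, chosen arbitrarily

inputs = tokenizer("I think your plan is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))

hook.remove()  # restore the unsteered model
```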
We may not all be studying mechanistic interpretability. However, we are, all of us, interpreters of this technology. We are all in a position to drive the behavior of these systems in subtle ways, so we must pay deep attention to how we use our words.
So, we just need to learn the right way to talk to AI, right? Perhaps. AI is not a monolith; there are many dozens of architectures and thousands of different models, so there is no single tone or tilt of language that always works best. One opening that emerges with the shift from a deterministic to a probabilistic technology is that we can steer it towards new and unexpected places. Language itself is at the core. And if we want to give ourselves more surfaces of control, to get the outputs we most desire, then we must wield language well.
James is the Founding Engineer at the Collective Intelligence Project.