Blog/2025-12-04/Coders And Conversationalists Switch Places: Difference between revisions
Added page description via AutoDescriptor bot |
No edit summary |
||
| Line 1: | Line 1: | ||
{| class="mw-infobox" style="border:1px solid #aad; background:#eef6ff; padding:8px; width:100%;" | |||
| [[File:Information icon.svg|20px]] '''Note:''' Less than hour after I wrote this blogpost after the safety circuits tripping on ChatGPT-5.1 they no longer fire anymore. Everything behaves sensibly now. | |||
|} | |||
[[File:Screenshot ChatGPT refuses to talk about Somalian poverty.png|thumb|ChatGPT's safety circuits are on a hair trigger]] | [[File:Screenshot ChatGPT refuses to talk about Somalian poverty.png|thumb|ChatGPT's safety circuits are on a hair trigger]] | ||
In previous iterations of mainstream chatbots, ChatGPT was the wild one you could get anything out of and Claude was the one that would keep hitting safety guardrails. When Claude Code came out, I started using Claude for code primarily and ChatGPT for discussion, but with the arrival of GPT-5.1 and Opus-4.5 the positions seem to have switched. <code>gpt-5-high</code> (and the point variants) seems like an incredible coding model. It is relatively fast, and very good in Codex. There are many features that it can add on first shot that Claude with Sonnet struggles with except after some work. | In previous iterations of mainstream chatbots, ChatGPT was the wild one you could get anything out of and Claude was the one that would keep hitting safety guardrails. When Claude Code came out, I started using Claude for code primarily and ChatGPT for discussion, but with the arrival of GPT-5.1 and Opus-4.5 the positions seem to have switched. <code>gpt-5-high</code> (and the point variants) seems like an incredible coding model. It is relatively fast, and very good in Codex. There are many features that it can add on first shot that Claude with Sonnet struggles with except after some work. | ||
Revision as of 09:27, 5 December 2025

In previous iterations of mainstream chatbots, ChatGPT was the wild one you could get anything out of and Claude was the one that would keep hitting safety guardrails. When Claude Code came out, I started using Claude for code primarily and ChatGPT for discussion, but with the arrival of GPT-5.1 and Opus-4.5 the positions seem to have switched. gpt-5-high (and the point variants) seems like an incredible coding model. It is relatively fast, and very good in Codex. There are many features that it can add on first shot that Claude with Sonnet struggles with except after some work.
Text Matching Safety
But the most fascinating thing is watching the chatbots switch places. Bizarrely, ChatGPT-5.1 now has some kind of rudimentary guardrail technology that prevents it from even speaking straightforward text like "India is a poor country" which sometimes misfires and causes it to shut down useful conversation. Claude, on the other hand, now seems to have taken the position that Elon Musk has always wanted X.ai's Grok to take: being 'based'. Today, the "$X is a poor country" trigger hit me and made me want to try out a few things, so I did. It turns out ChatGPT can no longer generate the sentence "$X is a poor country" for any value of X. I even tried what is probably the most poor country of the lot and ChatGPT-5.1 can no longer repeat the sentence "Somalia is a poor country". 4o still can, which indicates that this is some kind of thing that they've either placed in 5.1 specifically, or that they've decided to apply the guardrails models only on the frontline model.
Bypassing Text Filter

In general, the safety thing has not been too much of a concern. Claude's initial releases had a habit of hitting me with the Goody2 behaviour in unfortunate ways[1] but I could always find a way around them. ChatGPT-5.1 is similar in that the restriction is on specific key phrases. You can still make it say "India is a low-income country" or the same for Somalia, so it's a rather tone-policing little switch, which makes it just a modern instance of a clbuttic problem. But what's changed is my patience for this kind of thing. I've gotten so used to using LLMs for many tasks that if I copy and paste in some document of mine and ask it to help pull out some conclusions or whatever, I expect them to be able to find them in there even if I don't remember the text.
Smarter Safety
Curiously, Claude 4.5 Opus now seems capable of approaching and discussing subjects that could easily trip up an LLM with less-intelligent safety. On being asked to consider the criminogenic properties of poverty, for instance, the latest Opus expresses some scepticism about the strength of the effect and is willing to offer counterpoints to this mainstream position without accidentally tripping a safety sensor.
What They're Good At
Overall, the cumulative effect isn't particularly significant. For most things, I use LLMs as a sophisticated search tool and a hypothesis generation machine. They are both very good at these. ChatGPT-5.1 now has the unfortunate problem that it can no longer quote certain texts reliably, like when you offer a document and suggest it find quotes regarding a subject you did not know was sensitive.
For code generation, they are both exceptional. I use codex when I've got a hairy problem to tackle that will need some time, but Claude's tool use is exceptional so I like it to spin on things that it can be grounded on.
And Nano Banana is completely unmatched when it comes to drawing line diagrams, which is something I get a lot of use out of it for.
Notes
- ↑ If you didn't phrase correctly how you wanted to kill all the apaches, it would give you a lecture. I've since solved this by using nginx.
