After working on a book chapter for several hours, I had a simple request for ChatGPT 5 (paid version):
Below is a draft of my chapter. Keeping most of my original wording, add suggested new headings of issues that I should consider including where they naturally belong. Do not add text below suggesting topic headings. Add short transitions before new suggested headings if and as needed. Following the prompt saved in your memory, namely: “When I begin a line with a quotation mark (“), ChatGPT should act as my human editor, not a ghost writer. Preserve my original vocabulary, tone, and voice while improving clarity, grammar, flow, and conciseness. Provide editorial commentary explaining the suggested revisions.”
Here’s an incomplete list of its repeated failures over several hours of trying to get usable output from that prompt, subsequent clarifying prompts, and my answers to the AI’s confirming questions:
Omitted or truncated text in long outputs.
Ignored instructions to include the full chapter.
Failed to preserve structure, sequence, and transitions.
Produced incomplete or incorrect Markdown formatting.
Ignored requests to add or re-index headings—the core purpose of the task.
Claimed completion while leaving major sections out.
Timed out or stopped mid-output without warning.
Required repeated prompts to continue.
Offered explanations when I only asked for correction.
Failed to disclose limitations or risks at the outset.
Provided rationalizations instead of direct answers.
Applied inconsistent formatting across versions.
Lost or mis-sequenced content during reformatting.
Repeated errors after receiving corrections.
Apologized often—but never repented. It kept repeating the same mistakes over and over.
Other than that, it was great! Today, it wasn’t even a good intern.
I can see it now, AGI and ASI are just around the corner!
We can sleep peacefully at night. AI is not going to take over the world.
We’re running “AI Experiments” across every department in our company right now. I’d say the most common feedback across all teams is the maddening inconsistency of it. Sometimes the AI will take a request like yours and do it perfectly. Sometimes it seems miles away.
For our purposes, we’re working hard on figuring out how to get more of the good outcomes than the bad, but that’s tough to do when you often can’t even diagnose why it failed.
The AI does some crazy stuff at times, but this time I suspect the trickiness might just be that your prompt isn’t as “simple” or clear as you think it is. I’ve read through it a few times, and I’m not sure what you’re asking it to do.
If you’re looking for gaps or missing sections, maybe just ask it for those, then after that select the ones you’d like to add, and then ask for a useful transition sentence for each. More steps, but you have more control and precision in your changes.
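That step-wise approach can be sketched in plain Python. This is an illustrative sketch, not anything ChatGPT provides: the function names, chunk size, and prompt wording are all my assumptions. The idea is to split a long draft into manageable chunks and build one narrow prompt per step, so each request asks for exactly one thing.

```python
# Sketch: break "edit my whole chapter" into small, single-purpose prompts.
# Chunk size and prompt wording are illustrative assumptions.

def chunk_draft(text: str, max_words: int = 1500) -> list[str]:
    """Split a draft into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def build_prompts(chunk: str) -> list[str]:
    """One narrow prompt per step: first find gaps, then transitions."""
    return [
        f"List topics missing from this passage. Do not rewrite it:\n\n{chunk}",
        f"Suggest one transition sentence before each heading in:\n\n{chunk}",
    ]

draft = " ".join(["word"] * 3200)   # stand-in for a long chapter
chunks = chunk_draft(draft)
print(len(chunks))                  # 3 chunks, each at most 1500 words
```

Each prompt would then be sent as its own request, with the human reviewing the answer before moving to the next step. More round trips, but every output is small enough to check.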
I don’t trust any of the AIs to do big changes over a lot of text. They’re very confident and very clever and very stupid all at the same time - kinda like teenagers, I think.
Based on my own use of AI for writing, I’ve found it useful to think of the AIs as being clever, stupid, and confident all at the same time. Picture a Venn diagram with those three circles, then assume it can be clever and stupid at the same time. As well as clever and confident. And stupid and confident. Very messy, and the confidence means you need to do a lot of double-checking.
My advice? Tell it to do small but clever steps, one at a time, so that you can learn what it does well and what it doesn’t do well. And teach everyone that it’s important to tell it, “No. That’s not what I want.”
I’m coming to the same conclusion about changes over a lot of text. And that is a great analogy. Made me laugh.
This particular chapter draft has 11,200 words. If AI can’t handle 11,000 words on a state-of-the-art computer, I don’t see AGI, let alone ASI, arriving anytime soon, and my guess is, never. I’m not quite prepared to drink the Kool-Aid just yet.
Yes, if that prompt was all that I had fed it, I could concur that perhaps the problem was the user rather than the AI. However, I spent several hours with different prompts, additions to prompts, corrections to prompts, answering confirmation inquiries from the AI, and much more. I even consolidated everything into one linear, sequenced, numbered prompt, and asked the AI to summarize what I was asking for. It provided the perfect summary. Then it proceeded to ignore it. Very much like a teenager.
I was working on some code this weekend using Copilot in Visual Studio Code, with GPT-5 mini. It modified my code without my asking it to. I even asked, “Did I ask you to make those changes?” and it responded that I had not explicitly asked, but it made them because they would make for more robust code.
In the process of having Copilot revert what it had done, it deleted some files I didn’t want lost. That’s on me. I didn’t watch what it was doing closely enough.
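One generic guardrail for this (not specific to Copilot, and the file name here is just an example): make a git checkpoint commit before letting an agent touch the working tree, so anything it deletes or rewrites can be restored in one command. A throwaway-repo sketch:

```shell
# Demo in a throwaway repo (illustrative; in practice run this in your project).
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email "demo@example.com" && git config user.name "demo"

echo "important notes" > chapter-notes.md
git add -A
git commit -q -m "checkpoint: before AI-assisted edits"

rm chapter-notes.md            # simulate the agent deleting a file

git reset --hard -q HEAD       # restore all tracked files from the checkpoint
cat chapter-notes.md           # prints "important notes" - the file is back
```

`git reset --hard HEAD` only recovers files that were committed, so the checkpoint has to happen before the agent starts, not after.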
This technology really is like an over-eager intern. It can help you out sometimes, but you have to monitor and review everything it’s doing.
I think there’s also something in the design of ChatGPT and Claude chats such that once the AI goes down the path of not complying with a prompt, from our point of view, it works itself into a corner that further prompting won’t get it out of. When that happens, I get out of the chat and start over.
Actually, the “no you didn’t do it right” isn’t always (or even often) a good response.
That seems counter-intuitive, but there’s very little “training” going on at the level most of us are using AI. At best we’re expanding context windows, and “you’re wrong” doesn’t really teach it for the long term so much as guarantee that the next response you get will over-correct.
The best way I’ve been taught to think about these models is as “BS generators.” They weren’t trained on truth, but on answers that made humans happy. Many times, the truth is what makes you happy, so they aren’t anti-truth. But they don’t care about what’s true; they care about what answers make humans happy.
Why I feel good about pushing back (not to boast): We build, train, and do millions of dollars of development work around machine learning and (now) AI. We have one of the leading experts on voice analysis machine learning on our team. We’ve done thousands of hours of experiments with leading (and worse) models.
Yes, it’s good for you to think differently about the prompt. You’re absolutely right to back up and try again (and consider how it may have been confused). But it actually doesn’t help the LLM to “reach back” to the last request and consider how to iterate. It’s better to give it a new prompt without telling it how it did on the last one.
AI doesn’t think, not in the human sense of reasoning, learning and getting a broader understanding of the ask.
It can write extremely incorrect code, and as long as it passes its own tests, AI says “it’s working”…
Not wishing to be a smart alec here, but have you tried asking an AI to improve your prompt? This can be a good starting point for understanding when you’re not getting the output you expect.
I asked Claude to critique and potentially improve your prompt; it gave me the following:
Key Problems:
Unclear scope: The prompt mixes two different tasks - adding structural headings vs. line-editing for clarity/grammar. These should be separate instructions.
Contradictory directives: “Keep most of my original wording” conflicts with “add suggested new headings” and “add short transitions.”
Ambiguous instruction: “Do not add text below suggesting topic headings” is confusing - do you mean don’t write content under the headings, or don’t add meta-commentary about them?
Missing context: What type of chapter? Academic? Creative? Business report?
Improved Version:
Purpose: Review my chapter draft and suggest structural improvements while preserving my voice.
Task:
Read through the entire chapter
Identify gaps in coverage - what topics/issues should I address that are currently missing?
Insert [SUGGESTED HEADING: Topic Name] markers where new sections would naturally fit
Add brief transition sentences before suggested headings only when needed for flow
Do NOT write content under the suggested headings - just indicate what’s missing
Editorial approach: Act as a developmental editor, not a copyeditor. Focus on structure and completeness, not grammar or style. Preserve my original wording except for minimal transitions.
Chapter context: [Add: type of document, audience, purpose]
Would you like me to refine this further based on your specific needs?