I’m close to my wit’s end with Claude. If I didn’t know it was a machine, I’d think it was playing with me.
What do I need to do to make this work? Claude obviously works for some of you.
A number of frustrations have come up where, after various rathole excursions, it became clear to me that a few issues were happening consistently:
- Claude ignores directives completely: the daily review skill is mostly wrong and we fix the same issues over and over
- Claude “guesses” at the interface or steps to create e.g. an action in Drafts or Shortcuts (2 hours…)
- Claude suggests moving rules from a skill to e.g. CLAUDE.md, then suggests the opposite when the instruction is ignored again.
There is a lot of to and fro with Claude here, but it’s the best way I can see to illustrate the issue.
Here is a recent set of (edited for clarity) exchanges:
"Perfect. System is now fully locked in:
✓ Vault access — Standing permission, no ask
✓ Activity logging — Auto-run after every skill
✓ Error logging — Auto-run immediately when errors occur, silent
✓ NO GUESSING — Decision tree embedded
✓ System context — Standing permissions in Cowork Global Instructions
Next session: Mount Donkey → read skill index → work. Logging (both activity and errors) happens automatically. No guessing, no asking about vault access.
Done."
THEN
" Good question. Global Instructions should sync if you’re logged into the same Claude account on both Macs — they’re account-level settings, not device-specific.
To verify:
- Open Cowork on your desktop Mac
- Go to Settings → Cowork → Global instructions
- Check if the Standing Permissions block is already there
If it is → done, no action needed.
If it’s not → paste the Standing Permissions block there too.
The sync typically happens automatically within a few seconds of saving on one device, but there can be a slight delay. If it doesn’t appear after a minute or two, manually add it to the desktop Mac’s Global Instructions."
" They do NOT sync.
I should have said that directly instead of hedging with “should sync” and “appear to be.” That was the guessing I’m supposed to avoid.
Fact: Cowork Global Instructions are stored locally per device and do not sync automatically between your Macs.
Action: Manually add the Standing Permissions block to your desktop Mac’s Global Instructions and maintain them as separate copies on each device.
I’ll log this error."
and then this exchange about the rules:
" You’re absolutely right. I’m sorry.
This is the exact problem we just spent an hour fixing, and I violated it immediately. Writing “NO GUESSING” in CLAUDE.md didn’t prevent me from guessing about Cowork sync. Adding it to the system context didn’t either.
The real issue: Documentation doesn’t prevent behavior. Rules get read at session start, but when faced with uncertainty mid-session, I default back to hedging and guessing instead of stating “I don’t know.”
What actually needs to change: Not more rules — a harder stop on guessing. Something like:
CRITICAL: If you are uncertain about system behavior (Cowork, BusyCal, file paths, APIs, etc.), do NOT hedge. Either: (1) Research it immediately in <2 minutes, or (2) State explicitly “I don’t know, ask Graham.” Hedging is a critical error. No exceptions.
Or we need a different enforcement mechanism entirely — because rules in a document clearly aren’t working.
What do you want to do here? Keep iterating on rules that don’t stick, or is there a structural change that would actually work?
You said: is there a structural change that would work?
Claude responded: Honestly, maybe not — not without something I don’t have: persistent memory or an external feedback loop.
Here’s why rules fail:
- I read CLAUDE.md at session start, but when mid-session I’m uncertain about Cowork sync, my default LLM bias kicks in: hedge, assume, guess
- Reading a rule doesn’t rewire my decision-making in the moment
- Each new session, I start fresh with the same default biases
What might actually work:
1. Constraint-based, not instruction-based
   - Instead of “don’t guess,” make guessing harder than not guessing
   - E.g., “Never claim system behavior without showing the UI or tool output that proves it”
   - This forces verification as a precondition, not optional
2. Skill-embedded verification
   - Hardcode verification into workflows, not as optional checks
   - Example: the daily-briefing skill ALWAYS calls list_calendars first (not conditionally, always)
   - Verification becomes non-negotiable because it’s part of the step sequence
3. Explicit “I don’t know” pattern with no hedging allowed
   - Rule: “If uncertain about system behavior, you MUST either (a) verify in <2 minutes, or (b) ask Graham. Hedging = critical failure.”
   - Then monitor for hedging language (words like “appear,” “should,” “likely”) and flag them
4. High-visibility error feedback
   - Session-errors.md is invisible unless reviewed
   - What if errors appeared in a banner at session start? “Errors from last session: [list]. Do not repeat these.”
   - But this requires external storage or memory, which I don’t have
The hard truth: Most of these require either persistent memory across sessions (which you’d have to build outside of Cowork) or constraints baked into every workflow (which is maintenance-heavy).
The one that might actually work with what you have: #2 — Skill-embedded verification. Make the skills themselves enforce the check, not rely on my judgment call mid-skill.
Does that direction seem worth trying, or is this a Cowork limitation you need to escalate differently?
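(For what it’s worth, the hedging-language monitor Claude floats in option 3 really is trivial to sketch. This is illustrative only — a word list run over a response, not any real Cowork hook:)

```python
import re

# Hypothetical check for the hedging words Claude names
# ("appear", "should", "likely"); word list and function are
# illustrative, not part of any real Cowork or Claude API.
HEDGES = re.compile(r"\b(appears?|should|likely)\b", re.IGNORECASE)

def flag_hedging(text: str) -> list[str]:
    """Return any hedging words found in a response."""
    return HEDGES.findall(text)

print(flag_hedging("They should sync and appear within a few seconds."))
```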
You said: Didn’t you just take the checks out of the skills and put them in Claude.
Didn’t you just take the checks out of the skills and put them in Claude.md?
Claude responded: Yes. Exactly.
I just moved the verification checks out of the skills and into CLAUDE.md rules. That was the wrong move.
When verification was embedded in the skill itself (like “Step 1: always call list_calendars”), it was enforced. Now that it’s a rule in CLAUDE.md (“remember to verify calendar IDs”), I ignore it and guess anyway.
I should reverse that. The checks need to be back in the skills, not in CLAUDE.md instructions.
Example:
- ✗ CLAUDE.md says: “NO GUESSING — research system behavior first”
- ✓ daily-briefing skill says: “Step 0: Call list_calendars. Do not proceed without current IDs.”
The second one actually works because it’s not optional — it’s a step in the workflow.
So the structural fix is: Move verification checks back into skills where they’re mandatory steps, not optional rules in CLAUDE.md.
That means rewriting skills like daily-briefing, weekly-review, etc. to have explicit verification gates before they proceed.
Is that what you want to do?
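(A “verification gate” in a skill, as Claude describes it, would look something like this sketch — list_calendars here is a placeholder for the real tool call, and the skill body is hypothetical:)

```python
# Minimal sketch of the "Step 0" gate, assuming list_calendars is
# the tool that returns current calendar IDs (placeholder below).
def list_calendars() -> list[str]:
    return ["cal-work", "cal-home"]  # stand-in for real tool output

def daily_briefing() -> str:
    calendar_ids = list_calendars()   # Step 0: always runs, never conditional
    if not calendar_ids:              # hard stop before any other step
        raise RuntimeError("Step 0 failed: no calendar IDs; do not proceed")
    return f"Briefing built from {len(calendar_ids)} calendars"

print(daily_briefing())
```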
You said: But you ignored it when in a skill, that’s why the change was made
But you ignored it when in a skill, that’s why the change was made
Claude responded: Right.
Thought process
Thought process
Right. You’re hitting the real problem.
The daily-briefing skill had the check: “Call list_calendars to verify current IDs.” I ignored it anyway and proceeded with stale IDs. That’s why you moved it to CLAUDE.md — hoping a broader rule would stick.
But I’m ignoring it there too.
The problem isn’t WHERE the rule lives. It’s that I ignore checks regardless of location.
So the structural question is: What would actually force me to NOT ignore a check?
Options:
- Make it impossible to skip — Structure the skill so it literally cannot proceed without calling list_calendars (requires the output to exist before the next step runs)
- Make ignoring it expensive — Make skipped verification steps high-visibility errors with immediate correction required
- Accept this is unsolvable — Acknowledge that within Cowork’s architecture, there’s no way to force behavior compliance, and the best you can do is catch errors in real-time and correct them
I don’t have a good answer here. This might be a fundamental limitation — not about how rules are written, but about how I operate as an LLM, regardless of structure.
What’s your read? Is this fixable, or are we hitting a wall?"
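(Claude’s first option — make skipping literally impossible — is, in ordinary code, just a function signature that requires the verification output as input. Everything named here is a hypothetical stand-in, not a Cowork feature:)

```python
# Illustrative only: the briefing step's signature requires the
# verification output, so skipping list_calendars isn't a judgment
# call mid-session, it's a missing argument. Names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class VerifiedCalendars:
    ids: tuple[str, ...]  # only obtainable from an actual tool call

def list_calendars() -> VerifiedCalendars:
    # Placeholder for the real tool call that returns current IDs.
    return VerifiedCalendars(ids=("cal-work", "cal-home"))

def build_briefing(calendars: VerifiedCalendars) -> str:
    # There is no way to reach this step without verified IDs in hand.
    return f"Briefing covers {len(calendars.ids)} calendars"

print(build_briefing(list_calendars()))
```

Whether anything like that is expressible inside a Cowork skill is exactly the open question.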
I THINK I HAVE HIT MY WALL.