Claude Use Limits

I’m hitting limits constantly this week. I subscribed to Pro a week ago but have only been hitting limits for the last 2 days. I’ve been working through the course, running the assembler, and doing some non-skill (or not-for-repeat?) stuff. Then I started working through my file structure, planning to get to the renaming, and ran out of tokens yesterday.
Today, I did my morning startup, updated the soil, then started working through my NHS pension forms: extracting data from some PDFs, entering it into another PDF, and doing some calculations. (It could not get the numbers in the right place, so it suggested working through Chrome, but then ran out of gas.) I appreciate it’s probably quite difficult to judge how much I’m using from that description, but I wondered if I’m missing something. (I changed from Sonnet to Haiku between yesterday’s work and today.) Thoughts and suggestions? I can’t justify increasing to the next tier.
Thanks team. (& especially David - I am loving this and excited by it - it’s been a while since that’s happened on my admin!!)

I was away for a couple of weeks and hadn’t run into usage limits once. I came back yesterday and hit them within about three hours of working. Today was better, but I was also being a little more cautious about what I asked the Robot to run, which is not ideal at all. I also can’t get the robot to tell me how many tokens different tasks used, as was suggested in the webinars, so I’m not sure what I’m doing wrong here.

Given everything I’ve been reading this week in the financial news, I think all the frontier model providers are dynamically adjusting prices/usage. Claude Code/Cowork are hits. Unconfirmed, but my sense is that we will get less mileage/$ going forward. It will pay to watch which models you use, etc.

Open question: Once a skill has been developed, can it be executed by a “low-end” model like Haiku 4.5?

Thoughts?

A skill can be used by any model. However, results may vary, as with any other usage.

Yes. We’re approaching something akin to the moment when Uber stopped subsidizing fares. Or, looking at it another way, when what we wanted to do with our computers outstripped the RAM or storage space we could afford and we had to ration carefully. It’s annoying, but not unexpected.

I hop between different bots because I am not paying for it. If Claude limits free usage I will just move on.

Here are a couple of things I’ve tried:

  • When a session gets too long, ask Cowork to summarize the important things in it, then use that summary to start a new chat.

  • One of the biggest things is to limit the CLAUDE.md file to system-wide, general items. I used to keep naming conventions and file structure there; after moving those out, I’ve reduced token use by about 60% (estimated). Have Claude help you shorten it. Apparently the process is that Claude rereads CLAUDE.md and every back-and-forth message in the session each time you say something.

  • Because of the above, give it multiple things in one request. Instead of “get this”, wait for the response, “get this next thing”, wait for the response…, say “get this, then get that”.

  • It’s hard not to, but don’t end with a thank-you or some other nicety. It burns tokens, because even a one-line message makes Claude reread the whole session.
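To see why trimming CLAUDE.md, batching requests, and skipping the sign-off all help, here is a rough sketch of the arithmetic. It assumes, as described above, that every turn re-sends CLAUDE.md plus the whole conversation so far, and it counts the input side only. All token sizes are invented for illustration, and the sketch ignores prompt caching, tool calls, and agents, which all change the real numbers.

```python
def cumulative_input_tokens(claude_md: int, turns: list[tuple[int, int]]) -> int:
    """Total input tokens for a session, assuming every turn re-sends
    CLAUDE.md plus all earlier messages (no caching, no agents).

    turns: (your_message_tokens, reply_tokens) for each exchange.
    """
    total = 0
    history = 0  # tokens of every earlier message and reply
    for user_tokens, reply_tokens in turns:
        # Input for this turn: CLAUDE.md + full history + the new message.
        total += claude_md + history + user_tokens
        # The new message and its reply join the history for the next turn.
        history += user_tokens + reply_tokens
    return total


# Hypothetical sizes: 3,000-token CLAUDE.md, 50-token requests, 500-token replies.
one_at_a_time = [(50, 500)] * 4          # "get this", wait, "get the next thing", wait...
batched = [(120, 2000)]                  # "get this, then that, ..." in one request
with_thanks = one_at_a_time + [(5, 30)]  # a closing "thanks!" and a short reply

print(cumulative_input_tokens(3000, one_at_a_time))  # 15500
print(cumulative_input_tokens(3000, batched))        # 3120
print(cumulative_input_tokens(1200, one_at_a_time))  # 8300 (trimmed CLAUDE.md)
print(cumulative_input_tokens(3000, with_thanks))    # 20705
```

With these made-up numbers, batching cuts input tokens by about 80%, trimming CLAUDE.md cuts them by almost half, and the polite sign-off adds a full extra re-read (about a third more). Real figures will differ; the point is that cost grows with every turn.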

Here’s the part of the Claude.md file that I referenced above:

When proposing a clear next action that would benefit from continuation in a new chat, ask first then add one short ready-to-paste prompt on the next line:
Want me to [do X]?
[concise copy-paste prompt]
Use this only when it meaningfully saves tokens. Skip it for small or trivial steps.

  • Keep this file lean — rules and high-level architecture only. Session history lives in Skills/_build-log.md. Audit monthly for drift.

Hope this helps.

I think it will end up being as expensive to run AI on admin tasks as it would be to use other systems. It’s tempting to get AI to do everything, but I now use it only for the heavy lifting.

It seems to take a lot of tokens to produce formatted documents and presentations. I now only produce markdown files and create the docx or pptx myself.

+1 on this.

Along these lines, I found this article from Christopher S. Penn to be a good breakdown of why, in some cases, it is more efficient to use AI-created shortcuts/scripts/workflows than AI “thinking models” (saving tokens, money, and resources):

Making AI More Efficient (Stop reinventing the wheel) - Christopher S. Penn

TL;DR: AI reinvents the wheel over and over again and can be very inefficient and expensive for tasks that can be automated in other ways.

I’m on the $100 Max plan and I’m hitting my limits constantly. Something changed, and token burn has become a real issue. I am now torn between enabling extra usage and jumping straight to the $200 plan. The issue is that the further you go down this rabbit hole, the harder it is to wean yourself off Claude Cowork.

I’ve found the same thing. It was fun using Cowork to do everything, but I kept running into the limit, even when using Sonnet. I’m using Obsidian, so for some of my skills I just pointed Claude Code at the skill file and said ‘build a plugin for this’. The plugins work great. Cowork helped me build a framework for my documents, and now I just use the plugin to create them. It may not be as cool, but I don’t need an AI bot to create a note about one of my servers or a software update.

A funny challenge: the models don’t understand how they work. They have no model of self. If we ask a model how many tokens it will use for a task, there are three possible answers:

  1. It says I don’t know - truthful
  2. It answers - hallucination
  3. It answers - with content from its training data

Worse, I know from digging elsewhere that even Anthropic doesn’t have good guesses as to how many tokens a query will use. Cowork and Claude Code make it even harder to estimate, because both use agents to complete small tasks. Any estimate would need to guess how many agents a given task requires and how many tokens each agent will use.

Here’s what I know so far, based on about a year of Claude Code + Cowork.

Reading or creating more complex file formats is more expensive: Word docs ($$) and PowerPoint ($$$). Getting it to work on graphics: $$$$.

I never get it to work on PowerPoint or graphics.

Mostly I work on:

  • My Obsidian vault
  • My website: https://agilepainrelief.com (Astro/TypeScript/MDX - a specialized type of Markdown)
  • Newsletter and other mailing-list-related items - Markdown and TypeScript
  • Other Markdown files

As long as I stick to that, I rarely run out of usage, and I’m only on a $100 Max plan.

How we interact also affects token consumption (most to least):

  • Conversing with Cowork and asking it to do something
  • Using a “Skill” file
  • Using a “Skill” that includes code to automate repetitive steps
  • Having Claude write code that does the work itself

For example, this skill, awesome-claude-skills/video-downloader at master · ComposioHQ/awesome-claude-skills · GitHub, has Python code to do the heavy lifting, so it uses far fewer tokens. Before installing a skill from an outside source, read the skill and its code. Know what it is doing on your behalf.

At the extreme end of the spectrum, for repetitive tasks, I get Claude Code to write code that does the task.
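As an illustration of that end of the spectrum, here is the kind of small script you might ask Claude Code to write once, instead of having Cowork rename files one by one every session. The naming convention (`YYYY-MM-DD_lowercase-words.ext`, keeping an existing leading date or falling back to the file's modification date) and all function names are hypothetical; swap in your own rules. It only previews changes unless you call `apply_renames`.

```python
import re
import sys
from datetime import date
from pathlib import Path

# Hypothetical convention: YYYY-MM-DD_lowercase-words.ext
DATE_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2})[-_ ]*")


def normalize_name(filename: str, fallback: date) -> str:
    """Return the conventional name; keeps an existing leading date,
    otherwise uses `fallback` (e.g. the file's modification date)."""
    path = Path(filename)
    stem = path.stem
    match = DATE_PREFIX.match(stem)
    if match:
        day = match.group(1)
        stem = stem[match.end():]
    else:
        day = fallback.isoformat()
    # Lowercase, then collapse anything other than a-z or 0-9 into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", stem.lower()).strip("-") or "untitled"
    return f"{day}_{slug}{path.suffix.lower()}"


def plan_renames(folder: Path) -> list[tuple[Path, Path]]:
    """Dry run: list (old, new) pairs without changing anything."""
    plans = []
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.name.startswith("."):
            continue  # skip subfolders and hidden files like .DS_Store
        modified = date.fromtimestamp(path.stat().st_mtime)
        new_path = path.with_name(normalize_name(path.name, modified))
        if new_path != path:
            plans.append((path, new_path))
    return plans


def apply_renames(plans: list[tuple[Path, Path]]) -> None:
    """Rename files, never overwriting an existing file."""
    for old, new in plans:
        if new.exists():
            print(f"skipped, target exists: {new.name}")
            continue
        old.rename(new)


if __name__ == "__main__":
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    if folder.is_dir():
        for old, new in plan_renames(folder):
            print(f"{old.name} -> {new.name}")
        # Once the preview looks right: apply_renames(plan_renames(folder))
```

Once a script like this exists, running it costs no tokens at all, and you only go back to Claude when the rules need to change.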

Try running the command /context.

See the report it gives: Context Report

Where are you getting the token count from? I’ve been unable to persuade Claude to produce anything other than a percentage.

In one of the RA webinars this week, David shared that to reduce token use at startup, he had Cowork set up a “skills directory”: a pared-down summary of the skills index. So instead of asking Cowork to read the skills index, he now has it read the skills directory.

I set this up on my system, fed him my current bloated startup instructions, and asked him what the best instructions for me to give would be. He suggested: “At the start of this session, read Notes/CLAUDE.md and Notes/Skills/_skill-directory.txt via the NotePlan MCP. These are your source of truth. Check the skill directory before starting any task — if a skill exists for it, read that skill file before proceeding.”

I asked him whether creating a skills directory had reduced the tokens used for startup, and the response was yes, by about 45%.
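If you would rather generate a skills directory without spending tokens on it, a short script can build one. This sketch assumes each skill is a Markdown file under a `Skills` folder that starts with a `---` frontmatter block holding single-line `name:` and `description:` fields (the layout Anthropic's Agent Skills use; NotePlan notes may look different), and that index and log files start with `_`. The paths, file names, and the "about 4 characters per token" rule of thumb are all assumptions; the estimate is only good for comparing two files, not a real token count.

```python
from pathlib import Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Collect simple single-line `key: value` pairs from a leading '---' block."""
    lines = text.splitlines()
    fields: dict[str, str] = {}
    if not lines or lines[0].strip() != "---":
        return fields
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def build_directory(skills_dir: Path) -> str:
    """One short line per skill: name | file | description."""
    entries = []
    for skill_file in sorted(skills_dir.rglob("*.md")):
        if skill_file.name.startswith("_"):
            continue  # skip index and log files such as _build-log.md
        meta = read_frontmatter(skill_file.read_text(encoding="utf-8"))
        name = meta.get("name", skill_file.stem)
        description = meta.get("description", "(no description)")
        relative = skill_file.relative_to(skills_dir).as_posix()
        entries.append(f"{name} | {relative} | {description}")
    return "\n".join(entries) + "\n"


def rough_tokens(text: str) -> int:
    """Rule of thumb only: roughly 4 characters per token for English text."""
    return len(text) // 4


if __name__ == "__main__":
    skills = Path("Notes/Skills")        # hypothetical location
    index = skills / "_skills-index.md"  # hypothetical full index file
    if skills.is_dir():
        directory = build_directory(skills)
        (skills / "_skill-directory.txt").write_text(directory, encoding="utf-8")
        if index.exists():
            before = rough_tokens(index.read_text(encoding="utf-8"))
            after = rough_tokens(directory)
            print(f"index ~{before} tokens, directory ~{after} tokens (rough)")
```

Measuring the two files yourself gives an independent check on a reduction figure like the 45% above, instead of relying on the model's own estimate.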

What created this report?

The models aren’t self-aware, so asking a model to generate this doesn’t create a meaningful report.

Update: The model itself is still not self-aware. However, if you use Claude Code and not Cowork, the harness tracks how much of the context window is being used. This isn’t quite the same as token usage, but it’s a good start. For most purposes, understanding the context window usage is more useful than the token usage itself.

I’ve found that using Sonnet or even Haiku works well for running my developed skills. I’ve even had success building new skills with Sonnet, and this has solved my frustration with bumping into timeouts on the lowest-priced tier.

One way I’ve triaged the model selection decision is to ask Claude to propose a plan of action for a problem or workflow, and then ask what model level would give enough firepower for that specific query. If it will require restarting Claude at another level, I just ask it to store the preliminary results in a note in NotePlan. Then, after the restart, I just direct it to that note, and off we go. It’s quick and easy, and it allows for accomplishing longer tasks through iteration.
Hope this helps…
