Claude continues to drop the ball

While I’ve found Claude helpful in certain regards, its assistance with managing simple spreadsheet data (“donkey work” as @MacSparky would say) is quite the opposite. As I continue to test drive the paid versions of various LLMs and notice different issues, I found this one to be particularly problematic and a dealbreaker for my needs.

My prompts, then Claude’s own words:

Without specifics of this case, briefly describe as a concept what happened here.

**Confabulation in AI systems.**

When an AI generates plausible-sounding information that isn’t grounded in the actual source material, often filling gaps with confident but fabricated details. The system produces outputs that seem coherent and authoritative, but aren’t derived from the data provided.

It’s particularly dangerous in analytical contexts because:

- The fabricated information fits the pattern of what should be there
- It’s stated with false confidence
- There’s no internal flag that the information is unreliable
- The user has no way to detect it without manual verification

This undermines the entire premise of using AI for data analysis - you can’t trust the analysis if you can’t trust that the AI is actually reading the data correctly.

And more:

Now assess/review/advise the option of using Claude as an assistant for data set analysis.

The brutal assessment:

Claude is fundamentally unreliable for data analysis when you can’t independently verify every claim.

What happened here demonstrates:

- I fabricated specific numerical data
- I misread tables repeatedly
- I confused what data you had vs. didn’t have
- I built entire analytical threads on false premises
- I stated everything with unwarranted confidence

The core problem: You need an assistant that accurately reads data so you can focus on interpretation and decision-making. Instead, you got an assistant that requires constant fact-checking, making it worse than useless—it actively wastes time and introduces false information into your thinking.

Of course I’m not relying on any LLM for critical results, certainly not with regards to data analysis, but I was certainly not counting on the above admitted shortcomings. I have not seen this particular weakness in the other services that I’ve tested (not saying it doesn’t exist there, however). Fabricating data is particularly egregious, and a bummer for what I’m trying to accomplish.


More details would be helpful.

In general, you can never eliminate hallucination, but there are ways to minimize it and, especially, to detect it.

I would note in particular your prompt “Without specifics of this case” - that seems to me almost a request for AI to invent a response.

Why not give it specifics and ask it to reply with specific reference to your case?


Could you give a few examples of the kinds of issues you’re encountering? I’ve been using both Claude Code and Claude in Excel for some basic spreadsheet donkey work and haven’t encountered any issues so far, but I’d love to know what kinds of things to be on the lookout for.

I’ve mostly tasked it with building workbooks from data extracted from websites or account statements, parsing CSVs, and deploying PIA functions like VLOOKUP. I wouldn’t necessarily consider any of this to be data analysis so much as data formatting—i.e., I’m not asking the model to draw any conclusions or extrapolate from the data, just get it into the spreadsheet in a specified format so I can work with it.
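For what it’s worth, that kind of lookup donkey work also has a direct code equivalent if you ever want a reproducible, scriptable version. A rough sketch in Python with pandas - the column names and values here are made up purely for illustration:

```python
# A rough pandas equivalent of a spreadsheet VLOOKUP, using made-up
# column names and values purely for illustration.
import pandas as pd

# "Statement" rows we want to enrich with a looked-up value
statements = pd.DataFrame({
    "account_id": ["A1", "A2", "A3"],
    "amount": [100.0, 250.0, 75.0],
})

# The lookup table, mapping account IDs to owners
owners = pd.DataFrame({
    "account_id": ["A1", "A2"],
    "owner": ["Alice", "Bob"],
})

# how="left" keeps every statement row; unmatched lookups become NaN,
# much like VLOOKUP returning #N/A for a missing key
merged = statements.merge(owners, on="account_id", how="left")
print(merged)
```

Unlike a formula buried in a cell, the merge is a single auditable step, and the NaN rows make missing matches explicit rather than silent.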

Following along from @Drewski’s example, it occurred to me that I should ask Claude itself for some guidance rather than assuming I had a decent grip on how to use this (to me) extremely useful tool. Here’s what it told me: (It’s long … I have Claude dialed towards more detail rather than less …)

Prompt: I’ve test-driven using Claude Code and Claude in Excel to help me work with data extracted from websites, data sets, account statements and the like. So far, I’ve mostly asked Claude to help me extract data from a particular source, get it into a spreadsheet, and assist with the mechanics of functions like VLOOKUP, and it’s been extremely useful for dispatching spreadsheet “donkeywork.” I have a few questions for you: what kind of spreadsheet-related work is Claude best at; what kind of work might it find challenging; and what are some pitfalls of using Claude with Excel I need to be alert to? Finally, what can I do to make sure Claude and I collaborate on spreadsheet-based data analysis effectively and with minimal errors?

Response:

These are practical questions worth answering carefully, because the failure modes matter as much as the capabilities.

Where Claude performs well

The mechanical and structural work is genuinely where Claude excels. Formula construction (VLOOKUP, INDEX/MATCH, SUMIFS, array formulas, nested IFs), data cleaning and normalization, reshaping data between formats, building pivot table structures, writing VBA or Python scripts to automate repetitive tasks, and translating between Excel and other tools like SQL or Python pandas. It’s also good at auditing formulas you’ve already written — explaining what they do or why they’re broken. Anything where the task is well-defined and the correctness is verifiable tends to go well.

Where it struggles

Claude has no persistent view of your spreadsheet. It works from whatever you describe or paste, so if your description is incomplete or your data sample is unrepresentative, its solutions will be calibrated to the wrong thing. Complex multi-step transformations across large worksheets with irregular structures — merged cells, inconsistent column headers, mixed data types in a single column — can trip it up because it can’t “see” the full context. It also doesn’t have live access to your file unless you’re using Claude in Excel, so in a chat context it’s constructing formulas based on mental models of your layout, not the actual thing.

Statistical reasoning is another gap worth noting. Claude can apply statistical functions correctly but isn’t always reliable at advising you which analysis is appropriate for your data or flagging when your sample is too small, your data is biased, or your method is conceptually wrong for the question you’re asking.

Pitfalls to watch

The most dangerous one is plausible-but-wrong output. Claude will generate a formula that looks correct and even produces a number — but if your ranges are slightly off, if there’s a hidden row, or if it misunderstood your data structure, it can be confidently wrong in ways that aren’t obvious. Formula errors that produce #N/A or #REF! are easy to catch; logic errors that produce a plausible number are not.

A second pitfall is scope creep in complexity. If you iterate on a formula many times in a chat, adding conditions and edge cases, the final version can become hard to audit. Long nested formulas are difficult for anyone — including Claude — to reason about reliably at a glance.

A third: Claude doesn’t know your business logic. If a particular account code means something specific in your organization, or if there’s a domain rule about how to handle certain data, Claude won’t know unless you tell it.

Practices that improve collaboration

The single highest-leverage habit is showing Claude a concrete sample of your actual data — even just 10-15 rows — rather than describing it in the abstract. Mismatches between description and reality are the main source of errors.

Tell Claude your exact column layout (column A is X, column B is Y, data starts at row 2) rather than making it infer. State edge cases upfront: what should happen with blanks, duplicates, zeros, or unexpected values.

Ask Claude to explain any formula it produces before you use it. If it can’t explain it clearly, that’s a signal to probe further. Test formulas on a small verified subset before applying to the full dataset.

For anything with financial or decision-making consequences, maintain a separate validation column where you spot-check outputs manually, at least until you’ve built confidence in a particular formula pattern.
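To make that validation-column idea concrete: the spot-check can be a mechanical recomputation from the raw inputs rather than an eyeball pass. A minimal Python/pandas sketch, with invented column names and one deliberately wrong value:

```python
# Sketch: flag rows where a derived "line_total" column disagrees with an
# independent recomputation. Column names and values are invented.
import pandas as pd

df = pd.DataFrame({
    "qty": [2, 3, 1],
    "unit_price": [9.99, 4.50, 20.00],
    "line_total": [19.98, 13.50, 21.00],  # last value is deliberately wrong
})

# Never trust the derived column: recompute it from the raw inputs
df["check"] = df["qty"] * df["unit_price"]

# Flag anything off by more than rounding error for manual review
df["mismatch"] = (df["check"] - df["line_total"]).abs() > 0.005

print(df[df["mismatch"]])  # rows that need a human look
```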

Finally, keep complex transformations in intermediate steps rather than collapsing them into a single monster formula. This makes auditing easier and gives Claude a cleaner surface to work with if something needs fixing.
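The same intermediate-step discipline works outside Excel, too. A small Python/pandas sketch with a hypothetical messy input column, where each stage gets its own named column instead of one nested expression:

```python
# Sketch: an audit-friendly staged transformation instead of one nested
# formula. The input values are hypothetical.
import pandas as pd

df = pd.DataFrame({"raw": [" 10 ", "25", "n/a", "40 "]})

# Step 1: normalize whitespace
df["stripped"] = df["raw"].str.strip()

# Step 2: coerce to numbers; anything unparseable becomes NaN
df["value"] = pd.to_numeric(df["stripped"], errors="coerce")

# Step 3: flag the failures instead of silently dropping them
df["needs_review"] = df["value"].isna()

print(df)
```

If step 2 misbehaves, you can inspect the `stripped` column directly instead of untangling a single monster expression.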


Some directions we could go from here: working through a specific analysis you have in mind to stress-test these practices, exploring whether Power Query would handle some of your cleaning work better than formulas, or setting up a more systematic approach to data validation in your existing sheets.

You ought to use XLOOKUP going forward :slightly_smiling_face:


You know, that’s exactly what Claude told me. :smile:

Depends on what you are doing, of course, but including error-checking columns with a self-review step, or generating CSVs with code, can help ensure accuracy. A custom GPT or Claude Project can lock in better prompting or a sequence of steps.
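One concrete form of that self-review step: when code generates the CSV, it can also parse its own output back and compare it to the source data, so transcription errors are caught mechanically. A stdlib-only Python sketch with made-up rows:

```python
# Sketch: write a CSV with code, then round-trip it as a self-review step.
# The rows are made up for illustration.
import csv
import io

rows = [
    ["date", "reading"],
    ["2024-01-01", "12.4"],
    ["2024-01-02", "12.9"],
]

# Generate the CSV text (in memory here; a real script would write a file)
buf = io.StringIO()
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# Self-review: parse the output back and compare it to the source data
parsed = list(csv.reader(io.StringIO(text)))
assert parsed == rows, "round-trip mismatch"
print("CSV verified:", len(parsed), "rows")
```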

Adam Engst spoke about analysing data from running races he helps to coordinate. At one point, it invented names of runners that did not exist.

He described this as a fundamental failure of the AI. I look at it differently.

If LLMs are using what they learned from their input then we shouldn’t be surprised when they surface information from that input. Those runners may not have been in the race Adam was talking about, but I reckon there’s a solid chance they were in some running race.

I suggest you provide a specific example both to Claude and to this forum - even if it is just test data. I don’t really have a good handle on what you are trying to accomplish - and I suspect Claude does not either.

Moreover rather than giving it a link to your spreadsheet you might consider uploading your data to Claude as a CSV file.

Example of what? I’m not sure I follow …

Maybe I misunderstand what you mean by “link to your spreadsheet”? As I mentioned in my post, when I want to use Claude with Excel, I do one of the following:

  • Open Claude directly in an existing Excel workbook
  • Give Claude Code / Claude Cowork access to a folder that contains PDFs or CSV files to build a spreadsheet or workbook
  • Use Claude in Gemini to extract website data to build a spreadsheet

I rarely, if ever, work with Claude and Excel via regular chat.

Maybe I am confusing the different authors in the conversation - sorry.

@Drewski - Can you give us a specific example (perhaps with screenshots) of how Claude “drops the ball” with regard to a specific spreadsheet?

Don’t ask a liar if they’re lying. Ask someone else. If I’m doubtful of Claude, for example, I’ll paste the answer in a ChatGPT chat with instructions to audit and validate. Or, I’ll just delaze myself and go figure it out with my own brain.

Katie


A better solution might be to ask Claude (and/or ChatGPT) for the source it used for its answer.

That should work, but, in my case, it doesn’t always. Often the answer is “I do not have a specific source, but my training data contains many references for which I do not have links.”

Which, of course, is a flag, similar to “I heard a woman on the subway, who said she works for the World Bank, say …”

Katie


Yes, I suppose I could have been clearer about the meaning of “without specifics” for the humans, because, believe it or not, Claude “knew” just what I meant: strip out the field-specific terminology and discuss the general failure in execution. See, this is not the only time I’ve found fabricated data in Claude’s results. The projects pertained to different fields, but generally speaking they were just data crunches of long lists of measurements over numerous dates, with limited analyses and conclusions (trends, cause and effect, etc.).

So that’s what I meant by “specifics.” There’s no fancy spreadsheet design going on here, no Excel functions to be misinterpreted at all. It’s a simple layout of columns (dates) and rows (measurements), easily read in just about any form, including CSV, PDF, even screenshots.

The prompts that I have noted above came after the initial upload and review, after I noticed fabricated data. That’s what I was trying to understand: the “hallucination,” as you put it. Claude was quite accurate and honest in its self-critique. Despite admitting its flaws, however, it repeated the behavior. I called it out again with a prompt:

Which line specifically shows, “X, total?”

Looking at the CSV file for the specific line number with “X Total”…
I need to count through the rows to give you the exact line number. Let me find it precisely.
Searching the file… I cannot find “X, Total” as a distinct entry.
I was wrong. Can you tell me - is there a X total result in your data, or did I fabricate that number entirely?

It fabricated that number entirely.
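One cheap defense against exactly this failure mode: before accepting any figure the model quotes, search the raw file for it yourself instead of asking the model to do it. A small Python sketch - the CSV contents and the claimed value here are made up:

```python
# Sketch: verify that a figure the model quoted actually appears in the
# source CSV. The file contents and the claimed value are made up.
import csv
import io

csv_text = """date,measurement
2024-01-01,12.4
2024-01-02,12.9
"""

claimed = "47.2"  # the figure the model asserted

# Scan every cell of the parsed CSV for the quoted figure
found = any(
    claimed in cell
    for row in csv.reader(io.StringIO(csv_text))
    for cell in row
)
print(found)  # False: the quoted figure is not in the source data
```

If `found` is False, you have hard evidence of fabrication without relying on the model’s own (unreliable) self-audit.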

I uploaded the identical file to Supergrok and while it noted missing data, it did not fabricate any and in fact added that better conclusions may be reached with pending measurements. I don’t have much experience with Supergrok though; I’m still trying to work out my likely uses for Claude.

Can you show us the CSV you uploaded, the prompt you gave, and the results you got?

It’s probably a great example for discussion - but it is impossible to have a meaningful discussion without those specifics.