Analysing in spreadsheets

Not so much a question as a gripe, but if anyone has suggestions I’m all ears.

I’ve been trying to extract information about UK energy supplies from the data published on a UK government site called DUKES (search if you wish!), which catalogues energy flows across sectors of the economy. It has PDF reports backed by numerous spreadsheets of relevant data.

They are complicated, so I thought Gemini 2.5 Flash — which I have access to through my Google Workspace account — might help. I duly uploaded a couple of reports and eight relevant spreadsheets and started asking questions, using it a bit like a well-trained research assistant.

All seemed well at first: it was reasonably successful, but after half a day it began giving me answers that were inconsistent with previous ones, even though those earlier answers had seemed self-consistent as we went along.

When I pointed out discrepancies, it apologised and said its earlier values had been wrong, but couldn’t explain why. I now have little faith in any of them, at least without cross-checking by hand.

Has anyone had similar experiences? Any tips or solutions? Is this just what happens if you dwell on one subject for too long: the model gets overwhelmed? Any suggestions for a better LLM than Gemini? I have a Kagi subscription, so I can access others.

Thanks.

You’re hoping to get something that artificial intelligence in its present state can’t give you.

2 Likes

It’s hard to judge without seeing the exact data, but it depends on whether the answers are directly available in the information provided, or whether you need to do calculations on the data to get them.

If it’s the former, ask it to show you where in the text it found the answers, so you can verify for yourself that it got the right thing.

If you’re asking it to calculate answers from data in the spreadsheets, then either ask it to show you its working, or use multi-stage prompts: ask it to pull the unprocessed data first, then tell it what to do with it.
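To make that second approach concrete: once the model has pulled the raw figures for you, you can redo its arithmetic yourself in a few lines of Python rather than trusting its mental maths. This is just a sketch; the fuel names and ktoe figures below are invented for illustration, not real DUKES numbers.

```python
# Step 1 (done via the LLM): extract the raw figures from the spreadsheet.
# Step 2 (done yourself): recompute any derived values the model claimed.
# All numbers here are made up for illustration.
raw = {"coal": 1200.0, "gas": 38700.0, "petroleum": 61500.0}  # ktoe

total = sum(raw.values())
gas_share = raw["gas"] / total * 100  # gas as a percentage of the total

print(f"total = {total:.0f} ktoe, gas share = {gas_share:.1f}%")
```

If the model’s claimed percentage doesn’t match your own recomputation from the raw numbers it gave you, you know immediately which stage went wrong.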

1 Like

Step one is to make sure you really understand the data in the spreadsheets. A common issue is that a column heading made perfect sense to whoever created it, but is ambiguous, or its exact significance hard to pin down, for anyone else.

If you don’t understand it, how can you trust anything an LLM tells you about it? The model will also have to make assumptions about what the data means, and it is ill equipped to do so.

@nfdksdfkh has a great point about calculations. If you need additional calculations, I would recommend you edit the spreadsheets and have Excel (?) do the math.

Once you have a dataset that suits your analytic needs, you’d probably have more luck just exploring it yourself in a pivot table than expecting AI to do any heavy lifting.
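For what it’s worth, the same pivot-table exploration can be scripted with pandas if Excel isn’t to hand. A minimal sketch, in the spirit of the DUKES sector/fuel tables; the sectors, fuels, and figures below are invented for illustration.

```python
import pandas as pd

# Hypothetical energy-flow data (sector, fuel, consumption in ktoe).
# These rows are illustrative, not real DUKES figures.
df = pd.DataFrame({
    "sector": ["Industry", "Industry", "Transport",
               "Transport", "Domestic", "Domestic"],
    "fuel":   ["Gas", "Electricity", "Petroleum",
               "Electricity", "Gas", "Electricity"],
    "ktoe":   [8000, 6500, 38000, 1500, 27000, 9000],
})

# Cross-tabulate consumption by sector and fuel, with grand totals,
# exactly as an Excel pivot table would.
pivot = pd.pivot_table(
    df, values="ktoe", index="sector", columns="fuel",
    aggfunc="sum", fill_value=0, margins=True, margins_name="Total",
)
print(pivot)
```

The point is the same as with an Excel pivot table: once the numbers are laid out this way, an inconsistency jumps out at you, which is exactly the check the LLM was failing to do.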

1 Like

The fundamental issue is that current AI models have no understanding of the data they are dealing with. They have been trained on lots of data, the training abstracts patterns within it, and they use those patterns to model appropriate outputs when presented with new data. That can be impressive, but there are no “reality checks”: generic analysis can be good, yet fall apart as you dig deeper.

Calculation is a similar issue. Although this is gradually changing, producing a plausible-looking output by applying a pattern to new data is not the same as actually calculating.

1 Like

If you ask a commercial AI to calculate something, it will usually write a Python script and compute the answer from scratch. So it isn’t just pattern matching; the result is calculated fresh.

Sadly, this was your first mistake.

Gemini won’t have background knowledge of the data you’re working with or what it means. It doesn’t sound like “world knowledge” either; it sounds specialised, so the AI will not quite know what to do with it.

Add to this that AI is not good at maths, and I can see how you got the results you did.

1 Like

Lesson learned. I had rather assumed that if I could read the written reports, follow the references to the spreadsheets, and pull out the underlying data, an LLM could do the same, just faster.

Ah well, back to paper and pencil.