Thanks, everyone, for the really meaty replies!
My question was entirely about local LLMs: I work, in a very small way, with LLMs inside Python. I aspire to one day wire together an assembly that lets me run a local model from within various GUI apps (and I’ve played with some of the possibilities), but I’m not there yet.
Recent work has me running one of the llama derivatives inside Python, where a small collection of 1000 sentences can take 8 minutes to process. I’m happy to drink a cup of tea as much as the next person, but it does put me off attempting larger tasks. I am also reminded of the time I tried to train an early GPT on 1000 jokes collected from Reddit: my best guess was 35 epochs, and each epoch was taking an hour on my M1 MBA while using all 8 cores fully. (I had to stop three hours in because the machine was getting awfully hot.)
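For concreteness, the sort of per-sentence loop I’m describing looks something like the sketch below, using llama-cpp-python. The model path and the prompt wording are placeholders, not my actual setup:

```python
from llama_cpp import Llama

# Placeholder quantised model file -- not my actual model.
llm = Llama(
    model_path="models/llama-7b.Q4_K_M.gguf",
    n_gpu_layers=-1,   # offload all layers to the Metal backend on Apple silicon
    n_ctx=2048,
    verbose=False,
)

sentences = ["..."]  # in practice, ~1000 short sentences loaded from a file

for s in sentences:
    # One short completion per sentence; the task here is just illustrative.
    out = llm(f"Classify the sentiment of: {s}\nAnswer:", max_tokens=8)
    print(out["choices"][0]["text"].strip())
```

The `n_gpu_layers=-1` bit is the part that matters on Apple silicon, since it pushes the whole model onto the GPU while still reading from the same unified memory as the CPU.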
I guess I’ve heard a few references to there being a better version of the shared RAM between the CPU and GPU (what Apple calls unified memory), and to the M5 having a bit more GPU. That was what caught my attention.
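On that shared-memory point, the practical thing I end up checking is whether work is actually landing on the GPU at all. A quick PyTorch sanity check (just an illustration, not tied to any particular model) is something like:

```python
import torch

# Fall back to the CPU if the Metal (MPS) backend isn't available.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.randn(4096, 4096, device=device)
y = x @ x  # executes on the Apple GPU when device is "mps"
print(device, y.shape)
```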
But, yeah, I totally get that one should not expect too much from any laptop.