So, yes, occasionally I do pine for the latest, but I have been largely happy with my M1 MacBook Air. The only place it’s lacking is in handling language models. The new M5 is getting billed as better at “doing AI”, with a particular focus on Apple AI. For those who have already bought an M5 Mac: how does it handle other LLMs? Is there a noticeable speed improvement over older hardware?
I ask this because there is now a rumor that an M5 MBA is coming this year.
A quick check: “doing AI” in the comments can mean either local LLMs (see below) or Apple’s cloud AI – soon to be Gemini. For the latter, I can’t imagine the M5 matters much. For local LLMs, it’s early to buy an M5 for AI. The processor may well be better designed for it, but RAM is going to be the limiting factor.
I do a fair chunk of work with local LLMs, and they eat RAM for breakfast. I settled on an M3 Max with 64GB of RAM (see: M3 Max Memory and Bandwidth), and for current-generation models it’s good, but only barely.
I can run Qwen3 30B locally, but I can’t give it enough context length for it to be useful in a lot of situations.
Most of my use is via Claude and Claude Code. Since they are cloud-based, my overprovisioned M3 Max isn’t all that important.
The main improvement with the M5 architecture is ~3.6× faster “time to first token” on LLM workloads, which is essentially faster prefill inference.
That’s compared to the M4. Compared to the M1, prefill is ~9.0× faster, and token throughput is ~2.5× higher on the M5 than on the M1.
So, less waiting around.
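As a back-of-the-envelope sketch, assuming those multipliers carried over to a real workload (model, quantization, and context length all matter), the difference in wait time might look like this. The prompt size, output length, and M1 baseline rates below are made-up illustrative numbers, not benchmarks:

```python
# Rough wait-time estimate from the reported M5-vs-M1 multipliers.
# All baseline rates and token counts are illustrative assumptions.

PREFILL_SPEEDUP = 9.0   # M5 vs M1: time to first token (prefill)
DECODE_SPEEDUP = 2.5    # M5 vs M1: token throughput (decode)

prompt_tokens = 4000    # assumed prompt size
output_tokens = 500     # assumed response length
m1_prefill_tps = 200.0  # assumed M1 prefill rate, tokens/s
m1_decode_tps = 20.0    # assumed M1 decode rate, tokens/s

m1_time = prompt_tokens / m1_prefill_tps + output_tokens / m1_decode_tps
m5_time = (prompt_tokens / (m1_prefill_tps * PREFILL_SPEEDUP)
           + output_tokens / (m1_decode_tps * DECODE_SPEEDUP))

print(f"M1: ~{m1_time:.0f}s, M5: ~{m5_time:.0f}s")
```

Under those assumptions, roughly 45 seconds of waiting drops to about 12 — noticeable, but not a different universe, and RAM still decides which models you can run at all.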
As others have mentioned, RAM is a huge limiting factor when it comes to local LLMs. My 64GB M1 Max MacBook doesn’t have enough RAM to comfortably run even medium-sized models, but my Mac Studio M1 Ultra with 128GB runs large models well. So, even with the new processor, make sure you go for at least 128GB of RAM.
Thanks, everyone, for the really meaty replies!
My question was entirely about local LLMs: I work, in a very very small way, with LLMs inside of Python. I aspire to one day wire together an assembly that lets me run a local model from within various GUI apps — and I’ve played with some of the possibilities — but I’m not there yet.
Recent work has me running one of the llama derivatives inside of Python, where a small collection of 1000 sentences can take 8 minutes to run. I’m happy to drink a cup of tea as much as the next person, but it does put me off attempting larger tasks. I am also reminded of the time I tried to train an early GPT on 1000 jokes collected from Reddit. Best guess was 35 epochs and each epoch was taking an hour on my M1 MBA and using all 8 cores fully. (I had to stop three hours in because the machine was getting awfully hot.)
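For scale, that 8-minute run works out to roughly half a second per sentence. Here’s what the quoted ~2.5× M5-over-M1 throughput figure would project to, assuming (a big assumption) it transfers directly to this workload:

```python
# Projecting the 1000-sentences-in-8-minutes run onto an M5,
# assuming the ~2.5x M5-vs-M1 throughput figure applies directly.

sentences = 1000
m1_seconds = 8 * 60                    # the run as measured on the M1 MBA

per_sentence = m1_seconds / sentences  # seconds per sentence on the M1
m5_seconds = m1_seconds / 2.5          # projected M5 run time

print(f"{per_sentence:.2f}s per sentence; projected M5 run: {m5_seconds / 60:.1f} min")
```

So an 8-minute run becomes a projected ~3-minute run — still tea-drinking territory for larger jobs.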
I guess I’ve heard a few references to there being a better version of shared RAM between the CPU and GPU, and that the M5 has a bit more GPU. That was what caught my attention.
But, yeah, I totally get that one should not expect too much from any laptop.
Are you considering waiting for M5 Mac Mini/Studio? Possibly more bang for buck.
Great question! But no. While I mostly work in my home study, I do tend to move around the house, and I also travel with my computer. Plus, I use my personal computer at work — I work at one of those dumb universities that claims anything that you do on a university machine belongs to them. (And they also have a terrible record at protecting faculty.) Since my work regularly involves field research data (with people who often let slip things they’d rather not be part of the historical record), I just prefer to use my own machine.
I get your answer.
What do you think about having a laptop as a dumb terminal and an M5 desktop running over Tailscale at home as your own super computer? I have a feeling that’s the setup I’ll move to at some point.
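In case it helps anyone picturing it: the plumbing for that setup can be as thin as an SSH host entry pointing at the desktop’s Tailscale MagicDNS name. The host alias, tailnet name, and user below are placeholders, not anything real:

```
# ~/.ssh/config — reach the home desktop over its Tailscale MagicDNS name.
# "studio", "tailnet-name", and "me" are placeholders; substitute your own.
Host studio
    HostName studio.tailnet-name.ts.net
    User me
```

Then `ssh studio` works from anywhere the laptop has a connection, and if something like Ollama is serving models on the desktop, a client on the laptop can point at that same hostname.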
That sounds lovely, and I have certainly imagined setting up some PC running Linux (or Windows) to do that with a dedicated and somewhat decent GPU, but I’m afraid that kind of expansive setup isn’t in the cards for me at present!