M3 Max Memory and Bandwidth

Snowflake problem. I suspect MPU is the wrong place to ask, but I don’t know where to go yet. I want to buy an M3 MBP and have it last 4-5 yrs. (The only way I can justify the jaw-dropping cost.)

I need 4TB SSD - that’s easy. It’s the memory and bandwidth that are killing me. In the next few years, I will make extensive use of LLMs on my laptop. Some of this will involve rolling up my sleeves, customizing one of the Llama models, and then running it directly on my laptop. I really don’t want to spend my time and money training OpenAI’s models for them.

So I need to figure out where to spend my money. There are four configs that I could afford. (Let’s not call them practical):

  • M3 Max 14/30 cores - 36GB: $5.6K, 96GB: $6.6K
  • M3 Max 16/40 cores - 48GB: $6.2K
  • M3 Max 16/40 cores - 64GB: $6.5K
    (Pricing is Canadian.)

Am I better off with more GPU cores and more memory bandwidth? The 14/30 at 200GB/s vs the 16/40 at 300GB/s.
Is the difference between 48GB and 64GB going to be noticeable? Or just an expense?

Where else would you ask? In other words, where do LLM people hang out?

1 Like

I think the 16/40 has up to 400GB/s memory bandwidth, no? Perhaps I am mistaken.

I don’t think you’re asking in the wrong place, per se. I do think that most people can probably say how much RAM they’re using. A little swap is OK, unless you’re swapping in large datasets at a time, which I’d imagine customizing a Llama model would involve.

I use anywhere from 48-64GB of RAM or so right now on my 64GB M1 Max machine. The 64GB of RAM is therefore mostly a nicety that keeps me from having to update the machine soon. I also think 64 is a pretty reasonable stopping point for most people. I think 48 would be impractical for most developer use cases, and 96 is probably overkill for anybody who doesn’t self-classify as needing that much out of the gate. (In other words, if you have to ask, you don’t need 96GB.)

I couldn’t tell you where to go to ask a better group of people this question. In my opinion (which is just that), Macs excel at local AI work, and I’d get the 64GB Max. (Coincidentally, if I were to order that today for my own Docker-based dev and web design needs, I’d order that machine too.)

I also think the M3 Pro could work for most developers if it offered 64GB of RAM (which is probably why it doesn’t).

4 Likes

You can load a ~60GB llama2 model right now that would take a few minutes to generate on the 30-core GPU, so maybe 1-2 minutes less on the 40-core. We still haven’t found the limit of useful token size.
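A back-of-the-envelope way to see why the GPU tiers differ: token generation is roughly memory-bandwidth-bound, since producing each token streams the full weight set from memory. A minimal sketch, assuming the commonly cited 300GB/s (14/30) and 400GB/s (16/40) bandwidth figures and treating them as upper bounds:

```python
def decode_tokens_per_sec(bandwidth_gb_s, model_size_gb):
    # Decode is roughly memory-bandwidth-bound: generating each token
    # streams the full set of weights from memory once, so throughput
    # is about (bandwidth) / (GB of weights touched per token).
    return bandwidth_gb_s / model_size_gb

# A ~60GB model on the two M3 Max tiers (best-case estimates):
print(f"30-core: ~{decode_tokens_per_sec(300, 60):.1f} tok/s")
print(f"40-core: ~{decode_tokens_per_sec(400, 60):.1f} tok/s")
```

At a few tokens per second either way, a long answer takes minutes on both chips, which is why the bandwidth gap matters less than whether the model fits at all.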

So I think you could go to 96GB or 128GB, personally. It might be worth waiting for the M3 Ultra in the Studio form factor to save money if you think you’ll do the heavy work at your desk.

You also have questions like:

  • are you going to push to use newer models or bet on efficiencies coming to existing models?
  • is it important to be able to have multiple models loaded?
  • how much of your other stuff are you willing to close to work on these? Are you bullish on in-memory databases and other local-first tech that consumes more resources?
1 Like

@snelly - thanks, I knew one of them hits 400GB/s but couldn’t recall which.

@cornchip The Ultra is out because I really do want a laptop. 128GB - the price is insane. I might need to sell a child to afford that.

96GB would mean lower bandwidth and fewer GPU cores. It was this tradeoff I was trying to understand. If more memory means being able to use bigger models at the cost of slower run times, that might be OK.

What I can’t tell - do LLMs stress the GPUs enough to care about memory bandwidth? Or is it all about the amount of RAM they can use?

My challenge - I can see that using LLMs is going to become very useful in the next few years. But I don’t know enough to really understand what I’m jumping into yet.

Other RAM use? My 16GB 13" M1 MBP has only green memory pressure right now. Worst case, I sometimes hit yellow. If it weren’t for LLMs I would get 32/36GB and be fine.

The constraint order if you’re building a PC is GPU memory (VRAM), then system RAM, then memory bandwidth, then GPU performance. In practice, high VRAM also means high memory bandwidth.

For Apple Silicon that simplifies to enough memory to fit the model, then bandwidth/GPU cores which are correlated on the M3 Max.

You’d take a bigger performance hit from swapping virtual memory than you would from dropping to 300GB/s and 30 cores from 400GB/s and 40. So fitting the model you want to run is what matters most. Right now you could run the 70B llama model (what you want to be running) on 64GB. That makes the 64GB 16/40 the sweet spot of the three options you presented. That’s for today’s work, so not a future-proof recommendation. You could trade it in when an 800GB/s laptop becomes available (which will likely be when 128GB of RAM is cheaper, too).
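The "does it fit" question is mostly parameter count times bits per weight. A rough sizing sketch (the 8GB overhead figure is a hypothetical round number for KV cache plus OS headroom, not a measured one):

```python
def model_ram_gb(params_billion, bits_per_weight, overhead_gb=8.0):
    # Weights dominate: params * (bits/8) bytes. KV cache and OS headroom
    # are lumped into overhead_gb (a hypothetical round number here;
    # tune it for your context length and what else you keep open).
    return params_billion * bits_per_weight / 8 + overhead_gb

# A 70B-parameter model at common quantization levels:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_ram_gb(70, bits):.0f}GB")
```

16-bit (~148GB) won’t fit any laptop config, 8-bit (~78GB) needs the 96GB machine, and 4-bit quantization (~43GB) is what makes a 70B model workable on 64GB.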

I don’t mean to push towards overspending, but you seriously could use half a terabyte of RAM on a purchase meant to last a few years if you want to fully explore where all this is going. (I could not afford this, to be clear!)

3 Likes

The goal is to keep a machine for 4-5 yrs. This will be the most I’ve ever spent on a computer.

Thanks for confirming the 70B Llama model will run with 64GB. The impression I get here and elsewhere is that 96GB really isn’t future-proofing me much more.

1 Like

Thanks for helping me spend a shockingly large sum of money.

1 Like

Haha. I might have done it to my own budget in the process of answering your question. Still time to save myself!

Hope you’ll post about your experiments as you go. It’s neat that we have this new category of big programs to run.

Would be interested to know your thoughts about your purchase and use cases since it has been a couple of years. You might have commented in another thread.

Ok - I just did some playing. FWIW I’ve not used LMStudio with Claude Code, since Claude Code, especially with Opus, is leaps ahead when it comes to writing code.

Also, LMStudio doesn’t seem to have ready access to the internet, and some of what I do with Claude is to help make decisions that I ground with blog posts, etc. For example, upcoming vacations were planned in part with Claude.

That being said, LMStudio is still a good tool. Just to run an experiment, I took a Systems Thinking problem that I set up: Systems Thinking with GenAI: Solve Deep Team Problems, and ran it through LMStudio using two different models:

  • zai-org/glm-4.7-flash - in Thinking Mode. ~40 seconds, 50 tokens/second, 2124 tokens used. Its first question was on the money.
  • gpt-oss-20b - High Reasoning Mode - 65 tokens/second - otherwise difficult to compare since it asks more questions. This is the preferred route in Systems Thinking.
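As a sanity check on those numbers, token count divided by throughput should roughly reproduce the observed wall-clock time (a sketch that ignores prompt-processing/prefill time):

```python
def generation_time_s(tokens, tokens_per_sec):
    # Wall-clock decode time implied by a token count and throughput.
    # Ignores prompt-processing (prefill) time.
    return tokens / tokens_per_sec

# The glm run above: 2124 tokens at 50 tok/s.
print(f"~{generation_time_s(2124, 50):.0f}s")  # ~42s, consistent with the ~40s observed
```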

I have some hope for Small Language Models, models that are designed for a specific problem set and not general purpose. For example, coding TypeScript and tool usage, not general question answering. However, I have no idea when these will appear.

Simon Willison recently commented that his next computer would have 128GB of RAM, so I assume that is the target - in 3 yrs’ time for me.

1 Like

Thanks for the response. I am looking at 64GB, up from 32. I do wonder if in 3 years the models will become more efficient for local usage. In any case, I am speccing out my closer-to-5-year MacBook Pro. It will be way more than my present needs, but I want some measure of future-proofing (which is probably impossible).

1 Like

We’re all guessing. Do you write software and push the envelope? Your LLM might need more RAM.

Are you a normal power user? 64GB will be plenty.

Also, currently, to get the best use out of a local model you really need to be prepared to play with the tools.

Just a normal user. Still playing around productively with Claude.

1 Like