M3 Max Memory and Bandwidth

mlevison · November 3, 2023, 5:56pm

Snowflake problem. I suspect MPU is the wrong place to ask, but I don’t know where to go yet. I want to buy an M3 MBP and have it last 4-5 years. (That’s the only way I can justify the jaw-dropping cost.)

I need a 4TB SSD; that’s easy. It’s the memory and bandwidth that are killing me. In the next few years, I will make extensive use of LLMs on my laptop. Some of this will involve rolling up my sleeves, customizing one of the Llama models, and then running it directly on my laptop. I really don’t want to spend my time and money training OpenAI’s models for them.

So I need to figure out where to spend my money. There are four configs that I could afford. (Let’s not call them practical):

M3 Max 14/30 Cores - 36GB - $5.6K
M3 Max 14/30 Cores - 96GB - $6.6K
M3 Max 16/40 Cores - 48GB - $6.2K
M3 Max 16/40 Cores - 64GB - $6.5K
(Pricing is Canadian.)

Am I better off with more GPU cores and more memory bandwidth? The 14/30 has 200GB/s vs. 300GB/s for the 16/40.
Is the difference between 48GB and 64GB going to be noticeable? Or just an expense?

Where else would you ask? In other words, where do LLM people hang out?

snelly · November 3, 2023, 6:17pm

I think the 16/40 has up to 400GB/s of memory bandwidth, no? Perhaps I am mistaken.

I don’t think you’re asking in the wrong place, per se. I do think that most people can probably say how much RAM they’re using. A little swap is OK, unless you’re swapping in large datasets at a time, which I’d imagine customizing a Llama model would involve.

I use anywhere from 48-64GB of RAM or so right now on my 64GB M1 Max machine. The 64GB of RAM is therefore mostly a nicety that keeps me from having to update the machine soon. I also think 64 is a pretty reasonable stopping point for most people. I think 48 would be impractical for most developer use cases, and 96 is probably overkill for anybody who doesn’t self-classify as needing that much out of the gate. (In other words, if you have to ask, you don’t need 96GB.)

I couldn’t tell you where to go to ask a better group of people this question. In my opinion (which is just that), Macs excel at local AI work, and I’d get the 64GB Max. (Coincidentally, if I were to order a machine today for my own Docker-based dev and web design needs, I’d order that one too.)

I also think the M3 Pro could work for most developers if it offered 64GB of RAM (which is probably why it doesn’t).

cornchip · November 3, 2023, 7:47pm

You can load a ~60GB Llama 2 model right now that would take a few minutes to generate on the 30-core GPU, so maybe 1-2 minutes less on the 40-core. We still haven’t found the limit of useful token size.

So I think you could go to 96GB or 128GB, personally. It might be worth waiting for the M3 Ultra in the Studio form factor to save money if you think you’ll do the heavy work at your desk.

You also have questions like:

Are you going to push to use newer models or bet on efficiencies coming to existing models?
Is it important to be able to have multiple models loaded?
How much of your other stuff are you willing to close to work on these? Are you bullish on in-memory databases and other local-first tech that consumes more resources?

mlevison · November 3, 2023, 8:18pm

@snelly - thanks, I knew one of them hits 400GB/s; I couldn’t recall which.

@cornchip The Ultra is out because I really do want a laptop. 128GB - the price is insane. I might need to sell a child to afford that.

96GB would mean lower bandwidth and fewer GPU cores. It was this tradeoff I was trying to understand. If more memory means being able to use bigger models with a slower run time, that might be OK.

What I can’t tell: do LLMs stress the GPU enough to care about memory bandwidth? Or is it all about the amount of RAM it can use?

My challenge: I can see that using LLMs is going to become very useful in the next few years. But I don’t know enough to really understand what I’m jumping into yet.

Other RAM use? My 16GB 13" M1 MBP has only green memory pressure right now. Worst case, I sometimes hit yellow. If it weren’t for LLMs I would get 32/36GB and be fine.

cornchip · November 3, 2023, 9:43pm

The constraint order if you’re building a PC is GPU memory (VRAM), then your RAM sticks, then memory bandwidth, then GPU performance. In practice, high VRAM also means high memory bandwidth.

For Apple Silicon that simplifies to enough memory to fit the model, then bandwidth and GPU cores, which are correlated on the M3 Max.
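To put rough numbers on the bandwidth side, here is a back-of-the-envelope sketch in Python. It assumes text generation is limited by memory bandwidth, meaning every weight is read from memory once per generated token, so bandwidth divided by model size gives a ceiling on tokens per second. That is a common rule of thumb rather than a benchmark, and real speeds will be lower. The 35GB model size (70B parameters at 4 bits each) and the function name are assumptions made for illustration; the 300GB/s and 400GB/s figures are the ones quoted in this thread.

```python
# Rough ceiling on single-stream generation speed, ASSUMING decoding is
# limited by memory bandwidth (every weight is read once per token).
# This is a rule of thumb, not a measurement.

def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on tokens/s if each token requires reading the whole model once."""
    return bandwidth_gb_s / model_size_gb

# Assumption for illustration: 70B parameters stored at 4 bits each = 35GB.
MODEL_SIZE_GB = 35.0

# Bandwidth figures as quoted in this thread for the two M3 Max variants.
for label, bandwidth in [("14/30 cores, 300GB/s", 300.0),
                         ("16/40 cores, 400GB/s", 400.0)]:
    ceiling = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{label}: at most {ceiling:.1f} tokens/s")
```

Under this assumption the gap between the two chips is the same ratio as their bandwidth, about a third, which is small next to what happens once the model no longer fits in memory.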

You’d take a bigger performance hit from swapping virtual memory than you would from dropping to 300GB/s and 30 cores from 400GB/s and 40. So fitting the model you want to run is important. Right now you could run the 70B Llama model (what you want to be running) on 64GB. That makes the 64GB 16/40 the sweet spot among the options you presented. That’s for today’s work, so it’s not a future-proof recommendation. You could trade this in when an 800GB/s laptop becomes available (which will be when 128GB of RAM is cheaper, too).
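A quick way to sanity-check the "does it fit" question is to estimate the size of the weights from the parameter count and the bits stored per weight. The sketch below does only that arithmetic. The 16GB reserve for macOS, other apps, and the model's working memory is a guess, not a measured figure, and the helper names are made up for this example.

```python
# Estimate whether a model's weights fit in unified memory.
# ASSUMPTIONS: weights dominate memory use, and reserved_gb (a guess)
# covers the OS, other apps, and the model's working memory.

def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Size of the weights alone in GB (1GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8

def fits(params_billions: float, bits_per_weight: float, ram_gb: float,
         reserved_gb: float = 16.0) -> bool:
    """True if the weights fit after setting aside reserved_gb for everything else."""
    return weights_gb(params_billions, bits_per_weight) <= ram_gb - reserved_gb

# Check a 70B model at a few precisions against the RAM sizes discussed here.
for ram_gb in (36, 48, 64, 96, 128):
    for bits in (4, 8, 16):
        verdict = "fits" if fits(70, bits, ram_gb) else "does not fit"
        print(f"{ram_gb}GB RAM, 70B at {bits}-bit "
              f"({weights_gb(70, bits):.0f}GB of weights): {verdict}")
```

With these assumptions, a 70B model at 4 bits per weight (35GB) fits in 64GB but not in 48GB, and an 8-bit version (70GB) needs the 96GB configuration. Change reserved_gb to match your own memory pressure before trusting the result.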

I don’t mean to push towards overspending, but you seriously could use half a terabyte of RAM on a purchase meant to last a few years if you want to fully explore where all this is going. (I could not afford this, to be clear!)

mlevison · November 3, 2023, 10:16pm

The goal is to keep a machine for 4-5 years. This will be the most I’ve ever spent on a computer.

Thanks for confirming the 70B Llama model will run with 64GB. The impression I get here and elsewhere is that 96GB really isn’t future-proofing me much more.

mlevison · November 8, 2023, 10:46pm

Thanks for helping me spend a shockingly large sum of money.

cornchip · November 8, 2023, 10:47pm

Haha. I might have done it to my own budget in the process of answering your question. Still time to save myself!

Hope you’ll post about your experiments as you go. It’s neat that we have this new category of big programs to run.