Sounds like a multi-user situation where all the cores you can muster will get put to use. And if these data scientists are doing so-called Big Data, even more so.
What will you actually run on it? How many users will access it? This will give you a better idea of the processor and memory you need.
A Mac Studio may be overkill: an M1 Mac Mini with 8GB of memory can comfortably transcode and serve six 4K streams in Plex, which isn't even compiled for Apple Silicon.
We have around 500 students on the degree, and the machine will be available to them in a lab to use when they want. We'll be doing machine learning primarily using the Metal API to run TensorFlow models on the GPU. Classes have around 30–40 students, so that will normally be the peak number of concurrent users.
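For anyone curious what "TensorFlow via Metal" looks like in practice: on Apple Silicon the GPU is exposed to TensorFlow through the tensorflow-metal plugin rather than CUDA. A minimal sketch, assuming `tensorflow` and `tensorflow-metal` have been pip-installed on an Apple Silicon Mac (the matmul size is just illustrative):

```python
# Sketch: confirming TensorFlow can see the Metal GPU and placing work on it.
# Assumes `pip install tensorflow tensorflow-metal` on an Apple Silicon Mac;
# on other machines the GPU list will simply be empty and this falls back to CPU.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print(gpus)  # expect one PluggableDevice entry per Metal GPU

# A small op to confirm work actually runs on the chosen device.
with tf.device("/GPU:0" if gpus else "/CPU:0"):
    x = tf.random.normal((1024, 1024))
    y = tf.matmul(x, x)
print(y.shape)
```

With 30–40 concurrent users each grabbing GPU memory like this, the GPU and unified-memory ceiling matters a lot more than single-user benchmarks suggest.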
I have an M1 Mini at home, and it is not powerful enough for ML. The GPU is not sufficient, and that's with only me using it.
Students will be practicing big data algorithms in Spark, but we won't have the storage for genuine big data analysis. They'll use the CPU cores to simulate a cluster, and we want to run some intensive algorithms.
They will also be rendering real-time 3D visualisations, practicing data streaming with Kafka, and developing computer vision and natural language processing applications.
That's quite a lot, so you need a 6-bay with an expansion module or an 8-bay with larger drives.
Pick one that supports SSD caching and has enough memory to run VMs. Start by looking at something like the Synology DS1621xs+ and scale up or down as needed. It can hold a lot of storage and runs VMs and Docker containers well.
I have a used 2019 Mac Pro coming this week myself: a more modest 8 core, 48GB RAM, single W5700. It’ll be seeing me through the next several months running a set of gnarly local dev environments that use virtualization unsupported by Apple Silicon. It turns out that this configuration is the cheapest way to run these environments constantly without losing time to thermal issues and paging. It should resell well enough once we’re moved onto more portable containers.
I just did a machine learning analysis today on 4TB of data using TensorFlow and Metal, and I managed to max out both GPUs. They crunch data at a rate I've never seen before; the performance is ridiculous.