As a developer I have seen a lot of misconceptions about what the move to Arm64 CPUs will mean for the Mac. So I feel maybe I should share some of my views/expectations before we come to WWDC this year (and maybe the start of our migration).
So here are a few areas I think need addressing:
Will moving to ARM make it easier for developers to adopt Catalyst?
No, Catalyst is completely tangential to the CPU instruction set.
Some people think that if macOS runs on ARM, developers will no longer need to port their iOS apps to x86 to use Catalyst. However, all iOS/iPadOS apps have been able to run on x86 since before the iPhone shipped. As part of our everyday development cycle for iOS/iPadOS apps we build x86 versions: that is just how the Xcode simulator works, it is a version of iOS that is built for and runs on x86. So moving Macs to ARM has no effect whatsoever on the difficulty of shipping an app with Catalyst.
Will it be difficult to port applications from x86-64 to Arm64?
In comparison to the PPC-to-Intel transition, the move from x86-64 to Arm64 will be a lot simpler, for a few reasons:
- We all use the same compiler/tooling:
In the PPC days there were many compilers and build chains in use.
When the Intel transition happened, only some of these were made compatible, so many developers needed to switch build chains. Changing compiler and build chain can be a complex task that is prone to producing lots of bugs.
Today, however, we all use LLVM, and on top of this we all tend to use a recent version of Xcode. So the effort to move to the new version of Xcode that will support ARM compilation for macOS will be minimal in comparison.
- There just are not that many legacy applications for macOS any more:
10.15 dropped support for 32-bit applications, so developers that were unable (for technical or market reasons) to recompile for 64-bit are no longer on the platform. This has cleared the slate in advance of the migration to ARM. In effect, all applications that run on macOS today are built with recent compilers and will (as long as the developers have source code access) recompile for ARM.
- Arm64 is much closer to x86-64 than PPC was:
Key here is endianness, which is how a CPU chooses to represent a number in memory. It's like reading a word from right to left or from left to right (or, for some CPUs, both…).
x86-64 and Arm64 have the same endianness (both are little-endian), so they represent numbers in memory in the same way.
PPC, however, has a different endianness, which means applications written for PPC commonly expected numbers to be written down the other way around in system memory. Compilers are good at abstracting this, but there is one little thing that can be a big pain: saving data to disk. Sometimes it is just simpler to take the raw data you have in memory and dump it to disk, so you can later read it back into memory and use it very quickly. But if you read data that was saved on a PPC into memory on an x86 CPU, all the values are wrong! You need to write a lot of code to convert all the data users have already saved just so you can read it on the new system, and then you need to test, test, test to find all the bugs!
Will we see a just-in-time emulation layer?
There has been a lot of assumption that any system letting us run existing x86 applications will be slow. I'm of two minds here: I think Apple would be better off not providing any compatibility layer, as that would push developers to migrate, but I also see that this could cause some backlash from users of small tools where the developers are no longer invested in the platform.
That said, if we do see a compatibility layer I do not expect it to be an outright copy of the Rosetta-style system. Since the PPC move, compiler tech has moved on a lot, but Apple have also put more restrictions on applications, which has led to a much more predictable execution pathway for most applications that users run.
These days, with the hardened runtime (required for notarisation, so required for 10.16), applications must declare in their entitlements that they require JIT evaluation rights. Most applications don't need this, so they don't declare it.
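The declaration is the hardened runtime's JIT entitlement; a minimal entitlements file for an app that genuinely needs writable-and-executable memory looks something like this (the `com.apple.security.cs.allow-jit` key is the real entitlement; the rest is just plist boilerplate):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Opt in to creating writable-and-executable memory for JIT compilation. -->
    <key>com.apple.security.cs.allow-jit</key>
    <true/>
</dict>
</plist>
```

So from the notarised binaries alone, Apple can already tell which apps do and do not self-modify at runtime.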
Applications that don't use JIT evaluation just run the code as shipped; they are unable to modify it at runtime (or otherwise). Such applications can be converted to ARM (or just to LLVM bitcode) statically rather than at runtime. This is called lifting,
and it is used by companies when they need to patch/port an old application where they have lost the source code (think banks/airlines etc.). The idea is that you read the binary on disk and convert it to LLVM bitcode; as LLVM bitcode you can run optimisations on it before compiling it back down to machine code. This is a process that takes time: a small app might take around ten minutes, a large application much longer. But once complete you end up with a fully native, optimised binary that can run just like any other binary.
Some applications do still need JIT-based evaluation. For these, Apple can still use the above-mentioned lifting for the majority of the code that is not JIT-based, but when the application makes a JIT jump into read-write memory it can switch to an on-the-fly Rosetta-style emulation. This means that for these apps there will be a performance impact for the JIT-based parts of the app.
Will we see an ARM Mac Pro soon?
I have seen a lot of suggestions that it will be a long, long time before Apple could produce an ARM CPU that would be useful in a Mac Pro. However, firstly, I think that for the transition to work Apple need to move fast, so that developers feel the fire under our bums.
But moving the Mac Pro to ARM also makes more sense than moving something like the iMac. There are a few features that server-grade ARM CPUs already have that are very compelling for the Mac Pro use-case:
- PCIe 5 (the current Mac Pro just has PCIe 3): it will be at least 8 years before Intel gets to PCIe 5, and PCIe 5 is 4 times faster per lane than PCIe 3.
- PCIe bandwidth: there are ARM server CPUs with over 128 PCIe gen 5 lanes. What this means for the Mac Pro is that every single PCIe slot can provide full speed at once.
- DDR5: this will more than double memory speed. Memory speed is one of the largest factors in compute performance for large workloads, and Intel will be at least 6 years away from this.
- Much better high-core-count scaling: one of the big issues Xeon CPUs have is that under heavy load you can max out the power/thermal budget while using just 12 cores, so even if the CPU has 48 cores, for heavy workloads those extra cores are not of much use at all. ARM CPUs currently deployed in data centres seem to scale a lot better on this metric.
Note that none of the above has anything to do with the instruction set; it is all about the effort that ARM (the company) has put into supporting new technologies, and how slow Intel is at adopting them.
And when it comes to whether you can build a large ARM CPU: there are already 64-core ARM CPUs in data centres today that out-perform their Xeon competitors.