I can try to summarize my understanding of why PPC -> Intel needed a much more complex translation layer than x86-64 -> Arm64:
Endianness
PPC is bi-endian (the hardware can run in either byte order, though Mac OS ran it big-endian), while x86 (32-bit and 64-bit) is little-endian only.
Endianness describes the order in which the bytes of a multi-byte number are stored in memory.
PPC -> Intel 32
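PPC systems can store the 16-bit number 3 with its two bytes in either order:
00000000 00000011 (big-endian, most significant byte first) or 00000011 00000000 (little-endian, least significant byte first)
(a 32-bit or 64-bit number just has more zero bytes; what changes is the order of the bytes, not the order of the bits inside each byte)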
PPC instruction sets (this depends a little on the cpu) let you work with both of these ways of writing a number.
Intel instructions, however, only work on little-endian data. You can convert a big-endian number to little-endian (x86 has a BSWAP instruction for this) and convert it back again, but every conversion costs an extra instruction.
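Here is the byte-order idea as a small Python sketch, using a 32-bit value. Python is only used for readability. `bytes_of` and `bswap32` are made-up helper names, although x86 really does have a `BSWAP` instruction that does the same job as `bswap32`:

```python
import struct

def bytes_of(value: int, byteorder: str) -> bytes:
    """Hypothetical helper: lay out a 32-bit value in memory in the given byte order."""
    return value.to_bytes(4, byteorder)

def bswap32(value: int) -> int:
    """Hypothetical helper: reverse the 4 bytes of a 32-bit value, like x86 BSWAP."""
    return int.from_bytes(value.to_bytes(4, "big"), "little")

big = bytes_of(3, "big")        # how a big-endian PPC program stores 3
little = bytes_of(3, "little")  # how x86 stores 3

# An x86 load of the PPC bytes reads them little-endian and gets the wrong number...
wrong = struct.unpack("<I", big)[0]
# ...so an emulator has to swap the bytes before it can use the value.
right = bswap32(wrong)

print(big.hex(), little.hex())  # 00000003 03000000
print(wrong, right)             # 50331648 3
```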
So you have a choice:
- simple emulator: if the PPC instruction expects big-endian data, you emit 3+ instructions: convert the inputs to little-endian, run the little-endian x86 instruction, then convert the outputs back to big-endian
- complex emulator that tracks the endianness of data: if a PPC instruction expects big-endian data, check whether its inputs have already been converted to little-endian; if not, convert them and update the tracking info. Then run the x86 instruction and mark the outputs as already little-endian, so later instructions that use them don't need to convert again…
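A rough Python sketch of the two options follows. The function names, the "add 1 in a loop" workload, and the bytearray memory model are all made up for illustration. Each function returns how many byte swaps it performed:

```python
MASK32 = 0xFFFFFFFF

def load_be(mem: bytearray, addr: int) -> int:
    """Read a big-endian 32-bit word (on an x86 host this costs a byte swap)."""
    return int.from_bytes(mem[addr:addr + 4], "big")

def store_be(mem: bytearray, addr: int, value: int) -> None:
    """Write a 32-bit word back in big-endian order (another byte swap)."""
    mem[addr:addr + 4] = (value & MASK32).to_bytes(4, "big")

def simple_emulator(mem: bytearray, addr: int, n: int) -> int:
    """Hypothetical: add 1 to the word at addr n times, converting in and out each time."""
    swaps = 0
    for _ in range(n):
        value = load_be(mem, addr)        # convert input to little-endian
        swaps += 1
        value = (value + 1) & MASK32      # the native little-endian add
        store_be(mem, addr, value)        # convert output back to big-endian
        swaps += 1
    return swaps

def tracking_emulator(mem: bytearray, addr: int, n: int) -> int:
    """Hypothetical: same work, but track which values are already little-endian."""
    swaps = 0
    native = {}  # addr -> value already converted to host byte order
    for _ in range(n):
        if addr not in native:            # convert only on first use
            native[addr] = load_be(mem, addr)
            swaps += 1
        native[addr] = (native[addr] + 1) & MASK32
    for a, value in native.items():       # flush back to big-endian memory once
        store_be(mem, a, value)
        swaps += 1
    return swaps

m1, m2 = bytearray(8), bytearray(8)
print(simple_emulator(m1, 0, 10), tracking_emulator(m2, 0, 10))  # 20 2
print(m1 == m2)  # True: both leave the same bytes in memory
```

The tracking version does 2 swaps instead of 20, but only because nothing else reads that memory in between. Working out when converted values must be flushed back is exactly the complex bookkeeping described above.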
All of this is complex logic, and either way it carries a large performance hit. It is compounded by the fact that these conversions may require copying values around, which uses up limited cpu register space.
x86-64 -> Arm64
Both systems are little-endian, so numbers are stored exactly the same way. There is no need to add conversions or to track anything.
Number of cpu registers
Cpu registers are small storage locations inside the cpu core where you can keep the data you are working on. Reading and writing them is very fast compared to going all the way out to system RAM.
Every cpu architecture has a different number of cpu registers. When an application is compiled, the compiler looks at the code and tries to make sure that numbers used only locally, within a small portion of the code, never go to system RAM. Instead they stay in a register and get used again a few instructions later. If you had to go to system memory every time you used a number, the system would become very slow, because it would spend most of its time waiting for memory to respond.
These are normally broken down into:
- general-purpose registers (I call them pointer registers below) and
- floating point (FP) registers.
Floating point registers hold the floating point numbers you are working on. For example, if you are summing up an array of floats, your running sum will be kept in one of these FP registers.
Pointer registers hold pointers to other cpu instructions (that you can jump to) and/or pointers to data in main system memory, as well as plain integers such as loop counters.
PPC -> Intel 32
PPC has 32 pointer registers and 32 Floating Point registers
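Intel has 8 pointer registers (and a couple of those, like the stack pointer ESP, are not free for general use) and 8 (ish) Floating Point registers (the 8-entry x87 register stack, or 8 XMM registers with SSE)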
This is a big issue for an emulation layer. When the compiler produced the PPC instructions, it assumed it had 32 places to keep the numbers it was working on without copying them in and out of memory, so it did not try to minimize that memory traffic. But when you run this on an Intel 32-bit cpu you only have 8 such slots. You run out very soon and have to start moving values in and out of system memory… this is slow! (like very, very slow). You then need to remember which values you moved to system memory so you can move them back later (or leave them there, since many Intel 32 instructions can work on memory directly)…
The pointer registers are also worth a closer look. They hold addresses in system memory that you jump to and run code from, or read data from. A PPC program will have used all 32 of them to hold such addresses, but Intel 32 has only 8, which fill up very quickly. So you end up keeping fake registers in system memory… lots more round trips…
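Here is a deliberately naive Python model of that register pressure. The mapping scheme and function names are hypothetical. The model is also generous: it pretends all 8 Intel registers are free and ignores the scratch registers a real translator needs. The first guest registers get real host registers, and the rest become "fake" registers in memory that cost a load or store on every use:

```python
def map_registers(guest_regs: int, host_regs: int) -> dict:
    """Hypothetical scheme: guest reg -> ("host", n) or ("mem", slot in a context array)."""
    return {r: ("host", r) if r < host_regs else ("mem", r - host_regs)
            for r in range(guest_regs)}

def extra_memory_ops(instrs, mapping) -> int:
    """Count loads/stores added when running (dest, src1, src2) instructions,
    e.g. PPC `add rD, rA, rB`, under the given mapping."""
    extra = 0
    for dest, src1, src2 in instrs:
        extra += sum(1 for r in (src1, src2) if mapping[r][0] == "mem")  # loads
        extra += 1 if mapping[dest][0] == "mem" else 0                   # store
    return extra

ppc_on_x86 = map_registers(guest_regs=32, host_regs=8)
code = [(20, 21, 22), (3, 4, 5), (30, 31, 20)]
print(extra_memory_ops(code, ppc_on_x86))                # 6
print(extra_memory_ops(code, map_registers(32, 32)))     # 0 with enough host registers
```

The PPC compiler had no reason to prefer r3-r5 over r20-r31, so in the emulated version any instruction that happens to use the high registers pays for memory round trips.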
x86-64 -> Arm64
x86-64 has 16 pointer registers and 16 or 32 Floating Point registers
Arm64 has 31 pointer registers and 32 Floating Point registers
So you don't need any of this extra copying and tracking, which all takes a long, long time. You can map each x86-64 register directly to its own Arm64 register and just use it as is.
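A Python sketch of what such a direct mapping could look like. The `REG_MAP` assignment is hypothetical and is not Rosetta 2's actual register assignment. The sketch also ignores the condition flags that x86 `add` updates, which a real translator must handle (for example with the flag-setting Arm64 `adds` form):

```python
# Hypothetical 1:1 assignment of the 16 x86-64 general-purpose registers
# to Arm64 registers X0-X15 (illustration only, not Rosetta 2's real mapping).
X86_64_GPRS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
               "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
ARM64_GPRS = [f"x{n}" for n in range(31)]  # X0-X30 (SP is a separate register)

REG_MAP = dict(zip(X86_64_GPRS, ARM64_GPRS))
# Left over for the translator itself; in practice a few of these
# (X18 on Apple platforms, X29/X30 frame/link registers) are reserved.
SPARE = ARM64_GPRS[len(X86_64_GPRS):]

def translate_add(dst: str, src: str) -> str:
    """x86-64 `add dst, src` becomes one register-only Arm64 add (flags ignored)."""
    d, s = REG_MAP[dst], REG_MAP[src]
    return f"add {d}, {d}, {s}"

print(translate_add("rax", "rbx"))  # add x0, x0, x3
print(len(SPARE))                   # 15
```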
64bit to 32bit
This describes the size of the numbers and pointers the cpu can handle in a single instruction, e.g. to:
- add two numbers together, or
- point to a location in system memory.
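A small Python sketch of what the register size means in practice. Python integers never overflow, so the made-up `add32`/`add64` helpers mask the result to mimic a fixed-width register:

```python
def add32(a: int, b: int) -> int:
    """Hypothetical helper: add like a 32-bit register (wraps around)."""
    return (a + b) & 0xFFFFFFFF

def add64(a: int, b: int) -> int:
    """Hypothetical helper: add like a 64-bit register (wraps around)."""
    return (a + b) & 0xFFFFFFFFFFFFFFFF

biggest32 = 0xFFFFFFFF          # 4294967295, the largest 32-bit unsigned value
print(add32(biggest32, 1))      # 0: the 32-bit result wrapped around
print(add64(biggest32, 1))      # 4294967296: fits easily in 64 bits
print(2**32 // 2**30)           # 4: a 32-bit pointer reaches 4 GiB of addresses
```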
PPC -> Intel 32
During the PPC to Intel transition, PPC (the G5) already had 64-bit support. Not all applications used it, but those that did created a new class of problems for the emulation layer, because the first Intel cpus Apple shipped (the Core Duo) only supported 16-bit and 32-bit operation, with no 64-bit support at all.
This leads to some big issues:
For math, what you can do is down-cast to 32-bit and accept that the results will be wrong whenever a value doesn't fit in 32 bits. This causes problems if you are reading numbers from disk/network that were saved as 64-bit: you can't just read them as 32-bit, you need to convert them! That is hard to detect in an emulator, so most of the time, like with endianness, you have to convert from 64-bit to 32-bit, do the work, then convert back to 64-bit…
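There is also a lossless alternative to down-casting, sketched in Python below. You keep each 64-bit value as two 32-bit halves and add them with a carry, which is what 32-bit x86 code does with the `ADD`/`ADC` instruction pair. The results are correct, but every 64-bit value now needs twice the registers and instructions on a cpu that is already short of registers. The function names are made up for illustration:

```python
MASK32 = 0xFFFFFFFF

def split64(x: int) -> tuple[int, int]:
    """Split a 64-bit value into (high, low) 32-bit halves, i.e. two x86 registers."""
    return (x >> 32) & MASK32, x & MASK32

def add64_on_32bit(a: int, b: int) -> int:
    """Hypothetical: emulate a 64-bit add using only 32-bit operations."""
    a_hi, a_lo = split64(a)
    b_hi, b_lo = split64(b)
    lo = a_lo + b_lo
    carry = lo >> 32                      # ADD sets the carry flag on overflow
    lo &= MASK32
    hi = (a_hi + b_hi + carry) & MASK32   # ADC adds that carry into the high half
    return (hi << 32) | lo

def add_truncated(a: int, b: int) -> int:
    """The down-cast approach: keep only the low 32 bits and lose the rest."""
    return (a + b) & MASK32

print(add64_on_32bit(0xFFFFFFFF, 1))  # 4294967296: correct
print(add_truncated(0xFFFFFFFF, 1))   # 0: the high bits were lost
```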
Pointers are even more of an issue, however… with only a 32-bit pointer you can directly address at most 4GB of data, while an application written for 64-bit can address much more than that. Luckily, at the time more than 4GB of system memory was still uncommon, so most of these 64-bit addresses were for data on disk, and you could emulate that away at an OS level.
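A quick Python illustration of why you can't just squeeze a 64-bit pointer into a 32-bit register: two addresses exactly 4GB apart collapse into the same 32-bit address. `truncate_pointer` is a made-up helper name:

```python
FOUR_GIB = 2**32  # 4,294,967,296 bytes: the whole 32-bit address space

def truncate_pointer(addr64: int) -> int:
    """Hypothetical helper: keep only the low 32 bits of a 64-bit address."""
    return addr64 & 0xFFFFFFFF

a = 0x0000_0000_0000_1000  # an address below 4 GiB
b = 0x0000_0001_0000_1000  # the address exactly 4 GiB higher
print(truncate_pointer(a) == truncate_pointer(b))  # True: two different locations collide
```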
x86-64 -> Arm64
Both are 64-bit architectures, so you don't have any of these issues.
Going from a CISC to a RISC-like instruction set is not as hard as going from RISC to CISC.
PPC -> Intel 32
When you're going from a RISC to a CISC instruction set (PPC -> Intel 32-bit was a particularly bad case), getting decent performance means looking ahead and combining multiple RISC instructions into one CISC instruction. This lookahead is hard to do well, as it might even require you to re-shape memory… again.
This is also why Intel 32 gets away with so few registers: unlike PPC, many Intel 32 instructions operate directly on system memory, reading data from it and writing the result back to it. The idea of CISC is that you send these instructions to the cpu, and the cpu internally keeps track of things and may replace them with more RISC-like operations that don't go all the way to memory. But you (the compiler) don't control this.
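A hypothetical Python sketch of that lookahead-and-combine step, written as a tiny peephole pass. It spots the PPC pattern `lwz` (load) / `add` / `stw` (store back) and turns it into one x86-style memory add. It uses the guest register names for readability and passes other instructions through unchanged. It only fuses when it is safe: same memory slot, and the loaded register is not needed afterwards. It also ignores byte order. With big-endian guest memory, this exact fusion would not even be valid without a swap, which is part of why the translation was so hard:

```python
def fuse(instrs, live_after):
    """Hypothetical peephole pass: fuse lwz/add/stw into one x86-style memory add.

    instrs:     tuples like ("lwz", rt, disp, base), ("add", rd, ra, rb),
                ("stw", rs, disp, base)
    live_after: registers whose values are still read after this code
    """
    out, i = [], 0
    while i < len(instrs):
        window = instrs[i:i + 3]  # the lookahead: peek at the next 3 instructions
        if [ins[0] for ins in window] == ["lwz", "add", "stw"]:
            (_, rt, disp, base), (_, rd, ra, rb), (_, rs, disp2, base2) = window
            same_slot = (disp, base) == (disp2, base2)
            if (rt == rd == ra == rs and rb != rt and base != rt
                    and same_slot and rt not in live_after):
                out.append(("add", f"dword [{base}+{disp}]", rb))  # one CISC op
                i += 3
                continue
        out.append(instrs[i])  # no match: a real translator would emit x86 code here
        i += 1
    return out

prog = [("lwz", "r3", 8, "r4"), ("add", "r3", "r3", "r5"), ("stw", "r3", 8, "r4")]
print(fuse(prog, live_after=set()))   # [('add', 'dword [r4+8]', 'r5')]
print(fuse(prog, live_after={"r3"}))  # unchanged, because r3 is still needed later
```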
x86-64 -> Arm64
When you are going from CISC to RISC, you can take each CISC instruction on its own and break it down into a known sequence of one or more RISC instructions, so you don't need any lookahead.
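A hypothetical Python sketch of that one-at-a-time breakdown. An x86-style `add` with a memory operand becomes RISC-style load / add / store steps. The instruction format and the `tmp0`/`tmp1` scratch register names are made up for illustration. Each instruction is handled without looking at its neighbours:

```python
def decompose(instr):
    """Hypothetical: split one x86-style ("add", dst, src) into RISC-style steps.

    dst/src are register names like "eax" or memory operands like "[rbx+8]".
    """
    op, dst, src = instr
    steps = []
    if src.startswith("["):                   # memory source: load it first
        steps.append(("ldr", "tmp1", src))
        src = "tmp1"
    if dst.startswith("["):                   # memory destination: load, op, store
        steps.append(("ldr", "tmp0", dst))
        steps.append((op, "tmp0", "tmp0", src))
        steps.append(("str", "tmp0", dst))
    else:                                     # register destination: a single op
        steps.append((op, dst, dst, src))
    return steps

program = [("add", "[rbx+8]", "eax"), ("add", "eax", "ecx")]
# No lookahead needed: translate each instruction independently and concatenate.
translated = [step for ins in program for step in decompose(ins)]
print(translated)
```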
Note that modern x86-64 code is a lot more RISC-like than you might expect. x86-64 doubled the number of general-purpose registers and made SSE2 the standard way to do floating point instead of the old x87 stack, so compilers can keep most working values in cpu registers rather than referencing memory all the time. This is why dropping 32-bit support makes emulation a lot simpler: it drops an entire class of nasty instructions that require round trips to system memory (which also use a lot more power).
So these are the reasons it is simpler to go from x86-64 to Arm64. That is not to say it is easy, or that there are no difficulties and issues left.
This got quite long, but I hope it helps people understand some of the complexities of the previous transitions. While the above applies to an emulation solution, for many developers back then it also applied when they tried to recompile their applications: they just did not work without modification. Recompiling from x86-64 to Arm64 will be massively simpler.