The 5 nm node will mark an important generational change for AMD: it will host not only the deployment of its Zen 4 architecture, but also the launch of the third generation of GPUs based on the RDNA 3 architecture. For now we know little, but enough to get a rough idea of what AMD's bet to stand up to NVIDIA's Lovelace will look like.
The RDNA 3 architecture will not reach the market until 2022 at the earliest, and we may even have to wait until 2023, but we think it is worth giving you a review of what we know we can expect from the next generation of AMD Radeon.
Internal changes to the RDNA 3 architecture
The changes that the Radeon Technology Group is going to make to the internal architecture of the third generation of its RDNA GPUs will be much deeper than the jump from RDNA to RDNA 2. That is why we think it is important to gather, through official and unofficial channels, the available information about the GPUs that will compete against NVIDIA's Lovelace.
Most of the changes you will see below follow naturally from the evolution of the technology; others are meant to cut NVIDIA's advantage in certain areas, and still others we know about from patents.
RDNA 3 Ray Acceleration Units Enhancements
One of the problems with the Ray Tracing implementation in RDNA 2 is that traversing the BVH tree requires the same SIMD units on which the shaders run, so the compute power of the Compute Units has to be shared between shader execution and the traversal of the data structure.
This does not happen with NVIDIA's RT Cores, but the root of the difference is that the minimum DirectX Ray Tracing specification does not require fixed-function hardware for the traversal, which is why AMD did not include it in the Ray Acceleration Units of RDNA 2.
It is therefore a change that AMD will have to incorporate sooner or later; although its approach today differs from NVIDIA's, the growing number of games designed to take advantage of NVIDIA's approach will force AMD to adopt it.
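To see why traversal is costly when done on the SIMD units, here is a minimal sketch of an iterative BVH walk. The node layout and names are hypothetical, not AMD's actual structures; the point is that every loop iteration is ordinary ALU work that, without fixed-function traversal hardware, competes with shader execution.

```python
def slab_test(ray_o, ray_inv_d, box_min, box_max):
    """Ray/AABB intersection via the slab method; True if the ray hits the box."""
    tmin, tmax = 0.0, float("inf")
    for axis in range(3):
        t1 = (box_min[axis] - ray_o[axis]) * ray_inv_d[axis]
        t2 = (box_max[axis] - ray_o[axis]) * ray_inv_d[axis]
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax

def traverse(nodes, ray_o, ray_d):
    """Walk the BVH with an explicit stack, collecting candidate primitives.
    Each iteration is generic ALU work when no fixed-function unit exists."""
    inv_d = tuple(1.0 / d if d != 0 else float("inf") for d in ray_d)
    hits, stack = [], [0]                       # start at the root node
    while stack:
        node = nodes[stack.pop()]
        if not slab_test(ray_o, inv_d, node["min"], node["max"]):
            continue                            # ray misses this box: prune subtree
        if node["leaf"]:
            hits.append(node["prim"])           # candidate for the hit shader
        else:
            stack.extend(node["children"])
    return hits
```

A dedicated traversal unit, like NVIDIA's RT Core, runs this loop in fixed-function hardware, leaving the SIMD units free for shading.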
Tensor Cores in RDNA 3 for DirectML
One of the things that AMD added in the CDNA architecture is an equivalent of NVIDIA's Tensor Cores, that is, ALUs arranged as systolic arrays to accelerate the execution of certain algorithms based on artificial intelligence.
This has been a disadvantage for AMD compared to NVIDIA, since these units make possible algorithms such as DLSS, which render internally at a lower resolution and thereby raise the frame rate in games. That is one of the two main things we buy a GPU for, alongside graphics quality.
AMD's equivalent of the Tensor Cores is called the Matrix Core; it works in the same way as NVIDIA's Tensor Core, and its first iteration delivers a floating-point rate 3.5 times higher in FP32 and 7 times higher in FP16. We do not know whether RDNA 3 will bring an improved version of this unit.
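The operation these units accelerate is a fused matrix multiply-accumulate, D = A × B + C, typically with low-precision inputs and higher-precision accumulation. This pure-Python stand-in only illustrates the math; real hardware performs it on a whole tile in a single instruction.

```python
def matrix_mac(a, b, c):
    """D = A @ B + C for square tiles: the multiply-accumulate pattern
    that Tensor Cores / Matrix Cores execute as one hardware operation."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j]
         for j in range(n)]
        for i in range(n)
    ]
```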
Changes to the Compute Units Scheduler
When executing shaders, the Compute Units use the round-robin method: each instruction is given an execution slot, and if it cannot be resolved in that slot because its data is missing, the scheduler moves on to the next one. All of this keeps the Compute Unit from stalling, but that waiting time also translates into lost clock cycles.
That is why AMD will make a change to the GPU scheduler, which will be in charge of reordering the Compute Unit's instruction list in order to further reduce idle time and increase the performance of the Compute Units. We could say this would be the first implementation of an out-of-order execution system on a GPU.
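A toy simulation (not RDNA's real scheduler; instruction latencies are made up) shows the cycle savings. Each instruction is represented only by the cycle at which its operands become ready; the in-order queue must wait, while a reordering scheduler issues whatever is ready.

```python
def cycles_in_order(instrs):
    """Issue strictly in program order: each entry is the cycle at which
    that instruction's operands arrive; we stall until then, then issue."""
    cycle = 0
    for ready_at in instrs:
        cycle = max(cycle, ready_at) + 1    # stall, then spend 1 cycle issuing
    return cycle

def cycles_reordered(instrs):
    """Out-of-order style: each cycle, issue any one instruction whose
    operands are already available, skipping the stalled ones."""
    pending, cycle = list(instrs), 0
    while pending:
        ready = [r for r in pending if r <= cycle]
        if ready:
            pending.remove(ready[0])        # issue one ready instruction
        cycle += 1
    return cycle
```

With one slow load at the front of the queue (`[5, 0, 0, 0]`), the in-order model wastes the stall cycles while the reordering model fills them with independent work.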
Double ALUs per compute unit in RDNA 3?
One change that NVIDIA made in Turing is concurrent execution, which lets the integer and floating-point units work at the same time. In the RTX 3000 series they added 16 FP32 ALUs that operate in a switched fashion with the 16 integer ALUs, so that each sub-core can now execute up to 32 simultaneous operations, and up to 128 per SM.
NVIDIA's change to the cores of its RTX 3000 doubles the floating-point power without having to double the rest of the elements in the SM, which is a huge advantage for NVIDIA in general-purpose computing algorithms, from which ray tracing benefits.
Since more and more games will make use of ray tracing, it is clear that AMD will have to cut this NVIDIA advantage somehow in order to close the enormous TFLOPS gap with Jensen's company, but for the moment we do not know how they will do it.
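A rough throughput sketch makes the benefit concrete. The model is illustrative, not NVIDIA's documented behavior: in the Turing-style case one datapath does FP32 and the other only INT32; in the Ampere-style case the second datapath can switch between INT32 and FP32, so it helps with the FP backlog once integer work runs out.

```python
def cycles_turing(fp_ops, int_ops):
    """Dedicated FP32 pipe plus dedicated INT32 pipe, one op each per cycle."""
    return max(fp_ops, int_ops)

def cycles_ampere(fp_ops, int_ops):
    """Pipe A is FP32-only; pipe B runs INT32 or FP32 each cycle.
    Pipe B serves the integer ops first, then doubles FP throughput."""
    cycle = 0
    while fp_ops > 0 or int_ops > 0:
        fp_ops -= 1                  # pipe A always issues an FP op
        if int_ops > 0:
            int_ops -= 1             # pipe B handles integer work
        else:
            fp_ops -= 1              # pipe B switches to FP32
        cycle += 1
    return cycle
```

For a typical game-like mix that is mostly floating point (say 100 FP ops to 20 INT ops), the switched second datapath finishes the stream in far fewer cycles without duplicating the rest of the SM.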
GPU by chiplets in the high-end
We know that AMD will deploy a dual-GPU chiplet design at the high end of RDNA 3, but this does not mean we will see chiplets in the mid and low ranges, where monolithic chips are expected to continue.
In any case, if the rumors about the enormous number of SMs in the top version of NVIDIA Lovelace are true, the conclusion we reach is that AMD wants to repeat the exercise it carried out with its Ryzen CPUs starting with Zen 2, since beyond a certain die size the number of good chips per wafer drops.
We must also take into account that AMD has incorporated the Infinity Cache into its GPUs since RDNA 2, which takes up a considerable amount of die area. In a chiplet-based design this cache would become an L3 cache and is very important for the intercommunication between chiplets.
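A back-of-the-envelope calculation shows the yield argument. Both the defect density and the die sizes below are made-up illustrative values, and the simple Poisson yield model ignores edge losses, scribe lines, and harvesting of partially defective dies.

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    """Crude gross-die count: wafer area divided by die area."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return int(wafer_area / die_area_mm2)

def yield_rate(die_area_mm2, defects_per_mm2=0.001):
    """Poisson model: probability that a die contains zero defects."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

def good_dies(die_area_mm2):
    """Usable dies per wafer under the assumed defect density."""
    return int(dies_per_wafer(die_area_mm2) * yield_rate(die_area_mm2))

# One 600 mm^2 monolithic GPU vs the same silicon as two 300 mm^2 chiplets:
mono = good_dies(600)
chiplet_pairs = good_dies(300) // 2
```

Because yield falls exponentially with die area, the two-chiplet split produces more usable GPU pairs per wafer than the single large die, which is the same economics that drove Zen 2.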
GDDR6X memory as VRAM?
Although GDDR6 is a memory with excellent performance, its high clock speed gives its transfers a very high energy cost, hence the development of the GDDR6X that NVIDIA's RTX 3000 graphics cards are already using.
We really do not know whether AMD will use this type of memory, but the power budget is shared between the VRAM and the GPU itself in a zero-sum game, so it is very likely that, in order to assign more power to the GPU, AMD will opt for GDDR6X; we may even see a generation of RDNA 2-based GPUs with this VRAM.
In any case, we do not know how GDDR6 will evolve, since in 2022 new memory manufacturing nodes will become available that allow higher speeds and lower consumption, but let's not forget that at equal bandwidth GDDR6X consumes much less, which is why we think AMD will adopt it too.