The first thing to keep in mind is that a GPU renders images, or frames, made up of a huge number of pixels tens of times per second, to the point that tens or even hundreds of millions of pixels are drawn on the screen in front of our eyes every second.
But before those pixels reach the screen, they need to be drawn in an image buffer, a part of the VRAM that stores both the image the GPU has already generated and the one it is currently generating. This image is composed when the GPU, at the end of the 3D pipeline, writes the pixels to the image buffer.
Because that huge number of pixels has to be drawn quickly enough, the GPU memory needs to have high bandwidth.
Pixel Shaders, ROPs and VRAM Bandwidth
The penultimate stage of the 3D pipeline is the pixel shader stage, which always exports the results of its programs to the ROPs, the only units in the graphics pipeline with the ability to write to memory. They write either to the last-level cache, in the case of newer graphics architectures, or to the VRAM itself, which is the general case.
GPU manufacturers always specify the number of ROPs their GPUs have. For example, the NVIDIA RTX 3090 has a total of 112 ROPs, which means that it can write up to 112 pixels per clock cycle to the image buffer. The GPU runs at 1395 MHz at its base speed, so at that speed it can write up to 156,240,000,000 pixels per second to the image buffer. If each of them contains 32 bits of information on average, at 8 bits per component, a bandwidth of about 625 GB/s is required.
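The arithmetic behind that figure can be reproduced in a few lines. This is only a sketch of the peak-rate calculation from the paragraph above; the function name and parameters are illustrative, and decimal gigabytes (10^9 bytes) are assumed.

```python
def rop_write_bandwidth(rops, clock_hz, bytes_per_pixel):
    """Return (pixels per second, bytes per second) at peak ROP throughput."""
    # Each ROP can write one pixel per clock cycle.
    pixels_per_second = rops * clock_hz
    return pixels_per_second, pixels_per_second * bytes_per_pixel


# Figures quoted in the text: 112 ROPs, 1395 MHz base clock, 32 bits per pixel.
pixels, bandwidth = rop_write_bandwidth(
    rops=112, clock_hz=1_395_000_000, bytes_per_pixel=4
)

print(f"{pixels:,} pixels/s")          # 156,240,000,000 pixels/s
print(f"{bandwidth / 1e9:.2f} GB/s")   # 624.96 GB/s
```

This is a theoretical peak: it assumes every ROP writes a full 32-bit pixel on every cycle, which real workloads rarely sustain.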
The enormous bandwidth that GPUs need has forced the adoption of image buffer compression mechanisms such as Delta Color Compression, which compress the information and reduce the impact on bandwidth.
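To get an intuition for why storing differences can save bandwidth, here is a toy delta encoder for one row of 8-bit values. It is not the algorithm used by any vendor's Delta Color Compression, whose details are not described here; the block size, bit layout, and function names are all assumptions made for illustration.

```python
def delta_encode(block):
    """Encode 8-bit values as (first value, deltas to previous, bits per delta)."""
    anchor = block[0]
    deltas = [b - a for a, b in zip(block, block[1:])]
    largest = max((abs(d) for d in deltas), default=0)
    # A signed delta needs its magnitude bits plus one sign bit.
    bits_per_delta = largest.bit_length() + 1 if largest else 0
    return anchor, deltas, bits_per_delta


def delta_decode(anchor, deltas):
    """Rebuild the original values by accumulating the deltas."""
    values = [anchor]
    for d in deltas:
        values.append(values[-1] + d)
    return values


def compressed_bits(block):
    """Size in bits of the toy encoding: 8-bit anchor plus fixed-width deltas."""
    _, deltas, bits = delta_encode(block)
    return 8 + bits * len(deltas)


smooth = [100, 101, 102, 103, 104, 105, 106, 107]  # gentle gradient
noisy = [0, 255, 0, 255, 0, 255, 0, 255]           # worst case for deltas

print(compressed_bits(smooth), "bits vs", 8 * len(smooth))  # 22 bits vs 64
print(compressed_bits(noisy), "bits vs", 8 * len(noisy))    # 71 bits vs 64
```

Smooth regions shrink considerably, while noisy ones come out larger than the raw data, so a scheme like this would need to fall back to uncompressed storage for blocks that do not compress.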
Tile rendering, bandwidth and VRAM
GPUs that render by tiles divide the image buffer into small image buffers that fit inside the GPU’s internal memory. As a result, they do not require high-bandwidth VRAM such as GDDR6, which is not viable in handheld devices.
This is why PostPC devices that use this type of GPU, such as tablets and smartphones, do not have bandwidth problems when rendering. However, tile rendering complicates the hardware and sacrifices raw power, because the rendering pipeline becomes somewhat more complicated.
If you combined a tile renderer with high-bandwidth VRAM, either HBM2 or GDDR6, neither could be exploited to the full, since the benefits of one would overshadow those of the other.
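The idea of splitting the image buffer into tiles can be sketched as follows. The 32×32 tile size, the 1920×1080 resolution, and the 4 bytes per pixel are assumptions chosen for illustration; real tile sizes vary between GPU designs.

```python
def split_into_tiles(width, height, tile_size):
    """Yield (x, y, w, h) rectangles that cover a width x height image buffer."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            # Tiles on the right and bottom edges are clipped to the image.
            yield x, y, min(tile_size, width - x), min(tile_size, height - y)


WIDTH, HEIGHT = 1920, 1080   # assumed resolution
TILE_SIZE = 32               # assumed tile edge, in pixels
BYTES_PER_PIXEL = 4          # 32-bit color

tiles = list(split_into_tiles(WIDTH, HEIGHT, TILE_SIZE))
tile_bytes = TILE_SIZE * TILE_SIZE * BYTES_PER_PIXEL
frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL

print(len(tiles), "tiles")                   # 2040 tiles
print(tile_bytes, "bytes per full tile")     # 4096 bytes per full tile
print(frame_bytes, "bytes for whole frame")  # 8294400 bytes for whole frame
```

Under these assumptions, a single tile needs only 4 KB of on-chip memory, compared with roughly 8.3 MB for the whole frame, which is why the intermediate pixel writes can stay inside the GPU instead of going out to VRAM.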
The bandwidth required by VRAM limits PC SoCs
PC SoCs, where the GPU is built on the same chip as the CPU, usually have low-performance GPUs. What’s stopping Intel or AMD from launching SoCs with more powerful GPUs inside them? The reason is that the CPU and GPU share the same type of memory, usually DDR or LPDDR.
Because this memory offers limited bandwidth compared to memory specialized for graphics, the power of integrated GPUs cannot be scaled further. This is why video game consoles, despite using SoCs, use memory such as GDDR5 and GDDR6: without it, it would not be possible to integrate the powerful GPUs they contain.
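The gap between the two kinds of memory can be estimated from bus width and transfer rate. The two configurations below are illustrative assumptions rather than figures from this article: dual-channel DDR4-3200 (128 bits in total) and a 256-bit GDDR6 interface at 14 GT/s per pin.

```python
def peak_bandwidth(bus_width_bits, transfers_per_second):
    """Peak memory bandwidth in bytes per second."""
    # Each transfer moves bus_width_bits across the interface.
    return bus_width_bits * transfers_per_second // 8


# Assumed example configurations (not taken from the article).
ddr4_dual_channel = peak_bandwidth(128, 3_200_000_000)
gddr6_256_bit = peak_bandwidth(256, 14_000_000_000)

print(f"DDR4-3200 dual channel: {ddr4_dual_channel / 1e9:.1f} GB/s")  # 51.2 GB/s
print(f"GDDR6 256-bit, 14 GT/s: {gddr6_256_bit / 1e9:.1f} GB/s")      # 448.0 GB/s
print(f"Ratio: {gddr6_256_bit / ddr4_dual_channel:.2f}x")             # 8.75x
```

In the DDR case, that bandwidth is also shared with the CPU, so the integrated GPU sees even less of it.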
The other side of bandwidth in VRAM, texturing
The other function where a GPU needs high bandwidth is texturing, since the textures are stored in the VRAM. The same pixels that are later drawn in the image buffer have to be textured first, and it is usual for several requests to be made to the VRAM for each pixel on the screen. That is why bandwidth is also related to texturing.
Nor can we forget that the texture units are part of the real cores of the GPU, the shader units, so bandwidth is directly related to the GPU's configuration. That configuration obviously scales with the number of pixels to be drawn, which is why AMD and NVIDIA sell their cards in different ranges according to the resolution at which they are meant to render, since not all monitors on the market are the same.
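A rough estimate shows how the requests per pixel multiply the traffic. Every number here is an assumption for illustration: 1920×1080 at 60 frames per second, two textures per pixel with bilinear filtering (4 texels each, so 8 fetches), uncompressed 4-byte texels, and no texture cache or overdraw.

```python
def texture_read_bandwidth(width, height, fps, fetches_per_pixel, bytes_per_texel):
    """Bytes per second read from memory if every texel fetch reaches VRAM."""
    pixels_per_second = width * height * fps
    return pixels_per_second * fetches_per_pixel * bytes_per_texel


# Assumed workload (not taken from the article).
bandwidth = texture_read_bandwidth(
    width=1920, height=1080, fps=60, fetches_per_pixel=8, bytes_per_texel=4
)

print(f"{bandwidth / 1e9:.2f} GB/s")  # 3.98 GB/s
```

In practice, texture caches and compressed texture formats reduce this figure, while overdraw and higher resolutions increase it, so it should be read only as an indication of how the cost scales with the pixel count.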
This is why we do not see low-end graphics cards with the same number of units as high-end graphics cards; it is not just a matter of manufacturing costs.