=============
 Ring Buffer
=============

To handle communication between user space and kernel space, AMD GPUs use a
ring buffer design to feed the engines (GFX, Compute, SDMA, UVD, VCE, VCN, VPE,
etc.). See the figure below that illustrates how this communication works:

.. kernel-figure:: ring_buffers.svg

Ring buffers in amdgpu follow a producer-consumer model: user space acts as
the producer, constantly filling the ring buffer with GPU commands to be
executed, while the GPU retrieves packets from the ring, parses them, and
distributes the specific sets of instructions among the different amdgpu
blocks.

Notice from the diagram that the ring has a Read Pointer (rptr), which
indicates where the engine is currently reading packets from the ring, and a
Write Pointer (wptr), which indicates how many packets software has added to
the ring. When the rptr and wptr are equal, the ring is idle. When software
adds packets to the ring, it updates the wptr, which causes the engine to
start fetching and processing packets. As the engine processes packets, the
rptr gets updated until it catches up with the wptr and the two are equal
again.

Usually, ring buffers in the driver have a limited size (search for occurrences
of ``amdgpu_ring_init()``). One of the reasons for the small ring buffer size
is that the CP (Command Processor) is capable of following addresses inserted
into the ring; this is illustrated in the image by the reference to the IB
(Indirect Buffer). The IB gives user space an area in memory that the CP can
read to feed the hardware with extra instructions.

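The IB mechanism can be pictured the same way: the ring entry stays small and
only carries a reference to a larger command area that the consumer follows.
Again, this is an illustrative sketch, not amdgpu code; the ``toy_`` names are
invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Toy indirect-buffer packet: instead of copying every command into the
 * (small) ring, the ring carries the address and size of a larger command
 * area that the consumer dereferences.
 */
struct toy_ib_packet {
	const uint32_t *cmds; /* address of the indirect buffer */
	size_t ndw;           /* number of dwords to fetch from it */
};

/* Consumer ("CP") side: follow the address stored in the ring entry. */
uint64_t toy_cp_execute(const struct toy_ib_packet *pkt)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < pkt->ndw; i++)
		sum += pkt->cmds[i]; /* stand-in for executing each command */
	return sum;
}

int main(void)
{
	/* The large command stream lives outside the ring... */
	static const uint32_t ib[] = { 1, 2, 3, 4, 5 };
	/* ...while the ring entry is only a small reference to it. */
	struct toy_ib_packet pkt = { .cmds = ib, .ndw = 5 };

	assert(toy_cp_execute(&pkt) == 15);
	printf("executed %zu dwords\n", pkt.ndw);
	return 0;
}
```

In the hardware, the equivalent ring packet carries the IB's GPU address,
which the CP fetches on its own while the ring slot itself stays tiny.
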
All ASICs pre-GFX11 use what is called a kernel queue, which means the ring is
allocated in kernel space and has some restrictions, such as not being able to
be :ref:`preempted directly by the scheduler<amdgpu-mes>`. GFX11 and newer
still support kernel queues, but also provide a new mechanism named
:ref:`user queues<amdgpu-userq>`, where the queue is moved to user space and
can be mapped and unmapped via the scheduler. In practice, both types of queue
insert user-space-generated GPU commands from different jobs into the
requested component ring.

Enforce Isolation
=================

.. note:: After reading this section, you might want to check the
   :ref:`Process Isolation<amdgpu-process-isolation>` page for more details.

Before examining the Enforce Isolation mechanism in the ring buffer context, it
is helpful to briefly discuss how instructions from the ring buffer are
processed in the graphics pipeline. Let's expand on this topic by checking the
diagram below that illustrates the graphics pipeline:

.. kernel-figure:: gfx_pipeline_seq.svg

In terms of executing instructions, the GFX pipeline follows the sequence:
Shader Export (SX), Geometry Engine (GE), Shader Processor Input (SPI), Scan
Converter (SC), Primitive Assembler (PA), and cache manipulation (which may
vary across ASICs). Another common way to describe the pipeline is to use
Pixel Shader (PS), raster, and Vertex Shader (VS) to symbolize the two shader
stages. Now, with this pipeline in mind, assume that Job B causes a hang, but
Job C's instructions might already be executing, leading developers to
incorrectly identify Job C as the problematic one. This problem can be
mitigated on multiple levels; the diagram below illustrates how to minimize
part of it:

.. kernel-figure:: no_enforce_isolation.svg

Note from the diagram that there is no guarantee of order or a clear
separation between instructions, which is not a problem most of the time and
is also good for performance. Furthermore, notice the circles between jobs in
the diagram, which represent a **fence wait** used to avoid overlapping work
in the ring. When the fence signals, a cache flush occurs, ensuring that the
next job starts in a clean state and, if issues arise, the developer can
pinpoint the problematic process more precisely.

To increase the level of isolation between jobs, there is the "Enforce
Isolation" method described in the picture below:

.. kernel-figure:: enforce_isolation.svg

As shown in the diagram, enforcing isolation introduces ordering between
submissions, since access to GFX/Compute is serialized; think of it as a
single-process-at-a-time mode for GFX/Compute. Notice that this approach has a
significant performance impact, as it allows only one job to submit commands
at a time. However, it can help pinpoint the job that caused the problem.
Although enforcing isolation improves the situation, it does not fully resolve
the issue of precisely pinpointing bad jobs, since the isolation itself might
mask the problem. In summary, identifying which job caused the issue may not
be precise, but enforcing isolation can help with the debugging.

Ring Operations
===============

.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
   :internal: