.. SPDX-License-Identifier: GPL-2.0

============
Introduction
============

The Linux compute accelerators subsystem is designed to expose compute
accelerators in a common way to user-space and provide a common set of
functionality.

These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
Although these devices are typically designed to accelerate
Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
is not limited to handling these types of accelerators.

Typically, a compute accelerator will belong to one of the following
categories:

- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
  or an IP inside a SoC (e.g. a laptop web camera). These devices are
  typically configured using registers and can work with or without DMA.

- Inference data-center - single- or multi-user devices in a large server.
  This type of device can be stand-alone or an IP inside a SoC or a GPU. It
  will have on-board DRAM (to hold the DL topology), DMA engines and command
  submission queues (either kernel or user-space queues). It might also have
  an MMU to manage multiple users and might also enable virtualization
  (SR-IOV) to support multiple VMs on the same device. In addition, these
  devices will usually have some tools, such as a profiler and a debugger.

- Training data-center - similar to inference data-center devices, but
  typically with more computational power and memory bandwidth (e.g. HBM),
  and likely with a method of scaling up/out, i.e. connecting to other
  training cards inside the server or in other servers, respectively.

All these devices typically have different runtime user-space software stacks
that are tailor-made to their h/w. In addition, they will probably also
include a compiler to generate programs for their custom-made computational
engines. Typically, the common layer in user-space will be the DL frameworks,
such as PyTorch and TensorFlow.

Sharing code with DRM
=====================

Because this type of device can be an IP inside a GPU or have characteristics
similar to those of GPUs, the accel subsystem will use the DRM subsystem's
code and functionality; i.e. the accel core code will be part of the DRM
subsystem and an accel device will be a new type of DRM device.

This will allow us to leverage the extensive DRM code-base and collaborate
with DRM developers who have experience with this type of device. In
addition, new features added for the accelerator drivers can be of use to
GPU drivers as well.

Differentiation from GPUs
=========================

Because we want to prevent the extensive user-space graphics software stack
from trying to use an accelerator as a GPU, the compute accelerators will be
differentiated from GPUs by using a new major number and new device char
files.

Furthermore, the drivers will be located in a separate place in the kernel
tree - drivers/accel/.

The accelerator devices will be exposed to user space with the dedicated
261 major number and will follow this convention:

- device char files - /dev/accel/accel\*
- sysfs             - /sys/class/accel/accel\*/
- debugfs           - /sys/kernel/debug/accel/\*/
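
As a brief user-space illustration of this convention, the following sketch
simply opens the first accelerator device node. This is a minimal example
that assumes at least one accel device is present; which ioctls the file
descriptor then accepts is entirely device-specific::

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            /* Open the first compute accelerator device node. */
            int fd = open("/dev/accel/accel0", O_RDWR);

            if (fd < 0) {
                    perror("open /dev/accel/accel0");
                    return 1;
            }

            /* Device-specific ioctls would be issued here. */

            close(fd);
            return 0;
    }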

Getting Started
===============

First, read the DRM documentation at Documentation/gpu/index.rst. Not only
will it explain how to write a new DRM driver, but it also contains all the
information on how to contribute, the Code of Conduct and the coding style
and documentation guidelines. All of that applies to the accel subsystem as
well.

Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.
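
CONFIG_DRM_ACCEL depends on DRM, so a minimal configuration fragment looks
like the sketch below; the CONFIG_DRM_ACCEL_MYDEV option shown for the
device's own driver is a hypothetical placeholder::

    CONFIG_DRM=y
    CONFIG_DRM_ACCEL=y
    # Hypothetical option for your own accel driver:
    CONFIG_DRM_ACCEL_MYDEV=m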

To expose your device as an accelerator, two changes need to be made in your
driver (as opposed to a standard DRM driver); both are illustrated by the
sketch after this list:

- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
  driver_features field. It is important to note that this driver feature is
  mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
  to expose both graphics and compute device char files should be handled by
  two drivers that are connected using the auxiliary bus framework.

- Change the open callback in your driver's fops structure to accel_open().
  Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to
  easily set up the correct file_operations structure.
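
Taken together, a minimal sketch of these two changes might look as follows;
the my_accel names are hypothetical placeholders and a real driver would fill
in its own name, description and any device-specific callbacks::

    #include <drm/drm_accel.h>
    #include <drm/drm_drv.h>

    /*
     * Expands to a file_operations structure whose open callback is
     * accel_open().
     */
    DEFINE_DRM_ACCEL_FOPS(my_accel_fops);

    static const struct drm_driver my_accel_driver = {
            /* Mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. */
            .driver_features = DRIVER_COMPUTE_ACCEL,
            .fops            = &my_accel_fops,
            .name            = "my_accel",
            .desc            = "Hypothetical compute accelerator driver",
    };

Device allocation and registration then proceed as for any other DRM driver
(e.g. devm_drm_dev_alloc() followed by drm_dev_register()).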

External References
===================

email threads
-------------

* `Initial discussion on the New subsystem for acceleration devices <https://lore.kernel.org/lkml/CAFCwf11=9qpNAepL7NL+YAV_QO=Wv6pnWPhKHKAepK3fNn+2Dg@mail.gmail.com/>`_ - Oded Gabbay (2022)
* `patch-set to add the new subsystem <https://lore.kernel.org/lkml/20221022214622.18042-1-ogabbay@kernel.org/>`_ - Oded Gabbay (2022)

Conference talks
----------------

* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)