|  | .. SPDX-License-Identifier: GPL-2.0+ | 
|  |  | 
|  | ====================================================== | 
|  | IBM Virtual Management Channel Kernel Driver (IBMVMC) | 
|  | ====================================================== | 
|  |  | 
|  | :Authors: | 
|  | Dave Engebretsen <engebret@us.ibm.com>, | 
|  | Adam Reznechek <adreznec@linux.vnet.ibm.com>, | 
|  | Steven Royer <seroyer@linux.vnet.ibm.com>, | 
|  | Bryant G. Ly <bryantly@linux.vnet.ibm.com>, | 
|  |  | 
|  | Introduction | 
|  | ============ | 
|  |  | 
|  | Note: Knowledge of virtualization technology is required to understand | 
|  | this document. | 
|  |  | 
|  | A good reference document would be: | 
|  |  | 
|  | https://openpowerfoundation.org/wp-content/uploads/2016/05/LoPAPR_DRAFT_v11_24March2016_cmt1.pdf | 
|  |  | 
|  | The Virtual Management Channel (VMC) is a logical device which provides an | 
|  | interface between the hypervisor and a management partition. This interface | 
|  | is like a message passing interface. This management partition is intended | 
|  | to provide an alternative to systems that use a Hardware Management | 
|  | Console (HMC) - based system management. | 
|  |  | 
|  | The primary hardware management solution that is developed by IBM relies | 
|  | on an appliance server named the Hardware Management Console (HMC), | 
|  | packaged as an external tower or rack-mounted personal computer. In a | 
|  | Power Systems environment, a single HMC can manage multiple POWER | 
|  | processor-based systems. | 
|  |  | 
|  | Management Application | 
|  | ---------------------- | 
|  |  | 
|  | In the management partition, a management application exists which enables | 
|  | a system administrator to configure the system’s partitioning | 
|  | characteristics via a command line interface (CLI) or Representational | 
|  | State Transfer Application (REST API's). | 
|  |  | 
|  | The management application runs on a Linux logical partition on a | 
|  | POWER8 or newer processor-based server that is virtualized by PowerVM. | 
|  | System configuration, maintenance, and control functions which | 
|  | traditionally require an HMC can be implemented in the management | 
|  | application using a combination of HMC to hypervisor interfaces and | 
|  | existing operating system methods. This tool provides a subset of the | 
|  | functions implemented by the HMC and enables basic partition configuration. | 
|  | The set of HMC to hypervisor messages supported by the management | 
|  | application component are passed to the hypervisor over a VMC interface, | 
|  | which is defined below. | 
|  |  | 
|  | The VMC enables the management partition to provide basic partitioning | 
|  | functions: | 
|  |  | 
|  | - Logical Partitioning Configuration | 
|  | - Start, and stop actions for individual partitions | 
|  | - Display of partition status | 
|  | - Management of virtual Ethernet | 
|  | - Management of virtual Storage | 
|  | - Basic system management | 
|  |  | 
|  | Virtual Management Channel (VMC) | 
|  | -------------------------------- | 
|  |  | 
|  | A logical device, called the Virtual Management Channel (VMC), is defined | 
|  | for communicating between the management application and the hypervisor. It | 
|  | basically creates the pipes that enable virtualization management | 
|  | software. This device is presented to a designated management partition as | 
|  | a virtual device. | 
|  |  | 
|  | This communication device uses Command/Response Queue (CRQ) and the | 
|  | Remote Direct Memory Access (RDMA) interfaces. A three-way handshake is | 
|  | defined that must take place to establish that both the hypervisor and | 
|  | management partition sides of the channel are running prior to | 
|  | sending/receiving any of the protocol messages. | 
|  |  | 
|  | This driver also utilizes Transport Event CRQs. CRQ messages are sent | 
|  | when the hypervisor detects one of the peer partitions has abnormally | 
|  | terminated, or one side has called H_FREE_CRQ to close their CRQ. | 
|  | Two new classes of CRQ messages are introduced for the VMC device. VMC | 
|  | Administrative messages are used for each partition using the VMC to | 
|  | communicate capabilities to their partner. HMC Interface messages are used | 
|  | for the actual flow of HMC messages between the management partition and | 
|  | the hypervisor. As most HMC messages far exceed the size of a CRQ buffer, | 
|  | a virtual DMA (RMDA) of the HMC message data is done prior to each HMC | 
|  | Interface CRQ message. Only the management partition drives RDMA | 
|  | operations; hypervisors never directly cause the movement of message data. | 
|  |  | 
|  |  | 
|  | Terminology | 
|  | ----------- | 
|  | RDMA | 
|  | Remote Direct Memory Access is DMA transfer from the server to its | 
|  | client or from the server to its partner partition. DMA refers | 
|  | to both physical I/O to and from memory operations and to memory | 
|  | to memory move operations. | 
|  | CRQ | 
|  | Command/Response Queue a facility which is used to communicate | 
|  | between partner partitions. Transport events which are signaled | 
|  | from the hypervisor to partition are also reported in this queue. | 
|  |  | 
|  | Example Management Partition VMC Driver Interface | 
|  | ================================================= | 
|  |  | 
|  | This section provides an example for the management application | 
|  | implementation where a device driver is used to interface to the VMC | 
|  | device. This driver consists of a new device, for example /dev/ibmvmc, | 
|  | which provides interfaces to open, close, read, write, and perform | 
|  | ioctl’s against the VMC device. | 
|  |  | 
|  | VMC Interface Initialization | 
|  | ---------------------------- | 
|  |  | 
|  | The device driver is responsible for initializing the VMC when the driver | 
|  | is loaded. It first creates and initializes the CRQ. Next, an exchange of | 
|  | VMC capabilities is performed to indicate the code version and number of | 
|  | resources available in both the management partition and the hypervisor. | 
|  | Finally, the hypervisor requests that the management partition create an | 
|  | initial pool of VMC buffers, one buffer for each possible HMC connection, | 
|  | which will be used for management application  session initialization. | 
|  | Prior to completion of this initialization sequence, the device returns | 
|  | EBUSY to open() calls. EIO is returned for all open() failures. | 
|  |  | 
|  | :: | 
|  |  | 
|  | Management Partition		Hypervisor | 
|  | CRQ INIT | 
|  | ----------------------------------------> | 
|  | CRQ INIT COMPLETE | 
|  | <---------------------------------------- | 
|  | CAPABILITIES | 
|  | ----------------------------------------> | 
|  | CAPABILITIES RESPONSE | 
|  | <---------------------------------------- | 
|  | ADD BUFFER (HMC IDX=0,1,..)         _ | 
|  | <----------------------------------------  | | 
|  | ADD BUFFER RESPONSE              | - Perform # HMCs Iterations | 
|  | ----------------------------------------> - | 
|  |  | 
|  | VMC Interface Open | 
|  | ------------------ | 
|  |  | 
|  | After the basic VMC channel has been initialized, an HMC session level | 
|  | connection can be established. The application layer performs an open() to | 
|  | the VMC device and executes an ioctl() against it, indicating the HMC ID | 
|  | (32 bytes of data) for this session. If the VMC device is in an invalid | 
|  | state, EIO will be returned for the ioctl(). The device driver creates a | 
|  | new HMC session value (ranging from 1 to 255) and HMC index value (starting | 
|  | at index 0 and ranging to 254) for this HMC ID. The driver then does an | 
|  | RDMA of the HMC ID to the hypervisor, and then sends an Interface Open | 
|  | message to the hypervisor to establish the session over the VMC. After the | 
|  | hypervisor receives this information, it sends Add Buffer messages to the | 
|  | management partition to seed an initial pool of buffers for the new HMC | 
|  | connection. Finally, the hypervisor sends an Interface Open Response | 
|  | message, to indicate that it is ready for normal runtime messaging. The | 
|  | following illustrates this VMC flow: | 
|  |  | 
|  | :: | 
|  |  | 
|  | Management Partition             Hypervisor | 
|  | RDMA HMC ID | 
|  | ----------------------------------------> | 
|  | Interface Open | 
|  | ----------------------------------------> | 
|  | Add Buffer                  _ | 
|  | <----------------------------------------  | | 
|  | Add Buffer Response              | - Perform N Iterations | 
|  | ----------------------------------------> - | 
|  | Interface Open Response | 
|  | <---------------------------------------- | 
|  |  | 
|  | VMC Interface Runtime | 
|  | --------------------- | 
|  |  | 
|  | During normal runtime, the management application and the hypervisor | 
|  | exchange HMC messages via the Signal VMC message and RDMA operations. When | 
|  | sending data to the hypervisor, the management application performs a | 
|  | write() to the VMC device, and the driver RDMA’s the data to the hypervisor | 
|  | and then sends a Signal Message. If a write() is attempted before VMC | 
|  | device buffers have been made available by the hypervisor, or no buffers | 
|  | are currently available, EBUSY is returned in response to the write(). A | 
|  | write() will return EIO for all other errors, such as an invalid device | 
|  | state. When the hypervisor sends a message to the management, the data is | 
|  | put into a VMC buffer and an Signal Message is sent to the VMC driver in | 
|  | the management partition. The driver RDMA’s the buffer into the partition | 
|  | and passes the data up to the appropriate management application via a | 
|  | read() to the VMC device. The read() request blocks if there is no buffer | 
|  | available to read. The management application may use select() to wait for | 
|  | the VMC device to become ready with data to read. | 
|  |  | 
|  | :: | 
|  |  | 
|  | Management Partition             Hypervisor | 
|  | MSG RDMA | 
|  | ----------------------------------------> | 
|  | SIGNAL MSG | 
|  | ----------------------------------------> | 
|  | SIGNAL MSG | 
|  | <---------------------------------------- | 
|  | MSG RDMA | 
|  | <---------------------------------------- | 
|  |  | 
|  | VMC Interface Close | 
|  | ------------------- | 
|  |  | 
|  | HMC session level connections are closed by the management partition when | 
|  | the application layer performs a close() against the device. This action | 
|  | results in an Interface Close message flowing to the hypervisor, which | 
|  | causes the session to be terminated. The device driver must free any | 
|  | storage allocated for buffers for this HMC connection. | 
|  |  | 
|  | :: | 
|  |  | 
|  | Management Partition             Hypervisor | 
|  | INTERFACE CLOSE | 
|  | ----------------------------------------> | 
|  | INTERFACE CLOSE RESPONSE | 
|  | <---------------------------------------- | 
|  |  | 
|  | Additional Information | 
|  | ====================== | 
|  |  | 
|  | For more information on the documentation for CRQ Messages, VMC Messages, | 
|  | HMC interface Buffers, and signal messages please refer to the Linux on | 
|  | Power Architecture Platform Reference. Section F. |