It is relatively easy to tune the current Linux kernel via command line arguments to handle hot-added PCI devices if the expected topology is known and limited: simply reserve enough address space and bus numbers per hotplug slot, so new devices can fit in there. However, if the system demands online, substantial and frequent reshaping of a current topology, PCIe's constrained nature can apply severe restrictions. In this presentation, we'll describe how Linux can reallocate resources by pausing the affected drivers and instructing them to update the changes, even in challenging situations like hot-adding an array full of diverse devices in the middle of an existing PCIe tree. Proposed improvements allow switching device clusters via PCIe without any special prerequisites: no need to reboot the host OS, while a Hot-Plug Controller is no longer required to reserve bus numbers and memory regions by firmware/BIOS/bootloader. These device clusters can consist of bridges full of other bridges, NVME drives, GPUs, and many other devices.
Continue the conversation in Slack