
I got into a discussion in the comments of https://security.stackexchange.com/questions/109199/is-physical-security-less-important-now-for-securing-a-server?noredirect=1#comment194327_109199

The question is simple: does anyone have experience of successfully hotplugging a PCIe card? Does it require special motherboards and cards, or is it supposed to work on all consumer hardware?

pjc50
  • The answer should be two-fold. Both hardware and software (its drivers) should support hot-plugging. – jippie Dec 31 '15 at 14:26
  • I don't know if this helps, but I just successfully removed the second passed-through GPU from a KVM Windows machine without affecting the first GPU (the screen just flickered for a second). – feedc0de Jan 20 '18 at 19:57

4 Answers


I used to design PCI-Express hardware that required full hot-plug support in hardware and software, and it certainly is possible, but it's quite involved and requires extensive software support -- the hardware is actually quite simple. I had to design the hardware, then implement BIOS (UEFI) and kernel (Linux) support for hot-plugging arbitrary PCIe devices over fiber and copper.

From a software point of view, one must remember that PCIe continues with the PCI software model, including the concept of bus/device/function addressing. When the PCI bus is enumerated, it's done as a breadth-first search: [Figure: PCI bus topology, from tldp.org]

PCIe enumeration is generally done twice. First, your BIOS (UEFI or otherwise) will do it, to figure out who's present and how much memory they need. This data can then be passed on to the host OS, which can take it as-is, but Linux and Windows often perform their own enumeration procedure as well. On Linux, this is done through the core PCI subsystem, which walks the bus, applies any quirks based on the ID of the device, and then loads whichever driver has registered a matching ID, calling its probe function. A PCI device is identified through a combination of its Vendor ID (16 bits, e.g. Intel is 0x8086) and Device ID (another 16 bits) -- the most common internet source is the PCI ID Repository.
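
To make the ID matching concrete, here is a minimal sketch of a Linux PCI driver skeleton. The device ID 0x1234 and all "demo" names are placeholders for illustration, not a real part:

    #include <linux/module.h>
    #include <linux/pci.h>

    /* Hypothetical ID table: 0x8086 is Intel's vendor ID, but device
     * ID 0x1234 is a made-up placeholder. */
    static const struct pci_device_id demo_ids[] = {
        { PCI_DEVICE(0x8086, 0x1234) },
        { 0, }
    };
    MODULE_DEVICE_TABLE(pci, demo_ids);

    /* The PCI core calls probe() when an enumerated device matches
     * an entry in the driver's ID table. */
    static int demo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
        int ret = pci_enable_device(pdev);

        if (ret)
            return ret;
        dev_info(&pdev->dev, "matched %04x:%04x\n", id->vendor, id->device);
        return 0;
    }

    /* Called on hot-removal (or driver unload). */
    static void demo_remove(struct pci_dev *pdev)
    {
        pci_disable_device(pdev);
    }

    static struct pci_driver demo_driver = {
        .name     = "demo_pci",
        .id_table = demo_ids,
        .probe    = demo_probe,
        .remove   = demo_remove,
    };
    module_pci_driver(demo_driver);

    MODULE_LICENSE("GPL");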

The custom software part comes in during this enumeration process: you must reserve PCI bus numbers and memory segments ahead of time for potential future devices -- this is sometimes called 'bus padding'. This avoids the need to re-enumerate the bus later, which often cannot be done without disrupting the system. A PCI device has BARs (base address registers) that tell the host how much memory, and of what type (memory or I/O space), the device needs -- this is why you don't need jumpers like ISA anymore :) Likewise, the Linux kernel implements PCIe hotplug through the pciehp driver. Windows does different things based on the version -- older versions (I think XP) ignore anything the BIOS says and do their own probing. Newer versions, I believe, are more respectful of the ACPI DSDT provided by the host firmware (BIOS/EFI) and will incorporate that information.
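
As a sketch of what that BAR sizing handshake looks like, here is the classic write-all-ones probe that enumeration software performs. Illustration only -- on Linux the PCI core already does this for you during enumeration (see __pci_read_base in drivers/pci/probe.c); demo_bar0_size is a hypothetical helper and assumes a 32-bit memory BAR:

    #include <linux/pci.h>

    /* Save the BAR, write all 1s, read back to see which address bits
     * the device actually decodes, then restore the original value.
     * The size falls out of the complement of the writable bits. */
    static u32 demo_bar0_size(struct pci_dev *pdev)
    {
        u32 orig, probe;

        pci_read_config_dword(pdev, PCI_BASE_ADDRESS_0, &orig);
        pci_write_config_dword(pdev, PCI_BASE_ADDRESS_0, 0xffffffff);
        pci_read_config_dword(pdev, PCI_BASE_ADDRESS_0, &probe);
        pci_write_config_dword(pdev, PCI_BASE_ADDRESS_0, orig);

        /* Mask off the low flag bits (memory vs. I/O, type, prefetchable). */
        return ~(probe & (u32)PCI_BASE_ADDRESS_MEM_MASK) + 1;
    }

Relatedly, the padding described above can be nudged on Linux with the pci=hpbussize=/hpmemsize= boot parameters, which reserve extra bus numbers and memory window space behind hotplug bridges.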

This may seem pretty involved, and it is! But remember that any laptop / device with an ExpressCard slot (one that implements PCIe, as you can have USB-only ExpressCards) must do this, though generally the padding is pretty simple -- just one bus. My old hardware was a PCIe switch with another 8 devices behind it, so padding got somewhat more complicated.

From a hardware point of view, it's a lot easier. The GND pins of the card make contact first, and we'd place a hot-swap controller IC from LTC or similar on the card to sequence power once the connection is made. At this point, the on-board ASIC or FPGA begins its power-up sequence and starts attempting to link-train its PCI Express link. Assuming the host supports hot-plugging, and the PCI Express SLTCAP/SLTCTL registers (in the spec: PCI Express Slot Capability Register and Slot Control Register; there is a 2 of each as well -- enough bits to split across two registers) for that port were configured to indicate the port is hot-plug capable, the software can begin to enumerate the new device. The slot status register (SLTSTA, PCI Express Slot Status Register) contains bits that the target device can set indicating power faults, mechanical release latch state, and of course presence detect + presence changed.

The aforementioned registers are located in 'PCI (Express) Configuration Space', a small region of configuration address space (4 KB per function for PCIe) allocated to each potential BDF (bus:device:function). The actual registers physically reside on the device that owns that BDF -- for the slot registers above, that is the downstream port (root port or switch) above the slot.
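
As a sketch of how software consults these registers, the Linux kernel provides accessors for the PCIe capability; something along these lines is roughly what pciehp does when it examines a downstream port (illustration only, error handling omitted; demo_check_slot is a hypothetical helper):

    #include <linux/pci.h>

    /* Check whether a downstream port advertises hot-plug support and
     * whether a card is currently present, using the slot registers
     * described above. */
    static void demo_check_slot(struct pci_dev *port)
    {
        u32 sltcap;
        u16 sltsta;

        pcie_capability_read_dword(port, PCI_EXP_SLTCAP, &sltcap);
        pcie_capability_read_word(port, PCI_EXP_SLTSTA, &sltsta);

        if (sltcap & PCI_EXP_SLTCAP_HPC)
            dev_info(&port->dev, "slot is hot-plug capable\n");
        if (sltsta & PCI_EXP_SLTSTA_PDS)
            dev_info(&port->dev, "card is present\n");
        if (sltsta & PCI_EXP_SLTSTA_PDC)
            dev_info(&port->dev, "presence changed since last check\n");
    }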

On the host side, we can use PRSNT1#/PRSNT2# as simple DC signals that feed the enable of a power-switch IC, or run them to a GPIO on the chipset / PCH to raise an IRQ and trigger a SW 'hey, something got inserted, go find it and configure it!' routine.
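
For the GPIO route, here is a hedged sketch of what that 'go find it' routine might look like on Linux. The GPIO name, the global bus pointer, and the registration flow are assumptions about a hypothetical platform driver, not a standard interface:

    #include <linux/gpio/consumer.h>
    #include <linux/interrupt.h>
    #include <linux/pci.h>

    /* Hypothetical: PRSNT# is wired to a GPIO that raises an interrupt;
     * the threaded handler asks the PCI core to rescan. demo_bus would
     * be the bus above the slot -- a real driver would look it up
     * rather than keep a global. */
    static struct pci_bus *demo_bus;

    static irqreturn_t demo_presence_irq(int irq, void *data)
    {
        pci_lock_rescan_remove();
        pci_rescan_bus(demo_bus);   /* enumerate whatever just appeared */
        pci_unlock_rescan_remove();
        return IRQ_HANDLED;
    }

    /* Registration sketch, e.g. from a platform driver's probe():
     *
     *   struct gpio_desc *gd = devm_gpiod_get(dev, "prsnt", GPIOD_IN);
     *   int irq = gpiod_to_irq(gd);
     *   devm_request_threaded_irq(dev, irq, NULL, demo_presence_irq,
     *                             IRQF_ONESHOT | IRQF_TRIGGER_FALLING,
     *                             "prsnt", NULL);
     */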

This is a lot of information that doesn't directly answer your question (see below for the quick summary), but hopefully it gives you better background for understanding the process. If you have any questions about specific parts of the process, let me know in a comment here or shoot me an email, and I can discuss further and update this answer with that info.

To summarize -- the peripheral device must have been designed with hot-plug support in mind from a hardware POV. A properly designed host / slot is hot-plug capable as well, and on a high-end motherboard I would expect it to be safe. However, the software support for this is another question entirely and you are unfortunately beholden to the BIOS your OEM has supplied you.

In practice, you use this technology anytime you remove/insert a PCIe ExpressCard from a computer. Additionally, high-performance blade systems (telecom or otherwise) utilize this technology regularly as well.

Final comment -- save the PDF of the Base Spec that was linked; PCI-SIG usually charges big bucks for that :)

Cole Tobin
Krunal Desai
  • And to top off the security discussion, with a relatively cheap FPGA (like a Cyclone IV GX) acting as a PCIe device, your host machine is *done* -- the FPGA can perform whatever DMA actions it wants. – Krunal Desai Dec 31 '15 at 17:59
  • Great explanation. What happens when a hot-plug capable PCIe card gets swapped? On one hand, the OS **must** enumerate the PCIe topology again, seeing that a new device was inserted (it can't predict the size of the BARs / the number of buses that might be requested by the newly inserted device), but on the other hand, re-enumerating the system might not be possible without affecting the resources already assigned to existing devices in the topology... – so.very.tired Jun 12 '16 at 10:28
  • Yep, it gets tricky. So using ExpressCard (EC) as an example, one way I did it was to 'pad' the number of buses to support adding a device that might branch to even more devices; most BIOSes with a simple EC slot just pad it by one bus number (we used that slot to expand to many PCIe devices). Likewise, you can 'pad' the memory range possible for assignment there to support a variety of devices with a contiguous address range, same with IRQs. The OS (with/without ACPI) can then do what it will. It's actually "simple", but the complexity of the SW layers in a modern machine makes it harder. – Krunal Desai Jun 13 '16 at 16:52
  • Isn't PCIe enumeration actually a depth-first search? The base and limit registers are set up such that all devices below a given port must be enumerated before moving to the next port. – alex.forencich Jan 19 '19 at 03:23
  • I have a related question. Imagine a PCIe endpoint based on an FPGA plugged into a Windows system. Originally the FPGA is loaded from PROM, the system starts up, enumeration works and everything is fine. Then we reload the FPGA with a new bitstream; however, the new bitstream contains exactly the same PCIe endpoint. Will it be enough to save the configuration space of the original PCIe endpoint and copy it into the new one to make it work, without having to reboot the PC or having to implement full support for hot plugging? Thanks. – mbmsv Feb 09 '21 at 22:32

Provided the power-state monitoring connections have been exposed at the connector by the upstream switch, the pluggable unit has exposed those pins and is configured to use them properly, and (as Jippie notes) the software can detect the hotplug event and respond properly, the answer is yes.

This capability is primarily used in server farms and data centres, for hotplugging PCIe disks among other things; I am not sure that consumer equipment will be fully hotplug capable (it is, I understand, optional in the specification).

Keep in mind that providing the necessary hardware to support hotplug costs money (although the majority is within the PCIe endpoint, it still has to be set up, usually via an EEPROM), so it will not usually be offered in a price-sensitive market.

Note that dynamically updating the PCI address map adds significant complexity to the PCI(e) driver: if a new device is inserted, it has to be mapped into whatever bus it lives on, with the associated new address translations, and if a device is removed and then replaced with something different, keeping track of PCI space addresses becomes quite complex.

Without this complexity, the PCI subsystem is scanned once (at system reset) and remains static; no further effort required.

Here is the PCIe v3.0 Base Spec; see section 6.7 (page 514) on Hot-Plug support. An example of a PCIe card which does support hot-plug can be seen here, courtesy of iocrest. It can be clearly seen that the shorter connector trace is routed: [Image: 2-port SATA III (6G) PCI-e Controller Card, Marvell 88SE9120 Chipset]

However, on this Axxon card, the shorter trace can clearly be seen routed to the adjacent one. On a physical level alone, this card cannot support hot-plug: [Image: MAP/950 1 RS232 Serial Port I/O Card for PCI Express (PCIe)]

Cole Tobin
Peter Smith
  • The original Base Spec link is dead, but here's a snapshot: https://web.archive.org/web/20180517113027/http://composter.com.ua/documents/PCI_Express_Base_Specification_Revision_3.0.pdf – rkagerer Apr 01 '20 at 05:58

It is supposed to work on all PCIe-compliant hardware. Whether all consumer hardware is truly compliant is a good question; I am not deep enough into the PCIe spec to know about the testing requirements, and even then, do all retailers check the validity of the claim? I think hardly any do.

Much like the whole safety-standards thing: half (hyperbole?) of the EE labels out there you can claim compatibility with without having everything you make tested. Since hotplug support isn't life-threatening, I can't imagine people being stricter about it.

I, for one, have never tried it, and seeing as my Clevo laptop drove the desktop entirely out of my house, I'm not about to: the GPU module in my laptop claims no hotplug capability and is too expensive to risk, unless you're Dave Jones and get $$$ for the video of an exploding GPU.

Asmyldof

Yes, it works. I was able to get hotplug working for a router chassis linecard (containing 10+ PCIe devices). The chassis has 16 hot-pluggable cards; any of them can be inserted or removed at random at run time without affecting the traffic operations on the other cards.

The complexity of making it work depends on the CPU environment. On an embedded CPU, the work is simply setting up a static resource map and handling connection-change events by attaching and detaching PCI devices. On x86, it is much more involved because of the complexity of error handling and the BIOS/OS interactions.
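
If you want to poke at this on a stock Linux x86 box, the sysfs interface is the easiest entry point. A minimal user-space sketch (requires root) that triggers the same re-enumeration path the hotplug drivers use -- equivalent to echo 1 > /sys/bus/pci/rescan:

    #include <stdio.h>

    int main(void)
    {
        /* Writing "1" to /sys/bus/pci/rescan asks the kernel to walk
         * the bus again and configure any newly appeared devices.
         * Devices can likewise be detached by writing "1" to
         * /sys/bus/pci/devices/<bdf>/remove before pulling them. */
        FILE *f = fopen("/sys/bus/pci/rescan", "w");

        if (!f) {
            perror("open /sys/bus/pci/rescan");
            return 1;
        }
        fputs("1\n", f);
        fclose(f);
        return 0;
    }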

xzhu70