> initiate the 10G protocol to initiate the link between the FPGA and PC to achieve successful data streaming without data loss.
There's not much to "initiate" at the link level: as soon as the MAC IP you're using initializes the link, you're ready to use it. The IP will have some signal(s) that indicate the link status.
Once the link is up, the FPGA should initialize a link-local IP address; for IPv4 that would be in the 169.254/16 subnet. It can then start sending the data using the UDP protocol. In the simplest case of a point-to-point link, the data can be sent to the subnet's broadcast address, i.e. 169.254.255.255. You can select some unused port to send the data to; ideally it should be a port >= 1024 so that unprivileged applications can bind to it.
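As a sketch of this addressing scheme (purely illustrative: the real sender is the FPGA's gateware, and port 50000 below is just an arbitrary example of a port >= 1024):

```python
import socket

# The addressing the FPGA would use: link-local subnet broadcast,
# some unprivileged port. Both values here are illustrative.
DEST_ADDR = "169.254.255.255"   # broadcast address of 169.254/16
DEST_PORT = 50000               # arbitrary example port >= 1024

def make_sender() -> socket.socket:
    """UDP socket permitted to send to broadcast addresses."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return s

if __name__ == "__main__":
    # Self-contained demo: loop a datagram back over localhost instead of
    # the link-local broadcast address, so this runs on any machine.
    sender = make_sender()
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))          # kernel picks a free port
    port = receiver.getsockname()[1]
    sender.sendto(b"sample-payload", ("127.0.0.1", port))
    data, src = receiver.recvfrom(2048)
    print(data)
    sender.close()
    receiver.close()
```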
The application that is to receive the data needs to bind a UDP socket, on the chosen port, to the link-local IP address of the interface (network card) used to connect to your FPGA-based device. You'll need to enumerate the network interfaces and choose the appropriate one; do offer the user a drop-down list if there's more than one interface with a link-local address.
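Enumerating per-interface addresses is platform-specific (`getifaddrs` on POSIX, `GetAdaptersAddresses` on Windows). Assuming you've already gathered the candidate addresses from whichever API applies, filtering for link-local ones and binding to the chosen one might look like this sketch:

```python
import ipaddress
import socket

def link_local_candidates(addresses):
    """Filter IPv4 address strings down to link-local (169.254/16) ones."""
    out = []
    for a in addresses:
        try:
            ip = ipaddress.IPv4Address(a)
        except ValueError:
            continue  # skip anything that isn't a valid IPv4 address
        if ip.is_link_local:
            out.append(a)
    return out

def bind_receiver(local_addr: str, port: int) -> socket.socket:
    """Bind a UDP socket to a specific local (link-local) address and port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind((local_addr, port))
    return s

if __name__ == "__main__":
    # Pretend these came from the OS's interface-enumeration API.
    addrs = ["192.168.1.10", "169.254.7.42", "10.0.0.3"]
    print(link_local_candidates(addrs))  # ['169.254.7.42']
```

If `link_local_candidates` returns more than one address, that's where the drop-down list comes in.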
The application will now be receiving the broadcast messages from the FPGA. But efficient reception of 10G traffic without dropping any packets requires care. The core requirement is to let the hardware fill your receive buffers without involving the CPU, and thus without thrashing the caches either. The buffer filling should be done by DMA: the PCIe write transactions generated by the network card, serviced by the host's memory controller. Common operating systems provide mechanisms to exploit that.
On Windows, you'll have to use overlapped zero-copy I/O, i.e. the Registered I/O (RIO) API, so that the network card writes the packet contents directly into buffers you provide. Several buffers have to be preallocated ahead of time so that you don't starve the network card of buffers; the operating system's job is then only to signal to your application that the buffers have been filled.
On Linux, you'd use `io_uring` (available since kernel 5.1) or similar; see this Q&A for equivalent APIs on Linux. The use of `io_uring` is essential to minimize the "long tail" of packet loss, i.e. the rare packet drops. Linux's usual Asynchronous I/O (AIO) will also work, but the overhead of shuffling packets between the kernel and userspace occasionally drops packets that `io_uring` would not, especially if the kernel misjudges the number of buffers that must be submitted to the network card to keep it streaming without overflows (packet drops).
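Neither RIO nor `io_uring` can be shown from portable stdlib code, but the shape of the technique, preallocating a fixed pool of receive buffers up front and draining all queued packets into it in one pass, can be sketched with ordinary sockets. This is only an analogue (the copies here still go through the kernel; the pool sizes and names are made up for illustration):

```python
import socket
import time

NUM_BUFFERS = 8     # preallocate a pool so reception never waits on allocation
BUF_SIZE = 2048     # >= the largest datagram expected from the FPGA

def make_buffer_pool(n=NUM_BUFFERS, size=BUF_SIZE):
    """Fixed pool of reusable receive buffers, allocated once up front."""
    return [bytearray(size) for _ in range(n)]

def receive_batch(sock, pool):
    """Drain queued datagrams into pooled buffers without blocking.

    Returns a list of (buffer, byte_count) pairs for the buffers filled.
    """
    sock.setblocking(False)
    filled = []
    for buf in pool:
        try:
            nbytes, _src = sock.recvfrom_into(buf)
        except BlockingIOError:
            break  # queue drained
        filled.append((buf, nbytes))
    return filled

if __name__ == "__main__":
    # Loopback demo: queue three datagrams, then drain them in one pass.
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    port = rx.getsockname()[1]
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(3):
        tx.sendto(b"pkt%d" % i, ("127.0.0.1", port))
    time.sleep(0.1)  # let loopback delivery complete
    batch = receive_batch(rx, make_buffer_pool())
    print(len(batch))
    tx.close()
    rx.close()
```

The real-system equivalent of `make_buffer_pool` is registering buffers with RIO or submitting them to an `io_uring` submission queue, where the NIC fills them by DMA instead of `recvfrom_into` copying them.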