> initiate the 10G protocol to initiate the link between the FPGA and PC to achieve successful data streaming without data loss.
There's not much to "initiate" at the link level: as soon as the MAC IP you're using initializes the link, you're ready to use it. The IP will have some signal(s) that indicate the link status.
Once the link is up, the FPGA should initialize a link-local IP address; for IPv4 that would be in the 169.254/16 subnet. It can then start sending the data using the UDP protocol. In the simplest case of a point-to-point link, the data can be sent to the subnet's broadcast address, i.e. 169.254.255.255. You can select some unused port to send the data to; ideally it should be a port >= 1024 so that unprivileged applications can bind to it.
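As a sketch of this addressing scheme (purely illustrative: the real sender is the FPGA's gateware, and port 50000 below is just an arbitrary example of a port >= 1024):

```python
import socket

# The addressing the FPGA would use: link-local subnet broadcast,
# some unprivileged port. Both values here are illustrative.
DEST_ADDR = "169.254.255.255"   # broadcast address of 169.254/16
DEST_PORT = 50000               # arbitrary example port >= 1024

def make_sender() -> socket.socket:
    """UDP socket permitted to send to broadcast addresses."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return s

if __name__ == "__main__":
    # Self-contained demo: loop a datagram back over localhost instead of
    # the link-local broadcast address, so this runs on any machine.
    sender = make_sender()
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))          # kernel picks a free port
    port = receiver.getsockname()[1]
    sender.sendto(b"sample-payload", ("127.0.0.1", port))
    data, src = receiver.recvfrom(2048)
    print(data)
    sender.close()
    receiver.close()
```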
The application that is to receive the data needs to bind a UDP socket, on the chosen port, to the link-local IP address of the interface (network card) used to connect to your FPGA-based device. You'll need to enumerate the network interfaces and choose the appropriate one; do offer the user a drop-down list if there's more than one interface with a link-local address.
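Enumerating per-interface addresses is platform-specific (`getifaddrs` on POSIX, `GetAdaptersAddresses` on Windows). Assuming you've already gathered the candidate addresses from whichever API applies, filtering for link-local ones and binding to the chosen one might look like this sketch:

```python
import ipaddress
import socket

def link_local_candidates(addresses):
    """Filter IPv4 address strings down to link-local (169.254/16) ones."""
    out = []
    for a in addresses:
        try:
            ip = ipaddress.IPv4Address(a)
        except ValueError:
            continue  # skip anything that isn't a valid IPv4 address
        if ip.is_link_local:
            out.append(a)
    return out

def bind_receiver(local_addr: str, port: int) -> socket.socket:
    """Bind a UDP socket to a specific local (link-local) address and port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind((local_addr, port))
    return s

if __name__ == "__main__":
    # Pretend these came from the OS's interface-enumeration API.
    addrs = ["192.168.1.10", "169.254.7.42", "10.0.0.3"]
    print(link_local_candidates(addrs))  # ['169.254.7.42']
```

If `link_local_candidates` returns more than one address, that's where the drop-down list comes in.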
The application will now be receiving the broadcast messages from the FPGA. But efficient reception of 10G traffic without dropping any packets requires care. The core requirement is to let the hardware fill your receive buffers without involving the CPU, and thus without thrashing the caches either. The buffer filling should be done by DMA: the PCIe write transactions generated by the network card, serviced by the host's memory controller. Common operating systems provide mechanisms to exploit that.
On Windows, you'll have to use overlapped zero-copy I/O, i.e. the Registered I/O (RIO) API, so that the network card writes the packet contents directly into buffers you provide. Several buffers have to be preallocated ahead of time so that you don't starve the network card of buffers; the operating system's job is then only to signal to your application that the buffers have been filled.
On Linux, you'd use `io_uring` (available since kernel 5.1) or similar; see this Q&A for equivalent APIs on Linux. The use of `io_uring` is essential to minimize the "long tail" of packet loss, i.e. the rare packet drops. Linux's usual Asynchronous I/O (AIO) will also work, but the overhead of shuffling packets between the kernel and userspace occasionally drops packets that `io_uring` would not, especially if the kernel misjudges the number of buffers that must be submitted to the network card to keep it streaming without overflows (packet drops).
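Neither RIO nor `io_uring` can be shown from portable stdlib code, but the shape of the technique, preallocating a fixed pool of receive buffers up front and draining all queued packets into it in one pass, can be sketched with ordinary sockets. This is only an analogue (the copies here still go through the kernel; the pool sizes and names are made up for illustration):

```python
import socket
import time

NUM_BUFFERS = 8     # preallocate a pool so reception never waits on allocation
BUF_SIZE = 2048     # >= the largest datagram expected from the FPGA

def make_buffer_pool(n=NUM_BUFFERS, size=BUF_SIZE):
    """Fixed pool of reusable receive buffers, allocated once up front."""
    return [bytearray(size) for _ in range(n)]

def receive_batch(sock, pool):
    """Drain queued datagrams into pooled buffers without blocking.

    Returns a list of (buffer, byte_count) pairs for the buffers filled.
    """
    sock.setblocking(False)
    filled = []
    for buf in pool:
        try:
            nbytes, _src = sock.recvfrom_into(buf)
        except BlockingIOError:
            break  # queue drained
        filled.append((buf, nbytes))
    return filled

if __name__ == "__main__":
    # Loopback demo: queue three datagrams, then drain them in one pass.
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    port = rx.getsockname()[1]
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(3):
        tx.sendto(b"pkt%d" % i, ("127.0.0.1", port))
    time.sleep(0.1)  # let loopback delivery complete
    batch = receive_batch(rx, make_buffer_pool())
    print(len(batch))
    tx.close()
    rx.close()
```

The real-system equivalent of `make_buffer_pool` is registering buffers with RIO or submitting them to an `io_uring` submission queue, where the NIC fills them by DMA instead of `recvfrom_into` copying them.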