You have a small problem: between the moment you read the count that goes into your status packet and the moment that packet is actually sent, other (payload) packets might be sent. You need to give your status packets some kind of priority, or let them bypass the queue (if any).
Architecturally: if you have TX queues, where you can insert packets at the head (or if you can afford a second queue just for these kinds of packets, which always gets emptied first):
Instead of "once every second", I'd go with "once every N packets", per DMAC. (You're incrementing an entry in some table synchronously anyway, so that's a pretty easy condition to check for. "All" you'd have to do then is insert a packet at the head of the queue in the single-queue case, or just anywhere in the queue if you have a dedicated status message queue.)
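To make the two-queue variant concrete, here is a minimal behavioural sketch in Python (a model of the logic, not HDL). The class name `Transmitter`, the value of `N`, and the `("STATUS", count)` payload are all made up for illustration. The counter is incremented at the moment a payload frame leaves, and the status queue is always drained first, so no payload frame can slip in between taking the count and sending it.

```python
from collections import defaultdict, deque

N = 4  # hypothetical: one status frame per N payload frames, per destination MAC


class Transmitter:
    """Behavioural model: a payload queue plus a status queue that is emptied first."""

    def __init__(self, n=N):
        self.n = n
        self.sent = defaultdict(int)  # per-DMAC count of payload frames put on the wire
        self.payload_q = deque()
        self.status_q = deque()

    def enqueue_payload(self, dmac, data):
        self.payload_q.append((dmac, data))

    def transmit_one(self):
        """Return the next (dmac, data) to put on the wire, or None if idle."""
        # Status frames always win, so the count they carry is still exact.
        if self.status_q:
            return self.status_q.popleft()
        if not self.payload_q:
            return None
        dmac, data = self.payload_q.popleft()
        self.sent[dmac] += 1
        # The "easy condition": counter just reached a multiple of N.
        if self.sent[dmac] % self.n == 0:
            self.status_q.append((dmac, ("STATUS", self.sent[dmac])))
        return (dmac, data)
```

With a single queue, the equivalent would be an `appendleft` on the payload queue instead of a separate `status_q`.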
The solution will be implemented in an FPGA, and placed close to the MAC, so the statistics packets can be removed at a low level, and higher-level protocols do not need to consider this addition.
Why, though? This kind of thing happens at low rates, and it probably has higher-layer consequences. Sounds like something you should ideally do in software, e.g., in a small softcore or a host OS. You could make your FPGA just detect that packet, and append the actual number of received packets to it – then the software can read both the transmitted "should-be" and the received "is" value, and can report, take corrective action or reset the link, for example.
Because Ethernet doesn't have sequence numbers in itself, and because a network card couldn't do anything with that knowledge on its own, I'm not aware of any low-level protocols for this kind of information exchange. Candidates include the Link Quality Report included in PPP (but yours doesn't sound like a use case for this), or simply one of the reserved types in ICMPv6.
For an easy implementation: use your own Ethertype value in the Ethernet header and put in the current frame count (or even leave it out, if it's implicit because you send these packets every N frames). Have your receiver handle these packets by appending its receive frame count (or just replacing the count with the difference!), and then handle said frame in software.
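As a rough sketch of that frame handling, again in Python as a behavioural model: I'm assuming 0x88B5 (one of the IEEE "local experimental" Ethertypes) as the custom Ethertype and 32-bit big-endian counters. The function names are invented, and padding to the minimum Ethernet frame size and the FCS are left out.

```python
import struct

ETHERTYPE_STATS = 0x88B5  # assumption: IEEE local experimental Ethertype


def build_status_frame(dmac: bytes, smac: bytes, tx_count: int) -> bytes:
    """Sender side: DMAC | SMAC | Ethertype | 32-bit transmitted-frame count."""
    return dmac + smac + struct.pack("!HI", ETHERTYPE_STATS, tx_count)


def receiver_annotate(frame: bytes, rx_count: int):
    """Receiver side: append the local receive count to a status frame.

    Returns None for any frame that does not carry the status Ethertype.
    """
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_STATS:
        return None
    return frame + struct.pack("!I", rx_count)


def lost_frames(annotated: bytes) -> int:
    """Software side: difference between the "should-be" and the "is" count."""
    tx_count, rx_count = struct.unpack_from("!II", annotated, 14)
    return (tx_count - rx_count) % 2**32  # modular, so counter wrap-around is harmless
```

Replacing the count with the difference instead of appending would just move the subtraction from `lost_frames` into `receiver_annotate`.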
If you happen to have an ARP table on your sender, I'd skip all that and simply craft a UDP packet to the right IP address, and just handle that. Matching Ethertype == 0x86DD or 0x0800 (IPv6 or IPv4), IP proto number == 0x11 (UDP), Destport == 0x… … at fixed positions literally requires a 5-octet fixed-value fixed-position comparison, so it's really cheap to do in your destination FPGA.
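For the IPv4 case, the comparison could look roughly like this (Python model; the function name is invented and the destination port is left as a parameter). The offsets assume an untagged frame (no 802.1Q VLAN tag) and an IPv4 header without options (IHL = 5); otherwise the fields move and the positions are no longer fixed.

```python
def is_status_udp_ipv4(frame: bytes, dest_port: int) -> bool:
    """Five-octet, fixed-position match on an untagged IPv4/UDP frame.

    Assumed offsets from the start of the Ethernet header:
      12..13  Ethertype           (0x0800 = IPv4)
      23      IPv4 protocol field (0x11   = UDP)
      36..37  UDP destination port (only valid when IHL == 5)
    """
    if len(frame) < 38:
        return False
    return (
        frame[12:14] == b"\x08\x00"
        and frame[23] == 0x11
        and frame[36:38] == dest_port.to_bytes(2, "big")
    )
```

In hardware this boils down to comparing five octets against constants as they stream past. The IPv6 variant would use Ethertype 0x86DD and different offsets, and is not shown here.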