I am working with some NetFPGA-SUME boards and I have an issue that is bothering me. This board contains a Xilinx Virtex-7 690T FPGA, a part that loads a configuration file into internal configuration RAM.

Right after the FPGA loads the configuration file it works fine, but when I leave the board powered on for a while (say, overnight) and then use it, the FPGA seems to have lost its configuration, and I have to reload the configuration file and restart the computer.

Example: I program the FPGA with a NIC design and ping another machine, and the NIC sends the packet. But when I come back the following day and ping the same machine, the NIC does not send it.

I don't know if this is normal behavior for an FPGA. Is it possible that I forgot to configure something on the board? I don't think the FPGA is defective, as I tested other NetFPGA-SUME boards and found the same problem.

Does anybody have any insight into what could be happening?

Thank you

Edit: as some people mentioned it in the comments and answers, I have also tried loading the bitfile from one of the flash memories on the board. The result is the same, and I have to power the system off so that the bitfile is loaded again from flash.

Edit 2: due to one of the comments, I add this piece of information. I have tested the boards with two different power supplies and hosts, with the same exact result.

TonyM
anmomu
  • Unstable power? – Eugene Sh. Oct 29 '21 at 17:03
  • Maybe the FPGA bitstream was never stored in EEPROM config memory? Xilinx Virtex-7 FPGA chip itself does not store its own configuration. At power-on, depending on its Configuration Mode Pins, either the FPGA reads the bitstream from an external memory (probably a N25Q128A), or the FPGA waits for some other device to send it a bitstream through JTAG. See [Xilinx Virtex-7 FPGA Configuration UG470](https://www.xilinx.com/content/dam/xilinx/support/documentation/user_guides/ug470_7Series_Config.pdf), chapter 2. – MarkU Oct 29 '21 at 17:21
  • @EugeneSh. It is connected as indicated in the reference manual. It is connected to the host PC via PCIe and a 2x4 connector to the power supply. – anmomu Oct 29 '21 at 17:30
  • @MarkU I forgot to mention that (I'll edit the question to include it). I also tried to load the bitstream from one of the flash memories on the board, but the issue persists. Anyways, the idea is to leave the board plugged in for long periods of time – anmomu Oct 29 '21 at 17:34
  • @anmomu I'm assuming you are using the same power supply for all the boards you tried. Together with the board design itself, those seem to be your common factors. I don't imagine that your long-delayed next communication attempt (ping) would have some new reason to behave differently just because of the stretched out time delay between pings. Though that seems to be the only remaining question -- is there something about the time duration between pings that leads to different behaviors? Do you see any other factors to consider? – jonk Oct 29 '21 at 17:37
  • Does the board have an LED connected to the FPGA's configuration DONE pin? If so, when your board stops working does the LED show that the FPGA is still configured or not? – user4574 Oct 29 '21 at 17:49
  • @jonk I have tested it on two different power supplies: an 850 W power supply from the hosts where I am working now, and another power supply (I don't know exactly what it is) from a high-performance server. It lost its configuration in both cases. I can't think of anything happening during the time between pings. The only thing that comes to mind is something with the host itself (the operating system maybe?) – anmomu Oct 29 '21 at 17:50
  • All of the more recent Xilinx FPGAs have hardware that constantly monitors the configuration bits in config RAM and verifies a CRC of those bits. If any configuration bits change due to unstable power, out of tolerance core power, or EMI from adjacent parts, then the part will reset. – user4574 Oct 29 '21 at 17:53
  • @anmomu Well, you need to set aside everything else and just lay out all of the logical possibilities. Now that you've mentioned "the Host" I would put that also into the list. But I'd still keep on the table that something in the FPGA itself is "acting up." It appears that user4574 has brought up yet another possible factor (which may be categorized as "acting up.") – jonk Oct 29 '21 at 17:54
  • @anmomu You can't know what you don't know, so you won't be able to make a complete list because that would require comprehensive knowledge (God like) that I'm sure you don't possess. But make the list as complete as you possibly can and keep expanding it when you come up with another idea. – jonk Oct 29 '21 at 17:55
  • @user4574 Yes, the board has an LED indicating DONE. I didn't think about that LED, so I forgot to check it when the issue happened. I'll check it on Tuesday when I have access to the boards again. Thank you. – anmomu Oct 29 '21 at 18:00
  • @jonk OK, I really appreciate the advice. I've been working on the issue for a while, but I still have to do my best – anmomu Oct 29 '21 at 18:04
  • Which FPGA family and part is this? It depends on what you're using. Some are RAM-based, some Flash-based, some both on one chip CPLD-style etc. – TonyM Oct 29 '21 at 18:17
  • Noise getting on the PROG pin? This will cause the FPGA to de-configure. – hacktastical Oct 29 '21 at 18:52
  • This may be painfully obvious, but are you using an evaluation license in your design? Evaluation cores time out, and it will look like your system is dead when really perhaps your MAC IP timed out – johnnymopo Nov 08 '21 at 04:03
  • Are you grounding yourself and the board? Maybe an electrostatic discharge (ESD) messes with your reset/prog button? – IljaBek May 15 '23 at 19:39
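
The configuration-memory CRC check mentioned in the comments can be illustrated with a toy model. This is only a sketch of the idea in Python, not how the Virtex-7 actually implements it: the 1 KiB buffer and the names `golden_crc` and `scan` are hypothetical, and CRC-32 from `zlib` stands in for whatever check the device really uses. UG470 is the place to look for the real mechanism and how it is enabled.

```python
import zlib


def golden_crc(config_bits: bytes) -> int:
    """CRC-32 of the configuration memory, taken right after loading."""
    return zlib.crc32(config_bits)


def scan(config_bits: bytes, expected: int) -> bool:
    """Return True if the memory still matches the reference CRC."""
    return zlib.crc32(config_bits) == expected


# Toy stand-in for configuration RAM (hypothetical size and contents).
config = bytearray(b"\x5a" * 1024)
reference = golden_crc(bytes(config))

ok_before = scan(bytes(config), reference)

# Simulate a single-bit upset, e.g. from a supply glitch or EMI.
config[100] ^= 0x01
ok_after = scan(bytes(config), reference)

print(ok_before, ok_after)  # prints: True False
```

A periodic scan like this only detects that something changed; what the device does next (flag an error, reconfigure, and so on) is a separate design choice.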

1 Answer

This sounds like an unstable power supply, but it could be something else, e.g. a glitch on the reset pin if the chip has one.

Most FPGAs have no non-volatile memory for the bitstream, although some do. When you program one from a PC, the bitstream goes directly into configuration RAM.

This means that the design will run until a power cycle or some other reset.

In order to retain the design after a reset, you would normally add an external memory (EEPROM or Flash) that is compatible with the FPGA and store the bitstream there.

The datasheet probably has some suggestions for the external memory. The FPGA loads data from the external memory automatically on power-up, which is why the memory needs to be “compatible”.

Marko Gulin
  • Yes, I tried to load it from an on-board flash memory, but the issue persists. I'll check what you said about the power cycle or other resets. As I don't know much about electricity/electronics, I'm quite lost in that topic. Thank you – anmomu Oct 29 '21 at 17:36
  • Are you sure the program gets loaded from the Flash? Have you tested it? Even if you confirm that the program gets loaded, you need to debug why it resets. If you have an oscilloscope, you could monitor the supply rail and set a single trigger. Also monitor a (dummy) digital output; that way you can see what happens to the FPGA when the supply voltage dips. – Marko Gulin Oct 29 '21 at 17:38
  • Yes, I turn off the system, turn it on again, and I can ping between hosts. – anmomu Oct 29 '21 at 17:42
  • Are you sure that the program gets erased? Is it possible that you have a bug in your program? E.g. a communication driver gets stuck after some time. – Marko Gulin Oct 29 '21 at 17:44
  • That the program gets erased I cannot confirm (I don't know how, yet). I have also thought that maybe the problem is in the driver. I have used the default ones provided (sume_riffa) by the open source project I'm working with, so I haven't touched the driver. – anmomu Oct 29 '21 at 17:56
  • Do you have an LED on the board that you could control with the FPGA? If yes, add a small design that toggles the LED at some frequency, e.g. 1 Hz. That should be fairly easy to program; there are a lot of examples online. – Marko Gulin Oct 29 '21 at 17:58
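
To help separate "the FPGA lost its configuration" from "the driver or host got stuck", one option is a small host-side logger that records when the ping starts failing and whether the PCIe endpoint still answers at that moment. The following is a rough sketch under several assumptions, none of which come from the thread: a Linux host with sysfs, the usual `ping` utility accepting `-c` and `-W`, and hypothetical values for the board's PCI address (`0000:01:00.0`) and the peer's IP address. It relies on the common behaviour that an absent PCIe endpoint reads back as all ones (0xFFFF); treat that as a hint rather than proof.

```python
import os
import struct
import subprocess
import time
from datetime import datetime


def read_ids(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Read (vendor_id, device_id) from the device's PCI configuration space.

    Returns None if the sysfs node is missing or cannot be read.
    """
    try:
        with open(os.path.join(sysfs_root, bdf, "config"), "rb") as f:
            header = f.read(4)
    except OSError:
        return None
    if len(header) < 4:
        return None
    return struct.unpack("<HH", header)  # little-endian: vendor, device


def endpoint_alive(ids):
    """A missing endpoint typically reads back as all ones (0xFFFF)."""
    return ids is not None and ids[0] != 0xFFFF


def ping_once(host, timeout_s=2):
    """Send a single ICMP echo request; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def log_line(now, ping_ok, ids):
    """Format one timestamped log entry."""
    pcie = f"{ids[0]:04x}:{ids[1]:04x}" if ids else "unreadable"
    status = "ok" if ping_ok else "FAIL"
    return f"{now.isoformat(timespec='seconds')} ping={status} pcie={pcie}"


def monitor(host, bdf, interval_s=60, logfile="fpga_watch.log"):
    """Append one log entry per interval until interrupted."""
    while True:
        line = log_line(datetime.now(), ping_once(host), read_ids(bdf))
        with open(logfile, "a") as f:
            f.write(line + "\n")
        time.sleep(interval_s)
```

Running something like `monitor("192.168.0.2", "0000:01:00.0")` overnight (both arguments are placeholders) gives a timestamp for the first failure. If the ping fails while the IDs still read back as the expected Xilinx values, the driver or the design is the more likely suspect; if they read `ffff:ffff`, the DONE LED and the power rails are worth checking first.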