Model running slower than RTL in SystemVerilog

Question

I'm testing a third-party company's RTL implementation of a certain block in SystemVerilog, using Questa. The block is fairly large, and my wrapper block around it is also large. The regression suite took around 20 hours to complete. I have access to only one license, so with run times this long, nightly regressions are not feasible.

So I thought about writing a simple model in SystemVerilog. I was able to extract the functionality from the datasheet provided; it was simple enough.

The I/O interfaces were AXI4-Stream, and the block had one AXI4 memory-mapped DRAM interface. I used associative arrays within the model to store previous data (I had a unique ID to key it on). All processing was done by calling a golden reference model (a C function provided by the company) through the DPI.

Now the regression is taking over 30 hours to complete. Clearly, I've done something very wrong and I have no idea where to begin.

Any thoughts will be appreciated.

Can you tell how many times the C function is called? When you look at the C function, is it well written and performant? Can you benchmark the time it takes to run only that function, to determine how much of the 30 hours is spent just running it? If the function is slow and often called with the same arguments, it might be wise to write a minimal C wrapper with a simple cache for results that have been requested before. RAM is cheaper than licenses, so that cache can be quite large. — Marcus Müller, Mar 10 '21 at 11:58
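The caching idea can be sketched as a small C wrapper that sits between the DPI import and the vendor function. This is a minimal sketch under assumptions: `golden_ref`, its signature, `cached_golden`, and the sizes `IN_BYTES`/`OUT_BYTES` are all hypothetical (the real prototype would come from the vendor's header and `.so`), and the stand-in body exists only so the example compiles. The cache is direct-mapped, keyed on a 64-bit FNV-1a hash of the input, and keeps a full copy of each input so that a hash collision can never return a wrong result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define IN_BYTES    (1u << 20) /* hypothetical: ~1 MB input buffer */
#define OUT_BYTES   4096u      /* hypothetical: "several thousand" output bytes */
#define CACHE_SLOTS 64u        /* direct-mapped cache size (tunable) */

/* Hypothetical stand-in for the vendor's golden reference function.
 * golden_calls counts how often the "expensive" model really runs. */
static unsigned golden_calls = 0;
static void golden_ref(const uint8_t *in, uint8_t *out)
{
    golden_calls++;
    memset(out, 0, OUT_BYTES);
    for (size_t i = 0; i < IN_BYTES; i++)
        out[i % OUT_BYTES] ^= in[i];
}

typedef struct {
    int      valid;
    uint64_t hash;
    uint8_t *in;              /* full copy of the input, to rule out collisions */
    uint8_t  out[OUT_BYTES];  /* cached result */
} cache_entry;

static cache_entry cache[CACHE_SLOTS];

/* 64-bit FNV-1a hash over the input bytes. */
static uint64_t fnv1a64(const uint8_t *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;  /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;           /* FNV prime */
    }
    return h;
}

/* Hypothetical wrapper the SystemVerilog side would import instead of the
 * vendor function. Returns 1 on a cache hit, 0 when the model actually ran. */
int cached_golden(const uint8_t *in, size_t len, uint8_t *out)
{
    assert(len == IN_BYTES);
    uint64_t h = fnv1a64(in, len);
    cache_entry *e = &cache[h % CACHE_SLOTS];

    if (e->valid && e->hash == h && memcmp(e->in, in, len) == 0) {
        memcpy(out, e->out, OUT_BYTES);  /* hit: skip the expensive call */
        return 1;
    }

    golden_ref(in, out);                 /* miss: run the real model */
    if (e->in == NULL && (e->in = malloc(IN_BYTES)) == NULL)
        return 0;                        /* no memory: result correct, just not cached */
    memcpy(e->in, in, len);              /* (re)fill the slot, evicting any old entry */
    memcpy(e->out, out, OUT_BYTES);
    e->hash  = h;
    e->valid = 1;
    return 0;
}
```

Each occupied slot costs roughly one input copy (about 1 MB here), which is the "RAM is cheaper than licenses" trade-off. The cache only pays off when identical inputs recur; if the function is called just once per test, it will not help.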
First step: use a profiler to determine the CPU usage and the memory usage. — Ben, Mar 10 '21 at 11:59
@Ben has a good point. I was about to suggest moving away from Questa entirely, if that's possible, for testing your own code, and using the free Verilator software to compile your SystemVerilog to C++, which can easily be made to interface with C code. That makes benchmarking your regression tests much easier. Also, unlike *any* vendor tool I've ever met, you can put Verilator compilation and tests into a "normal" continuous integration/continuous delivery (CI/CD) environment and have it run tests selectively, automatically, nightly, or whenever you like, without paying for any strange licenses. — Marcus Müller, Mar 10 '21 at 12:03
@MarcusMüller, the C function is called only once. I only have access to a .so, so unfortunately I can't see any of the source code. I can tell you that it takes a fixed-size input array handle into which a million or so bytes need to be filled and processed. Similarly, it has a fixed-size output array handle (pointers) holding several thousand bytes. Unfortunately, I can't move away from the current environment; it's too late to start this on another platform, but I will keep it in mind for future projects. I will look at the profilers in Questa to determine this. — Ashhad Khan, Mar 10 '21 at 12:10
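Since the vendor code is available as a `.so`, one way to answer the benchmarking question is to time the function in a standalone C program, outside the simulator and without a license. A minimal sketch, assuming a hypothetical `golden_ref` prototype and hypothetical buffer sizes: the stand-in body would be replaced by the vendor's real declaration, with the program linked against the shared library. It uses standard C `clock()`, so it reports processor time rather than wall-clock time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define IN_BYTES  (1u << 20) /* hypothetical: "a million or so" input bytes */
#define OUT_BYTES 4096u      /* hypothetical: "several thousand" output bytes */

/* Hypothetical stand-in for the vendor function; replace with the real
 * prototype from the vendor header and link against the .so. */
static void golden_ref(const uint8_t *in, uint8_t *out)
{
    memset(out, 0, OUT_BYTES);
    for (size_t i = 0; i < IN_BYTES; i++)
        out[i % OUT_BYTES] ^= in[i];
}

/* Average processor time in seconds per call, measured over `reps` calls.
 * Returns 0.0 when reps <= 0. */
double time_golden(const uint8_t *in, uint8_t *out, int reps)
{
    if (reps <= 0)
        return 0.0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        golden_ref(in, out);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Multiplying the per-call figure by the number of calls in a regression gives a rough estimate of how much of the total run time the golden model itself accounts for.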
"Associative arrays" suggest an extremely inefficient way of addressing data, which you may want to replace with an algorithmically better solution. Can you test your model in isolation from the whole system? Is it faster in isolation than the RTL in isolation? Another thing to check is whether either the original or the "improved" (slower) simulation is swapping. If so, add memory. — , Mar 10 '21 at 14:12

score 3 · Answer 1 · answered Mar 11 '21 at 11:45

Thank you for the suggestions.

I used the Questa performance profiler. The culprit was an assertion in the testing environment: because of the way it was written, the sequence it checked started a new attempt every clock cycle. As simulation time progressed, the CPU time and memory consumed by the assertion kept growing, which made the simulation slower. This in turn had a knock-on effect on the model, which also consumed a lot of memory, and probably led to some memory swapping.

I've fixed this, and we're now getting much faster performance.
