
I am new to FPGAs and I am working on an accelerator that acquires data from a microphone (Pulse Density Modulation) and extracts a single frequency from the signal.

My accelerator is basically a digital filter that multiplies each audio sample by a complex phasor and sums the products. After n samples it should produce a complex number (representing the phase and amplitude of a given frequency) and make it available to a CPU somehow.

My question is: what is the best/simplest strategy to transfer data from the FPGA to the CPU?


Note the following:

  1. I am using a Zynq UltraScale+ MPSoC from Xilinx (more precisely the ZU3EG), so the CPU and the FPGA fabric are on the same physical chip.
  2. My accelerator is an AXI slave that is meant to be configured and started by the CPU.
  3. I need to transfer approximately 240 KB/s from the FPGA to the CPU.

The possibilities I could think of are:

  • Create a sort of FIFO within my accelerator: use some registers to store the last m results and signal the CPU with an interrupt when this output buffer is getting full. The CPU would then read the data through the AXI interface.
  • Use an external FIFO: use one of Vivado's modules to implement a FIFO between my accelerator and the CPU. I guess this forces me to implement an AXI master interface on my accelerator?
  • Write the results into a circular buffer in the FPGA's BRAM.
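
To make the circular-buffer option concrete, here is a minimal sketch in C of what the CPU-side reader could look like. It runs against ordinary memory; the structure layout, the `RING_SLOTS` depth, and the names `result_ring_t` and `ring_drain` are all hypothetical, and on real hardware the pointer would have to come from mapping the BRAM's address range, which is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 256u /* hypothetical depth; must be a power of two */

/* Hypothetical layout of the shared region: the producer (FPGA side)
 * advances wr, the consumer (CPU side) advances rd. Both are
 * free-running counters, so wr - rd is the number of unread results. */
typedef struct {
    volatile uint32_t wr;
    volatile uint32_t rd;
    volatile int32_t  re[RING_SLOTS]; /* real parts */
    volatile int32_t  im[RING_SLOTS]; /* imaginary parts */
} result_ring_t;

/* Copy up to max unread results out of the ring; returns how many. */
static size_t ring_drain(result_ring_t *ring,
                         int32_t *re, int32_t *im, size_t max)
{
    assert(ring != NULL && re != NULL && im != NULL);

    uint32_t wr = ring->wr; /* snapshot the producer counter once */
    uint32_t rd = ring->rd;
    size_t n = 0;

    while (rd != wr && n < max) {
        uint32_t slot = rd % RING_SLOTS;
        re[n] = ring->re[slot];
        im[n] = ring->im[slot];
        ++rd;
        ++n;
    }
    ring->rd = rd; /* publish progress so the producer can reuse slots */
    return n;
}
```

The reader can be called from an interrupt handler or from a periodic polling loop. If `wr - rd` ever exceeds `RING_SLOTS`, the producer has overwritten unread results, so the buffer depth should be chosen against the worst-case time between reads.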

Could you explain what the best design patterns are for this type of problem? As I said, I am rather new to FPGAs, and it would be great to have some advice from someone with more experience. Any pointers to good references would also be greatly appreciated.

Thanks a lot!!

leopicchio
  • those do sound like possibilities to me. – user253751 Apr 26 '22 at 09:36
  • What will the CPU do when it gets there? Is it devoted to processing this? How much latency can you tolerate, e.g. for handling other interrupts? Other options include "DMA into DRAM" and "BRAM circular buffer that's memory-mapped to the processor" – pjc50 Apr 26 '22 at 10:05
  • No, the second option does not require an AXI master interface. You have a choice with the FIFO generator: you can create one with a low-level ready/enable handshake, or you can create one with AXI stream (not AXI Lite) interfaces on both ends. I would recommend the former, which makes this choice essentially identical to your third choice. – Dave Tweed Apr 26 '22 at 10:53
  • Thank you all for your help. To answer your question @pjc50, the CPU's task is to process the audio from 120 microphones to localize sound sources. It is quite a heavy workload and should run in real time (a latency of 100 ms is fine). As a first step, I am currently trying to implement the BRAM circular buffer that you suggested; maybe in the future I will try DMA into DRAM – leopicchio Apr 27 '22 at 15:50

1 Answer


Xilinx offers reference drivers, and I think by now even some of the AXI infrastructure is in the upstream Linux kernel. Other companies (ADI for example) have publicly available Linux images for RFSoC devices, which should port easily to MPSoC as well.

Since you're streaming data, the usual approach would be to use a DMA controller to write into the DDR RAM that the ARM cores also access.

You could raise an interrupt whenever a buffer is ready (that's pretty much the way PCI sound cards work) and then handle that in software. But considering audio rates are incredibly low from a modern CPU's point of view, software-polling a status register might work just as well and reduce the interrupt overhead if you have so many channels that you can always be sure that one is done.
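
As a rough illustration of the polling variant, here is a minimal sketch in C with two DMA target buffers and a status word. Everything in it is a hypothetical stand-in: the `dma_view_t` layout, the status bits, and the buffer size are not taken from any Xilinx IP, and the structure lives in ordinary memory so the logic can be tried without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_WORDS 1024u        /* hypothetical buffer size in 32-bit words */
#define STATUS_BUF0_READY 0x1u /* hypothetical "buffer 0 filled" flag */
#define STATUS_BUF1_READY 0x2u /* hypothetical "buffer 1 filled" flag */

/* Hypothetical view of the shared state: a status word set by the
 * accelerator, plus the two buffers the DMA engine alternates between. */
typedef struct {
    volatile uint32_t status;
    volatile int32_t  buf[2][BUF_WORDS];
} dma_view_t;

/* Poll once. If a buffer is flagged ready, copy it to dst, clear its
 * flag, and return its index; otherwise return -1. */
static int poll_once(dma_view_t *v, int32_t *dst)
{
    assert(v != NULL && dst != NULL);

    uint32_t st = v->status;
    for (int i = 0; i < 2; ++i) {
        uint32_t bit = (i == 0) ? STATUS_BUF0_READY : STATUS_BUF1_READY;
        if (st & bit) {
            for (size_t k = 0; k < BUF_WORDS; ++k) {
                dst[k] = v->buf[i][k];
            }
            /* Simulation only: a real status register would more likely
             * be acknowledged with a write-1-to-clear access. */
            v->status = st & ~bit;
            return i;
        }
    }
    return -1;
}
```

The same function body could be called from an interrupt handler instead of a loop. On real hardware the buffers in DDR would also need to be kept coherent with the CPU caches, for example through a coherent port or explicit cache maintenance, which this sketch leaves out.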

Note that I'm assuming this targets a massively multi-channel system, or something requiring an impressive convolution length, or is meant as an exercise in how to build such a system (not necessarily to actually build it); audio-rate convolution is not really hard for normal PC-style processors. That's why I recommend "large data volume" techniques instead of "oh yeah, that rate is low enough that you can directly MMIO-query it in a software loop".

Marcus Müller
  • no, I don't mean that. – Marcus Müller Apr 26 '22 at 17:00
  • Thank you @MarcusMüller, so if I understand correctly, the best approach would be to use a DMA controller and write directly into RAM; that makes a lot of sense. Maybe you could suggest a good book for beginners that explains these kinds of high-level concepts associated with FPGA programming? – leopicchio Apr 27 '22 at 15:56
  • no, sorry, all I know comes from reading source code and talking to its developers. – Marcus Müller Apr 27 '22 at 16:21