HLS: Unrolling the loop manually and function latency constraints

Question

I have a TOP-level function of the following structure:

struct TYPE1    {uint8 ch[16];};
struct TYPE2    {uint8 ch[100];};

void FUNCT(hls::stream<TYPE1> &inStream, hls::stream<TYPE2> &outStream){
#pragma HLS INTERFACE axis port=inStream
#pragma HLS INTERFACE axis port=outStream
#pragma HLS DATA_PACK variable=outStream struct_level
#pragma HLS DATA_PACK variable=inStream struct_level

TYPE1 inpx;
#pragma HLS ARRAY_PARTITION variable=inpx.ch complete dim=1
TYPE2 outpx;
#pragma HLS ARRAY_PARTITION variable=outpx.ch complete dim=1

inpx = inStream.read();
L0:    for (int i = 0; i < 100; i++) {
L1:        for (int cha = 0; cha < 16; cha++) {
              acc[i] += inpx.ch[cha] * y;
           }
           // do more stuff
           outpx.ch[i] = x;   // write temp variable
       }

outStream.write(outpx);
}

This top-level function receives a stream of pixels and should process one pixel per function call. A new pixel arrives every 528 clock cycles, so the function has 528 clock cycles to work on each pixel. I would therefore like to constrain the function latency to no more than 528 clock cycles, while using as few resources as possible. Since loop L0 has 100 iterations, each iteration needs to finish within ~5 clock cycles if the iterations execute sequentially, so I do not need to unroll L0. With these requirements, I added the following directives:

#pragma HLS LATENCY min=500 max=528      // directive for FUNCT
#pragma HLS UNROLL factor=1              // directive for L0 loop

However, the synthesized design has a function latency of over 3000 cycles, and the log shows the following warning:

WARNING: [SCHED 204-71] Latency directive discarded for region FUNCT since it contains subloops.
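
A rough cycle budget shows why leaving both loops rolled cannot fit in 528 cycles. The sketch below uses hypothetical helper names and assumes, purely for illustration, that every L1 multiply-accumulate costs at least one clock cycle when executed sequentially; the real cost depends on how the tool schedules the operations.

```cpp
#include <cassert>

// Hypothetical helper: cycles available to each L0 iteration when the
// iterations run one after another inside the given budget.
int cycles_per_iteration(int budget_cycles, int iterations) {
    assert(iterations > 0);
    return budget_cycles / iterations;  // integer division rounds down
}

// Hypothetical helper: total multiply-accumulates executed when neither
// L0 nor L1 is unrolled.
int sequential_macs(int l0_iterations, int l1_iterations) {
    return l0_iterations * l1_iterations;
}

// Hypothetical helper: how many multiply-accumulates must run in parallel
// so that one L0 iteration fits in the cycles available (rounded up).
int min_parallel_macs(int macs_per_iteration, int cycles_available) {
    assert(cycles_available > 0);
    return (macs_per_iteration + cycles_available - 1) / cycles_available;
}
```

With the numbers from the question, `cycles_per_iteration(528, 100)` is 5, `sequential_macs(100, 16)` is 1600, and `min_parallel_macs(16, 5)` is 4. Under the stated assumption, the fully rolled design already needs more than 528 cycles, and the parallelism has to come from L1 rather than L0.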

Q1: How do I place the latency constraint on the function while preserving the loops? Would it make sense to manually unroll the loop by writing all the operations consecutively as below?

{ // manually unrolled L0 and L1
acc[0] = 0; acc[0] += inpx.ch[0] * x; acc[0] += inpx.ch[1] * y; acc[0] += inpx.ch[2] * z; ........ acc[0] += inpx.ch[15] * zz;     do more operations on acc[0]
acc[1] = 0; acc[1] += inpx.ch[0] * x; acc[1] += inpx.ch[1] * y; acc[1] += inpx.ch[2] * z; ........ acc[1] += inpx.ch[15] * zz;     do more operations on acc[1]
.........
acc[99] = 0; acc[99] += inpx.ch[0] * x; acc[99] += inpx.ch[1] * y; acc[99] += inpx.ch[2] * z; ........ acc[99] += inpx.ch[15] * zz;     do more operations on acc[99]
}

Q2: Does HLS have a limit on how many operations can be written on a single line? Will it have trouble parsing or compiling the source code if I write out the operations in place of loops with, say, 1000 iterations?

Sidenote: what kind of loops L0/L1 are? I'm not familiar with this syntax. — haggai_e, Dec 28 '19 at 08:34
Why not just unroll the L1 loop? Doesn't that get you the latency you need? — haggai_e, Dec 28 '19 at 08:35
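
The suggestion in the comment above could look roughly like the sketch below: L0 stays rolled and only L1 is fully unrolled with `#pragma HLS UNROLL`. The function name `process_pixel`, the `coeff` table, the scalar `acc`, and the final truncation are assumptions standing in for the parts the question leaves out (`x`, `y`, "do more stuff"). The stream interface is omitted so the code also compiles with an ordinary C++ compiler, which ignores the HLS pragma. Whether this meets the 528-cycle budget has not been checked and depends on how the tool schedules the unrolled operations.

```cpp
#include <cstdint>

typedef std::uint8_t uint8;  // assumption: uint8 in the question is an 8-bit unsigned type

struct TYPE1 { uint8 ch[16]; };
struct TYPE2 { uint8 ch[100]; };

// Hypothetical processing core: one call handles one pixel.
// coeff is a hypothetical coefficient table replacing the placeholders x, y, ...
void process_pixel(const TYPE1 &inpx, const uint8 coeff[100][16], TYPE2 &outpx) {
L0: for (int i = 0; i < 100; i++) {
        // L0 is left rolled: its iterations execute one after another.
        unsigned acc = 0;  // simplification of acc[i] from the question
L1:     for (int cha = 0; cha < 16; cha++) {
#pragma HLS UNROLL
            // With L1 fully unrolled, the 16 multiply-accumulates become
            // straight-line code that the scheduler may run in parallel.
            acc += inpx.ch[cha] * coeff[i][cha];
        }
        // Placeholder for "do more stuff": keep only the low 8 bits.
        outpx.ch[i] = static_cast<uint8>(acc);
    }
}
```

Because L1 no longer exists as a loop after unrolling, the result is equivalent to writing the 16 operations out by hand for each `i`, without having to maintain the expanded source.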
