
For a school project I'm trying to implement an equation like this one, for example:

B = ((A + 2) * |A - 10|) / (c * c)

All values are unsigned binary, and the absolute value is always taken. The equation has to be evaluated 57,600 times per second, once for each pixel of a 240x240 image.

I don't know where to start. Would it be better to build a MIPS processor and load the program into it as a list of assembly instructions?

Or should I implement the equation directly in HDL? If so, what methodology should I follow? Should I use an FSM? Should I use clocks?

I tried implementing it as plain combinational logic (`assign` statements and so on) and it works, but it uses almost 80% of the available ALMs. I don't think this is the best way. I want to use as little hardware as possible; time is not a constraint. I'm using Quartus II and Verilog.

Martin Zabel
sujeto1
  • Do the math, discard the functions you don't need to implement, the rest should be easy. – Lior Bilia Feb 27 '16 at 14:21
  • `|A-10|` implies an intermediate signed value, so be careful saying everything is unsigned. Might bite you in a future implementation. – jippie Feb 27 '16 at 14:23
  • @jippie I know that. The input data is unsigned, so I'm guessing I have to convert it to a signed ("negative") binary representation; that's why I wonder which design method I should follow. If I simulate a MIPS I would have to run it through two's-complement instructions and so on. – sujeto1 Feb 27 '16 at 14:26
  • @LiorBilia what do you mean by that? In fact, I need the design to calculate it. Is this feasible, or is it nonsense? – sujeto1 Feb 27 '16 at 14:31
  • Depending on the size of the variables, a (partial) lookup table may be viable. Dividing is probably very expensive, multiplying somewhat expensive. Carefully think about what `n / c * c` actually does. – jippie Feb 27 '16 at 14:36
  • The variables are 64 bits and 32 bits :( I know the algorithm for dividing; in fact I have created a Verilog module for it. The algorithm I'm working on is more complicated than the one shown in my question, which was only an example. I want to know how I can schedule the work, for example: first (A + 2), (A - 10) and CxC, then multiply the results, THEN divide by (CxC). I'm only guessing that if I manage to schedule these as separate events, the synthesis tool would reuse the resources of the earlier multiplication for the division? – sujeto1 Feb 27 '16 at 14:41
  • The key question here is: How frequently do you need to evaluate this equation? Once a second? Millions of times per second? Billions of times per second? The answer will indicate what kind of resources you need to devote to the problem. – Dave Tweed Feb 27 '16 at 14:48
  • @Dave thanks for pointing that out. It should evaluate 57,600 unsigned numbers in total per second, because every number represents the intensity of one pixel in a black-and-white 240x240 picture. XP – sujeto1 Feb 27 '16 at 14:53
  • OK, that's useful information that should probably be in your question. Are you only processing one frame per second? I'm guessing that only one of the variables in the equation is the pixel intensity, and the rest are constant for the entire frame. If so, the work should be divided into two parts: That which can be done just once per frame, and that which needs to be done separately for each pixel. – Dave Tweed Feb 27 '16 at 16:19
  • If these are monochrome pixel values, then your values would probably be small, for instance 8-bits. If so, \$c^2\$ could easily be done through lookup tables. If \$c\$ is indeed a fraction of some kind, fixed-point math can help. – Pål-Kristian Engstad Feb 27 '16 at 23:36
  • Could you please post the Verilog code for your equation? – Martin Zabel Feb 28 '16 at 07:48

1 Answer


Depending on what you want to learn, there are many approaches possible.

You say the fully parallel combinatorial design works, and fits into your FPGA. Result! Many students would stop there and write it up. However, it sounds like you feel that this is not in the spirit of the project.

Creating your own processor design from scratch would be a project 100x the size of what you are attempting, for a general purpose core at least. Using an existing VHDL processor core would perhaps be too easy? Designing an ALU with just the instructions needed for these calculations still sounds quite a large detour.

The first place I would look to start serialising the design is that divide by c squared. Division is an operation that's very expensive or impossible to do as full-width lookup tables. Bit-wise shift-subtract is perhaps the mainstream way. Look up CORDIC as an alternative way of mechanising it. You may also want to consider byte- or nybble-wide shift and subtract as an alternative method of implementation, with a latency and resource use somewhere between the two previous methods.
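
To make the bit-wise shift-subtract idea concrete, here is a minimal behavioural sketch in Python rather than Verilog, intended only as a reference model to check an RTL version against. Each loop iteration stands for one clock cycle of a sequential divider. The function names, the default `width` of 32 and the truncating (round-down) quotient are assumptions for illustration, not something stated in the question.

```python
def shift_subtract_divide(dividend, divisor, width=32):
    """Restoring division: one quotient bit per iteration (one clock in hardware)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    dividend &= (1 << width) - 1      # model a fixed-width input register
    remainder = 0                     # partial-remainder register
    quotient = 0                      # quotient shift register
    for i in range(width - 1, -1, -1):
        # Shift the next dividend bit (MSB first) into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # Trial subtraction: accept it only if the result stays non-negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


def evaluate_pixel(a, c, width=32):
    """B = ((A + 2) * |A - 10|) / (c * c), assuming truncating division."""
    # Compare first so |A - 10| never needs a signed intermediate value.
    magnitude = a - 10 if a >= 10 else 10 - a
    numerator = (a + 2) * magnitude
    quotient, _ = shift_subtract_divide(numerator, c * c, width)
    return quotient


print(shift_subtract_divide(100, 7, width=8))  # (14, 2)
```

In hardware the loop body becomes one subtractor, one comparator and two shift registers, so the cost is roughly one adder's worth of logic and `width` clock cycles per result, instead of a full combinational divider.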

Maybe you could look at implementing serial arithmetic as an exercise, on the grounds of saving space. Hold the variables in shift registers, and cycle them through a one-bit ALU plus carry, LSB first. There are all sorts of interesting state machine issues to solve.
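
As a rough illustration of the serial-arithmetic idea, the following Python sketch models a one-bit full adder with a single carry flip-flop, fed LSB first. It is a behavioural model only; the function name `serial_add`, the `carry_in` parameter and the 8-bit default width are assumptions made for this example.

```python
def serial_add(a, b, width=8, carry_in=0):
    """Add two unsigned values one bit per iteration (one clock), LSB first."""
    carry = carry_in                  # models the single carry flip-flop
    result = 0                        # models the result shift register
    for i in range(width):
        bit_a = (a >> i) & 1          # bit leaving operand A's shift register
        bit_b = (b >> i) & 1          # bit leaving operand B's shift register
        sum_bit = bit_a ^ bit_b ^ carry
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
        result |= sum_bit << i
    return result, carry              # final carry is the carry-out


def serial_sub(a, b, width=8):
    """a - b using the same adder: invert b and preset the carry to 1."""
    mask = (1 << width) - 1
    # A carry-out of 1 means no borrow (a >= b); 0 means the result wrapped.
    return serial_add(a, ~b & mask, width, carry_in=1)


print(serial_add(200, 100))  # (44, 1): 300 wraps to 44 with carry-out set
print(serial_sub(10, 3))     # (7, 1): no borrow
```

The same one-bit cell handles both addition and subtraction, which is why the state machine around it (loading the shift registers, presetting the carry, counting `width` clocks) ends up being the interesting part of the design.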

Neil_UK