
I'm developing a physics simulation, and as I'm rather new to programming, I keep running into problems when producing large programs (memory issues mainly). I know about dynamic memory allocation and deletion (new / delete, etc), but I need a better approach to how I structure the program.

Let's say I'm simulating an experiment which is running for a few days, with a very large sampling rate. I'd need to simulate a billion samples, and run over them.

As a super-simplified version, we'll say a program takes voltages V[i], and sums them in fives:

i.e. NewV[0] = V[0] + V[1] + V[2] + V[3] + V[4]

then NewV[1] = V[1] + V[2] + V[3] + V[4] + V[5]

then NewV[2] = V[2] + V[3] + V[4] + V[5] + V[6] ...and this goes on for a billion samples.

In the end, I'd have V[0], V[1], ..., V[1000000000], when instead the only ones I'd need to store for the next step are the last 5 V[i]s.
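
In code, the naive version I have in mind looks roughly like the sketch below (getSample() is just a stand-in for however a sample actually gets produced):

#include <cstddef>
#include <vector>

// stand-in for however a real sample would be read or generated
double getSample() { return 0.0; }

int main() {
    const std::size_t N = 1000000000;   // a billion samples
    std::vector<double> V(N);           // ~8 GB of doubles: this is where memory runs out
    for (std::size_t i = 0; i < N; ++i)
        V[i] = getSample();

    std::vector<double> NewV(N - 4);    // another ~8 GB
    for (std::size_t i = 0; i + 4 < N; ++i)
        NewV[i] = V[i] + V[i+1] + V[i+2] + V[i+3] + V[i+4];
}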

How would I delete / deallocate part of the array so that the memory is free to use again (say V[0] after the first part of the example where it is no longer needed)? Are there alternatives to how to structure such a program?

I've heard about malloc / free, but also that they should not be used in C++ and that there are better alternatives.

Thanks very much!

tldr; what to do with parts of arrays (individual elements) I don't need anymore that are taking up a huge amount of memory?

Drummermean
  • You can't deallocate part of an array. You could re-allocate it to a smaller array somewhere else, but this may prove to be expensive. You could use a different data structure such as a linked list instead, possibly. Maybe you could also store the steps into `V` instead of in a new array. Fundamentally, though, I think your issue is either in your algorithms or your data structures, and since we don't have any details, it's hard to know how to do it efficiently. – Vincent Savard Jan 24 '17 at 13:37
  • Side note: SMA's of arbitrary length can be calculated particularly fast with this recurrence relation: NewV[n] = NewV[n-1] - V[n-1] + V[n+4] (your notation). But keep in mind that these aren't particularly useful filters. Their frequency response is a sinc, which is pretty much never what you want (really high sidelobes). – Steve Cox Jan 24 '17 at 16:03
  • SMA = simple moving average, for anyone wondering. – Charles Jan 24 '17 at 17:00
  • @SteveCox, the way he wrote it, he's got an FIR filter. Your recurrence is the equivalent IIR form. Either way, you get to maintain a circular buffer of the last N readings. – John R. Strohm Jan 24 '17 at 18:06
  • @JohnR.Strohm the impulse response is identical, and finite – Steve Cox Jan 24 '17 at 21:43
  • As a side note, you might be much more productive using for example Python, which is commonly used for quick&dirty number crunching (you can learn it usably in a few days, as opposed to C++ where even very experienced developers keep shooting themselves in the foot). Check out *NumPy*, too. – hyde Jan 25 '17 at 12:17
  • I'm not sure what you're doing - for example, why do you calculate V[0] to V[999999995] if you're not gonna keep them - but one thing to bear in mind is due to gaming's influence on modern PC design, PCs are mind-meltingly fast at allocating, de-allocating, and copying vast amounts of data **if it's in contiguous arrays**, to the extent that linked lists are basically redundant. There's a talk on YouTube where Bjarne Stroustrup explains it. So, don't worry too much about allocating and copying vast amounts of data, as long as you're keeping it in a vector. Oh yeah, **don't use arrays, use vectors**. – Grimm The Opiner Jan 25 '17 at 12:55
  • From a beginner perspective, one can prototype with `std::unordered_map`, and explicitly free the values that are no longer needed. This allows you to print out the buffer's states, and help debug your draft version of code. Circular buffers should be easy (and really *shouldn't* rely on `unordered_map`), but sometimes there are things more complicated than circular buffers (when upsampling/downsampling is involved) that need more debuggability. – rwong Jan 25 '17 at 14:06
  • @GrimmTheOpiner, what he's describing is a very primitive "kinda sort low-pass filter" on his data. For every input data point, there is a corresponding smoothed data point. He can read the data points from a disk file, and write the smoothed data points to another one, and not actually need a monster array in memory to hold all of them. – John R. Strohm Jan 25 '17 at 20:47
  • @SteveCox, an IIR filter has the form y_i+1 = linear_combination_of(y_i, y_i-1, ...) + linear_combination_of(x_i+1, x_i, x_i-1, ...). An FIR filter has the form y_i+1 = linear_combination_of(x_i+1, x_i, x_i-1, ...). Your recurrence relation has the form of an IIR filter, although I agree that it has a finite impulse response. That point, about the form, and about the necessity of maintaining the x_i history, was what I was trying to elaborate. – John R. Strohm Jan 25 '17 at 20:52
  • @JohnR.Strohm again you're being sloppy with terminology. IIR filters require feedback, but feedback does not necessarily imply IIR. (Remember there are many possible realizations for the same impulse response; Oppenheim refers to two of them as direct form I and direct form II.) – Steve Cox Jan 25 '17 at 21:47
  • @JohnR.Strohm Fair enough. : ) My points still hold - don't be a-feared of big arrays, and use vectors instead of managing them yourself. If the original points are to be read from a file he should take care to open it once; it's quite easy to accidentally write code that opens the file on every read - a Friday afternoon kinda thing. – Grimm The Opiner Jan 26 '17 at 08:11
  • @SteveCox, consider saturation. If the recurrence form saturates at any time, it will "remember" the saturation on the next step. IIR. The pure FIR form does not have that issue. – John R. Strohm Jan 26 '17 at 15:23

3 Answers


What you describe, "smoothing by fives", is a finite impulse response (FIR) digital filter. Such filters are implemented with circular buffers. You keep only the last N values, you keep an index into the buffer that tells you where the oldest value is, you overwrite the current oldest value with the newest one at each step, and you step the index, circularly, each time.
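
A minimal sketch of that idea in C++, for the window of five (MovingSum5 and its push() are just illustrative names, not anything from a library):

#include <array>
#include <cstddef>

// Fixed circular buffer holding only the last 5 samples.
class MovingSum5 {
public:
    // Store one new sample (overwriting the oldest) and return the sum of
    // the last 5 samples seen. The first few results include the
    // zero-initialized slots until the buffer has filled up.
    double push(double sample) {
        buffer[index] = sample;                 // overwrite the oldest value
        index = (index + 1) % buffer.size();    // step the index circularly
        double sum = 0.0;
        for (double v : buffer) sum += v;       // direct FIR sum over the window
        return sum;
    }

private:
    std::array<double, 5> buffer{};  // only 5 values ever live in memory
    std::size_t index = 0;
};

Feed it one sample at a time as you read them from disk and write each returned sum straight to the output file; memory use stays constant no matter how many samples you process.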

You keep your collected data, that you are going to crunch down, on disk.

Depending on your environment, this may be one of those places where you're better off getting experienced help. At a university, you put a note up on the bulletin board in the Computer Science Department, offering student wages (or even student consulting rates) for a few hours of work, to help you crunch your data. Or maybe you offer Undergraduate Research Opportunity points. Or something.

John R. Strohm
  • A circular buffer indeed seems to be what I'm looking for! I've now installed the Boost C++ libraries and included boost/circular_buffer.hpp, and it's working as expected. Thanks, @John – Drummermean Jan 24 '17 at 15:27
  • Only very short FIR filters are implemented in direct form in software, and SMA's almost never are. – Steve Cox Jan 24 '17 at 15:55
  • @SteveCox: The edges-of-window formula you used is quite effective for integer and fixed-point filters, however it's incorrect for floating-point, where addition is not associative. – Ben Voigt Jan 24 '17 at 20:10
  • @BenVoigt i think you meant to respond to my other comment, but yes, that form introduces a limit cycle around a quantization which can be very tricky. thankfully though, this particular limit cycle happens to be stable. – Steve Cox Jan 24 '17 at 21:45
  • You don't really need Boost for a circular buffer for that use. You will be using much more memory than needed. – CoffeDeveloper Jan 25 '17 at 10:00

Every problem can be solved by adding an additional level of indirection. So do that.

You can't delete part of an array in C++. But you can create a new array holding just the data you want to keep, then delete the old one. So you can build a data structure that allows you to "remove" elements you don't want from the front. What it will actually do is create a new array and copy the unremoved elements to the new one, then delete the old.
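
For instance, with a std::vector the "remove from the front" step could look roughly like this sketch (dropFront is a made-up helper, not a standard function):

#include <cstddef>
#include <vector>

// Keep only the elements from index 'first' onwards (assumes first <= v.size())
// by copying them into a new vector and swapping it in; the old, larger buffer
// is freed when 'kept' goes out of scope.
void dropFront(std::vector<double>& v, std::size_t first) {
    std::vector<double> kept(v.begin() + first, v.end());
    v.swap(kept);
}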

Or you could just use std::deque, which can effectively do this already. deque, or "double-ended queue", is a data structure intended for cases where you're deleting elements from one end while adding elements to the other.
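
A sketch of how that looks for the window of five (slideAndSum is an illustrative helper, not part of the standard library):

#include <deque>

// Push the newest sample onto the back of the window, drop the oldest from
// the front once the window holds more than 5, and return the current sum.
double slideAndSum(std::deque<double>& window, double newSample) {
    window.push_back(newSample);
    if (window.size() > 5)
        window.pop_front();
    double sum = 0.0;
    for (double v : window) sum += v;
    return sum;
}

deque keeps this cheap because both push_back and pop_front are constant-time operations.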

Nicol Bolas
  • _Every problem can be solved by adding an additional level of indirection_ ... except to many levels of indirection. – YSC Jan 24 '17 at 15:42
  • @YSC: and spelling :) – Lightness Races in Orbit Jan 24 '17 at 17:35
  • for this particular problem `std::deque` is the way to go – davidbak Jan 25 '17 at 00:25
  • @davidbak -- What? There's no need to be constantly allocating and releasing memory. A fixed size circular buffer that is allocated once at initialization time is a much better fit to this problem. – David Hammen Jan 25 '17 at 02:45
  • @DavidHammen: Perhaps, but 1) The standard library doesn't have a "fixed-size circular buffer" in its toolkit. 2) If you really need such an optimization, you can do some allocator stuff to minimize reallocations through `deque`. That is, storing and re-using allocations as requested. So `deque` seems a perfectly adequate solution to the problem. – Nicol Bolas Jan 25 '17 at 04:36
  • I'd argue that a custom allocator is way more work than a fixed-size circular buffer, which is pretty straightforward to implement. – Erbureth Jan 25 '17 at 11:44
  • @DavidHammen I think typical implementations of std::deque actually end up behaving as a circular buffer in this situation. – jpa Jan 25 '17 at 13:31
  • @jpa -- Except you are constantly allocating and deallocating. Those are very expensive operations. There is no need to do that. Allocate once and be done with it. A std::vector, some logic for startup, and an iterator or index to the oldest element are all that are needed. – David Hammen Jan 25 '17 at 14:13
  • @DavidHammen Hmm, as far as I know, std::deque does not deallocate every time you delete an element, and some googling supports my knowledge. IMO it makes sense, the same way that std::vector doesn't reallocate every time you add or delete an element. C++11 even has a separate shrink_to_fit() function. – jpa Jan 25 '17 at 18:03

The FIR and SMA answers you got are good in your case; however, I'd like to take the opportunity to push forward a more generic approach.

What you have here is a stream of data: instead of structuring your program in 3 big steps (get data, compute, output result) which require loading all data in memory at once, you can instead structure it as a pipeline.

A pipeline starts with a stream, transforms it, and pushes it to a sink.

In your case, the pipeline looks like:

  1. Read items from disk, emit items one at a time
  2. Receive items one at a time, for each item received emit the last 5 received (where your circular buffer comes in)
  3. Receive items 5 at a time, for each group compute the result
  4. Receive the result, write it to disk

C++ tends to use iterators rather than streams, but to be honest streams are easier to model (there is a proposal for ranges which would be similar to streams):

#include <boost/optional.hpp>
#include <cstddef>
#include <fstream>
#include <string>

// A pull-based stream: next() yields the next element, or an empty
// optional once the stream is exhausted.
template <typename T>
class Stream {
public:
    virtual boost::optional<T> next() = 0;
    virtual ~Stream() {}
};

// Item, Window and Result are the domain types (definitions not shown).

// Reads raw items from a file, one at a time.
class ReaderStream: public Stream<Item> {
public:
    explicit ReaderStream(std::string const& filename);
    boost::optional<Item> next() override final;

private:
    std::ifstream file;
};

// Receives items one at a time, emits the sliding window of the last N items.
class WindowStream: public Stream<Window> {
public:
    WindowStream(Stream<Item>& items, std::size_t size);
    boost::optional<Window> next() override final;

private:
    Window window;
    Stream<Item>& items;
    std::size_t size;
};

// Receives windows, emits one computed result per window.
class ResultStream: public Stream<Result> {
public:
    explicit ResultStream(Stream<Window>& windows);
    boost::optional<Result> next() override final;

private:
    Stream<Window>& windows;
};

And then, the pipeline looks like:

ReaderStream itemStream("input.txt");
WindowStream windowStream(itemStream, 5);
ResultStream resultStream(windowStream);
std::ofstream results("output.txt", std::ios::binary);

while (boost::optional<Result> result = resultStream.next()) {
    results << *result << "\n";
}

Streams are not always applicable (they do not work when you need random access to the data), but when they are, they rock: by operating on a very small amount of memory you keep it all in CPU cache.


On another note: it seems like your problem might be "embarrassingly parallel". You may want to split your big file into chunks (keep in mind, for processing by windows of 5, that you need 4 common elements at each boundary) and then process the chunks in parallel.

If CPU is the bottleneck (and not I/O), you can speed things up by splitting the file into roughly equal parts and launching one process per core.
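
As a rough sketch of the chunking idea, here with threads over data that is already in memory rather than separate processes over file pieces (processChunk and processInParallel are illustrative names):

#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Each output needs samples[i] .. samples[i+4], so neighbouring chunks
// overlap by 4 elements at their boundary. 'out' must hold at least
// samples.size() - 4 elements.
void processChunk(const std::vector<double>& samples,
                  std::size_t begin, std::size_t end,
                  std::vector<double>& out) {
    for (std::size_t i = begin; i < end && i + 4 < samples.size(); ++i)
        out[i] = samples[i] + samples[i+1] + samples[i+2]
               + samples[i+3] + samples[i+4];
}

void processInParallel(const std::vector<double>& samples,
                       std::vector<double>& out, unsigned nThreads) {
    std::vector<std::thread> workers;
    const std::size_t chunk = samples.size() / nThreads;
    for (unsigned t = 0; t < nThreads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t + 1 == nThreads) ? samples.size() : begin + chunk;
        workers.emplace_back(processChunk, std::cref(samples),
                             begin, end, std::ref(out));
    }
    for (auto& w : workers) w.join();   // wait for every chunk to finish
}

Each worker writes to a disjoint range of the output, so no locking is needed; only the reads overlap at chunk boundaries.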

Matthieu M.