Sharing a data class in a flow

Question

Suppose there is a flow of function calls in C++:

step1();
step2();
step3();
step4();
step5();

and they interact by adding and modifying data in a data class D (only data, no functions). For example, step1() creates a data member d in D, step2() modifies d, and step5() uses d.

The problem with this design is that, inside a given step function, it is not clear whether a particular data member of D has already been produced and is ready to use. Moreover, it is not clear in which step each data member of D is produced.

I was wondering if there is a good way (maybe some design pattern) to make the data class D more powerful such that it is clear whether a data member in D has been produced by a given step. Thanks.

Does this answer your question? [How to make sure people call methods in the right order?](https://softwareengineering.stackexchange.com/questions/294455/how-to-make-sure-people-call-methods-in-the-right-order) — gnat, Dec 16 '22 at 22:21
Not really, the steps in my case may not have outputs, and the flow is really the skeleton in the template method pattern. — tqw, Dec 16 '22 at 22:31
`I was wondering if there is a good way (maybe some design pattern) to make the data class D more powerful such that it is clear whether a data member in D has been produced by a given step` to what end? What are you trying to achieve? Is it for tracking and debugging, maybe? — Laiv, Dec 17 '22 at 09:54

score 1 · Accepted Answer · answered Dec 17 '22 at 12:31

It's not exactly clear what's going on in your example. If you're building data production-line style, you'd typically have something like this, passing by reference:

DataStruct d;
step1(d);
step2(d);
step3(d);
step4(d);
step5(d);

In terms of understanding which stage contributes which field(s) to the structure, in the trivial case it might be easiest just to inspect the code within each function. But if there is a need for real organisation and clarity, you would typically start to reflect it either in the naming scheme of the fields, or by defining multiple structures that represent the final structure at different stages of production.

So with a naming scheme, you might have:

struct DataStruct {
    int S1Field1;
    int S1Field2;
    int S1Field3;
    int S2Field4;
    int S3Field5;
    // (etc.)
};

...with the prefix representing the stage at which that field is built.
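To make the convention concrete, here is a minimal sketch with hypothetical fields and values (`S1Field1`, `S1Field2`, `S2Field3` and the functions `step1`, `step2`, `runPipeline` are illustrative names, not from the question). Each step writes only fields carrying its own prefix and reads only fields with an earlier prefix, so a step that reaches for a later-stage field stands out when reading the code. The compiler does not enforce any of this; it is purely a naming convention.

```cpp
// Hypothetical fields; the prefix names the step that produces each one.
struct DataStruct {
    int S1Field1 = 0;
    int S1Field2 = 0;
    int S2Field3 = 0;
};

// step1 writes only S1* fields.
void step1(DataStruct& d) {
    d.S1Field1 = 2;
    d.S1Field2 = 3;
}

// step2 reads S1* fields (produced earlier) and writes only S2* fields.
void step2(DataStruct& d) {
    d.S2Field3 = d.S1Field1 * d.S1Field2;
}

// Runs the steps in order, passing the same structure by reference,
// and returns the field produced by step2.
int runPipeline() {
    DataStruct d;
    step1(d);
    step2(d);
    return d.S2Field3;
}
```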

If that's insufficient for whatever reason, then you can start defining separate structures:

struct DataStructBuild1 {
    int Field1; //new
    int Field2; //new
    int Field3; //new
};

struct DataStructBuild2 {
    int Field1; //from Build 1
    int Field2; //from Build 1
    int Field3; //from Build 1
    int Field4; //new
};
// (etc.)

DataStructBuild1 d1;
DataStructBuild2 d2;
DataStructBuild3 d3;

// e.g. void step2(const DataStructBuild1& in, DataStructBuild2& out);
step1(d1);      // fills d1
step2(d1, d2);  // reads d1, fills d2
step3(d2, d3);  // reads d2, fills d3

// (etc.)

Clearly, with this latter approach it's impossible for earlier steps to refer to fields that aren't built yet, because the structure each step is given to modify contains no unbuilt fields: it contains only the available inputs and the receptacles for the output of that stage.
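A self-contained sketch of the staged-structure idea follows. It swaps the output parameters for return values (each step takes the previous stage by `const` reference and returns the next stage), which is a common C++ alternative rather than what the answer prescribes; the field values and the three-stage layout are hypothetical. If `step2` tried to read `in.Field3`, the code would not compile, because `DataStructBuild1` has no such member.

```cpp
// Hypothetical stage structures: each one holds only fields that exist by then.
struct DataStructBuild1 {
    int Field1;  // new
    int Field2;  // new
};

struct DataStructBuild2 {
    int Field1;  // from Build 1
    int Field2;  // from Build 1
    int Field3;  // new in step 2
};

struct DataStructBuild3 {
    int Field1;  // from Build 1
    int Field2;  // from Build 1
    int Field3;  // from Build 2
    int Field4;  // new in step 3
};

// step1 has no inputs and produces the first stage.
DataStructBuild1 step1() {
    return DataStructBuild1{2, 3};
}

// step2 can only see stage-1 fields; reading in.Field3 here would not compile.
DataStructBuild2 step2(const DataStructBuild1& in) {
    return DataStructBuild2{in.Field1, in.Field2, in.Field1 + in.Field2};
}

// step3 copies everything built so far and adds its own field.
DataStructBuild3 step3(const DataStructBuild2& in) {
    return DataStructBuild3{in.Field1, in.Field2, in.Field3, in.Field3 * 10};
}
```

Calling it looks like `DataStructBuild3 d3 = step3(step2(step1()));`, so the types also enforce the order of the steps.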

It might also be useful to combine both approaches (separate structures for each stage, plus a naming prefix representing the stage at which each field becomes available) if the number of fields is large and there is a desire to keep careful track of which fields are already built and which are still being built.

And if necessary, you can define a final structure with a clean naming scheme free of the prefixes. The prefixes are only scaffolding that makes the building process easier to follow; they don't need to be carried into other parts of the program later.
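A minimal sketch of that final cleanup step, assuming a hypothetical last stage that still carries the prefixes (`S1Name`, `S2Count`, `S3Total`) and a hypothetical clean result type `FinalData`. The `finalize` function is the only place that needs to know both naming schemes.

```cpp
#include <string>

// Hypothetical last build stage, still using the stage prefixes as scaffolding.
struct DataStructBuild3 {
    std::string S1Name;
    int S2Count;
    double S3Total;
};

// Clean structure handed to the rest of the program, without the prefixes.
struct FinalData {
    std::string name;
    int count;
    double total;
};

// Final mapping step: strips the scaffolding names, keeps the values.
FinalData finalize(const DataStructBuild3& built) {
    return FinalData{built.S1Name, built.S2Count, built.S3Total};
}
```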

I should add one final remark: @Christophe's answer involving OOP would be more appropriate where the building stage for the data is so large or complicated that it falls under the purview of multiple programming teams (or at least multiple individuals with separate areas of concern). In that situation you need getters and setters defining an interface between the two teams' realms, exceptions that warn people who didn't write the code and aren't familiar with its innards when they use it improperly, and so on.

When you're not working at the interface between two teams, but working entirely within your own scope, OOP is usually a very complicated and heavyweight approach to solving problems.

Christophe · Answer 2 · answered Dec 17 '22 at 10:58

The current design seems to be data-driven and lacks encapsulation. It’s the OO flavour of spaghetti code, since it is difficult to see which function updated which data, and when.

One solution could be to manage the state of D:

As a first refactoring step, you could add state information to D (e.g. an enum member if it’s about sequential states, or a bitset member if it’s about parallel states such as the availability of individual elements of D). The steps would then have to update the state voluntarily, and consult it to check that D is usable. This is, however, still very error-prone.
As a second refactoring step, you could encapsulate the data in D and protect it behind getters and setters that update the state automatically (and raise exceptions if something is requested that is not yet available). This way you can make D bulletproof.
As a third step, you could see whether you can evolve toward a “tell, don’t ask” logic. Since D would then have real behavior, you could make that behavior depend on the state using the State pattern. This third refactoring step would be more challenging given the current data-oriented approach. It could, however, decouple your algorithm from your data structure and lead to greater reuse.
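The first two refactoring steps can be sketched roughly as follows. This is a minimal illustration with two hypothetical members (`name` and `count`) and hypothetical steps `step1` and `step5`: a `std::bitset` records which members have been produced, setters mark a member as produced, and getters throw `std::logic_error` if a member is read too early. The third step (the State pattern) is not shown.

```cpp
#include <bitset>
#include <stdexcept>
#include <string>

// Hypothetical D after refactoring steps 1 and 2: private data,
// a bitset tracking which members exist, and checked getters.
class D {
public:
    void setName(const std::string& value) {
        name_ = value;
        produced_.set(kName);
    }
    void setCount(int value) {
        count_ = value;
        produced_.set(kCount);
    }

    // Getters throw if the member has not been produced by any step yet.
    const std::string& name() const {
        require(kName, "name");
        return name_;
    }
    int count() const {
        require(kCount, "count");
        return count_;
    }

    // Lets a step ask whether a member is available without catching exceptions.
    bool hasName() const { return produced_.test(kName); }
    bool hasCount() const { return produced_.test(kCount); }

private:
    enum Member { kName, kCount, kMemberCount };

    void require(Member m, const char* what) const {
        if (!produced_.test(m)) {
            throw std::logic_error(std::string("D::") + what +
                                   " read before it was produced");
        }
    }

    std::string name_;
    int count_ = 0;
    std::bitset<kMemberCount> produced_;  // one bit per tracked member
};

// Hypothetical steps: step1 produces name, step5 consumes count.
void step1(D& d) { d.setName("widget"); }
int step5(const D& d) { return d.count() * 2; }
```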
