I'm developing an application that reads data from different data sources. The data should be pre-processed and then go through a chain of steps (filters?) where it is processed and augmented. Finally, the data should be written to a common database.

I'm thinking of using a Pipe and Filter style to implement this. While learning about it, I came across these invariants of the style [here].

Independent entities
---- Do not share state
---- Have no knowledge of other filters

Transformation
---- Incremental
---- Not dependent on order in the chain

I'm having trouble understanding these. Why are they considered invariants?

What happens if filters share state?
What happens if they have knowledge of other filters?
What if they need to depend on other filters (in my case a pre-processing step is a must), and what does "incremental" mean?

As far as I know, violating the invariants might erode the code over time. So if I use the Pipe and Filter style in my app, what kinds of things would violate these invariants, i.e. what could I do that would break them?

Can anyone help me with this?

prime
  • You might also want to read up on Monads. – BobDalgleish Dec 26 '17 at 18:56
  • To be precise, for pipes and filters, the invariant should be "do not share **mutable** state". If there is some immutable state like a configuration passed into several filters, and they all share it, this does not break the architecture. – Doc Brown Dec 26 '17 at 19:39
  • @DocBrown Can you give some idea of how state could be shared among the filters? As I commented on Telastyn's answer: in my application, people can add custom filters (which are actually scripts that follow a basic pattern), so I need to make sure they don't violate any invariants. Knowing how to violate them might help me catch violations. How can someone share state between filters anyway? (I know it's not a good thing, I just want to know how it's done.) – prime Dec 26 '17 at 19:42
  • @prime: I already gave an example, so what are you asking for? – Doc Brown Dec 26 '17 at 20:25
  • @DocBrown One that breaks the architecture. What you described won't break the architecture, as you mentioned, right? – prime Dec 27 '17 at 04:37
  • Sharing state and "having knowledge" is about dependencies - if one filter shares state with some other, or relies on the implementation details of the other in some way (expects certain things to happen in a specific way, or calls methods on the other filter, etc.), then you can't change or swap out the other filter with something else that does things in a different way, because the first one will not work anymore. That means that the architecture is not flexible, not really modular, etc. And you want to avoid that. – Filip Milovanović Dec 27 '17 at 08:54
  • @prime: Simple. Let's say you build some image processing filters. Filter 1 may do something with the colors of the image, and as a side effect, it writes the image size into a database. Filter 2 expects the image size to be available in that database, otherwise it does not work. Now since there is shared mutable state in that database, you cannot just switch the order in which filters 1 and 2 are applied - architecture broken. – Doc Brown Dec 27 '17 at 09:22
  • @FilipMilovanović: sharing **mutable** state is **not** the same as sharing dependencies, and just because two filters share a dependency (like a library used by both), the architecture can still be intact. – Doc Brown Dec 27 '17 at 09:30
  • @DocBrown: I didn't say anything about sharing a dependency (a piece of code), I was talking about one filter being dependent on the other - when I used the word "dependencies", I didn't mean the nodes (a piece of code, or a library), I meant the arrows/edges (the actual relation between two filters - the way it would be depicted in a UML diagram). – Filip Milovanović Dec 27 '17 at 09:59
  • @FilipMilovanović: that's the problem with this word - "dependency" means different things in different contexts, and specifically here, using this word for an explanation can actually confuse things ;-) And by the way, in several UML diagrams, a dependency arrow will mean exactly "dependency on another component/piece of code/class", not "shared mutable state". – Doc Brown Dec 27 '17 at 10:09
  • @DocBrown: That's a valid point; I cannot edit the comment now, but if it helps any future readers, I would perhaps replace the word "dependencies" with "(implicit or explicit) coupling between the filters" - hopefully that makes more sense. – Filip Milovanović Dec 27 '17 at 10:22
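
Doc Brown's image-filter example above can be sketched in Python. This is only an illustration under stated assumptions: the filter names, the image dictionary, and the `size_db` dictionary (standing in for the database) are all hypothetical and not taken from any real application.

```python
# Hypothetical stand-in for the database mentioned in the comment.
size_db = {}


def adjust_colors(image):
    """Filter 1: changes colors and, as a hidden side effect, records the size."""
    size_db[image["name"]] = (image["width"], image["height"])
    return {**image, "colors": "adjusted"}


def make_thumbnail(image):
    """Filter 2: silently relies on filter 1 having written the size already."""
    width, height = size_db[image["name"]]  # KeyError if filter 1 has not run
    return {**image, "thumbnail": (width // 2, height // 2)}


image = {"name": "cat.png", "width": 800, "height": 600}

# Works only in this order.
ok = make_thumbnail(adjust_colors(image))

# Swapping the order breaks, because the shared mutable state is missing.
size_db.clear()
try:
    adjust_colors(make_thumbnail(image))
    swapped_failed = False
except KeyError:
    swapped_failed = True


# One possible fix: each filter reads only what travels through the pipe.
def adjust_colors_pure(image):
    return {**image, "colors": "adjusted"}


def make_thumbnail_pure(image):
    return {**image, "thumbnail": (image["width"] // 2, image["height"] // 2)}


same_either_way = (
    make_thumbnail_pure(adjust_colors_pure(image))
    == adjust_colors_pure(make_thumbnail_pure(image))
)
```

In the fixed variant everything a filter needs arrives with the data item itself, so the two filters can be swapped or replaced without either one noticing.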

1 Answer

Why are they considered invariants?

Because they're the intrinsic things that define this sort of architecture. If you allow shared state or knowledge of filters, you're no longer doing Pipe and Filter, you're doing something else.

The entire point of this architecture is to create independent, composable, parallelizable streams of work. This allows you to optimize the processing by reordering the filters. It scales well, since the entity processing can be farmed out to many machines. It allows easy development, since each filter can be implemented (and tested, and deployed) in isolation. And since these rules are uniform for all entities and all filters, it allows you to build high-quality tooling around them.
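
As a minimal sketch of what "independent and composable" can look like, here are two filters written as Python generators. The record layout, the filter names, and the `build_pipeline` helper are assumptions made up for illustration; they are not part of any particular framework.

```python
def drop_empty(records):
    """Pass through only records that carry a non-empty message."""
    for record in records:
        if record.get("message"):
            yield record


def add_length(records):
    """Augment each record with its message length, without mutating the input."""
    for record in records:
        yield {**record, "length": len(record.get("message", ""))}


def build_pipeline(*filters):
    """Chain filters; each one only ever sees the stream handed to it."""
    def run(source):
        stream = source
        for f in filters:
            stream = f(stream)
        return stream
    return run


source = [{"message": "disk full"}, {"message": ""}, {"message": "ok"}]
pipeline = build_pipeline(drop_empty, add_length)
result = list(pipeline(source))
```

Each filter consumes one record at a time and yields its output as it goes, which is one reading of "incremental". Neither filter refers to the other, so each can be tested on its own with a plain list as input.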

As soon as you start making exceptions to these rules, you start losing the benefits of having them. If you share state, then your entities cannot be trivially parallelized. If your filters depend on order, you can't optimize them, and you can cause subtle errors due to the implicit dependency.
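
To make the reordering and parallelization claims concrete, here is a small sketch. It assumes three hypothetical per-record filters that hold no state and touch disjoint fields; the names and the record layout are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def normalize(record):
    return {**record, "message": record["message"].strip().lower()}


def tag_source(record):
    return {**record, "source": "syslog"}


def mark_reviewed(record):
    return {**record, "reviewed": True}


def apply_all(record, filters):
    """Run one record through the given filters, in the given order."""
    for f in filters:
        record = f(record)
    return record


filters = [normalize, tag_source, mark_reviewed]
records = [{"message": "  Disk FULL "}, {"message": "OK"}]

expected = [apply_all(r, filters) for r in records]

# Stateless filters on disjoint fields give the same result in every order.
order_independent = all(
    [apply_all(r, order) for r in records] == expected
    for order in permutations(filters)
)

# With no shared state, records can be handed to separate workers as-is.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = list(pool.map(lambda r: apply_all(r, filters), records))
```

If one of these filters kept a counter in a module-level variable or read a value another filter had stored somewhere, neither check would be guaranteed to hold any more, which is the loss of benefits described above.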

Telastyn
  • Things are getting clear now. Thanks for the detailed answer. In my application, people can add custom filters (which are actually scripts that follow a basic pattern), so I need to make sure they don't violate any invariants. Knowing how to violate them might help me catch violations. How can someone share state between filters anyway? (I know it's not a good thing, I just want to know how it's done.) – prime Dec 26 '17 at 19:37
  • @prime - If they're literally scripts (arbitrary code) then there's literally an infinite number of ways they can share state between themselves. Without knowing more, I can't say for sure. – Telastyn Dec 26 '17 at 19:40
  • Well, one script does a single thing. Let's say a script is supposed to read a log file and filter out a specific data set from that log. There are many data types, and each is filtered differently, so we need multiple scripts. What kind of state could be shared among the filters, and how? (I'm asking how to violate P&F.) A couple of simple examples would help me a lot. – prime Dec 26 '17 at 19:44
  • @prime - if multiple scripts share that log file, that's by definition shared state. – Telastyn Dec 26 '17 at 20:25
  • What if they don't edit or change the log file but only read it? Would that violate the architecture? – prime Dec 27 '17 at 04:39