
I work in a Data Warehouse that sources multiple systems via many streams and layers, with maze-like dependencies linking various artifacts. Pretty much every day I run into situations like this: I run something and it doesn't work. I go through loads of code, and hours later I realise I've only managed to map out a tiny portion of the process that, as I discover later in the day, is actually involved. So I ask someone, and they tell me that this other stream has to be run first, and that if I had checked here (pointing at some seemingly arbitrary portion of an enormous stack of other coded dependencies), I would have seen this. It's incredibly frustrating.

I'd like to suggest to the team that we do more to make the dependencies between objects visible and obvious, rather than embedding them deep in recursive levels of code, or even in data that has to be present because another stream populates it. If I could back that suggestion by referring to a well-known, tried and tested software paradigm, I might be able to make my job and everyone else's a lot simpler.

It's quite difficult to explain the benefits of this to my team. They tend to just accept things the way they are and don't 'think big' about the benefits of being able to conceptualise the entire system in a new way. They don't really see that if you can model a huge system efficiently, you're much less likely to run into memory inefficiencies, stream-stopping unique constraint violations and duplicate keys, or nonsense data, because it's much easier to design the system in keeping with its original vision. You then won't later run into all the problems we are now experiencing, which I know from past jobs to be unusual but which they seem to think of as inevitable.

So, does anyone know of a software paradigm that emphasises dependencies and also promotes a common conceptual model of a system with a view to ensuring long term adherence to an ideal? At the moment we pretty much have a giant mess and the solution every sprint seems to be "just add on this thing here, and here and here" and I'm the only one that's concerned that things are really beginning to fall apart.

Nathan Tuggy
Christs_Chin
  • Could you clarify what kind of "maze-like dependencies linking various artifacts" these are? Are they build dependencies, that could be resolved with a build tool like Maven? Are they input dependencies, where one of these artefacts depends on some input that is not obvious or clear? Are they key dependencies between database tables? – FrustratedWithFormsDesigner Jan 17 '17 at 15:55
  • The system is PLSQL, Unix bash, OWB, etc., so there are all sorts of dependencies. Sometimes data is required in a certain format, in a certain place, at a certain time, by a certain module, but it's not remotely obvious from the code and can only be discerned in two ways: by going through a mountain of code, taking perhaps days, to find out that some data had a delimiting semicolon in a part of the system you didn't even know was being referenced, since it was buried in 10 layers of recursively called code; or by asking someone, all the time, every time. It doesn't promote independence. – Christs_Chin Jan 17 '17 at 16:26
  • Literally all of them – Miles Rout Jan 17 '17 at 21:32
  • Is it an option for each script to directly call all its dependencies at the beginning? If that would create a lot of overhead, perhaps each dependency could have a "have I already run?" check. – paj28 Jan 23 '17 at 12:55
  • Little tangent: Because Haskell is lazy, you effectively do not specify the order of operations when you write code. You only specify dependencies. Function C depends on the results of functions A and B. So A and B have to be run before C, but it could work equally well if A is run first, or if B is run first. I just thought that was interesting. – GlenPeterson Jan 23 '17 at 16:20
  • @GlenPeterson: Good that you mention Haskell: making dependencies explicit is one of the strengths of functional programming. E.g. instead of changing global variables, you have to explicitly define inputs and outputs to functions. – Giorgio Jan 23 '17 at 18:00
  • There is a book called Design Patterns (the book sucks, but most of what it says is good, except the bit about singleton). It has several sections on managing dependencies. – ctrl-alt-delor Jan 23 '17 at 22:46
  • This isn't enough for an answer, but when inheriting a system like the OP has, I start by getting the biggest whiteboard I can find, and just start drawing out the architecture of any pieces I touch. As time passes, the dependencies begin to show themselves, and, most importantly, the low-hanging fruit starts to stand out in terms of areas to consolidate and clean up. Once an area of the overall architecture and dependency matrix starts becoming clearer, you can start to break it out and formalize it as a sub-system's architecture, and do the same with that. – GWR May 29 '18 at 14:45
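The suggestions in these comments (paj28's "have I already run?" check and GlenPeterson's point about specifying only dependencies) can be combined in a small sketch: each stream declares its prerequisites in one table, and the run order is derived from that table instead of being remembered. The stream names and the `DEPENDENCIES` table are hypothetical, and the sketch assumes Python 3.9+ for the standard-library `graphlib` module; a real scheduler would more likely read the table from configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical streams: each maps to the set of streams that must run first.
DEPENDENCIES = {
    "load_customers": set(),
    "load_orders": set(),
    "stage_orders": {"load_orders", "load_customers"},
    "build_sales_mart": {"stage_orders"},
}


def run_order(deps):
    """Return an order in which every stream runs after all of its prerequisites."""
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(deps).static_order())


def run(stream, deps, done, runner):
    """Run a stream after its prerequisites, skipping anything already run.

    Assumes the graph is acyclic; call run_order() first to validate that.
    """
    if stream in done:  # the "have I already run?" check
        return
    for prerequisite in sorted(deps[stream]):
        run(prerequisite, deps, done, runner)
    runner(stream)
    done.add(stream)


if __name__ == "__main__":
    print(run_order(DEPENDENCIES))
    executed = []
    run("build_sales_mart", DEPENDENCIES, set(), executed.append)
    print(executed)
```

Because the dependency table is plain data, it can also be printed or drawn as a diagram, which speaks to the discoverability problem more directly than knowledge buried in code.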

8 Answers


Discoverability

Its absence plagues many organizations. Where is that tool that Fred built again? In the Git repository, sure. Where?

The software pattern that comes to mind is Model-View-ViewModel. To the uninitiated, this pattern is a complete mystery. I explained it to my wife as "five widgets floating above the table talking to each other via some mysterious force." Understand the pattern, and you understand the software.

Many software systems fail to document their architecture because they assume that it is self-explanatory, or emerges naturally from the code. It isn't, and it doesn't. Unless you're using a well-defined architecture, new people will get lost. If it's not documented (or well-known), new people will get lost. And veterans will get lost too, once they've been away from the code for a few months.

It is the team's responsibility to come up with a sensible organizational architecture and document it. This includes things like:

  • Folder organization
  • Project references
  • Class documentation (what it is, what it does, why it exists, how it is used)
  • Project, module, assembly, whatever documentation.

It is the team's responsibility to make things organized and discoverable so that the team does not constantly reinvent the wheel.

By the way, the notion that "code should be self-documenting" is only partially correct. While it is true that your code should be clear enough so that you don't have to explain every line of code with a comment, the relationships between artifacts like classes, projects, assemblies, interfaces and the like are non-obvious, and still need to be documented.

Robert Harvey
  • Design patterns help too. Not in and of themselves but because any developer worth his salt knows what a factory, adapter, facade is etc etc. – Robbie Dee Jan 17 '17 at 16:05
  • Sure, but people who lean too hard on design patterns are part of the problem. They're the ones writing code without any documentation, assuming that everyone else will understand what the hell they did just by looking at the code. Also, *software design patterns are not architecture* (for the most part). – Robert Harvey Jan 17 '17 at 16:07
  • I take your point but I'd rather go armed into a morass of code knowing the coder's intention than without it... – Robbie Dee Jan 17 '17 at 16:13
  • Where is that tool that Fred built again? In the Git repository, sure. Where? - Exactly! The MVC pattern is too specific to front-end development (I think), and patterns are only useful if everyone in the team knows them, so this just moves the problem from dependencies not being obvious to them being obvious only IF everyone knows how to find them. But the problem presupposes that this isn't the case. So I'm hoping there's something that promotes a really obvious way of explaining dependencies without requiring some other learned conceptual framework. – Christs_Chin Jan 17 '17 at 16:20
  • It's called "documentation." Beyond that, you need a sensible dependency framework that everyone supports. Unfortunately there isn't a boilerplate template that you can just drop into your project; the software's organizational structure is something that your team creates itself, with the assistance of a sensible architecture. – Robert Harvey Jan 17 '17 at 16:23
  • @Christs_Chin It helps to have all your devs on the same page for sure. If one person's must-have tool is black magic bloat to another, that isn't going to help anybody... – Robbie Dee Jan 17 '17 at 16:28
  • @RobertHarvey: Heard quite recently: "We write code that doesn't need documentation". Wrong. You are writing code without documentation. – gnasher729 Jan 17 '17 at 16:31
  • Some good stuff here. N.B. there is a difference between writing code that doesn't require *comments* and writing supporting documentation. – Robbie Dee Jan 18 '17 at 10:09
  • Patterns can work, if done correctly. Unfortunately, as with most religions, they are not always done correctly. The design patterns for dependencies allow you to keep all of your dependencies in one place, so do so. I once had a quality assurance engineer come to me after running a complexity analyser, smug because he had found a module of mine with a massive dependency complexity. I asked him what the values of the other complexity metrics were. They were zero. I told him that is because this is where all the dependencies are: by keeping them all here, we keep everything else simple. – ctrl-alt-delor Jan 23 '17 at 22:53
  • Yes, you need documentation. However, don't document how it does it; that should be clear in the code. Document why it does it, in comments. Use contracts to document what it does. Cite the pattern you use with a hyperlink to a single document in your repository (so it does not get lost). Write an overview document. – ctrl-alt-delor Jan 23 '17 at 22:59
  • @richard: "Why" doesn't quite cover it. Relationships between software objects should also be documented. That's not "why," it's "how." Good documentation should explain "how" the various software components interact with each other, preferably with usage examples. Code contracts aren't enough. Dependency injection (regardless of the sophistication of your setup) is not enough. – Robert Harvey Jan 23 '17 at 23:22
  • @RobertHarvey Relationships between software objects is “what” (not “how”): What relationships you have. The code should show how, with a small comment /*Uses pattern xyz see abc*/ So yes you may need some docs, probably a nice big diagram of the overall architecture. If you can do this on one page then you have good architecture, or a small project. – ctrl-alt-delor Jan 24 '17 at 09:28
  • @richard: I'll stipulate on “what” (not “how”). You can call it anything you want as long as you don't use words like "how" and "why" as an excuse not to do it. It takes a bit more than a small comment. – Robert Harvey Jan 24 '17 at 14:25
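The "keep all of your dependencies in one place" idea from these comments is often implemented as a composition root: each component receives what it depends on through its constructor, and one function wires everything together, so that function doubles as a readable map of the dependencies (including details like the delimiter the question mentions). A minimal Python sketch; the class names, file path and table name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileSource:
    """Hypothetical source of delimited records; its format is stated, not implied."""
    path: str
    delimiter: str


@dataclass
class Loader:
    """Depends on a source, which it is given rather than finding on its own."""
    source: FileSource
    target_table: str

    def describe(self) -> str:
        return (f"load {self.source.path} (delimiter {self.source.delimiter!r}) "
                f"into {self.target_table}")


def build_pipeline() -> Loader:
    # Composition root: the one place that knows how the pieces fit together.
    source = FileSource(path="/data/in/orders.csv", delimiter=";")
    return Loader(source=source, target_table="STG_ORDERS")


if __name__ == "__main__":
    print(build_pipeline().describe())
```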

The best way to approach these sorts of problems is incrementally. Don't get frustrated and propose wide, sweeping architectural changes. Those will never get approved, and the code will never improve. That's assuming you can even determine the correct wide, sweeping architectural changes to make, which is unlikely.

What is likely is that you could determine a smaller change that would have helped you with the specific problem you just solved. Maybe inverting some dependencies, adding some documentation, creating an interface, writing a script that warns of a missing dependency, etc. So propose that smaller change instead. Even better, depending on your company culture, they may tolerate or even expect you to make improvements like that as part of your original task.
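As an example of the kind of small script mentioned above, here is a sketch of a pre-flight check that warns when a stream's upstream dependencies have not completed for the current run date. It assumes completion dates are available as a mapping (for instance, read from a run-log table your scheduler writes); the `PREREQUISITES` table and the stream names are hypothetical.

```python
import datetime as dt

# Hypothetical table: stream -> upstream streams that must complete first.
PREREQUISITES = {
    "build_sales_mart": ["stage_orders", "load_customers"],
}


def missing_dependencies(stream, completed, run_date):
    """Return the upstream streams that have not completed for run_date.

    `completed` maps a stream name to the date it last finished successfully
    (an assumption; adapt to wherever completions are actually recorded).
    """
    return [
        upstream
        for upstream in PREREQUISITES.get(stream, [])
        if completed.get(upstream) != run_date
    ]


def preflight(stream, completed, run_date):
    """Print a warning and return False if any prerequisite is missing."""
    missing = missing_dependencies(stream, completed, run_date)
    if missing:
        print(f"WARNING: {stream} needs {', '.join(missing)} to run first for {run_date}")
        return False
    return True


if __name__ == "__main__":
    today = dt.date(2017, 1, 17)
    log = {"stage_orders": today, "load_customers": dt.date(2017, 1, 16)}
    preflight("build_sales_mart", log, today)
```

A check like this is small enough to propose on its own, and the warning message tells the next person which stream to run instead of sending them to ask someone.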

When you make these smaller changes a regular part of your work, and by your example encourage others to do so as well, they really add up over time. Much more effective than whining about single larger changes you aren't allowed to make.

Karl Bielefeldt
  • I agree with the idea of incremental changes. The problem is that, without some organizational principles already in place, you might just be creating more chaos. Consider the effect of moving just a single project, class or other artifact (on which other modules depend) to a more sensible location. – Robert Harvey Jan 17 '17 at 17:17
  • Great stuff. My travails have often been made less arduous by a few diligent souls who had the nous to add a tool/widget here and there to create order from the chaos. While I am not a fan of reams and reams of documentation, a well-written cheat sheet or bullet-pointed list of gotchas/features can help greatly. – Robbie Dee Jan 18 '17 at 10:12
  • +1 for proposing small changes that are likely to get approved. I experienced that, and it helped me become someone with more influence; after that, my proposals had more impact. – RawBean Jan 20 '17 at 09:45

Architecture.

There is no single, specific, universal principle or practice that solves the discoverability and maintainability problems which applies to all aspects of all software. But, the broad term for the stuff that makes a project sane is architecture.

Your architecture is the whole body of decisions around each point of potential (or historical) confusion -- including the designation of how architectural decisions are made and documented. Everything pertaining to development process, folder structure, code quality, design patterns, and so forth are all things that might go into your architecture, but not one of them is an architecture.

Ideally, those rules are unified by a singularity of mind.

A small team can certainly create architecture collaboratively. But, with varying opinions, this can quickly lead to a very schizophrenic architecture that doesn't serve to maintain your sanity. The simplest way to ensure that your architecture, and the many TLAs and patterns therein, all serve the success of the team with a singularity of mind is to make a single mind responsible for them.

Now, that doesn't necessarily require an "architect" to pontificate. And while some teams may want an experienced person to just make those decisions, the primary point is that somebody needs to own the architecture, especially as the team grows. Somebody needs to keep their finger on the team's pulse, moderate architectural discussions, document decisions, and monitor decisions and work going forward for compliance with the architecture and its ethos.

I'm not a big fan of any one person making all the decisions; but, identifying an "architect" or "technical product owner" who is responsible for moderating architectural discussions and documenting decisions combats a greater evil: The diffusion of responsibility that leads to no discernible architecture.

svidgen
  • You're absolutely correct in identifying the diffusion of responsibility as being responsible for no discernable architecture. Decisions have now only recently been made to redress this issue. I always think a good solution for this would be to create an entire distributed system through another software system that acts as a harness of sorts, where you decide what goes into the system, but it decides where, according to how the architect programs it. You'd have one view into multiple different systems and technologies and would navigate them via some system architectural diagram... – Christs_Chin Jan 24 '17 at 14:25
  • I think your point in this answer is the single-greatest way to combat/prevent the type of thing the OP is talking about. It even applies to inheriting a mess as the OP has. – GWR May 29 '18 at 14:40

Welcome to Software Engineering (in both senses) ;) This is a good question, but there really are no easy answers, as I'm sure you are aware. It's really a case of evolving better practices over time and training people to be more skillful (by definition, most people in the industry are of mediocre competence)...

Software engineering as a discipline suffers from build it first and design it-as-we-go mentality, part out of expediency and part out of necessity. It is just the nature of the beast. And of course hacks get built on hacks over time, as the aforementioned coders put in place functional solutions quickly that resolve the short term need often at the cost of introducing technical debt.

The paradigm you need is essentially this: get better people, train the people you have well, and emphasize the importance of taking time over planning and architecture. One cannot easily be that "Agile" when working with a monolithic system; it can take considerable planning to put even small changes in place. Getting a great high-level documentation process in place will also help key people get to grips with the code more quickly.

The ideas you could focus on would be isolating and refactoring key parts of the system (over time, gradually) to make them more modular, decoupled, readable and maintainable. The trick is working this into existing business requirements, so that technical debt is reduced at the same time as visible business value is delivered. So the solution is partly improving practices and skills and partly moving towards long-term architectural thinking, as I can tell you already are.

Note that I have answered this question from a software development methodology perspective rather than a coding technique perspective because really this is a problem that is much bigger than the details of coding or even architectural style. It's really a question of how you plan for change.

Bradley Thomas
  • I hear what you're saying, but your answer is ultimately unsatisfying, and frankly a bit insulting. It's a larger problem than just hiring better people; even in the small shop I work in, we struggle with this, and I think it's more than just a people problem; I think it has some specific technical pain points. – Robert Harvey Jan 17 '17 at 15:37
  • I agree there are technical aspects, but I think those are relatively minor compared to the emphasis on a stronger methodology for planning change. I don't see this as being about design patterns as much as a cultural shift towards *more* planning and analysis, *earlier* planning and analysis and *better* planning and analysis. – Bradley Thomas Jan 17 '17 at 15:41
  • Alright, I'll post my own answer as a comparison. I don't think it has *anything* to do with software patterns. – Robert Harvey Jan 17 '17 at 15:43
  • Brad, thanks for the answer. Your response is appreciated as I know I'm not alone in being aware of this problem. It just seems like this in my team. I also agree with Robert Harvey, in that this problem is widespread and I don't want to give up on the belief that there is a solution, either in a new type of software or a new working practise. – Christs_Chin Jan 17 '17 at 15:44
  • @Christs_Chin I think the problem is soluble. I just view it as another manifestation of the problems of short termism in coding, arising from the causes of technical debt. The solution therefore is to put in place the known solutions to technical debt. Having more competent people is a VERY big part of that. – Bradley Thomas Jan 17 '17 at 15:58
  • *"Software engineering as a discipline suffers from build it first and design it-as-we-go mentality"* - wow, just WOW! I could see how that could be the case in a web development/mobile app development shop, where you can roll out patches for your various transgressions fairly quickly, but I can assure you that in a proper engineering environment, nothing could be further from the truth. On multi-billion pound projects we plan to deploy once, and if we can't, it becomes hugely expensive, fast! – Robbie Dee Jan 17 '17 at 16:03
  • @RobbieDee: If it's a multi-billion pound development effort, you have the benefit of an army of people to help you with all of the little details. – Robert Harvey Jan 17 '17 at 16:04
  • @RobbieDee The implementors are still doing a lot of technical design - and much of that is last minute, even on huge, expensive projects, in my experience. – Bradley Thomas Jan 17 '17 at 16:07
  • @RobertHarvey To a degree but large teams come at a well documented cost. See diseconomies of scale, magical number 7, the mythical man month etc etc. – Robbie Dee Jan 17 '17 at 16:11
  • Exactly my experience: you must get your team members to understand what they are doing. I see people mixing MVVM and MVC, others using WPF controls in a way which was normal with Windows Forms (or rather: VB6), people programming in C# without a basic understanding of object-orientation... Teach them. Teach them again. Be frustrated. Teach them again, ... Often thinking of giving up. And teaching them again... – Bernhard Hiller Jan 18 '17 at 11:27
  • @BernhardHiller it seems you're suggesting that Software Engineering is a difficult subject...too true! Unfortunately, education is only possible for people who see value in the learning opportunity, and the team I'm in don't value maintainability and readability at all. They're all incredibly smart and capable, but their methodology isn't inclined this way. Brad Thomas was correct in pointing out the build-first, quick mentality. It's inevitable with some companies' business model and mindset. It's basically I.T. as fast food: quick and cheap but terrible in the long run. – Christs_Chin Jan 18 '17 at 14:57

I like @RobertHarvey's idea of conventions and think they help. I also like @KarlBielefeldt's idea to "document as you go" and know that's essential because that's the only way to keep documentation current. But I think the over-arching idea is that documenting how to find all pieces of your code, build, and deploy them is important!

I recently emailed a significant open source project that had some XML configuration which generated code and was totally undocumented. I asked the maintainer, "Where is this XML code generation process documented? Where is the test database setup documented?" and he said, "It's not." It's basically a single-contributor project, and now I know why.

Look, if you're that person and you are reading this, I really appreciate what you are doing. I practically worship the fruits of your labors! But if you spent an hour documenting how your really creative stuff is put together, I might spend a couple days coding new features that could help you. When faced with the brick wall of "lack of documentation isn't a problem," I'm not even going to try.

In a business, a lack of documentation is a huge waste of time and energy. Projects like that often get farmed out to consultants who cost even more, just so that they can figure out basic stuff like, "where are all the pieces and how do they fit together."

In Conclusion

What's needed is not so much a technology or methodology, but a culture shift; a shared belief that documenting how things are built and why is important. It should be part of code reviews, a requirement for moving to production, tied to raises. When everyone believes that and acts on it, things will change. Otherwise, it's going to be like my failed open source contribution.

GlenPeterson
  • I suspect that part of the cultural problem lies in the Agile belief that "if it's not part of a User Story (i.e. it doesn't contribute directly to stakeholder value), then it's not important." *Hogwash.* Related conversation here: [In agile, how are basic infrastructure tasks at the start of a project planned and allocated?](http://softwareengineering.stackexchange.com/q/340538) – Robert Harvey Jan 23 '17 at 16:42
  • @RobertHarvey Yes. Everyone in my team is incredibly bright and very easy to get on with. The scrum masters and project managers are well intentioned and driven, and the practices are the most Agile of any team I've worked in. But the documentation is lacking, probably for the very reason you suggest. Plus, when documentation is created, a further layer of randomness in communicative effectiveness is introduced: the author's ability to identify pertinent concepts and explain them, not to mention their attitude towards having to undertake such a task. Usually it's just "Ask someone". – Christs_Chin Jan 24 '17 at 11:41
  • @GlenPeterson Yes, I agree this would be helpful. But it should be specified not only that documentation should be written, but also how, and what qualifies as documentation. For instance, in a recent example here, someone included a list of new numbers our system will identify. That's it. There was no mention of how these numbers enter the system, where, why, by whom, how often, or anything useful, only that they do. At no point have I wondered what numbers our system will identify as relevant. But I've often wondered where they enter, where they go and what happens on the way. It's still a mystery. – Christs_Chin Jan 24 '17 at 12:19
  • @Christs_Chin So much of communication is based on context. Without that context, most communications are almost meaningless. I feel your pain. But I think it's hard to write (English) so that others can understand you. Sometimes early specs for a system have the context you need to understand it; even if they are horribly out of date, the context usually helps. – GlenPeterson Jan 24 '17 at 14:39

To answer the question as it is posed (rather than giving you advice for your particular situation):

The programming paradigm known as pure functional programming requires that everything which affects the output of a function is specified in its input parameters. There are no hidden dependencies, global variables or other mysterious forces acting invisibly across the code base. There is no "you have to do this first" temporal coupling.
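Python is not a pure functional language, but the contrast can still be shown in it. In this hypothetical example, the first conversion function silently depends on `load_rates()` having been called first; the second takes the rates as a parameter, so the dependency is visible in its signature and there is no hidden ordering to get wrong. The function names and the exchange rate are illustrative only.

```python
# Hidden dependency: convert_hidden() only works if load_rates() ran first,
# but nothing in its signature says so (temporal coupling).
_rates = {}


def load_rates():
    _rates["GBP"] = 1.25  # hypothetical exchange rate


def convert_hidden(amount, currency):
    return amount * _rates[currency]  # KeyError if load_rates() has not run


# Explicit dependency: everything that affects the result is a parameter,
# so nothing "has to be run first".
def convert(amount, currency, rates):
    return amount * rates[currency]


if __name__ == "__main__":
    print(convert(100, "GBP", {"GBP": 1.25}))
```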

JacquesB

Each data warehouse is different but there is a lot you can do to make things easier for yourselves.

For starters, every row in the database had a DATE_ADDED and DATA_UPDATED column so we could see when it was added to the database and when it was changed. We also had a SOURCE_CODE column so we could track where every bit of data entered the system.
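A minimal sketch of those audit columns, using Python's built-in `sqlite3` purely so it runs standalone (an Oracle/PL/SQL implementation would differ in syntax). The `customer` table, the lowercase column names and the trigger are illustrative assumptions; the trigger is just one possible way of keeping the update timestamp current.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    source_code  TEXT NOT NULL,  -- which feed or system supplied the row
    date_added   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    date_updated TEXT            -- NULL until the row is first changed
);

-- Stamp date_updated whenever the business data in a row changes.
CREATE TRIGGER customer_touch AFTER UPDATE OF name ON customer
BEGIN
    UPDATE customer SET date_updated = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;
"""


def make_warehouse():
    """Create an in-memory database with the audited table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = make_warehouse()
    conn.execute("INSERT INTO customer (name, source_code) VALUES (?, ?)",
                 ("Acme Ltd", "CRM_FEED"))
    conn.execute("UPDATE customer SET name = ? WHERE id = 1", ("Acme Limited",))
    for row in conn.execute("SELECT * FROM customer"):
        print(row)
```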

Next we had common tools that ran across all our data warehouses such as sorts, table matches, slicers and dicers etc.

Bespoke code was kept to an absolute minimum, and even then it had to conform to various coding and reporting styles.

I'm going to assume you're already familiar with ETL suites. There is a lot of functionality you get for free these days that wasn't present when I was in the game about a decade ago.

You might also want to look at data marts for presenting a more friendly, sanitised version of your data warehouse. Not a silver bullet of course but could help with certain issues rather than having to rebuild/correct your data warehouse.

Robbie Dee
  • Thanks for the response. Yes, we use all of these fields, but they only really help identify a single row, not the dependencies between streams, layers and systems. You're right about ETL suites: we were in the process of upgrading to a well-known ETL tool from one that was going out of support, but instead ended up moving back to PLSQL. It's fine to code in, but for maintainability, and for understanding the overall system from the code level, it's absolutely terrible. – Christs_Chin Jan 17 '17 at 16:05
  • The ideal is that you can track data end to end, be it via staging tables or flat files, but if you don't have that, you're left with walking code. – Robbie Dee Jan 17 '17 at 16:17

I don't know how relevant this is to your case, but there are some strategies that make dependencies more visible and general maintenance of code easier:

  • Avoid global variables, use parameters instead. This applies to cross language calls also.
  • Avoid changing/mutating the values of variables as much as you can. When you need a changed value, create a new variable instead, if possible.
  • Make the code modular. If it is not possible to describe in a simple sentence what (not how) a portion of code is actually doing, break it up into modules which satisfy that condition.
  • Name your code portions properly. When you can actually describe what a portion of code is doing in simple terms, those terms become the name of the portion. Thus, the code becomes self documenting through names of modules/classes/functions/procedures/methods etc.
  • Test your code. Test if the entities in your code justify their names, discussed in the previous point.
  • Log events in the code. Maintain at least two levels of logging. The first is always enabled (even in production) and logs only critical events. The second logs basically everything and can be turned on or off.
  • Find and use suitable tools to browse, maintain and develop your codebase. Even a simple "Search Everything" option of Visual Studio Code made my life a lot easier for certain cases.
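The two-level logging suggestion can be sketched with Python's standard `logging` module: one logger whose threshold is ERROR by default (always on) and DEBUG when a verbose switch is set. The logger name `etl`, the `verbose` flag and the message format are arbitrary choices for illustration.

```python
import io
import logging


def configure_logging(verbose=False, stream=None):
    """Return a logger with an always-on level and an optional verbose level.

    Level 1 (always on, even in production): ERROR and CRITICAL only.
    Level 2 (switchable): everything down to DEBUG.
    """
    logger = logging.getLogger("etl")
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(stream)  # stream=None writes to sys.stderr
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG if verbose else logging.ERROR)
    logger.propagate = False  # don't also pass records to the root logger
    return logger


if __name__ == "__main__":
    captured = io.StringIO()
    log = configure_logging(verbose=False, stream=captured)
    log.debug("reading staging file")  # suppressed: verbose logging is off
    log.error("staging file missing")  # always logged
    print(captured.getvalue(), end="")
```

In practice the `verbose` flag would come from a configuration file or environment setting, so detailed logging can be switched on in production without a code change.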
Gulshan