When to use DAG (Directed Acyclic Graph) in programming?

Question

I recently found a framework named ecto.

In this framework, a basic component named "plasm", which is the ecto Directed Acyclic Graph.In ecto, plasm can be operated by ecto scheduler.

I am wondering what's the advantage of this mechanism, and in what other situations can we exploit the concept of DAG?

Most Source Control Management Systems implement the revisions as a DAG. — Oded, Oct 28 '12 at 16:40
[Planning](http://en.wikipedia.org/wiki/Automated_planning_and_scheduling) is a whole branch of problems that deals with DAG **a lot**. — TC1, Oct 29 '12 at 09:00
Many things that get represented as trees, should *really* be represented as DAGs when keeping in mind the strange-yet-still-somewhat-common edge cases. — Joachim Sauer, Oct 29 '12 at 13:00

Md Mahbubur Rahman · Accepted Answer · 2013-02-10T15:15:09.740

32

Nice Question.

Code may be represented by a DAG describing the inputs and outputs of each of the arithmetic operations performed within the code; this representation allows the compiler to perform common subexpression elimination efficiently.
Most Source Control Management Systems implement the revisions as a DAG.
Several Programming languages describe systems of values that are related to each other by a directed acyclic graph. When one value changes, its successors are recalculated; each value is evaluated as a function of its predecessors in the DAG.
DAG are handy in detecting deadlocks as they illustrate the dependencies amongst a set of processes and resources.
In many randomized algorithms in computational geometry, the algorithm maintains a history DAG representing features of some geometric construction that have been replaced by later finer-scale features; point location queries may be answered, as for the above two data structures, by following paths in this DAG.
Once we have the DAG in memory, we can write algorithms to calculate the maximum execution time of the entire set.
While programming spreadsheet systems, the dependency graph that connects one cell to another if the first cell stores a formula that uses the value in the second cell must be a directed acyclic graph. Cycles of dependencies are disallowed because they cause the cells involved in the cycle to not have a well-defined value. Additionally, requiring the dependencies to be acyclic allows a topological order to be used to schedule the recalculations of cell values when the spreadsheet is changed.
Using DAG we can write algorithms to evaluate the computations in the correct order.

EDIT :

Ordering of formula cell evaluation when recomputing formula values in spreadsheets can be done using DAGs
Git uses DAGs for content storage, reference pointers for heads, object model representation, and remote protocol.
DAGs is used at Trace scheduling: the first practical approach for global scheduling, trace scheduling tries to optimize the control flow path that is executed most often.
Ecto is a processing framework and it uses DAG to model processing graphs so that the graphs do ordered synchronous execution. Plasm in Ecto is the DAG and Scheduler operates on it.
DAGs is used at software pipelining, which is a technique used to optimize loops, in a manner that parallels hardware pipelining.

Good Resources :

edited Feb 10 '13 at 15:15

answered Oct 28 '12 at 17:25

Md Mahbubur Rahman

4,747
5
32
38

2

No loops? I would think that as long as a loop terminates, it should qualify. Instead of being A -> B -> C, it might go A -> B -> A1 -> B1 -> A2 -> B2 -> C. Cyclic in one sense, but not in another. More like a spiral than a circle. – GlenPeterson Oct 29 '12 at 01:58
@GlenPeterson, Yes you are right. I have edited my answer. Thanks for comment. :) – Md Mahbubur Rahman Oct 29 '12 at 02:15
Still don't think "Straight Line" is necessary. The 'G' in DAG stands for Graph. Check out my answer below. Sorry I didn't read yours carefully enough before answering, but I did +1 your answer for your completeness and over-all level of enlightenment. – GlenPeterson Oct 29 '12 at 02:24
@GlenPeterson, Sorry for mistake. I have updated my answer. I also like your answer. So made +1 to your answer. – Md Mahbubur Rahman Oct 29 '12 at 02:37
4

Thanks for your +1. I still think all code is DAG, not limited to arithmetic expressions. I/O, exceptions, multi-process interactions, and hardware interrupts are all just other start or end nodes in a Directed (because they are start or end), Acyclic (no infinite loops) Graph (finite set of ordered pairs of nodes). An interesting follow-up to Ricky's question might be, "Is there any correct and working code which is not a DAG." I think the answer is "No," but would be delighted to have someone prove me wrong. – GlenPeterson Oct 29 '12 at 13:05
@GlenPeterson, +1 to your comment. Really informative and resourceful. It is true that, I have learned several topics from your comments. Thanks. :) – Md Mahbubur Rahman Oct 29 '12 at 17:34
Your compliment made my day. I have learned from your comments as well - thank you! – GlenPeterson Oct 30 '12 at 20:11
A loop cannot exist in a DAG, [as per this definition](http://en.wikipedia.org/wiki/Directed_acyclic_graph): "a directed graph with no directed cycles". So I'd say a program in general is not a DAG (an example of a structure which does correspond to a DAG is a binary tree, for example). _Specific_ programs _may_ be DAGs, though. – Andres F. Apr 16 '13 at 17:00
@GlenPeterson You're *almost* right. However, some functions must loop/recurse. Functions that terminate (aka will give a result) cannot infinitely loop. "Total" functions are functions that can calculate/prove they'll run (less than) some finite number of times. While it's theoretically possible to terminate yet be unable to efficiently prove it, even NP-Complete functions can guess-and-check every possible input and eventually finish! In practice nearly everything will terminate with this loop-can-only-run-x-times restriction, and it's usually easy to prove. – Ryan Taylor Jun 13 '15 at 18:00

score 13 · Answer 2 · answered Oct 28 '12 at 17:03

The answer is that it doesn't have much of anything to do with programming. It has to do with problem solving.

Just like linked-lists are data structures used for certain classes of problems, graphs are useful for representing certain relationships. Linked lists, trees, graphs, and other abstract structures only have a connection to programming in that you can implement them in code. They exist at a higher level of abstraction. It's not about programming, it's about applying data structures in the solution of problems.

If still you want some relationship with programming then please consider following points:

DAG (known as Wait-For-Graphs - more technical details) are handy in detecting deadlocks as they illustrate the dependencies amongst a set of processes and resources (both are nodes in the DAG). Deadlock would happen when a cycle is detected.
Once you have the DAG in memory, you can write algorithms to:
- make sure the computations are evaluated in the correct order (topological sort)
- if computations can be done in parallel but each computation has a maximum execution time, you can calculate the maximum execution time of the entire set

To again show how that's beyond the scope of programming alone, think about how you whiteboard the tables in a relational database to mentally parse the length of the path from 1 table to another, this is the equivalent of mentally using a DAG to determine performance of your data model — Jimmy Hoffa, Oct 28 '12 at 20:18

GlenPeterson · Answer 3 · 2013-04-16T19:14:00.677

Other people have applied DAG to data, but I think it is at least as applicable (if not more so) to code. Mahbubur R Aaman mentions this, so really this is more of an addendum to his answer than a complete answer on its own.

It occurs to me than any imperative computer program that is free of infinite loops (thanks @AndresF.) is a Directed Acyclic Graph (DAG). Meaning that the possible paths of execution of the code are directed (first this, then that), and acyclic (not forming infinite loops). They are a graph because the path through any significant code is rarely as simple as a list or a tree.

I worked in XSLT for maybe 4 years. I had a terrible time trying to explain why it was not a good general purpose programming language, but DAG is the reason. Specifically, XSLT is a data driven language. You define functions (yes, in the functional-programming sense) but you don't necessarily call these functions from your code. Rather, XSLT sets up a combination of selection of, and iteration through, the nodes of an input XML document. This lets the structure of the input data determine which functions are called and in what order.

This was very interesting and very cool until your program encountered a data condition that you didn't test for at 2:30 AM and you had to wake up and fix it. When you let the data define the DAG, then the definition of the DAG becomes all the possible input conditions - which for any non-trivial business application are beyond incalculable; they are unimaginable.

At first I thought that functional programming may not be a DAG because execution order is sometimes not clear to, or even thought about by the programmer. But a functional program does define dependencies. In fact, the declarative nature of functional programming could be thought of as defining only dependencies (a^2 = b^2 + c^2) without specifying execution order (it does not matter whether 'b' or 'c' is squared first, so long as they are both squared before being added together).

But while Functional programming may be deliberately vague about order of operations at a detailed level, it is exquisitely clear about dependencies. These are the very features that make it so amenable to concurrency. In any case, there is still a graph of paths through the code, and that graph is still directed (dependencies must be evaluated before dependent tasks), so I think DAG applies there as well.

Nice question - thanks for posting!

Is this imperative program a DAG in your opinion: `while (true) { print("hi"); }` ? Maybe you want to exclude non-terminating programs? — Andres F., Apr 16 '13 at 16:55

score 6 · Answer 4 · answered Oct 28 '12 at 19:38

Currently DAG is underrated in programming. Historically many things related to development were made with trees and hierarchies because moving something in a box is convenient for our brain to make complex things easier to manage. But if you look at events and how they depend on other events and states then you will get DAG because anything in our life and in the program can depend on anything in the past but not in the future, so you would get perfectly "acyclic" relations to be applicable to DAG concept. Although this rarely used explicitly in development, having this in mind would help to understand things better

theD · Answer 5 · 2012-10-29T04:15:42.040

I am wondering what's the advantage of Plasm in Ecto...

DAG can be used to model a collection of tasks in a sequence with the constraint that certain tasks must be done before the others. Ecto is a processing framework and it uses DAG to model processing graphs so that the graphs do ordered synchronous execution. Plasm in Ecto is the DAG and Scheduler operates on it.

in what other situations can we exploit the concept of DAG?

DAWG is a data structure that represents a set of strings and allows for a query operation that tests whether a given string belongs to the set in time proportional to its length.
Git uses DAGs for content storage, reference pointers for heads, object model representation, and remote protocol.

Although it's been a long time...but i think this answer really helps me understanding the spirit of ecto. Have to point it out. Thanks! — Po-Jen Lai, Jul 30 '15 at 16:39

score 0 · Answer 6 · answered Oct 29 '12 at 02:09

As a real world example, our software is similar to an IDE where the end user can define a series of operations to be performed on an image (machine vision inspection). These inspections can have dependencies on other inspections or may have inspections dependent upon them. Since this is all configurable by the end user, we cannot make optimizations for parallel processing at design time. By representing these inspections and dependencies as a DAG, we can optimize the parallelism of the overall inspection for maximum performance at run time.

score -1 · Answer 7 · answered Oct 29 '12 at 06:07

-1

Just for another example, the memory management rules in Cocoa apps are made so that all strong references form a directed acyclic graph, which is done to guarantee the absence of leaks.

answered Oct 29 '12 at 06:07

millenomi

107
3

dlmeetei · Answer 8 · 2016-02-03T06:12:29.077

-2

Adding another answer as have not seen a reference to build systems like make which uses DAG to find out dependencies for building.

More details here

edited Feb 03 '16 at 06:12

answered Jan 31 '16 at 18:03

dlmeetei

99
1

Did I say anything wrong, Why It was downvoted – dlmeetei Feb 02 '16 at 14:23
You bounced a rather old question with a rather poor answer. If you are tempted to write an answer that is "adding this because no one else mentioned it... " and have only a single sentence, it isn't that good of an answer. Please try to *fully* answer the question and explain how the application uses a DAG, and how this design works, and why that was chosen over other options. Ideally, several paragraphs worth of content. – Feb 02 '16 at 22:45
Ok, Let me elaborate it later – dlmeetei Feb 02 '16 at 22:46
Ok, instead of repeating, Just updated with a link which details how it is being used in tools like `make` – dlmeetei Feb 03 '16 at 06:13
Links have a nasty habit of going stale or failing. If that happens, you're right back where you started - a short one-line answer that doesn't help much. Can you summarize the contents of the link so this answer can stand on its own? (Keep the link, just make sure the answer is a good one even without the link). – Dan Pichelman Feb 04 '16 at 14:00

When to use DAG (Directed Acyclic Graph) in programming?

8 Answers8