4

I have a set of steps that need to run sequentially over a list of data. However each step only operates on a single piece of data at a time. Because of this, I can run the steps asynchronously.

For example I have two pieces of data, and two steps. When step one is done processing the first piece of data, it passes it to step two. Now step one processes the second piece of data while step two is processing the first piece of data. So forth and so on until all of the data has been processed by each step.

In contrast, looping through the list of items and executing every sequentially in order turns into the equation:

total_time = (step1 + step2 + ... + stepN) * number_of_items

Given that I know the execution time of each step for a single piece of data, and I know the number of items to be processed, is there a formula that will tell me the total execution time for processing all the data items asynchronously?

I have written a program (see below) that simulates this process. I'm not entirely sure that it is computing it correctly, but I believe it is close. This was my attempt at figuring out how to solve the problem. I'm including it here to show my understanding of the problem, which may be wrong.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace ConsoleApplication1 {
    class Step : Queue<object> {
        public Step(int totalExecutionTime)
        {
            _totalExecutionTime = totalExecutionTime;
            CurrentExecutionTime = _totalExecutionTime;
        }
        readonly int _totalExecutionTime;
        public Queue<object> NextStep { private get; set; }
        public int CurrentExecutionTime { get; private set; }
        public void RunTick()
        {
            if (Count == 0)
                return;
            if (CurrentExecutionTime == 0)
                CurrentExecutionTime = _totalExecutionTime;
            CurrentExecutionTime--;
            if (CurrentExecutionTime == 0 && NextStep != null)
                NextStep.Enqueue(Dequeue());
        }
    }
    class Program {
        static void Main(string[] args)
        {
            int totalItems = 10;
            int[] stepLengths = new int[] { 1,2,5,2,2 };
            if (File.Exists("output.txt"))
                File.Delete("output.txt");
            WriteLine("ASYNC: {0}", TotalTime(totalItems, stepLengths));
            WriteLine("SEQUENTIAL: {0}", stepLengths.Sum() * totalItems);
            Console.WriteLine("DONE");
            Console.ReadLine();
        }

        static int TotalTime2(int totalItems, int[] stepLengths)
        {
            throw new NotImplementedException();
        }

        private static int TotalTime(int totalItems, int[] stepLengths)
        {
            var steps = new List<Step>();
            Step prevStep = null;
            foreach(var stepLength in stepLengths) {
                var step = new Step(stepLength);
                steps.Add(step);
                if (prevStep != null)
                    prevStep.NextStep = step;
                prevStep = step;
            }
            var first = steps.First();
            for(int i = 0; i < totalItems; i++) {
                first.Enqueue(new object());
            }
            var last = steps.Last();
            var revSteps = steps.Reverse<Step>().ToList();
            var totalTime = 0;
            do
            {
                Display(steps, totalTime);
                revSteps.ForEach(s => s.RunTick());
                totalTime++;
            } while (last.Count < totalItems);
            if (last.CurrentExecutionTime > 0)
            {
                do
                {
                    Display(steps, totalTime);
                    revSteps.ForEach(s => s.RunTick());
                    totalTime++;
                } while (last.CurrentExecutionTime > 0);
            }
            Display(steps, totalTime);
            return totalTime;
        }

        static void Display(List<Step> steps, int totalTime)
        {
            Write("[");
            steps.ForEach(s => Write("{0},", s.CurrentExecutionTime.ToString().PadLeft(2)));
            WriteLine("]");
            Write("[");
            steps.ForEach(s => Write("{0},", s.Count.ToString().PadLeft(2)));
            WriteLine("] {0}", totalTime.ToString().PadLeft(5));
            WriteLine();
        }
        static void Write(string fmt, params object[] args)
        {
            Console.Write(fmt, args);
            using (var w = new StreamWriter("output.txt", true))
            {
                w.Write(fmt, args);
            }
        }

        static void WriteLine(string fmt, params object[] args)
        {
            Console.WriteLine(fmt, args);
            using (var w = new StreamWriter("output.txt", true))
            {
                w.WriteLine(fmt, args);
            }
        }

        static void WriteLine()
        {
            Console.WriteLine();
            using (var w = new StreamWriter("output.txt", true))
            {
                w.WriteLine();
            }
        }
    }
}
gnat
  • 21,442
  • 29
  • 112
  • 288
Charles Lambert
  • 2,019
  • 17
  • 18
  • 1
    You cannot solve this problem without knowing about your scheduler. How many concurrent threads can you run? How are they scheduled? Is there anything else running as well, that could affect your timing? Are you counting only the CPU time (the actual running time of the threads), or idle as well (the total time you need to run the program to get the final result)? – littleadv Dec 01 '11 at 22:19
  • @littleadv - I understand those are valid things to take into consideration, but I left that stuff out because it is not important to the problem i'm trying to solve. You could say I'm assuming infinite threads, no time for thread switching, etc. – Charles Lambert Dec 01 '11 at 22:34

1 Answers1

2

Assuming that the processes are completely independent (no resource contention), and each process has an infinite input queue, processing all the items will take about as long as it takes the slowest process to handle all the data, plus the time for the first item to reach the slowest process, plus the time to complete the last item after it leaves the slowest process.

kevin cline
  • 33,608
  • 3
  • 71
  • 142
  • I thought the same thing originally, but my numbers do not match what is happening. For example, the steps with execution times of {1,2,5,2,2,4,3} (measurement in ticks), my program says it takes 61 ticks to process. Your formula says it takes 64 ticks. – Charles Lambert Dec 01 '11 at 22:47
  • 3
    @CharlesLambert - I think you are looking for an impossible level of precision. – John Buchanan Dec 01 '11 at 23:22
  • @CharlesLambert Your simulation program is wrong. In each tick, it advances the progress of all steps, even those that have no item to work on. – JGWeissman Dec 02 '11 at 00:38
  • @JGWeissman - if the number of items to be processed for a step are zero, the method just returns `if(Count == 0) return` – Charles Lambert Dec 02 '11 at 04:19
  • @JohnBuchanan - if I had listened to all the people who have told me that things were impossible, I would not be where I am today. If this has, in fact, been proven impossible to figure out, then add it as an answer and explain or point to the relevant proof showing such. – Charles Lambert Dec 02 '11 at 04:23
  • 1
    @Charles -- of course it is possible to compute this number. It cannot be done by formula, but it is certainly computable in O(i*p). – kevin cline Dec 02 '11 at 17:10
  • @CharlesLambert You still have a problem. Since the last step does not dequeue its items, the condition (Count==0) is not always true when there are no items to process. After the last step processes the first item, it immediately starts processing the the second item which it has not yet received. – JGWeissman Dec 02 '11 at 18:43
  • @JGWeissman - good catch, i'll update it later today – Charles Lambert Dec 02 '11 at 19:33
  • @kevincline - that was what I was wanting to know, if it could be done by formula – Charles Lambert Dec 02 '11 at 19:35