5

It took me time T to develop program A, which measures 1000 lines of code (SLOC), in a certain language and domain and of a certain complexity. Is there a method to determine how much time it will take to develop program B, which is estimated to be 4000 lines, has the same level of complexity, is in the same domain, and is developed in the same programming language?

I expect that the time it takes me will be greater than 4T. Is there a formula to estimate how T grows as the SLOC count grows?

Thomas Owens
  • 79,623
  • 18
  • 192
  • 283
Andrei
  • 153
  • 2
  • 3
  • 15
    It will take you 50% longer than you expect, even if you take this 50% into account. – JUST MY correct OPINION Feb 05 '11 at 08:05
  • 1
    As for your second question, if you can solve that (make code bug-free before it hits QA), bottle it and make yourself richer than a dozen Bill Gates. – JUST MY correct OPINION Feb 05 '11 at 08:06
  • At our company, we use 3 dice. –  Feb 05 '11 at 08:07
  • It's sure to be O(n^2). – Paul R Feb 05 '11 at 08:08
  • 1
    O(N^2) is overestimation. It might be n^alpha where alpha is > 1 and < 2. –  Feb 05 '11 at 08:10
  • 1
    I would think a good theoretical guesstimate of the form is difficulty = A*N + B*N^2. The first term covers the cost to add a single line of code without having to be concerned with interaction. The B term covers interactions. So for small N I would expect linear scaling, but once the code becomes large enough the interactions (side effects) dominate. Your postulated size may be too small for the quadratic term to be important. But then, I suspect software engineering can affect the exponent, as well as the constants A and B. – Omega Centauri Feb 06 '11 at 04:24
  • Now, tricky question, how long would it take a guru to reduce 4KLOC into much faster and more stable 1KLOC equivalent ;-) – vartec Mar 15 '12 at 13:55
  • hm… if the quadratic term becomes relevant, that could imply that you have a problem with tight coupling and should start over anyway. – Emilia Bopp Jun 03 '14 at 12:15

7 Answers

15

Applications can't be quantified in terms of LOC - it just doesn't work. Ever. So please, save yourself the hassle and don't do it.

Edit: Unless this is some sort of homework question... in which case the professor is a twit and you should go to a better school. (The expected homework answer is presumably n^2.)

  • 3
    While *ALL* applications can't be quantified in terms of SLOC, for any given organization, their bread-and-butter apps most certainly can. Read "Software Engineering Economics", by Barry Boehm - and consider the fact that General Dynamics Fort Worth Division used to bet the company bottom line on a COCOMO estimator when they bid F-16C/D software development tasks to the Air Force. (I was there. I occasionally wrote such bids. They'd gone to a lot of trouble to collect the data and calibrate their estimator.) – John R. Strohm Feb 05 '11 at 23:32
  • That book is dated 1981, when popular languages included FORTRAN, Ada, Assembler and many others that had limited constructs that could (or should) be put into one line, so #LOC was likely proportional to effort and/or binary size. I've not read the book, but with tools like http://jsbeautifier.com and 'indent' around now, the total number of lines in a project is of no particular relevance unless it is used to compare against similar/previous projects by the same team. (http://www.amazon.com/dp/0138221227/?tag=stackoverfl08-20) – JBRWilkinson Feb 08 '11 at 10:47
6

People have developed a number of models to try to estimate things like this. While I wouldn't try to claim that any of them is anywhere close to entirely reliable or accurate, there are a few that seem to take enough factors into account to give halfway reasonable estimates.

Just for one example, Barry Boehm's COCOMO II model seems to fit your situation reasonably well. According to one online implementation, your original 1 KLOC should have taken around 4 person months of effort, and your 4 KLOC should take around 10 person months (for one set of assumptions -- feel free to plug in more appropriate values for the type of development and such).
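
For the curious, the shape of that growth comes from a power law. Here is a minimal sketch of the basic COCOMO effort equation, assuming the published organic-mode constants (a = 2.4, b = 1.05) rather than whatever cost drivers the online calculator applied, so the absolute figures differ from the ones quoted above:

# Basic COCOMO: effort in person-months = a * KLOC**b.
# a = 2.4, b = 1.05 are the textbook "organic mode" constants; they are an
# assumption here, not the settings used by the online calculator mentioned above.
def cocomo_effort(kloc, a=2.4, b=1.05):
    return a * kloc ** b

print(cocomo_effort(1.0))   # ~2.4 person-months for the 1 KLOC program
print(cocomo_effort(4.0))   # ~10.3 person-months for the 4 KLOC program

The interesting part is the ratio rather than the absolute numbers: 4^1.05 is about 4.3, so the model predicts a bit more than 4T, which matches the intuition in the question.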

At the same time, I'd have to agree with others who've pointed out that lines of code is rarely a very good measure (of much of anything). Estimation based on function points (for one possibility) seems rather more accurate to me. Even at best, however, it will take substantially more work, and it may be open to question whether it produces results enough more accurate or reliable to justify that work, especially for a fairly small project like this.

Edit: Oops -- I pasted in the wrong link (that was for the original COCOMO model, not COCOMO II). COCOMO II is a bit more work to use (it might take a minute or two instead of 30 seconds), but produces (what are supposed to be) more accurate results. Online implementations are available. It definitely attempts to take more factors into account in any case (e.g., whether you can re-use any/all of the existing 1000 lines of code in the new project).

Jerry Coffin
  • 44,385
  • 5
  • 89
  • 162
  • Function points also have their limitations. They are useful for data-intensive projects without complex algorithms. The danger can be that you count "output price of lottery ticket" as one function point and "output tomorrow's winning lottery number" as equally complex, at just one function point. – MarkJ Mar 15 '12 at 06:44
3

This is a bit controversial, but for project management purposes SLOC is typically used for determining estimated timelines (e.g. read Software Estimation: Demystifying the Black Art (Best Practices (Microsoft))); however, what is usually underlined time and time again is that you need a large enough data set of similar problems before you can start to notice trends in how long it takes to develop things. Note that this also generally applies to very large code bases, and you don't start to see accurate estimates until you are in the 100,000+ SLOC range.

To build on MainMa's driving analogy a bit, if you are driving in a major city and all of your trips are less than 50 km, you might eventually be able to say with a degree of confidence that a trip will take about 30 minutes under normal traffic conditions, but any individual trip might take anywhere between 15 minutes and two hours.

This is similar to trying to estimate how long it will take to write a given function or story point, since not all of them are the same. Resolving a story point that only involves getting some data and converting it to a report might take only a couple of hours for someone familiar with the project, whereas trying to improve upon some underlying queuing code your program is using might take several days. This is generally where evidence-based scheduling is better: the developer is the one driving the estimate based upon their experience with the given task, and you then adjust things based upon the historical evidence that relates to that developer, which is why this technique tends to be better for task estimation.

Going back to SLOC: as noted before, it can be used for estimating when a major project will be completed, but only at the large scale. It doesn't scale down very well, it requires historical evidence of similar projects under similar conditions to generate the timeline estimate, and at the end of the day it really only serves as guidance. Returning to the driving analogy, this is similar to long-haul road trips (say, 1,500 km and up): the sheer distance ensures that even though you might run into parts of the trip where you are crawling through traffic, you will also encounter stretches where you can go the speed limit for an extended period of time. This means that after you have done the trip a couple of times you can give a pretty reasonable estimate of how fast you averaged and how long it will take to get from point A to point B. Large projects are the same way: their sheer size allows project planners to say, "We have done projects of similar scope in the past; this one will likely be as big as those, so the time to complete it will likely be similar."

rjzii
  • 11,274
  • 6
  • 46
  • 71
1

If you want your code to have fewer bugs, you should write a lot of automated tests, and do it before and while you write the code, not after a component is ready. There are testing frameworks for different languages and platforms. You can read about Test Driven Development; there are a lot of online and offline resources on the subject.
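
For illustration, a minimal test-first sketch using Python's built-in unittest (the parse_price function is a made-up example, not something from the question): the test is written first and fails until the code exists.

import unittest

def parse_price(text):
    # The code under test. Under TDD this is written *after* the test below
    # already exists and fails.
    return round(float(text.strip().lstrip("$")), 2)

class ParsePriceTest(unittest.TestCase):
    def test_strips_whitespace_and_currency_symbol(self):
        self.assertEqual(parse_price(" $12.50 "), 12.5)

if __name__ == "__main__":
    unittest.main()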

Ilya Kogan
  • 234
  • 1
  • 6
  • *One really good reason to write tests before/during:* If you don't then "it'll just work" and nobody will bother to write tests because, well, "it works". But then a regression bug may (will, really) be introduced or an edge-case encountered and there will be no tests to catch it until it ships to a client. Oops. Of course it also speeds up development by really cutting down on the feedback-cycle between write-and-crash-and-burn-with-bug (just accept it: bugs and incorrect logic are an axiom of programming). The shorter this cycle, the less time is wasted :) –  Feb 05 '11 at 09:02
1

Time (T) required for development (of a program) is not only a function of lines of code (SLOC). It's also a function of quality (Q) (and probably n+1 more variables).

If Q is low, then T grows somewhat linearly with SLOC. (You just bang out more lines of code, and it's more or less a physical activity.)

When Q gets higher, T starts to grow exponentially and gets ever closer to infinity. (It's very hard to write totally bug-free code of more than three SLOC.)

So, I think it's almost impossible to estimate T given only SLOC. Maybe, if you are lucky, you might land within ±1 order of magnitude, e.g. you estimate 10 days and it ends up taking somewhere between 1 and 100 days.
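
To make the shape of that claim concrete, here is a purely illustrative toy model (not a published formula, and only meant to show the blow-up near perfect quality, not the exact growth rate): effort grows linearly with SLOC but diverges as the target quality Q approaches 1, i.e. defect-free.

# Illustrative only: T = k * SLOC / (1 - Q), where Q is in [0, 1) and k is a
# hypothetical hours-per-line constant. The point is the divergence near Q = 1,
# not the exact numbers.
def estimated_hours(sloc, quality, k=0.1):
    assert 0.0 <= quality < 1.0
    return k * sloc / (1.0 - quality)

print(estimated_hours(1000, 0.50))   # modest quality bar: 200 hours
print(estimated_hours(1000, 0.99))   # near-perfect code: 10000 hours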

Maglob
  • 3,839
  • 1
  • 24
  • 27
  • Lines of communication are also a factor, in that the more people are working on the project, the more people you need to discuss changes with, which slows down the project. – rjzii Feb 05 '11 at 16:41
1

4K lines of simple code may take you one-tenth the time of 1K lines of complex code. And 4K lines of complex code may take you 40 times as long as 1K lines of simple code. The measure is meaningless.

Matthew Read
  • 2,001
  • 17
  • 22
-1

You can't simply look at LOC/SLOC by itself the way you are trying to. The only way you can use LOC from previous projects with some degree of success (and as a guideline, not as an infallible rule) to estimate future project sizes is by having a decent number of projects with their SLOC, number of resources (developers) and time of completion accounted for. Then you can use that to extrapolate.

But to take just one project, one single project, especially one that is not that big (1K is fairly small), that's just too little data to use LOC metrics in any meaningful manner.

If this is homework, your professor is a clueless dick btw.

However, if this is for real, and if you are really that pressed, you could use the following guidelines:

EXPECTED_COMPLETION_TIME = 
  ( PREVIOUS_COMPLETION_TIME / PREVIOUS_KLOC ) * EXPECTED_KLOC * SPILLOFF

With SPILLOFF = 1 giving you a 30% chance of success (a 70% chance of failure), SPILLOFF = 1.5 giving you a 60% chance of success (a 40% chance of failure) and SPILLOFF = 2 giving you a 90% chance of success (a 10% chance of failure). The reason for using such estimates is that failures in completing software projects tend to exhibit an exponential distribution with respect to the allocated time (or whatever other resource you choose to use).
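
Plugging the question's sizes into that guideline with a made-up prior duration, just to show the arithmetic (the 2-month figure below is hypothetical):

# Hypothetical prior data: program A was 1 KLOC and took 2 months; B is 4 KLOC.
PREVIOUS_COMPLETION_TIME = 2.0   # months
PREVIOUS_KLOC = 1.0
EXPECTED_KLOC = 4.0

for spilloff in (1.0, 1.5, 2.0):
    expected = (PREVIOUS_COMPLETION_TIME / PREVIOUS_KLOC) * EXPECTED_KLOC * spilloff
    print(spilloff, expected)   # 8.0, 12.0 and 16.0 months respectively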


When you have consistent work within an organization, or when you work in similar environments and the technology (not just the language, but the technology) as well as the processes are uniform, then you can do some estimation within some margin of error based on prior projects. In such cases, you want to give more weight to the most recent projects.

Say, for the last n + 1 projects (say n + 1 = 5 or 10... notice, it's n + 1, not n), you could do the following, but only if you carefully keep track of the number of people involved in each project, the actual number of LOC, the actual completion time, and the completion time estimated prior to the start of the project.

SUM = 0
FOR i = 1 to n
  KLOC_PER_HEAD(i) = KLOC(i) / TEAM_SIZE(i)
  ACTUAL_COST(i) = ACTUAL_COMPLETION_TIME(i) / KLOC_PER_HEAD(i)
  RUNOFF(i) = ACTUAL_COMPLETION_TIME(i) / ORIGINALLY_ESTIMATED_TIME(i)

  SUM = ( ACTUAL_COST(i) * RUNOFF(i) ) + SUM
END

LAST_COST = ACTUAL_COST(n+1) = ACTUAL_COMPLETION_TIME(n+1) / KLOC_PER_HEAD(n+1)
LAST_RUNOFF = ACTUAL_COMPLETION_TIME(n+1) / ORIGINALLY_ESTIMATED_TIME(n+1)
LAST = LAST_COST * LAST_RUNOFF

ESTIMATE = ( ( (SUM / (n + 1) ) + LAST ) / 2 ) * SPILLOFF

With SPILLOFF as defined previously.
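
For what it's worth, here is the same procedure transcribed into runnable form. The project history below is invented purely to exercise the formula, and the (n + 1) divisor and the extra weight on the most recent project follow the pseudocode above as written.

# Each record: (kloc, team_size, actual_completion_time, originally_estimated_time).
# The values are hypothetical sample data, not real project history.
history = [
    (10.0, 4, 9.0, 7.0),
    (22.0, 6, 14.0, 12.0),
    (15.0, 5, 11.0, 10.0),
    (30.0, 8, 20.0, 15.0),   # this last entry plays the role of project n + 1
]

def cost_and_runoff(kloc, team_size, actual, estimated):
    kloc_per_head = kloc / team_size
    actual_cost = actual / kloc_per_head    # time spent per KLOC-per-head
    runoff = actual / estimated             # how far past the original estimate it ran
    return actual_cost, runoff

n = len(history) - 1
total = sum(cost * runoff
            for cost, runoff in (cost_and_runoff(*p) for p in history[:n]))

last_cost, last_runoff = cost_and_runoff(*history[n])
last = last_cost * last_runoff

SPILLOFF = 1.5   # as defined above: higher values trade time for confidence
estimate = ((total / (n + 1)) + last) / 2.0 * SPILLOFF
print(estimate)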

luis.espinal
  • 2,560
  • 1
  • 20
  • 17
  • OMG - you're actually calculating work estimates based on LOC? How about the time it took to do the work last time, rather than the LOC of last time? – JBRWilkinson Feb 06 '11 at 18:37
  • @JBRWilkinson - within certain constraints (*as pointed out in my post*), this works to a certain precision. As for your question, calculating the work done last time and nothing else only works if the next project is of a similar scope and magnitude. Furthermore, work is a function of several variables: at least time resources (man-hours and/or time of completion) and physical resources (typically people, and in extreme cases CPU/hardware expenditure and electricity). – luis.espinal Feb 07 '11 at 00:57
  • @JBRWilkinson - cont'd - These resources only measure effort; you cannot implicitly deduce the scope or size of the project from them. Even within projects written in the same language, similar amounts of resources might be needed for projects of different "size". You have to measure the size of the project a priori (i.e. expected function points, requirements and/or expected LOCs, to name a few) or after the fact (total implemented FPs, requirements, LOCs, etc.) – luis.espinal Feb 07 '11 at 01:02
  • @JBRWilkinson - con't - once you consistently and methodically (these two are the key) measure spent resources and deliverable sizes (and hopefully type), you can more or less *estimate* what it takes *within your organization* to implement a system of a certain *size* (and possibly type) - assuming you have a process that you repeat consistently. You have to refine your numbers every X number of projects to account for outliers, and it is not hard to set up tracking of this data if you work methodically. **This is really no different from projects in other engineering disciplines.** – luis.espinal Feb 07 '11 at 01:08
  • @JBRWilkinson - I've personally witnessed these types of methods work well within some organizations, and fail miserably in others - a function of the quality of work within each organization. A good software shop, be it agile, iterative or even waterfall, can make this work (they can make anything work, anyway). If you still think just using work measures alone can help with estimation, let me know how you account for project scope and size. **And to whoever modded me down, it would be nice to hear your methodology and experience.** Check the works of Phillip G. Armour when you get a chance. – luis.espinal Feb 07 '11 at 01:10
  • What happens when, at some point in a company's history, they decide to adopt new coding guidelines that mandate that all control structures must have opening and closing curly braces on separate lines? Your #LOC calculation would have just doubled or trebled, but the actual effort remains the same. – JBRWilkinson Feb 08 '11 at 10:30
  • Are you writing in assembly code or FORTRAN? In that case, #LOC makes more sense. – JBRWilkinson Feb 08 '11 at 10:38
  • Java for the last 12 years, enterprise (J2EE, post Java 1.5 JEE, Spring-based, Axis) and systems level (CORBA, custom layer 3 network protocols), plus Perl, awk, shell scripting, VB, Groovy and C here and there where needed. Haven't done C++/assembly (or Pascal or Ada) for over a decade. Never done Fortran. LOC-based metrics (with context/language-specific normalization) can be applied to *any language* (especially in the ALGOL family). FP-based metrics are more suitable for UI, web forms or template languages like Velocity or JSP, or for code bases with a high ratio of input entries per module. – luis.espinal Feb 08 '11 at 15:44
  • @JBRWilkinson - the goal is not just to count LOC (or worse, target or even reward for a fixed LOC number.) You simply use it as a proxy (just like any other metric) to gauge, estimate and predict size. LOC can be pure LOC, statement count, etc. Replace LOC with FPs, module count or requirement count and the concept remains the same - they are all proxies for estimating a magnitude of complexity. They are *magnitude metrics*. Then you track historical trends resources devoted to systems of X or Y magnitude (as well as continuous quality metrics like bug count, LCOM, Halstead or cyclomatic.) – luis.espinal Feb 08 '11 at 15:53
  • @JBRWilkinson - cont'd - the advantage of LOC over other **magnitude/size** metrics is that it can be automated. That is much more difficult with FPs (and near impossible with requirements). LOC by itself means nothing. Couple it with other metrics and historical data, and then you can track and predict complexities, bottlenecks and risk areas. Tracking of historical LOC and cost is done automatically (it can't be done by hand), with cooperation from PMPs and SCMs, and it requires some setup cost up front. **Really**, it is not that different from other branches of engineering. – luis.espinal Feb 08 '11 at 15:56