4

I'm looking for a way to quantify where my team should spend its time addressing technical debt in our codebase. One idea for this is to measure file churn (edits over time). I got the idea from this video where Michael Feathers talks about escaping the technical debt cycle:

https://youtu.be/7hL6g1aTGvo?t=16m53s

What I'd like to measure is the total number of times each file was edited in the codebase. I'd also like total lines changed for each file.

I tried git log --state, but that's not the output I'd like to see. I don't care about individual commits or their authors; I just want raw totals accumulated over all edits for every file that still exists in master.

Ross Patterson
Christopher Perry
  • Possible duplicate of [How does one find out if there is a technical debt on the project? How to measure its volume?](https://softwareengineering.stackexchange.com/questions/368532/how-does-one-find-out-if-there-is-a-technical-debt-on-the-project-how-to-measur) – gnat May 27 '18 at 20:33
  • What version of git do you use, and where does `--state` come from? It does not exist in https://mirrors.edge.kernel.org/pub/software/scm/git/docs/git-log.html – Patrick Mevzek Jun 02 '18 at 22:25

3 Answers

3

The following shows, for each file in master (assumed to be the checked-out branch), the number of times it has been changed (the number of commits that touched it) and the number of lines added and deleted, where a changed line counts as both an addition and a deletion:

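# List every file tracked on master, then total the numstat lines that git log prints for it.
# The "--" keeps git from mistaking a file name for a revision.
for f in $(git ls-tree -r master --name-only) ; do \
git log --numstat --oneline -- "$f" | grep -E "$f\$" | \
perl -ne 'BEGIN {$a=0; $d=0; $l=0; $file="";} if (m/^(\d+)\s+(\d+)\s+(\S+)/) {$l++; $a+=$1; $d+=$2; $file=$3;} END {print "$file changes=$l added=$a deleted=$d\n"; }'; done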

It is not optimized (it runs git log once per file), but it does the job. It does not handle binary files, for which --numstat prints - instead of line counts, or file names containing spaces or other characters the shell interprets. Both limitations could be fixed.

For example, in one of my projects it returns:

.gitignore changes=3 added=4 deleted=0
Changes changes=48 added=1261 deleted=64
EPP_namespaces.txt changes=3 added=149 deleted=1
INSTALL changes=1 added=94 deleted=0
LICENSE changes=1 added=340 deleted=0
MANIFEST.SKIP changes=6 added=54 deleted=17
Makefile.PL changes=27 added=115 deleted=33

etc.
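The loop above runs git log once per file. As a rough single-pass sketch (a different technique, not the method used above), the same totals can be collected from one git log call and summed with awk. The function name sum_numstat is made up for illustration. Binary files, which --numstat reports as -, are counted as changes with 0 lines, and file names with spaces survive because numstat separates its columns with tabs.

```shell
# Hypothetical helper: reads `git log --numstat` lines on stdin and prints
# one line of totals per file. Not part of git; the name is an assumption.
sum_numstat() {
    awk -F'\t' '
        NF == 3 {
            # Binary files show "-" for both counts; treat them as 0 lines.
            added   = ($1 == "-") ? 0 : $1
            deleted = ($2 == "-") ? 0 : $2
            changes[$3]++
            a[$3] += added
            d[$3] += deleted
        }
        END {
            # Output order is unspecified; pipe through sort if needed.
            for (f in changes)
                printf "%s changes=%d added=%d deleted=%d\n", f, changes[f], a[f], d[f]
        }
    '
}

# Usage, inside a repository (assumes the branch is called master):
#   git log --no-renames --numstat --format= master | sum_numstat | sort
```

Unlike the per-file loop, this also reports files that were deleted along the way, so the result would still need to be filtered against git ls-tree -r master --name-only. The --no-renames flag in the usage line makes a rename show up as a delete plus an add instead of an "old => new" path.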

Patrick Mevzek
  • Unverified, but if you want to limit date range you should be able to add parameters to the `ls-tree` command, a la https://stackoverflow.com/a/21743961 – TrevorWiley Apr 24 '21 at 03:05
2

You can get the number of times a file was committed by using:

git log --format=oneline [path_to_file]

It gives results like this:

078d420881d6000e3d545dd22d78f0d6c7f75805 (HEAD -> master) Allow user to adjust width of the effect. 
8b63fa83ae3808d8f745b91c23f64d8628ae73b9 First working version. 
6e4c20fe911bcddedc82e5b8b732744b84447b08 Initial Commit

You can pipe that to something like wc -l to count the lines, one per commit:

git log --format=oneline foo.cpp | wc -l

result:

   3

You could write a shell script to walk your source directory and run this command on each file, for example.
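A minimal sketch of such a script, with a few assumptions: it walks the files git tracks on a branch (via git ls-tree) instead of the directory on disk, and it counts commits with git rev-list --count rather than piping git log to wc -l. The function name commits_per_file is hypothetical.

```shell
# Hypothetical helper: prints "<commit count><TAB><path>" for every file
# tracked on the given branch (defaults to master).
commits_per_file() {
    branch=${1:-master}
    # Reading line by line keeps spaces in paths intact. Names containing
    # newlines or characters that git quotes are not handled here.
    git ls-tree -r --name-only "$branch" |
    while IFS= read -r file; do
        count=$(git rev-list --count "$branch" -- "$file")
        printf '%s\t%s\n' "$count" "$file"
    done
}

# Usage, inside a repository:
#   commits_per_file master | sort -rn | head -20
```

Sorting the output numerically puts the most frequently committed files first.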

You can get the number of lines added and deleted in each commit by doing:

git log --format=oneline --numstat foo.cpp

Note that with merges, the output is a little more complicated.

user1118321
1

With a little help from stackoverflow.com/a/28109890 I was able to modify Patrick Mevzek's answer to work with file names that contain spaces, and to write the result as CSV:

(
    echo "files,changes,added,deleted" && while IFS= read -r -d '' file; do
        # Pass the name through the environment so that quotes, $ or @ in it cannot break the Perl code.
        git log --numstat --oneline -- "$file" | grep -E "$file\$" | FILE="$file" perl -ne '
            BEGIN {$a=0; $d=0; $c=0;}
            if (m/^(\d+)\s+(\d+)\s+/) {$c++; $a+=$1; $d+=$2;}
            END {print "$ENV{FILE},$c,$a,$d\n"; }
        ';
    done < <(git ls-tree -r master --name-only -z)
) > file_change_analysis.csv
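One caveat with CSV output: a file name that contains a comma or a double quote would shift the columns. A small sketch of quoting the name before printing it, in the style of RFC 4180; the helper name csv_field is made up for illustration:

```shell
# Hypothetical helper: wraps a value in double quotes and doubles any
# embedded double quotes, so commas inside it stay within one CSV field.
csv_field() {
    printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

# Usage: csv_field 'src/a,b.txt' prints "src/a,b.txt" with the quotes included.
```

The quoted value could then be printed as the first column in place of the raw file name.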
TrevorWiley