Since your edit is specifically about benchmarking and collecting metrics to assess the codebase you are working with, here are some metrics (in addition to my original answer below) that I have found very useful.
Web Performance
- Requests per page--the more requests the browser has to make to render one page, the slower the app feels
- Mean response time
- 90th percentile response time
- Max response time
I've inherited a couple of legacy single-page apps that made a new request per control on the page. Most frameworks have a way to initialize state once and only send requests when the user actually does something. You might also get some wins by bundling JavaScript and CSS into one virtual file, even if the source is broken up into several smaller files.
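If you want a first pass at those numbers without standing up a monitoring stack, the browser's standard Performance API is enough. A minimal sketch (the percentile helper is mine; only the `getEntriesByType` call is part of the platform):

```typescript
// Minimal sketch: run in the browser console (or a synthetic-monitoring script)
// to see how many requests a page makes and how the response times distribute.
// Only the Performance API calls are standard; the percentile math is a simple
// nearest-rank calculation.

function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, Math.min(sorted.length - 1, index))];
}

const resources = performance.getEntriesByType("resource") as PerformanceResourceTiming[];
const durations = resources.map(r => r.duration);

console.log("Requests to render this page:", resources.length);
console.log("Mean response time (ms):", durations.reduce((a, b) => a + b, 0) / durations.length);
console.log("90th percentile (ms):", percentile(durations, 90));
console.log("Max response time (ms):", Math.max(...durations));
```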
Code Quality
- Cyclomatic Complexity
- Static Analysis: number of problems according to severity (tweak the rules so they match your standards)
- Code duplication: JetBrains has tools to help detect duplication, and can sometimes generate a method to eliminate it for you. I'm sure other vendors have something equivalent.
Code quality is hard to measure objectively. Static analysis tools like SonarQube, FxCop, ReSharper, etc. can help quantify troublesome code constructs--but they do need to be tuned to what you and your organization think is important. They can have conflicting rules, so you need to break the tie.
Cyclomatic Complexity may be high for very good reasons, but those instances should affect a relatively small portion of your code. Honestly, you just need to treat the results of this one like a thermometer: high numbers can indicate something needs attention, but once you've done further triage you may decide to leave that part of the code alone.
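To make the complexity number concrete, here is a hypothetical before/after. The shipping-rate scenario and values are invented; the point is that the branchy version's cyclomatic complexity grows with every new case, while the table-driven version stays flat:

```typescript
// Before: complexity rises with each region added, since every branch is a
// decision point the analyzer counts.
function shippingRateBranchy(region: string): number {
  if (region === "US") return 5;
  else if (region === "CA") return 7;
  else if (region === "EU") return 9;
  else if (region === "APAC") return 12;
  else return 20;
}

// After: one decision point, regardless of how many regions exist.
const SHIPPING_RATES: Record<string, number> = { US: 5, CA: 7, EU: 9, APAC: 12 };

function shippingRate(region: string): number {
  return SHIPPING_RATES[region] ?? 20; // default rate for unknown regions
}
```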
Unit Testing
- Number of tests per public function
- Time to run tests per project
- Number of failures
- Number of intermittent failures
- Number of systems you need running to execute the tests (speaks to fragility)
Tests (when present) are very fragile when great pains have been taken to peer into the internal state of an object. You'll get much better results from several unit tests per public method, since they exercise the interface as defined by the class. That arrangement gives you greater freedom to rework the insides while making sure the important behavior remains constant.
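As a sketch of what that looks like, the tests below assert only on the public interface of a hypothetical ShoppingCart class (using Node's built-in test runner, though any framework works), so the internal representation is free to change:

```typescript
// Sketch: test through the public interface only. ShoppingCart is a made-up
// example class; the test runner and assert module are built into Node.
import { test } from "node:test";
import assert from "node:assert/strict";

class ShoppingCart {
  private items: { name: string; price: number }[] = []; // internal detail: free to change

  addItem(name: string, price: number): void {
    this.items.push({ name, price });
  }

  total(): number {
    return this.items.reduce((sum, item) => sum + item.price, 0);
  }
}

// Assert on observable behavior, not on the private items array, so the
// internals can be reworked (say, to a Map) without rewriting the tests.
test("total reflects every item added", () => {
  const cart = new ShoppingCart();
  cart.addItem("widget", 10);
  cart.addItem("gadget", 5);
  assert.equal(cart.total(), 15);
});

test("an empty cart totals zero", () => {
  assert.equal(new ShoppingCart().total(), 0);
});
```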
I've also seen very long-running tests that require servers to be up in order to work, and tests that assume a file exists on a fileshare and that the network never goes down. We can all agree that's a bad idea.
Examine Logs
- Number of errors per day
- Frequency of common messages
It's hard to put something quantifiable on this one; however, you may catch loops running more often than they should, or more data passing through a processing queue than seems reasonable. You are looking for anomalies that could indicate a problem with the algorithms used.
It could be that there is very little logging at all, and that is its own problem.
Eventually, you'll want a way to do better and more robust log analysis across your systems. That's when you can look into Logstash and similar solutions.
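Before you get that far, even a small script can produce the two numbers above. A rough sketch, assuming one entry per line in a "date time LEVEL message" shape (adjust the parsing to whatever your logs actually look like):

```typescript
// Rough first pass at log metrics: errors per day and the most common messages.
// Assumes lines like "2024-01-15 10:32:01 ERROR Something broke"; the file name
// and format are placeholders for your own.
import { readFileSync } from "node:fs";

const lines = readFileSync("app.log", "utf8").split("\n");

const errorsPerDay = new Map<string, number>();
const messageCounts = new Map<string, number>();

for (const line of lines) {
  const match = line.match(/^(\d{4}-\d{2}-\d{2}) \S+ (\w+) (.*)$/);
  if (!match) continue;
  const [, day, level, message] = match;
  if (level === "ERROR") {
    errorsPerDay.set(day, (errorsPerDay.get(day) ?? 0) + 1);
  }
  messageCounts.set(message, (messageCounts.get(message) ?? 0) + 1);
}

console.log("Errors per day:", Object.fromEntries(errorsPerDay));
console.log(
  "Most frequent messages:",
  [...messageCounts.entries()].sort((a, b) => b[1] - a[1]).slice(0, 10)
);
```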
(original answer)
I can provide some general advice for identifying specific places where the application needs to change. However, what really matters is how it affects the users, and that requires information that I don't (and shouldn't) have.
Start with things that can be checked easily:
- Static analysis tools like SonarQube or FxCop can highlight fragile code constructs.
- Running the application with debug tools in your browser can show how many requests it takes to populate a screen.
- Unit tests (or the lack thereof) demonstrate how much confidence you can have that changing one thing won't accidentally break something else.
- Logging is a good way of leaving footprints behind so you can trace how things work. If you don't have unit tests, this is usually the only thing you can do to find out what the real requirements are.
It's unfortunate, but copy-and-paste code is common in any application that's been in use for more than a few months. Typically there is an emergency fix, or a "really easy" way to solve a problem, and schedule pressure to get it out because users are seriously affected. The empty promise to "fix it later when we have time" never gets fulfilled, and since the code sort of works, people forget about the problems--until there's a need to change fundamental parts of the application.
The statistics allow you to quantify in some way what you are dealing with. From those statistics you need to generate a plan of attack:
- What should your target statistics be?
- What are the biggest measured problems?
- What are the biggest cognitive problems (i.e. results of manual analysis of the code like you've done)?
- What can you live with?
- What problem are you trying to solve?
All of these questions will help you arrive at what is most important:
- What am I doing?
- When am I done?
- How am I going to get there?
You aren't going to be able to reverse years of technical debt in a month, so pick the biggest impediment your application has to doing its job--or what will soon be its job. Keep collecting those data points as you fix things so you can show how you are progressing. Don't pursue perfection; just pursue better until you have attained "good enough".