- Yes - Web Applications are different to Web Sites
I would treat them separately. If you have one part of your site that is simply a collection of documents (that look the same to anonymous users and logged-in users alike) - then the best method of structuring it is very different from a web app that serves dynamically different pages to each user. Split those two parts of the site up into two apps / components and tackle each part differently.
- Start Using Version Control
Once your code is under version control, you can confidently go through and remove all the unnecessary code that you had previously kept 'just in case'. I don't know how I survived without version control.
If four different URLs all point to the same resource, then the problem is much bigger.
You end up dealing with an infinite number of URLs. As soon as you can, ensure that you have a URL Normalisation policy in place. Once that is done, you can start attaching semantic meanings to URLs and be able to do reverse lookups from resource-to-URL. This allows you to separate the 'web imprint' from the 'resources' of the site.
You have to ask yourself, "given a URL, what is its normalised form?". Once you have this pinned down, the 50,000+ URLs on your site can be reduced to, say, 2,000, which is a lot easier to comprehend and manage in your mind.
see: http://www.sugarrae.com/be-a-normalizer-a-c14n-exterminator/
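To make the "given a URL, what is its normalised form?" question concrete, here is a minimal sketch in Python. The policy itself is an assumption made up for illustration (drop `www.`, drop default ports, drop index documents and trailing slashes, ignore tracking/session parameters, sort the rest); `normalise_url` and `IGNORED_PARAMS` are hypothetical names, and your own site will need its own rules.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed policy: these query parameters never change the resource served.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}
# Assumed policy: these file names are just the directory's index document.
INDEX_DOCUMENTS = ("index.php", "index.html")
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalise_url(url: str) -> str:
    """Return the one normalised form for a URL under the assumed policy."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()

    # Lower-case the host and treat "www." as an alias of the bare domain.
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Keep the port only when it is not the default for the scheme.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"

    # Strip index documents and trailing slashes from the path.
    path = parts.path or "/"
    for index in INDEX_DOCUMENTS:
        if path.endswith("/" + index):
            path = path[: -len(index)]
    if len(path) > 1:
        path = path.rstrip("/") or "/"

    # Drop ignored parameters and sort the rest so ordering does not matter.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode(sorted((k, v) for k, v in pairs if k not in IGNORED_PARAMS))

    # The fragment is never sent to the server, so it is discarded.
    return urlunsplit((scheme, host, path, query, ""))
```

With this policy, `http://www.example.com/products/index.php?utm_source=feed`, `HTTP://Example.com/products/`, `http://example.com:80/products` and `http://example.com/products?sessionid=abc` all collapse to `http://example.com/products`. In practice you would probably also issue a 301 redirect from each variant to the normalised form.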
- Start by modelling 'what is', not 'what you want it to be'
If you are tidying up a legacy site that was not designed with best practice in mind from the start, then it is tempting to jump from 'a mess' to 'the ideal design'. I believe that you need to do it in at least two steps: 'mess' -> 'well modelled legacy code' -> 'ideal new code with added features'. Stop adding features. Concentrate on fixing the mess or encapsulating it behind an anti-corruption layer. Only then can you start to change the design into something better.
See: http://www.joelonsoftware.com/articles/fog0000000069.html
See: http://www.laputan.org/mud/
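As a rough illustration of encapsulating the mess behind an anti-corruption layer, here is a small Python sketch. Everything in it is hypothetical: `legacy_get_user` stands in for whatever messy function your site already has, and `User` / `UserGateway` are invented names. The point is only that the translation from legacy shapes to clean ones lives in one place.

```python
from dataclasses import dataclass


def legacy_get_user(user_id):
    """Stand-in for an existing legacy function (hypothetical).

    It returns a loosely typed dict with cryptic keys, padded strings
    and 'Y'/'N' flags, as legacy code often does.
    """
    return {"usr_id": str(user_id), "usr_nm": "  Alice ", "usr_active": "Y"}


@dataclass(frozen=True)
class User:
    """The clean model that the rest of the new code works with."""
    id: int
    name: str
    active: bool


class UserGateway:
    """Anti-corruption layer: the only place that knows the legacy format."""

    def __init__(self, fetch=legacy_get_user):
        # The legacy call is injected so it can be swapped out later
        # (or replaced by a fake in tests) without touching callers.
        self._fetch = fetch

    def find(self, user_id: int) -> User:
        row = self._fetch(user_id)
        return User(
            id=int(row["usr_id"]),
            name=row["usr_nm"].strip(),
            active=row["usr_active"] == "Y",
        )
```

New code calls `UserGateway().find(7)` and gets a `User` back; it never sees `usr_nm`. When the legacy function is eventually rewritten, only the gateway changes.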
- Putting it under test is a good idea.
Create a test suite / framework and start adding tests. But it's quite tricky to test some legacy code, so don't get too hung up over it. As long as you have the framework there, you can add tests little by little.
See: http://www.simpletest.org/en/web_tester_documentation.html
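One way to add tests "little by little" is to start with tests that simply pin down what the legacy code does today, quirks included, before you change anything. The sketch below uses Python's built-in `unittest` purely as a stand-in for whichever framework you pick (the link above is for SimpleTest, which is PHP); `legacy_format_price` is a made-up function for illustration.

```python
import unittest


def legacy_format_price(amount):
    """Stand-in for an untested legacy function (hypothetical)."""
    return "$" + str(round(amount, 2))


class LegacyFormatPriceTest(unittest.TestCase):
    """Pins down current behaviour, right or wrong, before refactoring."""

    def test_rounds_to_two_decimal_places(self):
        self.assertEqual(legacy_format_price(3.14159), "$3.14")

    def test_whole_amounts_show_only_one_decimal(self):
        # Probably not what anyone wanted, but it is what the code does
        # today, so the test records it until the behaviour is changed
        # on purpose.
        self.assertEqual(legacy_format_price(5.0), "$5.0")
```

Run it with `python -m unittest` from the directory containing the file. Each time you touch another piece of legacy code, add a test like this for it first.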
- Have courage in your convictions
Most of the literature on best practices of software development is desktop-centric or enterprise-app-centric. While your site is in a mess, you read these books and you can be in awe of the wisdom that exudes from them. But do not forget that most of this best practice was accrued before the web and SEO became important. You know a lot about the modern web, more than is mentioned in classic books like PoEAA, GoF etc. There is a lot to take from them, but do not completely discard your own experience and knowledge.
I could go on. But those are some things that I have picked up when refactoring an old legacy site into a shiny new one.