
Inevitably I'll stop using an antiquated CSS, script, or image file, especially when a separate designer is tinkering with things and testing out a few versions of images. Before I build one myself, are there any tools out there that will drill through a website and list unlinked files? Specifically, I'm interested in ASP.NET MVC sites, so detecting calls to @Url.Content(...) (among many other things) is important.

xanadont
    There seems to be a similar question on SO (http://stackoverflow.com/q/5665979/866172) that has gone without a real answer for more than a year now, which suggests there is no such tool yet. The only attempt at an answer explains why such a tool doesn't exist. – Jalayn Oct 19 '12 at 06:17

1 Answer


Except for strictly static websites, the task would be rather hit-or-miss:

  1. You can't scan the source code in order to find the links, since links can be generated. Imagine the following case:

    On a page, when a user performs an action, an image is added to the DOM (so there is no <img/> element in the original HTML). The link to the image is assigned by JavaScript: one part of it is fetched through an AJAX request, and the other part is hard-coded in the JavaScript code. The final URI is http://example.com/photos/nature/polar-bear.jpg?width=800

    The server receives the request for the image and rewrites the URL to http://example.com/generate-photo.aspx?category=nature&name=polar-bear.jpg&width=800. The new URI points to a dynamic resource that generates the image by taking an existing one (/photos/catalog/133d6566-3c98-4690-be4a-caad41c0e21d.jpg) and adding a copyright notice.

    Could you possibly track this situation automatically?

  2. You can't rely on logs either, since the fact that a resource hasn't been requested for a while doesn't mean that it will never be requested.
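To make case 1 concrete, here is a minimal Python sketch of the rewrite described above. The rule, the `rewrite` and `source_file_for` helpers, and the `CATALOG` lookup table are all hypothetical stand-ins for whatever the server really does. The point it illustrates is that the file that actually exists on disk (the GUID-named catalog image) never appears in any URL the browser requests or in any page source, so neither a source scan nor a plain log scan would connect it to polar-bear.jpg.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical server-side table: the generator maps (category, name) to a
# catalog file whose GUID name appears in no HTML, CSS, or JavaScript.
CATALOG = {
    ("nature", "polar-bear.jpg"):
        "/photos/catalog/133d6566-3c98-4690-be4a-caad41c0e21d.jpg",
}

def rewrite(url: str) -> str:
    """Hypothetical rule: /photos/<category>/<name>?width=N is served by
    /generate-photo.aspx?category=<category>&name=<name>&width=N."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) == 3 and segments[0] == "photos" and segments[1] != "catalog":
        query = {"category": segments[1], "name": segments[2]}
        # Carry over the original query string (e.g. width=800).
        query.update({k: v[0] for k, v in parse_qs(parts.query).items()})
        return f"{parts.scheme}://{parts.netloc}/generate-photo.aspx?{urlencode(query)}"
    return url  # not covered by the rule: served as-is

def source_file_for(internal_url: str) -> Optional[str]:
    """The physical file the generator would read for a rewritten URL."""
    qs = parse_qs(urlsplit(internal_url).query)
    return CATALOG.get((qs.get("category", [""])[0], qs.get("name", [""])[0]))

requested = "http://example.com/photos/nature/polar-bear.jpg?width=800"
internal = rewrite(requested)
print(internal)
print(source_file_for(internal))
```

Grepping the site's HTML and JavaScript, or the access logs, for `133d6566` finds nothing, even though that file is in active use.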

The only viable alternative is to:

  • List every resource on the website,

  • Collect statistics from the logs in order to filter out the resources that were used in the past N months. Don't forget the many small issues that can arise: there is URL rewriting, requests need to be canonicalized, there are default pages (http://example.com/index.html will usually be requested as http://example.com/), etc.

  • Based on those statistics, set aside the resources that are in use: you don't need to remove them.

  • For each remaining resource, try to guess the context in which it could be used, and check whether it actually is. This last step is extremely hard for a program and requires a human brain (or years and years of R&D).
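The first three steps above can be sketched in Python. This is a minimal sketch under assumptions that are not in the answer: the site is a local directory, the logs use the W3C extended format that IIS writes by default (a `#Fields:` directive names the columns, including `date` and `cs-uri-stem`), and `DEFAULT_DOCUMENTS` is an illustrative list, not your server's actual configuration. URL rewriting and percent-encoding are deliberately not handled, and the output is only a list of candidates for the last, human step.

```python
import os
from datetime import date, timedelta
from typing import Iterable, List, Set

# Illustrative default documents; the real list is configured on the server.
DEFAULT_DOCUMENTS = ("index.html", "default.aspx")

def list_resources(site_root: str) -> Set[str]:
    """Step 1: every file under the site root, as a lower-cased URL path."""
    resources = set()
    for dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), site_root)
            resources.add("/" + rel.replace(os.sep, "/").lower())
    return resources

def canonicalize(path: str, resources: Set[str]) -> str:
    """Drop any query string, lower-case the path (IIS paths are
    case-insensitive) and map "/dir/" to the default document it serves."""
    path = path.split("?", 1)[0].lower()
    if path.endswith("/"):
        for doc in DEFAULT_DOCUMENTS:
            if path + doc in resources:
                return path + doc
    return path

def used_since(log_lines: Iterable[str], cutoff: date, resources: Set[str]) -> Set[str]:
    """Step 2: canonical paths requested on or after `cutoff`."""
    used: Set[str] = set()
    fields: List[str] = []
    for line in log_lines:
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names for the rows that follow
            continue
        if line.startswith("#") or not fields or not line.strip():
            continue  # other directives, rows before a header, blank lines
        row = dict(zip(fields, line.split()))
        if date.fromisoformat(row["date"]) >= cutoff:
            used.add(canonicalize(row["cs-uri-stem"], resources))
    return used

def unused(resources: Set[str], log_lines: Iterable[str], cutoff: date) -> List[str]:
    """Step 3: set aside what is in use; the rest goes to a human (step 4)."""
    return sorted(resources - used_since(log_lines, cutoff, resources))

def cutoff_for(months: int) -> date:
    """'The past N months', approximated as 30-day months."""
    return date.today() - timedelta(days=30 * months)
```

Something like `unused(list_resources(site_root), open(log_path), cutoff_for(6))` would then list the files not requested in roughly the last six months; `site_root` and `log_path` are placeholders for your own paths.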


As a side note, did you know that instead of Url.Content, ASP.NET MVC 4 lets you use ~ directly, like this:

<a href="~/Products/Edit/458">Edit</a>
Arseni Mourzenko
    These are valid points. But I'm thinking of a flexible solution that lets you extend the engine with custom regexes or plugin providers, so that if you have a non-standard way of pointing to a resource, you can handle your specific case. In any case, I'll probably just build a tool for my specific needs. – xanadont Oct 22 '12 at 14:07
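The extensible engine described in this comment could start as a list of regular expressions run over the project's source files. Below is a minimal Python sketch, assuming the sources have already been read into a dict mapping file name to text. The three patterns are illustrative (for `@Url.Content("~/...")`, the MVC 4 `~/` attribute shorthand from the answer, and root-relative CSS `url(...)`), not a complete list. Relative paths, bundling, and generated URLs are not handled, which is exactly the limitation the answer points out. Supporting a non-standard convention means appending another compiled pattern with a `path` group.

```python
import re
from typing import Dict, Iterable, List, Set

# Each pattern captures an app-relative or root-relative path in group "path".
# Illustrative only: extend the list for your own conventions.
DEFAULT_PATTERNS = [
    re.compile(r"""Url\.Content\(\s*"(?P<path>~/[^"]+)"\s*\)"""),          # @Url.Content("~/...")
    re.compile(r"""(?:src|href)\s*=\s*["'](?P<path>~/[^"']+)["']"""),     # MVC 4 href="~/..."
    re.compile(r"""url\(\s*["']?(?P<path>/[^"')]+)["']?\s*\)"""),         # CSS url(/...)
]

def find_references(sources: Dict[str, str], patterns: Iterable[re.Pattern]) -> Set[str]:
    """Collect every referenced path, normalized to a lower-cased '/path'."""
    patterns = list(patterns)
    refs = set()
    for text in sources.values():
        for pattern in patterns:
            for match in pattern.finditer(text):
                # "~/Content/a.css?v=2" -> "/content/a.css"
                path = match.group("path").lstrip("~").split("?", 1)[0]
                refs.add(path.lower())
    return refs

def unreferenced(files: Iterable[str], sources: Dict[str, str],
                 patterns: Iterable[re.Pattern] = DEFAULT_PATTERNS) -> List[str]:
    """Files (as '/path' strings) that no pattern found in any source."""
    refs = find_references(sources, patterns)
    return sorted(f for f in files if f.lower() not in refs)
```

Anything this reports is only a candidate: as the answer explains, a file reached through JavaScript, AJAX, or URL rewriting will look unreferenced to every pattern.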