I am looking for a way to compare a set of files against a given file, with each comparison giving me a "closeness" metric. I should then be able to sort by that metric to find the closest file. I considered using diff, but as far as I know it only reports whether a particular line matches or not. That is too coarse for my purposes: a one-word change in a line of text should count as closer than a completely different line, whereas diff reports both cases simply as a non-match.
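One possible starting point, assuming Python and its standard-library difflib are acceptable, is to compare the files as sequences of words rather than lines, so a single changed word only lowers the score slightly. The function names closeness and rank_files are hypothetical, and SequenceMatcher.ratio() is just one choice of metric: it returns 2*M/T, where M is the number of matched elements and T is the total number of elements in both sequences.

```python
import difflib
from pathlib import Path


def closeness(text_a: str, text_b: str) -> float:
    """Hypothetical helper: similarity in [0.0, 1.0] between two texts.

    The texts are compared word by word, so a one-word edit inside a line
    lowers the score only slightly instead of counting the whole line as
    a mismatch.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False stops very frequent words being ignored on long inputs.
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    return matcher.ratio()


def rank_files(reference: str, candidates: list[str]) -> list[tuple[float, str]]:
    """Hypothetical helper: score each candidate file, closest first."""
    ref_text = Path(reference).read_text()
    scored = [(closeness(ref_text, Path(c).read_text()), c) for c in candidates]
    return sorted(scored, reverse=True)
```

For example, "the quick brown fox" against "the quick red fox" shares three of four words and scores 0.75, while two texts with no words in common score 0.0.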
Could I use Soundex effectively on a file of 100 or more lines, or is there a better algorithm? Also, is there a metric that would still report a match when similar lines appear at very different line numbers in the two files?
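For lines that move around, a rough sketch (an illustration, not a standard named algorithm) is to ignore position entirely: for each line in the reference file, take the best difflib SequenceMatcher ratio against any line in the other file, then average those best scores. The function names are hypothetical. It needs one comparison per pair of lines, which should be manageable for files of a few hundred lines.

```python
import difflib


def line_similarity(a: str, b: str) -> float:
    # Character-level ratio: a line with one changed word still scores high.
    return difflib.SequenceMatcher(None, a, b).ratio()


def order_independent_closeness(ref_text: str, other_text: str) -> float:
    """Hypothetical helper: average best-match score of each reference line.

    Line positions are ignored, so a line that moved far away in the other
    file still counts as a match.
    """
    ref_lines = [ln.strip() for ln in ref_text.splitlines() if ln.strip()]
    other_lines = [ln.strip() for ln in other_text.splitlines() if ln.strip()]
    if not ref_lines or not other_lines:
        # Two empty files are treated as identical; one empty file as no match.
        return 1.0 if not ref_lines and not other_lines else 0.0
    best = [max(line_similarity(r, o) for o in other_lines) for r in ref_lines]
    return sum(best) / len(best)
```

The score is asymmetric (it measures how well the reference is covered, and one candidate line may match several reference lines), so averaging order_independent_closeness(a, b) and order_independent_closeness(b, a) is one way to make it symmetric.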
Thanks