5

This is my first question here so I hope it is not off topic. Although I am using the Linux inotify library to listen for changes to files, and I compare use of that against the Subversion program, I am specifically looking for the algorithm used.

To a human it is very easy to tell whether a file has been created or modified: clicking the New button constitutes the former, and clicking the Save button constitutes the latter. To Linux, those two actions overlap heavily. Text editors, for example, generally create a swap file and then copy or move it over the original. This makes it difficult to distinguish via inotify between a minor edit to a file and a deliberate overwrite of that file. What I am trying to understand is how a program such as Subversion recognizes the difference between a user modifying a file with a text editor and a user deleting the file and creating a new file with the same name.

Edit: It has been pointed out that Subversion does not do what I assumed it does, so it was a blunder on my part to use it as an example. Allow me to rephrase the question: "Is there any known program or programming approach that maps high-level actions, such as creating new files and saving them, to low-level actions, such as modifying, moving, and copying, so that I can log all the files in the system and the changes made to them?"

puk
  • 3
    What makes you think Subversion distinguishes between editing a file, or deleting it and replacing it? AFAIK it doesn't, and no one would want it to do that. – kevin cline May 11 '12 at 01:31
  • @kevincline I remember before, if I deleted a file and then copied one over with the same name, svn would complain about duplicates (can't recall the exact error) – puk May 11 '12 at 01:44
  • 1
    There is a created-on timestamp for each file on most filesystems; I'd gather Subversion uses that to compare... – ratchet freak May 11 '12 at 01:56
  • @ratchetfreak I don't think that exists on linux, and even if it does, it doesn't help with something like vim which deletes the original file anyways http://stackoverflow.com/questions/10542366/watching-a-file-for-changes-in-linux/10542513#comment13642830_10542513 – puk May 11 '12 at 02:06

3 Answers

3

If you want to learn how Subversion works out the state of the working directory, you can look at the source pretty easily. As a longtime SVN user, I can pretty confidently say that SVN makes no distinction at all about what happens to a file before the commit: it just compares what you are committing against the repository. Nothing more, nothing less.

Wyatt Barnett
  • Now I feel silly for phrasing the question in such a way as to make Subversion seem more important than the underlying problem. I will edit it right now. – puk May 11 '12 at 02:36
2

I'm not sure what you mean by an algorithm, but certainly you can do this. How hard it will be depends on what operating system and file system you are using, and the level of detail you need.

If you think about it, this is just what the file system is doing anyway. A file is an abstraction. At base it's just a collection of spots on a disk. All those operations like creating a file, deleting a file, or modifying a file are just the file system managing lists of spots. What you are asking for is for the file system to share information with you as it performs its management chores.

At one level there are library calls like stat and inotify, but these just wrap lower-level OS or device driver calls, and you can tap into those too. In the 'olden' days you would have had to write interrupt hooks to monitor the OS calls. Now a sophisticated API for hooking into the file system may be provided for you. See, for example, this article on the NTFS transactional file system API. I also just stumbled across the paper "VFS Interceptor: Dynamically Tracing File System Operations in real environments", which discusses the design of a tool for tracing file system operations.
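As a rough illustration of what the stat level can tell you, here is a minimal sketch that compares inode numbers to separate "written in place" from "replaced by a different file under the same name". It assumes GNU coreutils `stat` (the `-c` format option); the helper names `snapshot` and `classify_change` are made up for this example and are not part of any tool.

```shell
#!/bin/bash
# Sketch only: assumes GNU coreutils stat. Helper names are hypothetical.

# Print "inode mtime size" for the given path.
snapshot() {
    stat -c '%i %Y %s' -- "$1"
}

# Compare an earlier snapshot with the current state of the path.
# Prints one of: deleted, replaced, modified, unchanged.
classify_change() {
    local old_snapshot=$1
    local path=$2
    local new_snapshot
    if [ ! -e "$path" ]; then
        echo "deleted"
        return
    fi
    new_snapshot=$(snapshot "$path")
    if [ "${old_snapshot%% *}" != "${new_snapshot%% *}" ]; then
        # Different inode: another file now carries this name.
        echo "replaced"
    elif [ "$old_snapshot" != "$new_snapshot" ]; then
        # Same inode, but mtime or size differs: written in place.
        echo "modified"
    else
        echo "unchanged"
    fi
}
```

You would take `snap=$(snapshot file)` once and call `classify_change "$snap" file` after each notification. Two caveats: the mtime has one-second resolution here, and a file system may reuse an inode number after a delete, so a delete followed by a create can occasionally look like "modified".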

Charles E. Grant
  • thank you very much, I will have a look at them when I get home. However, I have a sneaking suspicion that a file system's concept of a "file" is different from a user's. When `vim` deletes my original file and moves the swap file in place of it, to the OS, a "new" file has been created, but to me, an existing file has been modified. – puk May 11 '12 at 04:45
1

You could read the file in yourself before setting up your inotify watch, then, whenever you get an event, diff the file against your stored copy and see how much has changed. A simple bash example, with inotify-tools installed, would be something like this (with the file you're watching as the first parameter):

#!/bin/bash
# Keep a private copy of the watched file to diff against.
TEMPFILE=$(mktemp) || exit 1
trap 'rm -f "$TEMPFILE"' EXIT
cp -- "$1" "$TEMPFILE"
while true; do
    # Block until inotify reports an event on the watched file.
    inotifywait -q "$1"
    # Count diff output lines as a rough measure of how much changed.
    changesize=$(diff "$1" "$TEMPFILE" | wc -l)
    if test "$changesize" -gt 10; then
        echo "Big change"
    else
        echo "Minor edit"
    fi
done

NOTE: This is a sample; treat it as pseudocode. I haven't tested it for syntax errors and such, so adapt it before using.
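If the diff alone is not enough, one possible extension is to watch the containing directory rather than the single file, so that the create, rename, and delete events an editor produces while saving become visible. The sketch below is only that: the labels in `describe_event` are my guesses at a high-level meaning, not facts about how any particular editor behaves, and both function names are made up for this example.

```shell
#!/bin/bash
# Sketch only: the labels are assumptions about what each event "means".

# Map one inotifywait event string (e.g. "MOVED_TO" or "CLOSE_WRITE,CLOSE")
# to a guessed high-level label.
describe_event() {
    case "$1" in
        *MOVED_TO*)             echo "replaced by rename" ;;
        *CREATE*)               echo "created" ;;
        *DELETE*)               echo "deleted" ;;
        *CLOSE_WRITE*|*MODIFY*) echo "written in place" ;;
        *)                      echo "other" ;;
    esac
}

# Watch the directory given as the first argument and print one label
# per event. Runs until interrupted.
watch_dir() {
    inotifywait -m -q -e create,modify,close_write,moved_to,delete \
        --format '%e %f' "$1" |
    while read -r events name; do
        echo "$name: $(describe_event "$events")"
    done
}
```

Run it as `watch_dir /some/directory`. A save that writes a temporary file and renames it over the original would then show up as a "replaced by rename" line for the original name, which you could combine with the diff above to decide whether it was really just an edit.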

silentcoder
  • I think this still fails for something like `cat test.txt > test.txt`: this is a `CREATE`, but the file looks like the old one. – puk Apr 15 '13 at 16:21