
I was trying to find a "perfect" syncing program between a network share and a local folder when I realised that it is probably impossible to do right unless every filesystem operation is captured in a log of some sort that can then be replayed on the remote server. The same thing is of course required at the other end for retrieving changes from the remote server, but one could use an existing system for that purpose, and if worst comes to worst, fall back to manual merging.

The advantage of this approach seems to be that handling deletes, renames, moves, copies, etc. becomes trivial, because the filesystem records these operations. Each commit would then consist of merely choosing the next meaningful changed version. The version control log would be a low-resolution version of the filesystem log, but would be more portable and persistent because it can be transported.
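To make the idea concrete, here is a minimal sketch of such an operation log and its replay, assuming both sides store paths relative to a common root. The names (`OpLog`, `replay`) and the JSON-lines format are hypothetical illustrations, not any existing tool:

```python
import json
import os
import shutil

class OpLog:
    """Append-only journal of high-level filesystem operations."""

    def __init__(self, path):
        self.path = path

    def record(self, op, src, dst=None):
        # One JSON object per line, e.g. {"op": "rename", "src": ..., "dst": ...}
        with open(self.path, "a") as f:
            f.write(json.dumps({"op": op, "src": src, "dst": dst}) + "\n")

    def entries(self):
        with open(self.path) as f:
            return [json.loads(line) for line in f]

def replay(log, root):
    """Apply the journalled operations under a different root directory."""
    for e in log.entries():
        src = os.path.join(root, e["src"])
        if e["op"] == "delete":
            os.remove(src)
        elif e["op"] in ("rename", "move"):
            os.rename(src, os.path.join(root, e["dst"]))
        elif e["op"] == "copy":
            shutil.copy2(src, os.path.join(root, e["dst"]))
        # Content writes would additionally need the file data shipped across.
```

Structural operations replay cheaply; it is the content changes (and concurrent edits on both sides) that still need diffing and merging.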

Is this viable, or have I misunderstood the concepts and limitations?

Milind R

2 Answers


File system operations are actually often at a coarser granularity than the changes a VCS tracks. A file may be overwritten in a single operation when only one line has really changed.

Other times, the file system operations are too fine-grained and miss the big picture. Someone may "copy" a file and then rewrite everything except the copyright notice. Or they may delete a file and then copy one in from somewhere else under a new name but with the same contents.

A VCS does the right thing in computing diffs directly on what's in the files rather than analyzing file system operations. The latter would be far more fragile and headache-inducing.
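As an illustration of the granularity mismatch: an editor that saves a file typically rewrites it wholesale, so the filesystem sees one opaque overwrite, while a content diff recovers the single changed line. A small sketch using only Python's standard library (the file names are made up):

```python
import difflib

# What the file looked like before and after an editor "save":
before = ["def greet():\n", "    print('hello')\n", "    return None\n"]
after  = ["def greet():\n", "    print('hello, world')\n", "    return None\n"]

# At the filesystem level this was a whole-file overwrite;
# diffing the contents shows the actual one-line change.
for line in difflib.unified_diff(before, after, "a/greet.py", "b/greet.py"):
    print(line, end="")
```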

  • True. Two ways of mitigating that: 1) use a file system driver to capture as much as possible; 2) use both approaches, and form a master list from the more fine-grained of the two. – Milind R Nov 03 '15 at 11:55
  • How does having a long and fine-grained list of changes help you solve the merging problem? It only makes it worse. – Aleksandr Dubinsky Nov 03 '15 at 11:58
  • My intuitive idea was that seeing a rename operation would help explain why one file vanished and another appeared. Ultimately, of course, we'll keep a coarse-grained commit log, but we can use the extra information to select the best boundary. – Milind R Nov 03 '15 at 12:01
  • @MilindR As an additional indicator it may be helpful, but as I hinted at, there are multiple ways to "rename" a file, and simply looking at the fs operations will only reveal the most obvious one. But you didn't answer my question. Besides tracking renames/moves, what else will you accomplish? Those are easy to detect anyway (edit distance; see the sketch after this thread). The tough part is handling merge conflicts. What might be most useful there is not going lower-level but going higher, i.e., tracking refactoring operations like the renaming of variables. – Aleksandr Dubinsky Nov 03 '15 at 15:47
  • I very much understand that the challenge is in going higher-level. But the point I should have mentioned earlier is that there is a fundamental difference between renaming a file and deleting it and getting the same thing from elsewhere with a different name but the same contents. The former should result in a rename operation in a commit, while the latter should result in the disappearance of one file and the appearance of another: the new file and the old file are not related, so their histories should not be tied. – Milind R Nov 04 '15 at 04:25
  • @MilindR I would disagree with you there. No matter what I did with the file, if the content of the new file is the same, I would want my VCS history to be "optimized" to show a rename. I always aim for the history to be clean, and the minutiae of how I messed up my file manipulations especially should be hidden. God forbid I start to pay attention to *that*. – Aleksandr Dubinsky Nov 04 '15 at 12:45
  • What if that new file was not doing the same thing as the old one? The content might be the same for now, but its purpose is different and it will evolve differently. – Milind R Nov 05 '15 at 13:36
  • @MilindR I still doubt the file system history will reveal this intent. Why wouldn't I just rename the file with the intent of giving it a different purpose? It would be the easiest way to get rid of the old file and create a new one with the same (or similar) content. If you'd like the rename status to communicate intent, it would make sense to let the user specify/modify that while making the commit. – Aleksandr Dubinsky Nov 05 '15 at 21:31
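The "edit distance" heuristic mentioned in the comments above is roughly how content-based rename detection works in practice (git, for instance, pairs deleted and added files whose contents exceed a similarity threshold). A minimal sketch; the function name, data shapes, and the 0.5 cutoff are illustrative choices, not any tool's actual API:

```python
import difflib

def detect_renames(deleted, added, threshold=0.5):
    """deleted/added map filename -> contents; returns (old, new) guesses."""
    renames = []
    for old_name, old_text in deleted.items():
        # Find the added file whose contents are most similar to the deleted one.
        best = max(
            added.items(),
            key=lambda item: difflib.SequenceMatcher(None, old_text, item[1]).ratio(),
            default=None,
        )
        if best is not None:
            ratio = difflib.SequenceMatcher(None, old_text, best[1]).ratio()
            if ratio >= threshold:
                renames.append((old_name, best[0]))
    return renames

print(detect_renames({"old.txt": "same contents"},
                     {"new.txt": "same contents", "other.txt": "something else"}))
# -> [('old.txt', 'new.txt')]
```

Note that this heuristic is exactly what cannot distinguish the two cases debated above: a true rename and a delete-plus-reintroduction with identical contents look the same to it.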

You've misunderstood, because the filesystem transaction log is a very temporary beast: it does not hang around long enough to be used for history (at least not in any filesystem I know of).

But otherwise, yes, it's a valid principle. The VCS log would, however, be the only log you have. It cannot be transported as-is, because it is not a transaction log but a 'living' history of operations; you could, though, transport a view of this history.

Sync is already implemented like this - SVNSync does exactly the kind of operation replay you describe.

gbjbaanb
  • I thought journaling file systems kept meticulous logs. Either way, a file system driver should work, right? – Milind R Nov 03 '15 at 11:56
  • @MilindR They do, but not forever - where would they keep them? You'd fill up your disk with journals. So they keep data in the transaction log file only until the data is flushed to disk; then the log is reset. See [this page about NTFS journalling](http://ntfs.com/transaction.htm) – gbjbaanb Nov 03 '15 at 12:01
  • Useful link, thanks. The very robustness seen there is what prompted me to wonder about piggybacking on it. So we could capture the operations happening in a local repository until the next commit, and then flush? (See the sketch after this thread.) – Milind R Nov 04 '15 at 04:20
  • You'd have to write such a thing as part of the filesystem driver - I doubt you'd have enough control over when the log was flushed to reliably use it for userland purposes. – gbjbaanb Nov 04 '15 at 08:35
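An alternative to reading the journal or writing a filesystem driver is to capture operations in userland through the OS's file-watching facilities (inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows). A minimal sketch using the third-party watchdog package, which wraps those APIs; the watched path and the tuple format are made-up illustrations:

```python
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class RecordingHandler(FileSystemEventHandler):
    """Collects create/delete/move events until the next 'commit'."""

    def __init__(self):
        self.ops = []

    def on_created(self, event):
        self.ops.append(("create", event.src_path, None))

    def on_deleted(self, event):
        self.ops.append(("delete", event.src_path, None))

    def on_moved(self, event):
        self.ops.append(("move", event.src_path, event.dest_path))

handler = RecordingHandler()
observer = Observer()
observer.schedule(handler, "/path/to/working/copy", recursive=True)
observer.start()
try:
    time.sleep(10)  # stand-in for "until the next commit"
finally:
    observer.stop()
    observer.join()

print(handler.ops)  # the captured operations, ready to fold into a commit
```

This sidesteps the flushing problem entirely, at the cost of missing anything that happens while the watcher isn't running - which is one reason a VCS falls back to comparing contents.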