8

As part of learning systems programming, I am looking to implement a file shredder. The simplest (and probably naive) way would be to replace the data bytes with zeroes (I know the OS splits files into chunks, and I would replace the bytes in all of those chunks). But when I google this topic, I am surprised to find multiple-pass algorithms, some going as high as 35 passes!
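For reference, the naive single-pass approach might look roughly like the sketch below (Python for brevity). The names `zero_fill` and `chunk_size` are made up for illustration, and the sketch assumes the filesystem really overwrites blocks in place, which is not guaranteed on journaling or copy-on-write filesystems or on flash storage.

```python
import os

def zero_fill(path, chunk_size=64 * 1024):
    """Overwrite every byte of the file at `path` with zeroes, in place.

    Hypothetical helper: returns the number of bytes overwritten.
    """
    size = os.path.getsize(path)
    zeros = bytes(chunk_size)  # a reusable buffer of zero bytes
    # "r+b" opens the file for in-place update without truncating it.
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        # Ask the OS to push buffered data to the device. Whether the
        # device writes to the same physical location is out of our hands.
        os.fsync(f.fileno())
    return size
```

Opening with `"r+b"` rather than `"wb"` matters here: truncating the file first could let the filesystem allocate fresh blocks and leave the old ones untouched.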

Could someone elucidate the benefit of multiple passes, please? I couldn't find any explanation.

Thanks

Mike

4 Answers

14

Imagine a physical disk storing the binary value 0101. Physically, on the disk, the charges exist as real values, which the disk controller rounds up or down:

binary -> physical charge

0 1 0 1 -> 0.1 0.9 0.1 0.9

If you were to overwrite the data with zeros, some residual charge would remain from the previous values, so in this simple example the new values would be:

binary -> physical

0 0 0 0 -> 0.01 0.09 0.01 0.09

Equipment sensitive enough to read these charges at high resolution can then be used to extract this "shadow" of the overwritten data. That's why rewriting multiple times (and using random values) helps obscure the data.
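The numbers above can be reproduced with a toy model in Python. This is only an illustration of the simplified picture in this answer, not a model of how real drives behave; the function names and the `residual` factor of 0.1 are assumptions chosen to match the example values.

```python
def overwrite(charges, bits, residual=0.1):
    """Toy model: each new charge is mostly the written bit,
    plus a small fraction (`residual`) of the previous charge."""
    return [(1 - residual) * bit + residual * old
            for old, bit in zip(charges, bits)]

def recover(charges, written_bits, residual=0.1):
    """Subtract the known written bits, rescale what is left over,
    and round it to guess the previous contents (the "shadow")."""
    leftovers = [(c - (1 - residual) * b) / residual
                 for c, b in zip(charges, written_bits)]
    return [1 if x > 0.5 else 0 for x in leftovers]

original = [0.1, 0.9, 0.1, 0.9]   # physical charges for binary 0101
zeros = [0, 0, 0, 0]

once = overwrite(original, zeros)  # roughly 0.01 0.09 0.01 0.09
shadow = recover(once, zeros)      # guesses the old bits from the residue

# Each extra pass shrinks the leftover signal by another factor of
# `residual`, so the reader would need ever finer resolution.
thrice = overwrite(overwrite(once, zeros), zeros)
spread = max(thrice) - min(thrice)
```

In this model the difference between a former 0 and a former 1 drops by a factor of ten per pass, which is the intuition behind using several passes.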

pufferfish
  • -1, no it's not. We've been pushing the limits on disks so long that we've unambiguously entered the domain of quantum physics. This analog assumption just doesn't hold anymore. Each magnetic domain (grain) on a platter points in one direction, and only one. There are just a few hundred grains per bit at most, they're strongly coupled, and they're not at all cooled. Furthermore, the actual bits are transformed by a PRML and ECC function, so you can't even directly say to which bit an individual grain corresponds. Essentially, 1TB+ disks are possible because this residual is now fully used. – MSalters Aug 19 '11 at 13:49
  • 3
    @MSalters - You are assuming that all disks in use are like this. WD still makes disks that do not utilize this. The question was why use 35 passes. It is to obscure the data for the reasons shown. Until the old-style drives are no longer in use, this type of destroyer is needed. What is missing is that new controllers do not give you fine-grained control over the hardware. Laws designed to prevent the destruction of evidence have led to controllers that do not overwrite previously used areas until they have no other choice. – SoylentGray Aug 19 '11 at 15:08
  • 4
    @MSalters, whether it's necessary is irrelevant. This is the correct answer *to the question posed by the OP*. – Caleb Aug 19 '11 at 15:45
  • @MSalters, yes the entire grain points in one direction, but the quantization axis may differ from grain to grain, inducing some variation. This would be affected by thermal fluctuations, magnetic fluctuations from the read head passing over, or a neighboring grain being flipped. – rcollyer Aug 19 '11 at 16:57
  • @Chad: All magnetic materials have grains. Simple math proves that WD's disks use a few hundred grains per bit, given the size and capacity of their platters. You might be confused by patterned media. Those intentionally delineate grains to reduce coupling. Non-patterned media just have grains randomly distributed. – MSalters Aug 22 '11 at 08:09
  • @MSalters - No, I am saying first that there are still quite a few of the old-style drives in use. Second, the controllers today are designed to resist file shredding. You cannot choose to overwrite bits at specific locations on the drive like the old controllers allowed. The drives do not provide accurate feedback, so if you write a program it will appear to do what you want, but in most cases it will not. – SoylentGray Aug 22 '11 at 13:46
7

Multipass erasure is necessary to destroy data on magnetic storage devices. With the right equipment, data can be recovered from the layers below or in between, even if it was overwritten by another sequence of 1s and 0s.

However, there are voices on the internet which claim that multipass erasure is no longer necessary, as the areal density of data on modern hard drives has increased 10,000-fold.

Caleb
Falcon
0

It is said that experts with special equipment can reconstruct a formatted drive. Therefore the advice is to overwrite the data on the drive multiple times with differing (random) patterns.
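A rough sketch of that advice in Python, assuming the file can be overwritten in place: the function name `shred` and its `passes`, `chunk_size` and `remove` parameters are made up for illustration, and the default of 3 passes is arbitrary rather than a recommendation.

```python
import os

def shred(path, passes=3, chunk_size=64 * 1024, remove=True):
    """Overwrite the file at `path` with fresh random bytes `passes`
    times, then optionally delete it. Hypothetical helper."""
    size = os.path.getsize(path)
    # "r+b" updates the file in place instead of truncating it.
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))  # a different random pattern each pass
                remaining -= n
            # Flush after every pass so that each pass is actually sent
            # to the device rather than merged in the OS cache.
            f.flush()
            os.fsync(f.fileno())
    if remove:
        os.remove(path)
```

The `fsync` after each pass is the important design choice: without it, the operating system may collapse several passes into a single physical write.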

Ingo
0

Overwriting data with 0s in multiple passes only makes sense for magnetic storage devices, because of what @pufferfish said. For SSDs and other flash storage mechanisms this fails; see http://www.usenix.org/events/fast11/tech/full_papers/Wei.pdf

Moral of the story: A software solution to a hardware problem may stop working when the hardware technology changes, even though the API does not change.

Residuum