So, unless you're doing a lot of file system operations (creating files, deleting old ones, moving them around) right before the power goes out, even on FAT you shouldn't regularly corrupt the whole file system. Since you're just appending to files, the end of the file being written might be missing after you repair the file system, but that should be about it.
So, I suspect something else is happening: what you did in the hope of enhancing write reliability actually wears down your storage media very rapidly. You touch upon this in your "longer or shorter blocks and frequencies": yes, I think buffering your writes and doing whole-block writes instead of 2 entries per write (I'm assuming an entry is not 2 kB, but more like 2 B) would help.
What happens inside a NAND flash device like a USB stick is this (simplified; this is for understanding the principle, not for reimplementation):
You want to write 4 bytes to address 0x10F004 (through 0x10F007) (really, just a random example). So, the controller inside the stick (a toy model in C of these steps follows the list)
1. figures out which logical block that address belongs to. Say your stick internally has a 4 kB block size (again, just a random example), so that means this data resides in logical block 271, at byte offsets 4 through 7.
2. It then looks at an internal table and figures out that logical block 271 is currently stored in physical block 5000.
3. It reads that block into its internal RAM,
4. applies the forward error correction that was stored with the data in the physical block to correct any errors (as far as possible),
5. changes the four bytes at offsets 4 through 7 of the decoded data in RAM,
6. looks up, in another table, a physical block that hasn't been written to in a long time. For the heck of it, let's say that's physical block 1234.
7. adds error-correction information to the data in RAM,
8. erases the whole physical block 1234 (setting all its bits to 1), so that data can be written to it,
9. writes that whole 4 kB of data plus error correction to physical block 1234,
10. changes the entry in the first table so that logical block 271 now points to physical block 1234 instead of 5000,
11. adds physical block 5000 to the end of the "blocks not written to in a while" table.
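Purely to illustrate that bookkeeping, here's a toy model of those steps in C. The block size, the table sizes, the ECC steps being comments and the table initialization being omitted are all simplifications of my own; this is not how any real controller is implemented:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   4096u    /* the 4 kB from the example above               */
#define PHYS_BLOCKS  8192u    /* physical blocks actually present on the chip  */
#define LOG_BLOCKS   8000u    /* logical blocks advertised to the host         */

static uint8_t  nand[PHYS_BLOCKS][BLOCK_SIZE]; /* stand-in for the flash array    */
static uint32_t log_to_phys[LOG_BLOCKS];       /* step 2: logical -> physical map */
static uint32_t free_list[PHYS_BLOCKS];        /* steps 6/11: least recently written first */
static uint32_t free_head, free_tail;          /* (table setup at power-up omitted) */

/* Write a few bytes at an arbitrary byte address, the way the controller would.
 * Assumes the write does not cross a block boundary. */
void ftl_write(uint32_t addr, const uint8_t *data, uint32_t len)
{
    uint8_t  ram[BLOCK_SIZE];                  /* controller-internal RAM          */
    uint32_t lblk   = addr / BLOCK_SIZE;       /* step 1: which logical block...   */
    uint32_t offset = addr % BLOCK_SIZE;       /*         ...and where inside it   */
    uint32_t old    = log_to_phys[lblk];       /* step 2: current physical block   */

    memcpy(ram, nand[old], BLOCK_SIZE);        /* step 3: read the block into RAM  */
    /* step 4: ECC decoding would happen here                                      */
    memcpy(&ram[offset], data, len);           /* step 5: patch the few bytes      */

    uint32_t fresh = free_list[free_head];     /* step 6: pick a long-unused block */
    free_head = (free_head + 1) % PHYS_BLOCKS;

    /* step 7: ECC encoding would happen here                                      */
    memset(nand[fresh], 0xFF, BLOCK_SIZE);     /* step 8: erase = all bits to 1    */
    memcpy(nand[fresh], ram, BLOCK_SIZE);      /* step 9: program the new block    */

    log_to_phys[lblk] = fresh;                 /* step 10: update the mapping      */

    free_list[free_tail] = old;                /* step 11: recycle the old block   */
    free_tail = (free_tail + 1) % PHYS_BLOCKS; /* (skipped if the block looked worn) */
}
```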
The reason I'm walking you through this is that
- cost-effective NAND flash can only work on rather large blocks, not on individual bytes – that's due both to its addressing scheme and to the need for an error-correcting code that can fix the errors which are virtually guaranteed to appear¹.
- since writes degrade flash memory, you need to distribute writes "fairly" across the whole set of usable blocks. Thus, you don't write the modified data back to the same block as before (in flash you can't overwrite a 0 with a 1 anyway; you always have to erase the whole block to all 1s and then program the 0s) – you wear-level across as much of the chip as you can.
Now, in step 11 above I said that the former physical block is added back to the list of blocks available for writing – but that only happens if, during reading and error-correcting that block, nothing suspicious of wear showed up. If something did, block 5000 is not added back to the table from which the controller picks new write targets. That shrinks the pool of writable blocks – and the stick only keeps working as long as the number of used blocks plus the blocks in the write-to table is at least the nominal drive capacity divided by the block size. (For example, a stick that advertises 2,000,000 blocks but physically has 2,100,000 usable ones can retire roughly 100,000 worn blocks before it runs out of spares.) Once that's no longer the case, the controller has no place left to write new data to: your stick is broken and becomes read-only.
So, what to do?
- In any case, if deleting a file is part of the regular operation of your software, then make sure that the file system layer also tells the USB stick to `TRIM`/`DISCARD` that logical block (that allows these blocks to go through step 11, so you get new write-to blocks, and wear leveling works on all unused space)
- Turn off "modification time" for your filesystem – if you need to update the field in the FAT that says "this file was last written to at…" every time you write something, that instantly doubles your block write rate.
- As you hinted at, accumulating data before writing it sounds like a good idea.
- I don't know your FAT implementation, but maybe it does have write buffers?
- But even in that case, your `f_sync` would suppress that capability
- It would however require you to implement enough power stabilization (large capacitors? Supercaps? A backup battery?) and brownout detection to let you sync your buffered data to the stick once the power starts going out – see the sketch right after this sub-list.
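For instance, here's a minimal sketch of such a buffered writer, assuming a FatFs-style API (`f_write`, `f_sync`). The buffer size and the `log_brownout_flush()` hook called from your brownout handler are assumptions you'd adapt to your hardware:

```c
#include <stdint.h>
#include <string.h>
#include "ff.h"                  /* FatFs: FIL, FRESULT, UINT, f_write, f_sync */

#define LOG_BUF_SIZE 4096u       /* assumption: match the stick's internal block size */

static FIL     log_file;         /* opened elsewhere with f_open()                    */
static uint8_t log_buf[LOG_BUF_SIZE];
static size_t  log_fill;

/* Append a small record to the RAM buffer; the stick is only touched
 * once a whole block's worth of data has accumulated. */
FRESULT log_append(const void *data, size_t len)
{
    const uint8_t *p = data;
    FRESULT res = FR_OK;

    while (len > 0) {
        size_t chunk = LOG_BUF_SIZE - log_fill;
        if (chunk > len) chunk = len;
        memcpy(&log_buf[log_fill], p, chunk);
        log_fill += chunk;
        p        += chunk;
        len      -= chunk;

        if (log_fill == LOG_BUF_SIZE) {            /* full block: one big write   */
            UINT written;
            res = f_write(&log_file, log_buf, LOG_BUF_SIZE, &written);
            if (res == FR_OK) res = f_sync(&log_file);
            log_fill = 0;
            if (res != FR_OK) break;
        }
    }
    return res;
}

/* Call from the brownout/undervoltage handler, while the capacitors
 * still hold enough charge to finish one last write. */
FRESULT log_brownout_flush(void)
{
    FRESULT res = FR_OK;
    if (log_fill > 0) {
        UINT written;
        res = f_write(&log_file, log_buf, (UINT)log_fill, &written);
        if (res == FR_OK) res = f_sync(&log_file);
        log_fill = 0;
    }
    return res;
}
```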
- Avoid using 4 files to store 4 streams. That forces FAT to write to four different blocks! Instead, just interleave into one file.
- If you can: do a simple scheme that has a fixed number of bytes for each stream (i.e., instead of writing 2 `signed char`s to each of 4 files, you write a single `struct _foo { char a[2]; char b[2]; char c[2]; char d[2]; } foo;` to one file), as sketched right below.
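To illustrate, a minimal sketch of that single interleaved write, again assuming a FatFs-style `f_write` (the struct mirrors the example above and is otherwise arbitrary):

```c
#include "ff.h"                  /* FatFs: FIL, FRESULT, UINT, f_write */

/* One fixed-size record carrying one sample from each of the four streams. */
struct _foo {
    char a[2];
    char b[2];
    char c[2];
    char d[2];
};

/* One write call to one file, instead of four files that each force
 * their own block (and directory entry) updates. */
FRESULT write_record(FIL *log_file, const struct _foo *record)
{
    UINT written;
    return f_write(log_file, record, sizeof *record, &written);
}
```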
- Or, if there isn't always the same amount of data from each channel, a simple key-value pair (i.e., `enum _channel { a, b, c, d }; typedef enum _channel channel; struct _key_value { channel chan; char value[]; };`) works if the length of each channel's data is constant (though not necessarily the same across channels),
- or, if you can have variable-length data chunks, a type-length-value tuple (i.e., `struct _key_length_value { channel chan; unsigned char length; char value[]; };`), so that both the type of data and the amount of data can be recovered by the reader – see the sketch below.
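To make that concrete, here's a minimal sketch of writing and parsing such type-length-value records into/from a plain byte buffer. The function names and the buffer-based interface are my own assumptions; in your logger you'd feed the serialized bytes into whatever (preferably buffered) write path you use:

```c
#include <stddef.h>
#include <string.h>

enum _channel { a, b, c, d };
typedef enum _channel channel;

/* Serialize one type-length-value record into out[]; returns the number
 * of bytes written, or 0 if it doesn't fit. */
size_t tlv_write(unsigned char *out, size_t out_size,
                 channel chan, const unsigned char *value, unsigned char length)
{
    size_t needed = 2u + length;       /* 1 byte type + 1 byte length + payload */
    if (out_size < needed)
        return 0;
    out[0] = (unsigned char)chan;
    out[1] = length;
    memcpy(&out[2], value, length);
    return needed;
}

/* Parse one record starting at in[]; returns bytes consumed, or 0 if the
 * buffer is truncated (e.g., the tail of a file cut off by a power loss). */
size_t tlv_read(const unsigned char *in, size_t in_size,
                channel *chan, const unsigned char **value, unsigned char *length)
{
    if (in_size < 2u || in_size < 2u + (size_t)in[1])
        return 0;
    *chan   = (channel)in[0];
    *length = in[1];
    *value  = &in[2];
    return 2u + (size_t)in[1];
}
```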
- FAT sounds like the wrong file system altogether. What you describe as your outage scenario (sudden loss of power) actively screams "use a journal!", so a journaling file system, or an equivalent data structure written to the raw block device instead of going through a file system (why a file system at all if you only have a known number of data streams, optimally one?), would be worth investigating.
- Of course, using a different file system than FAT or even no file system at all will put software requirements on the reading end of your ecosystem. In embedded, that's often no problem (as you control which software needs to be available on the reader), but it can be a hassle or a showstopper (especially in consumer electronics).
- Also, fun fact, I know of not a single implementation of a major journaling file system (ext3/4, XFS, JFS, F2FS …) for microcontrollers, so err, this is advice easier given than implemented.
- But if you implement this as a data structure on your own (no matter whether on top of an existing file system or on a raw device), it's not so hard:
- say you append to your file in 1 kB blocks; you leave the first byte of each block as 0x00, and only after a complete block of data has been successfully written do you update that first byte to 0xFF. Only after that has been successfully written do you start writing the next block.
- On the reading end, you round down your file size to a whole multiple of 1 kB (everything beyond that must be broken), and if your last block doesn't start with 0xFF, it hasn't been written completely, so you ignore it as well.
- instead of a single 0x00 / 0xFF byte, you can also use timestamps (as long as these can't be all-0) so that each block contains info of when it was written.
- and to make things a bit more robust against random corruption, add a hash of the block's data right after the "canary" token (that 0x00/0xFF byte, or whatever you choose) at the beginning of each block. XXH32 from the xxHash suite is very fast to compute, only takes about 48 bytes of RAM to calculate, and uses just 32 bits = 4 B of your storage. That's a low price for being able to tell that your data is intact, with a probability of being wrong that's practically nonexistent. A minimal sketch of the whole block scheme follows.
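As a rough sketch of that block scheme, again on top of a FatFs-style API. The 1 kB block size and the field layout follow the description above; the checksum here is a trivial FNV-1a stand-in where you'd plug in XXH32:

```c
#include <stdint.h>
#include <string.h>
#include "ff.h"        /* FatFs: FIL, FRESULT, UINT, FSIZE_t, f_write, f_sync, f_lseek, f_tell */

#define JBLK_SIZE 1024u               /* assumption: 1 kB journal blocks             */

/* Block layout: [0]    canary (0x00 = in flight, 0xFF = committed),
 *               [1..4] 32-bit checksum of the payload,
 *               [5..]  payload. */

/* Stand-in checksum: replace with XXH32() from xxHash for real use. */
static uint32_t blk_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0x811C9DC5u;                   /* FNV-1a, placeholder only */
    for (size_t i = 0; i < len; i++)
        sum = (sum ^ data[i]) * 0x01000193u;
    return sum;
}

/* Append one journal block and mark it committed only once it is fully on the stick. */
FRESULT jblk_append(FIL *fp, const uint8_t payload[JBLK_SIZE - 5])
{
    uint8_t  block[JBLK_SIZE];
    UINT     written;
    FRESULT  res;
    FSIZE_t  block_start = f_tell(fp);
    uint32_t sum = blk_checksum(payload, JBLK_SIZE - 5);

    block[0] = 0x00;                              /* canary: "not yet committed"      */
    memcpy(&block[1], &sum, 4);
    memcpy(&block[5], payload, JBLK_SIZE - 5);

    res = f_write(fp, block, JBLK_SIZE, &written);
    if (res == FR_OK) res = f_sync(fp);           /* whole block is on the medium     */
    if (res != FR_OK) return res;

    block[0] = 0xFF;                              /* flip the canary to "committed"   */
    res = f_lseek(fp, block_start);
    if (res == FR_OK) res = f_write(fp, block, 1, &written);
    if (res == FR_OK) res = f_sync(fp);
    if (res == FR_OK) res = f_lseek(fp, block_start + JBLK_SIZE);
    return res;
}

/* Reader side: a block is valid only if the canary is 0xFF and the checksum matches. */
int jblk_is_valid(const uint8_t block[JBLK_SIZE])
{
    uint32_t stored;
    memcpy(&stored, &block[1], 4);
    return block[0] == 0xFF &&
           stored == blk_checksum(&block[5], JBLK_SIZE - 5);
}
```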
¹ Zero-error memory would be astronomically expensive, slow and power-hungry. Having more cheap memory, some of which you reserve for error-correction information, is much more affordable. In the USB stick market, you can be pretty sure that there are even significant parts of the flash memory that are simply never entered into the stick-internal table of usable blocks. Having a few bad blocks on a large wafer and hiding them from use, and thus selling all of that wafer, is much more cost-effective than throwing out every flash memory chip that contains a single non-working block.