feeds: Make Feeds.save fully atomic, assuming a working fsync

If the disk is full (or there are other OS-level issues), a file may
not be completely written to the disk.

The write-flush-fsync-rename sequence is much safer. The fsync
invocation follows the recommendation in the Python docs [1]:

  If you’re starting with a buffered Python file object f, first do
  f.flush(), and then do os.fsync(f.fileno()), to ensure that all
  internal buffers associated with f are written to disk.
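
A minimal sketch of that flush-then-fsync order on a buffered file
object. The `flush_and_fsync` name is hypothetical, not an rss2email
function:

```python
import os
import tempfile


def flush_and_fsync(f):
    """Push a buffered file object's data toward the disk.

    Hypothetical helper following the order the os.fsync docs recommend:
    flush Python's internal buffers first, then ask the kernel to sync.
    """
    f.flush()             # library buffer -> kernel buffer
    os.fsync(f.fileno())  # kernel buffer -> disk (if fsync is honest)


# Usage: write some text and sync it before the file is closed.
with tempfile.NamedTemporaryFile('w', delete=False) as f:
    f.write('hello\n')
    flush_and_fsync(f)
    path = f.name
```

Calling os.fsync without the flush would only sync whatever had already
left Python's buffer, which is why the order matters.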

The purpose of each step is:

* write: move the data into a library buffer
* flush: flush the library buffer into a kernel buffer
* fsync: flush the kernel buffer onto the disk at $tempfile
* rename: adjust the metadata so that $filename points to the
  $tempfile data, releasing the old data
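
The four steps can be sketched in Python as below. This is a
hypothetical `atomic_write` helper, not the actual Feeds.save code; it
swaps in os.replace for the rename step because, unlike os.rename, it
also overwrites an existing destination on Windows.

```python
import os
import tempfile


def atomic_write(filename, data, encoding='utf-8'):
    """Replace `filename` with `data` via write-flush-fsync-rename.

    Hypothetical sketch, not rss2email's implementation.
    """
    dirname = os.path.dirname(os.path.abspath(filename))
    # Create $tempfile next to $filename so the rename stays on one
    # filesystem (a cross-filesystem rename cannot be atomic).
    fd, tempname = tempfile.mkstemp(dir=dirname, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'w', encoding=encoding) as f:
            f.write(data)         # write: data -> library buffer
            f.flush()             # flush: library buffer -> kernel buffer
            os.fsync(f.fileno())  # fsync: kernel buffer -> disk at $tempfile
        # rename: point $filename at the $tempfile data.
        os.replace(tempname, filename)
    except BaseException:
        # A failure before the rename leaves $filename untouched;
        # only the partial $tempfile needs cleaning up.
        try:
            os.remove(tempname)
        except OSError:
            pass
        raise


# Usage: save twice; the second save atomically replaces the first.
path = os.path.join(tempfile.mkdtemp(), 'feeds.dat')
atomic_write(path, 'first version\n')
atomic_write(path, 'second version\n')
```

Note that tempfile.mkstemp creates the file with mode 0600, so a real
implementation may want to carry over the original file's permissions.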

This means that if the rename succeeds we get the new data, and if
anything fails before or during the rename we still have the old
data.
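
A small demonstration of that guarantee, assuming a crash that happens
after $tempfile is synced but before the rename. The `feeds.dat` and
`.tmp` names are illustrative only:

```python
import os
import tempfile

# An existing file holding the "old" data.
directory = tempfile.mkdtemp()
filename = os.path.join(directory, 'feeds.dat')
with open(filename, 'w') as f:
    f.write('old data\n')

# Write and sync the new data to a temporary file, then stop short of
# the rename, as if the process had died here.
tempname = filename + '.tmp'
with open(tempname, 'w') as f:
    f.write('new data\n')
    f.flush()
    os.fsync(f.fileno())
with open(filename) as f:
    before_rename = f.read()  # 'old data\n'

# Once the rename succeeds, $filename holds the new data.
os.replace(tempname, filename)
with open(filename) as f:
    after_rename = f.read()   # 'new data\n'
```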

However, POSIX's fsync is implementation-defined unless
_POSIX_SYNCHRONIZED_IO is defined [3,4], and some OS X implementations
go the no-op route, as Stewart Smith points out in his excellent "Eat
My Data: How everybody gets file I/O wrong" [4]. If you want to run
rss2email on such a system, verifying your data integrity is up to
you ;).
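
If you want a hint about your own system, the Synchronized I/O option
can be queried at runtime through sysconf. The sketch below assumes
your Python exposes the `SC_SYNCHRONIZED_IO` name in os.sysconf_names
(it returns None when it does not, e.g. on Windows). A positive answer
says nothing about whether the drive itself honors the flush.

```python
import os


def synchronized_io_supported():
    """Hypothetical check for the POSIX Synchronized I/O option.

    Returns True or False when sysconf can be queried, or None when this
    Python does not expose SC_SYNCHRONIZED_IO.
    """
    names = getattr(os, 'sysconf_names', {})
    if 'SC_SYNCHRONIZED_IO' not in names:
        return None
    try:
        value = os.sysconf('SC_SYNCHRONIZED_IO')
    except (OSError, ValueError):
        return None
    # sysconf reports -1 for an unsupported option.
    return value > 0


print(synchronized_io_supported())
```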

We used to write-rename the data file (but not the config) on *nix
[2]. Now we do the full write-flush-fsync-rename for both the config
and data files on both *nix and other systems.

[1]: http://docs.python.org/3/library/os.html#os.fsync
[2]: For rss2email, *nix is "has fcntl, but isn't SunOS".
[3]: http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html
[4]: https://www.flamingspork.com/talks/2007/06/eat_my_data.odp

Reported-by: Etienne Millon <me@emillon.org>
Signed-off-by: W. Trevor King <wking@tremily.us>