Failure on attempt to serialize a pre-epoch timestamp received from an ext3 filesystem
While working on a tool to index files for one of my projects, I wound up getting a panic during the call to serde_json:
thread 'main' panicked at 'SystemTime must be later than UNIX_EPOCH: SystemTimeError(Duration { secs: 1, nanos: 0 })', libcore/result.rs:945:5
I tracked it down to a specific file in my home media server. Apparently, the combination of Linux and the ext3 filesystem is perfectly capable of representing at least one date before the UNIX epoch.
- ls -lh: Dec 31 1969
- Python's os.stat: st_mtime=-1
- Rust's std::fs::metadata: modified: Ok(SystemTime { tv_sec: -1, tv_nsec: 0 })
...and the irony is, SystemTime is the one piece of the Rust-provided metadata that I felt was already in a format suitable for a generic index file to be shared between programs written in different languages.
Well... that and it was a hassle to track down because Rust itself didn't complain and the panic message during serialization wouldn't tell me which file of the hundreds of thousands was causing it to die. For lack of a purpose-built tool, I had to manually bisect it until I narrowed it down.
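For the record, the sort of purpose-built check I mean is only a few lines (a hypothetical, non-recursive sketch using only std; the path is a placeholder):

use std::fs;
use std::time::UNIX_EPOCH;

fn main() -> std::io::Result<()> {
    // Placeholder path; substitute the tree being indexed.
    for entry in fs::read_dir("/path/to/tree")? {
        let entry = entry?;
        let mtime = entry.metadata()?.modified()?;
        // duration_since fails exactly when the mtime is before the epoch.
        if mtime.duration_since(UNIX_EPOCH).is_err() {
            println!("{}", entry.path().display());
        }
    }
    Ok(())
}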
...and I found another one:
ssokolow@monolith rdbackup-excludes-indexer [master] % target/nightly/i686-unknown-linux-musl/release/rdbackup_exclude_indexer /srv/DVD-bound/ 2>&1 | grep 361289696
/srv/DVD-bound/Games/Ripped_CDs/Floppies Backed up using ddrescue/Microsoft® Mouse Setup v9.01 (potential data loss)/MOUSE901/SETUP.INI: Some(SystemTime { tv_sec: -361289696, tv_nsec: 0 })
thread 'main' panicked at 'SystemTime must be later than UNIX_EPOCH: SystemTimeError(Duration { secs: 361289696, nanos: 0 })', libcore/result.rs:945:5
(Yes, my folder structure for stuff bound for archival DVDs is an accreted mess that needs to be refactored.)
UPDATE: It appears that this example only fails because of integer overflow on i686-unknown-linux-musl. On x86_64-unknown-linux-gnu, it works perfectly well and ls -lh reports the mtime as Aug 26 2094.
Here's a minimal reproduction:

use std::time::Duration;
use serde_json;

fn main() {
    let t = std::time::UNIX_EPOCH - Duration::new(1, 0);
    println!("{}", serde_json::to_string(&t).unwrap());
}
The relevant portion of the backtrace:
thread 'main' panicked at 'SystemTime must be later than UNIX_EPOCH: SystemTimeError(1s)', libcore/result.rs:945:5
...
10: serde::ser::impls::<impl serde::ser::Serialize for std::time::SystemTime>::serialize
at /Users/lambda/.cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.70/src/ser/impls.rs:562
11: serde_json::ser::to_writer
at /Users/lambda/.cargo/registry/src/github.com-1ecc6299db9ec823/serde_json-1.0.24/src/ser.rs:1991
12: serde_json::ser::to_vec
at /Users/lambda/.cargo/registry/src/github.com-1ecc6299db9ec823/serde_json-1.0.24/src/ser.rs:2025
13: serde_json::ser::to_string
at /Users/lambda/.cargo/registry/src/github.com-1ecc6299db9ec823/serde_json-1.0.24/src/ser.rs:2056
14: rust2018::main
at src/main.rs:9
The code in question:
#[cfg(feature = "std")]
impl Serialize for SystemTime {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        use super::SerializeStruct;
        let duration_since_epoch = self
            .duration_since(UNIX_EPOCH)
            .expect("SystemTime must be later than UNIX_EPOCH");
        let mut state = try!(serializer.serialize_struct("SystemTime", 2));
        try!(state.serialize_field("secs_since_epoch", &duration_since_epoch.as_secs()));
        try!(state.serialize_field("nanos_since_epoch", &duration_since_epoch.subsec_nanos()));
        state.end()
    }
}
The issue is that there's no official way to get the underlying value out of SystemTime; you can only use duration_since(UNIX_EPOCH), and since a Duration stores a u64 number of seconds, it can't represent a negative offset.
This should probably be filed as an issue against the standard library; there needs to be some way to extract the underlying value of SystemTime, even if it's platform-dependent, as otherwise you can't losslessly serialize and deserialize them.
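To make the limitation concrete, here's a minimal, standalone demonstration (not taken from the project above) of what the public API gives you for a pre-epoch SystemTime:

use std::time::{Duration, UNIX_EPOCH};

fn main() {
    // One second before the epoch, like the mtime of -1 above.
    let pre_epoch = UNIX_EPOCH - Duration::new(1, 0);

    // There is no accessor for the raw value; the only public API is
    // duration_since, which fails for anything earlier than its argument.
    println!("{:?}", pre_epoch.duration_since(UNIX_EPOCH));
    // prints the Err variant, e.g. Err(SystemTimeError(1s))
}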
It should still be possible to work around it. I'll try to verify it later today, but, as I remember, the duration_since(UNIX_EPOCH) call does provide you with the information necessary to serialize it reliably.
Specifically, the Err variant of the Result has a .duration() function which will tell you how far before the epoch your SystemTime was.
That's how an mtime of -1 gets exposed in the panic message as SystemTimeError(Duration { secs: 1, nanos: 0 }).
Serializing secs_since_epoch as an i128, with the Err case producing the negation of duration_since_epoch.as_secs(), should do the trick.
(In other words, the output is semantically equivalent to a { sign: bool, magnitude: Duration } pair.)
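A rough sketch of that extraction step (illustrative names only, not actual serde code):

use std::time::{Duration, SystemTime, UNIX_EPOCH};

// true  => the time is before the epoch (the magnitude should be subtracted)
// false => at or after the epoch (the magnitude should be added)
fn epoch_offset(t: SystemTime) -> (bool, Duration) {
    match t.duration_since(UNIX_EPOCH) {
        Ok(d) => (false, d),
        Err(e) => (true, e.duration()),
    }
}

fn main() {
    let t = UNIX_EPOCH - Duration::new(1, 0);
    println!("{:?}", epoch_offset(t)); // (true, 1s)
}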
That said, definitely a footgun in the standard library to be remedied.
Yep, I figured that out after writing the previous comment, and started working on a patch to serde to fix it.
My main concern is that i128 was only fairly recently stabilized, so would require bumping the minimum required version of the compiler. Also, even if you made the implementation conditional on i128 support, it would be a breaking change for Serializer and Deserializer implementations that don't yet have i128 support.
So I think you'd have to do it without i128 for secs_since_epoch, which makes it more complicated. On Unix-like platforms and Redox, it's at most an i64 number of seconds offset from the epoch. On Windows, it's a u64 number of hundred-nanosecond intervals since January 1, 1601, which has a strictly lower range than an i64 number of seconds, so that should be OK. But on WASM, they just use a Duration to represent the time since the epoch, so you could actually need to represent the full u64 range of seconds.
I think that you'll just have to use an i64 here, and consider the loss of one bit of range on the top end to be preferable to not being able to represent negative values, which are likely to be more common.
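A back-of-the-envelope check of those ranges, assuming the representations described above:

fn main() {
    // Windows: u64 count of 100 ns intervals since 1601-01-01, so the
    // maximum offset in whole seconds is roughly:
    let max_windows_secs = u64::MAX / 10_000_000;   // ~1.8e12 s
    // Shifting the base from 1601 to 1970 moves this by about 1.2e10 s,
    // which doesn't change the comparison.

    // An i64 count of seconds tops out at ~9.2e18 s, while the full u64
    // seconds range a WASM-style Duration could hold is ~1.8e19 s.
    println!("Windows max: {:>20} s", max_windows_secs);
    println!("i64 max:     {:>20} s", i64::MAX);
    println!("u64 max:     {:>20} s", u64::MAX);
}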
Hmm. Good point.
I have no problem with being clamped to a little over 292 billion years from the epoch for positive range, but I really would like to avoid this being another non-obvious location where someone who wants high reliability has to guard against failure.
Would it be a breaking change to switch to i128 down the road, once it's better supported?
Would it be a breaking change to switch to i128 down the road, once it's better supported?
I think it will always be a breaking change, as some formats may never support i128.
I also think that it might be a breaking change to change the type at all, since the Deserialize impl parses into a u64, so if you changed it to serialize an i64, then programs using the older Deserialize impl wouldn't be able to parse it.
On the other hand, it might be argued that you're replacing a very likely panic on serialization with a less likely properly returned error on deserialization only in the case of two different programs using different versions of Serde. I don't know if Serde has a stability policy for the types used for serializing std types that would cover this.
Of course, one alternative is that you could just use a newtype wrapper and implement Serialize and Deserialize yourself.
My main concern here is getting rid of the footgun if at all possible. I really don't want to have to maintain a special "Never allow these types to creep into structs I'm deriving Serialize/Deserialize on, because the compiler certainly won't warn you" audit list.
Of course, one alternative is that you could just use a newtype wrapper and implement Serialize and Deserialize yourself.
I've already spent more time trying to get comfortable with the Serde docs for customizing deserialization of complex data than on the rest of the project put together, so I think I'll just go for pragmatism over perfection, spend 5 minutes manually converting SystemTime instances into a custom "signed Duration" struct, and derive Serialize on that instead.
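Roughly this shape, in other words (a sketch assuming serde with the derive feature; the names are illustrative rather than what ends up in my index format):

use std::time::{SystemTime, UNIX_EPOCH};
use serde::Serialize;

// "Signed Duration": an unsigned magnitude plus an explicit sign flag.
#[derive(Serialize)]
struct SignedDuration {
    negative: bool, // true if the timestamp is before the epoch
    secs: u64,
    nanos: u32,
}

impl From<SystemTime> for SignedDuration {
    fn from(t: SystemTime) -> Self {
        let (negative, d) = match t.duration_since(UNIX_EPOCH) {
            Ok(d) => (false, d),
            Err(e) => (true, e.duration()),
        };
        SignedDuration { negative, secs: d.as_secs(), nanos: d.subsec_nanos() }
    }
}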
Then I can move on to hacking up a quick implementation of my "Use \u0000 as the prefix for hexadecimal byte escapes" solution for getting serde_json to accept the full range of paths actually present on my filesystem and finally have an initial dogfoodable version, days after I expected it.
I'm coming back to this with the intent of polishing up my workaround into something that shouldn't require a change to my JSON schema if Serde fixes the problem and I remove the workaround, and something struck me about this:
I also think that it might be a breaking change to change the type at all, since the Deserialize impl parses into a u64, so if you changed it to serialize an i64, then programs using the older Deserialize impl wouldn't be able to parse it.
On the other hand, it might be argued that you're replacing a very likely panic on serialization with a less likely properly returned error on deserialization only in the case of two different programs using different versions of Serde. I don't know if Serde has a stability policy for the types used for serializing std types that would cover this.
...it's conceptually equivalent to saying "We're going to use 32-bit time_t in 64-bit programs to minimize the chance that un-patched 32-bit programs will be the point of failure when Y2038 rolls around".
(Except that this DoS is caused by things like chkdsk goofs on metadata imported from old floppy disks, and off-by-one errors or timezone mis-conversions pushing POSIX timestamps that were set to 0 back to -1 or other values before the epoch.)
If Serde has a stability policy that would preclude a fix to properly handle data that occurs in the wild in the name of ensuring output is deserializable with prior versions of Serde, that's a terrible policy. It's essentially saying "Future programs must remain vulnerable to DoS vulnerabilities from real-world input in order to harmonize with such vulnerabilities in old versions".
Widening the range of acceptable values to align with in-the-wild input should never be blocked by stability guarantees on generated output... especially since such dates could already be generated in JSON from other serializers attempting to interoperate with Serde.
As for u64 issues, if platforms that don't support i128 are a concern, you could always do something like sending the sign bit out-of-band. If SystemTimeError was encountered on trying to serialize, emit - followed by the serialized contents of SystemTimeError::duration. If the byte stream begins with -, set a boolean indicating whether to add or subtract the resulting Duration from UNIX_EPOCH and then parse the rest as u64 and build a Duration from it. Unless I've misunderstood Serde's architecture, this should all be internal implementation details for SystemTime's Serialize and Deserialize impls.
This issue can be closed because it has been fixed in serde.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b8e482b432ae8de5c2c0d24555dae49a
use std::time::Duration;
use serde_json;

fn main() {
    let t = std::time::UNIX_EPOCH - Duration::new(1, 0);
    println!("{:?}", serde_json::to_string(&t));
}
Err(Error("SystemTime must be later than UNIX_EPOCH", line: 0, column: 0))
There is no longer a panic. This can also be seen in the serde code at https://github.com/serde-rs/serde/blob/5b140361a31c21713e59fe4cc35ab1d192bbc79f/serde/src/ser/impls.rs#L611, which was changed in https://github.com/serde-rs/serde/commit/a81968af3caea9528e8e935a9a19ccad19a16778.
While that is an improvement, given that both file timestamps and JSON integers may be negative, and that most people probably never think to test with negative file timestamps, I think this is still too much of a footgun to be considered solved in a Rust library.
The "run a slow batch job, watch it fail at 80% because Serde introduced a surprising artificial limitation that needs to be worked around with a newtype wrapper" bug is still present.