libs-team icon indicating copy to clipboard operation
libs-team copied to clipboard

Add basic tempfile API to stdlib

Open NobodyXu opened this issue 1 year ago • 13 comments

Proposal

Problem statement

Tempfile is a widely used functionality, with the previous breaking change of the tempfile crate being about 7 years ago, I believe its API is stable enough that it should be part of the stdlib.

Motivating examples or use cases

The crate tempfile has 168,118,646 downloads and is used very often in the ecosystem, I believe it is quite a fundamental functionality that the ecosystem depends on.

Having it in stdlib would make it

  • easier to access
  • more trusted than third-party dependencies, one less crate to audit on
  • faster to compile for low-level crates depending on it, which might block compilation of other crates

Solution sketch

mod fs {
    impl File {
        /// Create a new temporary file.
        /// The new file could be in memory or on disk, there's no guarantee about it.
        ///
        /// The temporary file will be automatically removed by the OS when the last handle to it is closed.
        /// This doesn’t rely on Rust destructors being run, so will (almost) never fail to clean up the temporary file.
        ///
        /// # Security
        /// NOTE that other processes might be able to get access to your tempfile (via ptrace, procfs, kernel bug, etc).
        /// It does not provide any security guarantee on that.
        pub fn tempfile() -> Result<File>;

        /// Create a new temporary file under the path `dir`, the file will be created on the filesystem
        /// `dir` is in.
        ///
        /// # Security
        /// NOTE that other processes might be able to get access to your tempfile (via ptrace, procfs, kernel bug, etc).
        /// It does not provide any security guarantee on that.
        pub fn tempfile_in<P: AsRef<Path>>(dir: P) -> Result<File>;

        /// Create a hardlink to `path`.
        /// If `self` is a tempfile, then it will be visible at `path`.
        ///
        /// NOTE that if `self` is a tempfile, then this operation can fail with `io::ErrorKind::Unsupported`.
        pub fn hardlink<P: AsRef<Path>>(&self, path: P) -> Result<()>;
    }

    impl OpenOptions {
        /// Open a tempfile under `dir`, the `File` returned will be writable even if `.write(true)` is not called.
        /// If `create_new` is set to `false`, then you can try using [`File::hardlink`] to persist the [`File`].
        ///
        /// NOTE that if `self` is a tempfile, then this operation can fail with `io::ErrorKind::Unsupported`.
        pub fn open_tempfile<P: AsRef<Path>>(&self, dir: P) -> Result<File>;
    }
}

mod path {
    pub struct PersistError<O: TempObject> {
        pub error: Error,
        pub temp_path: TempPath<O>,
    }
    impl<O> std::error::Error for PersistError<O> {}
    impl<O> From<PersistError<O>> for Error {}
    impl<O> From<PersistError<O>> for TempPath<O> {}

    pub trait TempObject {
        /// [`io::ErrorKind::AlreadyExists`] should be returned if the `path` already has an object.
        fn create(path: &Path) -> Result<Self>;
        fn keep(&self, path: &Path) -> Result<()>;
        fn persist(&self, new_path: &Path) -> Result<()>;
        fn remove(&mut self, path: &Path) -> Result<()>;
    }
    impl TempObject for File {}
    impl TempObject for DirEntry {}

    #[derive(Debug)]
    pub struct TempPath<O: TempObject> {
        path: PathBuf, 
        object: O,
    }
    impl<O: TempObject> Drop for TempPath<O>  {}
    impl<O: TempObject> TempPath<O>  {
        pub fn new() -> Result<Self>;
        pub fn new_in<P: AsRef<Path>>(p: P) -> Result<Self>;

        pub fn with_prefix<S: AsRef<OsStr>>(prefix: S) -> Result<Self>;
        pub fn with_prefix_in<S: AsRef<OsStr>, P: AsRef<Path>>(
            prefix: S,
            dir: P
        ) -> Result<Self>;

        pub fn path(&self) -> &Path;

        pub fn close(self) -> Result<()>;

        /// Persist temp path with name `new_path`, if `overwrite` is true then the existing object at `new_path`
        /// will be overwritten.
        pub fn persist<P: AsRef<Path>>(
            self,
           new_path: P,
           overwrite: bool,
        ) -> Result<O, PersistError<O>>;

        /// This function could fail since we need to mark the file as non-temporary on some platforms (e.g. windows)
        pub fn keep(self) -> Result<(PathBuf, O), PersistError<O>>;
    }
    impl<O: TempObject> Deref for TempPath<O>  {}
    impl<O: TempObject> DerefMut for TempPath<O>  {}

    pub struct Builder;

    impl<'a, 'b> Builder<'a, 'b> {
        pub fn new() -> Self;

        pub fn prefix<S: AsRef<OsStr> + ?Sized>(&mut self, prefix: &'a S) -> &mut Self;
        pub fn suffix<S: AsRef<OsStr> + ?Sized>(&mut self, suffix: &'b S) -> &mut Self;
        pub fn rand_bytes(&mut self, rand: usize) -> &mut Self;

        /// [`io::ErrorKind::AlreadyExists`] should be returned if the `path` already has an object.
        pub fn make<F, O>(&self, f: F) -> Result<TempPath<O>>
        where
            F: FnMut(&Path) -> Result<O>,
            O: TempObject;

        /// [`io::ErrorKind::AlreadyExists`] should be returned if the `path` already has an object.
        pub fn make_in<F, O, P>(&self, dir: P, f: F) -> Result<TempPath<O>>
        where
            F: FnMut(&Path) -> Result<O>,
            P: AsRef<Path>,
            O: TempObject;
    }
}

Once we have the basics done, we can add it more methods from tempfile, i.e. the named tempfile, the temp dir, etc.

Links and related work

tempfile, the existing popular implementation of tempfile.

NobodyXu avatar Jul 29 '24 14:07 NobodyXu

The crate tempfile has 168,118,646 downloads and is used very often in the ecosystem, I believe it is quite a fundamental functionality that the ecosystem depends on.

Yes the tempfile package is very popular, but I'm not sure if the two functions tempfile() and tempfile_in() are the right "fundamental functionality that the ecosystem depends on". In my experience I've only use tempfile::{Builder, TempDir} because most of the time we do need the Path to interact with external interfaces that expected a file. You may get a feel of how the tempfile APIs are used with https://github.com/search?q=language%3ARust+%2F%5Cbuse+tempfile%5Cb%2F&type=code.

Also the OS-managed tempfile()/SpooledTempFile and Rust-managed NamedTempFile/TempDir are implemented totally differently, so one can't say uplifting tempfile() into std help the case of NamedTempFile/TempDir either.

In any case if the team feels that tempfile() is really the fundamental one I wish it is available as std::fs::File::temporary() (or File::temp()) instead.

kennytm avatar Jul 29 '24 16:07 kennytm

Yes the tempfile package is very popular, but I'm not sure if the two functions tempfile() and tempfile_in() are the right "fundamental functionality that the ecosystem depends on".

I'd want to do this incrementally, by first adding the most simple APIs.

I wish it is available as std::fs::File::temporary() (or File::temp()) instead.

Thanks, that is indeed better!

NobodyXu avatar Jul 30 '24 00:07 NobodyXu

cc @kennytm I just added the API you wanted for named tempfile/tempdir.

I think the core of it is TempPath, so I created a version of it that is generic and can serve file/dir or anything user implemented.

NobodyXu avatar Jul 30 '24 15:07 NobodyXu

        /// [`io::ErrorKind::AlreadyExists`] should be returned if the `path` already has an object.
        pub fn make<F, O>(&self, f: F) -> Result<TempPath<O>>
        where
            F: FnMut(&Path) -> Result<R>,
            O: TempObject;

where's R defined, in F: FnMut(&Path) -> Result<R>,? same for make_in

programmerjake avatar Jul 30 '24 17:07 programmerjake

Thanks @programmerjake updated, it should be O.

NobodyXu avatar Jul 31 '24 08:07 NobodyXu

also note that os-managed tempfiles may not be available on every platform, while rust-managed tempfiles can work on any platform where files can be deleted.

lolbinarycat avatar Jul 31 '24 21:07 lolbinarycat

Having it in stdlib would make it

  • easier to access

  • more trusted than third-party dependencies, one less crate to audit on

  • faster to compile for low-level crates depending on it, which might block compilation of other crates

Note that all of these are true of literally every piece of code ever, which makes them weak motivation for this change since it's not true that all code should be in the standard library.

What makes it important that this specifically be in std, vs being just a well-known community crate (like how regex is)?

scottmcm avatar Aug 08 '24 17:08 scottmcm

What makes it important that this specifically be in std, vs being just a well-known community crate (like how regex is)?

@scottmcm tl;dr tempfile API is a missing pieces in stdlib

  • it has a stable interface, the API for tempfile is unlikely to change, even if new syscalls are added to the OSes or some convention has been changed, whereas regex API could still change due to its sheer complexity and requirements to be compatible with other regex implementation, i.e. new regex syntax favor
  • it currently has one popular implementation, other implementation, such as the implementation in C, is unlikely to diverge too much, the basics is the same, just how random filename is generated is different, where as regex has two different implementations based on compile-speed requirements (regex-lite vs regex)

In additional to that:

  • The APIs of tempfile can fit in the existing std::fs nicely, one could say that this is actually an API missing in stdlib, i.e. we provide file/fs abstraction and a lot of convenient methods (io::copy, create_dir_all, remove_dir_all, fs path parsing, std::env::tempdir), but stdlib is missing an API for tempfile/tempdir, @scottmcm I'd argue that it's actually a missing API in stdlib
  • Having such API in stdlib would make it easier to write new binary from scratch with smoother experience, currently tempfile not in stdlib, you'd have to add them to dependencies for a very basic, very fundamental functionality that you'd assume it's in stdlib

NobodyXu avatar Aug 10 '24 08:08 NobodyXu

@scottmcm Let's think about this the other way around:

Why tempfile should not be part of stdlib?

If we are developing a new PL, if there any reason tempfile should not be in stdlib?

It is a quite fundamental functionality, abstracts over platform details, and has a stable API, and there's no different way to implement it, except for some tunables.

NobodyXu avatar Aug 10 '24 08:08 NobodyXu

if tempfile should be in stdlib, doesn't require a portable interface to the system RNG?

lolbinarycat avatar Aug 11 '24 22:08 lolbinarycat

if tempfile should be in stdlib, doesn't require a portable interface to the system RNG?

stdlib already has it internally, and there's proposal to expose that.

NobodyXu avatar Aug 11 '24 23:08 NobodyXu

We did a first-pass discussion about this in today's @rust-lang/libs-api meeting. We didn't come to a consensus yet, but some initial reactions were that some of the methods (e.g. open_tempfile) seemed non-portable, and that the full surface area seemed a little larger than necessary for a first pass (though we didn't come to a consensus yet on which bits).

joshtriplett avatar Aug 27 '24 22:08 joshtriplett

Perhaps we can reduce the API surface to:

mod fs {
    impl File {
        /// Create a new temporary file.
        /// The new file could be in memory or on disk, there's no guarantee about it.
        ///
        /// The temporary file will be automatically removed by the OS when the last handle to it is closed.
        /// This doesn’t rely on Rust destructors being run, so will (almost) never fail to clean up the temporary file.
        ///
        /// # Security
        /// NOTE that other processes might be able to get access to your tempfile (via ptrace, procfs, kernel bug, etc).
        /// It does not provide any security guarantee on that.
        pub fn tempfile() -> Result<File>;

        /// Create a new temporary file under the path `dir`, the file will be created on the filesystem
        /// `dir` is in.
        ///
        /// # Security
        /// NOTE that other processes might be able to get access to your tempfile (via ptrace, procfs, kernel bug, etc).
        /// It does not provide any security guarantee on that.
        pub fn tempfile_in<P: AsRef<Path>>(dir: P) -> Result<File>;

        /// Create a hardlink to `path`.
        /// If `self` is a tempfile, then it will be visible at `path`.
        ///
        /// NOTE that if `self` is a tempfile, then this operation can fail with `io::ErrorKind::Unsupported`.
        pub fn hardlink<P: AsRef<Path>>(&self, path: P) -> Result<()>;
    }

    impl OpenOptions {
        /// Open a tempfile under `dir`, the `File` returned will be writable even if `.write(true)` is not called.
        /// If `create_new` is set to `false`, then you can try using [`File::hardlink`] to persist the [`File`].
        ///
        /// NOTE that if `self` is a tempfile, then this operation can fail with `io::ErrorKind::Unsupported`.
        pub fn open_tempfile<P: AsRef<Path>>(&self, dir: P) -> Result<File>;
    }
}

for now?

It would still be very useful as the ability to create a tempfile, and also hardlink it to replace another file to achieve atomic update @joshtriplett

(Sorry for the late update, I was busy with other stuff and forgot this)

NobodyXu avatar Mar 19 '25 13:03 NobodyXu