wasm-micro-runtime

Proposed implementation of WASI libc filesystem on Windows

Open zoraaver opened this issue 2 years ago • 3 comments

Motivation

We would like to have a working implementation of all the WASI libc filesystem functions on Windows. This is currently not possible due to the use of various POSIX file functions which are not available on Windows or are not fully POSIX compliant.

Requirements

The solution outlined in this proposal is not necessarily specific to POSIX filesystem functions (it could be equally applied to the POSIX time functions for example) but we will only concern ourselves with filesystem functions. The solution can be easily extended to other POSIX functions in the future.

Proposal

The core issue is that the WASI libc implementation assumes the underlying platform is a POSIX platform. The codebase already has a method of wrapping platform-specific APIs: a ‘wrapper’ function is declared in an interface header file and then implemented on a per-platform basis. For example, os_mutex_lock is declared in core/shared/platform/include/platform_api_vmcore.h and is implemented using platform-specific APIs for each platform under core/shared/platform (see Windows implementation and POSIX implementation). This approach avoids scattering #ifdefs wherever platform-specific APIs are needed.

The proposal here is to adopt the same approach for the POSIX filesystem functions: wrap them in platform-agnostic interface functions which are implemented on a per-platform basis. The WASI libc implementation will then no longer reference any POSIX file functions directly, but rather wrapper functions which call the underlying POSIX function on POSIX platforms and the equivalent Windows APIs on Windows. We are only interested in Windows here, but this solution applies to any non-POSIX-compliant platform. Its benefit is that it allows us to keep the majority of the WASI libc implementation the same, since most of the code is platform-agnostic already.

This approach is very similar to that taken by WasmEdge, which has both Windows and Linux classes to wrap any platform-specific details: Windows, Linux. The implementation is in C++ but it will still be a useful reference when implementing POSIX file functionality for Windows. (The Linux implementation mostly passes parameters through to the underlying POSIX functions, as expected.)

Implementation

Wrapper Function Naming

To make it easy to understand what a wrapper function does without digging into a specific implementation, we can name each one after its equivalent POSIX function with an additional os_ prefix, e.g. preadv would become os_preadv.

Abstracting Platform Handles

Aside from relying on the existence of various POSIX functions/data structures, we also assume that a platform handle (a.k.a. file descriptor on POSIX) is of an int type here which will not be true in general. In order to support Windows and any other non-POSIX platforms, we will need to define this type on a per platform basis.

Instead of int, we can use a type os_handle, which would be typedef’ed to int on POSIX. On Windows, we would define it as follows:

typedef struct windows_handle {
  HANDLE handle;
  __wasi_fdflags_t fdflags;
} os_handle;

We need to store the fd flags separately from the raw Windows HANDLE since there is no native way of storing these flags on, or retrieving them from, a Windows HANDLE.

The POSIX wrapper functions would then accept/return this os_handle type where appropriate.

Wrapper Function Signature

We can keep the signature of each wrapper function the same as the equivalent POSIX function, with the exception of the following changes:

  • Explicitly return a WASI error (__wasi_errno_t) from each function rather than using errno. errno is not set by Windows APIs and, while we could set it ourselves, that is not a very clean way to handle errors; its usage should be restricted to the POSIX implementation as far as possible. Where the original POSIX function returned a value (that is not already an error code), we can instead use an additional out parameter. Each platform will then be responsible for converting its native error codes into the appropriate WASI error codes.
  • Where the equivalent POSIX function would expect a data structure/type defined in the POSIX specification, we would instead use the corresponding WASI data structure, which is necessarily platform-agnostic. e.g.
    • ssize_t writev(int fildes, const struct iovec *iov, int iovcnt); would be wrapped in the following function: __wasi_errno_t os_writev(os_handle fildes, const struct __wasi_iovec_t *iov, int iovcnt, ssize_t *bytes_written);

This would include POSIX flags, e.g. O_DIRECTORY, O_APPEND. For all the functions we need to implement, there are corresponding WASI flags, e.g. __WASI_O_DIRECTORY, __WASI_FDFLAG_APPEND.
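As a rough illustration of these signature rules, a POSIX-side os_writev might look like the sketch below. The typedefs and constants are local stand-ins for what the runtime's WASI headers would provide (the numeric values follow the WASI preview1 errno numbering). convert_errno and SKETCH_MAX_IOV are hypothetical names introduced only for this example, and the fixed-size conversion buffer is a simplification rather than a proposed design.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations in strict modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Local stand-ins for definitions normally supplied by the WASI headers. */
typedef int os_handle; /* on POSIX a handle is a plain file descriptor */
typedef uint16_t __wasi_errno_t;
#define __WASI_ESUCCESS ((__wasi_errno_t)0)
#define __WASI_EBADF ((__wasi_errno_t)8)
#define __WASI_EINVAL ((__wasi_errno_t)28)
#define __WASI_EIO ((__wasi_errno_t)29)

struct __wasi_iovec_t {
    void *buf;
    size_t buf_len;
};

/* Hypothetical helper: maps a few errno values, everything else to EIO. */
static __wasi_errno_t
convert_errno(int error)
{
    switch (error) {
        case 0:
            return __WASI_ESUCCESS;
        case EBADF:
            return __WASI_EBADF;
        case EINVAL:
            return __WASI_EINVAL;
        default:
            return __WASI_EIO;
    }
}

#define SKETCH_MAX_IOV 16 /* arbitrary cap so the sketch needs no allocation */

__wasi_errno_t
os_writev(os_handle fildes, const struct __wasi_iovec_t *iov, int iovcnt,
          ssize_t *bytes_written)
{
    struct iovec native[SKETCH_MAX_IOV];
    ssize_t written;
    int i;

    assert(bytes_written != NULL);
    if (iovcnt < 0 || iovcnt > SKETCH_MAX_IOV || (iovcnt > 0 && iov == NULL))
        return __WASI_EINVAL;

    /* Translate the platform-agnostic WASI vectors into POSIX iovecs. */
    for (i = 0; i < iovcnt; i++) {
        native[i].iov_base = iov[i].buf;
        native[i].iov_len = iov[i].buf_len;
    }

    written = writev(fildes, native, iovcnt);
    if (written < 0)
        return convert_errno(errno); /* errno never leaves this layer */

    *bytes_written = written; /* former return value becomes an out param */
    return __WASI_ESUCCESS;
}
```

A caller only ever inspects the returned __wasi_errno_t and the out parameter, so the same calling code would work unchanged against a Windows implementation of the same signature.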

Wrapper Function Declaration

The most suitable place to declare these functions seems to be core/shared/platform/include/platform_api_extension.h.

Standard I/O Streams

In addition to wrapping the standard POSIX filesystem functions, we will need a way to obtain handles to stdin, stdout and stderr (if they are available at all on the platform). Currently, we default these to the values 0, 1 and 2 respectively which would only be correct on POSIX systems. Therefore, we can introduce an additional wrapper function

  • os_handle os_get_std_handle(os_std_handle std_handle);

together with constants to represent the stdin/stdout/stderr devices:

  • OS_STD_OUTPUT_HANDLE
  • OS_STD_INPUT_HANDLE
  • OS_STD_ERROR_HANDLE

defining them as appropriate on POSIX and Windows.

N.B. on POSIX, os_get_std_handle would just pass through the std handle without modification.
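A minimal sketch of the POSIX side, assuming the constants are defined as an enum whose values are the conventional descriptor numbers (the enum layout is an assumption; only the function and constant names come from the proposal):

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations in strict modes */
#include <assert.h>
#include <unistd.h>

typedef int os_handle; /* on POSIX a handle is a plain file descriptor */

/* Assumed definition: on POSIX each constant is simply the descriptor. */
typedef enum os_std_handle {
    OS_STD_INPUT_HANDLE = STDIN_FILENO,
    OS_STD_OUTPUT_HANDLE = STDOUT_FILENO,
    OS_STD_ERROR_HANDLE = STDERR_FILENO
} os_std_handle;

os_handle
os_get_std_handle(os_std_handle std_handle)
{
    /* Only the three standard streams are meaningful inputs. */
    assert(std_handle == OS_STD_INPUT_HANDLE
           || std_handle == OS_STD_OUTPUT_HANDLE
           || std_handle == OS_STD_ERROR_HANDLE);

    /* Pass-through: the constant already is the file descriptor. */
    return (os_handle)std_handle;
}
```

On Windows the same function would presumably translate each constant into a call to the Win32 GetStdHandle function and wrap the result in the Windows os_handle struct; that part is not shown here.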

Directory Handles

We rely on the existence of POSIX directory handles (DIR*) here. Similar to Abstracting Platform Handles, we will need to define this directory handle type as os_directory_handle on a per-platform basis. On POSIX, it would be defined to DIR whereas on Windows, we will need to emulate DIR. We can use this port of dirent.h to Windows as a reference for the implementation of the specific structs and directory functions.
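To make this concrete, here is one possible shape for the POSIX side. It assumes os_directory_handle is a pointer to DIR and that the directory wrappers follow the os_ naming rule with the error-code-plus-out-parameter signature style described earlier. The names os_fdopendir and os_closedir, their exact signatures, and the reduced error mapping are assumptions made for illustration; the WASI constants are local stand-ins.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations in strict modes */
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h> /* open(), used by callers to obtain a directory fd */
#include <stddef.h>
#include <stdint.h>

typedef int os_handle;            /* POSIX: plain file descriptor */
typedef DIR *os_directory_handle; /* assumed POSIX definition */

/* Local stand-ins for definitions normally supplied by the WASI headers. */
typedef uint16_t __wasi_errno_t;
#define __WASI_ESUCCESS ((__wasi_errno_t)0)
#define __WASI_EBADF ((__wasi_errno_t)8)
#define __WASI_EIO ((__wasi_errno_t)29)
#define __WASI_ENOTDIR ((__wasi_errno_t)54)

__wasi_errno_t
os_fdopendir(os_handle handle, os_directory_handle *out_dir)
{
    DIR *dir;

    assert(out_dir != NULL);
    dir = fdopendir(handle);
    if (dir == NULL) {
        /* Reduced mapping for the sketch; a real one would cover more. */
        switch (errno) {
            case EBADF:
                return __WASI_EBADF;
            case ENOTDIR:
                return __WASI_ENOTDIR;
            default:
                return __WASI_EIO;
        }
    }

    *out_dir = dir;
    return __WASI_ESUCCESS;
}

__wasi_errno_t
os_closedir(os_directory_handle dir)
{
    if (closedir(dir) != 0)
        return errno == EBADF ? __WASI_EBADF : __WASI_EIO;
    return __WASI_ESUCCESS;
}
```

On success the descriptor passed to os_fdopendir is owned by the directory stream, so os_closedir also closes it, exactly as with the underlying POSIX functions.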

Wrapping fcntl

fcntl is a slightly special case since its behavior is very generic depending on the command provided. However, we only use it with the commands F_GETFL and F_SETFL to get/set flags on a file descriptor/handle. To avoid implementing a lot of unnecessary functionality, it would be simpler to replace its usage with two wrapper functions

  • __wasi_errno_t os_handle_get_fdflags(os_handle handle, __wasi_fdflags_t *out_flags);
  • __wasi_errno_t os_handle_set_fdflags(os_handle handle, __wasi_fdflags_t flags);
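On POSIX these two wrappers could be thin layers over fcntl, roughly as sketched below. Only the append and non-blocking flags are translated to keep the example short, the WASI types and constants are local stand-ins using the preview1 numeric values, and the error mapping is deliberately minimal.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations in strict modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>

typedef int os_handle; /* on POSIX a handle is a plain file descriptor */

/* Local stand-ins for definitions normally supplied by the WASI headers. */
typedef uint16_t __wasi_errno_t;
typedef uint16_t __wasi_fdflags_t;
#define __WASI_ESUCCESS ((__wasi_errno_t)0)
#define __WASI_EBADF ((__wasi_errno_t)8)
#define __WASI_EIO ((__wasi_errno_t)29)
#define __WASI_FDFLAG_APPEND ((__wasi_fdflags_t)0x0001)
#define __WASI_FDFLAG_NONBLOCK ((__wasi_fdflags_t)0x0004)

__wasi_errno_t
os_handle_get_fdflags(os_handle handle, __wasi_fdflags_t *out_flags)
{
    __wasi_fdflags_t flags = 0;
    int native;

    assert(out_flags != NULL);
    native = fcntl(handle, F_GETFL);
    if (native < 0)
        return errno == EBADF ? __WASI_EBADF : __WASI_EIO;

    /* Translate native status flags into their WASI equivalents. */
    if (native & O_APPEND)
        flags |= __WASI_FDFLAG_APPEND;
    if (native & O_NONBLOCK)
        flags |= __WASI_FDFLAG_NONBLOCK;

    *out_flags = flags;
    return __WASI_ESUCCESS;
}

__wasi_errno_t
os_handle_set_fdflags(os_handle handle, __wasi_fdflags_t flags)
{
    int native = 0;

    /* Translate WASI flags back into native status flags. */
    if (flags & __WASI_FDFLAG_APPEND)
        native |= O_APPEND;
    if (flags & __WASI_FDFLAG_NONBLOCK)
        native |= O_NONBLOCK;

    if (fcntl(handle, F_SETFL, native) < 0)
        return errno == EBADF ? __WASI_EBADF : __WASI_EIO;
    return __WASI_ESUCCESS;
}
```

A Windows implementation would have no fcntl to call and would instead read and update the fdflags field stored next to the HANDLE, which is why the flags are kept in the os_handle struct there.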

Testing

wasi-testsuite already contains many tests for the available filesystem functions but there are some gaps because not all the tests from the separate runtimes have been migrated over. To fill the gaps, we can move over the missing test cases from the previous wasmtime tests. See WASI filesystem functions test status for the test status of each filesystem function in wasi-testsuite and wasmtime. fd_fdstat_set_rights is the only function which is not tested at all in either the wasmtime or wasi-testsuite tests, so we will need to write a test for this function separately and add it to wasi-testsuite.

Currently the WASI tests are not run on Windows in CI but work is in progress already to enable them.

Alternatives

Implementing the POSIX interface directly

Instead of wrapping the POSIX interface, we could implement it directly. This would mean almost no changes to the core business logic and would avoid some boilerplate in the POSIX implementation of the wrapper functions (which will mostly pass through various parameters without modification). However, there are a few disadvantages:

  • It will add some boilerplate to non-POSIX platforms since we will need to implement POSIX types (e.g. iovec).
  • We would need to take care on platforms which support a subset of POSIX functionality (like Windows) to ensure some degree of compatibility between our own POSIX implementation and the POSIX functions provided by the native platform.
  • We have less freedom to implement a simpler interface. Using wrapper functions allows us to simplify the POSIX interface where necessary since often we wouldn’t need the generality of the original function. e.g. see Wrapping fcntl. If we directly implement the POSIX interface, we could either
    • a) implement everything properly according to the specification and add unnecessary code/complexity
    • b) not implement it according to the specification and potentially cause confusion to developers when the function does not behave according to POSIX standards.

Instead of directly implementing all of the POSIX interface, we could also implement just [dirent.h](https://pubs.opengroup.org/onlinepubs/7908799/xsh/dirent.h.html) by itself. This is definitely possible as evidenced by this Windows port. However, it probably could not be a standalone implementation since [fdopendir](https://pubs.opengroup.org/onlinepubs/9699919799/functions/fdopendir.html) at least would need to interact with our internally defined platform handle types.

Abstracting platform-specific APIs at the WASI interface level

At the other end of the spectrum, we could abstract the use of some POSIX functions away at a higher level, by implementing some of the WASI functions themselves (declared here) per platform. The main advantage of this approach would be less complexity since it involves one less ‘layer’. For example, the WASI function fd_sync is itself a thin wrapper on top of [fsync](https://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html). The issue with this approach is that all of these WASI functions need to look up the host handle from the WASI fd as a minimum before invoking the POSIX function. Therefore we would need to:

  • a) duplicate this business logic across platforms which is definitely not ideal since it is platform-agnostic anyway.
  • b) somehow extract this piece of logic out of the WASI functions but it would involve a more significant/disruptive refactor of the codebase - if it is possible at all.
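The layering argument can be illustrated with fd_sync. In the sketch below the fd lookup is written once in platform-agnostic code and only os_fsync would differ per platform. The table structure and the sketch_-prefixed names are hypothetical and far simpler than a real fd table; the WASI types and constants are local stand-ins.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations in strict modes */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

typedef int os_handle; /* on POSIX a handle is a plain file descriptor */

/* Local stand-ins for definitions normally supplied by the WASI headers. */
typedef uint16_t __wasi_errno_t;
typedef uint32_t __wasi_fd_t;
#define __WASI_ESUCCESS ((__wasi_errno_t)0)
#define __WASI_EBADF ((__wasi_errno_t)8)
#define __WASI_EIO ((__wasi_errno_t)29)

/* Platform layer: the only part that would be rewritten for Windows. */
__wasi_errno_t
os_fsync(os_handle handle)
{
    if (fsync(handle) != 0)
        return errno == EBADF ? __WASI_EBADF : __WASI_EIO;
    return __WASI_ESUCCESS;
}

/* Hypothetical, heavily simplified fd table mapping WASI fds to handles. */
struct sketch_fd_entry {
    bool in_use;
    os_handle handle;
};

struct sketch_fd_table {
    struct sketch_fd_entry *entries;
    size_t size;
};

/* Platform-agnostic lookup shared by every WASI function. */
static __wasi_errno_t
sketch_fd_lookup(const struct sketch_fd_table *table, __wasi_fd_t fd,
                 os_handle *out_handle)
{
    assert(table != NULL && out_handle != NULL);
    if (fd >= table->size || !table->entries[fd].in_use)
        return __WASI_EBADF;
    *out_handle = table->entries[fd].handle;
    return __WASI_ESUCCESS;
}

/* WASI layer: lookup first, then delegate to the platform wrapper. */
__wasi_errno_t
sketch_wasi_fd_sync(const struct sketch_fd_table *table, __wasi_fd_t fd)
{
    os_handle handle;
    __wasi_errno_t error = sketch_fd_lookup(table, fd, &handle);

    if (error != __WASI_ESUCCESS)
        return error;
    return os_fsync(handle);
}
```

If fd_sync itself were implemented per platform, sketch_fd_lookup and the error handling around it would have to be repeated in every platform's copy, which is the duplication described in a) above.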

Tasks

  • [x] Fill WASI testing gaps
  • [x] Abstract POSIX functions/types from WASI libc implementation
  • [x] Implement POSIX filesystem functions on Windows

Appendix

Currently used POSIX filesystem functions

POSIX function Notes
fstat
fcntl
closedir
close
fdatasync
fsync
preadv
pwritev
readv
writev
posix_fallocate
openat
readlinkat
mkdirat
realpath
linkat
fstatat
fdopendir
rewinddir
seekdir
readdir
renameat
ftruncate
futimens
utimensat
symlinkat
unlinkat
lseek
posix_fadvise

WASI filesystem functions test status

| Function | WASI test suite status | wasmtime tests status | Action |
|---|---|---|---|
| fd_advise | E | E | D |
| fd_allocate | I | E | C |
| fd_close | I | I | C |
| fd_datasync | N | I | C |
| fd_fdstat_get | I | I | C |
| fd_fdstat_set_flags | E | E | D |
| fd_fdstat_set_rights | N | N | W |
| fd_filestat_get | E | E | D |
| fd_filestat_set_size | N | E | C |
| fd_filestat_set_times | N | E | C |
| fd_pread | N | E | C |
| fd_prestat_dir_name | I | I | C |
| fd_prestat_get | I | I | C |
| fd_pwrite | N | E | C |
| fd_read | I | E | C |
| fd_readdir | E | E | D |
| fd_renumber | N | E | C |
| fd_seek | E | E | C |
| fd_sync | N | I | C |
| fd_tell | N | E | C |
| fd_write | E | E | C |
| path_create_directory | I | I | C |
| path_filestat_get | I | E | C |
| path_filestat_set_times | N | E | C |
| path_link | N | E | C |
| path_open | I | E | C |
| path_readlink | N | E | C |
| path_remove_directory | I | I | C |
| path_rename | N | E | C |
| path_symlink | I | E | C |
| path_unlink_file | I | I | C |

Test status:

  • I = indirectly tested i.e. function is used in other tests but there is no explicit test for that function
  • E = explicitly tested, there are one or more tests dedicated to testing that function
  • N = not tested at all

Action:

  • C = copy relevant tests from wasmtime to WASI test suite
  • W = write new test to fill testing gap
  • D = testing is sufficient (whether indirect/explicit) so no action is required

zoraaver, Aug 22 '23 14:08

> The issue with this approach is that all of these WASI functions need to look up the host handle from the WASI fd as a minimum before invoking the POSIX function.

it's necessary for other approaches too, isn't it?

yamt, Sep 26 '23 05:09

> it's necessary for other approaches too, isn't it?

Yes, we always have to look up the host handle from the WASI fd, regardless of the approach. But if we implement the WASI functions for each platform, this logic would be duplicated for each platform since it is platform-agnostic.

zoraaver, Oct 12 '23 09:10

All the work has been merged to main. The only WASI function that remains to be implemented on Windows is poll_oneoff.

zoraaver, Feb 29 '24 14:02