Use `TEB` to get process information on Windows
Proposal
Problem statement
The standard library on Windows uses API bindings to get process information, but those bindings have higher overhead than reading the TEB (Thread Environment Block), an internal Windows structure whose members hold much of this information. Reading the TEB directly could improve performance.
Motivating examples or use cases
- Get stdout/stderr handle
- Get command line pointer
- Get process id
- Get last error (most common case)
Solution sketch
The first step is to define types for TEB and its members:
windows.txt
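The attached file is not reproduced here, so as a rough sketch of what such definitions could look like, the following declares only the leading fields of the TEB, up to the last-error slot. The Rust names (`NtTib`, `ClientId`, `Teb`, `last_error_value`, and so on) are assumptions chosen for illustration, and the layout follows commonly published descriptions of this internal structure, which Microsoft does not guarantee to stay stable across Windows versions. The compile-time checks confirm that the self pointer lands at 0x30 on 64-bit targets and 0x18 on 32-bit targets.

```rust
use core::ffi::c_void;
use core::mem::offset_of;

/// Leading part of the thread information block (`NT_TIB` in the Windows SDK).
#[repr(C)]
pub struct NtTib {
    pub exception_list: *mut c_void,
    pub stack_base: *mut c_void,
    pub stack_limit: *mut c_void,
    pub sub_system_tib: *mut c_void,
    /// A union with a `u32` version field in the SDK header; pointer-sized.
    pub fiber_data: *mut c_void,
    pub arbitrary_user_pointer: *mut c_void,
    /// Points back at the start of the TEB.
    pub self_ptr: *mut NtTib,
}

/// Process and thread identifiers, stored as handle-sized values.
#[repr(C)]
pub struct ClientId {
    pub unique_process: *mut c_void,
    pub unique_thread: *mut c_void,
}

/// Hypothetical partial TEB: only the fields up to the last-error value.
/// The real structure continues far beyond this point.
#[repr(C)]
pub struct Teb {
    pub nt_tib: NtTib,
    pub environment_pointer: *mut c_void,
    pub client_id: ClientId,
    pub active_rpc_handle: *mut c_void,
    pub thread_local_storage_pointer: *mut c_void,
    pub process_environment_block: *mut c_void,
    pub last_error_value: u32,
}

/// Offset of the self pointer from the start of the TEB.
pub const SELF_OFFSET: usize = offset_of!(Teb, nt_tib) + offset_of!(NtTib, self_ptr);
/// Offset of the last-error value from the start of the TEB.
pub const LAST_ERROR_OFFSET: usize = offset_of!(Teb, last_error_value);

// Compile-time layout checks for each pointer width.
#[cfg(target_pointer_width = "64")]
const _: () = assert!(SELF_OFFSET == 0x30 && LAST_ERROR_OFFSET == 0x68);
#[cfg(target_pointer_width = "32")]
const _: () = assert!(SELF_OFFSET == 0x18 && LAST_ERROR_OFFSET == 0x34);
```

Because the checks are `const` assertions, a layout mistake fails the build instead of silently reading the wrong field at run time.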
The second step is to obtain a pointer to the TEB. There are different ways to do it:
- use the NtCurrentTeb binding:
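extern "system" {
    fn NtCurrentTeb() -> *const TEB;
}
fn get_teb() -> *const TEB {
    // SAFETY: NtCurrentTeb takes no arguments and only returns the
    // current thread's TEB pointer.
    unsafe { NtCurrentTeb() }
}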
- use register value:
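// x86_64: gs:[0x30] holds the TEB's self pointer
use std::arch::asm;
fn get_teb() -> *const TEB {
    let teb: *const TEB;
    // SAFETY: this only reads the self pointer stored in the current thread's TEB.
    unsafe {
        asm!(
            "mov {}, gs:[0x30]",
            out(reg) teb,
            options(nostack, readonly, preserves_flags),
        );
    }
    teb
}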
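// x86: fs:[0x18] holds the TEB's self pointer
use std::arch::asm;
fn get_teb() -> *const TEB {
    let teb: *const TEB;
    // SAFETY: this only reads the self pointer stored in the current thread's TEB.
    unsafe {
        asm!(
            "mov {}, fs:[0x18]",
            out(reg) teb,
            options(nostack, readonly, preserves_flags),
        );
    }
    teb
}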
The final step is to replace those bindings with member accesses on the TEB.
For example:
pub fn get_last_error() -> WinError {
// SAFETY: This just returns a thread-local u32 and has no other effects.
unsafe { WinError { code: (*get_teb()).last_error_value } }
}
Alternatives
Links and related work
- Zig uses TEB to get process information and treats it and most of its members as non-null pointers: https://github.com/ziglang/zig/blob/master/lib/std/os/windows.zig
- Local benchmark of different ways to get command line pointer:
TEB/inline-assembly time: [241.46 ps 243.36 ps 245.64 ps]
TEB/NtCurrentTeb time: [1.1975 ns 1.1998 ns 1.2028 ns]
TEB/GetCommandLineW time: [1.6770 ns 1.6790 ns 1.6817 ns]
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
I think this needs a stronger motivation. You mention performance but I would like to see some real-world numbers. E.g. it's unlikely people are getting the command line in a tight loop and any function call overhead is going to be dwarfed by parsing the command line in any case. The most common (GetLastError) is called only after a much more expensive call into the Windows API.
@ChrisDenton
E.g. it's unlikely people are getting the command line in a tight loop and any function call overhead is going to be dwarfed by parsing the command line in any case.
Indeed, but the TEB also contains the length of the command-line buffer. The current command-line parsing implementation on Windows uses an iterator with a null check; with the length, it could become a slice iterator and get a slight improvement. (Caveat: this could break if another process modifies the length, but I think that is fine, since such a process could also modify the command-line pointer, which is what GetCommandLineW returns; see this for detail.)
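To make the difference concrete, here is a small sketch of the two ways to view the command-line buffer. It is not the standard library's actual code: `scan_to_nul` and `from_byte_length` are hypothetical helpers, and the byte-length convention assumes the length comes from a `UNICODE_STRING`-style field, which counts bytes rather than UTF-16 units.

```rust
use core::slice;

/// Finds the end of a NUL-terminated UTF-16 string by checking every unit.
///
/// # Safety
/// `ptr` must point to a readable, NUL-terminated sequence of `u16`.
pub unsafe fn scan_to_nul<'a>(ptr: *const u16) -> &'a [u16] {
    let mut len = 0;
    // Walk forward until the terminating NUL is found.
    while unsafe { *ptr.add(len) } != 0 {
        len += 1;
    }
    unsafe { slice::from_raw_parts(ptr, len) }
}

/// Builds the slice directly from a known byte length, with no scan.
///
/// # Safety
/// `ptr` must point to at least `byte_len` readable bytes of `u16` data.
pub unsafe fn from_byte_length<'a>(ptr: *const u16, byte_len: u16) -> &'a [u16] {
    // The length is in bytes, so halve it to get the number of UTF-16 units.
    unsafe { slice::from_raw_parts(ptr, usize::from(byte_len) / 2) }
}
```

Both helpers produce the same slice for a well-formed buffer. The second skips the scan but trusts the stored length, which is the trade-off the caveat describes.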
The most common (GetLastError) is called only after a much more expensive call into the Windows API.
Benchmarking a recursive read_dir (which contains several get_last_error calls) shows no significant difference, so, as expected, this has little impact.
As this isn't an API change, only an implementation change, the ACP process is not needed for this. However, as @ChrisDenton said, the motivation isn't sufficiently strong to make this change anyway.