Correct types for MM/FS/.. fields in `Process`/`PosixThread`
Motivation
Currently MM/FS/.. structures are non-nullable immutable fields in Process/PoxixThread.
pub struct Process {
// ..
process_vm: ProcessVm,
// ..
}
pub struct PosixThread {
// ..
file_table: RoArc<FileTable>,
fs: Arc<ThreadFsInfo>,
// ..
}
Unfortunately, these are wrong because
- These fields must be mutable for
unshare/execveto work properly.- For example,
unshare(CLONE_FS)must change the FS field. - As another example,
execve()must create a newProcessVmwhen the old one is shared with another process.
- For example,
- These fields should be nullable so that
exitcan drop the data accordingly.- Delaying the drop until the
Taskis dropped is not appropriate.- In this case, the drop will be done in the context of another task, thus blocking the other task from running.
- For example, closing a TCP socket may have to wait until the TCP socket enters the CLOSED state). [^1]
- Delaying the drop until the
[^1]: Note that we currently always close files regardless of whether file tables are shared with other processes in do_exit, which is quite wrong, although the wrongness is in a different way.
How to fix these problems?
Proposal 1
struct Process {
process_vm: Mutex<Option<Arc<ProcessVm>>>,
}
struct ThreadLocal {
process_vm: RefCell<Option<Arc<ProcessVm>>>,
}
- A copy is kept in
ThreadLocalso that we can access these key structures without holding additional locks. - The
unshare/execvewill update the fields inProcessandThreadLocaltogether. - We'll drop all
Arc<ProcessVm>s of the current thread/process when we exit. After that, the fields will containNone.
Proposal 2
struct PosixThread {
file_table: Mutex<RoArc<FileTable>>,
}
struct ThreadLocal {
file_table: RefCell<Option<RwArc<FileTable>>>,
}
-
exitwill drop only theRwArc<>inThreadLocal, with the following tricks:
impl<T> RwArc<T> {
// Drops this `RwArc` and invokes `f` _if it is the dropping `RwArc` is last one_.
pub fn drop_with<R>(self, f: impl FnOnce(&mut T) -> R) - R;
}
fn do_exit() {
// ...
thread_local
.file_table
.borrow_mut()
.take()
.drop_with(|file_table| file_table.close_all_files());
// ...
}
Unresolved question
-
Mutex<Option<Arc<T>>>is verbose. Are there other, cleaner solutions? - If we decide to go with these proposals, do we want to accept both
Proposal 1andProposal 2, justProposal 1, or justProposal 2?
Recently, I attempted to implement mutability related to Process VM and I referred to the suggestions in this PR. However, I encountered an issue where there is redundancy of VmSpace in both the Task's UserSpace and ThreadLocal's Vmar. I believe it's unnecessary for VmSpace information to exist simultaneously in both UserSpace and ThreadLocal. Currently, I have two ideas:
-
The first approach is to follow the proposal in this issue by additionally removing
VmSpacefromUserSpaceand providing a method that allows the underlyingOSTDto utilize thread-local variables to retrieve VM information, similar to what was implemented in the last commit in #1852 . However, this would make the semantics ofUserSpacerather strange. -
The second approach is to move VM-related information into
UserSpace, meaning no extraVmarinformation would be stored in theThreadLocal. Instead, it would reside withinUserSpace, which would then serve as a thread-local data dedicated specifically for VM usage.
// in ostd
pub struct UserSpace {
/// cpu context before entering user space
init_ctx: UserContext,
/// The VM address space of this task.
vm_space: Box<dyn TaskVM>,
}
pub trait TaskVM {
fn vm_space(&self) -> Option<Ref<'_, Arc<VmSpace>>>;
}
impl UserSpace {
pub fn new<T>(vm_space: T, init_ctx: UserContext) -> Self
where T: TaskVM {
Self {
vm_space: SpinLock::new(vm_space),
init_ctx,
}
}
// in kernel
pub struct ProcessVmar {
inner: RefCell<Option<Vmar>>
}
impl TaskVM for ProcessVmar {
}
Look forward to the advice.
pub trait TaskVM { fn vm_space(&self) -> Option<Ref<'_, Arc<VmSpace>>>; }
How do you feel about:
fn post_schedule_handler() {
// Call `VmSpace::activate` for the `VmSpace` in `ThreadLocal`
}
fn page_fault_handler() {
// Handle the page fault for the `ProcessVm` in `ThreadLocal`
}
fn init() {
ostd::inject_post_schedule_handler(&post_schedule_handler);
ostd::inject_page_fault_handler(&page_fault_handler);
}
I think this results in cleaner code in the OSTD.
- However, this would make the semantics of
UserSpacerather strange.
In an ideal design, we can have UserSpace = UserContext + VmSpace and have properly designed APIs to ensure that UserContext and VmSpace are always used consistently.
- For example, in
UserMode::execute()we used to callVmSpace::activate()and then run the userspace code. - But now the above won't be true. In the current implementation,
VmSpace::activate()is called atschedule().
So I feel that binding UserContext and VmSpace does not seem to help. This is especially true because if we look closely at the code, we'll see that the vm_space field in UserSpace is only accessed in a getter method UserSpace::vm_space().
This way I don't see too much value in continuing to bind UserContext and VmSpace in a single structure. Removing VmSpace from UserSpace does not look weird from my perspective.
fn page_fault_handler() { // Handle the page fault for the
ProcessVminThreadLocal}
I have also tried this approach. The approach in PR #1852 is that the interface type of the trait can be a bit odd because the current implementation includes a Ref in the return type (I couldn't think of a better way to do it). As for this approach, I think the problem is that after doing this, there would no longer be an explicit binding relationship between Task and VmSpace in OSTD. This would make the examples written in write_a_kernel_in_100_lines look very strange.
As for this approach, I think the problem is that after doing this, there would no longer be an explicit binding relationship between
TaskandVmSpacein OSTD. This would make the examples written inwrite_a_kernel_in_100_lineslook very strange.
I thought that in write_a_kernel_in_100_lines we could just manually call VmSpace::activate before running the userspace.
On second thought, I realize that this is only correct if there is only one userspace program loaded in the kernel (otherwise, if the task is preempted between VmSpace::activate and UserSpace::execute, we can start executing the UserSpace at the wrong VmSpace).
I can't say it's perfect, and I don't know if it's an acceptable solution. Personally, I think the 100-line demo should be as simple as possible. It's only a single-task demo, so I don't think the race condition that only occurs when we start multiple userspace tasks is important. On the other hand, one can argue that having a "misleading" demo is not very good.
As a future possibility, I think injecting handlers enables the following code pattern:
fn run_user_space() {
let irq_guard = disable_irq();
vm_space.activate();
// We need to disable interrupts before we switch privileges anyway, so passing an IRQ guard
// isn't really weird. (The userspace will still run with IRQ enabled, of course).
user_space.execute_with_guard(irq_guard);
}
fn read_user_data() {
// Disabling preemption also means that `page_fault_handler` must be atomic. This is
// unfortunate, but it _works_. For example, if we need to address disk I/O, we can move it out
// of `page_fault_handler` and into a slow path in `read_user_data`.
let preempt_guard = disable_preempt();
vm_space.activate();
vm_space.reader().read_val()
}
I don't know if anyone wants that. But I don't think there's anything wrong with OSTD allowing it (maybe in the future if someone makes a feature request).
On the other hand, it's not really necessary for OSTD to enforce that a task's VmSpace must be enabled when switching to that task. There are many reasons why it is not necessary, and perhaps we should just not enforce it:
- This has nothing to do with the safety/soundness of the kernel. Meanwhile, OSTD tries to be Linux agnostic. So the fewer assumptions it makes, the better.
- Not enforcing this makes things simpler. We'll have less code in OSTD, so our trust base will be smaller. Meanwhile, the
TaskVmtrait used to enforce it is very ugly, e.g., theReftype is odd. - Even if we're going to enforce it, the result is that we cannot enforce it:
- We need the kernel to implement
TaskVm::vm_space, nothing can prevent a kernel from returning a falseVmSpaceor justNone. - The
VmSpacefor a task is now mutable, so it can switch to anotherVmSpaceat will. But after that, it's up to the kernel again to make sure that theVmSpaceis activated correctly (i.e. it can't rely on OSTD, which activates aVmSpaceonly on context switching).
- We need the kernel to implement