Motivation

Currently MM/FS/.. structures are non-nullable immutable fields in Process/PoxixThread.

pub struct Process {
    // ..
    process_vm: ProcessVm,
    // ..
}

pub struct PosixThread {
    // ..
    file_table: RoArc<FileTable>,
    fs: Arc<ThreadFsInfo>,
    // ..
}

Unfortunately, these are wrong because

These fields must be mutable for unshare/execve to work properly.
- For example, unshare(CLONE_FS) must change the FS field.
- As another example, execve() must create a new ProcessVm when the old one is shared with another process.
These fields should be nullable so that exit can drop the data accordingly.
- Delaying the drop until the Task is dropped is not appropriate.
  - In this case, the drop will be done in the context of another task, thus blocking the other task from running.
  - For example, closing a TCP socket may have to wait until the TCP socket enters the CLOSED state). [^1]

[^1]: Note that we currently always close files regardless of whether file tables are shared with other processes in do_exit, which is quite wrong, although the wrongness is in a different way.

How to fix these problems?

Proposal 1

struct Process {
    process_vm: Mutex<Option<Arc<ProcessVm>>>,
}

struct ThreadLocal {
    process_vm: RefCell<Option<Arc<ProcessVm>>>,
}

A copy is kept in ThreadLocal so that we can access these key structures without holding additional locks.
The unshare/execve will update the fields in Process and ThreadLocal together.
We'll drop all Arc<ProcessVm>s of the current thread/process when we exit. After that, the fields will contain None.

Proposal 2

struct PosixThread {
    file_table: Mutex<RoArc<FileTable>>,
}

struct ThreadLocal {
    file_table: RefCell<Option<RwArc<FileTable>>>,
}

exit will drop only the RwArc<> in ThreadLocal, with the following tricks:

impl<T> RwArc<T> {
    // Drops this `RwArc` and invokes `f` _if it is the dropping `RwArc` is last one_.
    pub fn drop_with<R>(self, f: impl FnOnce(&mut T) -> R) - R;
}

fn do_exit() {
    // ...
    thread_local
        .file_table
        .borrow_mut()
	.take()
	.drop_with(|file_table| file_table.close_all_files());
    // ...
}

Unresolved question

Mutex<Option<Arc<T>>> is verbose. Are there other, cleaner solutions?
If we decide to go with these proposals, do we want to accept both Proposal 1 and Proposal 2, just Proposal 1, or just Proposal 2?

Feb 14 '25 05:02 lrh2000

Recently, I attempted to implement mutability related to Process VM and I referred to the suggestions in this PR. However, I encountered an issue where there is redundancy of VmSpace in both the Task's UserSpace and ThreadLocal's Vmar. I believe it's unnecessary for VmSpace information to exist simultaneously in both UserSpace and ThreadLocal. Currently, I have two ideas:

The first approach is to follow the proposal in this issue by additionally removing VmSpace from UserSpace and providing a method that allows the underlying OSTD to utilize thread-local variables to retrieve VM information, similar to what was implemented in the last commit in #1852 . However, this would make the semantics of UserSpace rather strange.
The second approach is to move VM-related information into UserSpace, meaning no extra Vmar information would be stored in the ThreadLocal. Instead, it would reside within UserSpace, which would then serve as a thread-local data dedicated specifically for VM usage.

// in ostd

pub struct UserSpace {
    /// cpu context before entering user space
    init_ctx: UserContext,
    /// The VM address space of this task.
    vm_space: Box<dyn TaskVM>,
}

pub trait TaskVM {
    fn vm_space(&self) -> Option<Ref<'_, Arc<VmSpace>>>;
}

impl UserSpace {
  pub fn new<T>(vm_space: T, init_ctx: UserContext) -> Self 
  where T: TaskVM {
      Self {
          vm_space: SpinLock::new(vm_space),
          init_ctx,
      }
  }

// in kernel

pub struct ProcessVmar {
    inner: RefCell<Option<Vmar>>
}

impl TaskVM for ProcessVmar {
}

Look forward to the advice.

Feb 24 '25 08:02 cchanging

pub trait TaskVM {
    fn vm_space(&self) -> Option<Ref<'_, Arc<VmSpace>>>;
}

How do you feel about:

fn post_schedule_handler() {
    // Call `VmSpace::activate` for the `VmSpace` in `ThreadLocal`
}

fn page_fault_handler() {
    // Handle the page fault for the `ProcessVm` in `ThreadLocal`
}

fn init() {
    ostd::inject_post_schedule_handler(&post_schedule_handler);
    ostd::inject_page_fault_handler(&page_fault_handler);
}

I think this results in cleaner code in the OSTD.

However, this would make the semantics of UserSpace rather strange.

In an ideal design, we can have UserSpace = UserContext + VmSpace and have properly designed APIs to ensure that UserContext and VmSpace are always used consistently.

For example, in UserMode::execute() we used to call VmSpace::activate() and then run the userspace code.
But now the above won't be true. In the current implementation, VmSpace::activate() is called at schedule().

So I feel that binding UserContext and VmSpace does not seem to help. This is especially true because if we look closely at the code, we'll see that the vm_space field in UserSpace is only accessed in a getter method UserSpace::vm_space().

This way I don't see too much value in continuing to bind UserContext and VmSpace in a single structure. Removing VmSpace from UserSpace does not look weird from my perspective.

Feb 24 '25 12:02 lrh2000

fn page_fault_handler() { // Handle the page fault for the ProcessVm in ThreadLocal }

I have also tried this approach. The approach in PR #1852 is that the interface type of the trait can be a bit odd because the current implementation includes a Ref in the return type (I couldn't think of a better way to do it). As for this approach, I think the problem is that after doing this, there would no longer be an explicit binding relationship between Task and VmSpace in OSTD. This would make the examples written in write_a_kernel_in_100_lines look very strange.

Feb 25 '25 02:02 cchanging

As for this approach, I think the problem is that after doing this, there would no longer be an explicit binding relationship between Task and VmSpace in OSTD. This would make the examples written in write_a_kernel_in_100_lines look very strange.

I thought that in write_a_kernel_in_100_lines we could just manually call VmSpace::activate before running the userspace.

On second thought, I realize that this is only correct if there is only one userspace program loaded in the kernel (otherwise, if the task is preempted between VmSpace::activate and UserSpace::execute, we can start executing the UserSpace at the wrong VmSpace).

I can't say it's perfect, and I don't know if it's an acceptable solution. Personally, I think the 100-line demo should be as simple as possible. It's only a single-task demo, so I don't think the race condition that only occurs when we start multiple userspace tasks is important. On the other hand, one can argue that having a "misleading" demo is not very good.

Feb 25 '25 02:02 lrh2000

As a future possibility, I think injecting handlers enables the following code pattern:

fn run_user_space() {
    let irq_guard = disable_irq();
    vm_space.activate();

    // We need to disable interrupts before we switch privileges anyway, so passing an IRQ guard
    // isn't really weird. (The userspace will still run with IRQ enabled, of course).
    user_space.execute_with_guard(irq_guard);
}

fn read_user_data() {
    // Disabling preemption also means that `page_fault_handler` must be atomic. This is
    // unfortunate, but it _works_. For example, if we need to address disk I/O, we can move it out
    // of `page_fault_handler` and into a slow path in `read_user_data`.
    let preempt_guard = disable_preempt();

    vm_space.activate();
    vm_space.reader().read_val()
}

I don't know if anyone wants that. But I don't think there's anything wrong with OSTD allowing it (maybe in the future if someone makes a feature request).

On the other hand, it's not really necessary for OSTD to enforce that a task's VmSpace must be enabled when switching to that task. There are many reasons why it is not necessary, and perhaps we should just not enforce it:

This has nothing to do with the safety/soundness of the kernel. Meanwhile, OSTD tries to be Linux agnostic. So the fewer assumptions it makes, the better.
Not enforcing this makes things simpler. We'll have less code in OSTD, so our trust base will be smaller. Meanwhile, the TaskVm trait used to enforce it is very ugly, e.g., the Ref type is odd.
Even if we're going to enforce it, the result is that we cannot enforce it:
- We need the kernel to implement TaskVm::vm_space, nothing can prevent a kernel from returning a false VmSpace or just None.
- The VmSpace for a task is now mutable, so it can switch to another VmSpace at will. But after that, it's up to the kernel again to make sure that the VmSpace is activated correctly (i.e. it can't rely on OSTD, which activates a VmSpace only on context switching).

Feb 26 '25 06:02 lrh2000

Correct types for MM/FS/.. fields in `Process`/`PosixThread`

Motivation

Proposal 1

Proposal 2

Unresolved question