A Scheduling Problem on a Multicore Operating System

Jan 27, 2020
#cs


Background

I'm currently working as a UROP (Undergraduate Research Opportunities Program) student in the PDOS (Parallel Distributed Operating Systems) group, which is part of MIT CSAIL (Computer Science & Artificial Intelligence Lab). MIT really loves its acronyms.

The project I am helping out with involves a multicore operating system. In the upcoming experiment phase, we will be toggling a lot of system parameters to take measurements. It would be pretty annoying if we had to reboot the system every time we wanted to change a parameter, since the process takes some time. Therefore, I was tasked with modifying the operating system to make it possible to change parameters at runtime. Some of these parameters perform sensitive actions like overwriting kernel text, so they can only be toggled when no other cores are running.

The Problem

I needed to find a way to safely pause and resume other cores on the machine. In other words, I had to implement the following kernel-space function (fn is a function that has no input or output, for our use case it changes the system parameter):

void pause_other_cores_and_call(void (*fn)(void));

It was tricky and fun to get the thread-safety arguments correct, so I wanted to share my process.

A Solution

The first step was figuring out how to notify the other cores that they should pause. This can be achieved with the PIC (Programmable Interrupt Controller), a chip that allows CPUs to send interrupts to each other. I won't go into the specifics because it isn't relevant for the algorithm, so let's just assume the following code triggers an interrupt on the other cores:

void pause_other_cores_and_call(void (*fn)(void)) {
  for (int i = 0; i < NCORES; i++) {
    if (i != mycpu()->id)
      send_ipi(i, T_PAUSE); // ipi = inter-processor interrupt
  }
}

In the interrupt (trap) handler, we can add a case to handle T_PAUSE.

void trap() {
  // ...
  switch(INTERRUPT_CODE) {
    // ...
    case T_PAUSE:
      // we'll fill this in later
      break;
  }
  // ...
}

If the core running pause_other_cores_and_call gets interrupted and the process gets scheduled onto another core by the OS, then we might not end up pausing the cores correctly. In the extreme case, if the process moves to core i before every check on line 3, then none of the cores will be paused. Therefore we need to disable interrupts.

void pause_other_cores_and_call(void (*fn)(void)) {
  disable_interrupts();
  for (int i = 0; i < NCORES; i++) {
    if (i != mycpu()->id)
      send_ipi(i, T_PAUSE); // ipi = inter-processor interrupt
  }
  enable_interrupts();
}

From now on, I'll refer to the core that is running pause_other_cores_and_call as the master core. The master core needs to know when it is safe to call the function fn. One way is to keep a num_paused_cores counter that is incremented by each of the other cores.

atomic<int> num_paused_cores {0};

void pause_other_cores_and_call(void (*fn)(void)) {
  disable_interrupts();
  for (int i = 0; i < NCORES; i++) {
    if (i != mycpu()->id)
      send_ipi(i, T_PAUSE); // ipi = inter-processor interrupt
  }
  while (num_paused_cores < NCORES - 1);
  fn();
  enable_interrupts();
}
void trap() {
  // ...
    case T_PAUSE:
      num_paused_cores++;
      while(true);
      break;
  }
  // ...
}

Great! Now we can guarantee that the function fn is only run once all the other cores are paused. Note that cores cannot be interrupted while in the trap handler, so they will remain in the spin-loop forever. But we're only halfway done, since we need to also resume the other cores once the function returns. I accomplished this by using an additional piece of state.

atomic<int> num_paused_cores {0};
enum pause_state_t { Running, Done };
pause_state_t state;

void pause_other_cores_and_call(void (*fn)(void)) {
  disable_interrupts();
  state = Running;
  for (int i = 0; i < NCORES; i++) {
    if (i != mycpu()->id)
      send_ipi(i, T_PAUSE); // ipi = inter-processor interrupt
  }
  while (num_paused_cores < NCORES - 1);
  fn();
  state = Done;
  enable_interrupts();
}
void trap() {
  // ...
    case T_PAUSE:
      num_paused_cores++;
      while(state == Running);
      num_paused_cores--;
      break;
  }
  // ...
}

Ah, if only it were that simple. We run into a nasty race condition if two cores enter the body of pause_other_cores_and_call at the same time. Since both cores will have interrupts disabled, the number of paused cores will never reach NCORES - 1, and the system will deadlock with some cores spinning indefinitely on line 12 of pause_other_cores_and_call and the others spinning indefinitely on line 5 of trap. Therefore we need some extra state to make sure only one core enters the critical section.

atomic<int> num_paused_cores {0};
enum pause_state_t { Ready, Running, Done };
pause_state_t state = Ready;

void pause_other_cores_and_call(void (*fn)(void)) {
  while (!atomic_compare_and_swap(&state, Ready, Running));
  disable_interrupts();
  for (int i = 0; i < NCORES; i++) {
    if (i != mycpu()->id)
      send_ipi(i, T_PAUSE); // ipi = inter-processor interrupt
  }
  while (num_paused_cores < NCORES - 1);
  fn();
  state = Done;
  while (num_paused_cores > 0):
  state = Ready;
  enable_interrupts();
}

The atomic compare and swap instruction atomically sets state to Running if it is equal to Ready. We also change the state from Done to Ready once all the cores have resumed.

To summarize the algorithm, the system starts out with state = Ready. The master core is the one that successfully performs the atomic compare and swap, and sets the state to be Running. Each of the other cores increments num_paused_cores, until it reaches NCORES - 1 upon which the master core knows it is safe to call fn. Afterwards, it sets state = Done. Each of the other cores exit the spin-loop and decrements the num_paused_cores counter. When it reaches zero, the master core resets the state to be Ready.

Now we have a truly safe (I think) way to pause and resume cores!

Acknowledgements

Thanks to Fintelia for mentoring me in this project and proofreading this blog post.





Comment