COLLOQUIUM Computer Science Department, Boston University Speaker: Alexandra (Sasha) Fedorova Harvard University Date: Friday, March 24, 2006 Time: 11:00 Place: Room MCS 135, 111 Cummington Street (for directions, see www.cs.bu.edu/colloquium) Title: Operating System Scheduling for Multicore Processors Abstract: Multicore processors, the currently emerging family of microprocessors, run multiple application threads in parallel, and, in contrast with conventional processors, threads share processor resources, such as the second-level cache. Contention for the shared cache may affect how a thread performs. The degree of contention for the cache varies depending on the threads sharing the cache, and, therefore, a thread's performance varies depending on the threads it runs with. A thread scheduler decides how to assign threads to processors in order to achieve goals such as optimal performance or fairness. Scheduling algorithms for multicore processors must take into account the effects of contention for the shared cache in order to work properly.Fair-share scheduling, a popular policy used by many scheduling algorithms, has a desirable property of predictability: given a share of processor's time a thread is expected to complete certain amount of work. I demonstrate that this property is lost if traditional fair-share schedulers are used on multicore processors: a thread's performance varies by as much as 36% depending on the threads it runs with. I present the cache-fair scheduling algorithm - the fair-share scheduling algorithm adapted for multicore processors. This algorithm accounts for the performance effects of contention for the shared cache when making scheduling decisions. My algorithm uses an analytical model to estimate the cache miss rate a thread would have if the cache were shared equally among all the threads, the fair miss rate. It then monitors the behavior of threads at runtime, and if a thread's rate of progress is affected due to the deviation from its fair miss rate, it adjusts the thread's share of CPU cycles proportionately. Introducing the shared cache awareness into the scheduling algorithm results in reducing performance variability by at least a factor of two and by as much as a factor of seven in cases of significant variability. I implemented the cache-fair algorithm in the Solaris operating system, and it works without any advance knowledge about the workload and relies only on hardware performance counters typically available on modern processors. Host: Rich West