Subject: IEEE-CS TC-RTS Newsletter for Thu Aug 05, 1999 _______________________________________________________________________________ __ _ __ ___ ___ __ __ I E E E Technical Committee |\ | |_ | | (_' | |_ | | |_ |_) C S on Real-Time Systems | \| |__ |/\| ,_) |__ |__ | | |__ | \ _______________________________________________________________________________ Table of Contents Line ----------------- ---- 1. best@cs.bu.edu Wed Jun 2 20:44:54 1999 (98 lines) QLinux: A QoS enhanced Linux Kernel for Multimedia Computing....... 2 2. Jane Liu [mailto:janeliu@cs.uiuc.edu] (1213 lines) RFC: How To Get Most Predictability Out Of Windows NT.............. 101 3. "Bruce H. Krogh" (117 lines) HYBRID SYSTEMS: CFP................................................ 1314 4. "Azer Bestavros" (111 lines) BU/NSF Workshop on Internet Measurement, Instrumentation, and Cha.. 1431 ------------------------------------------------------------------------------ <<<<<<<<<<<<<<<<<<* START OF THE IEEE-CS TC-RTS NEWSLETTER *>>>>>>>>>>>>>>>>>> ------------------------------------------------------------------------------ Message 1; Postmarked Wed Jun 2 20:44:54 1999 Subject: QLinux: A QoS enhanced Linux Kernel for Multimedia Computing Content-Length: 3643 QLinux: A QoS enhanced Linux Kernel for Multimedia Computing We are pleased to announce the public release of the QLinux kernel. QLinux, based on the Linux 2.2.x kernel, combines some of the latest innovations in operating systems research. It includes the following features: * Hierarchical Start Time Fair Queuing (H-SFQ) CPU scheduler * Hierarchical Start Time Fair Queuing (H-SFQ) network packet scheduler * Lazy receiver processing (LRP) network subsystem * Cello disk scheduling algorithm [not stable yet] The H-SFQ CPU scheduler enables hierarchical scheduling of applications by fairly allocating cpu bandwidth to individual applications and application classes. The H-SFQ packet scheduler provides rate guarantees and fair allocation of bandwidth to packets from individual flows as well as flow aggregates (classes). Lazy receiver processing enables accurate charging of TCP/UDP protocol processing overhead (including interrupt processing) to the appropriate process. The Cello disk scheduler supports multiple application classes such as interactive best-effort, throughput-intensive best effort and soft real-time and fairly allocates disk bandwidth to these classes. When enabled, these features replace the standard features/schedulers available in Linux. QLinux provides the flexibility of allowing any combination of these features to be compiled as needed. The current version for QLinux (based on the 2.2.0 kernel) is available for download from http://www.cs.umass.edu/~lass/software/qlinux A port to 2.2.9 kernel will be available in the near future. The QLinux developers can be reached at qlinux@cs.umass.edu. QLinux announcements are available by subscribing to the qlinux-announce mailing list To subscribe, send a mail to majordomo@cs.umass.edu with the body "subscribe qlinux-announce" QLinux is a joint effort between AT&T Labs-Research, Distributed Multimedia Computing Laboratory (Univ. of Texas) and the Laboratory for Advanced System Software (Univ. of Massachusetts). QLinux has been developed by the following people: Pawan Goyal (Ensim Corporation, formerly with AT&T Research) Jasleen Kaur Sahni (Univ. of Texas) Prashant Shenoy (Univ. of Massachusetts) Raghav Srinivasan (Univ. of Massachusetts) Harrick Vin (Univ. of Texas). T. R. Vishwanath (Univ. of Texas) We look forward to feedback from users of QLinux. Pawan Goyal, Prashant Shenoy and Harrick Vin (qlinux@cs.umass.edu) Acknowledgments --------------- Inputs and/or resources for the QLinux project were provided by Gisli Hjalmtysson (AT&T Research) and R. Gopal (AT&T Research) References --------- QLinux is based on the following research publications: [1] P. Goyal and X. Guo and H.M. Vin, A Hierarchical CPU Scheduler for Multimedia Operating Systems, Proceedings of 2nd Symposium on Operating System Design and Implementation (OSDI'96), Seattle, WA, pages 107-122, October 1996. [2] P. Goyal and H. M. Vin and H. Cheng, Start-time Fair Queuing: A Scheduling Algorithm for Integrated Services Packet Switching Networks, In IEEE/ACM Transaction on Networking, October 1997. A preliminary version appeared in the Proceedings of ACM SIGCOMM'96, pages 157-168, August 1996. [3] P. Druschel and G. Banga, Lazy Receiver Processing (LRP): A Network Subsystem Architecture for Server Systems, Proceedings of the 2nd Symposium on Operating System Design and Implementation (OSDI'96), Seattle, WA, Pages 261-275, October 1996 [4] P Shenoy and H M. Vin, Cello: A Disk Scheduling Framework for Next Generation Operating Systems, Proceedings of ACM SIGMETRICS Conference, Madison, WI, 44-55, June 1998. ------------------------------------------------------------------------------ Message 2; Postmarked Tue Jun 8 11:07:43 1999 From: Jane Liu [mailto:janeliu@cs.uiuc.edu] Subject: RFC: How To Get Most Predictability Out Of Windows NT Content-Length: 50536 REQUEST FOR COMMENTS HOW TO GET MOST PREDICTABILITY OUT OF WINDOWS NT June 1999 J. W. S. Liu, R. Rajkumar, Z. Deng, M. Seri, A. Frei, L. Zhang and C. S. Shih (Please send corrections and suggestions to janeliu@cs.uiuc.edu) 1. INTRODUCTION For real-time applications, Windows NT 4.0 has many shortcomings. The most well-known and frequently discussed ones are - small number of real-time priority levels, - lack of priority inheritance, - unpredictable interrupt latency, - priority inversion due to deferred procedure calls (DPC), - priority inversion due to LPC mechanism, and - timer granularity and accuracy. Recent publications on the use of NT for real-time applications typically make these shortcomings sound more deadly than they really are. This article describes approaches to improve the predictability of real-time applications running on NT. These user-level solutions do not require any extension or modification of the operating system. When used with well designed application software and load management strategies, they can significantly improve predictability. Following this introduction, Sections 2-7 describe the problems listed above in turn and then discuss possible solutions to minimize the problems. Section 8 describes NT/RK (NT/Resource Kernel). NT/RK is a resource management middleware that provides user level resource access control and usage monitoring functions designed to enhance the predictability of real-time applications running on Windows NT. Section 9 summarizes the article. 2. REAL-TIME PRIORITIES Windows NT 4.0 provides only 32 priority levels: - 16 (31-16) real-time priorities, - 15 (15-1) variable priority levels, and - 1 (0) system level. In general, each thread has two priorities, its base priority and its current priority. The current priority of a real-time thread (i.e., one with a priority 16 or higher) is equal to its base priority; the operating system never adjusts the priorities of real-time threads. There are three problems related to real-time priorities in NT 4.0. (1) 16 priority levels are sometimes too few, (2) the system only supports the round-robin (within equal priority) policy, and (3) many kernel-mode system threads run in the real-time priority range. TOO FEW PRIORITY LEVELS The small number of priority levels is not as serious a problem as it sounds for two reasons. First, a larger number of distinct priorities is often not necessary. Most real-time threads execute repeatedly. A commonly used scheme for assigning priorities to them is according to the rates of their execution: the higher the (peak) rate, the higher the priority. This scheme is called the rate-monotonic (RM) scheme [1,2]. An alternative to assign priorities based on relative deadlines (i.e., the maximum allowed response times) of threads: the shorter the relative deadline, the higher the priority. This is called the deadline-monotonic (DM) scheme. These schemes are known to be optimal among all fixed priority schemes. Other assignments (e.g., based on criticality) give poorer real-time performance. There may not be a large number of distinct rates (or relative deadlines) even in large applications. Examples are multimedia and digital control applications. The application may have 16 or fewer distinct frame rates or sampling rates even when it has a large number of video streams or feedback loops. Second, we can meet the real-time requirements of a large and complex application that requires a larger number of priority levels by keeping the processor utilization sufficiently low. In the explanation below, the term ASSIGNED PRIORITIES refers to the priorities that the scheduling algorithm used by the application (programmer) would assign to threads under the ideal condition when the system provides unlimited priority levels. SYSTEM PRIORITY LEVELS refers to the priority level provided by the system. The SCHEDULABLE UTILIZATION of an application is the processor load under which the application can surely meet its real-time requirements. The schedulable utilization depends on the timing attributes (e.g., iteration rates and execution times) of threads and the algorithm used to assign priorities to threads. For a given system of threads and their assigned priorities, the schedulable utilization is a function of the number of assigned priorities, the number of system priority levels, and the method used to map assigned priorities to system priority levels. A good way to map assigned priorities to system priority levels is the constant ratio mapping method [3] proposed by Lehoczky and Sha (Footnote 1). To explain this mapping, we suppose that according to the fixed priority scheme used by the application, threads have assigned priorities 1 to X, where X is a positive integer, and a smaller integer presents a higher priority. This is opposite to the convention used by NT, where a larger integer represents a higher priority. (Footnote 1: Lehoczky and Sha showed that when the system provides 256 priority levels, there is no loss in schedulable utilization of a rate monotonic system even when the system has thousands of distinct assigned priorities. This work provided the theoretical basis for the choice of 256 priority levels in most real-time operating systems.) Figure 1 illustrates constant ratio mapping of the assigned priorities when X > 16. Since NT has 16 real-time priority levels, we divide the assigned priorities into 16 disjoint ranges, [ai, bi], for i = 1, 2, ... 16. Assigned priorities in each range [ai, bi] are mapped to the system priority level 31-i+1. According to the constant ratio mapping, the ratios g = ai/bi for all i = 1, 2, ... 16 are identical. For example, if this ratio is 1/2, then assigned priorities 1 and 2 are mapped to 31, assigned priorities 3, 4, 5 and 6 are mapped to 30, etc. You can see that a mapping with this ratio allows you to map more than 4098 assigned priorities to the 16 system priority levels. We often measure the merit of priority-driven scheduling algorithm by its schedulable utilization, which was defined earlier as a bound on processor utilization under which all threads can meet their deadlines. Lehoczky and Sha showed that if the assigned priorities are chosen according to the RM algorithm, then schedulable utilization = ln (2g)+1-g if g > 1/2 = g if g < 1/2 Again, g is the ratio ai/bi (Footnote 2). Therefore, if you are willing to keep the processor half idle, you can have more than 4098 assigned priorities. If you assign priorities based on other attributes of threads, you will do worse, but there are tools (e.g., PERTS offerred by Tri-Pacific) with which you can determine by how much. (Footnote 2: The result is based on the general "periodic" task model, which is called the peak traffic model in communication literature. According to this model, every thread executes repeatedly for a finite or infinite number of times, and the "period" is the minimum length of time between consecutive releases of the thread. -- This is because the classical bound by Liu and Layland [1] does not require periodic tasks to be truly periodic. -- C. C. Han's Ph.D. thesis (UIUC) showed that one gets the same schedulable utilization based on the distance constraint model, which is more applicable to signal processing applications. According to that model, the maximum execution rate of a thread is not constrained; rather the time interval between consecutive completions of a thread is constrained to be no greater than the distance constraint of the thread.) _________________________________________________________________ Assigned Priorities a1 ---\ | |---> System priority level 31 b1 ---/ a2 ---\ | |---> 30 | | b2 ---/ a3 ---\ | \ | | | |--> 29 | | | / b3 ---/ a4 ---\ | \ | |--> 28 . . . . |--> 16 . / X ---/ _________________________________________________________________ Figure 1 Constant Ratio Mapping to 16 System Priority Levels ROUND-ROBIN AND FIFO POLICIES A slight complication is that SetThreadPriority() provided by Windows NT 4.0 for setting thread priorities allows user threads to specify only priorities 16, 22, 23, 24, 25, 26, and 31. In contrast, kernel threads can specify all 16 real-time priority levels. This means that we need to provide a priority mapping function within a device driver. Rather than calling SetThreadPriority() to set the priority of each thread after its creation, we call the mapping function. This function maps the specified assigned priority of the thread to one of the 16 real- time priority levels and then sets the thread priority to that level. Windows NT 4.0 schedules threads of equal priority on the round robin basis. As you will see shortly, the lack of support for FIFO within equal priority is more problematic than the user- level inaccessibility to all 16 real-time priorities. We will NOT have these problems with Windows NT 5.0 (also called Windows 2000). To explain, we note that NT 5.0 introduces a new object, called JobObject. Like a process group, each job may contain multiple processes. In a SMP system, we can statically configure the system by setting a limit-affinity flag of each job and, thus, binding all the processes in the job to a processor. Other features of the new job objects include the ability to set limits of a job object that control all processes in the job. Examples are user-mode execution time and memory usage limits on per job or per processes basis. A process (or a job) is terminated when the system finds that the process has accumulated more user-mode execution time than its previously set limit. A job containing periodic threads can periodically reset its one execution time limits and thus turns its execution time usage limit into an execution rate limit. From real-time scheduling point of view, the most important improvement offered by NT 5.0 over NT 4.0 are that (1) all 16 real-time priority levels are available to real- time user processes (and hence threads in them) and (2) we can choose between the FIFO and round-robin policies for these process. Specifically, in addition to priority classes offered by NT 4.0, NT 5.0 offers 9 job scheduling classes; class 9 is for real-time applications. A combination of job and process parameters allow us to choose either the FIFO or round-robin policy (within equal priority) for a process. The operating system schedules all processes according to the round-robin policy by default. However, a process that is in the real-time priority class and is in a job whose scheduling class has value 9 and scheduling class limits enabled is scheduled according to the FIFO policy. In other words, we choose FIFO for a process by putting the process in the real-time priority class, giving the scheduling class of the job containing the process a value 9, and enabling the scheduling class limits for the job. EFFECT ON SYSTEM THREADS Many kernel mode system threads execute at real-time priority level 16. Because they have the lowest of all real-time priorities, they should affect only real-time threads with priority 16. However, higher priority real-time threads may delay system threads. This means that in the presence of real- time applications, memory managers, local and network file systems, etc. may not work well. 3. PRIORITY INHERITANCE Windows NT does not support priority inheritance. Rajkumar, et al. [4] showed that this mechanism keeps the duration of priority inversion bounded each time when a higher priority thread is blocked by a lower priority thread (i.e., the higher priority thread waits while the lower priority thread executes). However, the mechanism does not prevent deadlock and allows a thread that requires multiple resources to be blocked a multiple number of times. In a system where a thread may require multiple resources, priority inheritance alone is insufficient. We discuss below three solutions. The solutions are for uniprocessor systems (Footnote 3). All of them do not minimize the duration of priority inversion in NT 4.0. The reason is that NT 4.0 does not support FIFO among equal priority. The effectiveness of the solutions improves significantly in NT 5.0 as it supports the FIFO policy. (Footnote 3: The solutions can be extended in a straightforward manner to deal with resource contention in a statically configured multiprocessor system (i.e., one where every process can execute only on one processor), if threads on different processors do not content for resources.) NPCS PROTOCOL A simple way to overcome the lack of priority inheritance is to use the nonpreemptable critical section protocol (NPCS) proposed by A. Mok. According to this protocol, a thread executes nonpreemptively when it holds any resource (e.g., a lock, a mutex object or a semaphore). Hence, deadlocks cannot occur, and a thread is blocked AT MOST ONCE no matter how many resources it needs and with how many threads it may have resource conflict. In the case of lock and mutex calls, this protocol is simple to implement at the user level in NT, and we have implemented a library function for this purpose (Footnote 4). The function assumes that priority level 31 is reserved for exclusive use by threads in their nonpreemptable sections and changes each lock as follows. store thread's priority; set thread's priority to 31; lock( ); The function restores the thread's priority when the thread no longer holds any lock. (Footnote 4: The case of counting semaphores is somewhat more complicated but can be taken care in a similar manner.) Such a user-level NPCS protocol can prevent unbounded priority inversion among application and device driver threads. It is important to note that setting thread priority to 31 is not the same as making the thread nonpreemptive, which some RTOSs make possible by providing preemption lock. Our user-level NPCS protocol cannot enforce the the exclusive use of priority level 31 for the purpose of emulating nonpreemption. Threads in applications that do not use this protocol may have priority 31. Because threads of priority 31 are scheduled on a round robin basis by NT 4.0, a thread holding a resource may be delayed by these threads, and the length of this delay is theoretically unbounded. NPCS protocol is effective on NT 5.0, however, for controlling priority inversion due to resource contention among user threads when these threads are scheduled according to the FIFO policy. When a thread T requests a lock and the NPCS protocol raises its priority to 31, there are no other ready threads at priority 31. (Otherwise, T would not be executing.) Because the thread is scheduled on FIFO basis, once its priority is raised to 31, the thread T will not be preempted by other threads until it releases all resources and its priority is restored. CEILING PRIORITY PROTOCOL Under the NPCS protocol, even when a thread requires no resource, it may be blocked once by a lower priority thread holding a resource. This is the major disadvantage of NPCP protocol. When most threads do not have resource conflict, we may not want to tolerate the extra blocking and possible degradation in average performance. A better alternative is the ceiling priority protocol (CPP); Ada95 uses this protocol [5]. CPP requires prior information on the resources required by each thread. Each resource has a PRIORITY CEILING, which is the highest priority of all threads that require the resource. (The priority ceiling of a resource with multiple units, e.g., a counting semaphore, is a function of the number of free units. When k units are available, the priority ceiling is equal to the highest priority of all threads that require more than k units.) The resource manager first determines the priority ceilings of all resources. According to CPP, a thread holding any resource executes at the highest priority ceiling of all resources it holds. On NT 4.0, it is also simple to implement this protocol at the user level if we use only half of the available priority levels. Specifically, we restrict real-time threads to have even priority levels (i.e., 16, 18, ... 30). If the highest priority of all threads requiring a resource is 2k, then the priority ceiling of the resource is 2k+1. The reason for having to do this is that NT 4.0 does not support the FIFO among equal priority policy. In NT 5.0, we can schedule all threads controlled by CCP on FIFO basis and thus be able to use all 16 real-time priority levels. The improvement the ceiling priority protocol provides is that if equal or higher priority threads do not require the resources required by a thread, the thread will not block them. The need for prior knowledge on resource requirements poses a problem when source code of some application is not available. When the sources of all applications are available, we can preprocess each application code to extract this information. Priority ceiling information can be maintained by the user-level protocol. Like the NPCS protocol, on NT 4.0, a user-level ceiling priority protocol cannot prevent unbounded priority inversion caused by threads not under its control executing at restricted priority levels. LOCK-FREE PROTOCOLS Anderson, et al. [6] proposed the use of lock-free protocol for synchronization when the operating system either does not provide any locking mechanism or the available locking mechanism is inadequate. On a single processor when priority-driven scheduling is used, the wait-free protocol is rather simple and offers another solution. On NT 4.0, round-robin scheduling of equal priority threads significantly complicates the wait-free protocol. Moreover, this approach is also not a completely safe solution because unbounded priority inversion can still occur during system calls. 4. HARDWARE INTERRUPT LATENCY Interrupt latency refers to the delay between the time instant when a device raises an interrupt to the instant when the interrupt service routine of the device starts to execute. The way Windows NT handles interrupts should not lead to longer or more unpredictable interrupt latency than a good RTOS with comparable capabilities. To explain this point, as well as the DPC problem later, we note that the relationship between interrupt request priority levels (IRQLs) and thread priorities is as shown in Figure 2. The IRQL table is platform dependent. The exact number of IRQLs and many specifics are not relevant to our discussion. It suffices for us to note that Dispatcher (scheduler) and DPCs execute at a priority lower than all device interrupt priorities. IRQL0, called Low here, refers to the 32 normal thread priorities collectively. IRQL 31: High ----| 30: Power failure | 29: Interprocessor interrupt | 28: Clock interrupt |- Hardware interrupts 27: Highest device Interrupt | ........ | 3: Lowest device interrupt ____| 2: Dispatcher/DPC ______ Software interrupts 1: APC ____| 0: Low ------ normal thread priorities Figure 2 Relationship between IRQL and thread priorities As in RTOSs, interrupt handling in Windows NT is carried out in two steps. In the first step, an interrupt service routine (ISR) executes. The ISR should be written so it does only the necessary work in order to make the device ready for interrupt again. The ISR queues a DPC that executes the DPC function provided by the device driver. This function does the time consuming part of interrupt handling, as the second step of interrupt handling. This is illustrated by the pseudo code below. ISR( ) { save state; service interrupt; queue DPC; } Interrupt latency consists of two factors. The first one is the sum of three parts. The first part is the time the processor takes to complete the current instruction and does the necessary chores before jumping to the starting address of the trap handler. The second part is the time the trap handler takes to save the context of the interrupted thread and starts the interrupt dispatcher. The third part is the time the interrupt dispatcher takes to identify the interrupting device and transfers control to the ISR of the device. Since these parts of the interrupt handling are nonpreemptable, a priority inversion occurs if a higher priority interrupt is requested while the system is thus busy on behalf of a lower priority interrupt. This part of interrupt latency time, therefore, should be added to the total blocking time when we compute the worst case response time of an interrupt. (In other words, we need to add this factor twice, once to the time required to service an interrupt and once as blocking time the interrupt may suffer.) Since interrupts are serviced in priority order, a lower priority interrupt is not serviced until all the outstanding higher or equal priority interrupts has been serviced (i.e., their ISRs completes). This delay is the second factor of interrupt latency. Unavoidably, a lower priority interrupt suffers a longer delay, and this is the behavior we want. Given the maximum execution times of all interrupt service routines and the maximum rates of interrupt requests at all priorities, an upper bound on this delay factor can be computed using a standard schedulability analysis method. In summary, interrupt latency in Windows NT is no less predictable than in most operating systems (Footnote 5). The duration of interrupt latency can be improved by making interrupt service routines as short as possible. This may require the rewrite of some existing device drivers that were poorly written in this respect. (Footnote 5: Exceptions are embedded operating systems in which each interrupt is vectored directly to the ISR of the interrupting device, rather than the interrupt dispatcher. This way, even the small amount of time the dispatcher spends to identify the interrupting device is saved, but the cost is hardware dependency.) 5. DEFERRED PROCEDURE CALLS A problematic source of priority inversion in Windows NT is DPCs. From Figure 2, you can see that DPCs are executed at IRQL2, which is higher than the priorities of user threads. Indeed, DPCs are executed in FIFO order whenever there is no ISRs ready to execute. The scheduler executes only when the DPC queue is empty. Since the execution times of DPCs of some device drivers (e.g., network protocol) can be quite large (e.g., in order of hundreds of microseconds to tens of milliseconds [7,8]), real- time threads can suffer a significant amount of blocking. We describe below two approaches to minimize the unpredictability introduced by DPCs. The first is to minimize the execution time of DPC, sort of like deferring the execution of the DPC function. The second is to avoid the use of interrupts as much as possible. USE OF KERNEL THREADS INSTEAD OF DPCs Modern real-time operating systems do some form of "priority tracking" in interrupt handling. This term, used in LynxOS literature, means scheduling the bulk of the work of a device driver at an appropriate priority. This is how a RTOS keeps blocking time caused by interrupt handling small and, more importantly, accountable in schedulability analysis. We can use the same method in Windows NT by using a kernel thread to execute the DPC function. (We continue to call the function executed in the second step of interrupt handling a DPC, even though what we describe here is not a DPC in Windows NT.) In other words, rather than having the ISR part of a device driver queue a DPC, it wakes up a kernel thread to execute the DPC function. This is similar to how the second step of interrupt handling is done in LynxOS. Specifically, the initialization routine (i.e., DriverEntry routine), the DPC function and ISR of a device driver should be as follows. - The initialization routine creates a kernel thread, called driver thread below, and sets the priority of the driver thread at the level specified by the device driver. The driver thread blocks waiting to be signed by the ISR; when signaled, the thread will execute the DPC function provided by the device driver. - The DPC function does the remaining part of interrupt handling when executed by the driver thread: DPCFunction { the remaining part of interrupt handling; } - When the interrupt service routine runs, it wakes up the driver thread: ISR( ) { save state; service interrupt; set event; } New device drivers for real-time applications should be thus structured for improved predictability. A question remains is what priority should the driver thread have. A choice is the priority of of the thread that opens the device driver. (This is what LynxOS uses.) An alternative is the priority of the thread causing the interrupt. A complication of this choice is that the correct priority of the driver thread often remains unknown until the driver thread has executed for a while. (For example, suppose that when an incoming message causes an interrupt, the network driver thread is to have the priority of the thread that will receive the message. The receiving thread is not identified until the message header is processed.) A way is to give the driver thread a high priority, say 30, initially. When it is awaken and executes, it sets its own priority to the correct level as soon as it finds the level. (Gallmeister [9] suggested this scheme as a way to emulate priority inheritance by message passing.) The potential roadblock to this approach is not technical; rather, it is how to deal with legacy code. Existing NT device drivers typically do not use kernel thread. Modification of each existing device driver is small, since most of the existing code can be used and only how they are invoked need to be changed. Nevertheless, such modification is not always possible; in particular, it is not possible to change the device drivers used by the kernel. ARCHITECTURAL RESTRICTIONS It is common to restrict choices of real-time system architectures to those that make the system more predictable and easier to validate. As an example, multiprocessor real-time systems are typically statically configured, i.e., processes and threads are bound to processors, rather than being dispatched to run on available processors. Static binding is necessary because it is impossible to predict the effect of scheduling anomalies if the scheduler may choose to run each thread on any available processor as the thread becomes ready. The support Windows NT provides for static configuration is processor affinity. Each thread has an affinity mask. A process or thread can pin down the processor on which it runs by setting its affinity mask accordingly. As we stated earlier, NT 5.0 allows us to set affinity on per-job basis, pinning down all the processes in each job to a processor. Many time-critical applications use mapped I/O. For example, in a typical digital controller, the hardware interface is set by the controller program at the start to collect digitized samples of sensor readings periodically (i.e., once set, the hardware does A/D periodically without processor attention). The controller reads the readings upon timer interrupts and writes its output to command registers in the interface. The interface converts the output to analog form needed to control the plant. There is no synchronization between the control program and the device interface. (We have thus used Windows NT to control an inverted pendulum; the structure of its interface is as described. The periods of both translational and rotational control loops are 20 milliseconds.) The accuracy of the timed loop depends solely on the accuracy of the timer. Interrupt latency and DPC blocking are all nonissues. Similarly, in many radar signal processing and tracking systems, producers and consumers tasks communicate via shared memory. There is no synchronization; consumers read the latest available data, and producers write data buffers without regard to lost data. With this architecture, their timing behavior can be made as predictable as possible. 6. LPC MECHANISM Another source of unpredictability in Windows NT is local procedure calls (LPC). LPCs provide the interprocess communication mechanism by which environment subsystem dynamic link libraries (DLL) pass requests to subsystem service providers. It is also used by remote procedure calls between processes on the same machine, as well as by WinLogin process and security reference monitor. Specifically, the LPC mechanism provides three schemes to communicate data across address space boundaries. Short messages are sent over an LPC connection when the sending process or thread makes a LPC call which specifies the buffer containing the message. The message is copied into the kernel space and from there into the address space of the receiving process. The other schemes make use of shared memory sections and are for exchanges of long messages between the sender and receiver. We confine our attention to the usages of LPCs for subsystem DLLs; a thread sends a request over an LPC connection to a service provider, which created earlier a named port and made port well-known. When the requester successfully connects to this well-known port, the requester and service provider each gets an unnamed port. The unnamed ports are then used to exchange request and result. When the requesting thread sends a request, the request message is inserted in the message queue of the service provider's port. The requester suspends itself and waits for reply. A work-thread in the service provider is waken up upon the arrival of the request. The work-thread removes the request from the message queue, executes the request, sends the result back on the requester's port, and then suspends itself. The arrival of the result on its port wakes up the requester to resume its execution. Unbounded priority inversion can occur since the LPC queue is FIFO ordered. Furthermore, without priority tracking, a work- thread may execute at a non-real-time or low real-time priority. Priority inversion occurs when a work-thread executing on behalf of a high real-time priority requester is preempted by a thread at a lower real-time priority. We can avoid this kind of priority inversion only by avoiding the use of the LPC mechanism. In Windows NT 4.0, the Win32 API functions that use LPC to communicate with subsystem service provider are (1) console (text) window support, (2) process and thread creation and termination, (3) network drive letter mapping, and (4) creation of temporary files. (We note that graphics related functions do not use the LPC mechanism.) In other words, a time critical application should not write to console, create and delete threads, create temporary files, etc. This is a serious restriction, especially for multi-mode applications. (During a mode change, new thread may need to be created and old ones terminated while threads that run in both the old and new modes must continue to complete in time. Avoiding the creation and deletion of threads means that threads running in all modes need to be created at initialization time.) 7. TIMER GRANULARITY AND ACCURACY Windows NT also provides the usual time services (e.g., SetTimer) and multimedia timers. A per thread (or process) timer is created when the thread makes a create timer call. After the thread initializes the timer and the timer DPC, it can set the timer to expire at the specified time (or periodically with the specified period). A timer event is said to occur when a timer expires. When a timer is set, the kernel queues a timer event in the timer event queue, which is sorted in order of expiration time. By examining this queue, the kernel can detect the occurrences of timer events. As most operating systems do, Windows NT checks for timer expirations only periodically at clock interrupts. When a clock interrupt occurs, the kernel first checks the timer event queue to determine which timers have expired since the last clock interrupt. For each expired timer, it queues a timer DPC to be executed after it has taken care of all timers that have expired since the last clock interrupt. While the expiration of regular timers are checked once every 10 milliseconds, the user can set the period of clock interrupts at which multimedia timer expirations are checked to any binary fraction of a second in the range from 1/1024 to 1/64 seconds. To use multimedia timer service, a thread - queries the minimum and maximum timer resolution of the platform; - specifies the desired clock interrupt period in a fraction of a second in the range from 1/1024 to 1/64 seconds, using Win32 functions timeBeginPeriod() and timeEndPeriod(); and then - starts timer events with timeSetEvent(), providing a function to be executed when the timer event occurs. Timer granularity and accuracy depend on three factors. They are (1) the frequency at which timer expirations are checked, (2) the order in which timer requests are serviced, and (3) the execution times of the functions executed when timer events occur. A common misconception is that we should be able to measure time using a timer to an accuracy in the order of timer resolution. In fact, a timer resolution of say 100 microseconds only means that the operating system will not mistake two timer events at expiration times this far apart as one timer event. If one expiration time is at 50 microseconds before the clock interrupt and the other one is at 50 microseconds after the clock interrupt, the the time instants at which the corresponding timer functions are queued are approximately 1/1024 seconds apart if the clock interrupts are 1/1024 seconds apart. Because timer events are acted upon by the kernel periodically, the maximum timer error is at least equal to the period of clock interrupts. The second source of error arises from the fact that timer events are not acted upon by the kernel in time order. In some operating systems, when more than one timer are found expired at a clock interrupt, the kernel takes care of the timer with the latest expiration time first and in decreasing time order. In other words, it services timer events in LIFO order. This is how Windows NT 4.0 does it. (By the way, so does Linux.) Therefore, if the order of occurrences of timer events are important, you will need to take care this matter. (For example, if two timer expiration times are within the same clock interrupt period, you need to give the timer that is supposed to trigger an earlier activity a later expiration time.) In contrast, NT 5.0 services timer events in FIFO order. The third source of timer error is the execution times of timer functions. Timer error perceived by threads grows with the execution times of these functions. One can keep timer error minimum by minimizing the execution time of each timer function. To illustrate how, suppose that we want to use a periodic timer to time the repeated execution of a thread. We should not specify the function executed by the thread in the set timer call. Rather, we specify a function which when executed wakes up the thread and causes it to be queued and later scheduled by the scheduler at the thread's priority. In other words, - we create the thread and set its priority; - after initialization, the thread suspends itself waiting to be signaled; and - when executed, the timer DPC function executes "set event" to wake up the thread. Thus, the timer only controls when a thread is made ready for execution, leaving when the thread is scheduled and executes to the scheduler. 8. RESOURCE KERNEL We suggest the use of a tool such as NT/RK(Resource Kernel) to support timely and enforced execution of real-time applications in a Windows NT environment. NT/RK provides on Windows NT "Resource Kernel" primitives of Real-Time Mach [10]. Both legacy applications and new applications using the NT/RK primitives can benefit from the timeliness guarantees provided by NT/RK. RESOURCE KERNEL Timely execution of real-time applications is accomplished by the use of reservations. To execute in an environment monitored and controlled by NT/RK, an application must request a share of each time-multiplexed resource (e.g., processor, file system, etc.) it needs. A time-share reservation is specified by the following parameters: P: the length of periodic intervals over which the specified amount of a resource is requested; C: the amount of resource requested within each interval of length P, D: a deadline by which the C amount of resource must be satisfied. The ratio C/P determines the load imposed by the reservation on the system, while D deals with the latency requirement of the reservation in question. Reservations are independent of the threads that use the reservation. Zero or more threads from one or more processes can be bound to a single reservation. In effect, the threads bound to a reservation run on a "virtual machine" that runs slower than the underlying physical machine. This model can therefore be used to deal with both periodic application tasks and aperiodic tasks. The {P, C, D} model actually corresponds to the aperiodic server model (called the sporadic server) that deals with aperiodic tasks in real-time systems. When receiving a reservation request, NT/RK performs admission control to check whether the requested reservation can be granted. If granting the requested reservation will not cause the system to violate previously granted guarantees, the reservation is accepted and guaranteed. Else, the request is rejected. Accepted reservations are scheduled using an internal scheduling policy (deadline-monotonic scheduling or earliest deadline first). NT/RK also performs dynamic monitoring of consumed resources and enforces the limits set by the corresponding reservations. IMPLEMENTATION The current version of NT/RK is implemented as device driver that is installed into the Windows NT "checked build" kernel. This driver effectively becomes the kernel scheduler, scheduling and suspending tasks as per the scheduling policy used to schedule and enforce reservations. 9. RECOMMENDATIONS This article suggests several ways to increase the predictability of real-time applications running on Windows NT. They are (1) use a good mapping function (such as the constant ratio mapping) to effectively increase the number of real-time priority levels and schedulable processor utilization; (2) provide user-level resource access control protocols (such as NPCS and CCP protocols) to control priority inversion; (3) avoid the use of LPC mechanism (i.e., activities such as write to console, create and delete threads, create temporarily files) during the execution of time-critical applications; (4) restructure device drivers so that the bulk of interrupt handling is executed by a kernel thread scheduled at an appropriate priority; (5) minimize the execution times of timer callback functions; and (6) use a tool such as NT/RK to manage and control utilizations of system resources. Among the improvements offered by NT 5.0 are (1) all 16 real-time priority levels are available to user threads, (2) real-time threads can be scheduled on the FIFO basis, and (3) timer events are served on FIFO basis. The support of FIFO policy for real-time threads significantly improves the effectiveness of user-level NPCS and CCP protocols. REFERENCES [1] Liu, C. L. and Layland, "Scheduling Algorithms for Multiprogramming in A Hard Real-Time Environment," J. Assoc. Comput. Mach., vol. 20, pp. 46-61, 1973. [2] Lehoczky, J. P., L. Sha, J. K. Strosnider, and H. Tokuda, "Fixed Priority Scheduling Theory for Hard Real-Time Systems," Foundations of Real-Time Computing, Part 1: Scheduling and Resource Management, Edited by A. M. van Tilborg and G. M. Koob, Kluwer Academic Publishers, 1991. [3] Lehoczky, J. P. and L. Sha, "Performance of Real-time Bus Scheduling Algorithms," ACM Performance Evaluation Review, vol. 14, May 1986. [4] Sha, L., R. Rajkumar, and J. P. Lehoczky, "Priority Inheritance Protocols: An Approach to Real-Time Synchronization," IEEE Transactions on Computers, Vol. 39, 1990. [5] Cohen, N. H., Ada as a Second Language, McGraw Hill, 1996. [6] Anderson, J., S. Ramamurthy, and K. Jeffay, "Real-time computing with lock-free objects," ACM Transactions on Computer Systems, Vol. 15, No. 6, pp. 388-395, May 1997. [7] Cota-Robles, E and J. P. Held, "A comparison of Windows Driver Model and Latency Performance on Windows NT and Windows 98," Proceedings of USENIX Symposium on Operating Systems Design and Implementation, February 1999. [8] Jones, M. B. and J. Regehr, "The problems you're having may not be the problems you think you're having: results from a latency study of Windows NT," Proceedings of Workshop on Hot Topics in Operating Systems, March 1999. [9] Gallmeister, B. O., POSIX.4: Programming for the Real World, O'Reilly & Associates, Inc., 1995. [10] Mercer, C. W., S. Savage, and H. Tokuda, "Processor Capacity Reserves: Operating System Support for Multimedia Applications," Proceedings of IEEE International Conference on Multimedia Computing and Systems, May 1994. ------------------------------------------------------------------------------ Message 3; Postmarked Sat Jun 26 17:10:58 1999 From: "Bruce H. Krogh" Subject: HYBRID SYSTEMS: CFP Content-Length: 4182 CALL FOR PAPERS HYBRID SYSTEMS: COMPUTATION AND CONTROL (HSCC'00) Third International Workshop Pittsburgh, PA USA http://www.ece.cmu.edu/~hs00 Important Dates --------------- Submission deadline: October 15, 1999 Notification of acceptance: December 15, 1999 Final versions due: January 15, 2000 Workshop: March 23-25 (Thur-Sat), 2000 Aims and Scope -------------- The Workshop on Hybrid Systems attracts researchers from industry and academe interested in modeling, analysis, and implementation of dynamic and reactive systems involving both discrete (integer, logical, symbolic) and continuous behaviors. It is a forum for the latest developments in all aspects of hybrid systems, including formal models and computational representations, algorithms and heuristics, computational tools, and new challenging applications. The Third International Workshop continues the series of workshops held in Grenoble, France (HART'97), University of California at Berkeley, USA (HSCC'98), and Nijmegen, The Netherlands (HSCC'99). Proceedings of these workshops have been published in the Lecture Notes in Computer Science (LNCS) series by Springer-Verlag. Scientific Program and Topics ----------------------------- Sessions will include presentations of contributed and invited papers. In keeping with the tradition of previous workshops, there will be ample time and space for informal discussions. Submissions are invited in all areas pertaining to the design, analysis and implementation of hybrid control systems. Topics include but are not limited to: - modeling and representations of hybrid systems - reasoning about hybrid systems at multiple levels of abstraction - specification and implementation languages - computer-aided design and simulation - algorithms and heuristics for verification - control (synthesis, controllability, stability) - optimization of hybrid systems - engineering applications Reports on case studies and tool development are particularly encouraged. Tool demonstrations will form an integral part of the workshop. Venue --------- The workshop will be held at the University Club, a private club near the campuses of Carnegie Mellon University and the University of Pittsburgh. Submissions ----------- Researchers are invited to submit the postscript file of an extended abstract via e-mail to: hs99@ece.cmu.edu. The abstract should not exceed 10 pages. The first page should contain the title of the paper, each author's name and affiliation, complete contact information for the corresponding author (postal and e-mail addresses, telephone and fax numbers), and a one-paragraph summary of the contribution. Full versions of the accepted submissions will be published in the Springer LNCS series. The proceedings will be available at the workshop. Workshop Co-chairs --------------------------- Bruce H. Krogh (krogh@ece.cmu.edu) and Nancy Lynch (lynch@theory.lcs.mit.edu) Program Committee --------------------------- Rajeev Alur, Eugene Asarin, Marica Di Benedetto, Gautam Biswas, Rene Boel, Michael Branicky, Peter Caines, Datta Godbole, Mark Greenstreet, Stefan Kowalewski, Bruce H. Krogh (co-chair), Yassine Lakhnech, Michael Lemmon, Bengt Lennartson, Nancy Leveson, Daniel Liberzon, John Lygeros, Nancy Lynch (co-chair),Oded Maler, Manfred Morari, Joerge Raisch, Anders Rantzer, Anders Ravn, Alberto Sangiovanni-Vincentelli, Roberto Segala, Henny Sipma, Eduardo Sontag, Claire Tomlin, F.W. Vaandrager, Howard Wong-Toi, Sergio Yovine, Feng Zhao Steering Committee -------------------------- Panos Antsaklis, Tom Henzinger, Bruce Krogh, Nancy Lynch, Oded Maler, Amir Pnueli, Alberto Sangiovanni-Vincentelli, Shankar Sastry, Jan van Schuppen, Frits Vaandrager. Additional Information ----------------------------- To stay informed about HSCC'00, register for e-mail announcements by sending e-mail to hs99@ece.cmu.edu. Also refer to the Workshop webpage at http://www.ece.cmu.edu/~hs00. *** Bruce H. Krogh Dept. of Electrical and Computer Engineering Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213-3890 ph. +1 412 268 2472 fax -3890 e-mail: krogh@ece.cmu.edu ------------------------------------------------------------------------------ Message 4; Postmarked Thu Aug 5 23:20:53 1999 From: "Azer Bestavros" Subject: BU/NSF Workshop on Internet Measurement, Instrumentation, and Characterization Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Length: 4121 ---------------------------------------------------------------------- BU/NSF Workshop on INTERNET MEASUREMENT, INSTRUMENTATION AND CHARACTERIZATION Boston University Boston, Massachusetts, USA Monday August 30, 1999 (Preceding ACM SIGCOMM'99) ---------------------------------------------------------------------- PRELIMINARY CALL FOR PARTICIPATION Objectives and Overview ----------------------- Because of its growth in size, scope, and complexity --- as well as its increasingly central role in society --- the Internet has become an important object of study and evaluation. Many significant innovations in the networking community in recent years have been directed at obtaining a more accurate understanding of the fundamental behavior of the complex system that is the Internet. These innovations have come in the form of better models of components of the system, better tools which enable us to measure the performance of the system more accurately, and new techniques coupled with performance evaluation which have delivered better system utilization. The continued development and improvement of our understanding of the properties of the Internet is essential to guide designers of hardware, protocols, and applications for the next decade of Internet growth. As a research community, an important next step involves a comprehensive look at the challenges that lie ahead in this area. This includes an evaluation of both the current unsolved challenges and the upcoming challenges the Internet will present us with in the near future, and a discussion of the promising new techniques that innovators in the field are currently developing. To this end, the Networking Research Group at Boston University, with support from the National Science Foundation (pending), is organizing a one-day workshop which will be held at Boston University on Monday, August 30, 1999, and which will immediately precede SIGCOMM '99. Workshop Program and Speakers ----------------------------- The BU/NSF Internet Measurement, Instrumentation and Characterization (IMIC) workshop will feature four technical sessions: - Modeling and Characterization - Internet Instrumentation and Measurement - End-to-End Protocols and Services; - Network Support for Next Generation Internet Applications. Each session will consist of 3 invited presentations to be followed by an open discussion. The workshop will conclude with a panel of researchers from academia and industry, as well as representatives from funding agencies, who will discuss opportunities and challenges that lie ahead, and initiatives to be undertaken. A final report, including materials presented and discussion summaries will be available on-line as a NSF report. For more information, please check the Workshop's web page. Registration ------------ An on-site registration fee of $50.00 (payable in cash, money order, or check drawn on a US bank) will be charged per attendee to defray the costs of food and beverage services. Pending support from the NSF, the registration fee will be waived for full-time students on a First-Come-First-Serve basis. If you are interested in such a waiver, please contact Michael Mitzenmacher at (michaelm@deas.harvard.edu). Workshop Organizing Committee ----------------------------- - Paul Barford, Research Fellow, Boston University - Azer Bestavros, Associate Professor, Boston University - John Byers, Assistant Professor, Boston University - Mark Crovella, Assistant Professor, Boston University - Ibrahim Matta, Assistant Professor, Boston University - Michael Mitzenmacher, Assistant Professor, Harvard University ---------------------------------------------------------------------- For more information check the IMIC Workshop Home Page at http://www.cs.bu.edu/pub/imic ------------------------------------------------------------------------------ <<<<<<<<<<<<<<<<<<<* END OF THE IEEE-CS TC-RTS NEWSLETTER *>>>>>>>>>>>>>>>>>>> ------------------------------------------------------------------------------ The TC-RTS repository is maintained by Azer Bestavros at Boston University WWW Home Page of the TC-RTS is at: http://cs-www.bu.edu/pub/ieee-rts/Home.html Internet address for anonymous FTP to the TC-RTS repository is: cs-ftp.bu.edu Contributions to this forum should be sent via E-mail to: IEEE-RTTC@cs.bu.edu Requests / inquiries should be sent via E-mail to: IEEE-RTTC-request@cs.bu.edu ------------------------------------------------------------------------------