COLLOQUIUM
Computer Science Department, Boston University

Speaker: Magdalena Balazinska, MIT
Date: Monday, March 21
Time: 9:00
Place: Room MCS 135, 111 Cummington Street (for directions, see www.cs.bu.edu/colloquium)

Title: Load Management and Fault-Tolerance in a Distributed Stream Processing System

Abstract:

Recently, a new class of data management applications has emerged in areas such as sensor-based environmental monitoring, financial services, network monitoring, and military applications. These "stream processing applications" require low-latency processing of large-volume data streams. Because traditional database management systems are ill-suited for high-volume, low-latency stream processing, new systems, called stream processing engines (SPEs), have been developed.

In this talk, we present the software architecture and algorithms in Borealis, one of the first distributed stream processing engines. We discuss how our system meets two important challenges: (1) distributed load management, and (2) fault-tolerant operation in the face of node failures, network failures, and network partitions.

We present a mechanism that enables autonomous participants to collaboratively handle load. Our approach is based on contracts that participants negotiate offline. At runtime, participants move load only to partners with whom they have a contract and pay each other the contracted price, making the mechanism lightweight. We show that our approach provides incentives that foster participation and leads to good system-wide load balance properties.

For fault-tolerance, we present a replication-based scheme that masks most node and network failures. When network partitions occur, our approach addresses the traditional availability-consistency trade-off by striving to minimize inconsistencies, while ensuring that the system meets the desired availability specified by the application or user.

Biography:

Magdalena Balazinska is a PhD candidate in the Networks and Mobile Systems Group at the MIT Computer Science and Artificial Intelligence Laboratory. Her research interests focus on distributed data management systems. She is one of the major contributors to Borealis/Medusa, a distributed stream processing engine developed at MIT, Brown University, and Brandeis University. As part of her dissertation work, she is devising, implementing, and evaluating various load management and fault-tolerance techniques for distributed and federated stream processing. During the course of her PhD, Magdalena has also worked on various other data management problems including scalable resource discovery, nomadic data-access, and censorship circumvention. Before joining MIT, Magdalena received a B.E. and M.S. from Ecole Polytechnique de Montreal.

Host: George Kollios