Research Overview


Introduction

The goal of the OCEANS project is to improve distributed information systems such as the World Wide Web through measurement, analysis, and careful redesign. The project undertook the first published study of the effectiveness of caching in the Web; building on that work, members have gone on to study the benefits of eager document dissemination and speculative prefetching. The project's early study of the characteristics of client use of the Web has been widely cited, and recent work has concentrated on statistical characterization of the properties of reference locality in the Web. OCEANS is now beginning a full-scale implementation of a number of these techniques in the context of an experimental high-performance, low-cost distributed Web server.

The OCEANS group's research encompasses a mosaic of projects, which we describe below.


Establishing the Self-Similarity of Web Traffic
PIs: Azer Bestavros and Mark Crovella

In this study, Professors Crovella and Bestavros established that Internet traffic attributed to the Web is statistically self-similar. Furthermore, they were able to trace the genesis of this self-similarity through a rigorous analysis of Web file systems and user traces. Previous studies had established the bursty (self-similar) nature of network traffic only at the LAN (Ethernet) level. The presence of self-similar characteristics in Internet traffic has many important implications. First, it implies that the Markovian traffic models currently adopted for analysis and simulation are inadequate, because they allow Internet traffic to be smoothed out through finite buffering, an outcome that is impossible in the presence of self-similar traffic patterns. Second, it implies that current transport protocols, which treat negative acknowledgements as indicative of congestion, may be too pessimistic, and thus may not utilize available bandwidth efficiently. This implication was confirmed in a recent study by other members of the OCEANS group. Perhaps the most intriguing result of this study was the attribution of the genesis of self-similarity to the heavy-tailed nature of the distribution of file sizes in particular, and of information quanta in general. The significance of this finding is that traffic self-similarity is related to a universal property of information representation, one rooted in the very way humans process "information".
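
To make these terms precise, the standard formulations used in this line of work can be sketched as follows (the notation is illustrative, not quoted from the study itself):

    % m-aggregated series: average of X over non-overlapping blocks of size m
    X^{(m)}_k = \frac{1}{m} \sum_{t=(k-1)m+1}^{km} X_t

    % asymptotic second-order self-similarity with Hurst parameter H:
    % the variance of the aggregated series decays more slowly than 1/m
    \mathrm{Var}\bigl(X^{(m)}\bigr) \sim m^{-\beta}, \qquad
    H = 1 - \frac{\beta}{2}, \quad 0 < \beta < 1

    % heavy-tailed (e.g., Pareto) file sizes: hyperbolic tail decay
    P[X > x] \sim x^{-\alpha} \quad (x \to \infty), \qquad 0 < \alpha < 2

Intuitively, heavy-tailed file sizes produce heavy-tailed transfer durations, and the superposition of many such on/off transfers yields self-similar aggregate traffic.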


Exploiting Locality of Reference Properties
PIs: Azer Bestavros, Mark Crovella, and Abdelsalam Heddaya

The first pilot study conducted within the OCEANS group established the importance of client-based caching in large-scale information retrieval systems such as the Web. However, the same study showed that client-based caching does not scale up and, alone, is not enough to alleviate the performance problems of the Web. In a sequence of studies, members of the OCEANS group traced the limited performance of client-based caching to the absence of strong temporal locality of reference in Web access patterns. Furthermore, they showed that other forms of locality of reference (namely, spatial and geographical) exist and are strong enough to be exploited efficiently. Based on this, they proposed, and evaluated through extensive trace-driven simulations, two server-initiated protocols for Web information retrieval. The first protocol is a hierarchical data dissemination mechanism that allows information to propagate from its producers to servers that are closer to its consumers. This dissemination reduces network traffic and balances load amongst servers by exploiting the geographic and temporal locality of reference exhibited in client access patterns. The second protocol relies on speculative service, whereby a request for a document is serviced by sending, in addition to the requested document, a number of other documents that the server speculates will be requested in the near future. This speculation reduces service time by exploiting spatial locality of reference.
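
As a toy illustration of the speculative-service idea (a minimal sketch under assumed names, not the protocol evaluated in the studies), a server can track which documents historically follow each document in client sessions and bundle the top candidates with every response:

    from collections import defaultdict, Counter

    class SpeculativeServer:
        """Sketch of server-initiated speculative service: alongside each
        requested document, send documents that historically followed it."""

        def __init__(self, bundle_size=2):
            self.followers = defaultdict(Counter)  # doc -> Counter of next docs
            self.last_request = {}                 # client -> last doc requested
            self.bundle_size = bundle_size

        def request(self, client, doc):
            # Update the access-history model (spatial locality of reference).
            prev = self.last_request.get(client)
            if prev is not None and prev != doc:
                self.followers[prev][doc] += 1
            self.last_request[client] = doc
            # Speculate: bundle the documents most often requested after `doc`.
            guesses = [d for d, _ in self.followers[doc].most_common(self.bundle_size)]
            return [doc] + guesses

    server = SpeculativeServer()
    server.request("c1", "index.html")
    server.request("c1", "intro.html")          # records index.html -> intro.html
    print(server.request("c2", "index.html"))   # ['index.html', 'intro.html']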


Burstiness-tolerant Transport Protocols
PIs: Azer Bestavros and Mark Crovella

Given that burstiness (due to self-similarity) is to be expected in traffic no matter how much buffering is performed, it seems reasonable to expect transport protocols to be tolerant of burstiness. One way of dealing with burstiness is to expect packet drops (i.e., erasures) and to design the transport protocol so that it masks (or otherwise reduces) the impact of packet erasures. One way of achieving this goal is to use dynamically adjustable levels of redundancy. To that end, Professors Bestavros and Crovella are currently investigating the use of AIDA techniques to incorporate this adjustable redundancy. Another benefit of using AIDA in the design of transport protocols is tolerance of the fragmentation of IP packets when they are transmitted over ATM networks using cells with 48-byte payloads. Professor Bestavros and his students have proposed an AIDA-based TCP/IP protocol, named TCP Boston, and analyzed its performance through simulation.
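
For concreteness, the dispersal idea underlying AIDA (an adjustable-redundancy variant of Rabin's Information Dispersal Algorithm) can be sketched as follows. This is a minimal illustration over a small prime field, with assumed names, and not the TCP Boston implementation: m data symbols are encoded into n > m shares, and any m shares suffice to reconstruct the data, so up to n - m erasures are masked.

    P = 257  # prime field; bytes 0..255 fit (a real coder must handle value 256)

    def disperse(data, n):
        """Encode m data symbols into n shares; any m shares suffice.
        Share i evaluates the polynomial with coefficients `data` at x = i."""
        return [(i, sum(c * pow(i, j, P) for j, c in enumerate(data)) % P)
                for i in range(1, n + 1)]

    def reconstruct(shares, m):
        """Recover the m data symbols from any m shares by solving the
        Vandermonde system with Gauss-Jordan elimination mod P."""
        rows = [[pow(x, j, P) for j in range(m)] + [y] for x, y in shares[:m]]
        for col in range(m):
            # Find a pivot, normalize its row (inverse via Fermat's little theorem).
            piv = next(r for r in range(col, m) if rows[r][col])
            rows[col], rows[piv] = rows[piv], rows[col]
            inv = pow(rows[col][col], P - 2, P)
            rows[col] = [v * inv % P for v in rows[col]]
            for r in range(m):
                if r != col and rows[r][col]:
                    f = rows[r][col]
                    rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
        return [rows[j][m] for j in range(m)]

    data = [72, 105, 33]                         # "Hi!"
    shares = disperse(data, n=5)                 # redundancy: 5 shares, 3 needed
    assert reconstruct(shares[2:], m=3) == data  # survives loss of 2 shares

Adjusting n relative to m raises or lowers the redundancy level, which is the knob a burstiness-tolerant transport protocol would tune dynamically.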


Tools for Network-Aware Applications
PI: Mark Crovella

The Tools for Network-Aware Applications project is studying ways to improve the performance of network applications by providing them with measurements of current conditions on the Internet. Prof. Crovella and his students have shown that if an application has access to a small set of simple measurements of network conditions, it can improve response time dramatically. For example, current Internet-based applications like the World Wide Web are typically constrained to retrieve a file from only one specific location. If instead a small number of alternate locations are provided, the application can almost always improve the transfer time of the file by selecting, on the fly, the location that promises the best performance at that moment. To support this approach, Prof. Crovella and his students have developed tools to measure latency, bottleneck link speed, and congestion along any path in the Internet. These tools have been shown to measure properties of the network accurately, and they hold promise for improving the performance of systems like the World Wide Web.
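
A minimal sketch of the selection step follows (the host names are hypothetical, and a bare connect-time probe stands in for the project's more sophisticated measurement tools):

    import socket, time

    def connect_latency(host, port=80, timeout=2.0):
        """Rough latency probe: time a TCP connection handshake."""
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return time.perf_counter() - start
        except OSError:
            return float("inf")  # unreachable replicas lose the race

    def pick_fastest(mirrors):
        """Select, on the fly, the replica that currently looks fastest."""
        return min(mirrors, key=connect_latency)

    # Hypothetical replica list; in practice each would hold the same file.
    mirrors = ["mirror-a.example.org", "mirror-b.example.org", "mirror-c.example.org"]
    print("fetching from", pick_fastest(mirrors))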


WebSearch
PI: Stan Sclaroff

The primary goal of this project is to develop a World Wide Web image search tool for searching Web documents based on image content. Unlike keyword-based search, search by image content allows users to guide a search through the selection (or creation) of example images. The technical challenges associated with this project stem in part from the staggering scale of the World Wide Web, and in part from the problem of developing effective image representations for very fast content-based search. In addition, the project will address the design of user interfaces for a browser that supports Web search by image content.
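
As a toy sketch of one common approach to content-based search (coarse color-histogram signatures compared by L1 distance; the project's actual image representations are not described here, and all names and URLs below are hypothetical):

    def color_histogram(pixels, bins=4):
        """Coarse RGB histogram signature: quantize each channel to `bins`
        levels. Compact signatures enable very fast content-based search."""
        def q(v):  # map a channel value 0..255 to a bin index
            return min(v * bins // 256, bins - 1)
        hist = [0] * (bins ** 3)
        for r, g, b in pixels:
            hist[q(r) * bins * bins + q(g) * bins + q(b)] += 1
        total = float(len(pixels))
        return [h / total for h in hist]

    def distance(h1, h2):
        """L1 distance between normalized histograms (0 = identical)."""
        return sum(abs(a - b) for a, b in zip(h1, h2))

    def rank_by_example(example_pixels, indexed):
        """Rank indexed images by similarity to the user's example image."""
        query = color_histogram(example_pixels)
        return sorted(indexed, key=lambda item: distance(query, item[1]))

    # Hypothetical index of (url, signature) pairs built by a Web crawler.
    sunset = [(250, 120, 30)] * 90 + [(40, 40, 90)] * 10
    ocean  = [(20, 60, 180)] * 100
    index = [("http://example.org/sunset.gif", color_histogram(sunset)),
             ("http://example.org/ocean.gif",  color_histogram(ocean))]
    print(rank_by_example(sunset, index)[0][0])  # closest match first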


The Responsive Web Computing Project
PIs: Azer Bestavros, Marina Chen, Mark Crovella, Abdelsalam Heddaya, and Stan Sclaroff

The goal of this umbrella project is to use the Web (within either the global Internet or an enterprise intranet) as a medium for reliable metacomputing with real-time performance guarantees. We approach this problem at four levels: (1) network services and protocol-level techniques; (2) middleware solutions such as caching, prefetching, and replication; (3) Web computing resource management models, real-time scheduling protocols, and services; and (4) an object-oriented framework that captures these models and their associated protocols and services, along with application-specific knowledge and the overall designs of Web computing applications.


Maintainer: A. Bestavros. Created on: 1994.05.02. Updated on: 1996.08.18.