!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! B O S T O N U N I V E R S I T Y Computer Science Department C O L L O Q U I U M Efficient Algorithms for Geometric Clustering Problems Magda Procopiuc Computer Science Department Duke University Wednesday, March 28 11:00 am (Coffee served at 10:45PM) Seminar Room / MCS 135 Clustering problems arise in many disciplines and have a wide range of applications, such as data mining, information retrieval, spatial data bases, and image processing. Because of the great variety of domains in which they appear, many variants of clustering problems have been studied. However, they all require a partition of a given set of objects into groups so as to optimize some partitioning criteria. Unfortunately, most clustering problems are intractable. Still, when the number of clusters is small, the optimal or near-optimal clustering can be computed in polynomial time. I will briefly talk about a class of clustering algorithms we proposed, that work under a general class of metrics and can be extended to any number of dimensions. These methods compute the optimal clustering of a set of points, but also allow a trade-off between the quality of the result and the running time. In recent years, it has been widely recognized that one of the main difficulties in data indexing arises from the overall dimensionality of the data. Many of the current indexing structures perform badly when the dimensionality exceeds 20. Moreover, if the goal is to discover patterns, generating full-dimensional clusters in spaces of moderately high dimensionality is almost always irrelevant, as data tends to be very weakly correlated. This motivates the study of projective clustering, in which the goal is to partition the data into subsets that are well-correlated only in subspaces of the original space. I will discuss both theoretical and practical methods for projective clustering. Our theoretical algorithms are, to the best of our knowledge, the first ones that achieve provably good approximation results. I will also present a practical method we developed for indexing high-dimensional data. We applied this method to face detection in image databases, and preliminary accuracy results are promising. This is still work in progress. A newly emerging field that can substantially benefit from efficient indexing methods deals with kinetic applications. In such an application, data points continuously change their spatial coordinates. I will present a novel indexing structure that adapts itself over time to the position of the data points. Using some new results for efficiently maintaining the extent of a moving point set, we are able to estimate when the structure will need updating and to achieve a trade-off between the number of updates and the query performance. Our experiments show that the index is able to maintain good query performance over long periods of time. Host: Margrit Betke ------------------------------------------------------------------------------- For colloquium info, including directions, see http://cs-www.bu.edu/colloquium -------------------------------------------------------------------------------