------------------------------------------------------------------------------- ******************************************************************************* ------------------------------------------------------------------------------- B O S T O N U N I V E R S I T Y Computer Science Department C O L L O Q U I U M Tuesday, April 16, 1996 2:00 pm (Coffee served at 1:30 pm) Seminar Room / MCS 135 ------------------------------------------------------------------------------- ******************************************************************************* ------------------------------------------------------------------------------- Indexing Non-Traditional and Multimedia Databases King-Ip (David) Lin University of Maryland Nowadays, database management systems have to handle non-traditional and multimedia datatypes, like video, documents, DNA sequences etc. Devising new methods to provide efficient retrieval of the new datatypes is crucial. The challenge is two-fold: providing suitable abstraction of data (features) to be indexed; and providing efficient data structures (indexes) to facilitate the queries. Traditional indexing techniques provide solutions for alphanumeric data in low dimensional space. However, the situations are different when it comes to the new datatypes. Index structures becomes inefficient in high dimensional space (`dimensionality curse'); while new data types do not present apparent features to be indexed (`featureless space'). In this talk I will present two solutions of the problems. To combat the first problem, I will present the TV-tree, an indexing structure which allow data in high-dimensional space to be indexed efficiently. It uses the notion of introducing new dimensions `when needed'. Moreover it is able to adapt to the data being stored, avoiding unnecessary storage. I will also present the `FastMap' algorithm. This is a very fast (linear time) algorithm to map data into points in an arbitrary dimensional space, to be indexed by index structures. The algorithm requires only a knowledge of similarity among the data, thus making it applicable in a large number of applications. Furthermore, the algorithm preserves the similarity information to a large extent, thus allowing the mapped points to be used in other applications, like clustering and data mining. ------------------------------------------------------------------------------- For colloquium info, including directions, see http://cs-www.bu.edu/colloquium For more information contact Prof. Mark Crovella -------------------------------------------------------------------------------