----------------------------------------------------------------------- B O S T O N U N I V E R S I T Y Computer Science Department C O L L O Q U I U M Wednesday, November 7, 3:00 PM (Coffee served at 2:45PM) Seminar Room / MCS 135 OPTIMAL OUTLIER REMOVAL John Dunagan MIT Department of Mathematics Abstract Given a set of points in high-dimensional Euclidean space, we define a point x to be a gamma-outlier if there exists some direction w in which x is more than gamma standard deviations from the mean along w. Outlier removal is the problem of removing an epsilon fraction of points so that the remaining subset has no gamma-outliers. We characterize the optimal trade-off between epsilon and gamma using a simple algorithm for outlier removal. We believe this work has a good chance of being useful to anyone who works with data, from physical scientists to computer scientists to applied mathematicians. As one application, we improve the running time of learning a halfspace in the presence of random classification noise (a classic problem in machine learning) from O(n^28) to O(n^5). For a visual demonstration, please visit http://theory.lcs.mit.edu/~jdunagan This is joint work with Santosh Vempala. Host: Leo Reyzin ------------------------------------------------------------------------- For colloquium info, including directions, see http://cs-www.bu.edu/colloquium -------------------------------------------------------------------------