Machine Learning and Optimization
Abstract
Many large scale machine learning problems are formulated as optimization
problems, in which some measure of error or loss is to be minimized over
a suitable training corpus. Real problems often have too many data points
to fit on a single computer, so the data and/or computation must
be distributed over a network of computers. Often the only practical
methods for extremely large problems are so-called splitting methods, but
their convergence properties are highly variable: sometimes very fast,
sometimes very slow, in ways that can be hard to predict. The goal of
this project is to gain a better understanding of the convergence behavior
and to use this understanding to construct accelerated algorithms with
more consistent convergence properties. This will allow the application
of machine learning techniques to a much wider class of problems.
Methods
Splitting methods (or more precisely alternating direction methods)
are based on the idea that a general convex optimization problem can
be split into two or more parts, each of which can be solved much more
easily than the problem as a whole. The methods cycle through the
blocks of variables in turn, optimizing over each block while leaving
the rest fixed. The proposed work builds on a preliminary
analysis of a simple model problem using the eigen-structure of certain
matrix operators. The project is devoted to extending this analysis
to more general problems, as well as developing faster solvers using
well-established computational technologies for the matrix eigenvalue
problem. Success will be measured in terms of the generality of the
theory developed and the improvements in the observed convergence behavior
on real problems.
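The alternating cycle and the role of eigen-structure described above can be illustrated on a toy model problem. The following is a minimal sketch, not the project's actual solver or analysis: the quadratic objective, the matrices P, Q, R, and all parameter choices are illustrative assumptions. It alternately minimizes a convex quadratic over two blocks of variables, and checks that the observed per-sweep contraction matches the spectral radius of the induced iteration operator.

```python
# A minimal sketch (illustrative assumptions, not the project's code):
# two-block alternating minimization of the convex quadratic
#   f(x, y) = 1/2 x'Px + x'Qy + 1/2 y'Ry,
# where the convergence rate is governed by the eigen-structure of the
# iteration operator M = R^{-1} Q' P^{-1} Q acting on the y-iterates.
import numpy as np

rng = np.random.default_rng(0)
n = 5
P = 2.0 * np.eye(n)                                  # block-x Hessian (SPD)
R = 2.0 * np.eye(n)                                  # block-y Hessian (SPD)
Q = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)   # weak coupling block

def alternating_minimize(y, iters):
    """Cycle through the two blocks, minimizing over one with the other fixed."""
    history = []
    for _ in range(iters):
        x = np.linalg.solve(P, -Q @ y)       # argmin_x f(x, y), y fixed
        y = np.linalg.solve(R, -Q.T @ x)     # argmin_y f(x, y), x fixed
        history.append(np.linalg.norm(y))
    return history

# The y-iterates satisfy y_{k+1} = M y_k, so the error contracts at
# roughly rho(M), the spectral radius, per sweep.
M = np.linalg.solve(R, Q.T @ np.linalg.solve(P, Q))
rho = max(abs(np.linalg.eigvals(M)))

hist = alternating_minimize(rng.standard_normal(n), iters=100)
observed_rate = hist[-1] / hist[-2]
print(f"spectral radius: {rho:.4f}, observed per-sweep rate: {observed_rate:.4f}")
```

With the weak coupling chosen here rho(M) is well below 1 and convergence is fast; strengthening Q pushes rho toward 1 and the same cycle slows dramatically, which is the kind of variability the proposed eigen-structure analysis aims to predict.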
Impact
With faster solvers, discovery of major regions of influence in a
global-scale social network (e.g. Facebook or Twitter) could become
practical on modest computer platforms. The same holds for tracking
disease propagation, or for tracking people in video sequences. With efficient
solvers, tracking software could be deployed on local hardware without
the need for high-powered central servers. This will lead to advances
in many areas such as data mining, compressive sensing, recommender
systems, signal processing, missing data imputation, analysis of large
scale social, biological or computer networks, image reconstruction,
denoising and classification.
Dissemination and Human Resource Development
The results of this research are to be disseminated in papers in the
principal journals and conferences in machine learning, data mining,
and optimization as well as in the form of software packages via the WWW
(http://www-users.cs.umn.edu/~boley/ML-Optimization).
The project depends on the interaction between different disciplines
and application areas, which will attract students from a variety
of backgrounds at both the graduate and undergraduate level. Some
research tasks are suitable as projects in classes on linear algebra,
optimization, data mining, and machine learning, and are to be developed
for both undergraduate and graduate students. Undergraduate students,
including women and members of underrepresented groups, will see the value
of mathematical algorithms to solve real problems of interest to them.