Machine learning - IHES
Francis Bach: Scientific activity

Machine learning

Digital data is increasingly taking centre stage in science and industry, and in our daily lives too. One of the aims of machine learning is to make sense of these huge amounts of data. A major challenge in learning is the ability to “generalise”, that is, to make predictions beyond the observed data. To achieve this, the traditional approach is to formulate learning as an optimisation problem based on noisy data.

“My current research covers two issues: (1) stochastic optimisation methods for large data sets, which require algorithms whose computational complexity is linear in the size of the data, and (2) convex optimisation methods for combinatorial optimisation problems that commonly arise in learning (such as splitting data into several groups, known as “clustering”).

Whilst at IHES, thanks to the Schlumberger Chair, I worked on both of these issues. Interactions with the Institute’s permanent and visiting researchers, together with C. Villani’s Cours de l’IHES, were particularly beneficial. Firstly, working jointly with V. Perchet (professor at ENSAE), we studied how to exploit higher-order regularity in online optimisation, where only noisy evaluations of the function to be optimised are available.

Then, using links between optimal transport theory and submodularity in combinatorial problems, I showed that a large part of submodular analysis on the hypercube can be interpreted as an optimal transport property between two totally ordered sets. This led to new polynomial-time optimisation methods for a new class of continuous but non-convex functions. Lastly, together with S. Arlot (professor at Paris-Sud, currently at IHES) and A. Celisse (lecturer at Université de Lille-1), I organised a conference with invited international speakers on the trade-off between computation time and statistical performance, which provided a forum for presenting recent progress in our rapidly expanding field of research.”

Francis Bach.

Have a look at the videos of the Schlumberger workshop “Computational and statistical trade-offs in learning”.