In January 2018, a chair in data sciences was inaugurated at the Collège de France by Stéphane Mallat, a French researcher who in 1987 devised an algorithm that later underpinned the JPEG 2000 format. He went on to found a start-up producing chips for televisions that improved image resolution (generating a high-resolution image from a standard signal), and then turned to deep learning algorithms for problems related to automatic image recognition.
In an interview with the magazine La Recherche [1], he explains why his chair at the Collège de France was created and outlines the content of the lectures he gives.
For him, it was important that the chair be named "Data Sciences", in the plural, because it is a multidisciplinary field of research. Although the tools used are always the same (applied mathematics, computer science and AI, information theory, etc.), the data sets handled come from all kinds of sciences (physics, biology, cognitive sciences, economics, social sciences, etc.). Each of these sciences has its own approach to the problem of big data, which makes the field massively multidisciplinary. Moreover, Stéphane Mallat maintains that the emergence of this discipline stems not from scientific necessity but from social and academic pressure, because these methods are profoundly transforming our societies (just as chairs of computer science became imperative in universities fifty years earlier). The current pressure is such that he is now working on opening another chair of data sciences at the École Normale Supérieure (a French institution that trains researchers). As a result, and since the field is only beginning to crystallize, the main objective of his chair at the Collège de France will be to create a common vocabulary for all the scientific disciplines concerned, in order to describe problems involving high-dimensional data. In other words, the aim is to lay the foundations of this new science by building a shared vocabulary.
Historically, this emerging field arose because the accumulation of data and the growth in computing power brought applied mathematics and computer science together, giving birth to machine learning. We were able to store large amounts of data before we knew what we could do with them. Broadly, the data sciences pursue two types of objectives: modelling a set of data (to generate new data, compress data, reconstruct or improve the quality of an image, etc.) and prediction (which consists in giving meaning to a set of data). At present, deep learning techniques work well for these uses, but we do not yet understand why. Understanding this is therefore a whole field of research in itself, with the aim of making these techniques more reliable for critical applications such as medical diagnosis or autonomous cars. Other areas of research concern reducing the number of dimensions of the problems by discovering and exploiting multi-scale hierarchies (observing the data at various scales) and symmetries (invariances) in the data being handled.
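To make the idea of observing data at various scales a little more concrete, here is a minimal sketch in Python. It is not taken from the interview: it uses a simple Haar-style averaging scheme as a stand-in, and the function names (`haar_step`, `multiscale`, `reconstruct`) are hypothetical, chosen only for illustration. The signal length is assumed to be a power of two.

```python
def haar_step(signal):
    """One analysis step: pairwise averages (coarse view) and half-differences (detail)."""
    coarse = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return coarse, detail


def multiscale(signal):
    """Decompose a signal into its global mean plus details, from finest to coarsest scale."""
    n = len(signal)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("signal length must be a power of two (assumption of this sketch)")
    coarse = list(signal)
    details = []
    while len(coarse) > 1:
        coarse, detail = haar_step(coarse)
        details.append(detail)
    return coarse[0], details


def reconstruct(mean, details):
    """Invert multiscale(): rebuild the signal from the coarsest scale to the finest."""
    coarse = [mean]
    for detail in reversed(details):
        finer = []
        for c, d in zip(coarse, detail):
            finer.extend([c + d, c - d])
        coarse = finer
    return coarse


signal = [4, 6, 10, 12, 8, 8, 0, 0]
mean, details = multiscale(signal)
# Zero-valued details mark regions that are flat at that scale and carry no information.
print(mean, details)
print(reconstruct(mean, details))
```

In this toy example, the coarsest coefficient (the global mean) does not change when the samples are shifted or reordered, which is a very simple instance of an invariance. The detail coefficients that are zero could be dropped without losing information, which hints at how a multi-scale view can reduce the effective number of dimensions.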
And because we have data warehouses on the one hand and a whole arsenal of applied mathematics on the other, one distinctive characteristic of this field is that it is both theoretical and experimental. According to Stéphane Mallat, the recent and sudden progress in visual and speech recognition, in machine translation, and in the games of Go and chess was born of the empirical approaches and remarkable intuitions of several researchers and engineers. For him, it is experimental research in this field that brings new mathematical problems to the fore, which is why this interplay between mathematics and applications is at the heart of his lectures.
[1] La Recherche, February 2018