A brief description of my main research subjects
The paper numbers below refer to my publication list
Statistical inference for biological sequence analysis
Please check our group webpage
See also my talks "Rough fitness landscapes: from protein evolution to protein design" and "Using generative models to describe protein evolution"
Modern sequencing techniques are producing an explosion in the amount of available sequences of biological molecules, mostly proteins and RNA. Such molecules are among the most interesting complex systems in nature; they are essential in almost all biological processes. They robustly fold into well-defined three-dimensional structures, which in turn form the basis of their functionality. This sequence-structure-function relationship has, over several decades now, attracted substantial research in biological physics.
A fascinating approach to biological sequence analysis has emerged in recent years: in the course of evolution, biological sequences change because of mutations. We can now easily observe the sequence variability across families of so-called homologous molecules, e.g. evolutionarily related proteins of equivalent function from different species that share a common ancestor. Two homologous proteins may differ in 70-80% of their amino acids without any substantial change in structure and function. However, sequence variability is not fully random: most mutations are deleterious, reducing protein stability or functionality, and they are suppressed by natural selection. Only protein variants of similar, or even improved, functionality are maintained.
We conclude that the molecule's structure and function constrain the viable sequence space that evolution can explore. Inverting this argument, the empirically observed sequence variability of homologous molecules contains information about such evolutionary constraints. One can then firmly establish the concept of data-driven "sequence landscapes", i.e. a family of models that describe the statistical properties of protein or RNA families. The parameters of these models can be obtained via inference or learning procedures, and they can be used to extract useful information on molecular structure and function and on the effects of mutations, and to generate new artificially-designed molecules with specific properties.
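The inference idea can be illustrated with a deliberately minimal sketch: an independent-site (profile) model learned from a toy alignment, which scores sequences by how typical their residues are at each position. This is far simpler than the coevolutionary models used in the actual research; the alignment, alphabet, and function names below are purely illustrative.

```python
import math
from collections import Counter

def site_frequencies(msa, alphabet="ACDE"):
    """Empirical residue frequencies at each column of an aligned family.
    A pseudocount avoids zero probabilities for unobserved residues."""
    length = len(msa[0])
    pseudo = 1.0
    freqs = []
    for i in range(length):
        counts = Counter(seq[i] for seq in msa)
        total = len(msa) + pseudo * len(alphabet)
        freqs.append({a: (counts[a] + pseudo) / total for a in alphabet})
    return freqs

def log_likelihood(seq, freqs):
    """Score of a sequence under the independent-site landscape:
    sum of the log-probabilities of its residues."""
    return sum(math.log(freqs[i][a]) for i, a in enumerate(seq))

# Toy "alignment" of four homologous sequences (illustrative only)
msa = ["ACDA", "ACDE", "ACCA", "ACDA"]
freqs = site_frequencies(msa)

# A sequence close to the family consensus scores higher than a scrambled one
print(log_likelihood("ACDA", freqs) > log_likelihood("EDCA", freqs))  # True
```

Real sequence landscapes additionally model pairwise couplings between positions (the coevolutionary signal), which is what makes contact prediction and mutation-effect scoring possible; the single-site sketch above only captures conservation.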
I am currently working, in close collaboration with the group of Martin Weigt, on improving state-of-the-art alignment techniques by including coevolutionary information, on designing good generative models for protein design, and on using sequence landscapes to model in vitro evolutionary experiments.
Agent-based models of the macroeconomy
Relevant papers: 76, 78, 93, 99, 123
This project started in the framework of a large interdisciplinary European collaboration funded by the project CRISIS. The aim of our work was to explore the possible types of phenomena that simple macroeconomic Agent-Based Models (ABM) can reproduce. We proposed a methodology, inspired by statistical physics, that characterizes a model through its phase diagram in the space of parameters. It might be surprising that, despite the huge number of ABM studies, people have mostly focused on calibration against data, while systematic studies of the phase diagram and of finite-size-scaling effects have rarely been performed. We believe that a generic understanding of the phase diagram of these models is extremely important to gain qualitative insight before attempting any calibration against real data.
As a case study, we considered the large macroeconomic fluctuations observed in the so-called "Mark I" ABM that was the core of the CRISIS project. Our major finding was the generic existence of a phase transition between a "good economy" where unemployment is low, and a "bad economy" where unemployment is high. We then introduced a simpler framework that allowed us to show that this transition is robust against many modifications of the model, and is generically induced by an asymmetry between the rate of hiring and the rate of firing of the firms. The unemployment level remains small until a tipping point, beyond which the economy suddenly collapses. If the parameters are such that the system is close to this transition, any small fluctuation is amplified as the system jumps between the two equilibria. We have explored several natural extensions of the model. One is to introduce a bankruptcy threshold, capping each firm's debt-to-sales ratio. This leads to a rich phase diagram with, in particular, a region where acute endogenous crises occur, during which the unemployment rate shoots up before the economy can recover. We introduced a simple Fokker-Planck model that allowed us to understand the emergence of these endogenous crises in great detail through an analytic solution.
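The hiring/firing asymmetry mechanism can be sketched with a toy simulation, much simpler than Mark I or the reduced model of the papers: each firm hires one worker with some probability and fires one with another, and the stationary unemployment rate switches between a low and a high value when the asymmetry is reversed. All parameter names and values are illustrative, not those of the actual model.

```python
import random

def simulate_unemployment(eta_plus, eta_minus, n_firms=100, n_workers=1000,
                          steps=2000, seed=0):
    """Toy hiring/firing dynamics: each step, every firm hires one worker
    with probability eta_plus (if the unemployed pool is non-empty) and
    fires one with probability eta_minus (if it has any workers).
    Returns the average unemployment rate over the second half of the run."""
    rng = random.Random(seed)
    employed = [n_workers // n_firms] * n_firms  # workers per firm
    history = []
    for t in range(steps):
        unemployed = n_workers - sum(employed)
        for f in range(n_firms):
            if unemployed > 0 and rng.random() < eta_plus:
                employed[f] += 1
                unemployed -= 1
            if employed[f] > 0 and rng.random() < eta_minus:
                employed[f] -= 1
                unemployed += 1
        history.append(unemployed / n_workers)
    return sum(history[steps // 2:]) / (steps - steps // 2)

# Hiring outpacing firing gives the "good" phase (low unemployment);
# reversing the asymmetry gives the "bad" phase (high unemployment).
print(simulate_unemployment(eta_plus=0.2, eta_minus=0.1))  # small
print(simulate_unemployment(eta_plus=0.1, eta_minus=0.2))  # large
```

Even in this stripped-down version, each firm's workforce performs a biased random walk with a reflecting barrier at zero, so a small change in the hiring/firing ratio moves the economy between two very different stationary states.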
Finally, we generalised the stylised macroeconomic Agent-Based Model by introducing simple wage policies. This leads to inflation (in the "good" phase) or deflation (in the "bad" phase), but leaves the overall phase diagram of the model essentially unchanged. We then introduced a Central Bank, which sets the interest rate so as to steer the economy towards a prescribed inflation and unemployment level, with the aim of investigating the role and efficacy of monetary policy. Our major finding is that, provided the policy of the Central Bank is not too aggressive (i.e. if the bank does not respond too quickly to fluctuations), the Central Bank is successful in achieving its goals. However, the existence of different equilibrium states of the economy, separated by phase boundaries, can cause the monetary policy itself to trigger instabilities and become counter-productive. In other words, the Central Bank must navigate in a narrow window: too little is not enough, too much leads to instabilities and wildly oscillating economies. This conclusion strongly contrasts with the predictions of standard economic models in the so-called Dynamic Stochastic General Equilibrium (DSGE) class, which are widely used by Central Banks across the world.
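The "too aggressive policy destabilizes" effect has a generic control-theoretic flavour: a controller acting on lagged observations overshoots and oscillates when its gain is too large. The toy below is a bare proportional-feedback rule, not the actual model of the papers; the function name, target, and gains are purely illustrative.

```python
def steer(gain, target=2.0, steps=30, lag=1):
    """Toy policy rule: the 'bank' observes inflation with a one-step lag
    and corrects proportionally, x_{t+1} = x_t - gain * (x_{t-lag} - target).
    Small gains converge to the target; large gains overshoot and oscillate
    with growing amplitude. Returns the final inflation value."""
    x = [10.0] * (lag + 1)  # initial inflation, replicated for the lag
    for t in range(steps):
        observed = x[-1 - lag]           # the bank only sees stale data
        x.append(x[-1] - gain * (observed - target))
    return x[-1]

print(steer(0.3))  # gentle policy: close to the target 2.0
print(steer(1.5))  # aggressive policy: large oscillations, far from 2.0
```

With a one-step lag the linearised dynamics are stable only for gains below 1, so a policy that reacts too strongly to stale observations amplifies fluctuations instead of damping them, which is the qualitative point of the narrow-window conclusion.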
Mean field theory of the glass transition, jamming and amorphous packings of hard spheres
Outreach papers: 31, 86
Review papers: 40, 91
A detailed description of my work in this field can be found here.
See also my talk "Jamming and hard sphere glasses".

Hard spheres are ubiquitous in condensed matter: they have been used as models for liquids, crystals, colloidal systems, granular systems, and powders. Packings of hard spheres are of even wider interest, as they are related to important problems in information theory, such as the digitalization of signals, error-correcting codes, and optimization problems. In three dimensions the densest packing of identical hard spheres has been proven to be the FCC lattice, and it is conjectured that the densest packing is ordered (a regular lattice, e.g. a crystal) in any low enough dimension. Still, amorphous packings have attracted a lot of interest, because for polydisperse colloids and granular materials the crystalline state is not reached in experiments for kinetic reasons.

In papers [14, 17] we constructed a theory of amorphous packings, and more generally of glassy states, of hard spheres based on the replica method: this theory gives predictions on the structure and thermodynamics of these states. In dimensions between two and six these predictions can be successfully compared with numerical simulations [14, 24]. In paper [17] we also discussed the limit of large dimension, where an exact solution is possible. Applications to hard spheres on the hypercube are discussed in [19], and the theory is extended to binary mixtures in [37] and to soft spheres in [47]. In the review paper [40] we improved the discussion of the large-dimension limit and obtained new results on the correlation function and the contact force distribution in three dimensions. We also tried to clarify the main assumptions behind our theory, and in particular the relation between our static computation and the dynamical procedures used to construct amorphous packings. In paper [29] we compared the mean field theory with a collection of available experimental data.
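As a quick numerical check of the FCC density mentioned above, one can compute the packing fraction directly from the geometry of the conventional unit cell; the result is the well-known value pi/sqrt(18), approximately 0.7405.

```python
import math

# The conventional FCC unit cell (lattice constant a) contains 4 spheres;
# nearest neighbours sit at distance a/sqrt(2), so touching spheres have
# radius r = a / (2*sqrt(2)). The packing fraction is independent of a.
a = 1.0
r = a / (2 * math.sqrt(2))
phi_fcc = 4 * (4 / 3) * math.pi * r**3 / a**3

print(round(phi_fcc, 5))                               # 0.74048
print(math.isclose(phi_fcc, math.pi / math.sqrt(18)))  # True
```

Amorphous (random close) packings in three dimensions jam at a distinctly lower density, around 0.64, which is one of the quantities the replica theory addresses.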