

I am a final-year PhD candidate at Mila and University of Montréal, supervised by Simon Lacoste-Julien, and a visiting researcher at Meta working with Michael Rabbat.

Before joining Mila, I received my MSc in Artificial Intelligence at the University of Amsterdam in 2018, under the supervision of Patrick Forré and examined by Max Welling. I hold a BSc in Mathematical Engineering from Universidad EAFIT in Medellin.

Research areas: (constrained) optimization, neural network sparsity, information theory, federated learning, applications of differential/algebraic geometry in machine learning.

My CV is available here.

My name is pronounced Xose Gaʝego Posada [Hoh-seh Gah-jeh-goh Poh-sah-dah] - hear it.
My Dijkstra and Erdős numbers are 4.



  • Apr 5: I will be giving a talk on training sparse neural networks with constrained optimization at the One World Seminar Series on the Mathematics of Machine Learning.

  • Mar 15: Thrilled to be serving as co-General Chair for the LatinX in AI workshop at ICML 2023! For more details on the calls for papers, reviewers or volunteers, please visit the LatinX@ICML2023 website.

  • Mar 5: I will be attending Khipu 2023 in Montevideo, Uruguay.

  • Jan 9: I have started a new role as a visiting researcher at Meta working on scalable adaptive optimization methods with Mike Rabbat!


  • Dec 3: I will be attending my first ever NeurIPS in Montréal this week!

  • Sept 1: I joined Mila, one of the world's largest academic labs working on deep learning, as a PhD student under Simon Lacoste-Julien's supervision.

  • Aug 24: I successfully defended my MSc thesis at the University of Amsterdam on Simplicial Autoencoders, with the invaluable guidance of Patrick Forré! 🎉


  1. A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale. H-J. M. Shi, T.-H. Lee, S. Iwasaki, J. Gallego-Posada, Z. Li, K. Rangadurai, D. Mudigere, M. Rabbat. arXiv preprint, 2023.

  2. Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints. J. Gallego-Posada, J. Ramirez, A. Erraqabi, Y. Bengio and S. Lacoste-Julien. NeurIPS, 2022.

  3. L0onie: Compressing COINs with L0-constraints. J. Ramirez and J. Gallego-Posada. Sparsity in Neural Networks Workshop, 2022.

  4. Equivariant Mesh Attention Networks. S. Basu, J. Gallego-Posada, F. Viganò, J. Rowbottom and T. Cohen. TMLR, 2022.

  5. Flexible Learning of Sparse Neural Networks via Constrained L0 Regularization. J. Gallego-Posada, J. Ramirez and A. Erraqabi. NeurIPS 2021 LatinX in AI Workshop, 2021.

  6. Simplicial Regularization. J. Gallego-Posada and P. Forré. ICLR 2021 Workshop on Geometrical and Topological Representation Learning, 2021.

  7. How to make your optimizer generalize better. S. Vaswani, R. Babanezhad, J. Gallego-Posada, A. Mishkin, S. Lacoste-Julien and N. Le Roux. Contributed talk at NeurIPS 2020 OPT Workshop on Optimization for Machine Learning, 2020. -- Previous version: To Each Optimizer a Norm, To Each Norm its Generalization.

  8. GAIT: A Geometric Approach to Information Theory. J. Gallego-Posada, A. Vani, M. Schwarzer and S. Lacoste-Julien. AISTATS 2020 (Previous version presented as an oral at NeurIPS 2019 Workshop on Information Theory and Machine Learning) [talk]

  9. Simplicial AutoEncoders: A connection between Algebraic Topology and Probabilistic Modelling. J. Gallego-Posada and P. Forré. MSc Thesis, 2018.

  10. Beyond Local Nash Equilibria for Adversarial Networks. F. Oliehoek, R. Savani, J. Gallego-Posada, E. van der Pol and R. Groß. Benelearn, 2018.

  11. Detection and Diagnosis of Breast Tumors using Deep Convolutional Neural Networks. J. Gallego-Posada, D. Montoya, and O. Quintero. Proceedings of the XVII Latin American Conference on Automatic Control, Universidad EAFIT, 2016, pp. 11–17.

  12. Interval Analysis and Optimization Applied to Parameter Estimation under Uncertainty. J. Gallego-Posada and M. Puerta. Boletim da Sociedade Paranaense de Matemática, vol. 36, no. 2, pp. 107-124, 2018.

  13. Statistical Software Reliability Models. J. Gallego-Posada and F. Zuluaga. Data Analytics Applications in Latin America, 2017.


  • Isabel Urrego - Undergraduate research project 2022

  • Daniel Otero - Undergraduate research project 2022

  • Juan Ramirez - Undergraduate research project 2020-2021; internship at Mila; now PhD student at Mila, University of Montreal

Teaching Assistantships

  • Winter 20, 21, 22 and 23: Theoretical Principles for Deep Learning by Ioannis Mitliagkas

    • This is an advanced graduate class for students who want to engage in theory-driven deep learning research.

    • Topics: Convex optimization, smooth games, information theory, statistical learning theory. Visit the course website for the full syllabus.

    • Check out the recording of my 2020 online lecture on Reproducing Kernel Hilbert Spaces!

  • Fall 20 and Fall 19: Probabilistic Graphical Models by Simon Lacoste-Julien

    • This course is centered around the formalism of probabilistic graphical models as a tool to encode probability distributions over numerous interacting random variables.

    • Topics: Graphical models: training and inference algorithms, variational inference, exponential families, information theory. Visit the course website for the full syllabus.

    • These are the slides for my 2020 and 2019 guest lectures on Bayesian Non-Parametrics: Gaussian and Dirichlet Processes.

  • 2017: Computational Intelligence - Machine Learning by Evert Haasdijk at the Vrije Universiteit Amsterdam.

© Jose Gallego - Last updated: Sep-2023