About Me
Hi! I'm a Ph.D. student at Mila and Université de Montréal, advised by Irina Rish and Eugene Belilovsky. My research focuses on efficient foundation model pre-training through distributed optimization, meta-learning, and continual learning. Previously, I received my MMath in Computer Science from the University of Waterloo, where I was a member of the WISE Lab advised by Krzysztof Czarnecki. There, my research focused on developing 3D computer vision algorithms for object re-identification from point clouds. Before that, I received my BSc in Computer Science from Concordia University, where I completed software engineering internships at Accedian and Morgan Stanley. I also completed research internships in the CLaC Lab advised by Sabine Bergler and at Mila advised by Eugene Belilovsky and Guy Wolf.
Selected Publications

See my Google Scholar page for a complete list of publications.

  • PyLO: Towards Accessible Learned Optimizers in PyTorch
    Paul Janson*, Benjamin Thérien*, Quentin Anthony, Xiaolong Huang, Abhinav Moudgil, Eugene Belilovsky
    CODEML Workshop at the Forty-second International Conference on Machine Learning
    Full paper in submission to MLSys 2026
    ArXiv | OpenReview | Code | Tweet Thread
  • MuLoCo: Muon is a practical inner optimizer for DiLoCo
    Benjamin Thérien, Xiaolong Huang, Irina Rish, Eugene Belilovsky
    ICML Workshop on Efficient Systems for Foundation Models (ES-FoMo), 2025
    Full paper in submission to MLSys 2026
    ArXiv | OpenReview | Tweet Thread
  • μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
    Benjamin Thérien, Charles-Étienne Joseph, Boris Knyazev, Edouard Oyallon, Irina Rish, Eugene Belilovsky
    (Oral) NeurIPS 2024 Workshop on Optimization for Machine Learning
    Full paper in submission to ICLR 2026
    ArXiv | OpenReview | Code | Tweet Thread
  • Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts
    Ashwinee Panda, Vatsal Baherwani, Zain Sarwar, Benjamin Thérien, Stephen Rawls, Sambit Sahu, Tom Goldstein, Supriyo Chakraborty
    Advances in Neural Information Processing Systems (NeurIPS), 2025
    ArXiv | OpenReview | Code
  • Continual Pre-training of MoEs: How robust is your router?
    Benjamin Thérien, Charles-Étienne Joseph, Zain Sarwar, Ashwinee Panda, Anirban Das, Shi-Xiong Zhang, Stephen Rawls, Sambit Sahu, Eugene Belilovsky, Irina Rish
    Transactions on Machine Learning Research (TMLR), 2025
    ArXiv | OpenReview | Tweet Thread
  • Simple and Scalable Strategies to Continually Pre-train Large Language Models
    Adam Ibrahim*, Benjamin Thérien*, Kshitij Gupta*, Mats Leon Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish
    Transactions on Machine Learning Research (TMLR), 2024
    ArXiv | OpenReview | Code | Tweet Thread | Video
  • Meta-learning Optimizers for Communication-Efficient Learning
    Charles-Étienne Joseph*, Benjamin Thérien*, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky
    Transactions on Machine Learning Research (TMLR), 2024
    ArXiv | OpenReview | Code
  • Object Re-Identification from Point Clouds
    Benjamin Thérien, Chengjie Huang, Adrian Chow, Krzysztof Czarnecki
    IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024
    WACV | ArXiv | Code
  • Out-of-Distribution Detection for LiDAR-based 3D Object Detection
    Chengjie Huang, Van Duong Nguyen, Vahdat Abdelzad, Christopher Mannes, Luke Rowe, Benjamin Thérien, Rick Salay, Krzysztof Czarnecki
    IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), 2022
    ArXiv
  • (oral: 4.2% of submissions) Parametric Scattering Networks
    Shanel Gauthier*, Benjamin Thérien*†, Laurent Alsène-Racicot, Muawiz Chaudhary, Irina Rish, Eugene Belilovsky, Michael Eickenberg, Guy Wolf
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
    CVPR | ArXiv | Code | Video

†: Oral presenter. *: Equal contribution.