About Me
Hi! I'm a Ph.D. student at Mila and Université de Montréal, advised by Irina Rish and Eugene Belilovsky. My research focuses on efficient foundation model pre-training through distributed optimization, meta-learning, and continual learning. Previously, I received my MMath in Computer Science from the University of Waterloo, where I was a member of the WISE Lab advised by Krzysztof Czarnecki. There, my research focused on developing 3D computer vision algorithms for object re-identification from point clouds. Before that, I received my BSc in Computer Science from Concordia University, where I completed software engineering internships at Accedian and Morgan Stanley. I also completed research internships in the CLaC Lab advised by Sabine Bergler and at Mila advised by Eugene Belilovsky and Guy Wolf.
Selected Publications
See my Google Scholar page for a complete list of publications.
- PyLO: Towards Accessible Learned Optimizers in PyTorch
Paul Janson*, Benjamin Thérien*, Quentin Anthony, Xiaolong Huang, Abhinav Moudgil, Eugene Belilovsky
CODEML Workshop at the Forty-second International Conference on Machine Learning
Full paper in submission to MLSys 2026
ArXiv | OpenReview | Code | Tweet Thread
- MuLoCo: Muon is a practical inner optimizer for DiLoCo
Benjamin Thérien, Xiaolong Huang, Irina Rish, Eugene Belilovsky
ICML Workshop on Efficient Systems for Foundation Models (ES-FoMo), 2025
Full paper in submission to MLSys 2026
ArXiv | OpenReview | Tweet Thread
- μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
Benjamin Thérien, Charles-Étienne Joseph, Boris Knyazev, Edouard Oyallon, Irina Rish, Eugene Belilovsky
(Oral) NeurIPS 2024 Workshop on Optimization for Machine Learning
Full paper in submission to ICLR 2026
ArXiv | OpenReview | Code | Tweet Thread
- Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts
Ashwinee Panda, Vatsal Baherwani, Zain Sarwar, Benjamin Thérien, Stephen Rawls, Sambit Sahu, Tom Goldstein, Supriyo Chakraborty
Advances in Neural Information Processing Systems (NeurIPS), 2025
ArXiv | OpenReview | Code
- Continual Pre-training of MoEs: How robust is your router?
Benjamin Thérien, Charles-Étienne Joseph, Zain Sarwar, Ashwinee Panda, Anirban Das, Shi-Xiong Zhang, Stephen Rawls, Sambit Sahu, Eugene Belilovsky, Irina Rish
Transactions on Machine Learning Research (TMLR), 2025
ArXiv | OpenReview | Tweet Thread
- Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim*, Benjamin Thérien*, Kshitij Gupta*, Mats Leon Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish
Transactions on Machine Learning Research (TMLR), 2024
ArXiv | OpenReview | Code | Tweet Thread | Video
- Meta-learning Optimizers for Communication-Efficient Learning
Charles-Étienne Joseph*, Benjamin Thérien*, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky
Transactions on Machine Learning Research (TMLR), 2024
ArXiv | OpenReview | Code
- Object Re-Identification from Point Clouds
Benjamin Thérien, Chengjie Huang, Adrian Chow, Krzysztof Czarnecki
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024
WACV | ArXiv | Code
- Out-of-Distribution Detection for LiDAR-based 3D Object Detection
Chengjie Huang, Van Duong Nguyen, Vahdat Abdelzad, Christopher Mannes, Luke Rowe, Benjamin Thérien, Rick Salay, Krzysztof Czarnecki
IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), 2022
ArXiv
- (oral: 4.2% of submissions) Parametric Scattering Networks
Shanel Gauthier*, Benjamin Thérien*†, Laurent Alsène-Racicot, Muawiz Chaudhary, Irina Rish, Eugene Belilovsky, Michael Eickenberg, Guy Wolf
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
CVPR | ArXiv | Code | Video
†: Oral presenter. *: Equal contribution.
