Thesis: Expected Complexity and Gradients of Deep Maxout Neural Networks and Implications to Parameter Initialization. Supervisor: Prof. Guido Montúfar (Group Leader at MPI MiS and Professor at UCLA).
Degree awarded by Leipzig University. Research conducted at MPI MiS with parallel enrollment in IMPRS.
- Proposed a stable initialization method for deep maxout networks, achieving an accuracy improvement of over 40%; published in ICML.
- Proved that the expected complexity of maxout networks grows polynomially with depth, contrary to earlier assumptions of exponential growth; published in NeurIPS.
- Contributed to a study on loss landscapes of ReLU networks; published in TMLR.