
A Hitchhiker's Guide to Momentum
Polyak momentum is one of the most iconic methods in optimization. Despite its simplicity, it features rich dynamics that depend on both the stepsize and the momentum parameter. In this blog post we identify the different regions of the parameter space and discuss their convergence properties using the theory of Chebyshev polynomials.
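
As a rough illustration of the method this post studies, here is a minimal sketch of the Polyak (heavy-ball) update on a small quadratic. The function name `heavy_ball`, the test matrix `A`, and the constants `mu` and `L` are assumptions made for this example and are not taken from the post; the stepsize and momentum values are the classical choices for a quadratic with eigenvalues in [mu, L].

```python
import numpy as np

def heavy_ball(grad, x0, stepsize, momentum, num_steps):
    """Polyak momentum: x_{t+1} = x_t - stepsize * grad(x_t) + momentum * (x_t - x_{t-1})."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()  # first iteration has no history, so it is a plain gradient step
    for _ in range(num_steps):
        x_next = x - stepsize * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Hypothetical test problem: f(x) = 0.5 * x^T A x with eigenvalues mu and L.
mu, L = 1.0, 10.0
A = np.diag([mu, L])

# Classical parameter choice for quadratics with spectrum in [mu, L].
stepsize = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
momentum = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

x_final = heavy_ball(lambda x: A @ x, [1.0, 1.0], stepsize, momentum, 200)
```

Changing `stepsize` and `momentum` in this sketch is one way to probe the different parameter regions: some pairs converge smoothly, some oscillate, and some diverge.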

Autoregressive Renaissance in Neural PDE Solvers
Recent developments in the field of neural partial differential equation (PDE) solvers have placed a strong emphasis on neural operators. However, the paper Message Passing Neural PDE Solver by Brandstetter et al., published at ICLR 2022, revisits autoregressive models and designs a message passing graph neural network that is comparable with or outperforms both the state-of-the-art Fourier Neural Operator and classical PDE solvers in its generalization capabilities and performance. This blog post delves into the key contributions of this work, exploring the strategies used to address the common problem of instability in autoregressive models and the design choices of the message passing graph neural network architecture.

Data Poisoning is Hitting a Wall
In this post, we look at the paper 'Data Poisoning Won't Save You From Facial Recognition', discuss the impact of the work, and examine how it fares in the current state of adversarial machine learning. Since this is a blog post rather than a traditional paper, we try to avoid inundating the reader with mathematical equations and complex terminology. Instead, we aim to put forth this work's primary concept and implications, along with our observations, in a clear, concise manner. Don't want to go through the entire post? Check out the TL;DR at the end for a quick summary.

Decay No More
Weight decay is among the most important tuning parameters to reach high accuracy for large-scale machine learning models. In this blog post, we revisit AdamW, the weight decay version of Adam, summarizing empirical findings as well as theoretical motivations from an optimization perspective.
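
To make the "weight decay version of Adam" concrete, the following is a minimal sketch of a single AdamW-style update with decoupled weight decay, written in plain NumPy. The function name `adamw_step` and the default hyperparameter values are assumptions chosen for illustration, not values taken from the post. The point of the sketch is that the decay term acts on the parameters directly instead of being added to the gradient before the adaptive rescaling.

```python
import numpy as np

def adamw_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW-style step; t is the 1-based step count. Defaults are illustrative."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Bias correction for the zero-initialized averages.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Decoupled weight decay: the decay term bypasses the adaptive scaling.
    param = param - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v

# With a zero gradient, only the decay acts: parameters shrink by (1 - lr * weight_decay).
p0 = np.array([1.0, -2.0])
p1, m1, v1 = adamw_step(p0, np.zeros(2), np.zeros(2), np.zeros(2), t=1)
```

By contrast, adding `weight_decay * param` to `grad` before computing `m` and `v` would give Adam with L2 regularization, where the decay is rescaled by the adaptive denominator.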

How does the inductive bias influence the generalization capability of neural networks?
The blog post discusses how memorization and generalization are affected by extreme overparameterization. It then explains the overfitting puzzle in machine learning and how the inductive bias can help to understand the generalization capability of neural networks.

How much meta-learning is in image-to-image translation?
...in which we find a connection between meta-learning literature and a paper studying how well CNNs deal with nuisance transforms in a class-imbalanced setting. Closer inspection reveals a surprising amount of similarity, from meta-information to loss functions. This implies that the current conception of meta-learning might be too narrow.

Practical Applications of Bsuite For Reinforcement Learning
In 2019, researchers at DeepMind published a suite of reinforcement learning environments called Behavior Suite for Reinforcement Learning, or bsuite. Each environment is designed to directly test a core capability of a general reinforcement learning agent, such as its ability to generalize from past experience or handle delayed rewards. In this blog post, we extend their work by providing specific examples of how bsuite can address common challenges faced by reinforcement learning practitioners during the development process.

Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-agent Reinforcement Learning
QMIX, a classical multi-agent reinforcement learning (MARL) algorithm, is often considered a weak baseline because of its limited representational capacity. However, we found that improving the implementation techniques of QMIX enables it to achieve state-of-the-art performance on the StarCraft Multi-Agent Challenge (SMAC) testbed. Furthermore, we identify the monotonicity constraint as a key factor in QMIX, try to explain its role, and corroborate its superior performance by combining it with another actor-critic style algorithm. We have open-sourced the code at https://github.com/hijkzzz/pymarl2 for researchers to evaluate the effects of these proposed techniques.

Strategies for Classification Layer Initialization in Model-Agnostic Meta-Learning
This blog post discusses different strategies for initializing the classification layer's parameters before fine-tuning on a new task in Model-Agnostic Meta-Learning. Each of the strategies in question emerged from a different problem, and we analyze whether one approach can solve the problems addressed by the other approaches.

Thinking Like Transformers
Thinking like Transformers proposes a simple language for coding with attention-like primitives. Using this language, we consider a challenging set of puzzles to gain intuition for how a Transformer could implement basic algorithms.

Universality of Neural Networks on Sets vs. Graphs
Universal function approximation is one of the central tenets in theoretical deep learning research. It is the question of whether a specific neural network architecture is, in theory, able to approximate any function of interest. The ICLR paper “How Powerful are Graph Neural Networks?” shows that mathematically analysing the constraints of an architecture as a universal function approximator and alleviating these constraints can lead to more principled architecture choices, performance improvements, and long-term impact on the field. Specifically in the fields of learning on sets and learning on graphs, universal function approximation is a well-studied property. The two fields are closely linked because the need for permutation invariance in both cases leads to similar building blocks. However, we argue that these two fields have sometimes evolved in parallel, not fully exploiting their synergies. This post aims at bringing these two fields closer together, particularly from the perspective of universal function approximation.