-
A Visual Dive into Conditional Flow Matching
Conditional flow matching (CFM) was introduced by three simultaneous papers at ICLR 2023, through different approaches (conditional matching, rectified flows, and stochastic interpolants).
The main part of this post, Section 2, explains CFM using both visual intuition and insights into its probabilistic formulation. Section 1 introduces normalizing flows; readers already familiar with the topic, or who prefer to cover it later, can skip it. Section 3 discusses the links between CFM and other approaches, and ends with a 'CFM playground'.
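For reference, one widely used form of the conditional objective (the straight-line interpolant case; a standard formula stated here as background, not quoted from the post) is

$$ x_t = (1-t)\,x_0 + t\,x_1, \qquad \mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t,\;x_0\sim p_0,\;x_1\sim p_1}\bigl\|\,v_\theta(x_t, t) - (x_1 - x_0)\,\bigr\|^2, $$

i.e., the network $v_\theta$ regresses onto the constant velocity of the straight path from a noise sample $x_0$ to a data sample $x_1$.
-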
An Illustrated Guide to Automatic Sparse Differentiation
In numerous applications of machine learning, Hessians and Jacobians exhibit sparsity, a property that can be leveraged to vastly accelerate their computation. While the usage of automatic differentiation in machine learning is ubiquitous, automatic sparse differentiation (ASD) remains largely unknown. This post introduces ASD, explaining its key components and their roles in the computation of both sparse Jacobians and Hessians. We conclude with a practical demonstration showcasing the performance benefits of ASD.
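To make the core mechanism concrete, here is a minimal NumPy sketch of the column-compression idea behind sparse Jacobian computation (a toy finite-difference stand-in for a JVP, not the ASD tooling discussed in the post): when the sparsity pattern is known, structurally non-overlapping columns share one color, so the whole Jacobian is recovered from a handful of directional derivatives instead of one per input.

```python
import numpy as np

def f(x):
    # Toy function with a tridiagonal Jacobian: each output depends
    # only on its own input and its immediate neighbours.
    y = x**2
    y[:-1] += x[1:]
    y[1:] += np.sin(x[:-1])
    return y

n = 8
x = np.random.randn(n)

# Known sparsity pattern: J[i, j] != 0 only if |i - j| <= 1 (tridiagonal).
pattern = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= 1

# Greedy column coloring: columns sharing no nonzero row get the same color,
# so one directional derivative can recover several columns at once.
colors = -np.ones(n, dtype=int)
for j in range(n):
    used = {colors[k] for k in range(j) if np.any(pattern[:, j] & pattern[:, k])}
    colors[j] = next(c for c in range(n) if c not in used)
num_colors = colors.max() + 1  # 3 colors instead of n = 8 columns

# One (finite-difference) directional derivative per color instead of one per column.
eps = 1e-6
J = np.zeros((n, n))
for c in range(num_colors):
    seed = (colors == c).astype(float)
    jvp = (f(x + eps * seed) - f(x - eps * seed)) / (2 * eps)
    for j in np.where(colors == c)[0]:
        J[pattern[:, j], j] = jvp[pattern[:, j]]  # decompress this column

# Dense reference: one directional derivative per input coordinate.
dense = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)]).T
print(num_colors, np.allclose(J, dense, atol=1e-4))
```

The saving scales with the pattern, not the input size: a tridiagonal Jacobian needs 3 products regardless of n, which is the effect the post's benchmarks illustrate at scale.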
-
Analysing The Spectral Biases in Generative Models
Diffusion and GAN models have demonstrated remarkable success in synthesizing high-quality images, propelling them into various real-life applications across different domains. However, it has been observed that they exhibit spectral biases that impact their ability to generate certain frequencies and make it possible to distinguish real images from fake ones. In this blog, we analyze these models and attempt to explain the reason behind these biases.
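For intuition about what such a spectral analysis looks like (a generic radially averaged power spectrum; the actual procedure in the post may differ), one can compare real and generated images in Fourier space:

```python
import numpy as np

def radial_power_spectrum(img):
    """Azimuthally averaged power spectrum of a grayscale image."""
    fft = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(fft) ** 2
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    # Mean power in each integer-frequency ring.
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)

# Stand-ins for a real and a generated image (random arrays here).
real_img = np.random.rand(64, 64)
fake_img = np.random.rand(64, 64)
gap = radial_power_spectrum(real_img) - radial_power_spectrum(fake_img)
print(gap[-10:])  # the high-frequency rings are where generative biases typically show up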
-
Avoid Overclaims - Summary of Complexity Bounds for Algorithms in Minimization and Minimax Optimization
In this blog, we revisit the convergence analysis of first-order algorithms in minimization and minimax optimization problems. Within the classical oracle-model framework, we review state-of-the-art upper- and lower-bound results in various settings, aiming to identify gaps in existing research. With the rapid development of applications such as machine learning and operations research, we further highlight recent works that revise the classical settings in which optimization algorithms are studied.
-
Building Blocks of Differentially Private Training
In this blog, we introduce the building blocks of training a neural network in a differentially private way.
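As a point of reference, the canonical building blocks are per-example gradient clipping and calibrated Gaussian noise, as in DP-SGD; the NumPy sketch below assumes this is roughly what the post covers and operates on precomputed per-example gradients rather than a real training loop.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    """One DP-SGD update: clip each example's gradient, average, add Gaussian noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # per-example clipping
    mean_grad = np.mean(clipped, axis=0)
    # Noise scaled to the clipping norm and batch size (std of the noisy mean).
    noise = np.random.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                             size=mean_grad.shape)
    return params - lr * (mean_grad + noise)

params = np.zeros(5)
batch_grads = np.random.randn(32, 5)   # stand-in per-example gradients
params = dp_sgd_step(params, batch_grads)
```

The privacy guarantee then comes from accounting over how many such noisy steps are taken, which is where the remaining building blocks (accountants, subsampling) enter.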
-
Can LLM Simulations Truly Reflect Humanity? A Deep Dive
Simulation powered by Large Language Models (LLMs) has become a promising method for exploring complex human social behaviors. However, the application of LLMs in simulations presents significant challenges, particularly regarding their capacity to accurately replicate the complexities of human behaviors and societal dynamics, as evidenced by recent studies highlighting discrepancies between simulated and real-world interactions. This blog rethinks LLM-based simulations by emphasizing both their limitations and what is needed to advance them. By critically examining these challenges, we aim to offer actionable insights and strategies for making LLM simulations of human society more applicable in the future.
-
Cross-Layer Orthogonal Vectors Pruning and Fine-Tuning
The absorb operation utilized in DeepSeek, which merges Query-Key and Value-Output weight matrices during inference, significantly increases parameter count and computational overhead. We observe that these absorbed matrices inherently exhibit low-rank structures. Motivated by this insight, we introduce CLOVER (Cross-Layer Orthogonal Vectors), a method that factorizes these matrices into four head-wise orthogonal matrices and two sets of singular values without any loss of information. By eliminating redundant vectors, CLOVER reduces the encoder parameters in Whisper-large-v3 by 46.42% without requiring additional training. Moreover, by freezing singular vectors and fine-tuning only singular values, CLOVER enables efficient full-rank fine-tuning. When evaluated on eight commonsense reasoning tasks with LLaMA-2 7B, CLOVER surpasses existing SoTA methods—LoRA, DoRA, HiRA, and PiSSA—by 7.6%, 5.5%, 3.8%, and 0.7%, respectively.
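A toy NumPy sketch of the low-rank observation behind CLOVER (our illustration under simple assumptions, not the authors' code): the absorbed Query-Key product of a single head has rank at most the head dimension, so a thin SVD re-expresses it exactly as orthogonal vectors plus a short vector of singular values.

```python
import numpy as np

d_model, d_head = 512, 64
W_q = np.random.randn(d_model, d_head)   # one head's query projection
W_k = np.random.randn(d_model, d_head)   # one head's key projection

# "Absorbed" Query-Key matrix used at inference time: d_model x d_model,
# but its rank is at most d_head.
absorbed = W_q @ W_k.T
print(np.linalg.matrix_rank(absorbed))   # 64

# Thin SVD: orthogonal vectors plus singular values, no information lost.
U, s, Vt = np.linalg.svd(absorbed, full_matrices=False)
U, s, Vt = U[:, :d_head], s[:d_head], Vt[:d_head, :]
print(np.allclose(absorbed, U @ np.diag(s) @ Vt))  # True up to float error

# Fine-tuning idea in the same spirit: freeze U and Vt and train only s,
# which still spans the full rank of the head.
```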
-
Do not write that jailbreak paper
Jailbreaks are becoming a new ImageNet competition instead of helping us better understand LLM security. The community should revisit its choices and focus on research that can uncover new security vulnerabilities.
-
Do vision models perceive objects like toddlers?
Despite recent advances in artificial vision systems, humans are still more data-efficient at learning strong visual representations. Psychophysical experiments suggest that toddlers develop fundamental visual properties between the ages of one and three, which affect their perceptual system for the rest of their lives. They begin to recognize impoverished variants of daily objects, pay more attention to the shape of an object when categorizing it, prefer objects in specific orientations, and progressively generalize over the configural arrangement of objects' parts. This post examines whether these four visual properties also emerge in off-the-shelf machine learning (ML) vision models. We reproduce and complement previous studies by comparing toddlers and a large set of diverse pre-trained vision models for each visual property. In this way, we unveil the interplay between these visual properties and highlight the main differences between ML models and toddlers. Code is available at https://github.com/Aubret/BabyML.
-
Does Editing Provide Evidence for Localization?
A basic aspiration for interpretability research in large language models is to localize semantically meaningful behaviors to particular components within the LLM. There are various heuristics for finding candidate locations within the LLM. Once a candidate localization is found, it can be assessed by editing the internal representations at the corresponding location and checking whether this induces model behavior that is consistent with the semantic interpretation of the localization. The question we address here is: how strong is the evidence provided by such edits? To assess localization, we want to measure the effect of the optimal intervention at a particular location. The key new technical tool is a way of adapting LLM alignment techniques to find such optimal localized edits. With this tool in hand, we give an example where the edit-based evidence for localization appears strong, but where localization clearly fails. Indeed, we find that optimal edits at random localizations can be as effective as aligning the full model. In aggregate, our results suggest that merely observing that localized edits induce targeted changes in behavior provides little to no evidence that these locations actually encode the target behavior.
-
Factual Context Validation and Simplification: A Scalable Method to Enhance GPT Trustworthiness and Efficiency
As the deployment of Large Language Models (LLMs) like GPT expands across domains, mitigating their susceptibility to factual inaccuracies or hallucinations becomes crucial for ensuring reliable performance. This blog post introduces two novel frameworks that enhance retrieval-augmented generation (RAG): one uses summarization to achieve a maximum of 57.7% storage reduction, while the other preserves critical information through statement-level extraction. Leveraging DBSCAN clustering, vectorized fact storage, and LLM-driven fact-checking, the pipelines deliver higher overall performance across benchmarks such as PubMedQA, SQuAD, and HotpotQA. By optimizing efficiency and accuracy, these frameworks advance trustworthy AI for impactful real-world applications.
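As a rough illustration of one component of such a pipeline (clustering extracted statements by embedding similarity with DBSCAN; the embeddings, parameters, and deduplication rule below are placeholders rather than the post's actual implementation):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Stand-in embeddings for extracted factual statements (e.g., from a sentence encoder).
rng = np.random.default_rng(0)
statement_embeddings = rng.normal(size=(200, 384))
statement_embeddings /= np.linalg.norm(statement_embeddings, axis=1, keepdims=True)

# Cosine-distance DBSCAN: statements in the same cluster are near-duplicates,
# so only one representative per cluster needs to be stored and fact-checked.
labels = DBSCAN(eps=0.2, min_samples=2, metric="cosine").fit_predict(statement_embeddings)
keep = {label: i for i, label in enumerate(labels) if label != -1}   # one index per cluster
representatives = sorted(keep.values()) + [i for i, l in enumerate(labels) if l == -1]
print(len(representatives), "statements kept out of", len(labels))
```

Reducing the stored statements in this way is one route to the kind of storage savings the frameworks report, with the LLM-driven fact-checking then applied only to the retained representatives.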
-
Fine-Tuning Token-Based Large Multimodal Models: What Works, What Doesn’t and What's Next
In this blog post, we explore the advancements and challenges in fine-tuning unified token-based large multimodal models, focusing on the Chameleon architecture and its fine-tuned variant, Anole. Released in 2024, these models exemplify a modern approach to integrating diverse data modalities through tokens, simplifying modality fusion and leveraging established techniques from large language models. The post details our research efforts to reveal what works, what does not, and what is worth exploring in future research on the fine-tuning process.