I’m a PhD student at the Artificial Intelligence and Machine Learning Lab, TU Darmstadt. My main research interests cover a broad range of machine learning topics, including deep models, tractable probabilistic models such as probabilistic circuits, and their applications. Specifically, I work on bridging the gap between probabilistic circuits and deep neural networks. We want to push the limits of probabilistic circuits and aim to combine their strengths of tractable flexibility with the modeling capacity of neural networks.
Note: Until 2022 known as Steven Lang.
Contact: steven (dot) braun (at) cs (dot) tu-darmstadt (dot) de
Access to pre-trained models has recently emerged as a standard across numerous machine learning domains. Unfortunately, access to the original data the models were trained on may not equally be granted. This makes it tremendously challenging to fine-tune, compress models, adapt continually, or to do any other type of data-driven update. We posit that original data access may however not be required. Specifically, we propose Contrastive Abductive Knowledge Extraction (CAKE), a model-agnostic knowledge distillation procedure that mimics deep classifiers without access to the original data. To this end, CAKE generates pairs of noisy synthetic samples and diffuses them contrastively toward a model’s decision boundary. We empirically corroborate CAKE’s effectiveness using several benchmark datasets and various architectural choices, paving the way for broad application.
Probabilistic Circuits That Know What They Don’t Know
Fabrizio Ventola*, Steven Braun*, Zhongjie Yu, Martin Mundt, and Kristian Kersting
Proceedings of the 39th Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Probabilistic circuits (PCs) are models that allow exact and tractable probabilistic inference. In contrast to neural networks, they are often assumed to be well-calibrated and robust to out-of-distribution (OOD) data. In this paper, we show that PCs are in fact not robust to OOD data, i.e., they don’t know what they don’t know. We then show how this challenge can be overcome by model uncertainty quantification. To this end, we propose tractable dropout inference (TDI), an inference procedure to estimate uncertainty by deriving an analytical solution to Monte Carlo dropout (MCD) through variance propagation. Unlike MCD in neural networks, which comes at the cost of multiple network evaluations, TDI provides tractable sampling-free uncertainty estimates in a single forward pass. TDI improves the robustness of PCs to distribution shift and OOD data, demonstrated through a series of experiments evaluating the classification confidence and uncertainty estimates on real-world data.
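As a rough illustration of what sampling-free variance propagation can look like, the sketch below computes the mean and variance of a single dropout-masked weighted sum in closed form. It is a simplified stand-in, not the TDI procedure from the paper: the function name `dropout_sum_moments` is hypothetical, and the code assumes independent children and independent Bernoulli keep masks.

```python
def dropout_sum_moments(weights, means, variances, drop_prob):
    """Closed-form mean and variance of sum_i w_i * d_i * X_i.

    Assumptions (illustrative only): d_i ~ Bernoulli(1 - drop_prob),
    and all d_i and X_i are mutually independent.
    """
    keep = 1.0 - drop_prob
    # E[d_i * X_i] = keep * mean_i
    mean = sum(w * keep * m for w, m in zip(weights, means))
    # Var[d_i * X_i] = keep * var_i + keep * (1 - keep) * mean_i^2
    var = sum(
        w * w * (keep * v + keep * (1.0 - keep) * m * m)
        for w, m, v in zip(weights, means, variances)
    )
    return mean, var


mean, var = dropout_sum_moments([0.5, 0.5], [2.0, 4.0], [0.0, 0.0], drop_prob=0.5)
```

Applying such a rule node by node yields an uncertainty estimate in one pass, instead of averaging over many stochastic forward passes as Monte Carlo dropout does.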
Towards Coreset Learning in Probabilistic Circuits
Martin Trapp, Steven Lang, Aastha Shah, Martin Mundt, Kristian Kersting, and Arno Solin
In The 5th Workshop on Tractable Probabilistic Modeling (UAI), 2022
Probabilistic circuits (PCs) are a powerful family of tractable probabilistic models, guaranteeing efficient and exact computation of many probabilistic inference queries. However, their sparsely structured nature makes computations on large data sets challenging to perform. Recent works have focused on tensorized representations of PCs to speed up computations on large data sets. In this work, we present an orthogonal approach by sparsifying the set of n observations and show that finding a coreset of k ≪ n data points can be phrased as a monotone submodular optimisation problem, which can be solved greedily for a deterministic PC of $|\mathcal{G}|$ nodes in $\mathcal{O}(k\,n\,|\mathcal{G}|)$. Finally, we verify on a series of data sets that our greedy algorithm outperforms random selection.
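The greedy scheme for monotone submodular maximisation can be sketched generically as follows. The objective used here is a facility-location function over scalar points, chosen only as a stand-in; it is not the PC-based objective from the paper, and the names `greedy_coreset` and `make_facility_location` are hypothetical.

```python
def greedy_coreset(n, k, objective):
    """Select up to k of n indices, adding the largest marginal gain each round."""
    selected = []
    for _ in range(min(k, n)):
        base = objective(selected)
        candidates = [i for i in range(n) if i not in selected]
        # Marginal gain of adding candidate i to the current selection.
        best = max(candidates, key=lambda i: objective(selected + [i]) - base)
        selected.append(best)
    return selected


def make_facility_location(points):
    """Stand-in monotone submodular objective (an assumption, not the paper's)."""
    def objective(chosen):
        if not chosen:
            return 0.0
        # Each point is credited with its best similarity to any chosen point.
        return sum(
            max(1.0 / (1.0 + abs(p - points[j])) for j in chosen)
            for p in points
        )
    return objective


points = [0.0, 1.0, 2.0, 3.0, 50.0, 51.0, 52.0]
coreset = greedy_coreset(len(points), 2, make_facility_location(points))
```

The loop makes on the order of k·n objective evaluations, so if one evaluation costs time proportional to the circuit size, the total matches the shape of the bound stated in the abstract.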
CLEVA-Compass: A Continual Learning EValuation Assessment Compass to Promote Research Transparency and Comparability
Martin Mundt, Steven Lang, Quentin Delfosse, and Kristian Kersting
In International Conference on Learning Representations (ICLR), 2022
What is the state of the art in continual machine learning? Although a natural question for predominant static benchmarks, the notion to train systems in a lifelong manner entails a plethora of additional challenges with respect to set-up and evaluation. The latter have recently sparked a growing amount of critiques on prominent algorithm-centric perspectives and evaluation protocols being too narrow, resulting in several attempts at constructing guidelines in favor of specific desiderata or arguing against the validity of prevalent assumptions. In this work, we depart from this mindset and argue that the goal of a precise formulation of desiderata is an ill-posed one, as diverse applications may always warrant distinct scenarios. Instead, we introduce the Continual Learning EValuation Assessment Compass: the CLEVA-Compass. The compass provides the visual means to both identify how approaches are practically reported and how works can simultaneously be contextualized in the broader literature landscape. In addition to promoting compact specification in the spirit of recent replication trends, it thus provides an intuitive chart to understand the priorities of individual systems, where they resemble each other, and what elements are missing towards a fair comparison.
Elevating Perceptual Sample Quality in Probabilistic Circuits through Differentiable Sampling
Steven Lang, Martin Mundt, Fabrizio Ventola, Robert Peharz, and Kristian Kersting
In Proceedings of Machine Learning Research, Workshop on Preregistration in Machine Learning (NeurIPS), 2022
Deep generative models have seen a dramatic improvement in recent years, due to the use of alternative losses based on perceptual assessment of generated samples. This improvement has not yet been applied to the model class of probabilistic circuits (PCs), presumably due to significant technical challenges concerning differentiable sampling, which is a key requirement for optimizing perceptual losses. This is unfortunate, since PCs allow a much wider range of probabilistic inference routines than mainstream generative models, such as exact and efficient marginalization and conditioning. Motivated by the success of loss reframing in deep generative models, we incorporate perceptual metrics into the PC learning objective. To this aim, we introduce a differentiable sampling procedure for PCs, where the central challenge is the non-differentiability of sampling from the categorical distribution over latent PC variables. We take advantage of the Gumbel-Softmax trick and develop a novel inference pass to smoothly interpolate child samples as a strategy to circumvent non-differentiability of sum node sampling. We initially hypothesized that perceptual losses, unlocked by our novel differentiable sampling procedure, will elevate the generative power of PCs and improve their sample quality to be on par with neural counterparts like probabilistic auto-encoders and generative adversarial networks. Although our experimental findings empirically reject this hypothesis for now, the results demonstrate that samples drawn from PCs optimized with perceptual losses can have similar sample quality compared to likelihood-based optimized PCs and, at the same time, can express richer contrast, colors, and details. Whereas before, PCs were restricted to likelihood-based optimization, this work has paved the way to advance PCs with loss formulations that have been built around deep neural networks in recent years.
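The idea of relaxing the categorical choice at a sum node and interpolating child samples can be sketched as below. This shows only the forward computation in plain Python; in practice it would run inside an automatic differentiation framework. The function names and the example weights are assumptions for illustration, not the paper's implementation.

```python
import math
import random


def gumbel_softmax(log_weights, temperature, rng):
    """Relaxed one-hot sample over the children of a sum node."""
    # Gumbel(0, 1) noise via inverse transform; clamp u away from 0 to avoid log(0).
    noise = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in log_weights]
    scores = [(lw + g) / temperature for lw, g in zip(log_weights, noise)]
    top = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_sum_node_sample(log_weights, child_samples, temperature, rng):
    """Convex combination of child samples instead of a hard child selection."""
    probs = gumbel_softmax(log_weights, temperature, rng)
    return sum(p * s for p, s in zip(probs, child_samples))


rng = random.Random(0)
log_weights = [math.log(0.7), math.log(0.2), math.log(0.1)]  # hypothetical sum weights
sample = relaxed_sum_node_sample(log_weights, [-1.0, 0.0, 2.0], temperature=0.5, rng=rng)
```

Lower temperatures push the relaxed weights toward a hard one-hot choice, while higher temperatures blend the child samples more evenly.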
DAFNe: A One-Stage Anchor-Free Deep Model for Oriented Object Detection
Steven Lang, Fabrizio Ventola, and Kristian Kersting
We present DAFNe, a Dense one-stage Anchor-Free deep Network for oriented object detection. As a one-stage model, it performs bounding box predictions on a dense grid over the input image, being architecturally simpler in design, as well as easier to optimize than its two-stage counterparts. Furthermore, as an anchor-free model, it reduces the prediction complexity by refraining from employing bounding box anchors. With DAFNe we introduce an orientation-aware generalization of the center-ness function for arbitrarily oriented bounding boxes to down-weight low-quality predictions and a center-to-corner bounding box prediction strategy that improves object localization performance. Our experiments show that DAFNe outperforms all previous one-stage anchor-free models on DOTA 1.0, DOTA 1.5, and UCAS-AOD and is on par with the best models on HRSC2016.
Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits
Robert Peharz, Steven Lang, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Guy Van Den Broeck, Kristian Kersting, and Zoubin Ghahramani
In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020
Probabilistic circuits (PCs) are a promising avenue for probabilistic modeling, as they permit a wide range of exact and efficient inference routines. Recent “deep-learning-style” implementations of PCs strive for a better scalability, but are still difficult to train on real-world data, due to their sparsely connected computational graphs. In this paper, we propose Einsum Networks (EiNets), a novel implementation design for PCs, improving prior art in several regards. At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum-operation, leading to speedups and memory savings of up to two orders of magnitude, in comparison to previous implementations. As an algorithmic contribution, we show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation. Furthermore, we demonstrate that EiNets scale well to datasets which were previously out of reach, such as SVHN and CelebA, and that they can be used as faithful generative image models.
WekaDeeplearning4j: A deep learning package for Weka based on Deeplearning4j
Steven Lang, Felipe Bravo-Marquez, Christopher Beckham, Mark Hall, and Eibe Frank
Deep learning is a branch of machine learning that generates multi-layered representations of data, commonly using artificial neural networks, and has improved the state-of-the-art in various machine learning tasks (e.g., image classification, object detection, speech recognition, and document classification). However, most popular deep learning frameworks such as TensorFlow and PyTorch require users to write code to apply deep learning. We present WekaDeeplearning4j, a Weka package that makes deep learning accessible through a graphical user interface (GUI). The package uses Deeplearning4j as its backend, provides GPU support, and enables GUI-based training of deep neural networks such as convolutional and recurrent neural networks. It also provides pre-processing functionality for image and text data.