publications
Publications by category, in reverse chronological order. Generated by jekyll-scholar.
2025
- ICLR: Identifying latent state transitions in non-linear dynamical systems. Çağlar Hızlı, Çağatay Yıldız, Matthias Bethge, and 2 more authors. In International Conference on Learning Representations, 2025.
This work aims to recover the underlying states and their time evolution in a latent dynamical system from high-dimensional sensory measurements. Previous works on identifiable representation learning in dynamical systems focused on identifying the latent states, often with linear transition approximations. As such, they cannot identify nonlinear transition dynamics, and hence fail to reliably predict complex future behavior. Inspired by the advances in nonlinear ICA, we propose a state-space modeling framework in which we can identify not just the latent states but also the unknown transition function that maps the past states to the present. Our identifiability theory relies on two key assumptions: (i) sufficient variability in the latent noise, and (ii) the bijectivity of the augmented transition function. Drawing from this theory, we introduce a practical algorithm based on variational auto-encoders. We empirically demonstrate that it improves generalization and interpretability of target dynamical systems by (i) recovering latent state dynamics with high accuracy, (ii) correspondingly achieving high future prediction accuracy, and (iii) adapting fast to new environments. Additionally, for complex real-world dynamics, (iv) it produces state-of-the-art future prediction results for long horizons, highlighting its usefulness for practical scenarios.
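The state-space setting above can be sketched in a few lines: a nonlinear transition maps the past latent state plus noise to the present state, and a mixing (decoder) function produces high-dimensional observations. The tanh transition and random mixing matrix below are illustrative stand-ins, not the functions the paper learns.

```python
import numpy as np

def transition(z, noise):
    # Illustrative nonlinear transition f: (past state, noise) -> present state.
    # The paper learns f; tanh is only a stand-in choice here.
    return np.tanh(z) + noise

def simulate_latents(z0, n_steps, noise_scale=0.1, seed=0):
    """Roll out latent states z_t = f(z_{t-1}, eps_t)."""
    rng = np.random.default_rng(seed)
    states = [z0]
    for _ in range(n_steps):
        eps = noise_scale * rng.standard_normal(z0.shape)
        states.append(transition(states[-1], eps))
    return np.stack(states)

def observe(z, mixing_matrix):
    # High-dimensional sensory measurements arise via a mixing function g.
    return np.tanh(z @ mixing_matrix)

z0 = np.zeros(2)
latents = simulate_latents(z0, n_steps=50)
A = np.random.default_rng(1).standard_normal((2, 16))
obs = observe(latents, A)
print(latents.shape, obs.shape)  # (51, 2) (51, 16)
```

Identification then asks whether the latents and the transition function can be recovered from `obs` alone, up to the ambiguities allowed by the theory.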
- TMLR: Investigating Continual Pretraining in Large Language Models: Insights and Implications. Çağatay Yıldız, Nishaanth Kanna Ravichandran, Nitin Sharma, and 2 more authors. In Transactions on Machine Learning Research, 2025.
Continual learning (CL) in large language models (LLMs) is an evolving domain that focuses on developing efficient and sustainable training strategies to adapt models to emerging knowledge and achieve robustness in dynamic environments. Our primary emphasis is on continual domain-adaptive pretraining, a process designed to equip LLMs with the ability to integrate new information from various domains while retaining previously learned knowledge. Since existing works concentrate mostly on continual fine-tuning for a limited selection of downstream tasks or training domains, we introduce a new benchmark designed to measure the adaptability of LLMs to changing pretraining data landscapes. We further examine the impact of model size on learning efficacy and forgetting, as well as how the progression and similarity of emerging domains affect the knowledge transfer within these models. Our findings uncover several key insights: (i) continual pretraining consistently improves the <1.5B-parameter models studied in this work and is also superior to domain adaptation, (ii) larger models always achieve better perplexity than smaller ones when continually pretrained on the same corpus, (iii) smaller models are particularly sensitive to continual pretraining, showing the most significant rates of both learning and forgetting, (iv) continual pretraining boosts the downstream task performance of the GPT-2 family, (v) continual pretraining enables LLMs to specialize better when the sequence of domains shows semantic similarity, while randomizing training domains leads to better transfer and final performance otherwise. We posit that our research establishes a new benchmark for CL in LLMs, providing a more realistic evaluation of knowledge retention and transfer across diverse domains.
- CDC: Optimal Control of Probabilistic Dynamics Models via Mean Hamiltonian Minimization. David Leeftink, Çağatay Yıldız, Steffen Ridderbusch, and 2 more authors. In IEEE 64th Conference on Decision and Control, 2025.
Without exact knowledge of the true system dynamics, optimal control of non-linear continuous-time systems requires careful treatment under epistemic uncertainty. In this work, we translate a probabilistic interpretation of the Pontryagin maximum principle to the challenge of optimal control with learned probabilistic dynamics models. Our framework provides a principled treatment of epistemic uncertainty by minimizing the mean Hamiltonian with respect to a posterior distribution over the system dynamics. We propose a multiple shooting numerical method that leverages mean Hamiltonian minimization and is scalable to large-scale probabilistic dynamics models, including ensemble neural ordinary differential equations. Comparisons against other baselines in online and offline model-based reinforcement learning tasks show that our probabilistic Hamiltonian approach leads to reduced trial costs in offline settings and achieves competitive performance in online scenarios. By bridging optimal control and reinforcement learning, our approach offers a principled and practical framework for controlling uncertain systems with learned dynamics.
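The central object above, the mean Hamiltonian, can be sketched directly: average the drift term λᵀf(x, u) over a posterior ensemble of dynamics models and add the running cost. The tiny linear ensemble below is a stand-in for the ensemble neural ODEs used in the paper.

```python
import numpy as np

def mean_hamiltonian(x, u, lam, dynamics_ensemble, running_cost):
    """Mean Hamiltonian under a posterior approximated by an ensemble:
    H_bar(x, u, lam) = E_f[ lam . f(x, u) ] + running_cost(x, u)."""
    drift = np.mean([f(x, u) for f in dynamics_ensemble], axis=0)
    return float(lam @ drift + running_cost(x, u))

# Illustrative ensemble of linear dynamics dx/dt = a*x + u
# (stand-ins for the ensemble neural ODEs in the paper).
ensemble = [lambda x, u, a=a: a * x + u for a in (-0.9, -1.0, -1.1)]
cost = lambda x, u: 0.5 * float(x @ x + u @ u)  # quadratic running cost

x, u, lam = np.array([1.0]), np.array([0.5]), np.array([2.0])
H = mean_hamiltonian(x, u, lam, ensemble, cost)
print(round(H, 3))  # -0.375
```

Minimizing this quantity over `u` at each point along the trajectory is what the paper's multiple shooting method scales up to large probabilistic models.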
- preprint: From Raw Corpora to Domain Benchmarks: Automated Evaluation of LLM Domain Expertise. Nitin Sharma, Thomas Wolfers, and Çağatay Yıldız. In arXiv, 2025.
Accurate domain-specific benchmarking of LLMs is essential, especially in domains with direct implications for humans, such as law, healthcare, and education. However, existing benchmarks are documented to be contaminated and are based on multiple-choice questions, which suffer from inherent biases. To measure domain-specific knowledge in LLMs, we present a deterministic pipeline that transforms raw domain corpora into completion-style benchmarks without relying on other LLMs or costly human annotation. Our approach first extracts domain-specific keywords and related target vocabulary from an input corpus. It then constructs prompt-target pairs where domain-specific words serve as prediction targets. By measuring LLMs’ ability to complete these prompts, we provide a direct assessment of domain knowledge at low computational cost. Our pipeline avoids benchmark contamination, enables automated updates with new domain data, and facilitates fair comparisons between base and instruction-tuned (chat) models. We validate our approach by showing that model performances on our benchmark significantly correlate with those on an expert-curated benchmark. We then demonstrate how our benchmark provides insights into knowledge acquisition in domain-adaptive, continual, and general pretraining. Finally, we examine the effects of instruction fine-tuning by comparing base and chat models within our unified evaluation framework. In conclusion, our pipeline enables scalable, domain-specific, LLM-independent, and unbiased evaluation of both base and chat models.
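A toy version of such a pipeline can be sketched as follows: score words by their relative frequency against a general reference corpus, then turn each occurrence of a top keyword into a prompt-target completion pair. The frequency-ratio scoring and the exact pairing rule here are simplifications chosen only for illustration, not the paper's actual extraction method.

```python
import re
from collections import Counter

def domain_keywords(corpus, reference, top_k=3):
    """Rank words by how much more frequent they are in the domain corpus
    than in a general reference corpus (a simple stand-in scoring rule)."""
    tok = lambda text: re.findall(r"[a-z]+", text.lower())
    dom, ref = Counter(tok(corpus)), Counter(tok(reference))
    score = {w: c / (1 + ref[w]) for w, c in dom.items() if len(w) > 3}
    return [w for w, _ in sorted(score.items(), key=lambda kv: -kv[1])[:top_k]]

def completion_pairs(corpus, keywords):
    """Build prompt-target pairs: the text up to a keyword occurrence is the
    prompt, the keyword itself is the prediction target."""
    pairs = []
    for sentence in corpus.split("."):
        words = sentence.strip().split()
        for i, w in enumerate(words):
            if w.lower().strip(",") in keywords and i > 2:
                pairs.append((" ".join(words[:i]), w))
    return pairs

corpus = ("The plaintiff filed a motion before the court. "
          "The court granted the injunction requested by the plaintiff.")
reference = "The cat sat on the mat. The dog ran in the park."
kws = domain_keywords(corpus, reference)
pairs = completion_pairs(corpus, kws)
print(kws, len(pairs))
```

A model is then scored on how often it predicts the held-out target word given the prompt, which is what makes the evaluation completion-style rather than multiple-choice.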
- preprint: Object-level Self-Distillation for Vision Pretraining. Çağlar Hızlı, Çağatay Yıldız, and Pekka Marttinen. In arXiv, 2025.
State-of-the-art vision pretraining methods rely on image-level self-distillation from object-centric datasets such as ImageNet, implicitly assuming each image contains a single object. This assumption does not always hold: many ImageNet images already contain multiple objects. Further, it limits scalability to scene-centric datasets that better mirror real-world complexity. We address these challenges by introducing Object-level Self-Distillation (ODIS), a pretraining approach that shifts the self-distillation granularity from whole images to individual objects. Using object-aware cropping and masked attention, ODIS isolates object-specific regions, guiding the transformer toward semantically meaningful content and transforming a noisy, scene-level task into simpler object-level sub-tasks. We show that this approach improves visual representations both at the image and patch levels. Using masks at inference time, our method achieves an impressive 82.6% k-NN accuracy on ImageNet1k with ViT-Large.
2024
- EMNLP: Adaptation Odyssey in LLMs: Why Does Additional Pretraining Sometimes Fail to Improve? Fırat Öncel, Matthias Bethge, Beyza Ermis, and 3 more authors. In Empirical Methods in Natural Language Processing, 2024.
We investigate why further training on pretrained language models sometimes backfires. Our key finding is that training a model on a text domain could degrade its perplexity on the test portion of the same domain. We discover that this performance degradation is positively correlated with the similarity between the additional and the original pretraining dataset of the LLM, and stems from a small set of uninformative tokens. Our work aims to clarify when model adaptation helps versus when foundational capabilities should be preserved.
- CoLLAs: Infinite dSprites for Disentangled Continual Learning: Separating Memory Edits from Generalization. Sebastian Dziadzio, Çağatay Yıldız, Gido van de Ven, and 3 more authors. In Lifelong Learning Agents, 2024.
The ability of machine learning systems to learn continually is hindered by catastrophic forgetting, the tendency of neural networks to overwrite existing knowledge when learning a new task. Continual learning methods alleviate this problem through regularization, parameter isolation, or rehearsal, but they are typically evaluated on benchmarks comprising only a handful of tasks. In contrast, humans are able to learn continually in dynamic, open-world environments, effortlessly achieving one-shot memorization of unfamiliar objects and reliably recognizing them under various transformations. To make progress towards closing this gap, we introduce Infinite dSprites, a parsimonious tool for creating continual classification and disentanglement benchmarks of arbitrary length and with full control over generative factors. We show that over a sufficiently long time horizon, the performance of all major types of continual learning methods deteriorates on this simple benchmark. Thus, Infinite dSprites highlights an important aspect of continual learning that has not received enough attention so far: given a finite modelling capacity and an arbitrarily long learning horizon, efficient learning requires memorizing class-specific information and accumulating knowledge about general mechanisms. In a simple setting with direct supervision on the generative factors, we show how learning class-agnostic transformations offers a way to circumvent catastrophic forgetting and improve classification accuracy over time. Our approach sets the stage for continual learning over hundreds of tasks with explicit control over memorization and forgetting, emphasizing open-set classification and one-shot generalization.
2023
- NeurIPS: Invariant Neural Ordinary Differential Equations. Ilze Amanda Auzina, Çağatay Yıldız, Sara Magliacane, and 2 more authors. In Advances in Neural Information Processing Systems, 2023.
Latent neural ordinary differential equations have been proven useful for learning non-linear dynamics of arbitrary sequences. In contrast with their mechanistic counterparts, the predictive accuracy of neural ODEs decreases over longer prediction horizons (Rubanova et al., 2019). To mitigate this issue, we propose disentangling dynamic states from time-invariant variables in a completely data-driven way, enabling robust neural ODE models that can generalize across different settings. We show that such variables can control the latent differential function and/or parameterize the mapping from latent variables to observations. By explicitly modeling the time-invariant variables, our framework enables the use of recent advances in representation learning. We demonstrate this by introducing a straightforward self-supervised objective that enhances the learning of these variables. The experiments on low-dimensional oscillating systems and video sequences reveal that our disentangled model achieves improved long-term predictions, when the training data involve sequence-specific factors of variation such as different rotational speeds, calligraphic styles, and friction constants.
- ICLR: Latent Neural ODEs with Sparse Bayesian Multiple Shooting. Valerii Iakovlev, Çağatay Yıldız, Markus Heinonen, and 1 more author. In International Conference on Learning Representations, 2023.
Training dynamic models, such as neural ODEs, on long trajectories is a hard problem that requires using various tricks, such as trajectory splitting, to make model training work in practice. These methods are often heuristics with poor theoretical justifications, and require iterative manual tuning. We propose a principled multiple shooting technique for neural ODEs that splits the trajectories into manageable short segments, which are optimised in parallel, while ensuring probabilistic control on continuity over consecutive segments. We derive variational inference for our shooting-based latent neural ODE models and propose amortized encodings of irregularly sampled trajectories with a transformer-based recognition network with temporal attention and relative positional encoding. We demonstrate efficient and stable training, and state-of-the-art performance on multiple large-scale benchmark datasets.
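The multiple shooting idea above can be sketched as: introduce a free initial state per segment, fit each short segment to its slice of the data, and penalize the mismatch between the end of one segment and the start of the next. The quadratic continuity penalty below stands in for the probabilistic (prior-based) coupling the paper derives.

```python
import numpy as np

def rollout(f, z0, n_steps, dt):
    """Euler rollout of dz/dt = f(z) from an initial state."""
    z = [z0]
    for _ in range(n_steps):
        z.append(z[-1] + dt * f(z[-1]))
    return np.stack(z)

def shooting_loss(f, segment_starts, observations, seg_len, dt):
    """Fit loss over short segments plus a continuity penalty tying the
    end of one segment to the start of the next."""
    fit, continuity = 0.0, 0.0
    for k, s0 in enumerate(segment_starts):
        traj = rollout(f, s0, seg_len, dt)
        obs = observations[k * seg_len:(k + 1) * seg_len + 1]
        fit += float(np.sum((traj - obs) ** 2))
        if k + 1 < len(segment_starts):
            continuity += float(np.sum((traj[-1] - segment_starts[k + 1]) ** 2))
    return fit + continuity

f = lambda z: -z                      # simple linear vector field
true = rollout(f, np.array([1.0]), 40, 0.05)
starts = [true[0], true[20]]          # shooting variables, initialised at the data
loss = shooting_loss(f, starts, true, 20, 0.05)
print(loss)  # ~0 when the shooting variables match the data
```

Because each segment is short, gradients stay well-behaved, and the segments can be optimised in parallel; the paper replaces the hard penalty with variational inference over the segment states.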
2022
- NeurIPS workshop: Latent GP-ODEs with Informative Priors. Ilze Amanda Auzina, Çağatay Yıldız, and Efstratios Gavves. In A causal view on dynamical systems workshop, 2022.
For many complex systems the parametric form of the differential equation might be unknown or infeasible to determine. Earlier works have explored modelling the unknown ODE system with a Gaussian process; however, applications have been limited to low-dimensional data settings. We propose a novel framework that combines a generative and a Bayesian nonparametric model. Our model learns a physically meaningful latent representation (position, momentum) and solves an ODE system in the latent space. The use of a GP allows us to account for uncertainty and to extend our work with informative priors. We demonstrate our framework on an image rotation dataset. The method demonstrates its ability to learn dynamics from high-dimensional data, and we obtain state-of-the-art performance in dynamic forecasting compared to earlier GP-based ODE models.
- NeurIPS: Learning Interacting Dynamical Systems with Latent Gaussian Process ODEs. Çağatay Yıldız, Melih Kandemir, and Barbara Rakitsch. In Advances in Neural Information Processing Systems, 2022.
We study uncertainty-aware modeling of continuous-time dynamics of interacting objects. We introduce a new model that accurately decomposes the independent dynamics of single objects from their interactions. By employing latent Gaussian process ordinary differential equations, our model infers both independent dynamics and their interactions with reliable uncertainty estimates. In our formulation, each object is represented as a graph node and interactions are modeled by accumulating the messages coming from neighboring objects. We show that efficient inference of such a complex network of variables is possible with modern variational sparse Gaussian process inference techniques. We empirically demonstrate that our model improves the reliability of long-term predictions over neural network based alternatives and that it successfully handles missing dynamic or static information. Furthermore, we observe that only our model can successfully encapsulate independent dynamics and interaction information in distinct functions, and we show the benefit of this disentanglement in extrapolation scenarios.
- UAI: Variational multiple shooting for Bayesian ODEs with Gaussian processes. Pashupati Hegde, Çağatay Yıldız, Harri Lähdesmäki, and 2 more authors. In Uncertainty in Artificial Intelligence, 2022.
Recent machine learning advances have proposed black-box estimation of unknown continuous-time system dynamics directly from data. However, earlier works are based on approximate ODE solutions or point estimates. We propose a novel Bayesian nonparametric model that uses Gaussian processes to infer posteriors of unknown ODE systems directly from data. We derive sparse variational inference with decoupled functional sampling to represent vector field posteriors. We also introduce a probabilistic shooting augmentation to enable efficient inference from arbitrarily long trajectories. The method demonstrates the benefit of computing vector field posteriors, with predictive uncertainty scores outperforming alternative methods on multiple ODE learning tasks.
2021
- ICML: Continuous-time Model-based Reinforcement Learning. Çağatay Yıldız, Markus Heinonen, and Harri Lähdesmäki. In International Conference on Machine Learning, 2021.
Model-based reinforcement learning (MBRL) approaches rely on discrete-time state transition models whereas physical systems and the vast majority of control tasks operate in continuous-time. To avoid time-discretization approximation of the underlying process, we propose a continuous-time MBRL framework based on a novel actor-critic method. Our approach also infers the unknown state evolution differentials with Bayesian neural ordinary differential equations (ODE) to account for epistemic uncertainty. We implement and test our method on a new ODE-RL suite that explicitly solves continuous-time control systems. Our experiments illustrate that the model is robust against irregular and noisy data, is sample-efficient, and can solve control problems which pose challenges to discrete-time MBRL methods.
2019
- NeurIPS: ODE2VAE: Deep generative second order ODEs with Bayesian neural networks. Çağatay Yıldız, Markus Heinonen, and Harri Lähdesmäki. In Advances in Neural Information Processing Systems, 2019.
We present Ordinary Differential Equation Variational Auto-Encoder (ODE2VAE), a latent second order ODE model for high-dimensional sequential data. Leveraging the advances in deep generative models, ODE2VAE can simultaneously learn the embedding of high dimensional trajectories and infer arbitrarily complex continuous-time latent dynamics. Our model explicitly decomposes the latent space into momentum and position components and solves a second order ODE system, which is in contrast to recurrent neural network (RNN) based time series models and recently proposed black-box ODE techniques. In order to account for uncertainty, we propose probabilistic latent ODE dynamics parameterized by deep Bayesian neural networks. We demonstrate our approach on motion capture, image rotation, and bouncing balls datasets. We achieve state-of-the-art performance in long term motion prediction and imputation tasks.
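The core decomposition above can be sketched as a coupled first-order system: the position evolves with the momentum, and the momentum evolves under a learned acceleration field. The harmonic-oscillator acceleration below is an illustrative stand-in for the Bayesian neural network that ODE2VAE learns.

```python
import numpy as np

def second_order_ode_step(pos, mom, accel_fn, dt):
    """One Euler step of the coupled system
    d(pos)/dt = mom,  d(mom)/dt = accel_fn(pos, mom)."""
    return pos + dt * mom, mom + dt * accel_fn(pos, mom)

def integrate(pos0, mom0, accel_fn, dt, n_steps):
    pos, mom = pos0, mom0
    traj = [pos]
    for _ in range(n_steps):
        pos, mom = second_order_ode_step(pos, mom, accel_fn, dt)
        traj.append(pos)
    return np.array(traj)

# Illustrative acceleration field: a harmonic oscillator stands in for the
# deep Bayesian neural network parameterizing the latent dynamics.
accel = lambda pos, mom: -pos
traj = integrate(np.array([1.0]), np.array([0.0]), accel, dt=0.01, n_steps=628)
print(traj[0], traj[-1])  # roughly back to the start after one period (~2*pi)
```

Splitting the latent space into position and momentum in this way is what gives the model smooth, second-order trajectories, in contrast to RNN-based or first-order black-box ODE baselines.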
2018
- ICML: Learning unknown ODE models with Gaussian processes. Markus Heinonen, Çağatay Yıldız, Henrik Mannerström, and 2 more authors. In International Conference on Machine Learning, 2018.
In conventional ODE modelling, the coefficients of an equation driving the system state forward in time are estimated. However, for many complex systems it is practically impossible to determine the equations or interactions governing the underlying dynamics. In these settings, a parametric ODE model cannot be formulated. Here, we overcome this issue by introducing a novel paradigm of nonparametric ODE modelling that can learn the underlying dynamics of arbitrary continuous-time systems without prior knowledge. We propose to learn non-linear, unknown differential functions from state observations using Gaussian process vector fields within the exact ODE formalism. We demonstrate the model’s capabilities to infer dynamics from sparse data and to simulate the system forward into the future.
- ICML: Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization. Umut Simsekli, Çağatay Yıldız, Thanh Huy Nguyen, and 2 more authors. In International Conference on Machine Learning, 2018.
Recent studies have illustrated that stochastic gradient Markov Chain Monte Carlo techniques have a strong potential in non-convex optimization, where local and global convergence guarantees can be shown under certain conditions. By building up on this recent theory, in this study, we develop an asynchronous-parallel stochastic L-BFGS algorithm for non-convex optimization. The proposed algorithm is suitable for both distributed and shared-memory settings. We provide formal theoretical analysis and show that the proposed method achieves an ergodic convergence rate of O(1/√N) (N being the total number of iterations) and it can achieve a linear speedup under certain conditions. We perform several experiments on both synthetic and real datasets. The results support our theory and show that the proposed algorithm provides a significant speedup over the recently proposed synchronous distributed L-BFGS algorithm.
- MLSP: Learning Stochastic Differential Equations With Gaussian Processes Without Gradient Matching. Çağatay Yıldız, Markus Heinonen, Jukka Intosalmi, and 2 more authors. In IEEE 28th International Workshop on Machine Learning for Signal Processing, 2018.
We introduce a novel paradigm for learning non-parametric drift and diffusion functions for stochastic differential equations (SDEs). The proposed model learns to simulate path distributions that match observations with non-uniform time increments and arbitrary sparseness, which is in contrast with gradient matching that does not optimize simulated responses. We formulate sensitivity equations for learning and demonstrate that our general stochastic distribution optimisation leads to robust and efficient learning of SDE systems.
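The simulate-and-match setting above can be sketched with an Euler-Maruyama integrator over a non-uniform time grid; the Ornstein-Uhlenbeck drift and constant diffusion below are stand-ins for the nonparametric GP functions the paper learns.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, times, n_paths, seed=0):
    """Simulate SDE paths dX = drift(X) dt + diffusion(X) dW on a
    possibly non-uniform time grid."""
    rng = np.random.default_rng(seed)
    paths = np.full((n_paths, len(times)), float(x0))
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        x = paths[:, i - 1]
        dw = rng.standard_normal(n_paths) * np.sqrt(dt)
        paths[:, i] = x + drift(x) * dt + diffusion(x) * dw
    return paths

# Illustrative Ornstein-Uhlenbeck drift and constant diffusion; the paper
# learns both functions nonparametrically with Gaussian processes.
times = np.array([0.0, 0.1, 0.25, 0.5, 1.0, 2.0])  # non-uniform increments
paths = euler_maruyama(lambda x: -x, lambda x: 0.3 * np.ones_like(x),
                       x0=1.0, times=times, n_paths=2000)
print(paths.shape, round(float(paths[:, -1].mean()), 2))
```

Instead of matching finite-difference gradients, the learning signal comes from comparing the distribution of such simulated paths against the observed trajectories.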
- Workshop: A Nonparametric Spatio-temporal SDE Model. Çağatay Yıldız, Markus Heinonen, and Harri Lähdesmäki. In NeurIPS Spatiotemporal Workshop, 2018.
We propose a nonparametric spatio-temporal stochastic differential equation (SDE) model that can learn the underlying dynamics of arbitrary continuous-time systems without prior knowledge. We augment the input space of the drift function of an SDE with a temporal component to account for spatio-temporal patterns. The experiments on a real-world data set demonstrate that the spatio-temporal model is better able to fit the data than the spatial model and also reduces the forecasting error.
- DSP: A Bayesian change point model for detecting SIP-based DDoS attacks. Baris Kurt, Çağatay Yıldız, Taha Ceritli, and 2 more authors. In Digital Signal Processing, 2018.
Session Initiation Protocol (SIP), as one of the most common signaling mechanisms for Voice over Internet Protocol (VoIP) applications, is a popular target for flooding-based Distributed Denial of Service (DDoS) attacks. In this paper, we propose a DDoS attack detection framework based on a Bayesian multiple change point model, which can detect different types of flooding attacks. Additionally, we propose a probabilistic SIP network simulation system that provides a test environment for network security tools.
- SoftwareX: A real-time SIP network simulation and monitoring system. Çağatay Yıldız, Baris Kurt, Taha Ceritli, and 2 more authors. In SoftwareX, 2018.
In this work we present a real-time SIP network simulation and monitoring system. The SIP network simulator is based on a probabilistic generative model that mimics a social network of VoIP subscribers calling each other at random times. The monitoring system, installed at a SIP server, provides services for collecting network data and server statistics in real time. The system provides a robust framework for developing SIP network applications such as security monitors.
2017
- SIU: A Bayesian change point model for epileptic seizure detection. Çağatay Yıldız, Haluk Bingol, Gulcin Irim-Celik, and 2 more authors. In 25th Signal Processing and Communications Applications Conference, 2017.
Epilepsy is a chronic neurological disorder in which the normal pattern of neuronal activity in the brain becomes disturbed. Identification of the brain region that is abnormally active during an epileptic seizure is vital for epilepsy surgery. One way of achieving this is to collect electroencephalography (EEG) signals from epileptic people and then to identify the active region as a seizure occurs. In this work, we present a Bayesian change point model that detects when seizures occur. We applied our method to a data set that contains 48 EEG and electrocardiography (EKG) record pairs collected from epileptic people and observed that the model is able to detect all seizures.
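A stripped-down version of change point detection in this spirit: score every candidate split of a noisy signal by the Gaussian log-likelihood of a mean shift and return the best split. This single-change MAP rule is a simplification of the Bayesian multiple change point models used in this and the related network-security papers, shown only to illustrate the mechanism.

```python
import numpy as np

def map_changepoint(x, sigma=1.0):
    """MAP single change point in the mean under Gaussian noise:
    evaluate the log-likelihood of a mean shift at every split."""
    n = len(x)
    best_k, best_ll = None, -np.inf
    for k in range(2, n - 1):
        left, right = x[:k], x[k:]
        ll = (-np.sum((left - left.mean()) ** 2)
              - np.sum((right - right.mean()) ** 2)) / (2 * sigma ** 2)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k

rng = np.random.default_rng(0)
# Signal whose mean jumps at index 100, mimicking seizure onset in a feature.
signal = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100)])
print(map_changepoint(signal))  # near the true change at index 100
```

The full Bayesian treatment additionally places priors over the number and locations of change points, yielding a posterior rather than a single split.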
2016
- Workshop: A Dictionary Learning Based Anomaly Detection Method for Network Traffic Data. Taha Ceritli, Baris Kurt, Çağatay Yıldız, and 2 more authors. In ICML Anomaly Detection Workshop, 2016.
In this paper we propose a dictionary learning scheme to extract network traffic pattern templates for different types of anomalies and the normal traffic via nonnegative matrix factorization. We employ Bayesian change point models on the representation of the running network traffic in terms of those templates to detect network anomalies. Our proposed methods are tested and evaluated on a simulated SIP network with attacks generated by a commercial network vulnerability scanning tool.
- SIU: Attack Detection in VOIP Networks Using Bayesian Multiple Change-Point Models. Çağatay Yıldız, Taha Ceritli, Baris Kurt, and 2 more authors. In IEEE Signal Processing and Communications Applications, 2016.
One of the most commonly used network protocols in Internet telephony services is SIP. As the popularity of the protocol increases, SIP networks have become targets of DDoS attacks more frequently. In this work, we propose a Bayesian change point model to detect anomalies due to such attacks. The model monitors the network and alarms when a change in the network traffic occurs. We test the model with a data set generated by network traffic and attack simulators.
- SIU: A Probabilistic SIP Network Simulation System. Baris Kurt, Çağatay Yıldız, Taha Ceritli, and 4 more authors. In IEEE Signal Processing and Communications Applications, 2016.
Experimenting with large-scale real-world data is crucial for the development of network protocols and the investigation of their performance. However, collecting such data from real networks, and especially annotating it with ground truth, proves too tedious, if not impossible. In such cases, the use of simulated data, generated for various network scenarios, becomes a plausible alternative. To this purpose, we have developed a SIP network simulation system that employs a probabilistic network model, and we have demonstrated its use for the performance analysis of statistical (D)DoS attack detectors.