publications
Publications by category, in reverse chronological order. Generated by jekyll-scholar.
2025
- ICLR: Identifying latent state transitions in non-linear dynamical systems. Çağlar Hızlı, Çağatay Yıldız, Matthias Bethge, and 2 more authors. In International Conference on Learning Representations, 2025.
This work aims to recover the underlying states and their time evolution in a latent dynamical system from high-dimensional sensory measurements. Previous works on identifiable representation learning in dynamical systems focused on identifying the latent states, often with linear transition approximations. As such, they cannot identify nonlinear transition dynamics, and hence fail to reliably predict complex future behavior. Inspired by the advances in nonlinear ICA, we propose a state-space modeling framework in which we can identify not just the latent states but also the unknown transition function that maps the past states to the present. Our identifiability theory relies on two key assumptions: (i) sufficient variability in the latent noise, and (ii) the bijectivity of the augmented transition function. Drawing from this theory, we introduce a practical algorithm based on variational auto-encoders. We empirically demonstrate that it improves generalization and interpretability of target dynamical systems by (i) recovering latent state dynamics with high accuracy, (ii) correspondingly achieving high future prediction accuracy, and (iii) adapting fast to new environments. Additionally, for complex real-world dynamics, (iv) it produces state-of-the-art future prediction results for long horizons, highlighting its usefulness for practical scenarios.
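The state-space setting above can be sketched in a few lines: a nonlinear transition maps the past latent state plus noise to the present state, and a mixing (decoder) function produces high-dimensional observations. The tanh transition and random mixing matrix below are illustrative stand-ins, not the functions the paper learns.

```python
import numpy as np

def transition(z, noise):
    # Illustrative nonlinear transition f: (past state, noise) -> present state.
    # The paper learns f; tanh is only a stand-in choice here.
    return np.tanh(z) + noise

def simulate_latents(z0, n_steps, noise_scale=0.1, seed=0):
    """Roll out latent states z_t = f(z_{t-1}, eps_t)."""
    rng = np.random.default_rng(seed)
    states = [z0]
    for _ in range(n_steps):
        eps = noise_scale * rng.standard_normal(z0.shape)
        states.append(transition(states[-1], eps))
    return np.stack(states)

def observe(z, mixing_matrix):
    # High-dimensional sensory measurements arise via a mixing function g.
    return np.tanh(z @ mixing_matrix)

z0 = np.zeros(2)
latents = simulate_latents(z0, n_steps=50)
A = np.random.default_rng(1).standard_normal((2, 16))
obs = observe(latents, A)
print(latents.shape, obs.shape)  # (51, 2) (51, 16)
```

Identification then asks whether the latents and the transition function can be recovered from `obs` alone, up to the ambiguities allowed by the theory.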
- TMLR: Investigating Continual Pretraining in Large Language Models: Insights and Implications. Çağatay Yıldız, Nishaanth Kanna Ravichandran, Nitin Sharma, and 2 more authors. In Transactions on Machine Learning Research, 2025.
Continual learning (CL) in large language models (LLMs) is an evolving domain that focuses on developing efficient and sustainable training strategies to adapt models to emerging knowledge and achieve robustness in dynamic environments. Our primary emphasis is on continual domain-adaptive pretraining, a process designed to equip LLMs with the ability to integrate new information from various domains while retaining previously learned knowledge. Since existing works concentrate mostly on continual fine-tuning for a limited selection of downstream tasks or training domains, we introduce a new benchmark designed to measure the adaptability of LLMs to changing pretraining data landscapes. We further examine the impact of model size on learning efficacy and forgetting, as well as how the progression and similarity of emerging domains affect the knowledge transfer within these models. Our findings uncover several key insights: (i) continual pretraining consistently improves the <1.5B-parameter models studied in this work and is also superior to domain adaptation, (ii) larger models always achieve better perplexity than smaller ones when continually pretrained on the same corpus, (iii) smaller models are particularly sensitive to continual pretraining, showing the most significant rates of both learning and forgetting, (iv) continual pretraining boosts the downstream task performance of the GPT-2 family, (v) continual pretraining enables LLMs to specialize better when the sequence of domains shows semantic similarity, while randomizing training domains leads to better transfer and final performance otherwise. We posit that our research establishes a new benchmark for CL in LLMs, providing a more realistic evaluation of knowledge retention and transfer across diverse domains.
- CDC: Optimal Control of Probabilistic Dynamics Models via Mean Hamiltonian Minimization. David Leeftink, Çağatay Yıldız, Steffen Ridderbusch, and 2 more authors. In IEEE 64th Conference on Decision and Control, 2025.
Without exact knowledge of the true system dynamics, optimal control of non-linear continuous-time systems requires careful treatment under epistemic uncertainty. In this work, we translate a probabilistic interpretation of the Pontryagin maximum principle to the challenge of optimal control with learned probabilistic dynamics models. Our framework provides a principled treatment of epistemic uncertainty by minimizing the mean Hamiltonian with respect to a posterior distribution over the system dynamics. We propose a multiple shooting numerical method that leverages mean Hamiltonian minimization and is scalable to large-scale probabilistic dynamics models, including ensemble neural ordinary differential equations. Comparisons against other baselines in online and offline model-based reinforcement learning tasks show that our probabilistic Hamiltonian approach leads to reduced trial costs in offline settings and achieves competitive performance in online scenarios. By bridging optimal control and reinforcement learning, our approach offers a principled and practical framework for controlling uncertain systems with learned dynamics.
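The central object above, the mean Hamiltonian, can be sketched directly: average the drift term λᵀf(x, u) over a posterior ensemble of dynamics models and add the running cost. The tiny linear ensemble below is a stand-in for the ensemble neural ODEs used in the paper.

```python
import numpy as np

def mean_hamiltonian(x, u, lam, dynamics_ensemble, running_cost):
    """Mean Hamiltonian under a posterior approximated by an ensemble:
    H_bar(x, u, lam) = E_f[ lam . f(x, u) ] + running_cost(x, u)."""
    drift = np.mean([f(x, u) for f in dynamics_ensemble], axis=0)
    return float(lam @ drift + running_cost(x, u))

# Illustrative ensemble of linear dynamics dx/dt = a*x + u
# (stand-ins for the ensemble neural ODEs in the paper).
ensemble = [lambda x, u, a=a: a * x + u for a in (-0.9, -1.0, -1.1)]
cost = lambda x, u: 0.5 * float(x @ x + u @ u)  # quadratic running cost

x, u, lam = np.array([1.0]), np.array([0.5]), np.array([2.0])
H = mean_hamiltonian(x, u, lam, ensemble, cost)
print(round(H, 3))  # -0.375
```

Minimizing this quantity over `u` at each point along the trajectory is what the paper's multiple shooting method scales up to large probabilistic models.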
- preprint: From Raw Corpora to Domain Benchmarks: Automated Evaluation of LLM Domain Expertise. Nitin Sharma, Thomas Wolfers, and Çağatay Yıldız. In arXiv, 2025.
Accurate domain-specific benchmarking of LLMs is essential, especially in domains with direct implications for humans, such as law, healthcare, and education. However, existing benchmarks are documented to be contaminated and are based on multiple-choice questions, which suffer from inherent biases. To measure domain-specific knowledge in LLMs, we present a deterministic pipeline that transforms raw domain corpora into completion-style benchmarks without relying on other LLMs or costly human annotation. Our approach first extracts domain-specific keywords and related target vocabulary from an input corpus. It then constructs prompt-target pairs where domain-specific words serve as prediction targets. By measuring LLMs’ ability to complete these prompts, we provide a direct assessment of domain knowledge at low computational cost. Our pipeline avoids benchmark contamination, enables automated updates with new domain data, and facilitates fair comparisons between base and instruction-tuned (chat) models. We validate our approach by showing that model performances on our benchmark significantly correlate with those on an expert-curated benchmark. We then demonstrate how our benchmark provides insights into knowledge acquisition in domain-adaptive, continual, and general pretraining. Finally, we examine the effects of instruction fine-tuning by comparing base and chat models within our unified evaluation framework. In conclusion, our pipeline enables scalable, domain-specific, LLM-independent, and unbiased evaluation of both base and chat models.
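A toy version of such a pipeline can be sketched as follows: score words by their relative frequency against a general reference corpus, then turn each occurrence of a top keyword into a prompt-target completion pair. The frequency-ratio scoring and the exact pairing rule here are simplifications chosen only for illustration, not the paper's actual extraction method.

```python
import re
from collections import Counter

def domain_keywords(corpus, reference, top_k=3):
    """Rank words by how much more frequent they are in the domain corpus
    than in a general reference corpus (a simple stand-in scoring rule)."""
    tok = lambda text: re.findall(r"[a-z]+", text.lower())
    dom, ref = Counter(tok(corpus)), Counter(tok(reference))
    score = {w: c / (1 + ref[w]) for w, c in dom.items() if len(w) > 3}
    return [w for w, _ in sorted(score.items(), key=lambda kv: -kv[1])[:top_k]]

def completion_pairs(corpus, keywords):
    """Build prompt-target pairs: the text up to a keyword occurrence is the
    prompt, the keyword itself is the prediction target."""
    pairs = []
    for sentence in corpus.split("."):
        words = sentence.strip().split()
        for i, w in enumerate(words):
            if w.lower().strip(",") in keywords and i > 2:
                pairs.append((" ".join(words[:i]), w))
    return pairs

corpus = ("The plaintiff filed a motion before the court. "
          "The court granted the injunction requested by the plaintiff.")
reference = "The cat sat on the mat. The dog ran in the park."
kws = domain_keywords(corpus, reference)
pairs = completion_pairs(corpus, kws)
print(kws, len(pairs))
```

A model is then scored on how often it predicts the held-out target word given the prompt, which is what makes the evaluation completion-style rather than multiple-choice.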
- preprint: Object-level Self-Distillation for Vision Pretraining. Çağlar Hızlı, Çağatay Yıldız, and Pekka Marttinen. In arXiv, 2025.
State-of-the-art vision pretraining methods rely on image-level self-distillation from object-centric datasets such as ImageNet, implicitly assuming each image contains a single object. This assumption does not always hold: many ImageNet images already contain multiple objects. Further, it limits scalability to scene-centric datasets that better mirror real-world complexity. We address these challenges by introducing Object-level Self-Distillation (ODIS), a pretraining approach that shifts the self-distillation granularity from whole images to individual objects. Using object-aware cropping and masked attention, ODIS isolates object-specific regions, guiding the transformer toward semantically meaningful content and transforming a noisy, scene-level task into simpler object-level sub-tasks. We show that this approach improves visual representations both at the image and patch levels. Using masks at inference time, our method achieves an impressive 82.6% k-NN accuracy on ImageNet1k with ViT-Large.
2024
- EMNLP: Adaptation Odyssey in LLMs: Why Does Additional Pretraining Sometimes Fail to Improve? Fırat Öncel, Matthias Bethge, Beyza Ermis, and 3 more authors. In Empirical Methods in Natural Language Processing, 2024.
We investigate why further training on pretrained language models sometimes backfires. Our key finding is that training a model on a text domain could degrade its perplexity on the test portion of the same domain. We discover that this performance degradation is positively correlated with the similarity between the additional and the original pretraining dataset of the LLM, and stems from a small set of uninformative tokens. Our work aims to clarify when model adaptation helps versus when foundational capabilities should be preserved.
- CoLLAs: Infinite dSprites for Disentangled Continual Learning: Separating Memory Edits from Generalization. Sebastian Dziadzio, Çağatay Yıldız, Gido van de Ven, and 3 more authors. In Lifelong Learning Agents, 2024.
The ability of machine learning systems to learn continually is hindered by catastrophic forgetting, the tendency of neural networks to overwrite existing knowledge when learning a new task. Continual learning methods alleviate this problem through regularization, parameter isolation, or rehearsal, but they are typically evaluated on benchmarks comprising only a handful of tasks. In contrast, humans are able to learn continually in dynamic, open-world environments, effortlessly achieving one-shot memorization of unfamiliar objects and reliably recognizing them under various transformations. To make progress towards closing this gap, we introduce Infinite dSprites, a parsimonious tool for creating continual classification and disentanglement benchmarks of arbitrary length and with full control over generative factors. We show that over a sufficiently long time horizon, the performance of all major types of continual learning methods deteriorates on this simple benchmark. Thus, Infinite dSprites highlights an important aspect of continual learning that has not received enough attention so far: given a finite modelling capacity and an arbitrarily long learning horizon, efficient learning requires memorizing class-specific information and accumulating knowledge about general mechanisms. In a simple setting with direct supervision on the generative factors, we show how learning class-agnostic transformations offers a way to circumvent catastrophic forgetting and improve classification accuracy over time. Our approach sets the stage for continual learning over hundreds of tasks with explicit control over memorization and forgetting, emphasizing open-set classification and one-shot generalization.
2023
- NeurIPS: Invariant Neural Ordinary Differential Equations. Ilze Amanda Auzina, Çağatay Yıldız, Sara Magliacane, and 2 more authors. In Advances in Neural Information Processing Systems, 2023.
Latent neural ordinary differential equations have been proven useful for learning non-linear dynamics of arbitrary sequences. In contrast with their mechanistic counterparts, the predictive accuracy of neural ODEs decreases over longer prediction horizons (Rubanova et al., 2019). To mitigate this issue, we propose disentangling dynamic states from time-invariant variables in a completely data-driven way, enabling robust neural ODE models that can generalize across different settings. We show that such variables can control the latent differential function and/or parameterize the mapping from latent variables to observations. By explicitly modeling the time-invariant variables, our framework enables the use of recent advances in representation learning. We demonstrate this by introducing a straightforward self-supervised objective that enhances the learning of these variables. The experiments on low-dimensional oscillating systems and video sequences reveal that our disentangled model achieves improved long-term predictions, when the training data involve sequence-specific factors of variation such as different rotational speeds, calligraphic styles, and friction constants.
- ICLR: Latent Neural ODEs with Sparse Bayesian Multiple Shooting. Valerii Iakovlev, Çağatay Yıldız, Markus Heinonen, and 1 more author. In International Conference on Learning Representations, 2023.
Training dynamic models, such as neural ODEs, on long trajectories is a hard problem that requires using various tricks, such as trajectory splitting, to make model training work in practice. These methods are often heuristics with poor theoretical justifications, and require iterative manual tuning. We propose a principled multiple shooting technique for neural ODEs that splits the trajectories into manageable short segments, which are optimised in parallel, while ensuring probabilistic control on continuity over consecutive segments. We derive variational inference for our shooting-based latent neural ODE models and propose amortized encodings of irregularly sampled trajectories with a transformer-based recognition network with temporal attention and relative positional encoding. We demonstrate efficient and stable training, and state-of-the-art performance on multiple large-scale benchmark datasets.
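The multiple shooting idea above can be sketched as: introduce a free initial state per segment, fit each short segment to its slice of the data, and penalize the mismatch between the end of one segment and the start of the next. The quadratic continuity penalty below stands in for the probabilistic (prior-based) coupling the paper derives.

```python
import numpy as np

def rollout(f, z0, n_steps, dt):
    """Euler rollout of dz/dt = f(z) from an initial state."""
    z = [z0]
    for _ in range(n_steps):
        z.append(z[-1] + dt * f(z[-1]))
    return np.stack(z)

def shooting_loss(f, segment_starts, observations, seg_len, dt):
    """Fit loss over short segments plus a continuity penalty tying the
    end of one segment to the start of the next."""
    fit, continuity = 0.0, 0.0
    for k, s0 in enumerate(segment_starts):
        traj = rollout(f, s0, seg_len, dt)
        obs = observations[k * seg_len:(k + 1) * seg_len + 1]
        fit += float(np.sum((traj - obs) ** 2))
        if k + 1 < len(segment_starts):
            continuity += float(np.sum((traj[-1] - segment_starts[k + 1]) ** 2))
    return fit + continuity

f = lambda z: -z                      # simple linear vector field
true = rollout(f, np.array([1.0]), 40, 0.05)
starts = [true[0], true[20]]          # shooting variables, initialised at the data
loss = shooting_loss(f, starts, true, 20, 0.05)
print(loss)  # ~0 when the shooting variables match the data
```

Because each segment is short, gradients stay well-behaved, and the segments can be optimised in parallel; the paper replaces the hard penalty with variational inference over the segment states.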
2022
- NeurIPS workshop: Latent GP-ODEs with Informative Priors. Ilze Amanda Auzina, Çağatay Yıldız, and Efstratios Gavves. In A causal view on dynamical systems workshop, 2022.
For many complex systems the parametric form of the differential equation might be unknown or infeasible to determine. Earlier works have explored modelling the unknown ODE system with a Gaussian process; however, applications have been limited to low-dimensional data settings. We propose a novel framework that combines a generative and a Bayesian nonparametric model. Our model learns a physically meaningful latent representation (position, momentum) and solves an ODE system in the latent space. The use of a GP allows us to account for uncertainty and to extend our work with informative priors. We demonstrate our framework on an image rotation dataset. The method demonstrates its ability to learn dynamics from high-dimensional data, and we obtain state-of-the-art performance in dynamic forecasting compared to earlier GP-based ODE models.
- NeurIPS: Learning Interacting Dynamical Systems with Latent Gaussian Process ODEs. Çağatay Yıldız, Melih Kandemir, and Barbara Rakitsch. In Advances in Neural Information Processing Systems, 2022.
We study uncertainty-aware modeling of continuous-time dynamics of interacting objects. We introduce a new model that accurately decomposes the independent dynamics of single objects from their interactions. By employing latent Gaussian process ordinary differential equations, our model infers both independent dynamics and their interactions with reliable uncertainty estimates. In our formulation, each object is represented as a graph node and interactions are modeled by accumulating the messages coming from neighboring objects. We show that efficient inference of such a complex network of variables is possible with modern variational sparse Gaussian process inference techniques. We empirically demonstrate that our model improves the reliability of long-term predictions over neural network based alternatives and that it successfully handles missing dynamic or static information. Furthermore, we observe that only our model can successfully encapsulate independent dynamics and interaction information in distinct functions, and we show the benefit of this disentanglement in extrapolation scenarios.
- UAI: Variational multiple shooting for Bayesian ODEs with Gaussian processes. Pashupati Hegde, Çağatay Yıldız, Harri Lähdesmäki, and 2 more authors. In Uncertainty in Artificial Intelligence, 2022.
Recent machine learning advances have proposed black-box estimation of unknown continuous-time system dynamics directly from data. However, earlier works are based on approximate ODE solutions or point estimates. We propose a novel Bayesian nonparametric model that uses Gaussian processes to infer posteriors of unknown ODE systems directly from data. We derive sparse variational inference with decoupled functional sampling to represent vector field posteriors. We also introduce a probabilistic shooting augmentation to enable efficient inference from arbitrarily long trajectories. The method demonstrates the benefit of computing vector field posteriors, with predictive uncertainty scores outperforming alternative methods on multiple ODE learning tasks.
2021
- ICML: Continuous-time Model-based Reinforcement Learning. Çağatay Yıldız, Markus Heinonen, and Harri Lähdesmäki. In International Conference on Machine Learning, 2021.
Model-based reinforcement learning (MBRL) approaches rely on discrete-time state transition models whereas physical systems and the vast majority of control tasks operate in continuous-time. To avoid time-discretization approximation of the underlying process, we propose a continuous-time MBRL framework based on a novel actor-critic method. Our approach also infers the unknown state evolution differentials with Bayesian neural ordinary differential equations (ODE) to account for epistemic uncertainty. We implement and test our method on a new ODE-RL suite that explicitly solves continuous-time control systems. Our experiments illustrate that the model is robust against irregular and noisy data, is sample-efficient, and can solve control problems which pose challenges to discrete-time MBRL methods.
2019
- NeurIPS: ODE2VAE: Deep generative second order ODEs with Bayesian neural networks. Çağatay Yıldız, Markus Heinonen, and Harri Lähdesmäki. In Advances in Neural Information Processing Systems, 2019.
We present Ordinary Differential Equation Variational Auto-Encoder (ODE2VAE), a latent second order ODE model for high-dimensional sequential data. Leveraging the advances in deep generative models, ODE2VAE can simultaneously learn the embedding of high dimensional trajectories and infer arbitrarily complex continuous-time latent dynamics. Our model explicitly decomposes the latent space into momentum and position components and solves a second order ODE system, which is in contrast to recurrent neural network (RNN) based time series models and recently proposed black-box ODE techniques. In order to account for uncertainty, we propose probabilistic latent ODE dynamics parameterized by deep Bayesian neural networks. We demonstrate our approach on motion capture, image rotation, and bouncing balls datasets. We achieve state-of-the-art performance in long term motion prediction and imputation tasks.
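The core decomposition above can be sketched as a coupled first-order system: the position evolves with the momentum, and the momentum evolves under a learned acceleration field. The harmonic-oscillator acceleration below is an illustrative stand-in for the Bayesian neural network that ODE2VAE learns.

```python
import numpy as np

def second_order_ode_step(pos, mom, accel_fn, dt):
    """One Euler step of the coupled system
    d(pos)/dt = mom,  d(mom)/dt = accel_fn(pos, mom)."""
    return pos + dt * mom, mom + dt * accel_fn(pos, mom)

def integrate(pos0, mom0, accel_fn, dt, n_steps):
    pos, mom = pos0, mom0
    traj = [pos]
    for _ in range(n_steps):
        pos, mom = second_order_ode_step(pos, mom, accel_fn, dt)
        traj.append(pos)
    return np.array(traj)

# Illustrative acceleration field: a harmonic oscillator stands in for the
# deep Bayesian neural network parameterizing the latent dynamics.
accel = lambda pos, mom: -pos
traj = integrate(np.array([1.0]), np.array([0.0]), accel, dt=0.01, n_steps=628)
print(traj[0], traj[-1])  # roughly back to the start after one period (~2*pi)
```

Splitting the latent space into position and momentum in this way is what gives the model smooth, second-order trajectories, in contrast to RNN-based or first-order black-box ODE baselines.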
2018
- ICML: Learning unknown ODE models with Gaussian processes. Markus Heinonen, Çağatay Yıldız, Henrik Mannerström, and 2 more authors. In International Conference on Machine Learning, 2018.
In conventional ODE modelling, the coefficients of an equation driving the system state forward in time are estimated. However, for many complex systems it is practically impossible to determine the equations or interactions governing the underlying dynamics. In these settings, a parametric ODE model cannot be formulated. Here, we overcome this issue by introducing a novel paradigm of nonparametric ODE modelling that can learn the underlying dynamics of arbitrary continuous-time systems without prior knowledge. We propose to learn non-linear, unknown differential functions from state observations using Gaussian process vector fields within the exact ODE formalism. We demonstrate the model’s capabilities to infer dynamics from sparse data and to simulate the system forward into the future.
- ICML: Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization. Umut Simsekli, Çağatay Yıldız, Thanh Huy Nguyen, and 2 more authors. In International Conference on Machine Learning, 2018.
Recent studies have illustrated that stochastic gradient Markov Chain Monte Carlo techniques have a strong potential in non-convex optimization, where local and global convergence guarantees can be shown under certain conditions. By building up on this recent theory, in this study, we develop an asynchronous-parallel stochastic L-BFGS algorithm for non-convex optimization. The proposed algorithm is suitable for both distributed and shared-memory settings. We provide formal theoretical analysis and show that the proposed method achieves an ergodic convergence rate of O(1/√N) (N being the total number of iterations) and it can achieve a linear speedup under certain conditions. We perform several experiments on both synthetic and real datasets. The results support our theory and show that the proposed algorithm provides a significant speedup over the recently proposed synchronous distributed L-BFGS algorithm.
- MLSP: Learning Stochastic Differential Equations With Gaussian Processes Without Gradient Matching. Çağatay Yıldız, Markus Heinonen, Jukka Intosalmi, and 2 more authors. In IEEE 28th International Workshop on Machine Learning for Signal Processing, 2018.
We introduce a novel paradigm for learning non-parametric drift and diffusion functions for stochastic differential equations (SDEs). The proposed model learns to simulate path distributions that match observations with non-uniform time increments and arbitrary sparseness, which is in contrast with gradient matching that does not optimize simulated responses. We formulate sensitivity equations for learning and demonstrate that our general stochastic distribution optimisation leads to robust and efficient learning of SDE systems.
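The simulate-and-match setting above can be sketched with an Euler-Maruyama integrator over a non-uniform time grid; the Ornstein-Uhlenbeck drift and constant diffusion below are stand-ins for the nonparametric GP functions the paper learns.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, times, n_paths, seed=0):
    """Simulate SDE paths dX = drift(X) dt + diffusion(X) dW on a
    possibly non-uniform time grid."""
    rng = np.random.default_rng(seed)
    paths = np.full((n_paths, len(times)), float(x0))
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        x = paths[:, i - 1]
        dw = rng.standard_normal(n_paths) * np.sqrt(dt)
        paths[:, i] = x + drift(x) * dt + diffusion(x) * dw
    return paths

# Illustrative Ornstein-Uhlenbeck drift and constant diffusion; the paper
# learns both functions nonparametrically with Gaussian processes.
times = np.array([0.0, 0.1, 0.25, 0.5, 1.0, 2.0])  # non-uniform increments
paths = euler_maruyama(lambda x: -x, lambda x: 0.3 * np.ones_like(x),
                       x0=1.0, times=times, n_paths=2000)
print(paths.shape, round(float(paths[:, -1].mean()), 2))
```

Instead of matching finite-difference gradients, the learning signal comes from comparing the distribution of such simulated paths against the observed trajectories.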
- Workshop: A Nonparametric Spatio-temporal SDE Model. Çağatay Yıldız, Markus Heinonen, and Harri Lähdesmäki. In NeurIPS Spatiotemporal Workshop, 2018.
We propose a nonparametric spatio-temporal stochastic differential equation (SDE) model that can learn the underlying dynamics of arbitrary continuous-time systems without prior knowledge. We augment the input space of the drift function of an SDE with a temporal component to account for spatio-temporal patterns. The experiments on a real-world data set demonstrate that the spatio-temporal model is better able to fit the data than the spatial model and also reduces the forecasting error.
- DSP: A Bayesian change point model for detecting SIP-based DDoS attacks. Baris Kurt, Çağatay Yıldız, Taha Ceritli, and 2 more authors. In Digital Signal Processing, 2018.
Session Initiation Protocol (SIP), as one of the most common signaling mechanisms for Voice over Internet Protocol (VoIP) applications, is a popular target for flooding-based Distributed Denial of Service (DDoS) attacks. In this paper, we propose a DDoS attack detection framework based on a Bayesian multiple change point model, which can detect different types of flooding attacks. Additionally, we propose a probabilistic SIP network simulation system that provides a test environment for network security tools.
- SoftwareX: A real-time SIP network simulation and monitoring system. Çağatay Yıldız, Baris Kurt, Taha Ceritli, and 2 more authors. In SoftwareX, 2018.
In this work we present a real-time SIP network simulation and monitoring system. The SIP network simulator is based on a probabilistic generative model that mimics a social network of VoIP subscribers calling each other at random times. The monitoring system, installed at a SIP server, provides services for collecting network data and server statistics in real time. The system provides a robust framework for developing SIP network applications such as security monitors.
2017
- SIU: A Bayesian change point model for epileptic seizure detection. Çağatay Yıldız, Haluk Bingol, Gulcin Irim-Celik, and 2 more authors. In 25th Signal Processing and Communications Applications Conference, 2017.
Epilepsy is a chronic neurological disorder in which the normal pattern of neuronal activity in the brain becomes disturbed. Identification of the brain region that is abnormally active during an epileptic seizure is vital for epilepsy surgery. One way of achieving this is to collect electroencephalography (EEG) signals from epileptic people and then to identify the active region as a seizure occurs. In this work, we present a Bayesian change point model that detects when seizures occur. We applied our method to a data set that contains 48 EEG and electrocardiography (EKG) record pairs collected from epileptic people and observed that the model is able to detect all seizures.
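A stripped-down version of change point detection in this spirit: score every candidate split of a noisy signal by the Gaussian log-likelihood of a mean shift and return the best split. This single-change MAP rule is a simplification of the Bayesian multiple change point models used in this and the related network-security papers, shown only to illustrate the mechanism.

```python
import numpy as np

def map_changepoint(x, sigma=1.0):
    """MAP single change point in the mean under Gaussian noise:
    evaluate the log-likelihood of a mean shift at every split."""
    n = len(x)
    best_k, best_ll = None, -np.inf
    for k in range(2, n - 1):
        left, right = x[:k], x[k:]
        ll = (-np.sum((left - left.mean()) ** 2)
              - np.sum((right - right.mean()) ** 2)) / (2 * sigma ** 2)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k

rng = np.random.default_rng(0)
# Signal whose mean jumps at index 100, mimicking seizure onset in a feature.
signal = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100)])
print(map_changepoint(signal))  # near the true change at index 100
```

The full Bayesian treatment additionally places priors over the number and locations of change points, yielding a posterior rather than a single split.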
2016
- Workshop: A Dictionary Learning Based Anomaly Detection Method for Network Traffic Data. Taha Ceritli, Baris Kurt, Çağatay Yıldız, and 2 more authors. In ICML Anomaly Detection Workshop, 2016.
In this paper we propose a dictionary learning scheme to extract network traffic pattern templates for different types of anomalies and the normal traffic via nonnegative matrix factorization. We employ Bayesian change point models on the representation of the running network traffic in terms of those templates to detect network anomalies. Our proposed methods are tested and evaluated on a simulated SIP network with attacks generated by a commercial network vulnerability scanning tool.
- SIU: Attack Detection in VOIP Networks Using Bayesian Multiple Change-Point Models. Çağatay Yıldız, Taha Ceritli, Baris Kurt, and 2 more authors. In IEEE Signal Processing and Communications Applications, 2016.
One of the most commonly used network protocols in Internet telephony services is SIP. As the popularity of the protocol increases, SIP networks have become targets of DDoS attacks more frequently. In this work, we propose a Bayesian change point model to detect anomalies due to such attacks. The model monitors the network and alarms when a change in the network traffic occurs. We test the model with a data set generated by network traffic and attack simulators.
- SIU: A Probabilistic SIP Network Simulation System. Baris Kurt, Çağatay Yıldız, Taha Ceritli, and 4 more authors. In IEEE Signal Processing and Communications Applications, 2016.
Experimenting with large-scale real-world data is crucial for the development of network protocols and the investigation of their performance. However, collecting such data from real networks, and especially annotating it with ground truth, proves too tedious, if not impossible. In such cases, the use of simulated data, generated for various network scenarios, becomes a plausible alternative. To this purpose, we have developed a SIP network simulation system that employs a probabilistic network model, and we have demonstrated its use for the performance analysis of statistical (D)DoS attack detectors.