Symbolic Regression: Towards Interpretability and Automated Scientific Discovery
Taxonomy
About this Tutorial
This tutorial provides a comprehensive exploration of symbolic regression, an emerging area of AI focused on discovering interpretable mathematical expressions from data. As AI systems become increasingly integrated into critical domains, the ability to uncover transparent mathematical relationships is essential for advancing scientific understanding and developing trustworthy AI systems.
Recent advances in AI, particularly in deep learning and LLMs, have opened new paradigms in symbolic regression, enabling more sophisticated approaches to equation discovery and interpretation. These developments raise fundamental questions about how we can harness AI techniques to advance scientific understanding while maintaining interpretability.
Our tutorial is guided by the central question: “How can we leverage AI to discover meaningful mathematical expressions that advance scientific understanding while ensuring interpretability and trustworthiness?” We will explore this question through a comprehensive journey that covers:
- Foundations and Evolution: How has symbolic regression evolved from traditional search-based methods to modern AI-driven approaches? What are the key principles and challenges in discovering interpretable mathematical expressions?
- Modern Approaches: How do different paradigms, from evolutionary algorithms to transformer models and LLMs, contribute to equation discovery? How can we effectively combine these approaches?
- Evaluation and Benchmarking: What constitutes meaningful evaluation in symbolic regression? How do we design benchmarks that truly capture the ability to discover interpretable mathematical relationships?
- Impact: How can symbolic regression advance interpretable modeling and scientific discovery across different domains? What are the practical implications?
The tutorial is designed for researchers and practitioners in machine learning, AI, and scientific domains who seek to understand and contribute to the advancement of interpretable modeling. While familiarity with basic machine learning concepts is helpful, no prior experience with symbolic regression is required. Through this tutorial, attendees will gain both theoretical understanding and practical insights into symbolic regression, positioning them to contribute to this evolving field and its applications across science and industry.
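For readers new to the area, the core idea of search-based symbolic regression can be illustrated with a toy example. The sketch below is not taken from any method in the reading list; it is a minimal brute-force enumeration, assuming a hypothetical grammar (`UNARY`, `BINARY`, `LEAVES`), a synthetic target `x*x + sin(x)`, and a small complexity penalty `lam`, all chosen here purely for illustration.

```python
import math
from itertools import product

# Hypothetical toy grammar (an assumption for illustration only).
UNARY = {"sin": math.sin, "cos": math.cos}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
LEAVES = ["x", 1.0, 2.0]


def evaluate(expr, x):
    """Evaluate an expression tree (nested tuples) at the point x."""
    if expr == "x":
        return x
    if isinstance(expr, float):
        return expr
    op, *args = expr
    if op in UNARY:
        return UNARY[op](evaluate(args[0], x))
    return BINARY[op](evaluate(args[0], x), evaluate(args[1], x))


def size(expr):
    """Number of nodes in the tree, used as a simple complexity measure."""
    if isinstance(expr, tuple):
        return 1 + sum(size(arg) for arg in expr[1:])
    return 1


def enumerate_exprs(depth):
    """List every expression tree of at most the given depth."""
    if depth == 0:
        return list(LEAVES)
    smaller = enumerate_exprs(depth - 1)
    exprs = list(smaller)
    exprs += [(op, a) for op in UNARY for a in smaller]
    exprs += [(op, a, b) for op in BINARY for a, b in product(smaller, repeat=2)]
    return exprs


def mse(expr, xs, ys):
    """Mean squared error of the expression on the data."""
    return sum((evaluate(expr, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, depth=2, lam=1e-6):
    """Return the candidate minimizing error plus a small complexity penalty."""
    return min(enumerate_exprs(depth), key=lambda e: mse(e, xs, ys) + lam * size(e))


def to_string(expr):
    """Render an expression tree as readable infix text."""
    if not isinstance(expr, tuple):
        return str(expr)
    op, *args = expr
    if op in UNARY:
        return f"{op}({to_string(args[0])})"
    return f"({to_string(args[0])} {op} {to_string(args[1])})"


# Synthetic data from a known target, so recovery can be checked.
xs = [i / 5 for i in range(-10, 11)]
ys = [x * x + math.sin(x) for x in xs]

best = fit(xs, ys)
print(to_string(best), "MSE:", mse(best, xs, ys))
```

On this synthetic data the search should return an expression equivalent to `x*x + sin(x)`. Exhaustive enumeration grows too quickly to be practical beyond tiny grammars, which is why practical methods guide the search with techniques such as genetic programming, reinforcement learning, or pretrained transformers.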
Reading List
Introduction
- Interpretable scientific discovery with symbolic regression: a review
- Symbolic Regression is NP-hard
Methods
Search SR Methods
- Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl [GitHub]
- Gene-pool Optimal Mixing Evolutionary Algorithm for Genetic Programming (Evolutionary Computation’21) [GitHub]
- Symbolic Regression via Neural-Guided Genetic Programming Population Seeding (NeurIPS’21) [GitHub]
- Symbolic Physics Learner: Discovering governing equations via Monte Carlo tree search (ICLR’23) [GitHub]
- AI Feynman: A physics-inspired method for symbolic regression (Science Advances) [GitHub]
- Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients (ICLR’21) [GitHub]
Learning SR Methods
- Neural Symbolic Regression that scales (ICML’21) [GitHub]
- End-to-end Symbolic Regression with Transformers (NeurIPS’22) [GitHub]
- SymFormer: End-to-end symbolic regression using transformer-based architecture [GitHub]
- SymbolicGPT: A Generative Transformer Model for Symbolic Regression [GitHub]
- SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training (ICLR’24) [GitHub]
Learning + Search SR Methods
- Transformer-based Planning for Symbolic Regression (NeurIPS’23) [GitHub]
- A Unified Framework for Deep Symbolic Regression (NeurIPS’22) [GitHub]
- Deep Generative Symbolic Regression (ICLR’23) [GitHub]
- Efficient Generator of Mathematical Expressions for Symbolic Regression (Machine Learning’23) [GitHub]
- SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training (ICLR’24) [GitHub]
LLM-guided SR Methods
- LLM-SR: Scientific Equation Discovery via Programming with Large Language Models (ICLR’25) [GitHub]
- In-Context Symbolic Regression: Leveraging Large Language Models for Function Discovery (ACL’24) [GitHub]
- Symbolic Regression with a Learned Concept Library (NeurIPS’24) [GitHub]
Benchmarks
- Contemporary Symbolic Regression Methods and their Relative Performance (NeurIPS’21) [GitHub]
- Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery (DMLR’24) [GitHub]