04 July 2025

SymMatika: Structure-Aware Symbolic Discovery

Preprint 2025

Michael Scherk, Duke University, www.linkedin.com/in/michael-scherk
Boyuan Chen, Duke University, boyuanchen.com

Overview

Symbolic regression (SR) seeks to recover closed-form mathematical expressions that describe observed data. Existing methods have advanced the discovery of either explicit mappings (i.e., $y = f(x)$) or implicit relations (i.e., $F(x, y)=0$), but few modern, accessible frameworks support both. Moreover, most approaches treat each candidate expression in isolation, without reusing recurring structural patterns that could accelerate search. We introduce SymMatika, a hybrid SR algorithm that combines multi-island genetic programming (GP) with a reusable motif library inspired by biological sequence analysis. SymMatika identifies high-impact substructures in top-performing candidates and reintroduces them to guide future generations. It also incorporates a feedback-driven evolutionary engine and supports both explicit and implicit relation discovery using implicit-derivative metrics. Across benchmarks, SymMatika achieves state-of-the-art recovery rates: 5.1% higher performance than the previous best results on Nguyen, the first recovery of Nguyen-12, and competitive performance on the Feynman equations. It also recovers implicit physical laws from Eureqa datasets up to 100 times faster. Our results demonstrate the power of structure-aware evolutionary search for scientific discovery. To support broader research in interpretable modeling and symbolic discovery, we have open-sourced the full SymMatika framework.
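To give a feel for why implicit relations need a derivative-based score, here is a minimal Python sketch. It is not SymMatika's implementation, and this page does not specify the exact metric used. The function name `tangent_alignment_error` and its parameters are hypothetical. The sketch relies only on the implicit function theorem: if $F(x, y) = 0$ holds along a trajectory, then $dy/dx = -(\partial F/\partial x)/(\partial F/\partial y)$, so the gradient of $F$ must be orthogonal to the trajectory's tangent. Instead of comparing derivative ratios directly, the sketch checks that orthogonality in normalized form, which avoids dividing by zero. A trivial candidate such as $F = 0$ has a zero gradient and receives the worst score, whereas a plain residual $|F(x, y)|$ would rate it as perfect.

```python
import numpy as np

def tangent_alignment_error(grad_F, x, y, t):
    """Score a candidate implicit relation F(x, y) = 0 (hypothetical helper).

    grad_F(x, y) returns the arrays (dF/dx, dF/dy) of the candidate.
    The result lies in [0, 1]; lower means the candidate's gradient is
    closer to orthogonal to the observed trajectory.
    """
    # Tangent of the observed trajectory, estimated by finite differences.
    dx_dt = np.gradient(x, t)
    dy_dt = np.gradient(y, t)

    Fx, Fy = grad_F(x, y)

    # |cos(angle)| between the gradient and the tangent; 0 means orthogonal.
    dot = Fx * dx_dt + Fy * dy_dt
    scale = np.hypot(Fx, Fy) * np.hypot(dx_dt, dy_dt)

    # Points where the gradient (or the tangent) vanishes get the worst score,
    # which is what penalizes trivial candidates such as F = 0.
    err = np.ones_like(x, dtype=float)
    valid = scale > 1e-12
    err[valid] = np.abs(dot[valid]) / scale[valid]
    return float(err.mean())

# Synthetic data on the unit circle, x^2 + y^2 = 1.
t = np.linspace(0.0, 2.0 * np.pi, 400)
x, y = np.cos(t), np.sin(t)

# Three candidates, each given by its analytic gradient.
circle_err = tangent_alignment_error(lambda x, y: (2 * x, 2 * y), x, y, t)   # F = x^2 + y^2 - 1
line_err = tangent_alignment_error(
    lambda x, y: (np.ones_like(x), np.ones_like(y)), x, y, t)                # F = x + y
trivial_err = tangent_alignment_error(
    lambda x, y: (np.zeros_like(x), np.zeros_like(y)), x, y, t)              # F = 0

print(circle_err, line_err, trivial_err)
```

On this data the circle candidate scores close to zero, the line candidate scores well above it, and the trivial candidate scores 1.0. A real system would typically obtain the candidate's gradient by symbolic or automatic differentiation rather than by hand.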

Video (Click to YouTube)

Video Figure

Paper

Check out our paper at https://arxiv.org/abs/2507.03110.

Codebase

Check out our codebase at https://github.com/generalroboticslab/SymMatika for code, instructions, terminal usage, and GUI usage.

Citation

@misc{scherk2025symmatikastructureawaresymbolicdiscovery,
      title={SymMatika: Structure-Aware Symbolic Discovery}, 
      author={Michael Scherk and Boyuan Chen},
      year={2025},
      eprint={2507.03110},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.03110}, 
}

Acknowledgment

This work is supported by ARO under award W911NF2410405, the DARPA FoundSci program under award HR00112490372, the DARPA TIAMAT program under award HR00112490419, and gift support from BMW and OpenAI.

Categories

AI for Science, Data Driven Dynamical System