Highlights:

  • New Python library ‘gridfm-datakit-v1’ enables scalable and realistic generation of Power Flow (PF) and Optimal Power Flow (OPF) datasets.
  • Developed by a global team of 14 researchers including Alban Puech and Matteo Mazzonelli.
  • Addresses key limitations in existing PF and OPF datasets such as lack of realistic perturbations and cost variability.
  • Supports large-scale power grids up to 10,000 buses with stochastic load modeling and topology perturbations.

TLDR:

gridfm-datakit-v1 is a new open-source Python library that generates realistic and scalable datasets for power systems. It overcomes limitations in existing tools, helping machine learning models better understand diverse grid conditions and improve optimization of energy networks.

A team of international researchers has released **gridfm-datakit-v1**, an open-source Python library for generating Power Flow (PF) and Optimal Power Flow (OPF) datasets. The work, by **Alban Puech**, **Matteo Mazzonelli**, **Celia Cintas**, **Tamara R. Govindasamy**, **Mangaliso Mngomezulu**, **Jonas Weiss**, **Matteo Baù**, **Anna Varbella**, **François Mirallès**, **Kibaek Kim**, **Le Xie**, **Hendrik F. Hamann**, **Etienne Vos**, and **Thomas Brunschwiler**, addresses shortcomings in existing grid data generation methods. Their study, published on [arXiv](https://arxiv.org/abs/2512.14658), focuses on improving how machine learning (ML) solvers are trained and tested in the energy systems domain.

Traditional power system datasets often suffer from limited realism and diversity, which can significantly reduce the generalization ability of ML algorithms trained on them. Existing data collections are typically constrained to OPF-feasible points, meaning they do not cover scenarios where grid operation exceeds safe limits, such as voltage violations or branch overloads. Furthermore, fixed generator cost assumptions reduce the adaptability of ML models trained on such data. **gridfm-datakit-v1** addresses these problems in two ways: it models loads stochastically, combining global load scaling based on real-world profiles with localized noise injection, and it supports N-k topology perturbations, in which k grid components are taken out of service. The result is a richer set of training scenarios that reflects the unpredictability of real-world power systems.
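The load and topology perturbations described above can be illustrated with a minimal sketch. It is not the library's API: the function names `perturb_loads` and `sample_outages`, the uniform noise model, and the `sigma` parameter are assumptions made for illustration only, and a real N-k sampler would also have to check that the grid stays connected after the outages.

```python
import random


def perturb_loads(base_load_mw, profile, sigma=0.05, seed=0):
    """Build one load scenario per profile step (hypothetical helper).

    base_load_mw: nominal load at each bus.
    profile: system-wide scaling factors, e.g. taken from a real load curve.
    sigma: half-width of the per-bus multiplicative noise (assumed uniform).
    """
    rng = random.Random(seed)
    scenarios = []
    for factor in profile:
        # Global step: every bus follows the same system-wide factor.
        # Local step: each bus gets its own independent noise multiplier.
        scenario = [
            factor * load * rng.uniform(1.0 - sigma, 1.0 + sigma)
            for load in base_load_mw
        ]
        scenarios.append(scenario)
    return scenarios


def sample_outages(n_branches, k, seed=0):
    """Pick k distinct branch indices to take out of service (N-k).

    This sketch does not check connectivity of the remaining network.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_branches), k))


loads = perturb_loads([10.0, 20.0, 30.0], profile=[0.8, 1.0, 1.2])
outages = sample_outages(n_branches=10, k=2)
```

Each row of `loads` is one scenario and stays within `sigma` of the globally scaled load, while `outages` lists the branches to disconnect before running a power flow on that scenario.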

Technically, the library scales efficiently to large grids, handling systems with up to 10,000 buses. It can create datasets that go beyond operating limits, offering ML practitioners a broader spectrum of training cases that includes grid failures and operational violations. For OPF datasets, gridfm-datakit-v1 introduces variable generator cost functions, so that models see different system cost structures during training. The study compares the library with major existing ones such as **OPFData**, **OPF-Learn**, **PGLearn**, and **PFΔ**, reporting greater diversity and scalability. The library is licensed under Apache 2.0, is available through GitHub, and can be installed with `pip install gridfm-datakit`, making it accessible to researchers and engineers developing data-driven power system solutions.
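To make the two ideas in this paragraph concrete, here is a small sketch of varying generator costs and of flagging limit violations instead of discarding them. It does not reproduce the library's implementation: the helper names, the quadratic cost form `c2*p**2 + c1*p + c0`, the scaling range `delta`, and the voltage and loading thresholds are all assumptions chosen for illustration.

```python
import random


def perturb_costs(coeffs, delta=0.2, seed=0):
    """Scale each generator's cost coefficients by a random factor.

    coeffs: list of (c2, c1, c0) tuples, one per generator (assumed form).
    delta: each generator's factor is drawn uniformly from [1-delta, 1+delta].
    """
    rng = random.Random(seed)
    perturbed = []
    for c2, c1, c0 in coeffs:
        # One factor per generator keeps the shape of its cost curve
        # while changing its price relative to the other generators.
        factor = rng.uniform(1.0 - delta, 1.0 + delta)
        perturbed.append((c2 * factor, c1 * factor, c0 * factor))
    return perturbed


def find_violations(vm_pu, loading_pct, vmin=0.95, vmax=1.05,
                    max_loading=100.0):
    """Return indices of buses and branches outside the assumed limits."""
    bad_buses = [i for i, v in enumerate(vm_pu) if not vmin <= v <= vmax]
    bad_branches = [i for i, pct in enumerate(loading_pct)
                    if pct > max_loading]
    return bad_buses, bad_branches


costs = perturb_costs([(0.01, 20.0, 100.0), (0.02, 15.0, 50.0)])
bad_buses, bad_branches = find_violations([1.0, 0.93, 1.07], [80.0, 120.0])
```

In a dataset that keeps points beyond operating limits, the output of a check like `find_violations` would be stored as a label alongside the sample rather than used as a filter.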

The release of gridfm-datakit-v1 represents a pivotal advancement for the intersection of artificial intelligence and energy systems engineering. By enabling more diverse, realistic, and scalable dataset generation, the library is expected to accelerate the development of robust ML-based solvers, contributing to smarter and more resilient energy infrastructures globally.

Source:

https://arxiv.org/abs/2512.14658
