Research Article Open Access

A Multi-Split Cross-Strategy for Enhancing Machine Learning Algorithms Prediction Results with Data Generated by Conditional Generative Adversarial Network

Abdelfattah Abassi1, Brahim Bakkas2,3, Mostapha El Jai2,4, Ahmed Arid2 and Hussain Benazza2
  • 1 Department of Computer Science, Ecole Nationale Supérieure d'Arts et Métiers (ENSAM-MEKNES), Moulay Ismail University, Meknes, Morocco
  • 2 Department of Computer Science, Ecole Nationale Supérieure d'Arts et Métiers (ENSAM-MEKNES), Moulay Ismail University, Meknes, Morocco
  • 3 Department of Computer Science, Regional Center for Teaching and Training Professions, Meknes, Morocco
  • 4 Euromed Center of Research, Euromed Polytechnic School, Euromed University, FEZ, Morocco

Abstract

In this study, we present a Multi-Split Cross-Strategy (MSC-Strategy) designed to leverage synthetic tabular data generated by a Conditional Generative Adversarial Network (CGAN). We investigate whether synthetic data, compared with real-world data, can improve the predictive results of machine learning models. First, we develop a CGAN architecture tailored to tabular data and train it on a comprehensive real-world dataset. Second, we validate the generated data to ensure its statistical fidelity to the distribution of the real data. Finally, we select a subset of the generated data and apply our strategy to build a new combined training set comprising the real training set and the chosen subset. To validate our approach, we employ six diverse regression models, including Decision Tree (DT), K-Nearest Neighbors (KNN), Random Forest (RF), XGB Regressor (XGB), and Support Vector Regressor (SVR). Each model is trained and tested using four training sets: real data, generated data, combined data (the real training set plus the generated data), and the set formed by our MSC-Strategy. Our findings indicate that the training set formed by the MSC-Strategy yields better predictive performance than real-world data or generated data, showing that it can enhance the predictions of machine learning models using only a subset of the generated data.
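The comparison of training sets described in the abstract can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the toy dataset, the `fake_generator` function (a noise-based placeholder standing in for sampling from a trained CGAN), the hand-written KNN regressor, and the random choice of a generated subset. In particular, the "real + subset" set below is not the MSC-Strategy, whose splitting procedure is not specified in the abstract; it only shows the general idea of augmenting the real training set with part of the generated data and scoring every variant on the same held-out real test set.

```python
import math
import random


def make_real_data(n, rng):
    """Toy stand-in for a real tabular dataset: y = 3*x1 - 2*x2 + noise."""
    rows = []
    for _ in range(n):
        x1, x2 = rng.uniform(0, 1), rng.uniform(0, 1)
        y = 3 * x1 - 2 * x2 + rng.gauss(0, 0.1)
        rows.append(((x1, x2), y))
    return rows


def fake_generator(train, n, rng):
    """Hypothetical placeholder for CGAN sampling: jitter real training rows."""
    rows = []
    for _ in range(n):
        (x1, x2), y = rng.choice(train)
        features = (x1 + rng.gauss(0, 0.02), x2 + rng.gauss(0, 0.02))
        rows.append((features, y + rng.gauss(0, 0.02)))
    return rows


def knn_predict(train, x, k=5):
    """Predict the mean target of the k nearest training rows (Euclidean)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(train, test, k=5):
    """Root mean squared error of the KNN regressor on the test rows."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in test]
    return math.sqrt(sum(errors) / len(errors))


def compare_training_sets(seed=0):
    """Score four training-set variants on one held-out real test set."""
    rng = random.Random(seed)
    data = make_real_data(300, rng)
    real_train, real_test = data[:200], data[200:]
    generated = fake_generator(real_train, 200, rng)
    subset = rng.sample(generated, 50)  # assumed selection rule: random
    training_sets = {
        "real": real_train,
        "generated": generated,
        "combined": real_train + generated,
        "real + subset": real_train + subset,  # NOT the paper's MSC-Strategy
    }
    return {name: rmse(rows, real_test) for name, rows in training_sets.items()}


if __name__ == "__main__":
    for name, score in compare_training_sets().items():
        print(f"{name:>14}: RMSE = {score:.4f}")
```

In practice one would likely swap the hand-written KNN for library implementations of the regressors named above and replace `fake_generator` with samples drawn from the trained CGAN. The test set is kept purely real in this sketch so that every variant is scored against the same reference.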

Journal of Computer Science
Volume 20 No. 7, 2024, 700-707

DOI: https://doi.org/10.3844/jcssp.2024.700.707

Submitted On: 7 February 2024 Published On: 9 April 2024

How to Cite: Abassi, A., Bakkas, B., Jai, M. E., Arid, A. & Benazza, H. (2024). A Multi-Split Cross-Strategy for Enhancing Machine Learning Algorithms Prediction Results with Data Generated by Conditional Generative Adversarial Network. Journal of Computer Science, 20(7), 700-707. https://doi.org/10.3844/jcssp.2024.700.707

Keywords

  • Conditional Generative Adversarial Networks
  • Tabular Data Generation
  • Machine Learning