Frontiers of Information Technology & Electronic Engineering >> 2024, Volume 25, Issue 9 doi: 10.1631/FITEE.2300278

A novel overlapping minimization SMOTE algorithm for imbalanced classification

Affiliation(s): Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen 518107, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China

Received: 2023-04-21 Accepted: 2024-06-29 Available online: 2024-06-29

Abstract

The synthetic minority oversampling technique (SMOTE) is a popular algorithm for reducing the impact of class imbalance when building classifiers, and it has received several enhancements over the past 20 years. SMOTE and its variants synthesize a number of minority-class samples in the original sample space to alleviate the adverse effects of class imbalance. This approach works well in many cases, but problems arise when synthetic sample points are generated in overlapping areas between different classes, which further complicates classifier training. To address this issue, this paper proposes a novel generation algorithm that is generalization-oriented rather than imputation-oriented, named overlapping minimization SMOTE (OM-SMOTE). The algorithm is designed specifically for binary imbalanced classification problems. OM-SMOTE first maps the original sample points into a new sample space by balancing sample encoding and classifier generalization. It then employs a set of sophisticated imputation rules to generate synthetic sample points that are as far as possible from overlapping areas between classes. Extensive experiments were conducted on 32 imbalanced datasets to validate the effectiveness of OM-SMOTE. The results show that using OM-SMOTE to generate synthetic minority-class samples leads to better training performance for naive Bayes, support vector machine, decision tree, and logistic regression classifiers than 11 state-of-the-art SMOTE-based imputation algorithms. This demonstrates that OM-SMOTE is a viable approach for supporting the training of high-quality classifiers for imbalanced classification. The implementation of OM-SMOTE is publicly available on GitHub at https://github.com/luxuan123123/OM-SMOTE/.
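For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of the classic SMOTE interpolation step only, and it is not the OM-SMOTE method, whose implementation lives in the repository linked above. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen purely for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Classic SMOTE-style interpolation (illustrative, NOT OM-SMOTE).

    minority: list of equal-length numeric tuples (minority-class points).
    n_new:    number of synthetic points to generate.
    k:        number of nearest minority neighbours to choose from.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        # New point lies on the segment between base and its neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy minority class in two dimensions
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=5)
```

Because each synthetic point is placed on a segment between two minority points without regard to where the majority class lies, it can land in a region where the classes overlap. That is the failure mode the abstract describes and that OM-SMOTE is designed to avoid.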
