Semi-supervised feature selection based on fuzzy related family

Authors: Zhijun Guo, Yang Shen, Tian Yang, Yuan-Jiang Li, Yanfang Deng, Yuhua Qian

Abstract:

Current machine learning algorithms face challenges such as missing labels and high dimensionality. Feature selection is an effective dimensionality reduction technique: by eliminating irrelevant and redundant features, it improves the efficiency and accuracy of subsequent machine learning tasks. Because fully labeled data are difficult to obtain, partially labeled data have become an important setting for machine learning models. The related family is an efficient rough set-based feature selection approach; however, it cannot be applied to semi-supervised learning tasks. This paper therefore introduces a semi-supervised feature selection method for partially labeled data based on a fuzzy related family. First, a novel fuzzy covering system is established, and the fuzzy label values of unlabeled samples are computed from fuzzy similarity relations. Next, a fuzzy related family is constructed from a consistent fuzzy set. Finally, a semi-supervised feature selection algorithm, called the Semi-supervised Fuzzy Related Family (SFRF), is developed using the established feature significance measure. Compared with existing semi-supervised feature selection algorithms, SFRF considerably improves feature selection efficiency while preserving classification accuracy: across twelve datasets, the average reduction efficiency improved by a factor of up to 109.
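The abstract describes the SFRF pipeline only at a high level: assign fuzzy labels to unlabeled samples from fuzzy similarity, then select features greedily by a significance measure. The sketch below is a generic illustration of that kind of pipeline and is not SFRF itself. The similarity relation (minimum over features of 1 - |difference| on min-max normalized data), the max-similarity rule for fuzzy labels, and the fuzzy-rough dependency used as the significance measure are all assumptions. The fuzzy covering system and the related-family construction are not reproduced, and every function name is hypothetical.

```python
import numpy as np


def normalize(X):
    """Min-max normalize each column to [0, 1]; constant columns become 0."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    return (X - X.min(axis=0)) / span


def similarity(X, features):
    """Assumed fuzzy similarity on a feature subset:
    R[i, j] = min over f in features of 1 - |X[i, f] - X[j, f]|."""
    sub = X[:, features]                                # (n, k)
    diff = np.abs(sub[:, None, :] - sub[None, :, :])    # (n, n, k)
    return (1.0 - diff).min(axis=2)


def fuzzy_labels(X, y, n_classes):
    """Fuzzy label memberships D (n x n_classes); y == -1 marks unlabeled.

    Hypothetical rule: labeled samples keep a one-hot label; an unlabeled
    sample gets, per class, its highest similarity to a labeled sample of
    that class, normalized so its memberships sum to 1."""
    R = similarity(X, list(range(X.shape[1])))
    labeled = y >= 0
    D = np.zeros((len(y), n_classes))
    D[labeled, y[labeled]] = 1.0
    for i in np.where(~labeled)[0]:
        for c in range(n_classes):
            members = labeled & (y == c)
            D[i, c] = R[i, members].max() if members.any() else 0.0
        total = D[i].sum()
        D[i] = D[i] / total if total > 0 else 1.0 / n_classes
    return D


def dependency(X, D, features):
    """Fuzzy-rough dependency of D on a feature subset (assumed measure).

    Lower approximation with the Kleene-Dienes implicator:
    lower[i, c] = min_j max(1 - R[i, j], D[j, c]); the positive region of
    sample i is max_c lower[i, c], and dependency is its mean."""
    if not features:
        return 0.0
    R = similarity(X, features)
    lower = np.maximum(1.0 - R[:, :, None], D[None, :, :]).min(axis=1)
    return float(lower.max(axis=1).mean())


def greedy_select(X, D, eps=1e-6):
    """Forward selection: repeatedly add the feature whose significance
    (dependency gain) is largest; stop when no gain exceeds eps."""
    selected, remaining, current = [], list(range(X.shape[1])), 0.0
    while remaining:
        gains = [dependency(X, D, selected + [f]) - current for f in remaining]
        best = int(np.argmax(gains))
        if gains[best] <= eps:
            break
        current += gains[best]
        selected.append(remaining.pop(best))
    return selected


# Toy data: feature 0 separates the classes, feature 1 is constant.
X = np.array([[0.0, 0.5], [0.1, 0.5], [0.9, 0.5],
              [1.0, 0.5], [0.05, 0.5], [0.95, 0.5]])
y = np.array([0, 0, 1, 1, -1, -1])   # last two samples are unlabeled
Xn = normalize(X)
D = fuzzy_labels(Xn, y, n_classes=2)
print(greedy_select(Xn, D))          # [0]
```

On this toy data, the unlabeled samples receive fuzzy labels leaning toward the class of their nearest labeled neighbors, and only the informative feature is kept, because adding the constant feature leaves the similarity relation, and hence the dependency, unchanged.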

Keywords:


Mon Dec 30 17:51:00 CST 2024