scikit-learn-feature_selection

2023-12-13 04:31:30

1. Removing features with low variance

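A low variance means the feature barely changes from sample to sample, so it carries little information. Features whose variance falls below a given threshold are removed.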
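The idea above can be sketched with scikit-learn's VarianceThreshold transformer. The toy boolean data and the threshold 0.8 * (1 - 0.8) are illustrative assumptions, not values from the original post; for a Bernoulli feature the variance is p(1 - p), so this threshold drops columns that take the same value in more than about 80% of the samples.

```python
from sklearn.feature_selection import VarianceThreshold

# Toy boolean dataset (assumed for illustration): 6 samples, 3 features.
X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]

# Keep only features whose variance is above 0.8 * (1 - 0.8) = 0.16.
selector = VarianceThreshold(threshold=0.8 * (1 - 0.8))
X_reduced = selector.fit_transform(X)

print(selector.variances_)     # per-feature variance computed during fit
print(selector.get_support())  # boolean mask of the features that were kept
print(X_reduced.shape)
```

Here the first column is 0 in five of the six samples, so its variance (about 0.14) is below the threshold and it is the one that gets dropped.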

Univariate feature selection

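Univariate selection scores each feature on its own with a statistical test and keeps the best ones (for example, the top k). scikit-learn provides the following methods: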

SelectKBest removes all but the k highest scoring features.
SelectPercentile removes all but a user-specified highest scoring percentage of features.
SelectFpr, SelectFdr, and SelectFwe apply common univariate statistical tests to each feature and select by false positive rate, false discovery rate, and family-wise error rate, respectively.
GenericUnivariateSelect performs univariate feature selection with a configurable strategy, which makes it possible to pick the best univariate selection strategy with a hyper-parameter search estimator.

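from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif

# Load the iris data: 150 samples, 4 features
X, y = load_iris(return_X_y=True)
print(X.shape)      # (150, 4)

# Keep the 2 features with the highest ANOVA F-scores
X_new = SelectKBest(f_classif, k=2).fit_transform(X, y)
print(X_new.shape)  # (150, 2)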
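
The other selectors in the list follow the same fit/transform pattern. As a rough sketch (the percentile of 50 and the "k_best" mode with param=2 are arbitrary choices made here for illustration), SelectPercentile and GenericUnivariateSelect could be used on the same iris data like this:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import (
    GenericUnivariateSelect,
    SelectPercentile,
    f_classif,
)

X, y = load_iris(return_X_y=True)

# Keep the top 50% of features ranked by ANOVA F-score.
X_half = SelectPercentile(f_classif, percentile=50).fit_transform(X, y)

# The same kind of selection expressed through the configurable wrapper:
# mode picks the strategy, param is that strategy's parameter (here k=2).
generic = GenericUnivariateSelect(f_classif, mode="k_best", param=2)
X_generic = generic.fit_transform(X, y)

print(X_half.shape, X_generic.shape)
print(generic.scores_)  # F-score of each original feature
```

Because mode and param are ordinary constructor parameters, they can be tuned in a grid search alongside the downstream estimator.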


Example

https://scikit-learn.org/stable/auto_examples/feature_selection/plot_feature_selection.html#sphx-glr-download-auto-examples-feature-selection-plot-feature-selection-py

Recursive feature elimination

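Given an external estimator that assigns weights to features (for example, the coefficients of a linear model), recursive feature elimination (RFE) selects features by recursively considering smaller and smaller feature sets. First, the estimator is trained on the initial feature set, and the importance of each feature is obtained through a specific attribute (such as coef_ or feature_importances_) or through a callable. Then the least important features are pruned from the current set. The procedure is repeated recursively on the pruned set until the desired number of features is reached.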
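
The procedure described above can be sketched with scikit-learn's RFE class. The choice of LogisticRegression as the external estimator, the iris data, and the target of 2 features are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# External estimator that exposes coef_ after fitting (assumed choice).
estimator = LogisticRegression(max_iter=1000)

# Drop one feature per iteration (step=1) until 2 features remain.
rfe = RFE(estimator, n_features_to_select=2, step=1)
rfe.fit(X, y)

print(rfe.support_)  # boolean mask of the selected features
print(rfe.ranking_)  # 1 = selected; larger values were eliminated earlier

X_new = rfe.transform(X)
print(X_new.shape)
```

The ranking_ attribute records the elimination order, so it also shows which feature would be the next to go if fewer features were requested.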

Feature selection using SelectFromModel

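SelectFromModel is a meta-transformer that can be used with any estimator that assigns an importance to each feature, either through a specific attribute (such as coef_ or feature_importances_) or through an importance_getter callable after fitting. A feature is considered unimportant and removed if its importance is below the supplied threshold parameter. Besides specifying the threshold numerically, there are built-in heuristics for finding a threshold from a string argument. The available heuristics are "mean", "median", and float multiples of these, such as "0.1*mean".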
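
A minimal sketch of both threshold styles follows. The random forest, its settings, and the iris data are assumptions chosen for illustration; any estimator exposing coef_ or feature_importances_ could stand in.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# Estimator that exposes feature_importances_ after fitting (assumed choice).
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# String heuristic: keep features whose importance is at least the median.
selector = SelectFromModel(forest, threshold="median")
X_new = selector.fit_transform(X, y)
print(selector.threshold_)  # the numeric threshold the heuristic resolved to
print(X_new.shape)

# Scaled heuristic: keep features with importance >= 0.1 * mean importance.
loose = SelectFromModel(forest, threshold="0.1*mean").fit(X, y)
print(loose.get_support())  # boolean mask of the features that were kept
```

The scaled form is handy when the plain mean or median prunes too aggressively: a small multiplier removes only features that are nearly irrelevant to the model.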

Source: https://blog.csdn.net/weixin_39107270/article/details/134944752