`OneHotEncoder` is a class in the `sklearn.preprocessing` module of the scikit-learn library ¹. It encodes categorical features as a one-hot numeric array ¹. The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features ¹. The features are encoded with a one-hot encoding scheme, which creates one binary column per category and returns either a sparse matrix (the default) or a dense array, depending on the `sparse_output` parameter ¹. By default, the encoder derives the categories from the unique values in each feature; alternatively, you can specify them manually through the `categories` parameter ¹. This encoding is required when feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels ¹. Note that one-hot encoding of target (y) labels should use `LabelBinarizer` instead ¹.
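For target labels, the text recommends `LabelBinarizer` rather than `OneHotEncoder`. A minimal sketch with hypothetical class names: unlike `OneHotEncoder`, it takes a 1-D array of labels instead of a 2-D feature matrix, and it can map the binary matrix back to labels.

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical 1-D target labels.
y = ["cat", "dog", "bird", "dog"]

lb = LabelBinarizer()
Y = lb.fit_transform(y)  # dense array, one column per class (classes sorted)

print(lb.classes_)              # the sorted class labels
print(Y)                        # one-hot rows aligned with y
print(lb.inverse_transform(Y))  # recovers the original labels
```

One design difference to keep in mind: with only two classes, `LabelBinarizer` returns a single 0/1 column rather than two one-hot columns.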