Forwarded from Data Science Archive (小熊猫)

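When it comes to dimensionality reduction / feature selection, most EDA routines judge feature importance from the loss of the trained model. A simple and very effective alternative is feature permutation inside cross-validation (CV). Shuffle each original feature to create a "shadow" copy (you can also add some noise). Shuffling keeps the feature's distribution but destroys any link to the target, so the shadow's importance shows what chance alone produces. Then compare the importance z-scores of the real and shadow features to decide whether a feature matters, and repeat the screening iteratively. ESL II (The Elements of Statistical Learning, 2nd ed.) mentions this approach on page 593. In R, the Boruta package does this, and there is also a Python implementation: https://github.com/scikit-learn-contrib/boruta_py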
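Below is a minimal, numpy-only sketch of the shadow-feature test, under several stated assumptions. Importance here is the absolute OLS t-statistic of each column, a z-score-like quantity. It is not the random-forest importance that Boruta actually uses. The sketch has no cross-validation loop. The keep/drop decision is a plain, uncorrected two-sided binomial test on how often a real feature beats the best shadow. The function names (`ols_abs_z`, `binom_tail_ge`, `shadow_select`) are hypothetical, chosen only for illustration.

```python
import math

import numpy as np


def ols_abs_z(X, y):
    """Return |coef / std. error| for each column of X in an OLS fit with intercept."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])  # prepend an intercept column
    xtx_inv = np.linalg.inv(A.T @ A)
    coef = xtx_inv @ A.T @ y
    resid = y - A @ coef
    sigma2 = resid @ resid / (n - A.shape[1])  # unbiased residual variance
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return np.abs(coef / se)[1:]  # drop the intercept's statistic


def binom_tail_ge(k, n, p=0.5):
    """P(Binomial(n, p) >= k)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def shadow_select(X, y, n_iter=20, alpha=0.05, seed=0):
    """Hypothetical Boruta-style screen: count how often each real feature
    beats the best shadow feature, then classify it with a binomial test."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    hits = np.zeros(d, dtype=int)
    for _ in range(n_iter):
        # Shadow features: every column permuted independently, which keeps its
        # marginal distribution but breaks any relationship with y.
        shadow = np.column_stack([rng.permutation(X[:, j]) for j in range(d)])
        z = ols_abs_z(np.hstack([X, shadow]), y)
        real_z, shadow_z = z[:d], z[d:]
        hits += real_z > shadow_z.max()  # did the real feature beat the best shadow?
    decisions = []
    for h in hits.tolist():
        if binom_tail_ge(h, n_iter) < alpha:  # P(hits >= h) is small: too many wins for chance
            decisions.append("confirmed")
        elif binom_tail_ge(n_iter - h, n_iter) < alpha:  # P(hits <= h), by symmetry at p=0.5
            decisions.append("rejected")
        else:
            decisions.append("tentative")
    return hits, decisions


# Toy data: only columns 0 and 1 drive y; columns 2-5 are pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)
hits, decisions = shadow_select(X, y)
print(hits, decisions)
```

The sketch compares each real feature against the *maximum* shadow importance rather than against its own shadow. This makes the test conservative: a real feature must beat the best score that pure chance produced in that round. For real workloads, the linked boruta_py package provides a `BorutaPy` class. It wraps a scikit-learn tree ensemble and exposes the result through `support_` and `ranking_`.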

https://t.center/DataScienceArchive/114

小熊猫, Jan 26, 2022 at 05:46