PRACTICAL - 2

 Aim: Perform the following data pre-processing (feature selection/elimination) tasks using Python

THEORY:

 Feature selection is one of the core concepts in machine learning and hugely impacts the performance of your model. The features you use to train your machine learning models have a large influence on the performance you can achieve.

Feature engineering is the process of transforming the collected data into features that better represent the underlying problem to the model, improving its efficiency and precision.

  • Univariate Selection
  • Recursive Feature Elimination
  • Principal Component Analysis
  • Feature Importance  

 

Univariate Selection:
    Univariate feature selection works by selecting the best features based on univariate statistical tests. It can be seen as a preprocessing step before an estimator. Scikit-learn exposes feature selection routines as objects that implement the transform method:
  • SelectKBest removes all but the k highest-scoring features
  • SelectPercentile removes all but a user-specified highest-scoring percentage of features
  • Selection using common univariate statistical tests for each feature: false positive rate (SelectFpr), false discovery rate (SelectFdr), or family-wise error (SelectFwe)
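The univariate approach can be sketched with scikit-learn's SelectKBest and the chi-squared test on the iris dataset; k=2 is an arbitrary choice for illustration:

```python
# Univariate feature selection: keep the k features with the highest
# chi-squared scores. Iris features are non-negative, so chi2 is valid here.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print(X.shape)               # (150, 4) - original feature matrix
print(X_new.shape)           # (150, 2) - only the 2 best features kept
print(selector.scores_)      # chi-squared score for each original feature
```

Swapping `score_func` for `f_classif` (ANOVA F-value) would work the same way for data with negative feature values.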
Recursive feature elimination (RFE): 
    Unlike the univariate method, RFE starts by fitting a model on the entire set of features and computing an importance score for each predictor. The weakest features are then removed, the model is re-fitted, and importance scores are computed again until the specified number of features remains. Feature importance is ranked using the model's coef_ or feature_importances_ attributes, and a small number of features is eliminated recursively in each loop.
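The loop described above can be sketched with scikit-learn's RFE wrapper; the choice of logistic regression as the base estimator and the target of 2 features are illustrative assumptions:

```python
# Recursive Feature Elimination: repeatedly fit the estimator, rank
# features by coef_, and drop the weakest until 2 features remain.
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

estimator = LogisticRegression(max_iter=1000)
rfe = RFE(estimator, n_features_to_select=2, step=1)  # drop 1 feature per loop
rfe.fit(X, y)

print(rfe.support_)   # boolean mask: True for the selected features
print(rfe.ranking_)   # rank 1 = selected; higher = eliminated earlier
```

Any estimator exposing coef_ or feature_importances_ (e.g. a tree ensemble) can be used in place of logistic regression.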

Fig 1


Fig 2

Feature Importance:

    Methods that use ensembles of decision trees (like Random Forest or Extra Trees) can also compute the relative importance of each attribute. These importance values can be used to inform a feature selection process.

This recipe shows the construction of an Extra Trees ensemble of the iris flowers dataset and the display of the relative feature importance.
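A minimal sketch of that recipe, fitting an Extra Trees ensemble on iris and printing the relative importance of each of the four features (n_estimators and random_state are arbitrary choices):

```python
# Ensemble-based feature importance: each tree's impurity decreases are
# averaged to give a relative importance per feature (summing to 1).
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

X, y = load_iris(return_X_y=True)

model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# One importance value per feature; larger means more informative.
print(model.feature_importances_)
```

These scores can feed a selector such as SelectFromModel to keep only features above an importance threshold.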

Principal Component Analysis:
Principal Component Analysis (PCA) is a method for dimensionality reduction, also called a data reduction technique. A useful property of PCA is that you can choose the number of dimensions, or principal components, in the transformed result.
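A minimal sketch of PCA on the iris data, projecting the four original features onto three principal components (n_components=3 is an arbitrary choice):

```python
# PCA: project the data onto the directions of maximum variance,
# keeping only the requested number of principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (150, 3) - reduced feature matrix
print(pca.explained_variance_ratio_)   # variance captured by each component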

Conclusion:- We learned why the feature selection/elimination cycle is important in pre-processing and how each of these techniques works.



Google Colab Link :-

https://colab.research.google.com/drive/1wI8HxovpUMG4XLw9N064ULqfJZTNx_cu?usp=sharing

 
