PRACTICAL EXAM

17IT115 

    Task-1: Dataset description using the Orange tool. What needs to be done to improve the classification accuracy on the given dataset? Obtain the maximum possible classification accuracy by applying the following pre-processing methods:

  • Encoding
  • Normalization
  • Missing value handling
  • Feature selection
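The four pre-processing methods named in Task-1 can be sketched in Python with pandas. This is only an illustration on a made-up toy table, not the Orange workflow used for the exam. The `preprocess` helper, the toy values, the `Risk` target column name, and the choice of correlation ranking for feature selection are all assumptions made for the example.

```python
import pandas as pd

def preprocess(df, target, top_k=2):
    """Hypothetical helper: impute, encode, normalize, then select features."""
    out = df.copy()
    features = [c for c in out.columns if c != target]

    # Missing value handling: median for numeric columns, mode for the rest.
    for col in features:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])

    # Encoding: one-hot encode the non-numeric columns.
    X = pd.get_dummies(out[features], dtype=float)

    # Normalization: min-max scale every column to the range [0, 1].
    span = (X.max() - X.min()).replace(0, 1.0)  # guard against constant columns
    X = (X - X.min()) / span

    # Feature selection (assumed method): keep the top_k columns ranked by
    # absolute correlation with the target.
    scores = X.corrwith(out[target]).abs().fillna(0.0)
    keep = scores.sort_values(ascending=False).head(top_k).index.tolist()
    return X[keep], out[target]

# Toy data invented for illustration; it is not taken from audit_risk.csv.
toy = pd.DataFrame({
    "LOCATION_ID": ["1", "2", "1", "3", "2", "3"],
    "Money_Value": [0.5, None, 12.0, 0.0, 30.0, 1.0],
    "History": [0, 0, 1, 0, 2, 0],
    "Risk": [0, 0, 1, 0, 1, 0],
})
X, y = preprocess(toy, target="Risk")
print(X.columns.tolist())
```

The order of the steps matters in this sketch: imputation runs first so that the scaling and the correlation ranking never see missing values.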

Data-set Description:-

The dataset was made available through 2 CSV files: audit_risk and trial. The audit_risk file has 27 columns and the trial file has 18 columns. Among the 27 columns in the audit_risk file are the following:


  • LOCATION_ID – Unique ID of the city or province
  • numbers - Historical discrepancy score
  • Money_Value - Amount of money involved in misstatements in the past audits
  • SCORE_MV, Risk_D - These columns can be derived from the Money_Value
  • District_Loss - Historical risk score of a district in the last 10 years
  • PROB – Probability of District_Loss
  • Risk_E – The product of District_Loss and PROB
  • History - Average historical loss suffered by the firm in the last 10 years
  • Prob – Probability of the historical loss score
  • Risk_F – The product of History and Prob
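
The two product relationships in the column list (Risk_E from District_Loss and PROB, Risk_F from History and Prob) can be written out as a short pandas sketch. The `add_risk_products` helper and the sample values are hypothetical; only the column names come from the description above.

```python
import pandas as pd

def add_risk_products(df):
    """Hypothetical helper: recompute the two derived risk columns."""
    out = df.copy()
    out["Risk_E"] = out["District_Loss"] * out["PROB"]  # district-level risk
    out["Risk_F"] = out["History"] * out["Prob"]        # historical-loss risk
    return out

# Sample values invented for illustration.
rows = pd.DataFrame({
    "District_Loss": [2, 6, 2],
    "PROB": [0.2, 0.4, 0.6],
    "History": [0, 1, 3],
    "Prob": [0.2, 0.4, 0.6],
})
with_risk = add_risk_products(rows)
print(with_risk[["Risk_E", "Risk_F"]])
```

Note that PROB and Prob are distinct columns that differ only in letter case. Because Risk_E and Risk_F are fully determined by their inputs, a feature selection step may rank them as redundant with the columns they are built from.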

The Money_Value column contains missing (null) values.
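
One way to locate and fill such missing values is sketched below with pandas. Filling with the column median is an assumption made for illustration, not necessarily the method used in the exam workflow, and the `fill_money_value` helper and sample values are hypothetical.

```python
import pandas as pd

def fill_money_value(df):
    """Hypothetical helper: replace nulls in Money_Value with the column median."""
    out = df.copy()
    median = out["Money_Value"].median()  # NaN entries are skipped by default
    out["Money_Value"] = out["Money_Value"].fillna(median)
    return out

# Sample values invented for illustration.
sample = pd.DataFrame({"Money_Value": [0.94, None, 11.75, 0.0]})

missing_before = int(sample["Money_Value"].isna().sum())  # count the nulls
filled = fill_money_value(sample)
missing_after = int(filled["Money_Value"].isna().sum())
print(missing_before, missing_after)
```

The median is less sensitive to a few very large amounts than the mean, which is why it is used in this sketch. Dropping the affected rows would be an alternative when only a handful of values are missing.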



 
 
 
 


Fig Data visualization in Orange

Fig Libraries imported

Fig After prediction

Fig After data pre-processing


 
 
Task 2:- Generate a dashboard of the pre-processed dataset from Task-1.

Extract the maximum data insights by plotting a bar chart, box plot, pie chart, and stacked chart using Power BI dashboard visualization.


Fig 5 Power BI Dashboard

Dashboard with pie chart, bar graph, and stacked chart

Colab Link:- https://colab.research.google.com/drive/1sKYAeJcdJrhcdwGd4Ak200BF5LFgyjJq
