PRACTICAL - 5

 

Aim: Data pre-processing and text analytics using Orange.


What is text analytics?

Text analytics is an automated process for finding patterns in large volumes of raw text data and using them for the growth of the organization. It helps an organization understand trends and make better decisions.


What is sentiment analysis?

Sentiment analysis refers to the use of natural language processing, text analysis, computational linguistics and biometrics to systematically identify, extract, quantify and study affective states and subjective information.


Can we get the sentiment of people from their tweets? Why is it useful?

Yes, we can determine the sentiment of people from their tweets. Each tweet can be classified as positive, negative, or neutral, and based on the results we can take action accordingly.
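
To make the positive / negative / neutral classification concrete, here is a minimal sketch of a lexicon-based scorer in plain Python. It is not the method used by Orange; the word lists and the function name tweet_sentiment are assumptions made only for illustration, and a real lexicon would be far larger.

```python
# Toy lexicon-based sentiment scorer.
# The word lists below are illustrative assumptions, not a real sentiment lexicon.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "poor"}


def tweet_sentiment(text):
    """Label a tweet as positive, negative, or neutral by counting lexicon hits."""
    # Split on whitespace, strip surrounding punctuation, and lowercase each token.
    words = [w.strip(".,!?;:\"'()#@").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A tweet with more positive than negative words is labelled positive, one with more negative words is labelled negative, and a tie (including no hits at all) is labelled neutral.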


Pre-Processing on Data


We applied several techniques to pre-process the data.

Discretization

Discretization is the process of transforming continuous data into discrete form. We achieve this by dividing the range of values into intervals of equal length.
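
The equal-length interval idea can be sketched in plain Python as follows. This is only an illustration of the technique, not the code Orange runs; the function name equal_width_bins and the choice to return bin indices are assumptions.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to the index (0..n_bins-1) of its equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would compute to index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

For example, with values ranging from 1 to 10 and three bins, each interval is 3 units wide, so 2.5 falls in the first bin and 7.0 in the last.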


This is how we perform Discretization.


This is how our data looks after Discretization.

Continuization

Continuization is the process of converting discrete attributes into continuous ones, or of removing discrete attributes from the table.
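
One common way to turn a discrete attribute into continuous ones is to create a 0/1 indicator column for each distinct value. The sketch below shows that idea in plain Python; it is an assumption that this matches the option chosen in the practical, and the function name one_hot is hypothetical.

```python
def one_hot(column):
    """Replace a discrete column with one 0/1 indicator column per distinct value."""
    # Sort the distinct values so the output column order is deterministic.
    categories = sorted(set(column))
    encoded = [[1.0 if value == c else 0.0 for c in categories] for value in column]
    return categories, encoded
```

Each row ends up with exactly one 1.0, in the column that corresponds to its original value.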

This is how we perform Continuization.


This is how our data looks after Continuization.

Normalization

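Normalization is converting the source data into another format that allows the data to be processed effectively. In data pre-processing this means rescaling numeric attributes to a common scale, for example to the interval [0, 1] or to zero mean and unit variance, so that attributes measured in different units contribute comparably. (Minimizing or excluding duplicated data is the goal of database normalization, which is a different technique.)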
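
As a pre-processing step for numeric attributes, normalization is usually done by rescaling each column. The sketch below shows two common variants in plain Python, min-max scaling and standardization; the function names are assumptions, and it is not the code Orange runs internally.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Min-max scaling keeps the shape of the distribution but is sensitive to outliers, while standardization centres the data and expresses each value in standard deviations from the mean.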


This is how we perform Normalization.


This is how our data looks after Normalization.

Randomization

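Randomization is the process of adding noise to the data so that the behavior of individual records is masked. In Orange, the Randomize widget does this by shuffling the values of the selected columns (class, attributes, or metas) across the rows.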
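
A minimal sketch of randomization, assuming it is done by shuffling the values of one column across rows so that individual records no longer carry their original value. The function name shuffle_column, the list-of-dicts row format, and the seed parameter are illustrative assumptions.

```python
import random


def shuffle_column(rows, column, seed=0):
    """Return a copy of rows with the values of one column permuted across rows."""
    # A dedicated, seeded generator makes the shuffle reproducible.
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    # Build new row dicts so the input rows are left unmodified.
    return [{**row, column: v} for row, v in zip(rows, values)]
```

The column keeps the same set of values overall, so summary statistics of that column are unchanged, but the link between each value and its original record is broken.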


This is how we perform Randomization.

This is how our data looks after Randomization.

Orange also provides the ability to pre-process data in Python using its Python Script widget.


Python code for Continuization of data.


Python code for Discretization of data.


Python code for Normalization of data.


Python code for Randomization of data.

Conclusion:
We learned that data pre-processing can be done using the different methods available in Orange.
