Sentiment Analysis on tweets gave me interesting results.
1. Extract tweets with the #Leonardo hashtag
2. Generate a CSV of the tweets
3. Extract the required information
4. Natural language processing: tokenizing, stemming, etc.
5. Classify them as positive, negative, or neutral
RT @FindingSquishy_: If #Leonardo Di Caprio wins an Oscar tonight, Tumblr will probably break
if #Leonardo di Caprio doesn't win an oscar I am going to scream
RT @Mohammed_Meho: #Leonardo Di Caprio better win an Oscar tonight.
RT @Miralemcc: #The Wolf of the Wall Street and# Leonardo di Caprio for #Oscars2014
#Leonardo Di Caprio doesn't deserve and never has deserved an oscar. Deal with it
STEP 1: Scrape tweets for the required tag.
This can be done using the Twitter API, or you can use online sites that search tweets and extract the search results from them. Many sites give you sentiment analysis results directly, such as the NCSU project :
Stanford Project :
But I chose TwitterSeeker, which just gives you the search results without sentiments, because I wanted to do the sentiment analysis myself.
TwitterSeeker generates an Excel sheet with all the tweet information.
You can filter it by selecting English as the language. In the image I applied no filter.
The generated Excel file has the user name, time of posting, tweet text, and many other optional fields. In the current case I am only concerned with the tweet.
STEP 2: Generate a CSV of the tweets.
I used a CSV file as the input to the ML algorithms. CSV is the comma-separated values format, in which the columns are separated by a delimiter. After getting the Excel file from TwitterSeeker, I converted it into a CSV file.
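As a minimal sketch of reading the tweet column back out of such a CSV, the snippet below uses Python's standard csv module. The column names (user, time, tweet) and the placeholder user and time values are assumptions for illustration; the real export may label its columns differently.

```python
import csv
import io

def read_tweets(csv_file, column="tweet"):
    """Return the text in `column` for every row where it is non-empty."""
    reader = csv.DictReader(csv_file)
    return [row[column] for row in reader if row.get(column)]

# In-memory stand-in for the exported file. The header names and the
# user/time values are hypothetical; only the tweet text matters here.
sample = io.StringIO(
    "user,time,tweet\n"
    'user_a,t1,"If #Leonardo Di Caprio wins an Oscar tonight, Tumblr will probably break"\n'
    "user_b,t2,#Leonardo Di Caprio better win an Oscar tonight.\n"
)

tweets = read_tweets(sample)
for tweet in tweets:
    print(tweet)
```

Note that the first tweet contains a comma, so it has to be quoted in the file; the csv module handles that quoting, which is why it is safer than splitting each line on commas by hand.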
STEP 3: Extract Required Information:
This is the step where your knowledge of data mining comes into use. At present I am only concerned with one column, the tweet. A tweet is generally in a form like
Username @User #tag Link
which can vary randomly.
I removed all the unnecessary words from it: all usernames, tags, and links.
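The cleanup can be sketched with a few regular expressions. This is one possible approach, not necessarily the exact one I used; the patterns and the function name clean_tweet are illustrative assumptions, and stripping the leading "RT" marker is an extra assumption on top of the usernames, tags, and links mentioned above.

```python
import re

RT_RE = re.compile(r"^RT\s+")              # leading retweet marker (assumed unwanted)
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+:?")         # @username, with an optional trailing colon
HASHTAG_RE = re.compile(r"#\w+")           # the whole tag, including the word after '#'

def clean_tweet(text):
    """Strip the retweet marker, links, usernames, and hashtags from a tweet."""
    text = RT_RE.sub("", text)
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        text = pattern.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())

print(clean_tweet("RT @Mohammed_Meho: #Leonardo Di Caprio better win an Oscar tonight."))
# prints: Di Caprio better win an Oscar tonight.
```

Removing the whole hashtag also drops the word attached to it. If that word carries sentiment, replacing only the "#" character would be the gentler choice.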
#updated every day.
STEP 4: Tag Generation.
Get tags for all tweets.
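A rough sketch of tag generation, assuming a "tag" here simply means a lowercased word from the cleaned tweet with common stop words dropped. The stop word list and the function name generate_tags are hypothetical, and stemming is left out to keep the example short.

```python
import re

# Tiny illustrative stop word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "to", "of", "and", "if", "i", "it"}

# A word is a run of letters, optionally followed by an apostrophe part ("doesn't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")

def generate_tags(clean_text):
    """Split cleaned tweet text into lowercase word tags, dropping stop words."""
    words = WORD_RE.findall(clean_text.lower())
    return [word for word in words if word not in STOP_WORDS]

print(generate_tags("Di Caprio better win an Oscar tonight."))
# prints: ['di', 'caprio', 'better', 'win', 'oscar', 'tonight']
```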
STEP 5: Sentiment Analysis:
For sentiment analysis I am using the ANEW (Affective Norms for English Words) dataset from the University of Florida.
Our dictionary dataset is composed of 3 main components:
Valence: the pleasantness of the stimulus
Arousal: the intensity of the provoked emotion
Dominance: the degree of control exerted by the stimulus
We decided to use the arousal ratings to estimate the polarity of a tweet. The following steps were followed:
- Generate tags for each tweet.
- For each word i in the tweet that exists in the arousal dictionary, extract the mean and standard deviation of valence, arousal, and dominance.
- Count the number of tags for each tweet. If there are zero or one, ignore the tweet, because there is too little information to estimate from.
- To calculate the overall mean and standard deviation of each Twitter feed, numerically average the means and standard deviations of its n generated tags.
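The steps above can be sketched as follows. The lexicon values are made up for illustration and are NOT real ANEW ratings. The names TOY_LEXICON and score_tweet are hypothetical, the sketch treats one tweet as the unit being scored, and it assumes the "zero or one" rule counts only the tags found in the dictionary.

```python
from statistics import mean

DIMENSIONS = ("valence", "arousal", "dominance")

# Hypothetical entries, NOT real ANEW values: word -> {dimension: (mean, sd)}
TOY_LEXICON = {
    "win":    {"valence": (8.0, 1.0), "arousal": (7.0, 2.0), "dominance": (7.0, 2.0)},
    "scream": {"valence": (3.0, 2.0), "arousal": (7.0, 2.0), "dominance": (4.0, 2.0)},
}

def score_tweet(tags, lexicon, min_matches=2):
    """Average the per-word (mean, sd) pairs of the tags found in the lexicon.

    Returns None when fewer than `min_matches` tags are found, since there
    is too little information to estimate from.
    """
    matched = [lexicon[tag] for tag in tags if tag in lexicon]
    if len(matched) < min_matches:
        return None
    return {
        dim: (
            mean(entry[dim][0] for entry in matched),  # average of the word means
            mean(entry[dim][1] for entry in matched),  # average of the word sds
        )
        for dim in DIMENSIONS
    }

print(score_tweet(["win", "oscar", "scream"], TOY_LEXICON))
print(score_tweet(["win", "oscar"], TOY_LEXICON))  # only one match, so None
```

Averaging standard deviations directly follows the step as described. It is a simple summary of spread, not the statistically exact standard deviation of the combined ratings.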