Twitter Trends Analysis using Apache Spark on a local 2-node cluster

ashok133/Twitter-Trends-Analysis

Twitter-Trends-Analysis

Twitter Trends Analysis using Apache Spark (PySpark) on a local 2-node cluster.

What does it do?

Uses a socket stream to listen to a TCP server, which connects to Twitter on its behalf and forwards the tweets to the socket stream listener. The tweets can then be analysed in real time: given a trending term, the tweet stream is scanned to count the number of occurrences of that term in each minute.

  1. Jupyter notebook - twitter_feed_bda.ipynb
  2. Server broker - tweetread.py
  3. Scoured data - tweet_count.csv
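
The per-minute counting described above can be sketched in plain Python. This is only an illustration of the counting logic, not the code in the notebook, which runs on Spark. The function name `count_term_per_minute`, the `(timestamp, text)` input shape, and the sample tweets are all assumptions made up for the example. Matching here is a case-insensitive substring match, so "PySpark" also counts as a hit for "spark"; the actual notebook may match differently.

```python
from collections import Counter
from datetime import datetime


def count_term_per_minute(tweets, term):
    """Count case-insensitive occurrences of `term`, bucketed by minute.

    `tweets` is an iterable of (timestamp, text) pairs. This input shape is
    a hypothetical one chosen for illustration, not the repository's format.
    """
    needle = term.lower()
    counts = Counter()
    for timestamp, text in tweets:
        # Truncate the timestamp to the start of its minute.
        minute = timestamp.replace(second=0, microsecond=0)
        # Substring match: counts every occurrence inside the tweet text.
        hits = text.lower().count(needle)
        if hits:
            counts[minute] += hits
    # Return the buckets in chronological order.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up sample tweets, for demonstration only.
    sample = [
        (datetime(2018, 1, 1, 12, 0, 5), "Loving #spark today"),
        (datetime(2018, 1, 1, 12, 0, 40), "Spark Streaming with spark"),
        (datetime(2018, 1, 1, 12, 1, 10), "nothing relevant here"),
        (datetime(2018, 1, 1, 12, 2, 30), "PySpark on a cluster"),
    ]
    for minute, n in count_term_per_minute(sample, "spark").items():
        print(minute.strftime("%H:%M"), n)
```

Minutes with no matches are omitted from the result rather than reported as zero; a CSV export could fill those gaps if a continuous time series is needed.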

Help guides:

  1. PySpark installation
  2. Configuring PySpark and IPython notebooks

The rest is self-explanatory.
