Twitter Trends Analysis using Apache Spark (PySpark) on a local 2-node cluster.
Uses a Spark socket stream to listen to a TCP server, which connects to Twitter on its behalf and forwards the tweets to this socket stream listener. The tweets can be analysed in real time: given a trending term, the tweet stream is scanned and the number of occurrences of the term is counted for each minute.
- Jupyter notebook - twitter_feed_bda.ipynb
- Server broker -
- Scoured data - tweet_count.csv
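
The per-minute counting described above might look roughly like the sketch below. It is not taken from the notebook: the function names, the host, the port `9009`, the default term and the app name are all assumptions made for illustration, and it uses the legacy DStream API (`StreamingContext` and `socketTextStream`) with a 60-second batch interval so that each batch corresponds to one minute.

```python
def count_term(text, term):
    """Count case-insensitive, non-overlapping occurrences of term in text."""
    return text.lower().count(term.lower())


def count_per_batch(lines, term):
    """Map a DStream of tweet lines to a DStream holding one total per batch."""
    return lines.map(lambda text: count_term(text, term)).reduce(lambda a, b: a + b)


def run(host="localhost", port=9009, term="spark"):
    """Hypothetical entry point; host, port and term are placeholder values."""
    # Imported lazily so the helpers above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="TwitterTrendSketch")  # assumed app name
    ssc = StreamingContext(sc, 60)  # 60-second batches: one count per minute
    lines = ssc.socketTextStream(host, port)  # one tweet per line from the TCP server
    count_per_batch(lines, term).pprint()  # print each minute's total to the console
    ssc.start()
    ssc.awaitTermination()
```

Calling `run()` only makes sense once the TCP server is already serving tweets on the chosen host and port. A minute in which no tweets arrive produces no total at all, because `reduce` has nothing to combine.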
Configuring PySpark and iPython notebooks
The rest is self-explanatory.