Project Documentation

Social Media Analytics Tool

#1. Introduction

Social media is where people readily share information as they see it, and almost every vertical is covered there today. Given the influence social media has on the world, businesses, analysts, and researchers need to keep a watch on it to gauge user engagement with a topic or a brand.

Many customers post on social media about their experience with a product or service. Businesses can therefore build good customer relationships by interacting with them there, and they can be alerted to a customer's bad experience and resolve it.

It is also essential for businesses, analysts, and researchers to look into other factors on social media that are connected to their topic of interest and affect it directly or indirectly. However, no free and open-source tool does all of this in one place. Therefore, I have created this Django-based web app that uses Artificial Intelligence and the Twitter API to fetch tweets on a particular topic/business, present advanced analytics and filters, and automatically recognize and fetch the entities related to that topic/business. As a result, the tool improves over time with usage!

Twitter was selected as the platform for this tool because of its high posting frequency and the level of user interaction with the information. Feel free to choose any other platform that provides API access to its content.

#2. Let's see what we can do with this tool

  1. Get all the tweets about a business/topic in one place

    Main Dashboard with Tweets

    View all the tweets that are related to your business directly or indirectly, in one place. You can also click on a tweet to be taken directly to Twitter, where you can interact with it by liking, replying to, or retweeting it on your page.

  2. Filters on sentiments

    Sentiment filtering for tweets

    Sentiments are classified using a Bi-directional LSTM model that is trained on a combination of Quora's insincere questions dataset and IMDB's critical reviews dataset. You can view the tweets that belong to a particular sentiment by selecting it in the filters box on the right side of the screen. This helps businesses handle and respond to negative tweets and appreciate the positive ones.

    Here's a screenshot of a positive sentiment filter on tweets that would matter to a coffee shop such as "Starbucks" that wants to see what its competitors are up to.

  3. Smart entity fetch and filter

    Entity filtering for tweets

    This feature allows you to view tweets on a specific topic that matters for your business/topic. The app also comes with an algorithm to fetch tweets about more entities that are relevant to your business, based on the previous tweets. So, you don't have to worry about what other things might matter to your business/topic. The system does it for you and it improves with usage. If you feel an entity is missing, you can just add it in the admin panel and the algorithm learns from what it has been missing and takes care of it the next time.

  4. Real-time customer analytics

    Twitter customer analytics dashboard page 1

    View statistics about the people tweeting about your business or any topic that matters to you. This lets you see the impact and the outreach of the tweets that matter to you.
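
The smart entity fetch described in item 3 can be pictured as a co-occurrence count over previously stored tweets. The source does not describe the actual algorithm, so the sketch below is only one plausible reading: the function `suggest_entities`, its `min_count` threshold, and the sample data are all hypothetical.

```python
from collections import Counter


def suggest_entities(tweet_entities, tracked, min_count=2):
    """Suggest new entities that co-occur with already tracked ones.

    tweet_entities: list of sets, the named entities found in each stored tweet.
    tracked: set of entities the dashboard already follows.
    min_count: hypothetical threshold on how many tweets must mention a candidate.
    """
    counts = Counter()
    for entities in tweet_entities:
        # Only tweets that mention a tracked entity count as evidence.
        if entities & tracked:
            counts.update(entities - tracked)
    # Sort by count (descending), then by name, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, count in ranked if count >= min_count]


# Made-up sample data for illustration only.
stored = [
    {"Starbucks", "Latte"},
    {"Starbucks", "Latte", "Dunkin"},
    {"Dunkin", "Donut"},
    {"Starbucks", "Dunkin"},
]
print(suggest_entities(stored, tracked={"Starbucks"}))
```

Entities added by hand in the admin panel would simply join the `tracked` set in this picture, which widens the pool of tweets that count as evidence on the next run.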

#3. How does it work?

To see how the tool works, let's take the topic of "Disaster management" as the context.

Back end architecture

The back-end of the web application is built with Django. It periodically fetches tweets about the disaster from the Twitter API (every 20 minutes; a feature to adjust this in the admin panel is coming soon). The tweets are then run through an AI model deployed in the back-end, a Bi-directional LSTM that classifies the sentiment of each tweet. The named entities in the tweets are also determined. Finally, the tweets, along with their named entities and classified sentiments, are stored in a PostgreSQL database deployed on the Heroku cloud platform.
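
The flow described above (fetch, classify, extract entities, store) can be sketched as a small function. This is a minimal illustration, not the project's code: `StoredTweet`, `process_tweets`, and the toy classifier and entity extractor are hypothetical stand-ins for the Bi-directional LSTM, the named-entity step, and the PostgreSQL write.

```python
from dataclasses import dataclass, field


@dataclass
class StoredTweet:
    """Hypothetical record shape: a tweet plus what the pipeline derived from it."""
    text: str
    sentiment: str
    entities: list = field(default_factory=list)


def process_tweets(raw_tweets, classify, extract_entities):
    """Run each fetched tweet through sentiment classification and entity extraction."""
    return [
        StoredTweet(text=text, sentiment=classify(text), entities=extract_entities(text))
        for text in raw_tweets
    ]


# Toy stand-ins so the sketch runs without a model or a database.
KNOWN_ENTITIES = {"Kerala", "Red Cross"}


def toy_classify(text):
    # Assumption: a keyword rule in place of the real sentiment model.
    return "negative" if "flood" in text.lower() else "positive"


def toy_entities(text):
    # Assumption: a lookup list in place of real named-entity recognition.
    return [name for name in sorted(KNOWN_ENTITIES) if name in text]


fetched = [
    "Flood warning issued for Kerala",
    "Red Cross volunteers reached Kerala today",
]
for record in process_tweets(fetched, toy_classify, toy_entities):
    print(record)
```

Passing the classifier and extractor in as arguments keeps the periodic job independent of the model, so the same loop works whether it is triggered by a scheduler or run by hand.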

Whenever a user opens the dashboard, the tweets stored in the database are loaded onto the front-end. Clicking on a tweet takes the user directly to Twitter, where they can interact with it by liking it, replying to it, or retweeting it. The user can also filter the tweets by the named entities they want to see and interact with.

The tweeters' information is stored in the database along with the tweets, so people using the dashboard can also view statistics about the users tweeting about specific entities and gauge the outreach of the tweets. In disaster management, this could help determine whether a tweet from an influencer would be useful in bringing in aid and other donations for a particular disaster event.
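
One simple way to estimate outreach per entity is to sum the follower counts of the distinct authors who mentioned it. The sketch below assumes each stored tweet carries `author`, `followers`, and `entities` fields; those field names and the sample numbers are hypothetical, and the source does not say how the dashboard computes its statistics.

```python
from collections import defaultdict


def outreach_by_entity(tweets):
    """Sum follower counts of distinct authors for every entity they mentioned."""
    authors = defaultdict(dict)
    for tweet in tweets:
        for entity in tweet["entities"]:
            # Keyed by author so repeat tweets from one account count once.
            authors[entity][tweet["author"]] = tweet["followers"]
    return {entity: sum(by_author.values()) for entity, by_author in authors.items()}


# Made-up sample data for illustration only.
sample = [
    {"author": "a", "followers": 1000, "entities": ["Kerala"]},
    {"author": "a", "followers": 1000, "entities": ["Kerala", "Red Cross"]},
    {"author": "b", "followers": 50, "entities": ["Kerala"]},
]
print(outreach_by_entity(sample))
```

The result is an upper bound on reach, since followers shared between authors are counted more than once.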

#4. How was the sentiment classification model trained?

Sentiment classification model training

The training datasets were downloaded from Kaggle. Two datasets were used, one from Quora and another from IMDB. Both were first cleaned and then put into a dataframe with a common column format. One column contained the text (either a question from Quora or a critical movie review from IMDB). The text was then tokenized and the tokenizer was saved, so that the same tokenizer can be used on the tweets fed into the model during deployment.
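
The reason for saving the tokenizer is that the word-to-id mapping learned at training time must be identical at deployment. The source does not name the tokenizer library, so the sketch below uses a tiny hand-written word index saved as JSON; `fit_word_index`, `texts_to_sequences`, and the file name are hypothetical.

```python
import json
import os
import tempfile


def fit_word_index(texts):
    """Assign each distinct lower-cased word an integer id, starting at 1."""
    index = {}
    for text in texts:
        for word in text.lower().split():
            index.setdefault(word, len(index) + 1)
    return index


def texts_to_sequences(texts, index, oov_id=0):
    """Map words to ids; words unseen during training map to oov_id."""
    return [[index.get(word, oov_id) for word in text.lower().split()] for text in texts]


# Training time: fit on the training text and save the mapping.
training_texts = ["great movie", "why is this movie so bad"]
word_index = fit_word_index(training_texts)
path = os.path.join(tempfile.gettempdir(), "word_index.json")
with open(path, "w", encoding="utf-8") as handle:
    json.dump(word_index, handle)

# Deployment time: load the saved mapping and apply it to incoming tweets.
with open(path, encoding="utf-8") as handle:
    loaded_index = json.load(handle)
print(texts_to_sequences(["this movie is great"], loaded_index))
```

Refitting a tokenizer on the tweets instead would assign different ids to the same words, and the trained model would read the input as unrelated text.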

Next, a Bi-directional LSTM architecture was trained on the dataset. The model was tested for overfitting, none was found, and it was judged suitable for real-time deployment. The architecture was also designed with performance in mind, so that the sentiments of the tweets can be classified without delay as soon as they are fetched.
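
The source does not say how overfitting was tested. A common check is to compare training and validation loss per epoch and flag the run when validation loss keeps rising while training loss keeps falling. The sketch below illustrates that check; `shows_overfitting`, the `patience` value, and the loss curves are hypothetical and are not results from this project.

```python
def shows_overfitting(train_loss, val_loss, patience=2):
    """Return True if validation loss rose for `patience` consecutive epochs
    while training loss kept falling."""
    streak = 0
    for epoch in range(1, min(len(train_loss), len(val_loss))):
        train_improved = train_loss[epoch] < train_loss[epoch - 1]
        val_worsened = val_loss[epoch] > val_loss[epoch - 1]
        # Count consecutive epochs where the two curves diverge.
        streak = streak + 1 if (train_improved and val_worsened) else 0
        if streak >= patience:
            return True
    return False


# Made-up loss curves for illustration only.
healthy_val = [0.95, 0.70, 0.50, 0.45]
diverging_val = [0.95, 0.70, 0.80, 0.90]
train = [0.90, 0.60, 0.40, 0.30]

print(shows_overfitting(train, healthy_val))
print(shows_overfitting(train, diverging_val))
```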

#5. Run it yourself!

  1. Download or clone the git repository from here. Note that the files are stored in Git LFS.
  2. Create a python virtual environment and name it as per your convenience.
  3. Activate the virtual environment by doing: source yourenvname/bin/activate
  4. Install the requirements from the requirements.txt to your virtual environment by running: pip3 install -r requirements.txt
  5. Change the database settings to point to your database.
  6. Run the Django migrations against your database from the project root (the directory that contains manage.py):
    python3 manage.py makemigrations
    python3 manage.py migrate
  7. Create a Django superuser by running: python3 manage.py createsuperuser
  8. Run the Django app by running: python3 manage.py runserver. By default, your app should be running on http://localhost:8000/
  9. Head to the admin panel at http://localhost:8000/admin/, log in, and add the entities that you want to focus on in the "entity" part of the dashboard. For example, "Food".
  10. Log out, head back to the home page and log in again at the home page.
  11. Wait for the tweets to be fetched and then refresh. (This might take a long time.)
  12. If the tweets don't fetch automatically, please use the Django shell to execute the function digAndStore() which is in twitterops/ (I'm working on a solution for this using Celery as Cron isn't working properly).
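
For step 5, the database settings live in the `DATABASES` dictionary of the project's Django settings module. The fragment below is a sketch for a PostgreSQL database; the environment variable names and their default values are assumptions for illustration, so substitute whatever your own setup uses.

```python
import os

# Fragment for the Django settings module. The DB_* environment variable
# names and the fallback values are hypothetical, not taken from the project.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "analytics"),
        "USER": os.environ.get("DB_USER", "postgres"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
    }
}
```

Reading the credentials from the environment keeps the password out of the repository and lets the same settings file work locally and on a hosted platform.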