Analyzing and Automating Customer Service Queries on Twitter Using Robotic Process Automation



Introduction
Recently, companies have begun to use social media to promote their products and reach customers more effectively. A company's social media presence builds customer loyalty, provides better reach, and gives an edge over competitors. An intuitive solution would be to build a social media presence on as many platforms as possible. However, a survey by Li et al. (2021) shows that the percentage of unique users on each platform is minuscule; hence, focusing on a couple of platforms would suffice to reach most social media users. Some companies, like Fitbit, have turned to Twitter to provide free customer service. This is appealing to customers, as tweeting an issue is less of a hassle than calling customer service personnel. According to a survey by Appel et al. (2020), about 42% of American users check Twitter once a day, and 25% check it several times a day. Thus, users are more inclined to interact via Twitter than by phone. The responses to these tweets, however, are not always automated and can be tedious for customer service personnel to deal with.
As competition in the market continues to rise, companies need a system that can analyze customer satisfaction and resolve customer complaints for a particular product. Although market research tools and automated chatbots are available, they are costly for start-ups. A survey by Tripathi and Oivo (2020) found that 38% of start-ups fail due to lack of funds, making this the leading cause of start-up failure.
Currently, there is no integrated platform that can pull comments from social media, decide whether an automated message should be sent, and provide insights on how to improve the product. The developed system uses UiPath for seamless dataflow between separate workflows and load balancing between bots. This system aims to help start-ups with limited funding deal with real-time data, perform pre-analytics steps, extract insights from the data, and provide solutions to customers.

Literature Survey
One of the many ways Robotic Process Automation (RPA) can be upgraded to Intelligent Process Automation (IPA) is through the implementation of chatbots (Langmann and Turi, 2023). Chatbots provide a user interface that enables faster and more seamless automation of several business processes (Rizk et al., 2020). A well-designed chatbot can extract the required information from the user even if the user's query or response is not in a particular format (Cebuc and Rus, 2023). Chatbots have been shown to be highly effective at automating frequently requested tasks. Using NLP, they adapt to tasks that are slightly different from their programmed base task (Kecht et al., 2023). Another study (Shaikh et al., 2022) presents a framework with various stages that offers guidelines with enough flexibility to be applicable in complex and heterogeneous corporate environments as well as in small and medium-sized companies. The core idea of Robotic Process Mining (RPM) is that repetitive routines amenable to automation can be discovered from logs of interactions between workers and web and desktop applications, also known as User Interaction (UI) logs (Herm et al., 2022). RPA does not require changing the existing IT systems, as robots mimic human behavior. Thus, robots can operate fully within the User Interface (UI), leaving IT systems unchanged (Asatiani and Penttinen, 2016).
The system by Mellachervu and Minukuri (2018) used ASIN codes to extract reviews of a particular product from Amazon using Python. Parsing, categorization, and labeling of the extracted reviews were done using SAS Text Miner. This feature analysis helps retailers and businesses gain a better understanding of client expectations in order to develop future items that fulfill their demands and add value to their business. The study by Kirilenko et al. (2022) discussed the two main approaches to sentiment analysis, lexicon-based and machine learning. That study extends prior review studies of how the knowledge and methodologies of sentiment analysis have developed. Cheema et al. (2021) present an experimental evaluation of visual, textual, and multimodal characteristics for sentiment prediction in tweets using the recently announced CLIP embeddings.
The work by Vashisht and Dharia (2020) showed that introducing a chatbot and connecting it with a BI tool gave promising results. Liu et al. (2020a) presented a convolutional neural network based on sentence modeling as the most effective method in NLP tasks. Sowmia et al. (2023) proposed a framework covering crucial aspects such as emotions, instructor engagement, student understanding, and learning outcomes. Deep learning algorithms such as LSTM, GRU, and RNN were employed to classify the students' reviews. The results showed that students are becoming increasingly comfortable with online learning environments. In the work by Gandhi et al. (2021), sentiment analysis was performed on tweets extracted via tweepy. The features were extracted via the N-gram modeling technique. The tweets were segregated into positive, negative, and neutral sentiments. The SVM classifier gave approximately 80% accuracy. Williams (2021) built an automated solution on Twitter using the tweepy package: when a user requested a picture from the Mars rover, the Twitter handle used the NASA API and the Twitter API to reply to the tweet with a photo.
Concerning the development of chatbots using the BERT model, Chang et al. (2022) developed a bot that can detect whether a financial message is fraudulent. Their study compares ELMo, BERT, and Word2Vec to see which model extracts semantic features better. BERT, combined with an SVM or a random forest classifier, gave about 98% accuracy. To automatically detect automatable tasks, Ketkar and Gawade (2021) attempted to describe the various use cases that can and cannot be automated by robotic process automation. The use cases were determined by analyzing logs generated by various applications. The work determined the information that logs must contain and how the logs must be converted to identify tasks that can be automated. Das and Das (2022) proposed a deep learning-based sentence-level classification scheme for crime reports, incorporating a CNN and GloVe vectorization. To mitigate class imbalance, the study combines similar classes and evaluates the efficacy of the proposed approach through comparisons with existing methods. Another system categorized emails primarily on the basis of their content with the help of an SVM. Karri and Kumar (2020) examined and discussed the various technologies used in chatbots and also addressed the design and implementation of a chatbot system. They specified techniques for designing chatbots and presented an evaluation of several chatbot methodologies.
Liu et al. (2020b) used SAS Enterprise Miner to help consumers determine which product model to buy to get the most value for their money. It helped in determining how well items were performing. They compared whether the concerns were remedied between two product generations. The work also shed light on how concept link graphs could be generated from tweets. A concept link graph depicts the strong correlations between the words of an input corpus. Thus, product owners can identify all the issues associated with a product feature from the concept link graph.

General Workflow
The general workflow of the system (Fig. 1) consists of the following steps:
1. Scraping data from the page: Tweets about a product posted by customers to the product's Twitter handle (here, Fitbit) are scraped in real time using the UiPath web scraping tool (50,000 tweets in our case) and added to an Excel sheet
2. Detecting the sentiment of comments: Positive tweets are instrumental in finding out how well the product is being received and which features customers are raving about. Negative tweets have to be responded to promptly to maintain customer satisfaction and loyalty
3. Categorizing the issue: For Fitbit, four main issues that customers face were identified: battery issues, sync issues, display issues, and sleep issues. Every company needs to keep track of the issues commonly encountered with its product and keep them in check
4. Sending a reply to the customer depending upon the issue identified: The query or complaint is sent to the chatbot corresponding to the issue identified. Splitting the chatbot into four models ensures that the entire system is lightweight, accurate, and gives faster responses
5. All the data that is extracted (tweet information such as timestamp, sentiment, and category) is fed into the Tableau cloud dashboard to derive visual insights. These can be used to show a small business how well the product is received on social media and which features could be improved. The dashboard is dynamic, and the company can apply filters (based on time, month, year, sentiment, or feature) to analyze the market performance of the product

These five separate workflows are then integrated using UiPath. The final workflow (Fig. 2) can be deployed to the UiPath orchestrator as a NuGet package. An orchestrator is used to schedule jobs over a group of robots as user requests start pouring in. Software robots (bots) are remote agents that execute workflows. Upon authentication, the bot accesses packages that have been uploaded to the orchestrator by the developer. The bot loads the models and workflows to handle user requests.
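The hand-off between the sentiment, categorization, and reply steps can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the keyword lists, and the rule that only negative tweets are forwarded to a chatbot are assumptions, and the two stub functions merely stand in for the trained models and UiPath workflows of the actual system.

```python
# Illustrative routing sketch; the stubs below stand in for the trained models.
ISSUE_KEYWORDS = {  # hypothetical keyword lists, one per issue category
    "battery": ("battery", "charge", "charging"),
    "sync": ("sync", "bluetooth", "pair"),
    "display": ("display", "screen"),
    "sleep": ("sleep",),
}

def detect_sentiment(tweet: str) -> str:
    """Stub for the sentiment model: flags a few complaint words as negative."""
    negative_words = ("not", "won't", "broken", "dies", "stopped")
    text = tweet.lower()
    return "negative" if any(w in text for w in negative_words) else "positive"

def categorize(tweet: str) -> str:
    """Stub for the categorization model: first category with a matching keyword."""
    text = tweet.lower()
    for issue, keywords in ISSUE_KEYWORDS.items():
        if any(k in text for k in keywords):
            return issue
    return "uncategorized"

def route(tweet: str) -> dict:
    """Run one tweet through the sentiment and category steps and pick a chatbot."""
    sentiment = detect_sentiment(tweet)
    category = categorize(tweet)
    chatbot = None
    # Assumption: only negative tweets with a known issue go to a chatbot.
    if sentiment == "negative" and category != "uncategorized":
        chatbot = f"{category}_chatbot"
    return {"tweet": tweet, "sentiment": sentiment,
            "category": category, "chatbot": chatbot}
```

Each returned dictionary corresponds to one row that could be appended to the sheet feeding the dashboard.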

About the Models
The real-time data scraping tool offered by the UiPath platform was used to scrape more than 50,000 tweets from Twitter. UiPath's automatic real-time data extraction was linked to the page from which tweets were to be extracted and continuously added new tweets to an Excel sheet as they were posted. The sentiment analysis model comprised text preprocessing, visual analysis, and modeling phases.
A learning pipeline of BERT-CNN-BiLSTM was implemented. BERT is one of the emerging powerhouses of natural language processing tasks, including sentiment analysis, Named Entity Recognition (NER), and topic modeling.
The sentiment analysis model achieved 96% accuracy with improved prediction methods and a larger training dataset. The model generated during the implementation was the BERT-CNN-BiLSTM model (Fig. 3).

Fig. 1: The overall workflow for the proposed system

After sentiment analysis, the tweets are classified with a linear model. This model was chosen because it achieved higher accuracy than the other three classification models that were compared while training on the existing categorization dataset.
Multi-class text classification with scikit-learn was used for labeling and categorizing tweets. In the model, the terms most correlated with each issue were identified, and the issues served as the classes for labeling tweets. Four classes were used for categorization: battery, display, sleep, and sync. Labeling and classifying tweets with the logistic regression linear model gave 97% accuracy, the highest among all compared classification models.

Dataset Generation and Description
The proposed method is distinctive in that the training and testing data were generated automatically instead of having humans manually interpret tweets. For sentiment analysis, tweets containing a positive emoticon such as ":)" were considered positive, while those containing a negative emoticon such as ":(" were considered negative. For training the sentiment analysis model, an existing dataset of 16 lakh (1.6 million) tweets with their polarities (0, 2, 4) was used. For testing the sentiment analysis model, the 50,000 tweets scraped from Fitbit on Twitter were used.
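The emoticon-based labeling rule can be illustrated with a minimal sketch. The emoticon lists below are assumptions (the exact emoticons used are not recoverable from the text), as is the choice to leave tweets with no emoticon or with mixed emoticons unlabeled; the polarity codes 4 and 0 follow the encoding of the training dataset.

```python
from typing import Optional

# Assumed emoticon lists; the lists used in the original dataset may differ.
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")

def emoticon_polarity(tweet: str) -> Optional[int]:
    """Return 4 (positive), 0 (negative), or None when no clear label exists."""
    has_positive = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_negative = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_positive and not has_negative:
        return 4
    if has_negative and not has_positive:
        return 0
    return None  # no emoticon, or conflicting emoticons
```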
The training dataset used to evaluate the labeling and categorization models consists of 1 lakh (100,000) tweets collected with a keyword search in TWINT. The text and class categories were used to train the model. The dataset comprises customer complaints and the corresponding issue. Experiments were performed with several machine learning methods to determine which classification method performed best. The testing dataset is the CSV file generated by the sentiment analysis model (consisting of the 50,000 Fitbit tweets from Twitter) using UiPath.

Implementation of Models
Each model has a specific architecture, with its strengths and functions as defined below:
1. The BERT module transforms word tokens from the raw tweet messages into contextual word embeddings in the sentiment analysis model
2. The CNN module is known for its ability to extract as many features as possible from the text of tweets and to separate the classes for classification. It proved to be the most effective method for text representation in NLP tasks
3. The BiLSTM module preserves the chronological order of words in a document and can thus discard unnecessary words using the forget gate

The implementation of the sentiment analysis started with label encoding: 0 as negative, 2 as neutral, and 4 as positive. The text of tweets was cleaned by removing URLs, HTML tags, and punctuation, which improved the sentiment predictions. The model was built on BERT base uncased because tweets are unstructured and were not easily interpreted during classification. For text preprocessing, 'stopwords' was downloaded with 'nltk' and loaded from 'nltk.corpus'. For modeling, 'LabelEncoder', 'classification_report', 'accuracy_score', and 'precision_score' from 'sklearn' were used. TensorFlow layers such as 'Embedding', 'LSTM', and 'Bidirectional' were used together with 'BertTokenizer' and 'TFBertModel' (from the Hugging Face transformers library) to build the sentiment analysis model. The Adam optimizer proved more efficient for training, as it reduced the training time of the model. The real-time data for testing was fetched from Twitter using the TWINT package. The threshold was determined as (0.4, 0.7). The trained sentiment analysis model was saved and achieved 96% accuracy.
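The cleaning and label-encoding steps described above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the regular expressions, the lowercasing step (chosen to match an uncased BERT model), and the names clean_tweet and POLARITY_LABELS are illustrative and not taken from the original implementation.

```python
import re
import string

# Polarity codes of the training dataset mapped to class names.
POLARITY_LABELS = {0: "negative", 2: "neutral", 4: "positive"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # assumed URL pattern
TAG_RE = re.compile(r"<[^>]+>")                 # assumed HTML tag pattern
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_tweet(text: str) -> str:
    """Remove URLs, HTML tags, and punctuation, then normalize whitespace."""
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    # Lowercasing is an assumption made to match an uncased BERT vocabulary.
    return " ".join(text.lower().split())
```

For example, clean_tweet("My <b>Fitbit</b> won't sync!! https://t.co/abc123") yields "my fitbit wont sync".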
The categorization model is based on a comparison of Naive Bayes, random forest, support vector machine, and logistic regression methods for multi-class text classification. The results of training and experiments indicate that the logistic regression linear model achieved the highest accuracy for tweet categorization, 97%, in comparison with the Naive Bayes, random forest, and support vector machine classification methods. In contrast, the random forest classifier had the lowest average accuracy, 46% (Fig. 4). Because four models were compared for the best accuracy and optimal results, a GPU was required for faster training.
We used sklearn.feature_extraction.text.TfidfVectorizer in the categorization model to calculate a tf-idf (term frequency-inverse document frequency) vector for each consumer complaint. This value shows how important a word is to a tweet relative to the whole collection of tweets.
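To make the tf-idf weighting concrete, the following pure-Python sketch computes the vectors by hand. It assumes whitespace tokenization and follows the smoothed formula documented as the scikit-learn default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization; it is an illustration of the weighting, not the code used in the system.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dictionary per document (L2-normalized)."""
    tokenized = [doc.lower().split() for doc in docs]  # simplified tokenizer
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # raw term frequency
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors
```

A term that occurs in many tweets (such as "battery" in a set of battery complaints) receives a lower weight than a term that occurs in only one of them.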
After pre-processing the tweets, stemming and lemmatization were performed to group similar words.
Unigrams and bigrams were generated to capture the similarities and correlations between the textual terms and the products mentioned in tweets. After importing the packages required for each model, all four models were trained on the pre-processed text of all tweets. The graph generated after training identified the logistic regression linear model as the best model; it is saved after training for integration with UiPath (Fig. 5).
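As a small illustration of the feature generation step, unigrams and bigrams can be produced from a cleaned tweet as follows. The helper names are hypothetical and whitespace tokenization is assumed; in scikit-learn the same features are typically obtained by setting ngram_range=(1, 2) on the vectorizer.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def unigrams_and_bigrams(text):
    """Tokenize on whitespace and return unigrams followed by bigrams."""
    tokens = text.lower().split()
    return ngrams(tokens, 1) + ngrams(tokens, 2)
```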
The final tested result data was generated as a CSV file (Fig. 6) after sentiment analysis and labeling. The data has 4 fields:

Integration with UiPath
The Twitter intelligence tool TWINT, integrated with UiPath, was used to fetch real-time data from Twitter. The scraped data is stored in CSV format as input. The pre-trained sentiment analysis model is connected to UiPath using the path URL. The results are generated by the prediction method in the Python process code and written to a CSV file in which every tweet is labeled with its sentiment for further categorization.
The pre-trained logistic regression linear model for categorization is likewise connected to UiPath using the path URL. The Python process that generates the results takes the CSV file produced by sentiment analysis as input for categorization. Since the text was already cleaned during sentiment analysis, it could be passed directly to the classification model. The methods and functions are called from UiPath to perform correlation and vectorization of words. The pre-trained linear model then makes its predictions, and the results are written in CSV format with the tweets categorized into four classes according to their text correlations.

BERT vs Canned Responses
Canned responses are predefined responses to a set of FAQs about the product. For the chatbot to recognize a query, the user must enter it in a certain format. If the user fails to follow the predefined format or gives more information than necessary, the chatbot will generally reply with a default failure message. This has proved frustrating to customers and increases customer churn (Savarimuthu et al., 2021).
Using the models trained with BERT and OpenAI GPT, customers receive intelligent and personalized messages in response to their complaints and queries. Even if the customer's tweet is not in a particular format and provides more information than necessary, the model can intelligently correlate it with the information it has been trained on and give a natural language response (Sarol et al., 2022). To the customer, it seems like they are interacting with customer service personnel rather than a bot. Hence, customers feel that they are valued and develop loyalty towards the brand.

Training the BERT Model
The transformer consists of encoders and decoders that interpret the input sentence tokens and predict what the next token could be (Stremmel and Singh, 2021). Training transformers on large corpora is very costly, and hence it is more viable to import a pre-trained transformer such as OpenAI GPT (Kumari and Pushphavati, 2022). To use the double-headed OpenAI GPT, this study uses the pytorch-pretrained-BERT Python library, which is open-sourced by Hugging Face (Jain, 2022). Along with this, a dataset of customer service FAQs is required to train the model. The dataset (JSON) includes the following three parameters: personality, candidates, and history.

Personality: Indicates the knowledge base or the personality of the bot (Ex: "I am a customer support helper for Fitbit.")
History: Any previous utterances by the chatbot and the customer (Ex: The customer asks the chatbot, "how do I get started on Fitbit?")
Candidates: Replies that can be considered as a reply to the query from the user. Ex: Some potential replies to a customer asking to get started on Fitbit could be "download the Fitbit app on your phone to connect the Fitbit to your mobile." or "login to your Fitbit app"

There are replies of two types, distractors and golden replies:
Distractors: Candidates that do not suitably answer the query but share the context of the ideal (golden) reply
Golden reply: The most ideal answer to the user query among the candidates

To train the transformer, we need to fuse the personality, candidates, and history from the dataset into a single input before feeding it to the transformer. The model cannot distinguish between personality, candidates, and history without delimiters. To overcome this problem, 5 special tokens were introduced:
<bos> indicates the beginning of the sentence that describes the personality/knowledge base of the bot
<eos> indicates the end of the sentence that describes the personality/knowledge base of the bot
<consumer> indicates the beginning of the query being asked by the user
<bot> indicates the beginning of the response from the bot
<pad> is used as a padding token

To detect the next token according to the context, information about segments, word positions, and word embeddings is necessary (Wang et al., 2020). The special tokens introduced earlier make it easier to pass information about the segments.
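How the personality, history, and one candidate reply might be fused into a single delimited sequence is sketched below. The layout is an assumption inferred from the token descriptions above, whitespace splitting stands in for the real subword tokenizer, and build_input and its segment labels are hypothetical names rather than part of any library.

```python
BOS, EOS, CONSUMER, BOT, PAD = "<bos>", "<eos>", "<consumer>", "<bot>", "<pad>"

def build_input(personality, history, candidate, max_len):
    """Fuse personality, history, and one candidate into tokens and segments.

    history is a list of (speaker, utterance) pairs with speaker being
    "consumer" or "bot". Each token is paired with the special token that
    opened its segment, which is one simple way to carry segment information.
    """
    tokens = [BOS] + personality.split() + [EOS]
    segments = [BOS] * len(tokens)
    turns = list(history) + [("bot", candidate)]  # candidate is the bot's reply
    for speaker, utterance in turns:
        marker = CONSUMER if speaker == "consumer" else BOT
        words = [marker] + utterance.split()
        tokens += words
        segments += [marker] * len(words)
    if len(tokens) > max_len:
        raise ValueError("sequence longer than max_len")
    padding = [PAD] * (max_len - len(tokens))
    return tokens + padding, segments + padding
```

In training, one such sequence would be built for the golden reply and one for each distractor, so that the model can learn to rank the candidates.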
The double-headed BERT model consists of the language modeling head and the next sentence prediction head. The language modeling head projects the hidden state onto the word embedding matrix to get logits and applies a cross-entropy loss to the section of the target corresponding to the gold response (green labels in Fig. 7). The next sentence prediction head passes the hidden state of the last token through a linear layer to obtain a score and uses a cross-entropy loss to identify the gold response among the distractors. The total loss of the model is the weighted sum of the language modeling loss and the next sentence prediction loss.
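The weighted sum of the two losses can be written out explicitly. The sketch below computes softmax cross-entropy in plain Python on small lists of logits; the coefficient names lm_coef and mc_coef and their default value of 1.0 are assumptions, since the weights used in training are not stated.

```python
import math

def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one prediction, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]

def total_loss(lm_logits, lm_targets, candidate_scores, gold_index,
               lm_coef=1.0, mc_coef=1.0):
    """Weighted sum of the language modeling and next sentence prediction losses.

    lm_logits: one list of vocabulary logits per token of the gold reply.
    lm_targets: index of the correct next token for each of those positions.
    candidate_scores: one score per candidate reply from the prediction head.
    gold_index: position of the golden reply among the candidates.
    """
    lm_loss = sum(cross_entropy(l, t)
                  for l, t in zip(lm_logits, lm_targets)) / len(lm_targets)
    mc_loss = cross_entropy(candidate_scores, gold_index)
    return lm_coef * lm_loss + mc_coef * mc_loss
```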

Tweepy Integration
Before generating the response, the workflow checks whether the chatbot has already replied to the query. The pre-processed tweet is input into the model. The model outputs tokens that can be decoded to give the reply. To send tweets via Twitter, a Twitter developer account is required. Using the Twitter API keys, the tweepy Python package is used to develop methods that send replies to the customer (Gowda et al., 2023). The update_status method takes the customer's Twitter handle details and the reply to be sent. It then sends the reply in response to the customer's tweet (Fig. 8).
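The reply step, including the check for tweets that were already answered, might look like the following sketch. It assumes that api is an authenticated tweepy.API object exposing update_status with the status and in_reply_to_status_id parameters; send_reply and the replied_ids set are hypothetical helpers introduced for illustration.

```python
def send_reply(api, tweet_id, handle, reply_text, replied_ids):
    """Reply to one tweet unless it was answered before; return True if sent.

    api: assumed to be an authenticated tweepy.API instance.
    replied_ids: set of tweet ids the chatbot has already answered.
    """
    if tweet_id in replied_ids:
        return False  # avoid sending a duplicate reply
    # Mentioning the customer's handle makes the status a visible reply.
    status = f"@{handle} {reply_text}"
    api.update_status(status=status, in_reply_to_status_id=tweet_id)
    replied_ids.add(tweet_id)
    return True
```

Keeping the set of answered tweet ids outside the function lets the workflow persist it between runs, for example in the same sheet that stores the tweets.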

The Dashboard
The entire dataset, consisting of the tweets along with the details derived by the algorithm (the sentiment and category) and the timestamp at which each tweet was posted, serves as input to the Tableau cloud dashboard in the form of an Excel sheet via an automation step in the UiPath workflow. Any new tweets and the information extracted from them are appended to this dataset in real time.
The dashboard (Fig. 9) generates several graphs that establish relationships between the various parameters in the dataset. The figure displays a sample dashboard that we generated, including insights such as the categorization of tweets by sentiment and the number of tweets indicating user engagement. The purpose of these graphs is to help the company derive insights (for example, the number of complaints about battery issues throughout the year).
Using these insights, the company can analyze how the product is being received in the market. The dashboard also offers multiple views for the graphs; thus, the user can visualize the data in different graphical representations and by multiple parameters, depending on the insights they are trying to derive. This dashboard provides a real-time, cost-effective solution that helps small businesses improve their product's market performance and customer satisfaction.
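The kind of aggregation behind such a graph, for instance the number of battery complaints per month, can be sketched with the standard library. The column names timestamp, sentiment, and category and the ISO timestamp format are assumptions about the exported sheet, not a description of the actual dashboard data source.

```python
import csv
import io
from collections import Counter

def complaints_per_month(csv_text, category):
    """Count negative tweets of one category per month (keyed as YYYY-MM)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["sentiment"] == "negative" and row["category"] == category:
            counts[row["timestamp"][:7]] += 1  # assumes ISO timestamps
    return dict(sorted(counts.items()))
```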

Conclusion
This approach considers the various ways in which a company can automate its Twitter customer service handle and derive insights from it. It generates graphs and statistics based on the gathered data to better understand market reviews. It uses UiPath's web scraping tool to scrape recent tweets, and the BERT-CNN-BiLSTM model (96% accuracy) detects the sentiment of the tweets. The logistic regression model (97% accuracy) categorizes the tweets into one of the four predefined categories based on the subject of the tweet. Also, using the double-headed BERT chatbot model (78% accuracy), it responds to client complaints or feedback with little assistance from humans. So far, there is no single integrated market analysis tool on the UiPath platform that can perform all these actions seamlessly. This interface can be used by companies to perform market analysis quite easily.
The system can be extended by adding a feature to retrain the model based on customer preferences, automatically categorizing tweets into categories other than the four predefined ones, and developing concept link maps from the consumer tweets of the Twitter handles.

Fig. 2: The UiPath workflow of the entire system

Fig. 7: The double headed BERT model with input information for training

Fig. 8: Input customer query with the reply generated by the sleep issues chatbot

Fig. 9: Dynamic dashboard with filters that can be used to make business decisions