Academy of Entrepreneurship Journal (Print ISSN: 1087-9595; Online ISSN: 1528-2686)

Research Article: 2021 Vol: 27 Issue: 2S

Sentiment analysis - A literature review

Pedro Neves Mata, ISCTE-IUL Instituto Universitário de Lisboa (ISCTE-IUL)

Mário Nuno Mata, ISCAL-IPL: Instituto Superior de Contabilidade e Administração de Lisboa

Jéssica Nunes Martins, NOVA Information Management School (NOVA IMS), Universidade Nova de Lisboa

João Xavier Rita, ISCAL-IPL (Instituto Superior de Contabilidade e Administração de Lisboa)

Anabela Batista Correia, ISCAL-IPL (Instituto Superior de Contabilidade e Administração de Lisboa)

Keywords

Sentiment Analysis (SA), Opinion Mining (OM), Text Mining, Natural Language Processing (NLP), Case Study, Application.

Abstract

Sentiment Analysis (SA), or Opinion Mining (OM), is a field of study within the broader topic of Natural Language Processing. SA seeks to understand people's opinions, feelings, assessments, attitudes and emotions expressed in text in order to generate knowledge and relevant information on a particular subject; in the business world, the focus is mostly on understanding how products are evaluated. It can often be summarized as interpreting the attitude behind a text: whether it is positive, negative or neutral. The growing importance of SA coincides with the growth of social networks, opinions, criticism, forum discussions and blogs, among others. With this exponential growth of data has arisen the need to apply SA in almost all social and commercial domains, because opinions are key in almost all activities and are one of the factors that influence human and social behaviors, beliefs and perceptions of our own choices. Because opinion is one of the main factors influencing people's choices, the spectrum of analysis available to organizations has broadened, making this a very relevant topic today. This paper reveals that, although there have been some advances in algorithms, techniques and frameworks to help SA implementations, there is still a gap in identifying benefits for business applications.

Introduction

The field of Artificial Intelligence (AI) is constantly growing and discovering new ways to solve real-world problems (Moreno & Redondo, 2016). One of the AI knowledge and research fields is Natural Language Processing (NLP), which attempts to classify and process human language data so that devices can comprehend humans.

NLP is an AI field focused on enabling computers to understand, process and act on human languages, bringing computers closer to a human-level understanding of language (Jurado & Rodriguez, 2015). Advances in Machine Learning (ML) have allowed computers to do many useful things using NLP techniques and deep learning (Zhang et al., 2018), such as online language translation or semantic understanding (Feldman, 2013).

One of the most popular and important uses of NLP is Sentiment Analysis (SA) (Sayeedunnisa et al., 2018). With this technique, we can build systems that attempt to identify and extract opinions or sentiments from speech or written texts (Kotzias et al., 2015). This type of analysis is extremely important for organizations because they can take customers' opinions into account and improve their products and businesses accordingly.

Also known as Opinion Mining (OM), SA can be defined as the “computational treatment of opinion, sentiment, and subjectivity in text” (Bakshi et al., 2016; Dwivedi et al., 2019). It has been applied in many contexts, such as reviewing customers, products and services, examining reputations in social networks [REF], tracking people’s feelings about politicians, and promoting marketing campaigns, among others (Feldman, 2013).

SA and OM were originally conducted for two purposes. The first purpose is to analyze people’s sentiment on an issue or phenomenon. Hence, sentiment analysis goes through a huge amount of textual data to identify people’s attitudes, thoughts, judgments, and emotions on an issue (Feldman, 2013; Yu & Hatzivassiloglou, 2003). The second purpose is to assess people’s opinion on a product, person, event, organization, or topic from the perspective of a user or group of users. Similar to SA, OM is an NLP task that uses an algorithmic technique to recognize opinionated content and classify it into positive, negative, or neutral polarity (Piryani et al., 2017). Nonetheless, the application of OM has been extended to other fields of human-computer applications, and the applications are growing with the growth in big data analytics (Shayaa et al., 2018).
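To make the polarity classification described above concrete, the following is a minimal sketch of a lexicon-based classifier. It is an illustration only and is not taken from any of the reviewed studies: the word lists `POSITIVE` and `NEGATIVE` and the function `classify_polarity` are hypothetical, and practical systems rely on much larger lexicons or on trained models.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# real systems use far larger curated or learned resources.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def classify_polarity(text: str) -> str:
    """Label a text as positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Score = number of positive words minus number of negative words.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    reviews = (
        "Great phone, I love it",
        "Terrible battery and slow screen",
        "It arrived on Monday",
    )
    for review in reviews:
        print(review, "->", classify_polarity(review))
```

A text with no lexicon hits, or with equal numbers of positive and negative hits, is labeled neutral in this sketch; that tie-breaking rule is a design choice of the example, not a claim about the surveyed techniques.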

Despite the large number of studies on SA and OM techniques, their impact on organizations has been less studied. Rather than concentrating on techniques and algorithms, this systematic review arises from the need to summarize all relevant information about the application of SA in organizations and the value its implementation creates.

To draw a general conclusion about this phenomenon, this research evaluates individual studies that can help to understand the main features of this field.

After this introduction, the rest of the paper is structured in four sections. The next section defines the research methods applied. We then describe the data used in this paper and the metrics used to compare them, followed by the main results of the analysis. Finally, we present our conclusions.

Research Methodology

A Systematic Literature Review (SLR) is a process of identifying, assessing and interpreting all available research evidence in order to answer a particular research question.

It is a form of secondary study that uses a well-defined methodology:

• To systematically accumulate, organize, evaluate and synthesize all existing research evidence in a research area.

• To present a fair evaluation of a research topic by using a trustworthy, rigorous, and auditable methodology.

• To produce reliable and unbiased results.

• To identify gaps in the existing research that will lead to topics for further investigation.

• To provide a background to position new research activities.

• To support evidence-based research.

Steps for Conducting an SLR

Planning the Review

• Formulate the review's research questions.
• Develop the review's protocol.

Conducting the Review

• Search the relevant literature
• Perform a selection of primary studies
• Perform data extraction
• Assess studies quality
• Conduct synthesis of evidence

Reporting the Review

• Write up the results of the review
• Draw conclusions

Planning and Conducting the Review

Formulating Questions

The objective of this paper is to discuss the value that sentiment analysis can add to organizations and to identify potential risks, so that it can generate innovation throughout the business.

Therefore, our systematic review must answer three questions:

RQ1: Which are the industries that applied sentiment analysis and opinion mining?

RQ2: What kind of advantages and disadvantages are related to implementation?

RQ3: What are the main innovations that are correlated with this kind of implementation?

Study Selection and Evaluation

To obtain a comprehensive set of papers, we started by searching for studies in the widely accepted literature search engines and databases ACM, IEEE, Science Direct and Springer, which cover text collections such as journal papers, book chapters, conference proceedings, review articles and research articles. These papers were collected based on their title, keywords, abstract, and rank.

Since the study is based on sentiment analysis and respective implementations, we used 3 different strings to identify relevant papers:

• Sentiment Analysis application

• Sentiment Analysis case study

• Sentiment Analysis implementation

To ensure and maintain the quality of the paper, we constrained our selection of articles to the following criteria:

1. Filter in title or abstract

2. Filter in title

3. Filter by rank: Q1 and Q2 for journals (based on SCImago Journal & Country Rank) and A and B for conference proceedings (based on ERA - Australian Computing Research and Education Association of Australasia)

4. Removing duplicates
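The four criteria above can be read as a filtering pipeline in which each step narrows the output of the previous one. The sketch below illustrates that reading under stated assumptions: the `Paper` record, the `select_papers` function, case-insensitive substring matching of the search string, and duplicate detection by normalized title are all hypothetical choices made for this example, since the source does not specify how matching or deduplication was performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical record for one search result."""
    title: str
    abstract: str
    rank: str  # e.g. "Q1", "Q2", "A", "B", or "" when unranked


# Ranks accepted by criterion 3 (journal quartiles and conference ranks).
ACCEPTED_RANKS = {"Q1", "Q2", "A", "B"}


def select_papers(papers, search_string):
    """Apply the four selection criteria in order.

    Returns the number of papers left after each step and the final list.
    """
    needle = search_string.lower()
    # 1. Search string appears in the title or the abstract.
    step1 = [p for p in papers
             if needle in p.title.lower() or needle in p.abstract.lower()]
    # 2. Search string appears in the title.
    step2 = [p for p in step1 if needle in p.title.lower()]
    # 3. Keep only accepted journal or conference ranks.
    step3 = [p for p in step2 if p.rank in ACCEPTED_RANKS]
    # 4. Remove duplicates (assumed here: same title, ignoring case and spacing).
    seen, step4 = set(), []
    for p in step3:
        key = " ".join(p.title.lower().split())
        if key not in seen:
            seen.add(key)
            step4.append(p)
    counts = [len(papers), len(step1), len(step2), len(step3), len(step4)]
    return counts, step4


if __name__ == "__main__":
    # Invented sample records, not papers from the review.
    sample = [
        Paper("Sentiment analysis application in retail", "", "Q1"),
        Paper("Sentiment Analysis Application in Retail", "", "Q1"),
        Paper("A survey of text mining", "Covers sentiment analysis application areas", "Q2"),
        Paper("Sentiment analysis application for tweets", "", "Q3"),
        Paper("Image segmentation", "", "A"),
    ]
    counts, selected = select_papers(sample, "Sentiment Analysis application")
    print(counts)
    print([p.title for p in selected])
```

The per-step counts returned by the function correspond to the kind of column-by-column tally that a table of filtering results would report for each keyword and database.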

The criteria were applied as shown in Table 1:

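Table 1
Application Criteria

| Keyword | Database | 0. No filter | 1. Filter in title or abstract | 2. Filter in title | 3. Rank | 4. Duplicates |
|---|---|---|---|---|---|---|
| Sentiment Analysis application | ACM | 266,586 | 9 | 36 | 10 | 10 |
| Sentiment Analysis application | IEEE | 1,119 | 498 | 30 | 3 | 3 |
| Sentiment Analysis application | Science Direct | 15,790 | 13 | 3 | 2 | 2 |
| Sentiment Analysis application | Springer | 25,725 | n/a | 1 | 0 | 0 |
| Sentiment Analysis case study | ACM | 231,557 | 2 | 8 | 1 | 1 |
| Sentiment Analysis case study | IEEE | 231 | 125 | 18 | 1 | 1 |
| Sentiment Analysis case study | Science Direct | 27,529 | 5 | 5 | 5 | 5 |
| Sentiment Analysis case study | Springer | 83,463 | n/a | 0 | 0 | 0 |
| Sentiment Analysis implementation | ACM | 200,406 | 0 | 0 | 0 | 0 |
| Sentiment Analysis implementation | IEEE | 127 | 79 | 6 | 1 | 1 |
| Sentiment Analysis implementation | Science Direct | 16,249 | 3 | 3 | 2 | 0 |
| Sentiment Analysis implementation | Springer | 25,725 | n/a | 0 | 0 | 0 |
| Total papers | | 894,507 | 734 | 110 | 25 | 23 |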
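As a quick arithmetic check, the totals row of Table 1 can be recomputed from the individual rows. The sketch below transcribes the table under one assumption: the dots in the unfiltered counts are read as thousands separators (so "266.586" is 266586 and "15.79" is 15790), a reading that is consistent with the printed total. The names `ROWS` and `column_totals` are hypothetical, and `None` stands for the "n/a" cells.

```python
# Each row: (keyword, database, no_filter, title_or_abstract, title, rank, deduplicated)
# None marks an "n/a" cell; dots in the source are read as thousands separators.
ROWS = [
    ("application", "ACM", 266586, 9, 36, 10, 10),
    ("application", "IEEE", 1119, 498, 30, 3, 3),
    ("application", "Science Direct", 15790, 13, 3, 2, 2),
    ("application", "Springer", 25725, None, 1, 0, 0),
    ("case study", "ACM", 231557, 2, 8, 1, 1),
    ("case study", "IEEE", 231, 125, 18, 1, 1),
    ("case study", "Science Direct", 27529, 5, 5, 5, 5),
    ("case study", "Springer", 83463, None, 0, 0, 0),
    ("implementation", "ACM", 200406, 0, 0, 0, 0),
    ("implementation", "IEEE", 127, 79, 6, 1, 1),
    ("implementation", "Science Direct", 16249, 3, 3, 2, 0),
    ("implementation", "Springer", 25725, None, 0, 0, 0),
]


def column_totals(rows):
    """Sum each numeric column, skipping cells recorded as n/a (None)."""
    return [
        sum(row[col] for row in rows if row[col] is not None)
        for col in range(2, 7)
    ]


if __name__ == "__main__":
    print(column_totals(ROWS))  # [894507, 734, 110, 25, 23]
```

The computed sums match the "Total papers" row, ending with the 23 studies retained after duplicate removal.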

Data Extraction Analysis

A more in-depth analysis was done regarding the sources of datasets, which are shown in Figures 1 and 2.

In Figure 1 we can see a rise in articles in 2014 and 2015, followed by a decline in 2016 and 2017; after that, there is a recovery in 2018 and 2019, with four (4) articles.

The main source of industry applications, shown in Figure 2, is Social Media with ten (10) articles, followed by Technology with five (5) articles, Education with two (2) articles, and all the others with one (1) article each.

Figure 1: Studies Distribution Per Year

Figure 2: Studies Distribution by Industry

Reporting the Review

Synthesis of Selected Studies

SA and OM have gained popularity in recent years and have been applied in a wide range of areas, such as Communications and Media, Energy and Utilities, Industry, Healthcare, the Financial sector, the Public sector, Distribution, Banking, Social Media and Technology, among others.

Some of these applications will be reviewed in this section following the studies that were selected.

E-Commerce

This market is gaining importance worldwide because it brings convenience to our lives as shoppers: users can search, browse, compare, and purchase various items without time and geographical constraints.

The selected study offers a decision support model for item comparison in e-commerce, using qualitative flexible multiple criteria methods and online reviews to support consumers (Ji et al., 2018).

Education

The acquisition of knowledge, skills, values, beliefs, and habits has been practiced for thousands of years, and educational methods include teaching, training, storytelling, discussion and directed research. Education frequently takes place under the guidance of teachers; however, learners can also educate themselves.

SA was able to help from two different perspectives: on one side, it helped to analyze student evaluation comments for a specific professor, and on the other side, it was applied to assist language learning (Borromeo & Toyama, 2015; Chen et al., 2018).

Hospitality & Sales

The hospitality industry is a broad category of fields within the service industry that includes lodging, food and drink service, event planning, theme parks, transportation, traveling, airline and additional areas within the tourism industry.

The hospitality industry concentrates on customer satisfaction by creating good services and products that meet customers' needs. It is therefore important for service providers to establish a good relationship with customers so that they come back for more, which makes this an excellent area for SA evaluation.

Since the main goal of almost all corporations is to sell goods or services, we can find two good examples of SA applications: one selects the ideal decision, or best restaurant for an outing, based on textual reviews available online (Dwivedi & Pant, 2019), and the other explores the relationship between the sales performance of products and their reviews (Liu et al., 2010).

Social Media and Technology

Social media is the technology that facilitates the sharing of ideas, thoughts, and information through the building of virtual networks and communities. By design, social media is internet-based and gives users quick electronic communication of content.

SA is extremely useful in social media monitoring, as it provides an overview of public opinion on certain topics, products or services. The ability to extract insights from social data is a practice that this technology makes possible and that is being adopted by organizations across the world.

From the selected studies we can see that algorithms, methodologies and frameworks are constantly being created and applied to assist this pursuit of insights (Chen et al., 2014; Costa et al., 2012; Feldman, 2013; Giménez et al., 2019; Jurado & Rodriguez, 2015; Kauffmann et al., 2019; Konan et al., 2016; Kranjc et al., 2015; Li et al., 2016; López et al., 2019; Shayaa et al., 2018; Oliveira et al., 2014; Tun Thura Thet et al., 2010). Some of the studies look for correlations with image, sound or audiovisual SA (Chen et al., 2014; Konan et al., 2016; Tun Thura Thet et al., 2010), while others pursue the best technique for applying SA (Feldman, 2013; Jurado & Rodriguez, 2015; López et al., 2019; Shayaa et al., 2018).

Since knowledge in this field is still growing, these studies help to clarify it.

Transportation

Transportation is the movement of humans, animals and goods from one location to another. Since this is a primary concern in our daily lives, the selected paper focuses on the use of SA to understand traffic information from websites, taking human affect into account to enrich the analysis (Cao et al., 2014).

Datasets

The main source of data is Twitter, with five (5) articles; the remaining articles use a variety of other sources. Overall, there is a clear pattern of relying mainly on social media sources to obtain insights.
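The Twitter count can be reproduced by tallying the Dataset column of Table 2. The sketch below transcribes that column by hand; the names `DATASET_CELLS` and `count_sources` are hypothetical, and splitting multi-source cells on commas is an assumption of this example rather than a procedure described in the paper.

```python
from collections import Counter

# Dataset column of Table 2, one entry per study (transcribed manually).
DATASET_CELLS = [
    "PConline.com", ".csv files", "Several documents", "Kaggle", "IMDB",
    "Flickr", "Blogosphere", "Movies", "Twitter", "Weibo",
    "Cancer Survivors Network forum", "StockTwits", "Twitter", "Twitter",
    "Visual Sentiment Ontology, Flickr, getty images",
    "n/a", "Stanford Sentiment Treebank", "Github", "Amazon",
    "Yelp, Amazon", "Twitter, Facebook, Amazon",
    "Twitter, Weibo, online forums, blogs",
]


def count_sources(cells):
    """Count how many studies mention each data source, ignoring n/a cells."""
    counts = Counter()
    for cell in cells:
        for source in cell.split(","):
            name = source.strip()
            if name and name.lower() != "n/a":
                counts[name] += 1
    return counts


if __name__ == "__main__":
    for source, n in count_sources(DATASET_CELLS).most_common(4):
        print(source, n)
```

With this transcription, Twitter appears in five studies, which agrees with the figure stated above; the other sources each appear three times or fewer.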

Advantages, Disadvantages & Innovation

We found that this collection of papers does not provide information on business impact for organizations.

Table 2 summarizes the selected studies in terms of industry, dataset, advantages versus disadvantages, and innovation criteria for business applications.

Table 2
Study Selection

| Industry | Source | Purpose | Dataset | Advantages | Disadvantages | Innovation |
|---|---|---|---|---|---|---|
| E-commerce | (Ji et al., 2018) | Build a review-based decision support model for items comparison in e-commerce. | PConline.com | n/a | n/a | n/a |
| Education | (Borromeo & Toyama, 2015) | Compare SA identification from manual, crowdsourced and automatic systems. | .csv files | n/a | n/a | n/a |
| Education | (Chen et al., 2018) | Application of SA to language learning. | Several documents | n/a | n/a | n/a |
| Hospitality | (Dwivedi & Pant, 2019) | Framework for big data SA on real-time updates in online reviews or text for best decision selection. | Kaggle | n/a | n/a | n/a |
| Sales | (Liu et al., 2010) | Algorithm that can be applied to sales performance prediction. | IMDB | n/a | n/a | n/a |
| Social Media | (Chen et al., 2014) | Framework to detect visual concepts. | Flickr | n/a | n/a | n/a |
| Social Media | (Costa et al., 2012) | Framework for building blog mining applications in e-commerce. | Blogosphere | n/a | n/a | n/a |
| Social Media | (Konan et al., 2016) | Algorithm to choose the correct background music (BGM) for a photo or movie scene. | Movies | n/a | n/a | n/a |
| Social Media | (Kranjc et al., 2015) | Methodology and workflow implementation for SA using data streams. | Twitter | n/a | n/a | n/a |
| Social Media | (Li et al., 2016) | Two algorithms: a Weibo emotion classification algorithm and a Weibo open evaluation algorithm. | Weibo | n/a | n/a | n/a |
| Social Media | (Ofek et al., 2015) | Demonstrating that text styling, in terms of pronoun usage, is useful for some text analyses and relates to emotional states. | Cancer Survivors Network forum | n/a | n/a | n/a |
| Social Media | (Oliveira et al., 2014) | Algorithm to create a stock market lexicon. | StockTwits | n/a | n/a | n/a |
| Social Media | (Sharma et al., 2018) | Create a web-based application that allows visualization of current sentiments associated with a keyword in Twitter messages by plotting them on a map. | Twitter | n/a | n/a | n/a |
| Social Media | (Tellez et al., 2017) | Identify, in a large set of combinations, which text and token-weighting schemes have the most impact on the accuracy of a trained classifier (SVM). | Twitter | n/a | n/a | n/a |
| Social Media | (Tun Thura Thet et al., 2018) | Two frameworks for joint visual-textual sentiment analysis, both of which try to integrate textual and visual information into a unified model. | Visual Sentiment Ontology, Flickr, getty images | n/a | n/a | n/a |
| Technology | (Feldman, 2013) | Techniques and applications for SA. | n/a | n/a | n/a | n/a |
| Technology | (Giménez et al., 2019) | Methodology for applying semantic-based padding in Convolutional Neural Networks for NLP. | Stanford Sentiment Treebank | n/a | n/a | n/a |
| Technology | (Jurado & Rodriguez, 2015) | SA techniques to identify and monitor the underlying sentiments in the text written by developers. | Github | n/a | n/a | n/a |
| Technology | (Kauffmann et al., 2019) | Framework for big data analytics in commercial social networks. | Amazon | n/a | n/a | n/a |
| Technology | (López et al., 2019) | Identify the most accurate supervised learning method for sentiment analysis. | Yelp, Amazon | n/a | n/a | n/a |
| Technology | (Shayaa et al., 2018) | Understanding of the various OM and SA approaches performed on text analytics. | Twitter, Facebook, Amazon | n/a | n/a | n/a |
| Transportation | (Cao et al., 2014) | Traffic sentiment analysis (TSA) for processing traffic information from websites. | Twitter, Weibo, online forums, blogs | n/a | n/a | n/a |

Conclusion

In this paper, we have presented the results of a systematic literature review on SA, which included 23 different approaches. Based on this review, the main objective was to help organizations classify existing and future applications in this area, along with the advantages, disadvantages and innovation of possible implementations.

During the review process, we acquired knowledge of different research subareas and structured the results into several tables, which are intended to speed up knowledge transfer among various research communities.

Regarding RQ1, the industries that applied SA were successfully identified and described; by number of applications, Social Media comes first, followed by Technology and Education, among others. Concerning RQ2 and RQ3, it was not possible to identify advantages, disadvantages and innovation for organizations, since these papers are oriented toward application techniques rather than toward organizations.

In summary, we learned that, although there have been some advances in algorithms, techniques and frameworks to help SA implementations, there is still a gap in identifying benefits for business applications. We believe that the results of our systematic review will help future studies to address these gaps.

In the future, research should be carried out on RQ2 and RQ3, for which we were not able to gather enough information to present a robust conclusion. As a starting point, a wider spectrum of studies could be included; in addition, it would be interesting to carry out a real application in a business organization.

References

  1. Bakshi, R.K., Kaur, N., Kaur, R., & Kaur, G. (2016). Opinion mining and sentiment analysis. Proceedings of the 10th INDIACom; 3rd International Conference on Computing for Sustainable Global Development, INDIACom, 452–455.
  2. Borromeo, R.M., & Toyama, M. (2015). Automatic vs. crowdsourced sentiment analysis. ACM International Conference Proceeding Series, 90–95.
  3. Cao, J., Zeng, K., Wang, H., Cheng, J., Qiao, F., Wen, D., & Gao, Y. (2014). Web-based traffic sentiment analysis: Methods and applications. IEEE Transactions on Intelligent Transportation Systems, 15(2), 844–853.
  4. Chair-Silva, S., & J.M. (2009). Session details: Application. Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion.
  5. Chen, M.H., Chen, W.F., & Ku, L.W. (2018). Application of sentiment analysis to language learning. IEEE Access, 6, 24433–24442.
  6. Chen, T., Yu, F.X., Chen, J., Cui, Y., Chen, Y.Y., & Chang, S.F. (2014). Object-based visual sentiment concept analysis and application. MM 2014 - Proceedings of the 2014 ACM Conference on Multimed.
  7. Costa, E., Ferreira, R., Brito, P., Bittencourt, I.I., Holanda, O., Machado, A., & Marinho, T. (2012). A framework for building web mining applications in the world of blogs: A case study in product sen.
  8. Dwivedi, A., & Pant, R.P. (2019). An algorithmic implementation of entropic ternary reduct soft sentiment set (ETRSSS) using soft computing technique on big data sentiment analysis (BDSA) for optimal se.
  9. Feldman, R. (2013). Aspect-based sentiment: The main applications and challenges of one of the hottest research areas in computer science. Communications of the ACM, 56, 82–89.
  10. Feldman, R. (2013). Techniques and applications for sentiment analysis: The main applications and challenges of one of the hottest research areas in computer science. Communications of the ACM, 56.
  11. Fitzgerald, S., Mathews, G., Morris, C., & Zhulyn, O. (2012). Using NLP techniques for file fragment classification. Proceedings of the Digital Forensic Research Conference, DFRWS 2012 USA, S44–S49.
  12. Giménez, M., Palanca, J., & Botti, V. (2019). Semantic-based padding in convolutional neural networks for improving the performance in natural language processing. A case of study in Sentiment Analysis. Neurocomputing.
  13. Ji, P., Zhang, H.Y., & Wang, J.Q. (2018). A fuzzy decision support model with sentiment analysis for items comparison in e-Commerce: The case study of PConline.com. IEEE Transactions on Systems, Man.
  14. Jurado, F., & Rodriguez, P. (2015). Sentiment analysis in monitoring software development processes: An exploratory case study on GitHub’s project issues. Journal of Systems and Software, 104.
  15. Kanakaraj, M., & Guddeti, R.M.R. (2015). Performance analysis of Ensemble methods on Twitter sentiment analysis using NLP techniques. Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing, IEEE ICSC 2015, 169–170.
  16. Kauffmann, E., Peral, J., Gil, D., Ferrández, A., Sellers, R., & Mora, H. (2019). A framework for big data analytics in commercial social networks: A case study on sentiment analysis and fake review date.
  17. Konan, C., Suwa, H., Arakawa, Y., & Yasumoto, K. (2016). Poster: Sentiment analysis of BGM toward automatic BGM selection based on emotion. MobiSys 2016 Companion - Companion Publication of the 14th Annual International Conference on Mobile Systems, Applications, and Services, 47.
  18. Kotzias et al. (2015). UCI Machine Learning Repository.
  19. Kranjc, J., Smailović, J., Podpečan, V., Grčar, M., Žnidaršič, M., & Lavrač, N. (2015). Active learning for sentiment analysis on data streams: Methodology and workflow implementation in the ClowdFlows platform. Information Processing and Management, 51(2), 187–203.
  20. Li, Y., Zhou, X., Sun, Y., & Zhang, H. (2016). Design and implementation of Weibo sentiment analysis based on LDA and dependency parsing. China Communications, 13(11), 91–105.
  21. Liu, B. (n.d.). Sentiment Analysis and Subjectivity.
  22. Liu, Y., Yu, X., Huang, X., & An, A. (2010). S-PLSA+: Adaptive sentiment analysis with application to sales performance prediction. SIGIR 2010 Proceedings - 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 873–874.
  23. López, S.A., & Cuadrado-Gallego, J.J. (2019). Supervised learning methods application to sentiment analysis. ACM International Conference Proceeding Series.
  24. Moreno, A., & Redondo, T. (2016). Text analytics: The convergence of big data and artificial intelligence. International Journal of Interactive Multimedia and Artificial Intelligence, 3(6), 57.
  25. Ofek, N., Rokach, L., Caragea, C., & Yen, J. (2015). The importance of pronouns to sentiment analysis: Online cancer survivor network case study. WWW 2015 Companion - Proceedings of the 24th International Conference on World Wide Web, 83–84.
  26. Oliveira, N., Cortez, P., & Areal, N. (2014). Automatic creation of stock market lexicons for sentiment analysis using stocktwits data. ACM International Conference Proceeding Series, 115–123.
  27. Piryani, R., Madhavi, D., & Singh, V.K. (2017). Analytical mapping of opinion mining and sentiment analysis research during 2000–2015. Information Processing and Management, 53(1), 122–150.
  28. Sayeedunnisa, S.F., Hegde, N.P., & Khan, K.U.R. (2018). Wilcoxon signed rank based feature selection for sentiment classification. Advances in Intelligent Systems and Computing, 712, 293–310.
  29. Sharma, N., Pabreja, R., Yaqub, U., Atluri, V., Ae Chun, S., & Vaidya, J. (2018). Web-based application for sentiment analysis of live tweets. ACM International Conference Proceeding Series.
  30. Shayaa, S., Jaafar, N.I., Bahri, S., Sulaiman, A., Seuk Wai, P., Wai Chung, Y., … & Al-Garadi, M.A. (2018). Sentiment analysis of big data: Methods, applications, and open challenges. IEEE Access.
  31. Tellez, E.S., Miranda-Jiménez, S., Graff, M., Moctezuma, D., Siordia, O.S., & Villaseñor, E.A. (2017). A case study of Spanish text transformations for twitter sentiment analysis. Expert Systems wi.
  32. Thet, T.T., Na, J.C., & Khoo, C.S.G. (2010). Aspect-based sentiment analysis of movie reviews on discussion boards. Journal of Information Science, 36(6), 823–848.
  33. You, Q. (2016). Sentiment and emotion analysis for social multimedia: Methodologies and applications. MM 2016 - Proceedings of the 2016 ACM Multimedia Conference, 1445–1449.
  34. Yu, H., & Hatzivassiloglou, V. (2003). Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences.
  35. Zhang, L., Wang, S., & Liu, B. (2018). Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews. Data Mining and Knowledge Discovery, 8(4).