Surface and Deep Features Ensemble for Sentiment Analysis of Arabic Tweets

Sentiment analysis (SA) of Arabic tweets is a complex task due to the rich morphology of the Arabic language and the informal nature of language on Twitter. Previous research on the SA of tweets mainly focused on manually extracting features from the text. Recently, neural word embeddings have been utilized as less labor-intensive representations than manual feature engineering. Most of these word embeddings model the syntactic information of words while ignoring the sentiment context. In this paper, we propose to learn sentiment-specific word embeddings from Arabic tweets and use them in Arabic Twitter sentiment classification. Moreover, we propose a feature ensemble model of surface and deep features. The surface features are manually extracted features, and the deep features are generic word embeddings and sentiment-specific word embeddings. Extensive experiments are performed to test the effectiveness of the surface and deep features ensemble, pooling functions, embedding size, and cross-dataset models. The recent language representation model BERT is also evaluated on the task of SA of Arabic tweets. The models are evaluated on three different datasets of Arabic tweets, and they outperform the previous results on all these datasets with a significant increase in the F-score. The experimental results demonstrate that: 1) the highest performing model is the ensemble of surface and deep features and 2) the approach achieves state-of-the-art results on several benchmarking datasets.
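
To make the idea of a surface and deep feature ensemble concrete, the following is a minimal sketch of how per-word embeddings might be pooled into a fixed-size tweet vector and concatenated with hand-crafted features. It is an illustration under assumptions, not the paper's implementation: the abstract does not say which pooling functions, surface features, or classifier were used, so the avg/max/min choices, the helper names `pool` and `ensemble_features`, and the toy vectors are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def pool(vectors: Sequence[Vector], how: str = "avg") -> Vector:
    """Collapse per-word vectors into one fixed-size vector (hypothetical helper)."""
    if not vectors:
        raise ValueError("need at least one word vector")
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    if how == "avg":
        return [sum(col) / len(col) for col in columns]
    if how == "max":
        return [max(col) for col in columns]
    if how == "min":
        return [min(col) for col in columns]
    raise ValueError(f"unknown pooling function: {how}")


def ensemble_features(
    tokens: Sequence[str],
    generic_emb: Dict[str, Vector],
    sentiment_emb: Dict[str, Vector],
    surface: Sequence[float],
    how: str = "avg",
) -> Vector:
    """Concatenate surface features with pooled generic and sentiment embeddings."""
    dim_g = len(next(iter(generic_emb.values())))
    dim_s = len(next(iter(sentiment_emb.values())))
    # Out-of-vocabulary tokens are skipped; an all-zero vector is the fallback
    # when no token is covered (an assumption made for this sketch).
    g = [generic_emb[t] for t in tokens if t in generic_emb] or [[0.0] * dim_g]
    s = [sentiment_emb[t] for t in tokens if t in sentiment_emb] or [[0.0] * dim_s]
    return list(surface) + pool(g, how) + pool(s, how)


# Toy data, invented for illustration only.
generic = {"good": [1.0, 0.0], "day": [0.0, 1.0]}
sentiment = {"good": [0.5], "day": [0.25]}
surface = [1.0, 0.0]  # e.g. counts of positive / negative lexicon words

features = ensemble_features(["good", "day"], generic, sentiment, surface)
print(features)  # [1.0, 0.0, 0.5, 0.5, 0.375]
```

The resulting vector could then be passed to any standard classifier. Keeping the surface block and the two pooled embedding blocks as separate segments makes it easy to drop one of them and compare the variants.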

Volume number

Journal/Newspaper

https://ieeexplore.ieee.org/abstract/document/8743359

Pages

84122 - 84131

More publications

The Evolution of Language Models Applied to Emotion Analysis of Arabic Tweets

The field of natural language processing (NLP) has witnessed a boom in language representation models with the introduction of pretrained language models that are trained on massive textual data…

Building an Arabic Flight Booking Dialogue System Using a Hybrid Rule-Based and Data Driven Approach

Approaches for developing Dialogue Systems (DSs) are typically categorized into rule-based and data-driven. Data-driven DSs require a massive quantity of training data, while rule-based DSs rely…

2021

Part-of-speech tagging for Arabic tweets using CRF and Bi-LSTM

Over the past few years, Twitter has experienced massive growth and the volume of its online content has increased rapidly. This content has been a rich source for several studies that focused on…

2021

Nora S. AlTwairesh
