AraSenTi-Tweet: A Corpus for Arabic Sentiment Analysis of Saudi Tweets

Arabic sentiment analysis is an active research area. However, the Arabic language still lacks sufficient language resources to support sentiment analysis tasks. In this paper, we present the details of collecting and constructing a large dataset of Arabic tweets, and we explain the techniques used to clean and pre-process the collected dataset. A corpus of Arabic tweets annotated for sentiment analysis was extracted from this dataset. The corpus consists mainly of tweets written in Modern Standard Arabic and the Saudi dialect, and it was annotated manually. The annotation process is explained in detail, and the challenges encountered during annotation are highlighted. The corpus contains 17,573 tweets, each labelled with one of four sentiment labels: positive, negative, neutral, or mixed. Baseline experiments were conducted to provide benchmark results for future work.

Conference location

Dubai, UAE

Conference name

3rd International Conference on Arabic Computational Linguistics, ACLing

More publications

The Evolution of Language Models Applied to Emotion Analysis of Arabic Tweets

The field of natural language processing (NLP) has witnessed a boom in language representation models with the introduction of pretrained language models that are trained on massive textual data…

Building an Arabic Flight Booking Dialogue System Using a Hybrid Rule-Based and Data Driven Approach

Approaches for developing Dialogue Systems (DSs) are typically categorized into rule-based and data-driven. Data-driven DSs require a massive quantity of training data, while rule-based DSs rely…

2021

Part-of-speech tagging for Arabic tweets using CRF and Bi-LSTM

Over the past few years, Twitter has experienced massive growth and the volume of its online content has increased rapidly. This content has been a rich source for several studies that focused on…

2021

Nora S. AlTwairesh
