
Nora S. AlTwairesh

Assistant Professor

Head, Information Technology Department

Computer and Information Sciences
KSU Female Campus - Building 6 T121
Publications
Conference paper
2018

SUAR: Towards building a corpus for the Saudi dialect

This paper presents the preliminary results of the construction of a morphologically annotated corpus for the Saudi dialect. We call the corpus SUAR (SaUdi corpus for NLP Applications and Resources). The corpus consists of around 104,079 words collected from different online sources. The linguistic features of the Saudi dialect are elaborated and compared with Modern Standard Arabic and other Arabic dialects. This paper conducts a pilot study to explore possible directions to facilitate the morphological annotation of the Saudi corpus. The corpus was automatically annotated using the MADAMIRA tool, after which it was manually inspected to validate the resulting analysis.

Conference location
Dubai, UAE
Conference name
The 4th International Conference on Arabic Computational Linguistics (ACLing 2018)
More publications
publications

The field of natural language processing (NLP) has witnessed a boom in language representation models with the introduction of pretrained language models that are trained on massive textual data…

publications

Approaches for developing Dialogue Systems (DSs) are typically categorized into rule-based and data-driven. Data-driven DSs require a massive quantity of training data, while rule-based DSs rely…

2021
publications

Over the past few years, Twitter has experienced massive growth and the volume of its online content has increased rapidly. This content has been a rich source for several studies that focused on…

2021