SUAR: Towards Building a Corpus for the Saudi Dialect

Conference Paper
Conference Name: The 4th International Conference on Arabic Computational Linguistics (ACLing 2018)
Conference Date: Saturday, November 17, 2018
Publication Abstract: 

This paper presents preliminary results from the construction of a morphologically annotated corpus of the Saudi dialect, which we call SUAR (SaUdi corpus for NLP Applications and Resources). The corpus consists of 104,079 words collected from various online sources. We describe the linguistic features of the Saudi dialect and compare them with those of Modern Standard Arabic and other Arabic dialects. We also conduct a pilot study exploring ways to facilitate the morphological annotation of the corpus. The corpus was annotated automatically with the MADAMIRA tool and then manually inspected to validate the resulting analyses.
