Proposing a Customised Method for Extratextual Documentative Annotation on Written Text Corpus

language resource, extratextual, intratextual, annotation, text corpus, metadata, header file, docu

In this paper, we sketch extratextual documentative annotation, which, in the present frame of text annotation, is considered one of the indispensable processes for adding representational information to the texts included in a written corpus. This becomes more important when a corpus is built from a large number of texts drawn from different genres and text types. To develop a frame for extratextual annotation that is workable at each stage, we have classified the existing processes of corpus annotation into two broad types. We also explain the different layers involved in the extratextual annotation of texts and identify the applications through which it can substantially improve access to the language data in a corpus for text file management, information retrieval, extraction of lexical items, and language processing. The techniques proposed and described in this paper are distinctive in that they extend the utility of the data in a written text corpus beyond the immediate horizons of language processing to the realms of theoretical, descriptive, and applied linguistics. We further argue that written text corpora developed so far in different natural languages should be annotated at the extratextual level in a uniform manner, so that the text samples stored in them can be used consistently for work in descriptive linguistics, theoretical linguistics, language technology, and applied linguistics, including grammar writing, dictionary compilation, and language teaching. The proposed annotation scheme has been applied to a sample Bangla text corpus, and we have noted that accessing data and information from a corpus annotated in this way is far easier than accessing them from an un-annotated raw corpus.
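As a rough illustration of why header-level annotation eases retrieval, the sketch below attaches a metadata header to each text and selects texts by header fields without touching the text body. It is not the scheme proposed in the paper: the `AnnotatedText` class, the `select` helper, and the field names `genre` and `year` are all hypothetical, chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedText:
    # Extratextual metadata kept apart from the text (hypothetical fields).
    header: dict[str, str]
    # The text itself, left unmodified by the annotation.
    body: str


def select(corpus, **criteria):
    """Return the texts whose header matches every given field exactly."""
    return [
        text
        for text in corpus
        if all(text.header.get(key) == value for key, value in criteria.items())
    ]


# A toy corpus; the values are placeholders, not data from the paper.
corpus = [
    AnnotatedText({"genre": "fiction", "year": "1995"}, "sample text one"),
    AnnotatedText({"genre": "news", "year": "2001"}, "sample text two"),
    AnnotatedText({"genre": "fiction", "year": "2001"}, "sample text three"),
]

fiction_2001 = select(corpus, genre="fiction", year="2001")
print([text.body for text in fiction_2001])
```

With an un-annotated raw corpus, the same selection would require reading and classifying each text; here it reduces to a lookup on the header.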

Publication type

Article

Journal/Newspaper

International Journal of English Linguistics (indexed in Web of Science)

Pages

99-112


MUFLEH SALEM M. ALQAHTANI
