EPSRC Reference: |
EP/K015206/1 |
Title: |
Natural Language Processing Working Together with Arabic and Islamic Studies |
Principal Investigator: |
Atwell, Professor ES |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Sch of Computing |
Organisation: |
University of Leeds |
Scheme: |
Standard Research |
Starts: |
01 April 2013 |
Ends: |
30 September 2015 |
Value (£): |
336,632
|
EPSRC Research Topic Classifications: |
Artificial Intelligence |
Comput./Corpus Linguistics |
Human Communication in ICT |
|
|
EPSRC Industrial Sector Classifications: |
No relevance to Underpinning Sectors |
|
|
Related Grants: |
|
Panel History: |
Panel Date | Panel Name | Outcome |
06 Nov 2012 | *ICT* | Announced |
|
Summary on Grant Application Form |
Summary
This is an interdisciplinary project which addresses the ICT call of "working together" by aligning ICT expertise and research interests from Computational and Corpus Linguistics, with Humanities research streams in Arabic and Islamic Studies, focusing on the Qur'an as a core text. It is also an international collaboration between the Universities of Leeds and Jordan, and further addresses the "working together" call via incoming and outgoing mobility in the form of Visiting Researcher placements in the School of Computing at Leeds (incoming) and the Centre for the Study of Islam in the Contemporary World at Jordan (outgoing). This agreement is proactive and novel, and has high impact, ensuring knowledge transfer from different methodological perspectives and cultures.
The study of Tajwid or Qur'anic recitation is a sub-field and taught module in Islamic Studies programmes at both universities and elsewhere, and the original insight informing this project is to view Tajwid mark-up in the Qur'an as additional text-based data for computational analysis. This mark-up is already incorporated into Qur'anic Arabic script, and identifies prosodic-syntactic phrase boundaries of different strengths, plus gradations of prosodic and semantic salience through colour-coded highlighting of pitch accented syllables, and hence prosodically and semantically salient words.
The Computational Linguistics Module in Year 1 entails development and evaluation of software for generating a phonetically-transcribed, stressed and syllabified version of the entire text of the Qur'an, using the International Phonetic Alphabet (IPA). This canonical pronunciation tier for Classical Arabic will be informed and evaluated by Arabic linguists, Tajwid scholars, and phoneticians, and published in an updated version of the open-source Boundary-Annotated Qur'an Corpus [1], [2], preferably for LREC 2014. The software will also be re-usable for Natural Language Engineering applications for Modern Standard Arabic, and for constructing dictionaries for Arabic language learners.
The Text Analytics Module in Year 2 implements statistical techniques such as keyword extraction to explore semiotic relationships between sound and meaning in the Qur'an, invoking a Saussurean-type view of the sign as '...a bi-unity of expression and content...' [5]. Our investigation entails: (i) text data mining for statistically significant phonemes, syllables, words, and correlates of rhythmic juncture [6], [7]; and (ii) interpretation of results from interdisciplinary perspectives: Corpus Linguistics (ICT); Tajwid science, plus Tafsir or Qur'anic exegesis (Islamic Studies); Arabic (Language and Literature); and Phonetics and Phonology (Linguistics).
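As an illustration only, the kind of keyword extraction mentioned above can be sketched with the log-likelihood (G2) keyness statistic that is common in corpus linguistics. The summary does not say which statistic the project uses, so the choice of G2, the function names log_likelihood and keywords, and the 3.84 cut-off (the chi-square critical value for p < 0.05 at one degree of freedom) are all assumptions. The same counting would apply to phonemes or syllables if the token lists held those units instead of words.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 keyness for an item seen `a` times in a study corpus of `c`
    tokens and `b` times in a reference corpus of `d` tokens."""
    # Expected counts if the item were equally frequent in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def keywords(study_tokens, reference_tokens, threshold=3.84):
    """Return (item, G2) pairs over-represented in the study corpus,
    strongest first. The threshold is an assumed default."""
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for item, a in study.items():
        b = ref.get(item, 0)
        g2 = log_likelihood(a, b, c, d)
        # Keep only items that are relatively MORE frequent in the study corpus.
        if g2 >= threshold and a / c > b / d:
            scored.append((item, g2))
    return sorted(scored, key=lambda pair: -pair[1])


# Hypothetical toy token lists (placeholders, not project data).
study = ["nur"] * 10 + ["wa"] * 10
reference = ["wa"] * 20 + ["fi"] * 20
result = keywords(study, reference)
```

In this toy case "wa" has the same relative frequency in both lists and so scores zero, while "nur" occurs only in the study list and is returned as the single keyword.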
In terms of ICT applications, the team will collaborate with stakeholders and beneficiaries to develop an associated or follow-on funding proposal for the UK Research Councils, to include publication of project software as an advanced corpus-query and visualization tool for Islamic Studies and Humanities scholars, plus Arabic language learners. This again represents an extension of the "working together" theme.
Finally, our approach is interdisciplinary and pioneers stylistic analysis of sound and rhythm encoded in writing as a semiotic system for religious and other literary texts. As such it is entirely novel and has direct implications for research-led teaching in both partner institutions plus a broad cross-section of research groups and user communities, namely: Natural Language Processing and Artificial Intelligence; Qur'anic and Islamic Studies; Arabic Language and Literature; Linguistics and Phonetics; Digital Humanities; and Psychology.
All references appear in Case for Support.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.leeds.ac.uk |