Arabic Learners Written Corpus: A Resource for Research and Learning

Project Director: Dr. Samira Farwaneh, University of Arizona

Project Researcher: Mohammed Tamimi, University of Arizona

This project has developed an extensive Arabic learner corpus comprising numerous written samples produced by L2 and heritage students, collected over 15 years of teaching. The samples were transcribed into a database and cross-referenced by level (beginning, intermediate, advanced), learner type (L2 vs. heritage), and genre (description, narration, instruction). In the first phase of the project, the corpus will serve as a source of empirical data for hypothesis testing as well as a resource for developing materials for teaching Arabic. It will be made available through the websites of the Center for Educational Resources in Culture, Language and Literacy (CERCLL) and the Center for Middle Eastern Studies (CMES), and will be offered to the Linguistic Data Consortium (LDC) for national dissemination. The second phase of the project will take place in the Fall of 2010/2011 and will involve tagging morphological, syntactic and orthographic errors, as well as the characteristics of each level.
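For readers who want to picture the cross-referenced categories, the following is a minimal sketch in Python of how a sample record and a category lookup might look. It is not the project's actual database schema: the names `Sample`, `sample_id`, and `select` are hypothetical, and the sample texts are placeholders. Only the three category sets (level, learner, genre) come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    BEGINNING = "beginning"
    INTERMEDIATE = "intermediate"
    ADVANCED = "advanced"


class Learner(Enum):
    L2 = "L2"
    HERITAGE = "heritage"


class Genre(Enum):
    DESCRIPTION = "description"
    NARRATION = "narration"
    INSTRUCTION = "instruction"


@dataclass(frozen=True)
class Sample:
    sample_id: str   # hypothetical identifier, not from the project
    level: Level
    learner: Learner
    genre: Genre
    text: str        # transcribed learner text (placeholder here)


def select(samples, level=None, learner=None, genre=None):
    """Return the samples matching every category that was specified."""
    return [
        s for s in samples
        if (level is None or s.level == level)
        and (learner is None or s.learner == learner)
        and (genre is None or s.genre == genre)
    ]


# Placeholder records for illustration only; no real corpus data.
samples = [
    Sample("s1", Level.BEGINNING, Learner.L2, Genre.DESCRIPTION, "<placeholder>"),
    Sample("s2", Level.ADVANCED, Learner.HERITAGE, Genre.NARRATION, "<placeholder>"),
    Sample("s3", Level.ADVANCED, Learner.L2, Genre.NARRATION, "<placeholder>"),
]

advanced_narrations = select(samples, level=Level.ADVANCED, genre=Genre.NARRATION)
print([s.sample_id for s in advanced_narrations])  # ['s2', 's3']
```

Leaving a category unspecified matches all of its values, so the same function can compare, for example, L2 and heritage writers at a single level.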