This is a database of various texts in Winaray, a language spoken by approximately 3 million Filipinos on the islands of Leyte and Samar. The database has a variety of purposes, namely, to:
These texts are intended for teaching and research. Instruction in the mother tongue (L1) is part of the new (2012) K-12 education curriculum, so teachers in Leyte and Samar need texts in Waray that are appropriate for their students’ reading level. This project uses readability algorithms to help rank the texts. Furthermore, the new curriculum includes L1 grammar instruction. Currently there are few grammar resources in Waray. This corpus is a starting point for researchers trying to formulate grammar instruction.
The corpus is simply a list of Waray language texts. There are news articles, blogs, poems, essays, stories, and song lyrics. But in addition to these texts, the corpus provides data on the individual passages, as well as the Waray language as a whole. It identifies the following things about each entry:
Every time a new entry is added, the corpus learns more about the Waray language. Each word from the entry is added to a database of most commonly used words. This in turn is used to help determine which texts are easy to read and which are difficult: if a text has many common words, it will be easier to read; if it has many uncommon words, it will be more difficult.
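The frequency update described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `tokenize`, `add_entry`, and `most_common_words` and the tokenizing rule are assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its words.

    Assumption: a word is a run of letters, optionally joined by
    internal hyphens or apostrophes (as in "gab-i").
    """
    return re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text.lower())

def add_entry(frequencies, text):
    """Add every word of a new entry to the running frequency table."""
    frequencies.update(tokenize(text))
    return frequencies

def most_common_words(frequencies, n):
    """Return the n most frequent words seen so far."""
    return [word for word, _ in frequencies.most_common(n)]

frequencies = Counter()
add_entry(frequencies, "Maupay nga aga. Maupay nga gab-i.")
print(most_common_words(frequencies, 2))
```

Each new entry only increments counts, so the table can grow one text at a time without reprocessing earlier entries.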
The corpus analyzes texts according to readability. Put simply, readability measures how easy a text is to comprehend. The basic theory is that longer sentences are harder to read than shorter ones, and longer words are harder to comprehend than shorter ones. Most readability formulas combine the average sentence length and the average number of syllables per word into a readability score. This method is highly accurate. For example, the Flesch-Kincaid Readability Index accurately predicts comprehension level, correlating at 0.91 with comprehension tests. This is the Flesch Reading Ease formula:
Score = 206.835 - (1.015 × ASL) - (84.6 × ASW)
Where: ASL = average sentence length (number of words divided by number of sentences)
ASW = average word length in syllables (number of syllables divided by number of words)
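As a worked illustration, the formula above can be computed directly from three counts. The function name `flesch_reading_ease` and the sample counts are assumptions chosen for the example.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Compute the Flesch Reading Ease score from raw counts."""
    asl = total_words / total_sentences   # average sentence length
    asw = total_syllables / total_words   # average syllables per word
    return 206.835 - (1.015 * asl) - (84.6 * asw)

# 100 words in 5 sentences, averaging 1.5 syllables per word
print(round(flesch_reading_ease(100, 5, 150), 3))
# The same passage shape, averaging 2.5 syllables per word
print(round(flesch_reading_ease(100, 5, 250), 3))
```

With sentence length held fixed, moving from 1.5 to 2.5 syllables per word lowers the score by 84.6 points, which shows how heavily the formula weights syllable counts.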
An alternative way to calculate readability, also highly correlated with comprehension, is to analyze the frequency of "hard" or "easy" words in a text. One common method is the Dale-Chall formula, described below:
Raw Score = 0.1579*(PDW) + 0.0496*(ASL) + 3.6365
Raw Score = uncorrected reading grade of a student who can answer one-half of the test questions on a passage.
PDW = Percentage of Difficult Words not on the Dale–Chall word list.
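The raw score can be computed from the two inputs in a few lines. This is only a sketch of the arithmetic: the function name `dale_chall_raw_score` and the sample values are assumptions for illustration.

```python
def dale_chall_raw_score(pdw, asl):
    """Compute the Dale-Chall raw score.

    pdw: percentage of difficult words (0 to 100)
    asl: average sentence length in words
    """
    return 0.1579 * pdw + 0.0496 * asl + 3.6365

# A passage with 20% difficult words and 10 words per sentence
print(round(dale_chall_raw_score(20, 10), 4))
```

Note that no syllable count appears in the formula. Only the word list and the sentence length matter.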
Readability algorithms are highly accurate measures of comprehension level, but the original formulas were developed for English, a Germanic language with many single-syllable words. In contrast, Waray-Waray has very few single-syllable words. Almost no nouns, verbs, or adjectives are monosyllabic. Waray-Waray also has very few prepositions, in comparison to the many monosyllabic prepositions in English. In short, Waray-Waray is polysyllabic, and furthermore, its grammar is agglutinative. Prefixes and suffixes change parts of speech. Other parts of speech are formed through affixes:
Therefore, the original readability formulas would rate Waray-language texts as much more difficult to comprehend than they are, simply because their words have more syllables.
Furthermore, syllabification is different in Waray than in English. Vowels are never combined into one syllable (in English, "too" is one syllable; in Waray, "tuod" is two syllables).
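Under the rule stated above, a rough syllable count for a Waray word is simply its number of vowel letters. The sketch below assumes that every written vowel (a, e, i, o, u) is its own syllable and that "y" and "w" are consonants. The function name `count_waray_syllables` is hypothetical, and the rule is an approximation, not a full syllabifier.

```python
VOWELS = set("aeiou")

def count_waray_syllables(word):
    """Count syllables, assuming each written vowel is one syllable."""
    return sum(1 for letter in word.lower() if letter in VOWELS)

print(count_waray_syllables("tuod"))    # tu-od
print(count_waray_syllables("maupay"))  # ma-u-pay
```

An English syllable counter would typically merge adjacent vowels such as "uo" or "au" into one syllable, so it would undercount these words.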
We therefore made a readability formula tailored for the Waray language: (a) sentence length and (b) frequency of common words determine readability; syllable counts are disregarded. These criteria are the same as those of the Dale-Chall formula cited above, so the formula can be referred to as the Modified Dale-Chall Waray Readability Formula.
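The two inputs named above can be extracted from a raw text as sketched below. The text does not state how the project weights them, so the sketch stops at the inputs and does not combine them into a score. The function name `waray_readability_inputs`, the sentence-splitting rule, and the tiny common-word set are all assumptions for illustration.

```python
import re

def waray_readability_inputs(text, common_words):
    """Return (average sentence length, percentage of uncommon words).

    Assumptions: sentences end at ., ! or ?; words are runs of letters,
    optionally joined by internal hyphens or apostrophes.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text.lower())
    if not sentences or not words:
        return 0.0, 0.0
    uncommon = [w for w in words if w not in common_words]
    asl = len(words) / len(sentences)
    pdw = 100 * len(uncommon) / len(words)
    return asl, pdw

# Hypothetical common-word set; in practice it would come from the corpus.
common = {"maupay", "nga", "ka"}
print(waray_readability_inputs("Maupay nga aga. Kumusta ka?", common))
```

In this sketch the common-word set plays the role that the Dale-Chall word list plays for English.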
A second challenge: Waray is predominantly an oral language. There is no standardized orthography. A word might be spelled "diritso" or "derecho", "diri" or "dire", "damo" or "damu". The corpus project therefore checks words with multiple spellings and consolidates them according to guidelines created by Voltaire Oyzon et al. (2011) of Leyte Normal University in Tacloban City and Ricardo Ma. D. Nolasco of UP Diliman.
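One simple way to consolidate spellings is a lookup table that maps each variant to a single preferred form before counting. The sketch below is an assumption about how this could work, not the project's implementation. The table `VARIANTS` is hypothetical, and the choice of which spelling is preferred is arbitrary here; it does not reflect the cited guidelines.

```python
from collections import Counter

# Hypothetical variant table: variant spelling -> preferred spelling.
# The preferred forms are chosen arbitrarily for this example.
VARIANTS = {
    "derecho": "diritso",
    "dire": "diri",
    "damu": "damo",
}

def normalize(word):
    """Map a word to its preferred spelling, if a variant is known."""
    lowered = word.lower()
    return VARIANTS.get(lowered, lowered)

def count_consolidated(words):
    """Count words after merging spelling variants."""
    return Counter(normalize(w) for w in words)

print(count_consolidated(["diri", "dire", "Damu", "damo", "damo"]))
```

Without this step, "diri" and "dire" would be counted as two rarer words instead of one common word, which would distort the frequency list.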
A third challenge: like any language, Waray has regional dialects, which means different vocabulary is used in different locations, although the grammar is basically the same.
de Veyra, V. I. Ortograpiya han Binisaya. (A. K. de Veyra, Trans.). In Luangco, G.C. (Ed.), Kandabao: Essays on Waray language, literature, and culture. Tacloban City, PH: Divine Word University Press
Godin, E.S. 2007. “Mga Batakan Sa Panitik Sa Binisaya-Sinugboanon,” gipatik alang sa Pasinati sa Panitik ug Batadila sa Binisaya-Sinugboanon MSU-IIT, Iligan City Peb. 22-23, 2007 tinambayayongan sa Komisyon sa Wikang Filipino ug BATHALAD-Mindanao
Lobel, J. W. 2009. Samar-Leyte. In Brown, K. and Ogilvie, S. Concise encyclopedia of languages of the world. (pp. 914-916) Oxford, UK: Elsevier, Ltd.
Luangco, G. C. (Ed.) 1982. Kandabao: Essays on Waray language, literature, and culture. Tacloban City, PH: Divine Word University Press.
Makabenta, Eduardo A. (2004). "Binisaya-English; English-Binisaya Dictionary", Adbox: Quezon City, Philippines.
Romualdez, N. L. Orthography and Prosody. In Luangco, G.C. (Ed.), Kandabao: Essays on Waray language, literature, and culture. Tacloban City, PH: Divine Word University Press.
Rubino, C. 2001. Waray Waray. In Garry, J. and Rubino, C. Facts about the world’s languages: An encyclopedia of the world’s major languages, past and present. New York, NY: Wilson Press
Tramp, G.D. (1997). Waray-English Dictionary. Dunwoody Press: Maryland, USA
The UCLA Phonetic Lab Archives. 2007. Retrieved from http://archive.phonetics.ucla.edu/
Wolff, J. U. 1968. The Historical Development of the Leyte-Samar Bisayan Vowel System. Leyte-Samar Studies 1 (1), 19-25
Copyright 2012, by Mark Fullmer & Panrehiyong Sentro sa Wikang Filipino-R8, Leyte Normal University