At its most basic, an online dictionary is simply a search form that checks if a user-provided word is in a database. If the word is there, the website displays whatever information the database has about the word (word meaning, regional origin, part of speech, sample sentence usage, etc.).
But the morphology of Waray makes word searches more complicated. Unlike English, where word roots are modified primarily through suffixes (e.g., buy --> buys, buying), Waray also uses prefixes and infixes (e.g., palit can take the forms ginpalit, iginpalit, ipalit, ipinalit, makapalit, pagpalit, paliton, papaliton, pumalit, etc.).
This creates a challenge for a dictionary-maker: a user might type "pumalit" or "pagpalit" or "napalit" or another variation to find the word root "palit". The online dictionary needs to know that all of these entries refer to the same word, so it needs a way to find the root within whatever word a user types.
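To make the problem concrete, here is a minimal sketch that builds a few of the forms mentioned above from the root "palit" and shows that an exact-match lookup misses all of them. The helper names (addPrefix, addInfix, addSuffix) and the one-word $dictionary array are hypothetical and exist only for this illustration. The infix is placed directly after the first letter, as the article describes.

```php
<?php
// Hypothetical helpers for illustration; they are not part of the dictionary's code.

// Attach a prefix to the front of a root, e.g. gin + palit.
function addPrefix(string $prefix, string $root): string {
    return $prefix . $root;
}

// Insert an infix directly after the first letter of a root, e.g. p + um + alit.
function addInfix(string $infix, string $root): string {
    return substr($root, 0, 1) . $infix . substr($root, 1);
}

// Attach a suffix to the end of a root, e.g. palit + on.
function addSuffix(string $suffix, string $root): string {
    return $root . $suffix;
}

$root = 'palit';
$forms = array(
    addPrefix('gin', $root),  // ginpalit
    addInfix('um', $root),    // pumalit
    addInfix('in', $root),    // pinalit
    addSuffix('on', $root),   // paliton
);

// A one-entry stand-in for the dictionary's word list (assumption).
$dictionary = array('palit');

foreach ($forms as $form) {
    // Exact matching fails for every inflected form.
    $status = in_array($form, $dictionary, true) ? 'found' : 'not found';
    echo $form . ': ' . $status . "\n";
}
```

Every line of output reports "not found", which is why the search has to reduce the typed word to its root before consulting the database.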
We started by defining common affixes in Waray (see the code below). If a word starts with "mag", "pag", "na", "um", etc. (e.g., napalit), the program simply strips that prefix from the beginning of the word. If "um" or "in" directly follows the first letter (e.g., pinalit), that infix is also removed. If the word ends with "a" or "i" (e.g., palita), that suffix is removed. Finally, if the first syllable is doubled to mark tense (e.g., nagtitikang), the first copy is removed.
<?php
// $search IS ASSUMED TO HOLD THE WORD THE USER TYPED INTO THE SEARCH FORM
$root = $search;
// DEFINE THE COMMON PREFIXES, SUFFIXES, INFIXES
$prefixfour = array('igin');
$prefixthree = array('nag', 'gin', 'pag', 'mag', 'tag');
$prefixtwo = array('ma', 'na', 'ka', 'pa');
$infix = array('um', 'in');
$suffix = array('a','i');
// CHECK FOR PREFIXES; IF FOUND, REMOVE THEM FROM THE WORD
$firstfour = substr($root,0,4);
if (in_array($firstfour,$prefixfour))
{ $root = substr($root,4); }
$firstthree = substr($root,0,3);
if (in_array($firstthree,$prefixthree))
{ $root = substr($root,3); }
else{
$firsttwo = substr($root,0,2);
if (in_array($firsttwo,$prefixtwo))
{ $root = substr($root,2); }
}
// CHECKS FOR INFIXES; IF FOUND, REMOVE THEM FROM THE WORD
$infixb = substr($root,1,2);
$infixa = substr($root,0,2);
if (in_array($infixa,$infix))
{ $root = substr($root,2);}
if (in_array($infixb,$infix)) {
$start = substr($root,0,1);
$end = substr($root,3);
$root = $start.$end;
}
// CHECK FOR SUFFIXES; IF FOUND, REMOVE THEM FROM THE WORD
$suffixtest = substr($root,-1);
if (in_array($suffixtest,$suffix))
{ $root = substr($root,0,-1); }
// CHECK FOR TENSE: IF THERE ARE DOUBLED SYLLABLES, (ex. nagTI-TI-kang), REMOVE THE FIRST
$first = substr($root,0,2);
$second = substr($root, 2,2);
if ($first == $second) { $root = substr($root, 2); }
// SEARCH THE DICTIONARY FOR ANY WORDS THAT CONTAIN THE ROOT
// NOTE: THE mysql_* FUNCTIONS WERE REMOVED IN PHP 7; NEWER CODE WOULD USE mysqli OR PDO
$sql = "SELECT word FROM frequency";
$result = mysql_query($sql)
or die(mysql_error());
$list = array();
while ($row = mysql_fetch_array($result))
{ $list[] = $row['word']; }
foreach($list as $needle)
{
// PRINT EACH WORD THAT CONTAINS THE ROOT (THE LINK MARKUP ORIGINALLY PRINTED AROUND EACH WORD DID NOT SURVIVE AND IS OMITTED)
$pos = strpos($needle, $root);
if ($pos !== false)
{ echo $needle ." "; }
// INSERT INFIXES, (EX. palit BECOMES pumalit & pinalit) AND SEARCH FOR MATCHES
$firstletter = substr($root,0,1);
$therest = substr($root,1);
$um = "um";
$in = "in";
$modroot = $firstletter.$um.$therest;
$pos = strpos($needle, $modroot);
if ($pos !== false)
{ echo $needle ." "; }
$modroot = $firstletter.$in.$therest;
$pos = strpos($needle, $modroot);
if ($pos !== false)
{ echo $needle ." "; }
}
echo "";
?>
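The steps in the listing above can also be sketched as two small functions, which makes them easier to try out without a database. This is an illustrative rearrangement, not the site's actual code: the names findRoot and findMatches are hypothetical, and $words is a made-up stand-in for the rows of the frequency table. The affix lists and the order of the steps are copied from the listing.

```php
<?php
// Hypothetical function names; affix lists and step order follow the listing above.

// Reduce a typed word to a candidate root.
function findRoot(string $search): string {
    $root = $search;

    // Four-letter prefix first, then a three-letter prefix, else a two-letter prefix.
    if (in_array(substr($root, 0, 4), array('igin'), true)) {
        $root = substr($root, 4);
    }
    if (in_array(substr($root, 0, 3), array('nag', 'gin', 'pag', 'mag', 'tag'), true)) {
        $root = substr($root, 3);
    } elseif (in_array(substr($root, 0, 2), array('ma', 'na', 'ka', 'pa'), true)) {
        $root = substr($root, 2);
    }

    // "um"/"in" at the very start, or directly after the first letter.
    $infixes = array('um', 'in');
    if (in_array(substr($root, 0, 2), $infixes, true)) {
        $root = substr($root, 2);
    } elseif (in_array(substr($root, 1, 2), $infixes, true)) {
        $root = substr($root, 0, 1) . substr($root, 3);
    }

    // One-letter suffixes.
    if (in_array(substr($root, -1), array('a', 'i'), true)) {
        $root = substr($root, 0, -1);
    }

    // Doubled first syllable (e.g. ti-ti-kang): drop the first copy.
    if (strlen($root) >= 4 && substr($root, 0, 2) === substr($root, 2, 2)) {
        $root = substr($root, 2);
    }
    return $root;
}

// Return every word that contains the root or its um/in-infixed form.
// Unlike the listing, each matching word is returned only once.
function findMatches(array $words, string $root): array {
    if ($root === '') {
        return array();
    }
    $first = substr($root, 0, 1);
    $rest = substr($root, 1);
    $candidates = array($root, $first . 'um' . $rest, $first . 'in' . $rest);

    $matches = array();
    foreach ($words as $word) {
        foreach ($candidates as $candidate) {
            if (strpos($word, $candidate) !== false) {
                $matches[] = $word;
                break;
            }
        }
    }
    return $matches;
}

// Made-up word list standing in for the frequency table (assumption).
$words = array('palit', 'pumalit', 'pinalit', 'ginpalit', 'tikang', 'balay');

echo findRoot('napalit') . "\n";      // palit
echo findRoot('pumalit') . "\n";      // palit
echo findRoot('nagtitikang') . "\n";  // tikang
echo implode(' ', findMatches($words, findRoot('pagpalit'))) . "\n"; // palit pumalit pinalit ginpalit
```

One consequence of the rules is worth noting: because "pa" is in the two-letter prefix list, the bare root "palit" is itself reduced to "lit". The search still finds "palit" because it looks for words that contain the reduced form, but the shorter string may also match unrelated words.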
Copyright 2012, by Mark Fullmer & Panrehiyong Sentro sa Wikang Filipino-R8, Leyte Normal University