# German special characters are replaced:
häufig;haufig
üor;uor
björk;bjork

# here the stemmer works okay, it maps related words to the same stem:
abschließen;abschliess
abschließender;abschliess
abschließendes;abschliess
abschließenden;abschliess

Tisch;tisch
Tische;tisch
Tischen;tisch
geheimtür;geheimtur

Haus;hau
Hauses;hau
Häuser;hau
Häusern;hau
# here's a case where overstemming occurs, i.e. a word is
# mapped to the same stem as unrelated words:
hauen;hau

# here's a case where understemming occurs, i.e. two related words
# are not mapped to the same stem. This is the case with basically
# all irregular forms:
Drama;drama
Dramen;dram

# "ß" is replaced with "ss":
Ausmaß;ausmass

# fake words to test if suffixes are cut off:
xxxxxe;xxxxx
xxxxxs;xxxxx
xxxxxn;xxxxx
xxxxxt;xxxxx
xxxxxem;xxxxx
xxxxxer;xxxxx
xxxxxnd;xxxxx
# the suffixes are also removed when combined:
xxxxxetende;xxxxx

# words that are shorter than four characters are not changed:
xxe;xxe
# -em and -er are not removed from words shorter than five characters:
xxem;xxem
xxer;xxer
# -nd is not removed from words shorter than six characters:
xxxnd;xxxnd