readability
FOG accuracy??
Hi Tyler,
I have not had a chance to run tests on the Perl code, but working in R today I compared the FOG readability scores from the koRpus and readability packages on some short strings. In some cases the FOG scores were nearly identical. However, once the text includes numbers or nonsense words, the FOG scores begin to diverge. If you have any thoughts as to why, I would be very interested to hear them.
Thanks, Steve
OUTPUT IS THE SAME
> detach("package:readability", unload=TRUE)
> library(koRpus)
> test <- "this is a test with numbers this is a beautiful day and I am adding text but not numbers what will this do"
> tagged.txt <- treetag(test,treetagger="manual",lang="en",format="obj",encoding="UTF-8",TT.options=list(path="/path/TreeTagger",preset="en"))
Warning message:
Can't find the lexicon file, hence omitted! Please ensure this path is valid:
/path/TreeTagger/lib/english-lexicon.txt
> readability(tagged.txt,index="FOG")
Hyphenation (language: en)
|======================================================================| 100%
Gunning Frequency of Gobbledygook (FOG)
Parameters: default
Grade: 10.94
Text language: en
Warning message:
Text is relatively short (<100 tokens), results are probably not reliable!
> detach("package:koRpus", unload=TRUE)
> library(readability)
> readability(test,NULL)
   all Flesch_Kincaid Gunning_Fog_Index Coleman_Liau SMOG
1: all            7.9              10.9          4.4  8.8
   Automated_Readability_Index Average_Grade_Level
1:                         7.0                 7.8
>
DIFFERENT (TEXT INCLUDES NUMBERS)
> detach("package:readability", unload=TRUE)
> library(koRpus)
> test <- "this is a test with numbers 123 1231 1231 2 123 41 2421 what will this do"
> tagged.txt <- treetag(test,treetagger="manual",lang="en",format="obj",encoding="UTF-8",TT.options=list(path="/path/TreeTagger",preset="en"))
Warning message:
Can't find the lexicon file, hence omitted! Please ensure this path is valid:
/path/TreeTagger/lib/english-lexicon.txt
> readability(tagged.txt,index="FOG")
Hyphenation (language: en)
|======================================================================| 100%
Gunning Frequency of Gobbledygook (FOG)
Parameters: default
Grade: 6.8
Text language: en
Warning message:
Text is relatively short (<100 tokens), results are probably not reliable!
> detach("package:koRpus", unload=TRUE)
> library(readability)
> readability(test,NULL)
   all Flesch_Kincaid Gunning_Fog_Index Coleman_Liau SMOG
1: all            1.0               4.0         14.8  3.1
   Automated_Readability_Index Average_Grade_Level
1:                        10.0                 6.6
DIFFERENT (TEXT INCLUDES NONSENSE WORDS)
> detach("package:readability", unload=TRUE)
> library(koRpus)
> test <- "this is a test. with numbers this is a beautiful. day and I am adding text but. not numbers what will this do. ahsd jasdhjkas asjkdhjk sajkasdfsd kjksdaj sdakfjsl asdkjfdsl csaaskllsdfjkl sdadkjfdskl sadjfkldsa dsajlkasreoipiuwefopmcoipeeqwmfler cijreffeqrmfaq fqewkpofremcd qdfwefdopowerjmwef,f"
> tagged.txt <- treetag(test,treetagger="manual",lang="en",format="obj",encoding="UTF-8",TT.options=list(path="/path/TreeTagger",preset="en"))
> readability(tagged.txt,index="FOG")
Hyphenation (language: en)
Gunning Frequency of Gobbledygook (FOG)
Parameters: default
Grade: 11.27
Text language: en
Warning message:
Text is relatively short (<100 tokens), results are probably not reliable!
> detach("package:koRpus", unload=TRUE)
> library(readability)
> readability(test,NULL)
   all Flesch_Kincaid Gunning_Fog_Index Coleman_Liau SMOG
1: all            6.9               8.3         19.6  8.8
   Automated_Readability_Index Average_Grade_Level
1:                        13.0                11.3
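
The first two cases can be checked by hand against the standard Gunning FOG formula, 0.4 * (words per sentence + 100 * complex words / words). The base-R sketch below is only a rough illustration: `fog_sketch` and `count_syllables` are hypothetical helpers, not functions from koRpus or readability, and the vowel-run syllable count is a naive assumption rather than either package's hyphenation method. With that caveat, counting digit-only tokens as words gives the koRpus figure for the numbers string, and dropping them gives the readability figure, which is consistent with (but does not prove) the two packages tokenizing numbers differently.

```r
# Naive syllable estimate: one syllable per run of vowels.
# ASSUMPTION: a crude heuristic, not what koRpus or readability actually do.
count_syllables <- function(word) {
  runs <- gregexpr("[aeiouy]+", tolower(word))[[1]]
  if (runs[1] == -1) 0L else length(runs)
}

# Hypothetical helper: Gunning FOG = 0.4 * (words / sentences + 100 * complex / words),
# where a "complex" word has three or more syllables.
fog_sketch <- function(text, keep_numbers = TRUE) {
  # Split into sentences on terminal punctuation, dropping empty pieces
  sentences <- unlist(strsplit(text, "[.!?]+"))
  sentences <- sentences[grepl("[[:alnum:]]", sentences)]

  # Split into tokens on anything that is not a letter or digit
  tokens <- unlist(strsplit(text, "[^[:alnum:]]+"))
  tokens <- tokens[nzchar(tokens)]

  # Optionally discard digit-only tokens before counting words
  if (!keep_numbers) tokens <- tokens[!grepl("^[0-9]+$", tokens)]

  syllables <- vapply(tokens, count_syllables, integer(1))
  complex <- sum(syllables >= 3)
  0.4 * (length(tokens) / length(sentences) + 100 * complex / length(tokens))
}

plain <- "this is a test with numbers this is a beautiful day and I am adding text but not numbers what will this do"
with_numbers <- "this is a test with numbers 123 1231 1231 2 123 41 2421 what will this do"

print(round(fog_sketch(plain), 2))                               # 10.94
print(round(fog_sketch(with_numbers), 1))                        # 6.8 (numbers counted as words)
print(round(fog_sketch(with_numbers, keep_numbers = FALSE), 1))  # 4   (numbers dropped)
```

The nonsense-word case is harder to reproduce this way, because the score there depends on how each package estimates syllables for strings that are not real words.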