SMOG calculation discrepancies
Hi,
Text: "June 23rd, 2015 How Cigna deal limits Anthem’s Blue Cross brand “When health plans operate using the Blue Cross and Blue Shield brand, they are generally limited to business in a specific state or region as part of a licensing agreement with their trade group, the Blue Cross and Blue Shield Association. So when Anthem (ANTM), a major operator of Blue Cross plans, made its $184-a-share offer for Cigna (CI) to grow both health insurance businesses, it created potential hurdles when it comes to Anthem’s valuable Blue Cross brands expanding."
On readability-score.com I get a SMOG value of 15.2, but $textStatistics->smogIndex($input) returns only 9.4. That is a big difference. Am I doing something wrong?
https://travis-ci.org/DaveChild/Text-Statistics
It appears SMOG is broken. Can anyone confirm this?
Yes. There seem to be many issues here.
- The SMOG value here is always "normalised" (i.e. clamped) to the range [0, 12]. With that enabled, you can never get 15.2, so I disabled it:
public $normalise = false;
- The SMOG formula is implemented incorrectly. It takes the square root of the sum and multiplies last, whereas the correct order is: square root, then multiplication, then addition.
Maths::bcCalc(
    Maths::bcCalc(
        Maths::bcCalc(
            Maths::bcCalc(
                Syllables::wordsWithThreeSyllables($strText, true, $this->strEncoding),
                '*',
                Maths::bcCalc(
                    30,
                    '/',
                    Text::sentenceCount($strText, $this->strEncoding)
                )
            ),
            'sqrt',
            0
        ),
        '*',
        1.043
    ),
    '+',
    3.1291
);
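As a rough check on the ordering, here is a plain-float sketch (not the library's bcCalc-based code). The function names are made up for illustration, and the "misordered" variant is only one reading of the description above, with the constant added inside the square root. With the counts from the first comparison table it does reproduce both the 9.4 and the 17.7:

```php
<?php
// Plain-float sketch of SMOG; function names are hypothetical, not library API.

/** SMOG in the intended order: square root, then multiply, then add. */
function smogCorrect(int $polysyllables, int $sentences): float
{
    return 1.043 * sqrt($polysyllables * 30 / $sentences) + 3.1291;
}

/** Assumed faulty ordering: the constant is added before the square root. */
function smogMisordered(int $polysyllables, int $sentences): float
{
    return 1.043 * sqrt($polysyllables * 30 / $sentences + 3.1291);
}

// 13 polysyllabic words, as reported in the first table.
printf("%.1f\n", smogMisordered(13, 5)); // 9.4  (5 detected sentences)
printf("%.1f\n", smogCorrect(13, 2));    // 17.7 (2 actual sentences)
```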
- When the input text is cleaned, it is passed through utf8_decode. That converts UTF-8 to ISO-8859-1, so any character that ISO-8859-1 cannot represent (such as the curly quotes and apostrophes in your example) is converted to a "?" sign, and those are then interpreted as sentence terminators. So your example text contains 2 sentences, but the script finds 5.
//$strText = utf8_decode($strText);
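To illustrate the effect, and one possible alternative to dropping the decode step entirely, here is a small sketch. Both helpers are hypothetical and not part of Text-Statistics: asciiQuotes() maps typographic quotes to ASCII, and countTerminators() is a crude stand-in for the library's sentence logic. The str_replace call only simulates what utf8_decode() does to these particular characters, which fall outside ISO-8859-1:

```php
<?php
// Hypothetical helper: map typographic quotes to their ASCII equivalents.
function asciiQuotes(string $text): string
{
    return strtr($text, [
        "\u{2018}" => "'", "\u{2019}" => "'", // single curly quotes
        "\u{201C}" => '"', "\u{201D}" => '"', // double curly quotes
    ]);
}

// Crude stand-in for terminator counting; the library's own logic is more involved.
function countTerminators(string $text): int
{
    return (int) preg_match_all('/[.!?]/', $text);
}

$sample = "How Cigna deal limits Anthem\u{2019}s Blue Cross brand "
        . "\u{201C}When health plans operate.\u{201D}";

// Simulated utf8_decode() result: characters outside ISO-8859-1 become "?".
$decoded = str_replace(["\u{2019}", "\u{201C}", "\u{201D}"], '?', $sample);

echo countTerminators($decoded), "\n";             // 4: three "?" plus the full stop
echo countTerminators(asciiQuotes($sample)), "\n"; // 1: only the full stop
```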
- I'm not sure about this one, but I also removed all words that contain numbers. It didn't make sense to me to count "23rd" or "$184-a-share" as words.
$strText = preg_replace('/([^\.\s]*[0-9][^\.\s]*)/', '', $strText); // Remove words with numbers
$strText = preg_replace('/\'/', '', $strText); // Remove ' symbol, dunno if helps.
$strText = preg_replace('/ {2,}/', ' ', $strText); // Collapse repeated spaces (because for some reason you calculate words based on number of spaces)
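For reference, a standalone sketch of what these clean-up steps do to fragments of the example text. The wrapper function cleanForCounting() is hypothetical, and the space-collapsing pattern is assumed to mean "two or more spaces":

```php
<?php
// Hypothetical wrapper around the clean-up steps; not a library function.
function cleanForCounting(string $text): string
{
    $text = preg_replace('/[^.\s]*[0-9][^.\s]*/', '', $text); // drop tokens containing a digit
    $text = str_replace("'", '', $text);                       // drop apostrophes
    $text = preg_replace('/ {2,}/', ' ', $text);               // collapse runs of spaces
    return trim($text);
}

echo cleanForCounting('June 23rd, 2015 How Cigna deal limits'), "\n";
// June How Cigna deal limits
echo cleanForCounting('made its $184-a-share offer for Cigna'), "\n";
// made its offer for Cigna
```

Note that punctuation attached to a numeric token (the comma in "23rd,") is removed along with it, while a trailing full stop is kept because the pattern excludes dots.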
Now, I don't have an account on readability-score.com, but I tried with other online calculators:
| Metric | Online-Utility | LearningAndWork | StoryToolz | Current TS | Improved TS |
|---|---|---|---|---|---|
| characters | 437 | - | 436 | 427 | 425 |
| words | 94 | 92 | 94 | 94 | 92 |
| poly-words | - | 14 | - | 13 | 13 |
| sentences | 2 | 2 | 2 | 5 | 2 |
| syl. per word | 1.48 | - | 1.38 | 1.44 | 1.46 |
| ARI | 23.97 | - | 23.9 | 9.4 | 23.3 |
| Gunning-F | 23.06 | - | 23.9 | 12.6 | 23.6 |
| Flesch-K | 20.19 | - | 19.1 | 8.7 | 19.5 |
| Coleman-L | 10.94 | - | 10.8 | 10.9 | 11.4 |
| SMOG | 16.96 | 23.2 | 16.4 | 9.4 | 17.7 |
I also tried with my own test text, which is a bit longer.
| Metric | Online-Utility | LearningAndWork | StoryToolz | Current TS | Improved TS |
|---|---|---|---|---|---|
| characters | 2919 | - | 2924 | 2899 | 2890 |
| words | 604 | 604 | 592 | 604 | 585 |
| poly-words | - | 90 | - | 108 | 108 |
| sentences | 32 | 32 | 32 | 32 | 32 |
| syl. per word | 1.65 | - | 1.58 | 1.64 | 1.68 |
| ARI | 10.77 | - | 11.1 | 10.6 | 11 |
| Gunning-F | 12.32 | - | 13.4 | 14 | 13.9 |
| Flesch-K | 11.2 | - | 10.3 | 11.2 | 11.4 |
| Coleman-L | 11.08 | - | 11.6 | 12 | 13.3 |
| SMOG | 12.8 | 17.7 | 12.1 | 10.7 | 13.6 |
But yes, there still seem to be problems. For example, the Coleman-Liau index is now higher than in the other calculators.