[227017] Compare 2-gram histograms of proper english lines vs garbage text

By stefan. Created 2020/11/09 12:00:52, modified 2020/11/09 12:06:41

Post type: JavaX Code

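// Load the two corpora as lists of lines (posts 226918 and 226969).
LS englishLines = tlft(gazelle_text(226918));
LS garbageLines = tlft(gazelle_text(226969));

// Build normalized (sum = 1) 2-gram histograms: all English lines,
// the first 100 English lines only, and all garbage lines.
Map<S, Double> full = multiSetToHistogramWithSum1(ngramsHistogram_multipleStrings(2, englishLines));
Map<S, Double> firstLines = multiSetToHistogramWithSum1(ngramsHistogram_multipleStrings(2, takeFirst(100, englishLines)));
Map<S, Double> garbage = multiSetToHistogramWithSum1(ngramsHistogram_multipleStrings(2, garbageLines));

// properChi: English vs. an English subset (baseline);
// badChi: English vs. garbage text.
ret lhm(
  properChi := chiSquared_histogramsWithSum1(full, firstLines),
  badChi    := chiSquared_histogramsWithSum1(full, garbage));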
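
For readers without the JavaX library at hand, the comparison above can be approximated in plain Java. This is a minimal sketch under assumptions, not the library's implementation: the class `BigramChiSquared` and its methods `ngramCounts`, `normalize`, and `chiSquared` are hypothetical stand-ins for `ngramsHistogram_multipleStrings`, `multiSetToHistogramWithSum1`, and `chiSquared_histogramsWithSum1`. It assumes character-level n-grams that do not span line boundaries, and it uses the symmetric chi-squared distance, sum of (p - q)^2 / (p + q), which may differ from the formula the JavaX function actually applies. The sample lines in `main` are made up for illustration.

```java
import java.util.*;

public class BigramChiSquared {
    // Count character n-grams over all lines; n-grams never span two lines.
    // (Assumption: the original helper works on characters, not words.)
    static Map<String, Integer> ngramCounts(int n, List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines)
            for (int i = 0; i + n <= line.length(); i++)
                counts.merge(line.substring(i, i + n), 1, Integer::sum);
        return counts;
    }

    // Scale the counts so that the histogram values sum to 1.
    static Map<String, Double> normalize(Map<String, Integer> counts) {
        double total = 0;
        for (int c : counts.values()) total += c;
        Map<String, Double> hist = new HashMap<>();
        if (total == 0) return hist; // no n-grams: empty histogram
        for (Map.Entry<String, Integer> e : counts.entrySet())
            hist.put(e.getKey(), e.getValue() / total);
        return hist;
    }

    // Symmetric chi-squared distance between two sum-1 histograms.
    // 0 means identical; 2 means no n-gram in common.
    // (Assumption: the JavaX function may use a different variant.)
    static double chiSquared(Map<String, Double> p, Map<String, Double> q) {
        Set<String> keys = new HashSet<>(p.keySet());
        keys.addAll(q.keySet());
        double sum = 0;
        for (String k : keys) {
            double a = p.getOrDefault(k, 0.0);
            double b = q.getOrDefault(k, 0.0);
            if (a + b > 0) sum += (a - b) * (a - b) / (a + b);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the two posts.
        List<String> english = Arrays.asList(
            "the quick brown fox", "jumps over the lazy dog", "this is a test line");
        List<String> garbage = Arrays.asList("xqzj kkvw pqqx", "zzxj qwkv jjqz");

        Map<String, Double> full = normalize(ngramCounts(2, english));
        Map<String, Double> first = normalize(ngramCounts(2, english.subList(0, 2)));
        Map<String, Double> bad = normalize(ngramCounts(2, garbage));

        System.out.println("properChi = " + chiSquared(full, first));
        System.out.println("badChi    = " + chiSquared(full, bad));
    }
}
```

With this variant, a subset of the same corpus should score close to 0 against the full histogram, while text with a very different 2-gram distribution should score closer to the maximum of 2. That gap is what the `properChi` and `badChi` values are meant to expose.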
