[227034] Compare 2-gram histograms of proper English lines vs. garbage text



By stefan. Created 2020/11/09 12:18:51, modified 2020/11/09 12:20:19

Post type: JavaX Code


// load the two line lists from posts 226918 (English) and 226969 (garbage)
LS englishLines = tlft(gazelle_text(226918));
LS garbageLines = tlft(gazelle_text(226969));

// reference histogram (10,000 English sentences), normalized to sum 1
Map<S, Double> full = multiSetToHistogramWithSum1(ngramsHistogram_multipleStrings(2, englishLines));

// take 20 sample lines, alternating between English and garbage
LS lines = takeFirst(20, roundRobin(englishLines, garbageLines));
ret mapToLines(lines, line -> {
  // chi-squared score of this line's 2-gram histogram against the reference
  double chi = chiSquared_histogramsWithSum1(multiSetToHistogramWithSum1(ngramsHistogram(2, line)), full);
  ret formatDouble(chi, 4) + ": " + line;
});
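
For readers without the JavaX library at hand, here is a rough plain-Java sketch of the same idea: build a normalized character 2-gram histogram for a reference corpus and for a single line, then score the line with a chi-squared statistic. The class name BigramChiSquared, both method names, and the floor parameter are made up for this illustration. The formula is an assumption as well (Pearson's chi-squared over the union of keys, with a small floor on the expected frequency to avoid dividing by zero); the actual chiSquared_histogramsWithSum1 may be defined differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, not part of the JavaX library.
class BigramChiSquared {

    // Relative frequencies (summing to 1) of all character 2-grams in the lines.
    static Map<String, Double> bigramHistogram(List<String> lines) {
        Map<String, Double> hist = new HashMap<>();
        double total = 0;
        for (String line : lines) {
            for (int i = 0; i + 2 <= line.length(); i++) {
                hist.merge(line.substring(i, i + 2), 1.0, Double::sum);
                total++;
            }
        }
        if (total > 0) {
            final double t = total;
            hist.replaceAll((k, v) -> v / t); // counts -> frequencies
        }
        return hist;
    }

    // Assumed formula: sum over all 2-grams of (observed - expected)^2 / expected.
    // 'floor' replaces expected frequencies that are zero or very small.
    static double chiSquared(Map<String, Double> observed,
                             Map<String, Double> expected,
                             double floor) {
        Set<String> keys = new HashSet<>(expected.keySet());
        keys.addAll(observed.keySet());
        double chi = 0;
        for (String k : keys) {
            double e = Math.max(expected.getOrDefault(k, 0.0), floor);
            double o = observed.getOrDefault(k, 0.0);
            chi += (o - e) * (o - e) / e;
        }
        return chi;
    }
}
```

To score a line, pass bigramHistogram(List.of(line)) as observed and the corpus histogram as expected. With a floor such as 1e-6, a line made mostly of 2-grams that never occur in the reference gets a much larger score than an ordinary English line. The absolute values depend heavily on the floor, so only the ranking is meaningful.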
