You may remember that, some time ago, I mentioned a statistical spam filter. Yesterday, or the day before, I modified my old spam filter to keep a copy of each message, sorted into 'spam' and 'not spam' folders. Today I wrote the code that collects statistics from those saved messages, and also the code that compares new messages against those statistics. With the current small sample of 85 real messages and 34 spams, it went back through those 119 messages and correctly identified 83 of the real messages as real and 32 of the spams as spam. Of the two spams it missed, one was blank and the other was unusually well written; the latter should still be filtered once there's a larger sample. Of the two real messages it flagged, one, impressively, actually was a spam that had just ended up in the wrong folder, and the other was a Yahoogroup message that I wouldn't have minded missing anyway. Given such results from a small sample, I look forward to seeing what happens with a proper-sized sample.
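For the curious, here is a rough sketch of the general idea in Python. It is not my actual code: the function names, the tokenizer, the add-one smoothing and the toy messages are all assumptions made up for illustration, and the scoring is a simple naive-Bayes-style log-odds sum rather than whatever formula a real filter might settle on.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def collect_stats(messages):
    """Count how many messages in one folder contain each token."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))  # count each token once per message
    return counts


def spam_score(text, spam_counts, ham_counts, n_spam, n_ham):
    """Return the log-odds that text is spam; a positive score means 'spam'."""
    score = math.log(n_spam / n_ham)  # prior taken from the folder sizes
    for tok in set(tokenize(text)):
        # Add-one smoothing (an assumption) so unseen tokens never give log(0).
        p_spam = (spam_counts[tok] + 1) / (n_spam + 2)
        p_ham = (ham_counts[tok] + 1) / (n_ham + 2)
        score += math.log(p_spam / p_ham)
    return score


# Toy stand-ins for the saved 'spam' and 'not spam' folders.
spam = ["buy cheap pills now", "cheap loans, act now!"]
ham = ["meeting moved to noon", "can you review my patch"]
spam_counts, ham_counts = collect_stats(spam), collect_stats(ham)
```

Calling `spam_score("cheap pills now", spam_counts, ham_counts, len(spam), len(ham))` gives a positive score, while a message built from words seen only in the real folder scores negative. With only a handful of messages the per-token estimates are crude, which is why a larger sample should help.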
[10:37]