Abstract & Bib

J. Goodman, W. Yih

Online Discriminative Spam Filter Training

CEAS-06

We describe a very simple technique for discriminatively training a spam filter. Our results on the TREC Enron spam corpus would have been the best for the Ham at .1% measure, and second best by the 1-ROCA measure. For the Mr. X corpus, our 1-ROCA measure was a close second best, and third best by the Ham at .1% measure. We use a very simple feature extractor (all words in the subject and headers). Our learning algorithm is also very simple: gradient descent of a logistic regression model.
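The learning algorithm named in the abstract, gradient descent on a logistic regression model over word features, can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the whitespace tokenizer, the bias term, the learning rate of 0.1, and the names `extract_features` and `OnlineLogisticFilter` are all assumptions introduced here.

```python
import math


def extract_features(subject, headers):
    """Return the set of lowercased whitespace-separated tokens.

    Hypothetical tokenizer: the abstract only says "all words",
    so the splitting rule here is an assumption.
    """
    return set((subject + " " + headers).lower().split())


class OnlineLogisticFilter:
    """Logistic regression over binary word features, trained one message at a time."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate  # assumed value, not from the paper
        self.weights = {}                   # one weight per word seen so far
        self.bias = 0.0                     # assumed bias term

    def predict(self, features):
        """Return the estimated probability that the message is spam."""
        score = self.bias + sum(self.weights.get(f, 0.0) for f in features)
        # Clamp the score so math.exp cannot overflow.
        score = max(-30.0, min(30.0, score))
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, features, is_spam):
        """Take one gradient step on the log-likelihood of a labeled message."""
        label = 1.0 if is_spam else 0.0
        # For binary features, the gradient for each active weight is (label - p).
        step = self.learning_rate * (label - self.predict(features))
        self.bias += step
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) + step
```

In an online setting, each incoming message would be scored with `predict` first and then passed to `update` once its true label is known, so the model never needs a separate batch training pass.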
 
@InProceedings{GoodmanYi06,
 author = {J. Goodman and W. Yih},
 title = {Online Discriminative Spam Filter Training},
 booktitle = {Proceedings of the 3rd Conference on Email and Anti-Spam},
 year = {2006}
}
