Spam training for emails

vsajip

In general, my mailboxes have a "SpamTrain" folder to which I move emails that are spam, but haven't been detected as spam. Is there a mechanism whereby your anti-spam machinery can be trained on these spam emails to improve its effectiveness?

sean

vsajip When you have a few hundred spam emails collected, please email our support to let us know where they are stored and we'll then gather them up and use them for training.

We need a lot of messages for training because doing bulk training on a small number of messages isn't effective.

vsajip

sean That's fine. I'll make sure there are always at least 500 before pinging support.

aa11

I'm wondering about False Positives.

I've been noticing more of these across different domains via email forwards (set in the OS CP).

I can find recurring emails (think: Newsletters, announcements, etc.) that have different SA scores (and other header information) when received by WF vs. now at OS.

Here's an example:

WebFaction (MX):

X-Spam-Level: 
X-Spam-Score: -0.097
X-Spam-Flag: NO
X-Spam-Status: No, score=-0.097 tagged_above=-999 required=3 tests=[DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_NONE=0.001, UNPARSEABLE_RELAY=0.001, URIBL_BLOCKED=0.001] autolearn=no autolearn_force=no

Opalstack (MX):

X-Spam-Level: ****
X-Spamd-Bar: ++++
X-Spam-Status: No, score=4.20

So I wonder:

Is this a known issue? (i.e., are the SpamAssassin settings purposely extra strict?)
Is there a way to train SA on false positives?
- How would that work with emails that have sensitive information?
Any chance of white-listing, without resorting to emails having to be delivered to a Mail User + Procmail rules?

Thanks!

Possibly related posts:

https://community.opalstack.com/d/243-email-forwards
https://community.opalstack.com/d/163-spf-record-warning-from-dmarcanalyzer-anything-to-worry-about/11

sean

aa11 we're using rspamd instead of SA so the detection algorithms are a bit different.

If you need training done then save up at least a few hundred false positives and we'll feed them into the system. We never see the content of the mail when we do this.

The only mail filtering that we do by default is to quarantine incoming and outgoing messages that have a very high spam score. I think the spam score for that is 15 on incoming and 7 on outgoing. Your Opalstack example is way below that incoming threshold so it would never be filtered.

We're still working on possible whitelisting solutions but don't have any ETA for that.
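
To make the threshold comparison above concrete, here is a minimal Python sketch that pulls the score out of an X-Spam-Status value and checks it against the quarantine thresholds. The thresholds are only the "I think" figures quoted above (15 incoming, 7 outgoing), the function names are hypothetical, and treating a score equal to the threshold as quarantined is an assumption; none of this is Opalstack's actual configuration.

```python
import re

# Assumed thresholds, taken from the approximate figures quoted in the post.
QUARANTINE_THRESHOLDS = {"incoming": 15.0, "outgoing": 7.0}

# Matches the "score=<number>" token in an X-Spam-Status header value.
_SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def parse_score(status_value):
    """Return the numeric score from an X-Spam-Status value, or None if absent."""
    match = _SCORE_RE.search(status_value)
    return float(match.group(1)) if match else None


def would_quarantine(score, direction="incoming"):
    """Return True if the score meets the assumed quarantine threshold."""
    # Assumption: a score equal to the threshold counts as quarantined.
    return score >= QUARANTINE_THRESHOLDS[direction]


if __name__ == "__main__":
    examples = (
        "No, score=-0.097 tagged_above=-999 required=3",  # WebFaction example
        "No, score=4.20",                                  # Opalstack example
    )
    for value in examples:
        score = parse_score(value)
        print(score, would_quarantine(score, "incoming"))
```

Both example scores from this thread fall well under the assumed incoming threshold, which matches the point that neither message would be quarantined on the server.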

aa11

sean

If you need training done then save up at least a few hundred false positives and we'll feed them into the system. We never see the content of the mail when we do this.

What's the process for this? How do we deliver the content of the mail without you seeing the content?
How do you recommend that I (or my clients) handle these messages? ex:
- Put them in a folder on the server?
- Download them as .mbox files?
- Forward them to a specific Mail User?
Is it ok to accumulate the false positives in one place, even if they are addressed to different addresses and domains?

The only mail filtering that we do by default is to quarantine incoming and outgoing messages that have a very high spam score. I think the spam score for that is 15 on incoming and 7 on outgoing. Your Opalstack example is way below that incoming threshold so it would never be filtered.

👍

We're still working on possible whitelisting solutions but don't have any ETA for that.

👍

Thanks.

aa11

sean If you need training done then save up at least a few hundred false positives and we'll feed them into the system.

True story:

The Opalstack "Subscription will renew" email was marked as a false positive:

Subject: Your Opalstack LLC subscription will renew soon
From: Opalstack LLC <upcoming-invoice@opalstack.com>
Reply-To: Opalstack LLC <billing@opalstack.com>

Body:

Your subscription will renew soon

This is a friendly reminder that your Opalstack LLC subscription will automatically renew on [DATE].

Headers:

X-Spam-Level: ***
X-Spamd-Bar: +++
X-Spam-Status: No, score=3.60

Well, it was close. Maybe the OS mail server delivered it to the Inbox because of that header line: X-Spam-Status: No. But my local email client automatically moved it into Junk.

sean

aa11 use your mail client to put them all in a single mail folder.

Don't forward them to us and don't forward them to one of your other mail users - move or copy them with your mail client.

We'll then gather them directly from your various mail user directories and feed them into rspamd.

This doesn't involve us reading your mail; all we'll see is your folder names and individual message filenames like 1606684504.44733_0.imap1.us.opalstack.com:2,S.
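
For readers curious what such a filename reveals, here is a small Python sketch that splits it according to the usual Maildir naming convention: a delivery timestamp, a unique part, the host name, and a flag suffix after ":2,". The function and type names are hypothetical, and the sketch assumes the server follows the standard convention, which the example filename appears to do.

```python
from typing import NamedTuple

# Standard Maildir flag letters and their conventional meanings.
FLAG_NAMES = {
    "D": "draft",
    "F": "flagged",
    "P": "passed",
    "R": "replied",
    "S": "seen",
    "T": "trashed",
}


class MaildirName(NamedTuple):
    delivered_at: int   # seconds since the Unix epoch, by convention
    unique: str         # delivery-unique part chosen by the server
    host: str           # host that wrote the file
    flags: frozenset    # decoded flag names (unknown letters kept as-is)


def parse_maildir_name(filename):
    """Split a Maildir message filename into its conventional parts."""
    # Everything after ":2," is the flag list; it may be missing entirely.
    base, sep, info = filename.partition(":2,")
    # The host may itself contain dots, so split at most twice.
    seconds, unique, host = base.split(".", 2)
    flags = frozenset(FLAG_NAMES.get(ch, ch) for ch in info) if sep else frozenset()
    return MaildirName(int(seconds), unique, host, flags)


if __name__ == "__main__":
    print(parse_maildir_name("1606684504.44733_0.imap1.us.opalstack.com:2,S"))
```

Nothing in the filename depends on the message body or headers, which is consistent with the statement that gathering the files does not involve reading the mail.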

aa11

sean

use your mail client to put them all in a single mail folder.

Don't forward them to us and don't forward them to one of your other mail users - move or copy them with your mail client.

We'll then gather them directly from your various mail user directories and feed them into rspamd.

That makes sense.

But what about the cases where the mail was addressed to a forwarding address, and the destination is, say, Gmail? And the recipient only uses Gmail.com to check email (where they notice the false positives)?

ex:
personA@OShostedmxdomain.com -> personA@gmail.com

I can imagine "Forward as Attachment" being an ok idea (since it would preserve the headers of the false positive), but it wouldn't ingest as easily as the original message.

This doesn't involve us reading your mail; all we'll see is your folder names and individual message filenames like 1606684504.44733_0.imap1.us.opalstack.com:2,S.

Understood.

Thanks.

sean

aa11 if you're forwarding mail and not storing it on our servers then you'll have to copy (not forward) it back to us with an IMAP client before we can train with it.

I think you can use the Pattern option in mbsync to pull an individual folder over if you need to.
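
As an alternative illustration of "copy, not forward", here is a minimal Python sketch using the standard library's imaplib rather than mbsync. It fetches the raw messages from one folder and appends them unchanged to a folder on another account, so the original headers survive. The function name and the folder names in the usage comment are hypothetical, the host names and credentials are placeholders you would supply yourself, and error handling is kept deliberately thin.

```python
import imaplib


def copy_folder(src, dst, src_folder, dst_folder):
    """Copy every message in src_folder on `src` into dst_folder on `dst`.

    `src` and `dst` are already-authenticated imaplib connections.
    Folder names containing spaces must be passed already quoted,
    because imaplib does not quote mailbox names for you.
    Returns the number of messages copied.
    """
    typ, _ = src.select(src_folder, readonly=True)  # read-only: source untouched
    if typ != "OK":
        raise RuntimeError(f"cannot open source folder {src_folder!r}")

    typ, data = src.search(None, "ALL")
    if typ != "OK":
        raise RuntimeError("search failed on source folder")

    copied = 0
    for num in data[0].split():
        typ, parts = src.fetch(num, "(RFC822)")
        if typ != "OK" or not parts or not isinstance(parts[0], tuple):
            continue  # skip anything that did not return a message body
        raw_message = parts[0][1]  # full message bytes, headers included
        # No flags and no explicit date: the destination server picks defaults.
        typ, _ = dst.append(dst_folder, None, None, raw_message)
        if typ == "OK":
            copied += 1
    return copied


# Example usage (placeholder hosts, users, and folder names):
#   src = imaplib.IMAP4_SSL("imap.source.example"); src.login(user1, password1)
#   dst = imaplib.IMAP4_SSL("imap.destination.example"); dst.login(user2, password2)
#   copy_folder(src, dst, "FalsePositives", "FalsePositives")
```

Because the message bytes are appended as-is, the destination copy keeps the spam headers that the filter originally added, which forwarding would not.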

aa11

sean if you're forwarding mail and not storing it on our servers then you'll have to copy (not forward) it back to us with an IMAP client before we can train with it.

👍

I think you can use the Pattern option in mbsync to pull an individual folder over if you need to.

Good suggestion. The same would go for the first as well, by only "Subscribing" to the dedicated "False Positives" folder or label (in the case of Gmail).

Thanks.

sean

aa11 those messages are sent by Stripe (our billing processor).

Our mail system delivered it to your inbox because it isn't spam. 🤷

Some mail clients have a junk mail setting like "trust spamassassin headers" or "trust junk mail headers". If your client has a setting like that and it's not enabled, then try enabling it to see if it makes a difference.

aa11

sean

Some mail clients have a junk mail setting like "trust spamassassin headers" or "trust junk mail headers". If your client has a setting like that and it's not enabled, then try enabling it to see if it makes a difference.

I have that enabled. But I also have some "Advanced" custom rules that might have been overriding that. I've turned them off for now.

Thanks for the information!