Five new troubleshooting guides for when DMARC breaks
We published five troubleshooting guides covering SPF errors, DKIM failures, alignment issues, forwarding breakage, and the full p=none to p=reject migration. Each one follows the same format: symptom, diagnosis, solution, prevention.
I read a lot of support tickets. The same five problems come up over and over: SPF permerror, DKIM body hash mismatch, alignment failures that seem impossible, forwarding that breaks authentication, and the fear of moving past p=none.
Our documentation already covers the fundamentals. What DMARC is, how SPF and DKIM work, what the DNS records mean. That is useful when you are setting things up. It does not help when something breaks at 2 AM and you need a fix, not a lecture.
So we wrote five troubleshooting guides. Each one targets a specific failure category. Same structure every time: what you see in your reports, how to find the root cause, the steps to fix it, and how to stop it from coming back.
Why can’t I just Google this?
You can. You will find dozens of articles that say “publish an SPF record” and “set up DKIM.” None of that helps when your SPF record already exists but has 14 DNS lookups, or when DKIM passes but DMARC still fails because the d= domain does not match your From header.
These guides assume you already have DMARC deployed and need to fix something specific. They have dig commands, header fields to inspect, and DNS records to change.
What is the SPF 10-lookup limit and why does my record keep breaking?
This is the single most common SPF issue I see in tickets. RFC 7208 limits SPF evaluation to 10 DNS lookups. Every include, a, mx, ptr, and exists mechanism counts, and so does the redirect modifier. ip4, ip6, and all do not count because they do not trigger DNS queries.
The catch is that include mechanisms are recursive: every lookup inside an included record counts toward the same limit. include:_spf.google.com looks like one lookup, but it expands into more includes internally, so Google Workspace alone eats four or more of your ten. Add Microsoft 365, a marketing platform, and a transactional service, and you are past the limit before you even notice.
Once evaluation exceeds ten lookups, the receiving server gives up on your record and returns SPF permerror, a permanent error.
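To see how the recursion adds up, here is a minimal Python sketch that walks an SPF record and its includes and counts every term that costs a lookup. It reads from a hypothetical in-memory dict instead of live DNS (the domains are placeholders), assumes there are no include loops, and counts the worst case where no earlier mechanism matches. In practice you would pull each record with dig.

```python
# Hypothetical SPF records keyed by domain. A real tool would fetch each
# TXT record from DNS instead of reading this dict.
RECORDS = {
    "example.com": "v=spf1 include:_spf.esp.example a mx ip4:192.0.2.10 ~all",
    "_spf.esp.example": "v=spf1 include:a.esp.example include:b.esp.example ~all",
    "a.esp.example": "v=spf1 ip4:198.51.100.0/24 ~all",
    "b.esp.example": "v=spf1 ip4:203.0.113.0/24 ~all",
}

# Mechanisms that trigger a DNS query (the redirect= modifier is handled separately).
LOOKUP_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}


def count_lookups(domain: str, records: dict[str, str]) -> int:
    """Count every DNS-querying term reachable from domain (worst case)."""
    total = 0
    for term in records[domain].split()[1:]:  # skip the "v=spf1" version tag
        term = term.lstrip("+-~?")  # drop the qualifier, if present
        if term.startswith("redirect="):
            target = term.split("=", 1)[1]
            total += 1 + count_lookups(target, records)
            continue
        # The mechanism name ends at ":" or "/" (for example "a:host" or "mx/24").
        name = term.split(":", 1)[0].split("/", 1)[0].lower()
        if name in LOOKUP_MECHANISMS:
            total += 1
            if name == "include":
                total += count_lookups(term.split(":", 1)[1], records)
    return total


n = count_lookups("example.com", RECORDS)
print(n, "permerror" if n > 10 else "within limit")  # 5 within limit
```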
The SPF errors guide shows how to count your lookups with dig, which services consume the most, and three ways to fix the problem: removing unused includes, delegating to subdomains, and SPF flattening (with the risks that come with it).
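Flattening swaps include chains for the literal ip4/ip6 ranges behind them. The sketch below shows the idea against a hypothetical record map. It deliberately refuses anything it cannot expand safely (a, mx, exists, redirect, or non-pass qualifiers) and assumes the top-level record ends in an all mechanism. The output is a snapshot: when a provider changes its ranges, the flattened record silently goes stale, which is the main risk.

```python
# Hypothetical records; this sketch only understands include, ip4, ip6 and all.
RECORDS = {
    "example.com": "v=spf1 include:_spf.esp.example ip4:192.0.2.10 -all",
    "_spf.esp.example": "v=spf1 include:a.esp.example include:b.esp.example ~all",
    "a.esp.example": "v=spf1 ip4:198.51.100.0/24 ~all",
    "b.esp.example": "v=spf1 ip4:203.0.113.0/24 ip6:2001:db8::/32 ~all",
}


def collect_ranges(domain: str, records: dict[str, str]) -> list[str]:
    """Recursively gather pass-qualified ip4/ip6 terms, expanding include."""
    ranges = []
    for term in records[domain].split()[1:]:  # skip "v=spf1"
        bare = term[1:] if term.startswith("+") else term  # "+" is the default qualifier
        if bare.startswith(("ip4:", "ip6:")):
            ranges.append(bare)
        elif bare.startswith("include:"):
            ranges.extend(collect_ranges(bare.split(":", 1)[1], records))
        elif bare.lstrip("-~?") == "all":
            continue  # all is skipped here; flatten() re-appends the top-level one
        else:
            # a, mx, exists, redirect, or non-pass ip4/ip6 would need more care.
            raise ValueError(f"sketch cannot flatten {term!r} in {domain}")
    return ranges


def flatten(domain: str, records: dict[str, str]) -> str:
    """Return a single-record snapshot that needs zero DNS lookups."""
    final_all = records[domain].split()[-1]  # keep the top-level all, e.g. "-all"
    unique = list(dict.fromkeys(collect_ranges(domain, records)))  # dedupe, keep order
    return " ".join(["v=spf1", *unique, final_all])


print(flatten("example.com", RECORDS))
# v=spf1 ip4:198.51.100.0/24 ip4:203.0.113.0/24 ip6:2001:db8::/32 ip4:192.0.2.10 -all
```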
Why does DKIM fail when I signed the message correctly?
DKIM failures are sneakier. The most common one is “body hash did not verify.” The message body changed after it was signed. The receiver recomputes the hash, compares it to the bh= value in the DKIM-Signature header, and they do not match.
Usually it is not your fault. A security gateway injected a disclaimer footer, a DLP appliance rewrote links, or an antivirus scanner modified the HTML. You did everything right. Something between you and the recipient broke the signature.
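The bh= check is easy to reproduce. This Python sketch implements the two body canonicalizations from RFC 6376 (simple and relaxed) and hashes with SHA-256, as a=rsa-sha256 does. It assumes CRLF line endings, ignores the optional l= length tag, and the footer text is made up. It shows why a gateway footer always breaks the hash, while relaxed canonicalization absorbs whitespace-only changes.

```python
import base64
import hashlib
import re


def canon_body_simple(body: bytes) -> bytes:
    """'simple': strip trailing empty lines, end with exactly one CRLF."""
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body


def canon_body_relaxed(body: bytes) -> bytes:
    """'relaxed': collapse runs of spaces/tabs, drop trailing whitespace on each
    line, and strip trailing empty lines."""
    lines = [re.sub(rb"[ \t]+", b" ", line).rstrip(b" ")
             for line in body.split(b"\r\n")]
    while lines and lines[-1] == b"":
        lines.pop()
    return b"".join(line + b"\r\n" for line in lines)


def body_hash(body: bytes, canon: str = "relaxed") -> str:
    """Base64 SHA-256 of the canonicalized body: the bh= value for a=rsa-sha256."""
    canonical = canon_body_relaxed(body) if canon == "relaxed" else canon_body_simple(body)
    return base64.b64encode(hashlib.sha256(canonical).digest()).decode("ascii")


signed = b"Hi Alice,\r\nThe invoice is attached.\r\n"
footer = signed + b"--\r\nScanned by ExampleGateway\r\n"  # hypothetical disclaimer
respaced = b"Hi Alice,  \r\nThe invoice   is attached.\r\n\r\n"

print(body_hash(signed) == body_hash(footer))                        # False: footer breaks bh=
print(body_hash(signed) == body_hash(respaced))                      # True: relaxed absorbs it
print(body_hash(signed, "simple") == body_hash(respaced, "simple"))  # False: simple does not
```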
Other failures include expired signatures (the x= tag sets an expiration timestamp, so messages delayed in a queue fail once it passes), key-not-found errors (the selector in s= points to a DNS record that does not exist), and short keys (1024-bit RSA keys, which some receivers now reject).
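Most of these show up in the DKIM-Signature header itself. A rough Python sketch, assuming a well-formed header value (the selector, timestamps, and b= below are invented), pulls out the tags, builds the DNS name where the public key should live, and flags an x= timestamp that has already passed.

```python
import re
import time
from typing import Dict, List, Optional


def parse_tags(header_value: str) -> Dict[str, str]:
    """Split a DKIM tag-list ("k=v; k=v") into a dict, dropping folding whitespace."""
    tags = {}
    for part in header_value.split(";"):
        if "=" not in part:
            continue  # e.g. the empty piece after a trailing semicolon
        name, value = part.split("=", 1)  # split once: base64 values contain "="
        tags[name.strip()] = re.sub(r"\s+", "", value)
    return tags


def diagnose(header_value: str, now: Optional[float] = None) -> List[str]:
    """Report the key location to check and whether x= has already passed."""
    tags = parse_tags(header_value)
    now = time.time() if now is None else now
    findings = [f"public key expected at TXT {tags['s']}._domainkey.{tags['d']}"]
    if "x" in tags and now > int(tags["x"]):
        late = int(now) - int(tags["x"])
        findings.append(f"signature expired {late} s ago (x={tags['x']})")
    return findings


# Hypothetical header value; the selector and b= are made up.
SIG = ("v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=mail2024;\r\n"
       " t=1700000000; x=1700086400; h=from:to:subject;\r\n"
       " bh=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=; b=dGVzdA==")

for line in diagnose(SIG, now=1700100000):
    print(line)
```

To confirm a key-not-found error, query the printed name directly, for example dig +short TXT mail2024._domainkey.example.com. An empty answer means no key is published for that selector.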
The DKIM failures guide has diagnostic commands for each of these, a key rotation procedure that avoids downtime, and canonicalization settings that make signatures survive minor body modifications.
Why does SPF pass but DMARC still fails?
This one confuses everyone. SPF passed. The IP is authorized. The DNS record is correct. DMARC says fail.
Alignment. DMARC does not just check whether SPF or DKIM passed. It checks whether the domain that passed matches the domain in the visible From header. SPF checks the envelope sender (the Return-Path). If your ESP uses its own bounce domain, say bounces.mailchimp.com, SPF passes for the ESP’s domain but does not align with your From domain example.com. DMARC fails.
Same thing with DKIM. If the d= value in the signature is the ESP’s domain instead of yours, DKIM passes but alignment fails.
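Here is the alignment rule as a small Python sketch. DMARC passes if SPF passes with an aligned envelope domain, or DKIM passes with an aligned d=. Relaxed mode compares organizational domains; strict mode requires an exact match. One loud simplification: org_domain() just keeps the last two labels, whereas real receivers use the Public Suffix List, so this sketch gets domains like example.co.uk wrong. It also checks a single DKIM signature only.

```python
def org_domain(domain: str) -> str:
    """ASSUMPTION: naive organizational domain = last two labels.
    Real DMARC uses the Public Suffix List; this is wrong for e.g. example.co.uk."""
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(auth_domain: str, from_domain: str, mode: str = "r") -> bool:
    """Relaxed ("r") compares organizational domains; strict ("s") needs an exact match."""
    if mode == "s":
        return auth_domain.lower().rstrip(".") == from_domain.lower().rstrip(".")
    return org_domain(auth_domain) == org_domain(from_domain)


def dmarc_pass(from_domain: str,
               spf_pass: bool, mail_from_domain: str,
               dkim_pass: bool, dkim_d: str,
               aspf: str = "r", adkim: str = "r") -> bool:
    """DMARC passes if at least one mechanism both passes and aligns."""
    spf_ok = spf_pass and aligned(mail_from_domain, from_domain, aspf)
    dkim_ok = dkim_pass and aligned(dkim_d, from_domain, adkim)
    return spf_ok or dkim_ok


# Bounce domain from the example above; assume (hypothetically) DKIM d= is the ESP's too.
print(dmarc_pass("example.com", True, "bounces.mailchimp.com", True, "bounces.mailchimp.com"))  # False
# Same ESP, but DKIM now signs with d=mail.example.com: relaxed passes, strict does not.
print(dmarc_pass("example.com", True, "bounces.mailchimp.com", True, "mail.example.com"))  # True
print(dmarc_pass("example.com", True, "bounces.mailchimp.com", True, "mail.example.com",
                 adkim="s"))  # False
```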
The alignment issues guide covers relaxed vs strict alignment, how subdomains interact with it, the RFC 5321 (envelope) vs RFC 5322 (visible) address distinction, and how to read alignment results in aggregate reports. I included concrete email header examples showing both passing and failing scenarios, because this is much easier to understand when you can see it.
Why does forwarding break everything?
This is the most frustrating DMARC problem because there is no clean fix. When a message is forwarded at the server level, the final receiver sees the forwarding server's IP instead of the original sender's, while the envelope sender usually stays the same. SPF fails because the forwarder's IP is not in the sender's SPF record.
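SPF checks the connecting IP against the SPF record of the envelope domain. A minimal sketch with Python's ipaddress module, assuming the sender's record has already been reduced to a list of ranges (the ranges and addresses below are documentation placeholders), shows the same envelope passing from the sender's own server and failing once a forwarder relays it.

```python
import ipaddress
from typing import List

# Hypothetical: the sender's SPF record, already reduced to its ip4/ip6 ranges.
SENDER_SPF_RANGES = ["192.0.2.0/24", "2001:db8:1::/48"]


def ip_authorized(client_ip: str, ranges: List[str]) -> bool:
    """True if the connecting IP falls inside any authorized range."""
    ip = ipaddress.ip_address(client_ip)
    # Membership across IPv4/IPv6 simply returns False, so mixed lists are fine.
    return any(ip in ipaddress.ip_network(r) for r in ranges)


# Hop 1: the sender's own server connects to the forwarder.
print(ip_authorized("192.0.2.25", SENDER_SPF_RANGES))   # True  -> SPF pass
# Hop 2: the forwarder relays with the envelope sender unchanged.
print(ip_authorized("203.0.113.7", SENDER_SPF_RANGES))  # False -> SPF fail at the final destination
```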
Mailing lists make it worse. Mailman, Google Groups, LISTSERV. They add footers, prepend list names to subjects, rewrite From headers. Every one of those modifications can invalidate the DKIM signature.
So a perfectly legitimate message, forwarded through a university mail system or a corporate rule, fails both SPF and DKIM at the final destination. If you have p=reject, it is dropped. The recipient never sees it.
ARC (Authenticated Received Chain, RFC 8617) helps. Each intermediary adds headers that preserve the original authentication results. If the receiver trusts the intermediary, it can override the DMARC failure. Google, Microsoft, and Yahoo honor ARC from intermediaries they trust. Not everyone does yet.
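The receiver's side of this can be sketched in Python. The code below only checks the chain's structure as RFC 8617 describes it: instances numbered 1..n, three ARC headers per instance, cv=none on the first seal and cv=pass after that. It does not verify any signatures, which a real validator must do. The trusted-sealer allowlist is a hypothetical stand-in for whatever local policy a receiver uses.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

ARC_NAMES = ("ARC-Seal", "ARC-Message-Signature", "ARC-Authentication-Results")


def tag(value: str, name: str) -> Optional[str]:
    """Return one tag's value from a "k=v; k=v" header, or None if absent."""
    m = re.search(rf"(?:^|;)\s*{name}\s*=\s*([^;\s]+)", value)
    return m.group(1) if m else None


def arc_chain(headers: List[Tuple[str, str]]) -> Tuple[str, Optional[str]]:
    """Structural check only. Returns (status, d= of the newest ARC-Seal).
    No cryptographic verification happens here; a real validator must do it."""
    sets: Dict[int, Dict[str, str]] = {}
    for name, value in headers:
        if name in ARC_NAMES:
            i = tag(value, "i")
            if i is None or not i.isdigit():
                return "fail", None
            sets.setdefault(int(i), {})[name] = value
    if not sets:
        return "none", None
    n = max(sets)
    # Instances must run 1..n with no gaps; RFC 8617 caps the chain at 50.
    if n > 50 or sorted(sets) != list(range(1, n + 1)):
        return "fail", None
    for i in range(1, n + 1):
        if len(sets[i]) != 3:  # each instance needs all three ARC headers
            return "fail", None
        expected = "none" if i == 1 else "pass"
        if tag(sets[i]["ARC-Seal"], "cv") != expected:
            return "fail", None
    return "pass", tag(sets[n]["ARC-Seal"], "d")


def accept_despite_dmarc_fail(headers: List[Tuple[str, str]], trusted: Set[str]) -> bool:
    """Local override: honor a structurally valid chain sealed by a trusted intermediary."""
    status, sealer = arc_chain(headers)
    return status == "pass" and sealer in trusted


# Hypothetical headers added by one forwarder (signature values are placeholders).
HDRS = [
    ("ARC-Seal", "i=1; a=rsa-sha256; cv=none; d=forwarder.example; s=arc1; t=1700000000; b=AAAA"),
    ("ARC-Message-Signature", "i=1; a=rsa-sha256; c=relaxed/relaxed; d=forwarder.example; "
                              "s=arc1; h=from:to:subject; bh=BBBB; b=CCCC"),
    ("ARC-Authentication-Results", "i=1; mx.forwarder.example; spf=pass "
                                   "smtp.mailfrom=example.com; dkim=pass header.d=example.com; dmarc=pass"),
]
print(accept_despite_dmarc_fail(HDRS, {"forwarder.example"}))  # True
print(accept_despite_dmarc_fail(HDRS, set()))                  # False
```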
The forwarding and ARC guide explains the mechanics, covers SRS (Sender Rewriting Scheme) and how it interacts with alignment, and lays out what you can realistically do as a domain owner before tightening your policy. Spoiler: some forwarding failures are unavoidable at p=reject, and you should quantify the impact before deciding if you can live with it.
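To see why SRS fixes SPF but not alignment, here is a simplified SRS0 rewrite in Python. The address layout (SRS0=hash=timestamp=domain=local@forwarder) follows the common SRS0 form, but the hash and timestamp encoding are illustrative only and will not interoperate with real SRS implementations; the secret and forwarder domain are hypothetical. After the rewrite, SPF is checked against the forwarder's domain and can pass, but that domain is not your From domain, so DMARC has to rely on DKIM or ARC.

```python
import base64
import hashlib
import hmac

SECRET = b"hypothetical-forwarder-secret"  # ASSUMPTION: the forwarder's private key
FORWARDER = "forwarder.example"            # hypothetical forwarding domain
BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def srs0_rewrite(mail_from: str, now: float) -> str:
    """Simplified SRS0: SRS0=HASH=TT=orig-domain=orig-local@forwarder.
    The hash and timestamp details are illustrative, not libsrs2-compatible."""
    local, domain = mail_from.rsplit("@", 1)
    day = int(now // 86400) % 1024            # day number wrapped into 10 bits
    tt = BASE32[day >> 5] + BASE32[day & 31]  # two base32 characters
    mac = hmac.new(SECRET, (tt + domain + local).lower().encode(), hashlib.sha1)
    h = base64.b64encode(mac.digest()).decode()[:4]  # short tag against forged bounces
    return f"SRS0={h}={tt}={domain}={local}@{FORWARDER}"


def domain_of(address: str) -> str:
    return address.rsplit("@", 1)[1].lower()


rewritten = srs0_rewrite("alice@example.com", now=1700000000)
print(rewritten)                              # SRS0=<hash>=G3=example.com=alice@forwarder.example
print(domain_of(rewritten))                   # forwarder.example: SPF is now checked here
print(domain_of(rewritten) == "example.com")  # False: an SPF pass will not align with From
```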
How do I move from p=none to p=reject without losing mail?
This is the guide people ask me for most. They deployed DMARC with p=none, they see reports coming in, and now they want to enforce. But they are afraid of breaking legitimate mail. Rightly so.
The migration goes in stages: p=none for monitoring, p=quarantine for soft enforcement, p=reject for full enforcement. Skipping stages causes mail loss. I have seen it happen.
The real work happens between p=none and p=quarantine. You need to build a complete inventory of every service that sends mail using your domain, confirm that each one has proper DKIM or SPF alignment, and get your pass rate above 95%. That takes time. Monthly newsletters, quarterly reports, that HR system nobody documented. You need to catch them all.
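The pass rate comes straight out of aggregate reports. This Python sketch sums the count field of each row and treats a row as passing when either policy_evaluated/dkim or policy_evaluated/spf (the DMARC-aligned results) is pass. The sample XML is a hypothetical, heavily trimmed report; real ones also carry report_metadata, policy_published, and auth_results. The failing sources it lists are the gaps in your inventory.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Tuple

# Hypothetical, heavily trimmed aggregate report (real reports carry more elements).
SAMPLE = """<feedback>
  <record>
    <row>
      <source_ip>192.0.2.10</source_ip>
      <count>940</count>
      <policy_evaluated>
        <disposition>none</disposition><dkim>pass</dkim><spf>pass</spf>
      </policy_evaluated>
    </row>
    <identifiers><header_from>example.com</header_from></identifiers>
  </record>
  <record>
    <row>
      <source_ip>203.0.113.50</source_ip>
      <count>60</count>
      <policy_evaluated>
        <disposition>none</disposition><dkim>fail</dkim><spf>fail</spf>
      </policy_evaluated>
    </row>
    <identifiers><header_from>example.com</header_from></identifiers>
  </record>
</feedback>"""


def summarize(xml_text: str) -> Tuple[float, Counter]:
    """Return (DMARC pass rate, message counts per failing source IP)."""
    root = ET.fromstring(xml_text)
    passed = total = 0
    failing: Counter = Counter()
    for row in root.iter("row"):
        count = int(row.findtext("count"))
        evaluated = row.find("policy_evaluated")
        # DMARC passes if either aligned result passed.
        ok = "pass" in (evaluated.findtext("dkim"), evaluated.findtext("spf"))
        total += count
        if ok:
            passed += count
        else:
            failing[row.findtext("source_ip")] += count
    return (passed / total if total else 0.0), failing


rate, failing = summarize(SAMPLE)
print(f"pass rate {rate:.1%}, ready for quarantine: {rate > 0.95}")  # pass rate 94.0%, ready for quarantine: False
print(failing.most_common())  # [('203.0.113.50', 60)]
```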
The pct= tag is your safety net. Start at 25%, watch the reports, increase gradually. If something breaks, roll back immediately.
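Two things are worth seeing in code: what the record looks like at each stage, and what pct= actually does. Under RFC 7489, a failing message that is not sampled gets the next-weaker policy (reject becomes quarantine, quarantine becomes normal handling). A minimal Python sketch, with a hypothetical rua= address and a seeded random generator standing in for the receiver's sampling:

```python
import random

# Next-weaker policy for failing messages that pct= did not select.
DOWNGRADE = {"reject": "quarantine", "quarantine": "none", "none": "none"}


def disposition(policy: str, pct: int, rng: random.Random) -> str:
    """Disposition a receiver applies to one message that fails DMARC."""
    if rng.random() * 100 < pct:  # selected by pct= sampling
        return policy
    return DOWNGRADE[policy]


def dmarc_record(policy: str, pct: int, rua: str) -> str:
    """Build the TXT value published at _dmarc.<domain>."""
    record = f"v=DMARC1; p={policy}"
    if pct < 100:
        record += f"; pct={pct}"  # pct defaults to 100, so omit it at full enforcement
    return record + f"; rua={rua}"


RUA = "mailto:dmarc-reports@example.com"  # hypothetical report address
for policy, pct in [("none", 100), ("quarantine", 25), ("quarantine", 100),
                    ("reject", 25), ("reject", 100)]:
    print(dmarc_record(policy, pct, RUA))

rng = random.Random(42)
outcomes = [disposition("quarantine", 25, rng) for _ in range(10_000)]
print(outcomes.count("quarantine") / len(outcomes))  # close to 0.25
```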
The none to reject migration guide has a week-by-week timeline, a pre-reject checklist, and a documented rollback procedure. It also covers subdomain policies and how to handle pct= during each phase.
All five guides
- SPF errors: permerror, too many lookups, and syntax failures
- DKIM failures: body hash mismatch, expired signatures, and missing keys
- Alignment issues: relaxed vs strict, subdomains, and header mismatch
- Forwarding and ARC: why authentication breaks and how to fix it
- DMARC policy migration: the complete roadmap from p=none to p=reject
You can also browse them from the troubleshooting hub.
If you are not monitoring DMARC reports yet, create an account to get started. Plans start at $19/month and unlock full access to your aggregate reports, sender insights, and DNS monitoring. You will get your first reports within 24 to 48 hours, and these guides will make a lot more sense with real data from your own domain.