Content levels
AECS-1 §4.3 defines the same message body at six processing levels, from byte-faithful to LLM-ready:
rawFull → raw → text → clean → forAI
↘ html
Implementations SHOULD populate every level they’re capable of producing.
A field the implementation cannot produce MUST be null — never omitted silently
in a way a consumer would need to distinguish from “not yet processed.”
| Field | Description |
|---|---|
content.rawFull |
Complete original RFC 5322 message — all headers, MIME parts, encodings, exactly as received. Suitable for archival and re-parsing. |
content.raw |
The latest message body only. Quoted reply history is stripped at the MIME level. Headers are excluded. |
content.html |
HTML rendition of the latest message content. null if the message has no HTML part. |
content.text |
Plain text rendition of the latest message content, decoded from any transfer encoding. |
content.clean |
Plain text with email signatures and quoted reply chains removed using heuristic detection. May be imperfect. |
content.forAI |
Derived from clean. Additionally: whitespace normalised, inline image references removed, forwarded-message headers collapsed to a single summary line. This is the field AI consumers should use as their primary input. |
Consumers preferring minimal context-window usage should use content.forAI.
Consumers requiring fidelity to the original should use content.rawFull.
How @mvrx/mail produces each level
content.rawFull— the exact source bytes, decoded as text. Never modified.content.text/content.html— parsed directly from the MIME tree (viapostal-mime);textis derived fromhtmlwhen no plain-text part exists.content.raw—textwith quoted reply chains stripped at the line level (>prefixes,On ... wrote:markers,-----Original Message-----blocks, long underscore dividers).content.clean—rawwith a heuristic signature stripper applied: an RFC 3676--delimiter,Sent from my iPhone-style mobile signoffs, confidentiality-notice boilerplate, and short “Best, <name>” sign-offs are all detected and removed.content.forAI—cleanwithcid:/data:URIs replaced by[inline image removed],--- Forwarded message ---banners collapsed to[forwarded message], whitespace normalized, and the result truncated toforAIMaxChars(default8,000) with a trailing[truncated]marker.
Replacing the default cleaner
const email = await parse(message, {
cleaner: (text) => myCustomCleaner(text), // sync or async
});
cleaner replaces the signature-stripping step (clean) — quote-chain stripping
(raw) always runs first, since it operates directly on MIME quoting structure
rather than heuristics.
Security note
None of the levels — including forAI — sanitize for prompt injection.
content.forAI reduces noise, it does not make email content trustworthy. See
AECS-1 §7 and
Threads & wrappers for how to pass email content to
an LLM safely.