Content levels

AECS-1 §4.3 defines the same message body at six processing levels, from byte-faithful to LLM-ready:

rawFull  →  raw  →  text  →  clean  →  forAI
                  ↘  html

Implementations SHOULD populate every level they’re capable of producing. A field the implementation cannot produce MUST be explicitly null rather than omitted, so that a consumer never has to guess whether a missing field means “cannot be produced” or “not yet processed.”
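As a rough illustration of the null rule, a producer could normalise its output so that all six keys are always present. The ContentLevels interface and the withExplicitNulls helper below are hypothetical names chosen for this sketch; AECS-1 §4.3 remains the authority on the actual field types.

```typescript
// Hypothetical shape for illustration only; not taken from the spec text.
interface ContentLevels {
  rawFull: string | null;
  raw: string | null;
  html: string | null;
  text: string | null;
  clean: string | null;
  forAI: string | null;
}

const LEVELS = ["rawFull", "raw", "html", "text", "clean", "forAI"] as const;

// Fill every level the producer could not generate with an explicit null,
// so the resulting object always carries all six keys.
function withExplicitNulls(partial: Partial<ContentLevels>): ContentLevels {
  const out = {} as ContentLevels;
  for (const level of LEVELS) {
    out[level] = partial[level] ?? null;
  }
  return out;
}

// A producer that only managed rawFull and text still emits the other four as null.
const content = withExplicitNulls({
  rawFull: "Subject: hi\r\n\r\nhello",
  text: "hello",
});
```

With this shape a consumer can test `content.html === null` directly instead of checking whether the key exists.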

Field             Description
content.rawFull   Complete original RFC 5322 message: all headers, MIME parts, encodings, exactly as received. Suitable for archival and re-parsing.
content.raw       The latest message body only. Quoted reply history is stripped at the MIME level. Headers are excluded.
content.html      HTML rendition of the latest message content. null if the message has no HTML part.
content.text      Plain text rendition of the latest message content, decoded from any transfer encoding.
content.clean     Plain text with email signatures and quoted reply chains removed using heuristic detection. May be imperfect.
content.forAI     Derived from clean. Additionally: whitespace normalised, inline image references removed, forwarded-message headers collapsed to a single summary line. This is the field AI consumers should use as their primary input.

Consumers preferring minimal context-window usage should use content.forAI. Consumers requiring fidelity to the original should use content.rawFull.
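Because any level may be null, a consumer will usually want a fallback order. The sketch below is one possible approach, not something the spec prescribes: pickLevel, FOR_LLM and FOR_ARCHIVE are hypothetical names, and falling back from forAI to clean to text is an assumption of this example.

```typescript
type LevelName = "rawFull" | "raw" | "html" | "text" | "clean" | "forAI";
type Levels = Record<LevelName, string | null>;

// Return the first non-null level in the caller's preference order,
// or null when none of the requested levels was produced.
function pickLevel(content: Levels, preference: LevelName[]): string | null {
  for (const level of preference) {
    const value = content[level];
    if (value !== null) return value;
  }
  return null;
}

// Assumed preference orders for two kinds of consumer.
const FOR_LLM: LevelName[] = ["forAI", "clean", "text"];
const FOR_ARCHIVE: LevelName[] = ["rawFull"];

const example: Levels = {
  rawFull: "Subject: hi\r\n\r\nhello\r\n",
  raw: "hello",
  html: null,
  text: "hello",
  clean: "hello",
  forAI: null, // e.g. the producer could not generate this level
};

const llmInput = pickLevel(example, FOR_LLM);
const archived = pickLevel(example, FOR_ARCHIVE);
```

An archival consumer deliberately has no fallback here, since no other level is faithful to the original bytes.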

How @mvrx/mail produces each level

  1. content.rawFull: the exact source bytes, decoded as text. Never modified.
  2. content.text / content.html: parsed directly from the MIME tree (via postal-mime); text is derived from html when no plain-text part exists.
  3. content.raw: text with quoted reply chains stripped at the line level (> prefixes, On ... wrote: markers, -----Original Message----- blocks, long underscore dividers).
  4. content.clean: raw with a heuristic signature stripper applied. It detects and removes an RFC 3676 "-- " delimiter, "Sent from my iPhone"-style mobile sign-offs, confidentiality-notice boilerplate, and short “Best, <name>” sign-offs.
  5. content.forAI: clean with cid:/data: URIs replaced by [inline image removed], --- Forwarded message --- banners collapsed to [forwarded message], whitespace normalized, and the result truncated to forAIMaxChars (default 8,000) with a trailing [truncated] marker.
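Steps 3 to 5 above can be approximated in a few lines. This is a simplified sketch and not the library's actual code: the function names and regular expressions are assumptions made for illustration, only some of the listed heuristics are covered, and whether the real truncation counts the marker toward the limit is not stated in the text.

```typescript
// Step 3 (sketch): drop ">"-quoted lines and cut at the first reply marker.
function stripQuotedReplies(text: string): string {
  const kept: string[] = [];
  for (const line of text.split(/\r?\n/)) {
    if (/^On .+ wrote:\s*$/.test(line)) break; // "On ... wrote:" marker
    if (/^-{5}\s*Original Message\s*-{5}\s*$/i.test(line)) break; // Outlook-style block
    if (/^_{10,}\s*$/.test(line)) break; // long underscore divider
    if (/^\s*>/.test(line)) continue; // quoted line
    kept.push(line);
  }
  return kept.join("\n").trimEnd();
}

// Step 4 (sketch): cut at an RFC 3676 "-- " delimiter or a mobile sign-off.
function stripSignature(text: string): string {
  const lines = text.split("\n");
  const cut = lines.findIndex(
    (line) => line === "-- " || line === "--" || /^Sent from my \w+/.test(line),
  );
  return (cut === -1 ? lines : lines.slice(0, cut)).join("\n").trimEnd();
}

// Step 5 (sketch): replace inline image URIs, collapse forward banners,
// normalize whitespace, then truncate to maxChars plus a marker.
function toForAI(clean: string, maxChars = 8000): string {
  const normalized = clean
    .replace(/\b(?:cid|data):[^\s"')>\]]+/g, "[inline image removed]")
    .replace(/^-+\s*Forwarded message\s*-+$/gm, "[forwarded message]")
    .replace(/[ \t]+/g, " ")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
  return normalized.length > maxChars
    ? normalized.slice(0, maxChars) + " [truncated]"
    : normalized;
}

const sample =
  "Hi team,\n\nSee image cid:abc123 attached.\n\n-- \nJane Doe\n\n" +
  "On Mon, Jan 1, 2024, Bob wrote:\n> old stuff";

const rawLevel = stripQuotedReplies(sample);
const cleanLevel = stripSignature(rawLevel);
const forAILevel = toForAI(cleanLevel);
```

Each function takes the previous level as input, which mirrors the ordering in the list: quote stripping first, then signature stripping, then the forAI reductions.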

Replacing the default cleaner

const email = await parse(message, {
  cleaner: (text) => myCustomCleaner(text), // sync or async
});

cleaner replaces the signature-stripping step (clean) — quote-chain stripping (raw) always runs first, since it operates directly on MIME quoting structure rather than heuristics.
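For illustration, a custom cleaner might look like the sketch below. The cut-off rule is invented for this example, and the assumption that the function receives the raw-level text (quotes already stripped) is inferred from the paragraph above rather than confirmed by it.

```typescript
// Hypothetical custom cleaner: takes text whose quoted replies are presumably
// already stripped, and returns the value to be used for the clean level.
function myCustomCleaner(text: string): string {
  const lines = text.split("\n");
  // Illustrative rule: cut at the first standalone "Regards" or "Cheers" line.
  const cut = lines.findIndex((line) => /^(Regards|Cheers),?\s*$/i.test(line));
  return (cut === -1 ? lines : lines.slice(0, cut)).join("\n").trimEnd();
}

const cleaned = myCustomCleaner("The report is attached.\n\nRegards,\nJane Doe\nACME Corp");
```

This version is synchronous; per the comment in the snippet above, an async function returning a promise should work as well.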

Security note

None of the levels, including forAI, sanitize for prompt injection. content.forAI reduces noise; it does not make email content trustworthy. See AECS-1 §7 and Threads & wrappers for how to pass email content to an LLM safely.