I would like to quarantine an outgoing email when the To and CC headers contain a large number of recipients. How can I do this?
Refer to the CLI DLP examples for a data loss prevention (DLP) pattern that detects multiple email addresses.
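As a rough illustration of the idea, the following Python sketch builds a regular expression that only matches when the text contains at least a given number of email addresses. It is not the pattern from the CLI DLP examples: the simplified address expression, the function names, and the default threshold of 10 are all assumptions made for this example.

```python
import re

# Deliberately simplified address matcher (assumption); real address syntax is broader.
EMAIL = r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"


def many_addresses_pattern(minimum: int) -> re.Pattern:
    """Build a pattern that matches text containing at least `minimum` addresses."""
    # One address followed by any (non-greedy) filler, repeated `minimum` times.
    return re.compile(r"(?:" + EMAIL + r".*?){" + str(minimum) + r"}", re.DOTALL)


def has_many_addresses(text: str, minimum: int = 10) -> bool:
    """Return True if `text` contains at least `minimum` email addresses."""
    # Lowercase first, mirroring the fact that scanned text is lowercased.
    return many_addresses_pattern(minimum).search(text.lower()) is not None


if __name__ == "__main__":
    recipients = "To: a@example.com, b@example.org; c@example.net"
    print(has_many_addresses(recipients, minimum=3))
    print(has_many_addresses(recipients, minimum=4))
```

The threshold is the only tuning knob here; a real rule would be set high enough that ordinary messages with a handful of recipients do not trigger it.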
Can the DLP module detect all information leakage?
No. A Data Loss Prevention (DLP) module helps reduce accidental data leaks, but it cannot guarantee protection against a skilled, determined attacker. Any claim that a DLP solution can detect all data exfiltration is overstated. Sensitive information can be transformed or disguised in ways that are extremely difficult to detect automatically. For example, text can be represented as images, ASCII art, or screenshots, rewritten with obfuscated wording, deliberate misspellings, or alternate encodings, or embedded within files using techniques like steganography. Attackers may also compress, encrypt, or split information across multiple channels to avoid detection.
What DLP does well
Detects many common, accidental leaks (for example, sending sensitive data to the wrong recipient, unredacted attachments, or obvious policy violations).
Enforces organizational policies consistently (for example, blocking or quarantining emails that contain identifiable patterns such as credit card numbers).
Where DLP has limits
It cannot reliably detect information that has been heavily rewritten, encoded, embedded in images or other media, or sent via encrypted channels it cannot inspect.
It can miss context-dependent leaks (for example, using synonyms, paraphrasing, or partial data that only becomes sensitive when combined).
Adversaries actively adapt to bypass static rules and patterns.
Example ASCII art:
     ####  ######  ####  #####  ###### #####
    #      #      #    # #    # #        #
     ####  #####  #      #    # #####    #
         # #      #      #####  #        #
    #    # #      #    # #   #  #        #
     ####  ######  ####  #    # ######   #
I have added a sentence to the list of patterns but somehow the sentence is not matched. Why is the sentence not matched?
It is possible that a word in your sentence is part of the skip list. All words in the skip list are removed from the text before scanning. If your pattern contains any of these skipped words, it will not match. For example, if the skip list includes the word “this” and your pattern is “this is a text”, the pattern will never match because it includes a word that is skipped.
Tip
To view the content scanned by the DLP engine, upload the raw email in MIME format and review the extracted text.
ciphermail-cli dlp extract text --file <file>
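The effect of the skip list can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the gateway's implementation: the contents of `SKIP_WORDS`, the function name, and splitting on whitespace are assumptions.

```python
# Hypothetical skip list for illustration; the real default list is larger.
SKIP_WORDS = {"this", "the", "and", "of", "to", "a"}


def remove_skipped(text: str, skip_words=SKIP_WORDS) -> str:
    """Drop every skipped word from the text before it is scanned."""
    return " ".join(word for word in text.lower().split() if word not in skip_words)


if __name__ == "__main__":
    scanned = remove_skipped("This is a text about the budget")
    # The pattern "this is a text" can never match, because "this" and "a"
    # were removed before scanning; a pattern without skipped words still can.
    print("this is a text" in scanned)
    print("is text" in scanned)
```

In this sketch, the practical fix is to write the pattern without the skipped words, or to remove those words from the skip list.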
What is the skip list?
To improve performance, the system skips very common words that almost never carry sensitive information, such as “the,” “and,” “of,” “to,” and “a.” Removing these words before scanning speeds up the process with little effect on detection accuracy, although a pattern that itself contains a skipped word will no longer match.
By default, the skip list contains the 100 most common words in English. If your content relies on certain common terms or you work with other languages, you can customize this list by adding or removing words to match your needs.
I would like to match a word if the word contains uppercase characters but not when it contains lowercase characters. Is this possible?
Matching uppercase characters is not supported. When text is extracted from emails, it is normalized to make pattern matching simpler and faster. The normalization process works as follows: all line breaks (carriage returns and line feeds) are replaced with spaces, sequences of multiple spaces are reduced to a single space, all characters are converted to lowercase, and the text is Unicode-normalized (NFC). Because everything is converted to lowercase, uppercase-only patterns cannot be matched.
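The normalization steps listed above can be approximated with the following Python sketch. It only mirrors the description; the function name is hypothetical, and the exact order and implementation inside the gateway are assumptions.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Approximate the described normalization of extracted text."""
    text = re.sub(r"[\r\n]", " ", text)        # line breaks become spaces
    text = re.sub(r" {2,}", " ", text)         # runs of spaces collapse to one
    text = text.lower()                        # everything becomes lowercase
    return unicodedata.normalize("NFC", text)  # Unicode NFC normalization


if __name__ == "__main__":
    # "e" followed by a combining acute accent is composed into a single "é".
    print(normalize("TOP\r\nSECRET   Cafe\u0301"))
```

After this step, "SECRET" and "secret" are indistinguishable, which is why a pattern cannot depend on uppercase characters.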
Are there any pre-defined patterns?
For some predefined DLP patterns, see the DLP examples.
Are attachments also scanned?
Text-based attachments (for example, .xml and .html) are scanned and their contents are analyzed. The contents of binary attachments are not scanned at this time. Future versions of the CipherMail gateway will add content scanning for common binary formats such as .doc, .zip, and .pdf.
However, the gateway does detect the document type of each attachment and includes that information in the extracted text. This lets you create rules to quarantine messages that contain specific document types (for example, PDF or Word files), even if the file’s extension has been changed.
I cannot delete certain patterns. Why is that?
If a pattern is currently in use, for example selected in the global settings, it cannot be deleted. Before deleting a pattern, make sure it is not selected by any user, any domain, or the global settings.
Are email headers scanned?
Email headers are included in the scan; however, the following headers are excluded:
Received
From
Reply-To
References
Message-ID
These headers are ignored to reduce the chance of false positives when scanning for multiple email addresses.
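To see which headers of a given message would remain after this exclusion, a sketch along the following lines can help. It uses Python's standard `email` package and is not how the gateway itself is implemented; the function name and the sample message are made up for illustration.

```python
from email import message_from_string

# Headers that are excluded from scanning, as listed above (compared case-insensitively).
EXCLUDED = {"received", "from", "reply-to", "references", "message-id"}


def scannable_headers(raw_mime: str):
    """Return (name, value) pairs for all headers that are not excluded."""
    message = message_from_string(raw_mime)
    return [(name, value) for name, value in message.items()
            if name.lower() not in EXCLUDED]


if __name__ == "__main__":
    raw = (
        "From: a@example.com\n"
        "To: b@example.com, c@example.com\n"
        "Message-ID: <1@example.com>\n"
        "Subject: Hi\n"
        "\n"
        "Body\n"
    )
    for name, value in scannable_headers(raw):
        print(f"{name}: {value}")
```

In this example only the To and Subject headers remain, so a rule that counts email addresses would see the two recipients but not the sender or the Message-ID.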