Question 1

What counts as an invisible character?

Accepted Answer

Invisible characters are Unicode code points that render with zero width, or that control text direction, without appearing on screen themselves. Examples: ZWSP (U+200B, zero-width space), ZWNJ (U+200C, zero-width non-joiner), ZWJ (U+200D, zero-width joiner), BOM (U+FEFF, byte-order mark, formally named zero-width no-break space), Bidi controls (U+200E-U+200F, U+202A-U+202E, U+2066-U+2069), soft hyphen (U+00AD, shown only when a line breaks at it), variation selectors (U+FE00-U+FE0F, U+E0100-U+E01EF), and tag characters (U+E0000-U+E007F). Some change how the surrounding text joins, breaks, or flows without being seen; others can be strung together to hide data, for example as zero-width watermarks.
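
A detector for the list above mostly comes down to a range table plus a scan that walks the text by code point. The sketch below is a minimal illustration under that assumption, not this tool's actual source; `INVISIBLE_RANGES`, `classify`, and `findInvisible` are hypothetical names, and the table covers only the ranges listed in this answer.

```javascript
// Hypothetical range table built from the list above.
// Each entry is [category, first code point, last code point], inclusive.
const INVISIBLE_RANGES = [
  ["soft hyphen", 0x00ad, 0x00ad],
  ["zero-width space", 0x200b, 0x200b],
  ["zero-width joiner/non-joiner", 0x200c, 0x200d],
  ["bidi control", 0x200e, 0x200f],
  ["bidi control", 0x202a, 0x202e],
  ["bidi control", 0x2066, 0x2069],
  ["variation selector", 0xfe00, 0xfe0f],
  ["byte-order mark", 0xfeff, 0xfeff],
  ["tag character", 0xe0000, 0xe007f],
  ["variation selector", 0xe0100, 0xe01ef],
];

// Return the category of a code point, or null if it is not listed.
function classify(codePoint) {
  for (const [category, first, last] of INVISIBLE_RANGES) {
    if (codePoint >= first && codePoint <= last) return category;
  }
  return null;
}

// Report each invisible code point with its UTF-16 index into the string.
// for...of walks code points, so astral characters such as tags
// (two UTF-16 units each) are reported once rather than as two halves.
function findInvisible(text) {
  const hits = [];
  let index = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const category = classify(cp);
    if (category !== null) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codePoint: `U+${hex}`, category });
    }
    index += ch.length; // 1 for BMP characters, 2 for astral ones
  }
  return hits;
}

// Usage: one hit at index 3, code point U+200B, category "zero-width space".
console.log(findInvisible("pay\u200Bpal"));
```

Iterating by code point rather than by UTF-16 unit matters here because tag characters and the supplementary variation selectors lie outside the Basic Multilingual Plane.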

Question 2

Why would normal text contain invisible characters?

Accepted Answer

Invisible characters enter text from several sources: copy-pasting from rich-text editors (Word, Google Docs, Slack), which can carry soft hyphens and direction marks; keyboards and input methods that insert joiners for correct rendering (for example, ZWNJ in Persian text or ZWJ in emoji sequences); files saved as UTF-8 with a leading byte-order mark; AI-generated text, which may contain stray or deliberately inserted invisible characters; and intentional obfuscation in supply-chain attacks. Because these characters display as nothing, they often survive plain-text export unnoticed.

Question 3

Does this catch ChatGPT or AI watermarks?

Accepted Answer

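It catches character-based ones. Any watermark or hidden token built from invisible characters is detected, including the entire Unicode tag character block (U+E0000-U+E007F), which can encode hidden ASCII text. However, there is no public confirmation that ChatGPT or other models routinely embed such tokens, and watermarking schemes that work by biasing word choice rather than inserting characters leave nothing for a character scanner to find. Techniques also evolve: this tool covers tag characters and all the invisible categories listed above, but a steganographic method that uses characters outside these ranges may not be detected.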
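
Why the tag block matters: tag characters U+E0020 to U+E007E map one-to-one onto printable ASCII (subtract 0xE0000), so a string can carry a readable hidden message that renders as nothing. The sketch below shows that mapping with hypothetical helpers `encodeTags` and `decodeTags`. It illustrates the Unicode encoding only and is not a description of any AI vendor's scheme.

```javascript
// Tag characters U+E0020..U+E007E mirror printable ASCII 0x20..0x7E,
// offset by 0xE0000, so each one can carry a hidden ASCII character.
const TAG_OFFSET = 0xe0000;

// Hypothetical helper: hide printable ASCII as invisible tag characters.
function encodeTags(ascii) {
  let out = "";
  for (const ch of ascii) {
    const cp = ch.codePointAt(0);
    if (cp < 0x20 || cp > 0x7e) throw new RangeError("printable ASCII only");
    out += String.fromCodePoint(cp + TAG_OFFSET);
  }
  return out;
}

// Hypothetical helper: recover any ASCII carried by tag characters in text.
function decodeTags(text) {
  let hidden = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xe0020 && cp <= 0xe007e) {
      hidden += String.fromCodePoint(cp - TAG_OFFSET);
    }
  }
  return hidden;
}

const marked = "Hello" + encodeTags("id=42"); // renders as just "Hello"
console.log(marked.length); // 15: five visible units plus 5 tags x 2 UTF-16 units
console.log(decodeTags(marked)); // prints id=42
```

Tag characters also have a legitimate use: subdivision flag emoji, such as the flag of England, are a black flag followed by tag characters and a cancel tag (U+E007F). Blindly stripping the whole block therefore breaks those emoji.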

Question 4

Is my text sent to a server?

Accepted Answer

No. Detection and removal run entirely in your browser via JavaScript. Your text never leaves your device. You can verify this by opening DevTools → Network and confirming no data requests are made while you paste and process text.
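
The removal step needs nothing beyond built-in string functions, which is why it can run without a server. A minimal sketch, assuming a single Unicode-aware regular expression over the ranges listed in the first answer; `INVISIBLE_RE` and `stripInvisible` are hypothetical names, not this tool's code.

```javascript
// Hypothetical cleanup pass: one regex over the ranges from the first answer.
// The "u" flag is required for \u{...} escapes and so that astral code points
// (tags, supplementary variation selectors) match as whole characters.
const INVISIBLE_RE =
  /[\u00AD\u200B-\u200F\u202A-\u202E\u2066-\u2069\uFE00-\uFE0F\uFEFF\u{E0000}-\u{E007F}\u{E0100}-\u{E01EF}]/gu;

// Pure string-in, string-out: nothing here performs any network I/O.
function stripInvisible(text) {
  return text.replace(INVISIBLE_RE, "");
}

console.log(stripInvisible("in\u00ADvisible\u200B text")); // prints invisible text
```

Stripping everything is not always the right default: removing ZWJ splits emoji sequences, and removing ZWNJ can change how Persian and many Indic scripts render, so a real tool may let users keep those characters.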

Question 5

What about Trojan-Source attacks (CVE-2021-42574)?

Accepted Answer

Trojan-Source exploits use bidirectional (Bidi) Unicode control characters, particularly RLO (U+202E), LRO (U+202D), and the directional isolates (U+2066-U+2069), to change the order in which source code is displayed without changing the order in which the compiler reads it, hiding malicious logic. For example, an override can make an executable early-return statement appear to sit inside a comment. This tool detects and removes all Bidi control characters, mitigating the primary attack vector. After cleanup, the displayed order of the source matches the order the compiler or interpreter actually processes.
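
Before stripping anything, a review-time check can show exactly where these characters sit. A minimal sketch, assuming the goal is to report each Bidi control named above with a 1-based line and column; `BIDI_NAMES` and `findBidiControls` are hypothetical, and the idea is similar in spirit to the warnings compilers such as GCC (`-Wbidi-chars`) added after the disclosure.

```javascript
// Bidi controls from the answer above: LRM/RLM, LRE..RLO, LRI..PDI.
const BIDI_NAMES = new Map([
  [0x200e, "LRM"], [0x200f, "RLM"],
  [0x202a, "LRE"], [0x202b, "RLE"], [0x202c, "PDF"],
  [0x202d, "LRO"], [0x202e, "RLO"],
  [0x2066, "LRI"], [0x2067, "RLI"], [0x2068, "FSI"], [0x2069, "PDI"],
]);

// Hypothetical linter pass: list every Bidi control with its position.
// Columns count code points (1-based), not UTF-16 units or display cells.
function findBidiControls(source) {
  const findings = [];
  source.split("\n").forEach((line, i) => {
    let column = 1;
    for (const ch of line) {
      const name = BIDI_NAMES.get(ch.codePointAt(0));
      if (name) findings.push({ line: i + 1, column, name });
      column += 1;
    }
  });
  return findings;
}

// Usage: one finding, RLO at line 2, column 13.
console.log(findBidiControls('let ok = 1;\nlet s = "abc\u202E def";'));
```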

Zero-Width Character Detector