ZeroTool Workbench
Zero-Width Character Detector
Detect and remove invisible Unicode characters: ZWSP, ZWNJ, ZWJ, BOM, Bidi controls, soft hyphens, tag characters (AI watermarks), and variation selectors. Client-side, no upload.
Invisible Unicode characters silently embed themselves in text from copy-paste, AI-generated content, and supply-chain attacks. They don’t render visually but can carry watermarks, manipulate text flow, hide malicious code, or interfere with parsing. This tool detects and removes them.
Paste text and the detector will highlight every invisible character, categorize it, and offer a cleaned version. Use strip mode to remove all categories at once or selectively — zero-width only, Bidi controls only, AI watermarks (tag characters) only, or variation selectors only.
What It Detects
- Zero-width spaces (U+200B) — marks a possible line-break point without rendering any visible space
- Zero-width joiner / non-joiner (U+200D, U+200C) — request or prevent joining of adjacent characters in scripts like Arabic and Devanagari; ZWJ also combines emoji into sequences such as the family emoji
- Byte-order mark (U+FEFF) — signals byte order (or, in UTF-8, simply the encoding) at the start of a file; a stray BOM can break CSV headers and JSON parsing
- Word joiner (U+2060) — prevents line breaks without rendering space
- Bidirectional controls (U+200E-200F, U+202A-202E, U+2066-2069) — set or override text direction in mixed RTL/LTR text; used in Trojan-Source attacks to hide code
- Soft hyphen (U+00AD) — marks an optional hyphenation point and stays hidden unless a line breaks there; carried along in copy-pasted content, it silently breaks exact string matching
- Variation selectors (U+FE00-FE0F, U+E0100-E01EF) — select glyph variants for emoji and CJK characters (U+FE0F, for example, requests emoji-style presentation); can pile up in text exported from design tools
- Tag characters (U+E0000-E007F) — invisible characters that mirror ASCII; used legitimately in the England, Scotland, and Wales flag emoji, but also able to hide text such as watermarks and authentication tokens
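The category breakdown can be sketched in Python as a simple range lookup. This is a minimal illustration, not the tool's actual browser code; the category names, and giving the BOM and soft hyphen their own buckets, are assumptions.

```python
from collections import Counter
from typing import Optional

# Illustrative category table built from the ranges listed above (inclusive bounds).
# Category names are assumptions, not the tool's internal identifiers.
CATEGORIES = [
    ("zero-width", 0x200B, 0x200D),   # ZWSP, ZWNJ, ZWJ
    ("zero-width", 0x2060, 0x2060),   # word joiner
    ("bom", 0xFEFF, 0xFEFF),          # byte-order mark
    ("soft-hyphen", 0x00AD, 0x00AD),
    ("bidi", 0x200E, 0x200F),         # LRM, RLM
    ("bidi", 0x202A, 0x202E),         # embeddings and overrides
    ("bidi", 0x2066, 0x2069),         # isolates
    ("variation", 0xFE00, 0xFE0F),    # VS1-VS16
    ("variation", 0xE0100, 0xE01EF),  # VS17-VS256
    ("tag", 0xE0000, 0xE007F),        # tag characters
]


def classify(ch: str) -> Optional[str]:
    """Return the category of a single character, or None if it is not in the table."""
    cp = ord(ch)
    for name, lo, hi in CATEGORIES:
        if lo <= cp <= hi:
            return name
    return None


def summarize(text: str) -> Counter:
    """Count invisible characters per category, similar to the summary described above."""
    counts: Counter = Counter()
    for ch in text:
        category = classify(ch)
        if category is not None:
            counts[category] += 1
    return counts


print(summarize("a\u200bb\u202ec\U000E0041d\ufe0f"))
```

Python strings iterate by code point, so tag characters and VS17-VS256 each count once here. JavaScript strings are UTF-16, so a browser implementation has to iterate with for...of or use a regular expression with the u flag to avoid counting surrogate halves separately.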
How to Use
- Paste your text into the input field. You can also upload a .txt file.
- The tool instantly highlights every invisible character in the text, color-coded by category.
- A summary shows the count and breakdown: how many zero-width, Bidi, tag characters, etc.
- Choose a strip mode: All (remove everything), Zero-width only, Bidi only, Tag only, or Variation only.
- Copy the cleaned text or download it as a .txt file.
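The strip modes map naturally onto regular-expression character classes. In this Python sketch the mode names mirror the list above, but grouping the soft hyphen, word joiner, and BOM under "zero-width" is an assumption about how the tool buckets them.

```python
import re

# Character classes per strip mode, written with Python's \uXXXX / \UXXXXXXXX regex escapes.
# Assumption: soft hyphen, word joiner, and BOM are grouped with the zero-width set.
MODE_CLASSES = {
    "zero-width": r"\u00AD\u200B-\u200D\u2060\uFEFF",
    "bidi": r"\u200E\u200F\u202A-\u202E\u2066-\u2069",
    "variation": r"\uFE00-\uFE0F\U000E0100-\U000E01EF",
    "tag": r"\U000E0000-\U000E007F",
}
MODE_CLASSES["all"] = "".join(MODE_CLASSES.values())

# Compile one pattern per mode up front.
PATTERNS = {mode: re.compile(f"[{cls}]") for mode, cls in MODE_CLASSES.items()}


def strip_invisible(text: str, mode: str = "all") -> str:
    """Remove the characters selected by mode; an unknown mode raises KeyError."""
    return PATTERNS[mode].sub("", text)


print(strip_invisible("a\u200bb\u202ec", "bidi") == "a\u200bbc")
```

Stripping is not always harmless: removing ZWJ splits emoji sequences, removing U+FE0F can switch an emoji to text-style presentation, and removing tag characters breaks the England, Scotland, and Wales flags. That makes per-category modes useful alongside "All".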
Common Scenarios
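Auditing AI-generated content — Invisible characters, including tag characters, can carry hidden markers such as watermarks or tracking tokens in generated text. When reviewing content for provenance or compliance, check the tag-character count and strip them to remove the markers. A clean result does not prove text is human-written, because statistical watermarks live in word choice rather than in invisible characters.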
Supply-chain code security — Malicious dependencies may hide Trojan-Source payloads using Bidi controls. Code reviewers can paste suspect files here to expose hidden directionality attacks before they reach CI/CD.
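The same check can run as a CI step so that Bidi controls fail the build before review. The Python script below is a sketch: the file name you save it under (for example scan_bidi.py) and its exit-code convention are assumptions, and it only covers the Bidi ranges listed above.

```python
import sys
from pathlib import Path
from typing import Iterator, List, Tuple

# Bidi controls from the list above: LRM/RLM, embeddings/overrides, isolates.
BIDI = {0x200E, 0x200F, *range(0x202A, 0x202F), *range(0x2066, 0x206A)}


def find_bidi(text: str) -> Iterator[Tuple[int, int, int]]:
    """Yield (line, column, code point) for every Bidi control; positions are 1-based."""
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) in BIDI:
                yield lineno, col, ord(ch)


def main(paths: List[str]) -> int:
    found = False
    for path in paths:
        # errors="replace" keeps scanning even if a file is not valid UTF-8.
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for lineno, col, cp in find_bidi(text):
            print(f"{path}:{lineno}:{col}: Bidi control U+{cp:04X}")
            found = True
    return 1 if found else 0  # a nonzero exit status fails most CI steps


# Scan only when file paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Usage might look like `python scan_bidi.py src/*.py`. The file:line:col output format matches what many editors and CI log viewers can link to.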
Email and anti-phishing — Phishing emails and trojanized content often use invisible characters to obfuscate URLs or bypass filters. Copy-paste the email body to reveal the true text order and catch social-engineering tricks.
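To see the stored character order behind a spoofed link or attachment name, make the format characters visible. The Python sketch below marks every Unicode format (Cf) character. The sample file name is hypothetical; in a bidi-aware viewer it typically displays as invoiceexe.pdf even though the name really ends in .exe.

```python
import unicodedata


def reveal(text: str) -> str:
    """Replace each format (Cf) character with a visible <U+XXXX> marker, keeping stored order."""
    return "".join(
        f"<U+{ord(ch):04X}>" if unicodedata.category(ch) == "Cf" else ch
        for ch in text
    )


# Hypothetical attachment name using RIGHT-TO-LEFT OVERRIDE (U+202E).
name = "invoice\u202efdp.exe"
print(reveal(name))           # invoice<U+202E>fdp.exe
print(name.endswith(".exe"))  # True
```

Variation selectors are combining marks rather than Cf characters, so this quick view does not mark them.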
Cleaning clipboard paste-throughs — When copying from Google Docs, Microsoft Word, or Slack, soft hyphens and Bidi markers tag along. Paste into this tool and strip to get clean plain text for code blocks or databases.
Incident response and text forensics — When investigating whether leaked documents or credentials contain exfiltration markers or hidden metadata, this tool reveals exactly which invisible characters are embedded.
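For forensic notes, it helps to record each hidden character's position, code point, and UTF-8 bytes, since the bytes are what show up in hex dumps and byte-level searches. A Python sketch, assuming the same ranges listed under What It Detects:

```python
import unicodedata
from typing import List, Tuple

# Ranges from the detection list above (inclusive bounds).
RANGES = [
    (0x00AD, 0x00AD), (0x200B, 0x200F), (0x202A, 0x202E), (0x2060, 0x2060),
    (0x2066, 0x2069), (0xFE00, 0xFE0F), (0xFEFF, 0xFEFF),
    (0xE0000, 0xE007F), (0xE0100, 0xE01EF),
]


def forensic_report(text: str) -> List[Tuple[int, str, str, str]]:
    """Return one row per invisible character: index, code point, UTF-8 bytes, name."""
    rows = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in RANGES):
            utf8 = ch.encode("utf-8").hex(" ")  # e.g. "e2 80 8b" for ZWSP
            rows.append((i, f"U+{cp:04X}", utf8, unicodedata.name(ch, "<unnamed>")))
    return rows


for row in forensic_report("hi\u200b\U000E0041"):
    print(row)
```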
Why Client-Side Matters
Processing on your device rather than uploading to a server has two key advantages:
Privacy — Your text may be sensitive. Credentials, code, leaked data, and confidential documents stay in your browser. No logs, no retention, no analytics on what you paste.
No forensic delay — Trojan-Source samples and potentially malicious content don’t need to be uploaded for analysis. Paste and detect instantly. This is critical when reviewing untrusted code repositories or email attachments in real time.
Related Tools
- Unicode Text Converter — convert text to Unicode code points and back; explore character composition.
- String Escape Utility — encode and decode escaped sequences (C, JavaScript, Python, regex, JSON); complement invisible-character detection with full-text encoding visibility.
FAQ
What counts as an invisible character?
Invisible characters are Unicode code points that render with zero visual width or control directional text flow without appearing on screen. Examples: ZWSP (U+200B, zero-width space), ZWNJ (U+200C, zero-width non-joiner), ZWJ (U+200D, zero-width joiner), BOM (U+FEFF, byte-order mark), Bidi controls (U+200E-200F, U+202A-202E, U+2066-2069), soft hyphen (U+00AD), variation selectors (U+FE00-FE0F, U+E0100-E01EF), and tag characters (U+E0000-E007F). Some control text behavior invisibly; others are used for zero-width watermarks.
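A tempting shortcut is to treat every character in Unicode general category Cf ("format") as invisible. Python's unicodedata module shows why a detector may prefer an explicit range list. Variation selectors are combining marks (Mn), so a Cf-only filter misses them. Cf also includes characters outside this list, such as U+2061 FUNCTION APPLICATION.

```python
import unicodedata


def is_format_char(ch: str) -> bool:
    """True if ch has Unicode general category Cf ("Other, format")."""
    return unicodedata.category(ch) == "Cf"


samples = {
    "ZWSP": "\u200b",
    "soft hyphen": "\u00ad",
    "BOM": "\ufeff",
    "RLO": "\u202e",
    "tag A": "\U000E0041",
    "VS16": "\ufe0f",
    "VS17": "\U000E0100",
    "function application": "\u2061",  # Cf, but not in the list above
}

for label, ch in samples.items():
    print(f"{label:<22} U+{ord(ch):04X}  {unicodedata.category(ch)}")
```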
Why would normal text contain invisible characters?
Invisible characters enter text from several sources: copy-pasting from rich-text editors (Word, Google Docs, Slack) that carry soft hyphens and direction markers; AI-generated content that may include hidden markers such as watermarks or tracking tokens; intentional obfuscation in supply-chain attacks; tools that insert control codes for line breaking or text direction; and hidden metadata stored inline, such as strings encoded in tag characters. They often survive plain-text export unnoticed.
Does this catch ChatGPT or AI watermarks?
Yes, when the watermark uses invisible characters. The tool detects the entire tag-character block (U+E0000-E007F), which can encode hidden ASCII text such as watermarks or tracking tokens, along with the other invisible categories listed above. It cannot catch watermarks that use no invisible characters at all: statistical watermarking schemes bias a model's word choice instead. Techniques also evolve, so steganographic methods that rely on characters outside the listed ranges may go undetected.
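Tag characters U+E0020 through U+E007E shadow printable ASCII (0x20 through 0x7E), so hidden text can be recovered by subtracting 0xE0000. The Python sketch below shows both directions. The function names are illustrative, and it ignores U+E0001 (LANGUAGE TAG) and U+E007F (CANCEL TAG), which have no ASCII counterpart in that range.

```python
TAG_OFFSET = 0xE0000


def encode_tags(ascii_text: str) -> str:
    """Map printable ASCII (0x20-0x7E) to invisible tag characters; other characters are dropped."""
    return "".join(chr(TAG_OFFSET + ord(c)) for c in ascii_text if 0x20 <= ord(c) <= 0x7E)


def decode_tags(text: str) -> str:
    """Recover the ASCII carried by tag characters U+E0020-U+E007E, ignoring everything else."""
    return "".join(chr(ord(c) - TAG_OFFSET) for c in text if 0xE0020 <= ord(c) <= 0xE007E)


marked = "Hello" + encode_tags("id=42") + " world"
print(len(marked), decode_tags(marked))  # 16 id=42
```

Applied to the England flag emoji (U+1F3F4 followed by tags spelling "gbeng" and U+E007F), the same decoder returns "gbeng". This is a reminder that not every tag sequence is a watermark.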
Is my text sent to a server?
No. Detection and removal run entirely in your browser via JavaScript. Your text never leaves your device. You can verify this by opening DevTools → Network and confirming no data requests are made while you paste and process text.
What about Trojan-Source attacks (CVE-2021-42574)?
Trojan-Source exploits use bidirectional (Bidi) Unicode control characters (particularly RLO U+202E, LRO U+202D, and directional isolates U+2066-2069) to make source code display in a different order from the one the compiler reads, hiding malicious logic. For example, the controls can make an executable return statement appear to sit inside a comment, or make part of a string literal look like a comment. This tool detects and removes all the Bidi controls listed above, mitigating the primary attack vector. Reviewing the cleaned source shows the characters in the order the compiler or interpreter actually processes them.
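A small Python sketch shows how removing Bidi controls exposes the logical order. The sample line is hypothetical, modeled loosely on the "stretched string" pattern from the Trojan Source research. Python's own parser confirms that the apparent comment is really part of the string literal.

```python
import ast
import re

# Bidi controls listed above (LRM/RLM, embeddings/overrides, isolates).
BIDI_RE = re.compile(r"[\u200E\u200F\u202A-\u202E\u2066-\u2069]")

# Hypothetical line: the RLO/LRI/PDI controls are arranged so an editor may draw
# "# Check if admin" outside the quotes, although it is part of the string literal.
source = 'access_level != "user\u202E \u2066# Check if admin\u2069 \u2066"'

cleaned = BIDI_RE.sub("", source)
print(cleaned)  # access_level != "user # Check if admin "

# The parser agrees with the cleaned view: the "comment" is inside the literal.
literal = ast.parse(cleaned, mode="eval").body.comparators[0].value
print(repr(literal))  # 'user # Check if admin '
```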