HTML entities are the mechanism HTML uses to represent characters that would otherwise be misinterpreted by the parser — angle brackets that would start a tag, ampersands that would begin another entity, and characters outside ASCII that might not survive encoding changes. Getting entities wrong causes broken layouts, garbled text, and in the worst case, XSS vulnerabilities.
What Is an HTML Entity?
An HTML entity (the HTML specification calls it a character reference) begins with `&`, ends with `;`, and stands for a character. There are two forms:
Named entities use a descriptive keyword:
- `&lt;` → `<`
- `&gt;` → `>`
- `&amp;` → `&`
- `&quot;` → `"`
- `&nbsp;` → (non-breaking space)
Numeric entities use the Unicode code point, either decimal or hexadecimal:
- `&#60;` → `<` (decimal)
- `&#x3C;` → `<` (hexadecimal)
- `&#169;` → `©` (decimal)
- `&#xA9;` → `©` (hexadecimal)
Both forms produce identical output. Named entities are more readable; numeric entities work for any Unicode character including those without named equivalents.
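Python's standard `html` module can check that the forms are interchangeable. In this illustrative sketch (the `forms` table exists only for the demo), every named, decimal, and hexadecimal reference for a character decodes to that same character:

```python
import html

# Three spellings of each character: named, decimal, hexadecimal.
forms = {
    "<": ["&lt;", "&#60;", "&#x3C;"],
    "©": ["&copy;", "&#169;", "&#xA9;"],
}

for char, refs in forms.items():
    # Decode every spelling; a set collapses identical results.
    decoded = {html.unescape(ref) for ref in refs}
    print(char, refs, "->", decoded)
```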
Why HTML Entities Matter
Avoid breaking HTML structure
The `<` and `>` characters have special meaning in HTML. To display literal angle brackets, for example when showing source code, you need to escape them:
```html
<!-- Wrong: the browser treats <strong> and <b> as real tags, so they never display -->
<p>Using <strong> instead of <b> is preferred.</p>

<!-- Correct: escaped brackets display literally -->
<p>Using &lt;strong&gt; instead of &lt;b&gt; is preferred.</p>
```
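Showing a code sample on a page is a common case. The minimal Python sketch below escapes the snippet with the standard library's `html.escape` and then wraps it in `<pre><code>`. The helper name `code_to_html` is made up for illustration:

```python
import html


def code_to_html(source: str) -> str:
    """Wrap source code in <pre><code> so its tags display as text."""
    # quote=False: quotes need no escaping in element content, so keep them readable.
    escaped = html.escape(source, quote=False)
    return f"<pre><code>{escaped}</code></pre>"


print(code_to_html("Using <strong> instead of <b> is preferred."))
# <pre><code>Using &lt;strong&gt; instead of &lt;b&gt; is preferred.</code></pre>
```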
Prevent XSS vulnerabilities
Failing to escape user-supplied content before inserting it into HTML is one of the most common web security bugs. If a user enters `<script>alert(1)</script>` and your code outputs it raw, that script runs in the browser of every visitor who views the page.
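The example below assumes a template syntax where `<%= %>` writes values unescaped, as in classic ASP or JSP. EJS and Rails views escape `<%= %>` by default. `htmlEncode` stands for whatever encoding function your stack provides:

```html
<!-- Dangerous: user input inserted raw -->
<p>Hello, <%= username %></p>

<!-- Safe: HTML-encoded -->
<p>Hello, <%= htmlEncode(username) %></p>
```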
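Escaping quotes matters most in attribute values. In the sketch below, the helper names are hypothetical and Python's `html.escape` stands in for `htmlEncode`. A crafted query closes the `value` attribute early and injects an event handler unless the quotes are escaped:

```python
import html


def render_search_box_unsafe(query: str) -> str:
    # Hypothetical helper: echoes the user's query into an attribute, unescaped.
    return f'<input name="q" value="{query}">'


def render_search_box(query: str) -> str:
    # Same helper, but html.escape also converts " and ' (quote=True is the default).
    return f'<input name="q" value="{html.escape(query)}">'


payload = '" autofocus onfocus="alert(1)'
print(render_search_box_unsafe(payload))
# <input name="q" value="" autofocus onfocus="alert(1)">
print(render_search_box(payload))
# <input name="q" value="&quot; autofocus onfocus=&quot;alert(1)">
```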
Always encode these five characters for any user-supplied content inserted into HTML:
| Character | Entity |
|---|---|
| `&` | `&amp;` |
| `<` | `&lt;` |
| `>` | `&gt;` |
| `"` | `&quot;` |
| `'` | `&#39;` |
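The hand-rolled escaper below handles exactly these five characters, mainly to show why the order of replacement matters. It is an illustrative sketch. In real code, prefer your platform's built-in function, such as Python's `html.escape`:

```python
# The five reserved characters and their replacements, applied in this order.
# "&" must come first; otherwise the "&" inside an already-produced "&lt;"
# would be escaped a second time into "&amp;lt;".
REPLACEMENTS = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#39;"),
]


def escape_html(text: str) -> str:
    for char, entity in REPLACEMENTS:
        text = text.replace(char, entity)
    return text


print(escape_html("""<a title="Tom & Jerry's">"""))
# &lt;a title=&quot;Tom &amp; Jerry&#39;s&quot;&gt;
```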
Display characters outside your document’s character set
Before UTF-8 became universal, documents served in ISO-8859-1 (Latin-1) couldn’t represent characters like €, →, or the em dash directly, because Latin-1 covers only 256 characters. Entities were the workaround. Today, serving HTML as UTF-8 is standard, but entities remain useful for characters you can’t easily type or that might be stripped by text processors.
Common HTML Entities Reference
Reserved Characters
| Character | Name | Entity | Numeric |
|---|---|---|---|
| `<` | Less-than | `&lt;` | `&#60;` |
| `>` | Greater-than | `&gt;` | `&#62;` |
| `&` | Ampersand | `&amp;` | `&#38;` |
| `"` | Double quote | `&quot;` | `&#34;` |
| `'` | Single quote | `&apos;` | `&#39;` |
Typography
| Character | Name | Entity | Use case |
|---|---|---|---|
| (invisible) | Non-breaking space | `&nbsp;` | Prevent line breaks between words |
| — | Em dash | `&mdash;` | Sentence breaks, parenthetical asides |
| – | En dash | `&ndash;` | Number ranges (2010–2024) |
| … | Ellipsis | `&hellip;` | Truncated text |
| “ | Left double quote | `&ldquo;` | Quotations |
| ” | Right double quote | `&rdquo;` | Quotations |
| ’ | Right single quote (apostrophe) | `&rsquo;` | Contractions |
Symbols
| Character | Name | Entity |
|---|---|---|
| © | Copyright | `&copy;` |
| ® | Registered | `&reg;` |
| ™ | Trademark | `&trade;` |
| € | Euro | `&euro;` |
| £ | Pound | `&pound;` |
| ¥ | Yen | `&yen;` |
| ° | Degree | `&deg;` |
| ± | Plus-minus | `&plusmn;` |
| × | Multiplication | `&times;` |
| ÷ | Division | `&divide;` |
| → | Right arrow | `&rarr;` |
| ← | Left arrow | `&larr;` |
| ↑ | Up arrow | `&uarr;` |
| ↓ | Down arrow | `&darr;` |
Mathematical
| Character | Name | Entity |
|---|---|---|
| ≤ | Less or equal | `&le;` |
| ≥ | Greater or equal | `&ge;` |
| ≠ | Not equal | `&ne;` |
| ∞ | Infinity | `&infin;` |
| ∑ | Summation | `&sum;` |
| √ | Square root | `&radic;` |
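These tables are only a selection. Python's standard library ships the HTML 4 name table as `html.entities.codepoint2name`, and the larger HTML5 set lives in `html.entities.html5`. A small lookup helper can therefore return a named entity where one exists and fall back to a hexadecimal reference otherwise. `entity_for` is a hypothetical name used for this sketch:

```python
from html.entities import codepoint2name


def entity_for(char: str) -> str:
    """Named entity if HTML 4 defines one for this character, else a hex reference."""
    code_point = ord(char)
    name = codepoint2name.get(code_point)  # e.g. 169 -> "copy"
    return f"&{name};" if name else f"&#x{code_point:X};"


for char in "©→≠😀":
    print(char, entity_for(char))
# © &copy;
# → &rarr;
# ≠ &ne;
# 😀 &#x1F600;
```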
Encoding and Decoding in Code
JavaScript
The browser DOM handles HTML entity encoding:
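```javascript
// Encode (escape HTML) by letting the DOM serialize a text node.
// Note: innerHTML escapes &, <, > (and U+00A0) in text but NOT quotes,
// so this is safe for element content only, not for attribute values.
function htmlEncode(str) {
  const div = document.createElement('div');
  div.appendChild(document.createTextNode(str));
  return div.innerHTML;
}

htmlEncode('<script>alert(1)</script>');
// "&lt;script&gt;alert(1)&lt;/script&gt;"
```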
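```javascript
// Decode (unescape HTML). A <textarea> parses its content as text, so markup
// such as <img onerror=...> in untrusted input never becomes a live element.
function htmlDecode(str) {
  const textarea = document.createElement('textarea');
  textarea.innerHTML = str;
  return textarea.value;
}

htmlDecode('&lt;b&gt;bold&lt;/b&gt;');
// "<b>bold</b>"
```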
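In Node.js (no DOM), use a library such as `he`:

```shell
npm install he
```

```javascript
import he from 'he';

// useNamedReferences emits &lt; and friends; by default he emits
// hexadecimal references such as &#x3C; instead.
he.encode('<script>alert(1)</script>', { useNamedReferences: true });
// "&lt;script&gt;alert(1)&lt;/script&gt;"

he.decode('&lt;b&gt;bold&lt;/b&gt;');
// "<b>bold</b>"
```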
Python
Python’s standard library handles the common cases:
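```python
import html

# Encode
html.escape('<script>alert(1)</script>')
# '&lt;script&gt;alert(1)&lt;/script&gt;'

# quote=True is the default, so " and ' are escaped too (' becomes &#x27;)
html.escape("it's a test")
# 'it&#x27;s a test'

# quote=False leaves quotes alone (only safe outside attribute values)
html.escape("it's a test", quote=False)
# "it's a test"

# Decode
html.unescape('&lt;b&gt;Hello &amp; World&lt;/b&gt;')
# '<b>Hello & World</b>'
```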
PHP
```php
<?php
// Encode (for HTML context)
echo htmlspecialchars('<script>alert(1)</script>', ENT_QUOTES, 'UTF-8');
// &lt;script&gt;alert(1)&lt;/script&gt;

// Encode all applicable characters
echo htmlentities('© 2024', ENT_QUOTES, 'UTF-8');
// &copy; 2024

// Decode
echo html_entity_decode('&lt;b&gt;bold&lt;/b&gt;', ENT_QUOTES, 'UTF-8');
// <b>bold</b>
```
When to Encode vs When Not To
Always encode when inserting untrusted input into HTML — user names, search queries, form values, API data.
Don’t double-encode. If content has already been encoded (stored in the database as `&lt;`), encoding it again produces `&amp;lt;`, which renders as the literal string `&lt;` rather than `<`.
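Python's `html` module makes the effect easy to reproduce. This sketch encodes the same string twice and shows that one decode no longer recovers the original. The usual remedy is to store raw text and escape it once, at output time:

```python
import html

raw = "<b>"
once = html.escape(raw)    # '&lt;b&gt;': what the browser should receive
twice = html.escape(once)  # '&amp;lt;b&amp;gt;': displays as the text &lt;b&gt;

print(once)
print(twice)
# A single decode of the double-encoded string only gets back to '&lt;b&gt;'.
print(html.unescape(twice))
```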
Use UTF-8 instead of entities for non-ASCII text. Serving documents as UTF-8 and storing text as UTF-8 means you can use ©, €, → directly in HTML without entities. Entities for these characters are only necessary in legacy or constrained environments.
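For those legacy cases, Python's built-in `xmlcharrefreplace` error handler is one way to produce ASCII-only output. It replaces every character the target encoding cannot represent with a decimal numeric reference. It does not escape `<`, `>`, or `&`, so it complements escaping rather than replacing it. A minimal sketch:

```python
text = "© 2024 → €5"

# Characters outside ASCII become decimal references such as &#169;.
ascii_only = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ascii_only)  # &#169; 2024 &#8594; &#8364;5

# Reserved characters pass through untouched, so escape them separately.
untouched = "<b>".encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(untouched)  # <b>
```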
Use non-breaking spaces sparingly. `&nbsp;` is often used as a spacing hack in HTML emails or to prevent word wrapping. In modern HTML/CSS, `margin` or `padding` for spacing and `white-space: nowrap` for keeping text on one line are usually more maintainable.
Quick Lookup Tool
If you need to find the entity for a character, or decode an entity you encountered in source code, the fastest way is a dedicated encoder/decoder. Try the ZeroTool HTML Entity tool →
Paste text containing HTML entities to decode them, or type/paste special characters to get their entity equivalents. Useful for:
- Decoding garbled HTML from a CMS export
- Finding the correct entity for a symbol you only have visually
- Verifying that your template escaping is working correctly
Summary
HTML entities are essential for two things: displaying reserved characters without breaking HTML parsing, and preventing XSS by escaping user input. In modern UTF-8 documents, you typically need only `&lt;`, `&gt;`, `&amp;`, `&quot;`, and `&#39;` for escaping; the rest of the character set can be inserted directly.