HTML Entity Reference: Encode & Decode Special Characters

HTML entities are the mechanism HTML uses to represent characters that would otherwise be misinterpreted by the parser — angle brackets that would start a tag, ampersands that would begin another entity, and characters outside ASCII that might not survive encoding changes. Getting entities wrong causes broken layouts, garbled text, and in the worst case, XSS vulnerabilities.

What Is an HTML Entity?

An HTML entity is a string that begins with & and ends with ;. It represents a single character. There are two forms:

Named entities use a descriptive keyword:

&lt;    → <
&gt;    → >
&amp;   → &
&quot;  → "
&nbsp;  → (non-breaking space)

Numeric entities use the Unicode code point, either decimal or hexadecimal:

&#60;   → <   (decimal)
&#x3C;  → <   (hexadecimal)
&#169;  → ©
&#x00A9; → ©

Both forms produce identical output. Named entities are more readable; numeric entities work for any Unicode character including those without named equivalents.
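As a quick check of the claim above, the following sketch uses Python's standard `html` module to decode the named, decimal, and hexadecimal forms and compare the results. The code point U+1F600 is only an illustrative choice of a character that has no named entity.

```python
import html

# Three spellings of the same character reference, all for U+003C (<).
forms = ["&lt;", "&#60;", "&#x3C;"]
decoded = [html.unescape(form) for form in forms]
print(decoded)  # ['<', '<', '<']

# A numeric reference works even where no named entity exists.
# U+1F600 is used here purely as an example code point.
smiley = html.unescape("&#x1F600;")
print(smiley == chr(0x1F600))  # True
```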

Why HTML Entities Matter

Avoid breaking HTML structure

The < and > characters have special meaning in HTML. If you need to display literal angle brackets — for example, showing source code — you must escape them:

<!-- Wrong: the browser treats <strong> and <b> as real tags -->
<p>Using <strong> instead of <b> is preferred.</p>

<!-- Correct -->
<p>Using &lt;strong&gt; instead of &lt;b&gt; is preferred.</p>

Prevent XSS vulnerabilities

Failing to escape user-supplied content before inserting it into HTML is one of the most common web security bugs. If a user enters <script>alert(1)</script> and your code outputs it raw, that script runs in every visitor’s browser.

<!-- Dangerous: user input inserted raw -->
<p>Hello, <%= username %></p>

<!-- Safe: HTML-encoded -->
<p>Hello, <%= htmlEncode(username) %></p>

Always encode these five characters for any user-supplied content inserted into HTML:

Character	Entity
`&`	`&amp;`
`<`	`&lt;`
`>`	`&gt;`
`"`	`&quot;`
`'`	`&#39;`
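A minimal sketch of what escaping these five characters might look like, assuming a hand-rolled helper (`escape_five` is a hypothetical name, and the standard library's `html.escape` is usually the better choice in real code). The main point it illustrates is ordering: the ampersand has to be replaced first, otherwise the ampersands introduced by the other replacements would be escaped again.

```python
# Hypothetical helper for illustration; prefer html.escape in real code.
_ESCAPES = [
    ("&", "&amp;"),   # must come first so later entities are not re-escaped
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#39;"),
]


def escape_five(text: str) -> str:
    """Replace the five HTML-significant characters with entities."""
    for char, entity in _ESCAPES:
        text = text.replace(char, entity)
    return text


# Input that tries to break out of a double-quoted attribute value.
payload = '"><script>alert(1)</script>'
safe = escape_five(payload)
print(f'<input value="{safe}">')
```

Because the double quote is escaped as well, the payload stays inside the attribute value instead of closing it.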

Display characters outside your document’s character set

Before UTF-8 became universal, documents served in ISO-8859-1 (Latin-1) couldn’t represent characters like €, ™, or — directly. Entities were the workaround. Today, serving HTML as UTF-8 is standard, but entities remain useful for characters you can’t easily type or that might be stripped by text processors.
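To make the workaround concrete, here is a small sketch using Python's built-in `xmlcharrefreplace` error handler, which substitutes a decimal numeric entity for any character the target encoding cannot represent. The sample sentence is made up for illustration.

```python
text = "Price: 5 € at the café"

# Latin-1 has a byte for é but not for €, so only € becomes an entity.
latin1_bytes = text.encode("latin-1", errors="xmlcharrefreplace")
print(latin1_bytes)  # b'Price: 5 &#8364; at the caf\xe9'

# Plain ASCII has neither, so both characters become entities.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)   # b'Price: 5 &#8364; at the caf&#233;'
```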

Common HTML Entities Reference

Reserved Characters

Character	Name	Entity	Numeric
`<`	Less-than	`&lt;`	`&#60;`
`>`	Greater-than	`&gt;`	`&#62;`
`&`	Ampersand	`&amp;`	`&#38;`
`"`	Double quote	`&quot;`	`&#34;`
`'`	Single quote	`&apos;`	`&#39;`

Typography

Character	Name	Entity	Use case
	Non-breaking space	`&nbsp;`	Prevent line breaks between words
`—`	Em dash	`&mdash;`	Sentence breaks, parenthetical asides
`–`	En dash	`&ndash;`	Number ranges (2010–2024)
`…`	Ellipsis	`&hellip;`	Truncated text
`“`	Left double quote	`&ldquo;`	Quotations
`”`	Right double quote	`&rdquo;`	Quotations
`’`	Apostrophe	`&rsquo;`	Contractions

Symbols

Character	Name	Entity
`©`	Copyright	`&copy;`
`®`	Registered	`&reg;`
`™`	Trademark	`&trade;`
`€`	Euro	`&euro;`
`£`	Pound	`&pound;`
`¥`	Yen	`&yen;`
`°`	Degree	`&deg;`
`±`	Plus-minus	`&plusmn;`
`×`	Multiplication	`&times;`
`÷`	Division	`&divide;`
`→`	Right arrow	`&rarr;`
`←`	Left arrow	`&larr;`
`↑`	Up arrow	`&uarr;`
`↓`	Down arrow	`&darr;`

Mathematical

Character	Name	Entity
`≤`	Less or equal	`&le;`
`≥`	Greater or equal	`&ge;`
`≠`	Not equal	`&ne;`
`∞`	Infinity	`&infin;`
`∑`	Summation	`&sum;`
`√`	Square root	`&radic;`
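Lookups like the ones in these tables can also be done programmatically. The sketch below relies on the `codepoint2name` and `name2codepoint` mappings in Python's `html.entities` module, which cover the HTML 4 named entities; `entity_for` is a hypothetical helper name, and falling back to a decimal reference is just one reasonable choice.

```python
from html.entities import codepoint2name, name2codepoint


def entity_for(char: str) -> str:
    """Return a named entity if one is known, else a decimal reference."""
    code = ord(char)
    name = codepoint2name.get(code)
    return f"&{name};" if name else f"&#{code};"


print(entity_for("©"))   # &copy;
print(entity_for("≤"))   # &le;
print(entity_for("☃"))   # &#9731; (no HTML 4 named entity for U+2603)

# The reverse direction: entity name to code point.
print(name2codepoint["radic"])  # 8730
```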

Encoding and Decoding in Code

JavaScript

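In the browser, the DOM can do the encoding:

// Encode (escape HTML)
// Serializing a text node escapes &, <, and > but not quotes, so use the
// result in element content, not inside attribute values.
function htmlEncode(str) {
  const div = document.createElement('div');
  div.appendChild(document.createTextNode(str));
  return div.innerHTML;
}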

htmlEncode('<script>alert(1)</script>');
// "&lt;script&gt;alert(1)&lt;/script&gt;"

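// Decode (unescape HTML)
// Parse into a separate, inert document rather than assigning to
// div.innerHTML, where untrusted markup such as <img onerror=...> could run.
function htmlDecode(str) {
  const doc = new DOMParser().parseFromString(str, 'text/html');
  return doc.documentElement.textContent;
}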

htmlDecode('&lt;b&gt;bold&lt;/b&gt;');
// "<b>bold</b>"

In Node.js (no DOM), use a library:

npm install he

import he from 'he';

// he.escape replaces only the HTML-significant characters with named entities
he.escape('<script>alert(1)</script>');
// "&lt;script&gt;alert(1)&lt;/script&gt;"

he.decode('&lt;b&gt;bold&lt;/b&gt;');
// "<b>bold</b>"

Python

Python’s standard library handles the common cases:

import html

# Encode
html.escape('<script>alert(1)</script>')
# '&lt;script&gt;alert(1)&lt;/script&gt;'

# Encode including single quotes
html.escape("it's a test", quote=True)
# 'it&#x27;s a test'

# Decode
html.unescape('&lt;b&gt;Hello &amp; World&lt;/b&gt;')
# '<b>Hello & World</b>'

PHP

// Encode (for HTML context)
htmlspecialchars('<script>alert(1)</script>', ENT_QUOTES, 'UTF-8');
// &lt;script&gt;alert(1)&lt;/script&gt;

// Encode all applicable characters
htmlentities('© 2024', ENT_QUOTES, 'UTF-8');
// &copy; 2024

// Decode
html_entity_decode('&lt;b&gt;bold&lt;/b&gt;', ENT_QUOTES, 'UTF-8');
// <b>bold</b>

When to Encode vs When Not To

Always encode when inserting untrusted input into HTML — user names, search queries, form values, API data.

Don’t double-encode. If content has already been encoded (stored in the database as &lt;), encoding it again produces &amp;lt;, which renders as the literal string &lt; rather than <.
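The effect of double-encoding is easy to reproduce. This sketch uses Python's `html.escape` and `html.unescape`, with `html.unescape` standing in for the single decoding pass a browser performs when rendering.

```python
import html

raw = "<b>"
once = html.escape(raw)     # '&lt;b&gt;'
twice = html.escape(once)   # '&amp;lt;b&amp;gt;'

# One decoding pass, as a browser would apply when rendering:
print(html.unescape(once))   # <b>
print(html.unescape(twice))  # &lt;b&gt;
```

The second line of output is what a visitor would see on the page: entity text instead of the intended characters.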

Use UTF-8 instead of entities for non-ASCII text. Serving documents as UTF-8 and storing text as UTF-8 means you can use ©, €, → directly in HTML without entities. Entities for these characters are only necessary in legacy or constrained environments.

Use non-breaking spaces sparingly. &nbsp; is often used as a spacing hack in HTML emails or to prevent word wrapping. In modern HTML/CSS, white-space: nowrap (to prevent wrapping) or margin and padding (for spacing) is usually more maintainable.

Quick Lookup Tool

If you need to find the entity for a character, or decode an entity you encountered in source code, the fastest way is a dedicated encoder/decoder. Try the ZeroTool HTML Entity tool →

Paste text containing HTML entities to decode them, or type/paste special characters to get their entity equivalents. Useful for:

Decoding garbled HTML from a CMS export
Finding the correct entity for a symbol you only have visually
Verifying that your template escaping is working correctly

Summary

HTML entities are essential for two things: displaying reserved characters without breaking HTML parsing, and preventing XSS by escaping user input. In modern UTF-8 documents, you mostly need only &lt;, &gt;, &amp;, &quot;, and &#39; for escaping; the rest of the character set can be inserted directly.

