Unicode Character Table & Search

Search 600+ Unicode characters by name, codepoint, or category. Click any character to copy it as a glyph, U+XXXX codepoint, HTML entity, or CSS escape.

Worked Examples

1. Inserting a ✓ check mark in a CSS ::before pseudo-element

Use the CSS escape \2713 inside a content declaration. The backslash introduces a hex codepoint; no U+ prefix is used in CSS.

/* CSS */
li.done::before {
  content: '\2713';   /* ✓  CHECK MARK  U+2713 */
  color: green;
  font-family: 'Segoe UI Symbol', Arial, sans-serif;
  margin-right: 0.4em;
}

Always list a symbol-capable fallback font. The escape ends at the first non-hex character or after six hex digits, and a single whitespace character directly after the escape is consumed as a terminator and does not appear in the output. Write two spaces after the escape ('\2713  ') if you need a visible space after the check mark.

2. Using © copyright in HTML (named entity vs numeric)

All four forms below produce identical output in every browser. Prefer the named entity &copy; for readability; use a numeric entity for characters that have no named entity.

<!-- Named entity (readable, HTML5) -->
<p>Copyright &copy; 2024 Plato Tools</p>

<!-- Decimal numeric entity -->
<p>Copyright &#169; 2024 Plato Tools</p>

<!-- Hex numeric entity (U+00A9) -->
<p>Copyright &#xA9; 2024 Plato Tools</p>

<!-- Direct UTF-8 character in file (also fine if saved as UTF-8) -->
<p>Copyright © 2024 Plato Tools</p>

3. Detecting emoji codepoints in JavaScript (handling surrogate pairs)

Most emoji live in the supplementary planes (U+1F000 and above) and occupy two UTF-16 code units (a surrogate pair). Use codePointAt and spread syntax to handle them correctly.

// ❌ Wrong — charCodeAt returns the high surrogate (0xD83D), not the emoji
'😀'.charCodeAt(0); // 55357

// ✓ Correct — codePointAt reads the full surrogate pair
'😀'.codePointAt(0); // 128512  (U+1F600)

// ✓ Iterate all codepoints in a string (handles surrogates)
const codepoints = [...'Hello 😀'].map(ch => ch.codePointAt(0));
// [72, 101, 108, 108, 111, 32, 128512]

// ✓ Convert codepoint back to character
String.fromCodePoint(128512); // '😀'

// ✓ Format as U+XXXX
function toUPlus(cp) {
  return 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');
}
toUPlus(128512); // 'U+1F600'

About Unicode and This Tool

Unicode is the universal character encoding standard that assigns a unique number, called a codepoint, to each character it encodes, with the goal of covering every writing system in use. First published in 1991 and maintained by the Unicode Consortium, it covers more than 150,000 characters across more than 160 scripts (as of Unicode 16.0), along with mathematical notation, musical symbols, emoji, and historical scripts from Linear B to Cuneiform. The full range runs from U+0000 to U+10FFFF, giving 1,114,112 possible codepoints.

Before Unicode, software engineers juggled dozens of incompatible encodings — ASCII for English, ISO-8859-1 for Western European languages, Shift-JIS for Japanese, GB2312 for Simplified Chinese. Mixing them in a single document was nearly impossible without explicit tagging and complex conversion tables. Unicode solved this by providing one authoritative mapping. You can now write Arabic, Hindi, and Korean in the same file and every Unicode-aware application will interpret it identically.

UTF-8 is the dominant encoding of Unicode on the web. It is a variable-width encoding: ASCII characters (U+0000–U+007F) use a single byte, making UTF-8 backward-compatible with ASCII. Characters in the U+0080–U+07FF range use two bytes, the rest of the BMP (U+0800–U+FFFF) uses three bytes, and supplementary characters (emoji, rare historic scripts) use four bytes. Its ASCII compatibility and its compactness for English and ASCII-heavy content have helped make UTF-8 the encoding of over 98% of websites.
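
The byte widths described above can be checked in a few lines. This is a minimal sketch, not code from this tool: utf8Length is a hypothetical helper that applies the range boundaries, and the standard TextEncoder API is used to cross-check it.

```javascript
// Hypothetical helper: number of UTF-8 bytes needed for one codepoint,
// based on the range boundaries of the encoding.
function utf8Length(cp) {
  if (cp <= 0x7F) return 1;     // U+0000 to U+007F (ASCII)
  if (cp <= 0x7FF) return 2;    // U+0080 to U+07FF
  if (cp <= 0xFFFF) return 3;   // U+0800 to U+FFFF
  return 4;                     // U+10000 to U+10FFFF
}

// Cross-check the prediction against the platform's UTF-8 encoder.
const encoder = new TextEncoder();
const utf8Samples = ['A', '\u00E9', '\u20AC', '\u{1F600}'].map(ch => ({
  char: ch,
  predicted: utf8Length(ch.codePointAt(0)),
  actual: encoder.encode(ch).length,
}));
// A: 1 byte, é: 2 bytes, €: 3 bytes, 😀: 4 bytes
```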

Every Unicode character has a codepoint written as U+XXXX where XXXX is a hexadecimal number padded to at least four digits. For example, the euro sign € is U+20AC, the Greek letter π is U+03C0, and the snowman ☃ is U+2603. When you need to include these in HTML you can write them as hex entities (&#x20AC;) or decimal entities (&#8364;). In CSS you drop the U+ prefix and use a backslash: \20AC.
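
As a rough illustration of how these notations relate, the sketch below derives each one from a numeric codepoint. formatsFor is a hypothetical helper written for this example; it is not necessarily how this tool generates its copy formats.

```javascript
// Hypothetical helper: derive the common notations for one codepoint.
function formatsFor(cp) {
  const hex = cp.toString(16).toUpperCase();
  return {
    glyph: String.fromCodePoint(cp),
    uPlus: 'U+' + hex.padStart(4, '0'),  // padded to at least four digits
    htmlHex: '&#x' + hex + ';',
    htmlDec: '&#' + cp + ';',
    css: '\\' + hex,                     // backslash, no U+ prefix
  };
}

const euro = formatsFor(0x20AC);
// euro.uPlus   is 'U+20AC'
// euro.htmlHex is '&#x20AC;'
// euro.htmlDec is '&#8364;'
// euro.css     is a backslash followed by 20AC
```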

This tool lets you browse a curated set of over 600 characters across six categories: Latin and Greek letters, symbols, math operators, arrows, currency signs, and punctuation. Use the search box to filter by character name or codepoint, click any card to see all four copy formats, or use the "Any codepoint" field to look up any Unicode character — including emoji and supplementary characters — directly by its decimal or hex codepoint.

Everything runs in your browser. No characters, search queries, or codepoints are sent to any server. The entire dataset is embedded inline so the tool works fully offline once loaded.

Frequently Asked Questions

What is a Unicode codepoint?
A Unicode codepoint is a unique integer assigned to every character in the Unicode standard. It is written as U+XXXX where XXXX is a hexadecimal number, for example U+0041 is the Latin capital letter A and U+2665 is the black heart suit ♥. There are over 1.1 million possible codepoints (U+0000 to U+10FFFF), of which about 150,000 are currently assigned to characters.
What is the difference between UTF-8, UTF-16, and UTF-32?
UTF-8, UTF-16, and UTF-32 are all encodings of the same Unicode codepoints but use different byte widths. UTF-8 uses 1 to 4 bytes per character and is ASCII-compatible, making it the dominant encoding on the web. UTF-16 uses 2 or 4 bytes; it is used internally by JavaScript, Java, and Windows. UTF-32 uses exactly 4 bytes per character, making it simple but memory-intensive. For web and file storage, UTF-8 is almost always the correct choice.
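
To make the size difference concrete, here is a small sketch that computes how many bytes a string needs in each encoding, ignoring any byte order mark. encodedSizes is a hypothetical helper name chosen for this example.

```javascript
// Hypothetical helper: bytes needed for a string in each encoding (no BOM).
function encodedSizes(str) {
  return {
    utf8: new TextEncoder().encode(str).length,
    utf16: str.length * 2,       // .length counts 16-bit code units
    utf32: [...str].length * 4,  // spreading iterates by codepoint
  };
}

const asciiSizes = encodedSizes('Hello');               // { utf8: 5, utf16: 10, utf32: 20 }
const cjkSizes = encodedSizes('\u65E5\u672C\u8A9E');    // 日本語: { utf8: 9, utf16: 6, utf32: 12 }
const emojiSizes = encodedSizes('\u{1F600}');           // 😀: { utf8: 4, utf16: 4, utf32: 4 }
```

UTF-8 wins clearly for ASCII-heavy text, while UTF-16 is more compact for text made up mostly of BMP characters above U+07FF.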
How do I write a Unicode character as an HTML entity?
You can write any Unicode character as a numeric HTML entity using the codepoint in decimal (&#9829;) or hexadecimal (&#x2665;) form. Both refer to the heart suit ♥ at U+2665. Many common characters also have named entities such as &copy; for © and &amp; for &. Numeric hex entities are the most portable and work for any Unicode character including emoji and rare scripts.
How do I use a Unicode character in CSS content?
In CSS, Unicode escapes use a backslash followed by the hex codepoint without the U+ prefix. For example, to insert a check mark in a ::before pseudo-element write: content: '\2713'. The escape ends at the first non-hex character or after six hex digits, and one whitespace character directly after the escape is consumed as part of it. You do not need to pad to six digits, but padding to four is conventional. Always set font-family to a font that includes the character.
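
Because a CSS escape absorbs following hex digits (up to six in total) and then one whitespace character, generated CSS is safest when every escape is followed by a single terminating space. The sketch below takes that approach; toCssString is a hypothetical helper for illustration, not part of this tool.

```javascript
// Hypothetical helper: build a single-quoted CSS string in which every
// non-ASCII or control codepoint is written as a hex escape plus one
// terminating space. The space is consumed by the CSS parser, so text
// that follows the escape is never absorbed into it.
function toCssString(text) {
  const body = [...text].map(ch => {
    const cp = ch.codePointAt(0);
    if (ch === '\\' || ch === "'") return '\\' + ch;  // escape backslash and quote
    if (cp >= 0x20 && cp < 0x7F) return ch;           // printable ASCII passes through
    return '\\' + cp.toString(16).toUpperCase() + ' ';
  }).join('');
  return "'" + body + "'";
}

const cssTight = toCssString('\u2713abc');   // escape, terminator space, then abc
const cssSpaced = toCssString('\u2713 abc'); // escape, terminator space, real space, then abc
```

Without the terminating space, the first example would be read as the single escape \2713abc, because a, b, and c are all valid hex digits.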
What Unicode characters are safe to use in filenames?
On all major systems, ASCII letters, digits, hyphens, underscores, and periods are safe. Characters to avoid: / and NUL on all systems; additionally \ : * ? " < > | on Windows. Unicode letters and accented characters generally work on modern filesystems (NTFS, APFS, ext4 with UTF-8 locale) but can cause problems when moving files between systems or uploading to servers with ASCII-only configurations.
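
The rules above can be turned into a quick portability check. This is a simplified sketch: filenameIssues is a hypothetical helper, and it covers only the characters listed in this answer, not reserved names or length limits.

```javascript
// Hypothetical helper: list portability concerns for a single filename.
function filenameIssues(name) {
  const issues = [];
  if (/[\/\0]/.test(name)) {
    issues.push('contains / or NUL, which are invalid on all systems');
  }
  if (/[\\:*?"<>|]/.test(name)) {
    issues.push('contains a character that is reserved on Windows');
  }
  if (/[^\x00-\x7F]/.test(name)) {
    issues.push('contains non-ASCII characters, which may break on ASCII-only systems');
  }
  return issues;
}

const safeName = filenameIssues('report-2024_v1.txt');   // no issues
const windowsName = filenameIssues('notes: draft?.txt'); // one issue (Windows-reserved)
const accentedName = filenameIssues('r\u00E9sum\u00E9.pdf'); // one issue (non-ASCII)
```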
What is the difference between the BMP and supplementary characters?
The Basic Multilingual Plane (BMP) covers codepoints U+0000 to U+FFFF and includes virtually all characters in modern use. Supplementary characters occupy the 16 planes beyond the BMP (U+10000 to U+10FFFF) and include emoji, historic scripts, and rare CJK extensions. In UTF-16, supplementary characters require two 16-bit code units called a surrogate pair. In JavaScript, strings are UTF-16, so supplementary characters have a .length of 2 and require String.prototype.codePointAt() instead of .charCodeAt().
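
The surrogate pair arithmetic is short enough to write out by hand. The two helpers below are hypothetical names used for illustration; in practice codePointAt and String.fromCodePoint do this for you.

```javascript
// Hypothetical helper: split a supplementary codepoint into UTF-16 surrogates.
function toSurrogatePair(cp) {
  const offset = cp - 0x10000;            // leaves a 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

// Hypothetical helper: recombine a surrogate pair into one codepoint.
function fromSurrogatePair(high, low) {
  return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
}

const pair = toSurrogatePair(0x1F600);              // [0xD83D, 0xDE00], i.e. [55357, 56832]
const roundTrip = fromSurrogatePair(pair[0], pair[1]); // 128512 (U+1F600)
```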
What are Unicode combining characters?
Combining characters are codepoints that modify the preceding base character rather than standing alone. For example, U+0301 (COMBINING ACUTE ACCENT) placed after the letter e produces é. A sequence of a base character plus one or more combining characters is called a grapheme cluster and is perceived as a single character. This is why string length in code can differ from visual character count: JavaScript's .length counts UTF-16 code units, not grapheme clusters. The Intl.Segmenter API can be used to count visible characters correctly.
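
The sketch below shows the difference between code units and grapheme clusters, and how normalization makes the two spellings of é compare equal. graphemeCount is a hypothetical helper; it assumes an environment where Intl.Segmenter is available.

```javascript
const composed = '\u00E9';     // é as a single precomposed codepoint
const decomposed = 'e\u0301';  // e followed by COMBINING ACUTE ACCENT

const rawEqual = composed === decomposed;                          // false
const normalizedEqual = composed === decomposed.normalize('NFC');  // true
const codeUnits = decomposed.length;                               // 2

// Hypothetical helper: count user-perceived characters.
function graphemeCount(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return [...segmenter.segment(str)].length;
}

const accentCount = graphemeCount(decomposed);            // 1
const thumbsUp = '\u{1F44D}\u{1F3FD}';                    // 👍 plus a skin tone modifier
const emojiCount = graphemeCount(thumbsUp);               // 1, although thumbsUp.length is 4
```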
What are Unicode homoglyphs and why are they a security risk?
Homoglyphs are characters from different Unicode blocks that look visually identical or nearly identical. For example, the Cyrillic а (U+0430) looks the same as the Latin a (U+0061). Attackers exploit homoglyphs in IDN homograph attacks to register domain names that look identical to legitimate ones. They are also used in source-code attacks to embed invisible or visually identical identifiers. Browsers mitigate IDN attacks by showing punycode for mixed-script domains.
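
A simple way to flag suspicious labels is to check which scripts they mix. The sketch below uses Unicode property escapes in regular expressions; scriptsIn is a hypothetical helper that checks only three scripts, which is far simpler than the rules browsers actually apply.

```javascript
// Hypothetical helper: report which of a few scripts appear in a label.
function scriptsIn(label) {
  const scripts = {
    Latin: /\p{Script=Latin}/u,
    Cyrillic: /\p{Script=Cyrillic}/u,
    Greek: /\p{Script=Greek}/u,
  };
  return Object.keys(scripts).filter(name => scripts[name].test(label));
}

const genuine = scriptsIn('paypal');       // ['Latin']
const spoofed = scriptsIn('p\u0430ypal');  // ['Latin', 'Cyrillic'] (second letter is U+0430)
const isMixedScript = spoofed.length > 1;  // true
```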
How do I get the codepoint of a character in JavaScript?
Use str.codePointAt(0), which works for every character: it handles surrogate pairs and returns the full codepoint. str.charCodeAt(0) returns a single UTF-16 code unit, which equals the codepoint only for BMP characters. To convert back, use String.fromCodePoint(cp). To iterate all codepoints in a string, spread it: [...str].map(ch => ch.codePointAt(0)). Spreading uses the string iterator, which steps by codepoint and keeps each surrogate pair together.
What is the Unicode replacement character U+FFFD?
U+FFFD (REPLACEMENT CHARACTER, displayed as a diamond with a question mark) is substituted by parsers when they encounter a byte sequence that cannot be decoded in the expected encoding. It signals a text encoding error — typically a file being read with the wrong encoding, a corrupted byte, or an invalid UTF-8 sequence. If you see replacement characters in your output, check that your file is saved as UTF-8 and that you are reading it with a UTF-8 decoder.
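
The behavior is easy to reproduce with the standard TextDecoder API. In this sketch the bytes are "café" saved as ISO-8859-1, where é is the single byte 0xE9, which is not valid on its own in UTF-8. The variable names are illustrative only.

```javascript
// "café" encoded as ISO-8859-1: the final byte 0xE9 is not valid UTF-8 here.
const bytes = new Uint8Array([0x63, 0x61, 0x66, 0xE9]);

// Default (lenient) decoding substitutes U+FFFD for the undecodable byte.
const lenient = new TextDecoder('utf-8').decode(bytes);
const hasReplacement = lenient.includes('\uFFFD');  // true

// With fatal: true the decoder throws instead of substituting.
let strictError = null;
try {
  new TextDecoder('utf-8', { fatal: true }).decode(bytes);
} catch (err) {
  strictError = err;  // a TypeError
}
```

Strict decoding is useful when silently corrupted text would be worse than a visible failure, for example when importing user files.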