Unicode Encoder / Decoder

Encode or decode Unicode escape sequences easily. Convert between readable text and Unicode representations including \u escapes and code points.

U+XXXX Codepoint Format

Every character is converted to its Unicode codepoint notation. All characters, including ASCII, are shown as U+XXXX. Example: 'café' → 'U+0063 U+0061 U+0066 U+00E9'
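A minimal sketch of the U+XXXX format in JavaScript, assuming the output is one notation per code point joined by spaces. The function name `toCodePointNotation` is hypothetical, not the tool's actual code. Iterating with `for...of` walks the string by code point, so an emoji yields one entry such as U+1F600 rather than two surrogate halves.

```javascript
// Hypothetical helper: render every code point as U+XXXX, space-separated.
// Hex is upper-case and padded to at least four digits, so supplementary
// characters such as 😀 come out with five digits (U+1F600).
function toCodePointNotation(text) {
  const parts = [];
  for (const ch of text) { // for...of iterates by code point, not UTF-16 unit
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    parts.push("U+" + hex);
  }
  return parts.join(" ");
}

console.log(toCodePointNotation("café")); // U+0063 U+0061 U+0066 U+00E9
```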

About Unicode Encoder / Decoder

Unicode encoding converts text characters into Unicode escape sequences (like \u0041 for 'A') or code point representations, allowing any character from any language to be represented in an ASCII-safe format. This tool handles bidirectional conversion between readable text and various Unicode representations, supporting international characters, symbols, and emojis.

Why use a Unicode Encoder / Decoder?

Unicode encoding is essential for internationalization, ensuring that text with non-ASCII characters can be safely stored and transmitted across systems that may not support the full Unicode character set. It's crucial for web development, database storage, JSON data exchange, and any application that needs to handle multilingual content or special characters reliably.

Who is it for?

International web developers, software engineers working with multilingual applications, data migration specialists, and developers building systems that handle diverse character sets will find this tool invaluable. It's particularly useful for those working with JSON APIs, database imports/exports, or legacy systems that require Unicode escape sequences.

How to use the tool

Enter your text containing international characters, symbols, or emojis in the input field

Choose 'Encode' to convert characters to Unicode escape sequences, or 'Decode' to convert Unicode sequences back to readable text

Select your preferred Unicode format (\u escapes, code points, etc.) if options are available

Click the conversion button to process your text

Copy the encoded or decoded result for use in your code, database, or application

Frequently Asked Questions

How do I encode/decode Unicode escape sequences?

Paste text and the tool outputs its Unicode escape sequences (`\uXXXX` for BMP characters, `\u{XXXXX}` for supplementary characters). To decode, paste `\u`-escaped strings and get the original text back. Useful for embedding Unicode in source code (JavaScript, JSON, Python), debugging encoding issues, and identifying specific characters by code point. Note that `\u{...}` is JavaScript (ES6) syntax: JSON writes supplementary characters as a surrogate pair such as `\ud83d\ude00`, and Python uses `\U0001F600`. Everything runs in your browser; your input never leaves the device.
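The encode/decode round trip can be sketched in JavaScript as below. This is an illustration under stated assumptions, not the tool's actual code: the names `encodeEscapes` and `decodeEscapes` are hypothetical, printable ASCII is left as-is, and the backslash itself is escaped (`\u005c`) so that decoding the output always restores the original input.

```javascript
// Hypothetical encoder: printable ASCII (except the backslash) passes through;
// everything else becomes \uXXXX (BMP) or \u{X...} (supplementary planes).
function encodeEscapes(text) {
  let out = "";
  for (const ch of text) { // iterates by code point
    const cp = ch.codePointAt(0);
    if (cp >= 0x20 && cp <= 0x7e && cp !== 0x5c) {
      out += ch;
    } else if (cp <= 0xffff) {
      out += "\\u" + cp.toString(16).padStart(4, "0");
    } else {
      out += "\\u{" + cp.toString(16) + "}";
    }
  }
  return out;
}

// Hypothetical decoder: accepts both \uXXXX and \u{X...}. Two \uXXXX escapes
// forming a surrogate pair recombine on their own, because JavaScript strings
// are sequences of UTF-16 code units. String.fromCodePoint throws a RangeError
// for values above 0x10FFFF.
function decodeEscapes(escaped) {
  return escaped.replace(
    /\\u\{([0-9a-fA-F]{1,6})\}|\\u([0-9a-fA-F]{4})/g,
    (match, braced, plain) =>
      braced !== undefined
        ? String.fromCodePoint(parseInt(braced, 16))
        : String.fromCharCode(parseInt(plain, 16))
  );
}

console.log(encodeEscapes("café 😀")); // caf\u00e9 \u{1f600}
console.log(decodeEscapes("caf\\u00e9 \\uD83D\\uDE00")); // café 😀
```

Escaping the backslash is a design choice: without it, text that already contains a literal `\u0041` would be turned into 'A' on decoding.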

What is Unicode and what are code points?

Unicode is the universal character standard, defining a unique code point (a number) for every character across all scripts. Unicode 15.0 defines about 149,000 characters. Code points are written as U+XXXX (e.g., U+0041 = 'A', U+1F600 = 😀). UTF-8, UTF-16, and UTF-32 are different encodings of the same code points into bytes; the modern web and JSON use UTF-8 by default. In source code (JavaScript, JSON), the `\u` escape writes one UTF-16 code unit, which for characters in the Basic Multilingual Plane is simply the code point: `\u0041` = 'A'.
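To see that UTF-8, UTF-16, and UTF-32 are different byte encodings of the same code points, the sketch below counts the bytes each form would need for a given string. The helper name `encodedSizes` is hypothetical; it assumes well-formed text (no lone surrogates), ignores any byte-order mark, and relies on `TextEncoder`, which is available in browsers and in Node.js.

```javascript
// Hypothetical helper: bytes needed to store `text` in each encoding form,
// without a byte-order mark. Assumes well-formed text: TextEncoder would
// replace a lone surrogate with U+FFFD, changing the UTF-8 count.
function encodedSizes(text) {
  return {
    utf8: new TextEncoder().encode(text).length, // 1 to 4 bytes per code point
    utf16: text.length * 2, // .length counts 16-bit UTF-16 code units
    utf32: [...text].length * 4, // spread splits by code point; 4 bytes each
  };
}

console.log("\u0041" === "A"); // true: the escape and the literal are the same string
console.log(encodedSizes("A😀")); // { utf8: 5, utf16: 6, utf32: 8 }
```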

Is my data sent to a server when I encode?

No — encoding/decoding runs entirely in your browser via JavaScript. Your input never reaches a server, never gets logged. Verify in DevTools' Network tab: zero HTTP requests during encoding. Safe for processing sensitive text, debugging mojibake (garbled text), or inspecting what Unicode characters appear in a string.

What's the difference between codepoints and UTF-8 bytes?

Code points are abstract character numbers (U+0041 = A). UTF-8 stores each code point as a sequence of 1 to 4 bytes. Examples: 'A' is U+0041, 1 byte in UTF-8 (0x41). 'é' is U+00E9, 2 bytes (0xC3 0xA9). '中' is U+4E2D, 3 bytes (0xE4 0xB8 0xAD). '😀' is U+1F600, 4 bytes (0xF0 0x9F 0x98 0x80). Source code escapes (`\u`) are written in terms of code points; binary protocols and files carry the UTF-8 bytes.
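The byte values above follow directly from UTF-8's bit layout. The sketch below packs one code point into its 1 to 4 bytes; the helper names `utf8Bytes` and `toHex` are hypothetical, and in real code the built-in `TextEncoder` does this job. It assumes a valid Unicode scalar value (0 to 0x10FFFF, excluding the surrogate range D800 to DFFF) and performs no validation.

```javascript
// Hypothetical helper: UTF-8 bytes for a single code point.
// The lead byte encodes the sequence length; continuation bytes are 10xxxxxx.
function utf8Bytes(cp) {
  if (cp < 0x80) return [cp]; // 0xxxxxxx
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 110xxxxx
  if (cp < 0x10000) {
    // 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18), // 11110xxx
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

const toHex = (bytes) =>
  bytes.map((b) => "0x" + b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

for (const ch of ["A", "é", "中", "😀"]) {
  console.log(ch, toHex(utf8Bytes(ch.codePointAt(0))));
}
// A 0x41
// é 0xC3 0xA9
// 中 0xE4 0xB8 0xAD
// 😀 0xF0 0x9F 0x98 0x80
```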

What are surrogate pairs?

UTF-16 represents code points above U+FFFF (the edge of the Basic Multilingual Plane, or BMP) as two 16-bit code units called surrogates: a high surrogate (D800-DBFF) followed by a low surrogate (DC00-DFFF). Together they encode code points in the supplementary planes (most emoji, historic and rare scripts, mathematical alphanumeric symbols). JavaScript's classic `\uXXXX` escape holds only one 16-bit value, so for code points above FFFF you need either a surrogate pair (`\uD83D\uDE00` for 😀) or the ES6 `\u{1F600}` syntax. This affects string length: 'A' has length 1, while '😀' has length 2 in JavaScript (`.length` counts UTF-16 code units) but is a single code point.
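The surrogate arithmetic is short enough to show directly: subtract 0x10000 to get a 20-bit value, then put its top 10 bits in the high surrogate and its bottom 10 bits in the low surrogate. A sketch with hypothetical function names `toSurrogatePair` and `fromSurrogatePair`:

```javascript
// Hypothetical helpers: split a supplementary code point into a UTF-16
// surrogate pair, and join a pair back into a code point.
function toSurrogatePair(cp) {
  if (cp < 0x10000 || cp > 0x10ffff) {
    throw new RangeError("not a supplementary code point: " + cp.toString(16));
  }
  const v = cp - 0x10000; // 20-bit value
  return [0xd800 + (v >> 10), 0xdc00 + (v & 0x3ff)]; // [high, low]
}

function fromSurrogatePair(high, low) {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

const [hi, lo] = toSurrogatePair(0x1f600);
console.log(hi.toString(16), lo.toString(16)); // d83d de00
console.log("\uD83D\uDE00" === "\u{1F600}"); // true
console.log("😀".length, [..."😀"].length); // 2 1
```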

Why does my JSON have escaped Unicode like \u00e9?

JSON allows two representations for non-ASCII characters: the raw characters, encoded as UTF-8 (the modern recommendation), or `\uXXXX` escape sequences. Some encoders escape all non-ASCII by default to guarantee ASCII-safe output: Python's `json.dumps` does so unless you pass `ensure_ascii=False`, and some Java libraries behave similarly. JavaScript's `JSON.stringify` (in browsers and Node.js) emits the raw characters. Both forms are valid JSON and decode to the same string. For human-readable JSON, prefer raw UTF-8; for maximum compatibility with old systems, prefer escape sequences.
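Because `JSON.stringify` emits raw characters, ASCII-only JSON in JavaScript takes a post-processing step. The sketch below (hypothetical name `stringifyAscii`) roughly mirrors the output style of Python's default `ensure_ascii=True`; it is an illustration, not a drop-in replacement for any library. JSON has no `\u{...}` syntax, so a character such as 😀 comes out as two `\uXXXX` escapes.

```javascript
// Hypothetical helper: JSON.stringify, then escape every non-ASCII UTF-16
// code unit as \uXXXX. Without the /u flag the regex matches code units, so a
// supplementary character is escaped as its two surrogate halves, which is
// how JSON represents it. JSON syntax itself is ASCII, so only string
// contents are affected.
function stringifyAscii(value) {
  return JSON.stringify(value).replace(
    /[\u0080-\uffff]/g,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const data = { name: "café", mood: "😀" };
console.log(JSON.stringify(data)); // {"name":"café","mood":"😀"}
console.log(stringifyAscii(data)); // {"name":"caf\u00e9","mood":"\ud83d\ude00"}
console.log(JSON.parse(stringifyAscii(data)).name === data.name); // true
```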

What is Unicode normalization (NFC, NFD)?

Some characters can be represented in more than one way. 'é' can be U+00E9 (a single precomposed character) or U+0065 + U+0301 (e followed by a combining acute accent, two code points). The two look identical but are different code point sequences, so a plain string comparison reports them as unequal. Unicode normalization standardizes the form: NFC composes (combines into precomposed characters where possible); NFD decomposes (splits into base characters plus combining marks). For text-equality checks, normalize both strings to NFC first. JavaScript: `'café'.normalize('NFC')`. This matters for URL and filename consistency, since macOS has historically stored filenames in decomposed form (NFD, on HFS+), while most other systems produce NFC.
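A small sketch of the comparison problem and the NFC fix, using the built-in `String.prototype.normalize`. The helper name `sameText` is hypothetical; the escapes make the two spellings of 'é' explicit.

```javascript
const composed = "caf\u00e9"; // é as one precomposed code point
const decomposed = "cafe\u0301"; // e + U+0301 COMBINING ACUTE ACCENT

// Hypothetical helper: compare strings after normalizing both to NFC.
const sameText = (a, b) => a.normalize("NFC") === b.normalize("NFC");

console.log(composed === decomposed); // false
console.log(sameText(composed, decomposed)); // true
console.log(composed.length, decomposed.length); // 4 5
console.log(composed.normalize("NFD") === decomposed); // true
```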

When should I use \u escape sequences in code?

Three common cases. (1) Source code with non-ASCII strings, when you want to keep the source ASCII-only (for compatibility with legacy editors or build tools): use `\u00e9` instead of 'é'. (2) Invisible or ambiguous characters: zero-width space (`\u200B`), non-breaking space (`\u00A0`), and other Unicode whitespace; escape them so readers can see they are there. (3) Documentation that identifies specific characters by code point. In ordinary modern code with UTF-8 source files, write the characters directly; escape sequences hurt readability, and modern editors handle UTF-8 source reliably.
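For case (2), the sketch below (hypothetical name `revealInvisible`) replaces a few commonly troublesome invisible characters with their `\u` escapes so they stand out in logs or code review. The character set is an illustrative assumption, not an exhaustive list of invisible or look-alike characters.

```javascript
// Illustrative, not exhaustive: no-break space, zero-width space, zero-width
// non-joiner, zero-width joiner, word joiner, and U+FEFF (byte-order mark).
const INVISIBLE = /[\u00a0\u200b\u200c\u200d\u2060\ufeff]/g;

// Hypothetical helper: show each matched character as its \uXXXX escape.
function revealInvisible(text) {
  return text.replace(
    INVISIBLE,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(revealInvisible("price:\u00a0100\u200b€")); // price:\u00A0100\u200B€
```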
