The Complete Technical Guide to URL Percent-Encoding
Everything developers, marketers, and webmasters need to know about how URLs handle special characters.
Percent-encoding (also called URL encoding) is a mechanism defined in the RFC 3986 internet standard that allows arbitrary data to be represented safely inside a Uniform Resource Locator (URL). A URL can only legally contain a limited set of ASCII characters - the letters A-Z and a-z, the digits 0-9, four always-safe punctuation marks (hyphen, underscore, period, and tilde), and a set of reserved characters that act as structural delimiters. Any character outside that set, and any reserved character used as plain data, must be converted before it can travel across the internet without corruption.
The conversion works by replacing each unsafe byte with a percent sign (%) followed by exactly two hexadecimal digits that represent the byte value in ASCII or UTF-8. For example, a space character has the decimal ASCII value 32, which is 20 in hexadecimal - so a space becomes %20. An exclamation mark (!) is ASCII 33, which is 21 in hexadecimal - so it becomes %21. This is why you often see strings like hello%20world in the address bar of your browser.
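As a rough illustration of that arithmetic, the sketch below formats a single byte as a %XX triplet. The helper name percentEncodeByte is hypothetical and exists only for this example; it is not a built-in function.

```javascript
// Hypothetical helper (not a built-in): format one byte (0-255) as "%XX".
function percentEncodeByte(byte) {
  if (!Number.isInteger(byte) || byte < 0 || byte > 255) {
    throw new RangeError("expected an integer between 0 and 255");
  }
  // toString(16) gives lowercase hex without padding, so normalize it
  // to two uppercase digits.
  return "%" + byte.toString(16).toUpperCase().padStart(2, "0");
}

const space = percentEncodeByte(" ".charCodeAt(0));    // decimal 32 -> "%20"
const bang = percentEncodeByte("!".charCodeAt(0));     // decimal 33 -> "%21"
const newline = percentEncodeByte("\n".charCodeAt(0)); // decimal 10 -> "%0A"

console.log(space, bang, newline);
```

The padding step matters for byte values below 16, which would otherwise produce a single hex digit.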
Without percent-encoding, a URL containing a space would confuse servers and browsers because a space is also used as a delimiter in HTTP requests. Similarly, characters like #, &, and ? have special structural meaning inside a URL - using them literally inside a value would break the URL's intended structure. Percent-encoding allows those characters to appear safely as data rather than as structural signals.
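The effect is easy to reproduce with the standard URL class. In this sketch the host example.com, the parameter name q, and the sample value are all placeholders chosen for illustration.

```javascript
// Placeholder value containing two structural characters: "&" and "#".
const rawValue = "rock&roll#1";

// Unencoded: "&" starts a new parameter and "#" starts the fragment,
// so the value is split into three unrelated pieces.
const broken = new URL("https://example.com/search?q=" + rawValue);
const brokenQ = broken.searchParams.get("q");      // "rock"
const brokenExtra = broken.searchParams.has("roll"); // true
const brokenHash = broken.hash;                    // "#1"

// Encoded: the same characters travel as data and survive a round trip.
const safe = new URL(
  "https://example.com/search?q=" + encodeURIComponent(rawValue)
);
const safeQ = safe.searchParams.get("q"); // "rock&roll#1"
const safeHash = safe.hash;               // ""

console.log({ brokenQ, brokenExtra, brokenHash, safeQ, safeHash });
```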
RFC 3986 divides URL characters into two categories: unreserved characters and reserved characters. Understanding this distinction is fundamental to encoding URLs correctly.
Unreserved characters are the safe characters that can appear anywhere in a URL without any encoding, because they have no special structural meaning. These are: uppercase and lowercase letters (A-Z, a-z), digits (0-9), and four punctuation marks: hyphen (-), underscore (_), period (.), and tilde (~). These 66 characters are always safe to use as-is.
Reserved characters are characters that have a specific structural purpose inside a URL. The main reserved characters are: :, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, and =. Each of these plays a role in separating the different parts of a URL - for example, ? begins the query string, & separates multiple query parameters, and = joins a parameter name to its value. If you need to use one of these characters as literal data (rather than as a structural delimiter), it must be percent-encoded so the server does not misinterpret it.
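The two categories can be expressed as a small classifier. This is a minimal sketch: the function name classify and the label "other" (for characters in neither set, such as a space or a double quote) are assumptions made for this example.

```javascript
// Unreserved set: letters, digits, hyphen, period, underscore, tilde.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
// Reserved set as listed above: 7 general delimiters + 11 sub-delimiters.
const RESERVED = ":/?#[]@!$&'()*+,;=";

// Hypothetical helper: classify exactly one character.
function classify(ch) {
  if (typeof ch !== "string" || ch.length !== 1) {
    throw new TypeError("expected a single-character string");
  }
  if (UNRESERVED.test(ch)) return "unreserved";
  if (RESERVED.includes(ch)) return "reserved";
  return "other"; // never valid unencoded, e.g. space, ", {, }
}

// Count the unreserved characters in the ASCII range.
let unreservedCount = 0;
for (let code = 0; code < 128; code++) {
  if (classify(String.fromCharCode(code)) === "unreserved") unreservedCount++;
}

console.log(unreservedCount);   // 66
console.log(RESERVED.length);   // 18
console.log(classify("~"), classify("?"), classify(" "));
```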
JavaScript provides two built-in encoding functions, and choosing the wrong one is among the most common URL-related bugs in web development. The key question to ask yourself is: "Am I encoding a complete URL, or am I encoding a value that will be placed inside a URL?"
encodeURI() is designed for encoding a complete, already-structured URL. It intentionally leaves the structural reserved characters untouched - so characters like :, /, ?, &, =, and # pass through unencoded. This is correct when you want to send the URL across the wire without breaking its structure. For example, encodeURI("https://example.com/search?q=hello world") produces https://example.com/search?q=hello%20world - the space is encoded but the URL structure is preserved.
encodeURIComponent() is more aggressive. It encodes every character except the unreserved set and five marks it leaves alone for historical reasons: !, ', (, ), and *. That means the structural delimiters :, /, ?, #, &, =, +, and @ are all encoded. This makes it ideal for encoding the individual value portion of a query parameter. For example, if you have a redirect URL as a query parameter, like ?redirect=https://other.com/path, the value https://other.com/path contains slashes and colons that would break the outer URL if left unencoded. Using encodeURIComponent("https://other.com/path") produces https%3A%2F%2Fother.com%2Fpath - fully safe to embed as a parameter value.
The practical rule: use encodeURIComponent() for any individual query string value, form field value, or data fragment. Use encodeURI() only when you need to sanitize a complete URL that already has its structure intact.
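The difference shows up as soon as the embedded value has a query string of its own. The sketch below is illustrative only: the hosts, the /login path, and the redirect parameter name are placeholders.

```javascript
// Placeholder nested URL that carries its own query string.
const target = "https://other.com/path?a=1&b=2";

// encodeURI: fine for a complete URL, delimiters are left intact.
const wholeUrl = encodeURI("https://example.com/search?q=hello world");
// "https://example.com/search?q=hello%20world"

// encodeURIComponent: delimiters inside the value are encoded.
const component = encodeURIComponent(target);
// "https%3A%2F%2Fother.com%2Fpath%3Fa%3D1%26b%3D2"

// Correct: the nested URL round-trips as a single parameter value.
const good = new URL("https://example.com/login?redirect=" + component);
const goodRedirect = good.searchParams.get("redirect"); // same as target

// Wrong tool: encodeURI leaves "&" alone, so "b=2" leaks into the
// outer URL as a separate parameter.
const leaky = new URL("https://example.com/login?redirect=" + encodeURI(target));
const leakyRedirect = leaky.searchParams.get("redirect"); // "https://other.com/path?a=1"
const leakedB = leaky.searchParams.get("b");              // "2"

console.log({ wholeUrl, component, goodRedirect, leakyRedirect, leakedB });
```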
Spaces are among the most commonly mishandled characters in URLs. A space is not a valid URL character and must always be encoded before a URL is transmitted. However, there are actually two different ways spaces can appear in encoded URLs, and mixing them up can cause subtle bugs.
The modern, standards-compliant encoding of a space is %20. This is what both encodeURI() and encodeURIComponent() produce, and it is correct in all parts of a URL - in the path segment, in query parameter names, and in query parameter values.
You may also encounter the + character being used to represent a space. This originates from the application/x-www-form-urlencoded format, which HTML forms use by default when submitting data, both in GET query strings and in POST request bodies. In this format, spaces in form fields are encoded as + rather than %20. While many servers and frameworks accept both interchangeably in query strings, a literal + in a URL is technically a plus sign - not a space. If you paste a URL containing + into this tool's decode mode, a + will remain as a + in the output because decodeURIComponent() only decodes %XX sequences and knows nothing about the form convention. Always prefer %20 in URLs you construct programmatically to avoid ambiguity, and encode a literal plus sign as %2B.
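The two conventions can be compared side by side. In this sketch, decodeFormValue is a hypothetical helper written for the example; URLSearchParams is the standard API that implements the form-encoding rules.

```javascript
// decodeURIComponent only understands %XX sequences, so "+" stays a plus.
const strict = decodeURIComponent("hello+world%20again");
// "hello+world again"

// URLSearchParams follows the form convention and treats "+" as a space.
const formStyle = new URLSearchParams("q=hello+world%20again").get("q");
// "hello world again"

// Serializing with URLSearchParams writes spaces as "+" and a literal
// plus sign as "%2B".
const serialized = new URLSearchParams({ q: "1 + 1" }).toString();
// "q=1+%2B+1"

// encodeURIComponent writes the same value unambiguously.
const unambiguous = encodeURIComponent("1 + 1");
// "1%20%2B%201"

// Hypothetical helper: decode a form-encoded value by mapping "+" to a
// space before percent-decoding. An encoded "%2B" still decodes to "+".
function decodeFormValue(value) {
  return decodeURIComponent(value.replace(/\+/g, "%20"));
}

console.log(strict, "|", formStyle, "|", serialized, "|", unambiguous);
console.log(decodeFormValue("1+%2B+1")); // "1 + 1"
```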
The percent-encoding specification works at the level of individual bytes. For characters in the standard ASCII range (0-127), each character maps directly to one byte, so one character becomes one %XX sequence. But the modern web uses UTF-8 to represent a far larger universe of characters - accented letters (like é), non-Latin scripts (Arabic, Chinese, Japanese, Korean, Hindi), and even emoji.
UTF-8 is a variable-width character encoding: common ASCII characters use one byte, but characters outside the ASCII range use two, three, or four bytes. When a multi-byte character gets percent-encoded, each byte is encoded separately, producing a chain of %XX sequences. For example, the letter e with an acute accent (é) has the UTF-8 byte sequence 0xC3 0xA9, so it percent-encodes to %C3%A9. An emoji like the rocket (🚀, U+1F680) requires four bytes in UTF-8, 0xF0 0x9F 0x9A 0x80, so it percent-encodes to %F0%9F%9A%80.
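To make the byte-by-byte step visible, the sketch below encodes text to UTF-8 with the standard TextEncoder and formats each byte by hand. The helper utf8PercentEncode is hypothetical and, unlike encodeURIComponent(), it encodes every byte, including unreserved ASCII characters.

```javascript
// Hypothetical helper: percent-encode EVERY UTF-8 byte of the input.
// (encodeURIComponent would leave unreserved ASCII characters as-is.)
function utf8PercentEncode(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

const eAcute = "\u00E9";   // e with an acute accent, 2 bytes in UTF-8
const rocket = "\u{1F680}"; // rocket emoji, 4 bytes in UTF-8

const eAcuteManual = utf8PercentEncode(eAcute); // "%C3%A9"
const rocketManual = utf8PercentEncode(rocket); // "%F0%9F%9A%80"

// The built-in function produces the same sequences for non-ASCII input.
const eAcuteBuiltin = encodeURIComponent(eAcute);
const rocketBuiltin = encodeURIComponent(rocket);

console.log(eAcuteManual, rocketManual);
console.log(eAcuteManual === eAcuteBuiltin, rocketManual === rocketBuiltin);
```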
This tool uses JavaScript's native encodeURIComponent() and decodeURIComponent(), both of which handle UTF-8 multi-byte sequences automatically and correctly. You can safely paste text in any language - Japanese, Arabic, emoji, or mathematical symbols - and the tool will produce the correct, standards-compliant percent-encoded output. Modern web servers and browsers all understand UTF-8 percent-encoding, making it the universal standard for internationalized URLs.
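One practical caveat when decoding: decodeURIComponent() throws a URIError when the input contains a stray % or an incomplete UTF-8 sequence. The wrapper below is a minimal sketch of one way to handle that; the name safeDecode and its return shape are assumptions for this example, not a description of how the tool itself is implemented.

```javascript
// Hypothetical wrapper: return the decoded text, or report failure and
// hand back the original input when it is not valid percent-encoding.
function safeDecode(input) {
  try {
    return { ok: true, value: decodeURIComponent(input) };
  } catch (err) {
    if (err instanceof URIError) {
      return { ok: false, value: input };
    }
    throw err; // anything else is unexpected, so do not swallow it
  }
}

const valid = safeDecode("caf%C3%A9");  // ok: true, value: "caf\u00E9"
const strayPercent = safeDecode("100%"); // ok: false ("%" without two hex digits)
const truncated = safeDecode("%C3");     // ok: false (incomplete UTF-8 sequence)

console.log(valid, strayPercent, truncated);
```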
Common Character Percent-Encoding Reference
| Character | Name | Percent-Encoded | URL Role / Notes |
|---|---|---|---|
| Space | Space | %20 | Most common encoding need. Always encode spaces in URLs. |
| ! | Exclamation Mark | %21 | Sub-delimiter. Left unencoded by encodeURIComponent; safe in most contexts. |
| " | Double Quote | %22 | Not valid in URLs. Must always be encoded. |
| # | Hash / Pound | %23 | Marks the fragment identifier. Encode when used as data. |
| $ | Dollar Sign | %24 | Reserved sub-delimiter. Encode when used as data in a value. |
| % | Percent Sign | %25 | The encoding prefix itself. Must be encoded as %25 when used literally. |
| & | Ampersand | %26 | Separates query parameters. Must be encoded inside a parameter value. |
| ' | Single Quote | %27 | Sub-delimiter. Left unencoded by encodeURIComponent. Encoding it is not a substitute for SQL/HTML injection defenses. |
| ( | Open Parenthesis | %28 | Sub-delimiter. Left unencoded by encodeURIComponent; encode manually if a strict consumer requires it. |
| ) | Close Parenthesis | %29 | Sub-delimiter. Left unencoded by encodeURIComponent; encode manually if a strict consumer requires it. |
| + | Plus Sign | %2B | Ambiguous: means "space" in form encoding. Encode to avoid confusion. |
| , | Comma | %2C | Sub-delimiter. Often used to separate list values in query strings. |
| / | Forward Slash | %2F | Path separator. Encode with encodeURIComponent inside a query value. |
| : | Colon | %3A | Separates scheme from host. Encode inside a parameter value. |
| ; | Semicolon | %3B | Sub-delimiter. Encode in query string values. |
| = | Equals Sign | %3D | Joins key to value in query string. Must be encoded inside a value. |
| ? | Question Mark | %3F | Begins the query string. Encode inside query parameter values. |
| @ | At Sign | %40 | Used in authority (user info). Encode inside parameter values. |
| [ | Open Bracket | %5B | Used for IPv6 address notation. Encode in all other contexts. |
| ] | Close Bracket | %5D | Used for IPv6 address notation. Encode in all other contexts. |
| { | Open Brace | %7B | Not a valid URL character. Always encode. |
| \| | Pipe / Vertical Bar | %7C | Not a valid URL character. Always encode. |
| } | Close Brace | %7D | Not a valid URL character. Always encode. |
| ~ | Tilde | %7E | Unreserved character - safe to leave unencoded. RFC 3986 recommends not encoding it, though some older encoders emit %7E. |
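
The table lists the conventional percent-encoded form of each character, but encodeURIComponent() does not emit all of them. The sketch below checks which of the table's characters the built-in function leaves untouched, and shows one common way to force the remaining sub-delimiters to be encoded. The helper name encodeStrict is hypothetical.

```javascript
// The characters from the reference table, in the same order.
const tableChars = [
  " ", "!", "\"", "#", "$", "%", "&", "'", "(", ")", "+", ",",
  "/", ":", ";", "=", "?", "@", "[", "]", "{", "|", "}", "~",
];

// Characters that encodeURIComponent returns unchanged.
const leftAlone = tableChars.filter((ch) => encodeURIComponent(ch) === ch);
// ["!", "'", "(", ")", "~"]

// Hypothetical helper: additionally encode ! ' ( ) * so that only
// unreserved characters remain literal.
function encodeStrict(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

const builtin = encodeURIComponent("(it's!)*"); // "(it's!)*"
const strict = encodeStrict("(it's!)*");        // "%28it%27s%21%29%2A"

console.log(leftAlone, builtin, strict);
```

Whether the stricter form is needed depends on the consumer; many servers accept both, so this is a judgment call rather than a requirement.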