Character Encoding for Web Pages
Description
URL encoding applies to four groups of characters: ASCII control characters, non-ASCII characters, reserved characters, and unsafe characters.
ASCII Control Characters
ASCII control characters are encoded because they are not printable. ASCII control characters include the ISO-8859-1 (ISO-Latin) character ranges 00-1F hex (0-31 decimal) and 7F (127 decimal).
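As a minimal sketch of the range described above, the following Python snippet percent-encodes only the control characters in a string. The helper names `is_ascii_control` and `encode_controls` are hypothetical and exist only for illustration; `urllib.parse.quote` is the standard-library routine that would normally be used.

```python
from urllib.parse import quote


def is_ascii_control(ch: str) -> bool:
    """Return True for code points 00-1F hex and 7F hex (hypothetical helper)."""
    code = ord(ch)
    return code <= 0x1F or code == 0x7F


def encode_controls(text: str) -> str:
    """Replace each ASCII control character with %XX; leave everything else alone."""
    return "".join(
        f"%{ord(ch):02X}" if is_ascii_control(ch) else ch
        for ch in text
    )


# A tab (09 hex) and a newline (0A hex) become %09 and %0A.
print(encode_controls("a\tb\n"))

# The standard library encodes the DEL character (7F hex) the same way.
print(quote("\x7f"))
```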
Non-ASCII Characters
Non-ASCII characters are encoded because they are not in the ASCII set and are therefore, by definition, not legal in URLs. This set of characters includes the entire "top half" of the ISO-Latin set, 80-FF hex (128-255 decimal).
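The sketch below shows how one character from the "top half" of ISO-Latin might be encoded in Python. It assumes the ISO-Latin byte value is wanted, as this page describes; note that `urllib.parse.quote` uses UTF-8 unless told otherwise, which yields a different, two-byte sequence for the same character.

```python
from urllib.parse import quote, unquote

# "é" is E9 hex (233 decimal) in ISO-Latin, so it encodes to a single %E9.
latin1_encoded = quote("é", encoding="latin-1")
print(latin1_encoded)

# With the default UTF-8 encoding the same character occupies two bytes.
utf8_encoded = quote("é")
print(utf8_encoded)

# Decoding must use the same character encoding that was used for encoding.
print(unquote(latin1_encoded, encoding="latin-1"))
```

Whichever character encoding is chosen, both sides of the exchange need to agree on it; otherwise the decoded text will not match the original.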
Reserved Characters
URLs reserve some characters for defining their syntax. When one of these characters appears in a URL without its special role (for example, as part of a data value), it needs to be encoded.
Character | Code (Hex) | Code (Dec) |
---|---|---|
Dollar ("$") | 24 | 36 |
Ampersand ("&") | 26 | 38 |
Plus ("+") | 2B | 43 |
Comma (",") | 2C | 44 |
Forward slash/Virgule ("/") | 2F | 47 |
Colon (":") | 3A | 58 |
Semi-colon (";") | 3B | 59 |
Equals ("=") | 3D | 61 |
Question mark ("?") | 3F | 63 |
'At' symbol ("@") | 40 | 64 |
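
To illustrate the rule for reserved characters, here is a small Python sketch that encodes a data value containing several of them. The value and the `https://example.com/search` address are made up for the example; `quote` and `urlencode` are standard-library functions.

```python
from urllib.parse import quote, urlencode

# A data value that happens to contain reserved characters.
value = "a&b=c/d?e"

# safe="" encodes every reserved character, including the forward slash.
fully_encoded = quote(value, safe="")
print(fully_encoded)

# By default quote() keeps "/" unencoded, which suits path segments.
path_encoded = quote(value)
print(path_encoded)

# urlencode() applies the same encoding to each query-string value,
# so "&" and "=" inside the data cannot be confused with separators.
query = urlencode({"q": "a&b=c"})
url = "https://example.com/search?" + query  # hypothetical address
print(url)
```

Encoding each value separately before assembling the URL keeps the reserved characters that do act as delimiters (the "?" and "=" added by the code) distinct from those that are only data.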
Unsafe Characters
Some characters may be misinterpreted within URLs for various reasons. These characters should always be encoded.
Character | Code (Hex) | Code (Dec) | Why encode? |
---|---|---|---|
Space | 20 | 32 | Spaces may be lost or altered in some uses, especially runs of multiple spaces. |
Quotation marks | 22 | 34 | These characters are often used to delimit URLs in plain text. |
'Less Than' symbol ("<") | 3C | 60 | These characters are often used to delimit URLs in plain text. |
'Greater Than' symbol (">") | 3E | 62 | These characters are often used to delimit URLs in plain text. |
'Pound' character ("#") | 23 | 35 | This is used in URLs to indicate where a fragment identifier (bookmarks/anchors in HTML) begins. |
Percent character ("%") | 25 | 37 | This is used to URL encode/escape other characters, so it should itself also be encoded. |
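Left Curly Brace ("{") | 7B | 123 | Some systems may modify this character. |
Right Curly Brace ("}") | 7D | 125 | Some systems may modify this character. |
Vertical Bar/Pipe ("|") | 7C | 124 | Some systems may modify this character. |
Backslash ("\") | 5C | 92 | Some systems may modify this character. |
Caret ("^") | 5E | 94 | Some systems may modify this character. |
Tilde ("~") | 7E | 126 | Some systems may modify this character. |
Left Square Bracket ("[") | 5B | 91 | Some systems may modify this character. |
Right Square Bracket ("]") | 5D | 93 | Some systems may modify this character. |
Grave Accent ("`") | 60 | 96 | Some systems may modify this character. |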
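
The following Python sketch encodes a string containing several unsafe characters from the table above. The file name is invented for the example. It also shows why the percent character itself must be encoded: without that, a literal "%20" in the data would be indistinguishable from an encoded space.

```python
from urllib.parse import quote, unquote

# A made-up file name with spaces, "#", "{", "}" and "%".
raw = "my file #1 {50%}.txt"

encoded = quote(raw, safe="")
print(encoded)

# Decoding restores the original text exactly.
print(unquote(encoded) == raw)

# A literal "%20" in the data is encoded as "%2520", so it
# decodes back to the text "%20" rather than to a space.
literal_percent = quote("%20", safe="")
print(literal_percent)
```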
See Also