Xbasic

Character Encoding for Web Pages

Description

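This page lists the four classes of characters that must be encoded (written as "%" followed by two hexadecimal digits) when they appear in a URL: ASCII control characters, non-ASCII characters, reserved characters, and unsafe characters.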

ASCII Control Characters

ASCII control characters are encoded because they are not printable. They occupy the ISO-8859-1 (ISO-Latin) range 00-1F hex (0-31 decimal), plus the single character 7F hex (127 decimal).
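As a minimal sketch of this rule (written in Python only because the rule itself is language-neutral), the test for a control character and the %XX form it is encoded to could look like this. The function names are hypothetical, chosen for illustration:

```python
def is_ascii_control(byte: int) -> bool:
    """True for the control range 00-1F hex and for 7F hex (DEL)."""
    return byte <= 0x1F or byte == 0x7F


def percent_encode_byte(byte: int) -> str:
    """Encode one byte as '%' followed by two uppercase hex digits."""
    return f"%{byte:02X}"


def encode_controls(text: str) -> str:
    """Percent-encode only the ASCII control characters in an ASCII string."""
    out = []
    for byte in text.encode("ascii"):
        if is_ascii_control(byte):
            out.append(percent_encode_byte(byte))
        else:
            out.append(chr(byte))
    return "".join(out)


print(encode_controls("line1\nline2\tend"))  # line1%0Aline2%09end
```

A newline (0A hex) and a tab (09 hex) both fall in the 00-1F range, so each is replaced by its %XX form.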

Non-ASCII Characters

Non-ASCII characters are encoded because they are not in the ASCII set and are, by definition, not legal in URLs. This set includes the entire "top half" of the ISO-Latin set, 80-FF hex (128-255 decimal).
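The encoded form of a non-ASCII character depends on which character set is used to turn it into bytes. This page describes the ISO-Latin (ISO-8859-1) range; many current systems use UTF-8 instead, where a single character can become several %XX triplets. A short Python sketch using the standard library's `urllib.parse.quote` (the example word is arbitrary):

```python
from urllib.parse import quote

word = "café"

# ISO-8859-1 (Latin-1): "é" is the single byte E9 hex, in the 80-FF "top half".
latin1 = quote(word, encoding="latin-1")

# UTF-8: the same "é" becomes the two bytes C3 A9 hex.
utf8 = quote(word, encoding="utf-8")

print(latin1)  # caf%E9
print(utf8)    # caf%C3%A9
```

Whatever decodes the URL has to assume the same character set that was used to encode it; otherwise it reconstructs the wrong character.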

Reserved Characters

URLs reserve some characters for special roles in their syntax. When one of these characters appears in a URL outside its special role (for example, an "&" inside a query-string value), it must be encoded.

Character                      Code (Hex)   Code (Dec)
Dollar ("$")                   24           36
Ampersand ("&")                26           38
Plus ("+")                     2B           43
Comma (",")                    2C           44
Forward slash/Virgule ("/")    2F           47
Colon (":")                    3A           58
Semicolon (";")                3B           59
Equals ("=")                   3D           61
Question mark ("?")            3F           63
'At' symbol ("@")              40           64
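A common case is a query-string value that itself contains reserved characters. In the Python sketch below (the URL and the parameter names `q` and `page` are made up for illustration), the value is encoded with `urllib.parse.quote(..., safe="")` so that characters such as "&", "=", "/" and "?" inside it cannot be mistaken for URL syntax:

```python
from urllib.parse import quote, unquote

# Hypothetical value containing several reserved characters.
value = "a/b?c=d&e"

# safe="" makes quote() encode "/" too; by default it leaves "/" alone.
encoded = quote(value, safe="")

# The "?", "=" and "&" written literally here act as URL syntax;
# the encoded ones inside the value do not.
url = "https://example.com/search?q=" + encoded + "&page=1"

print(encoded)  # a%2Fb%3Fc%3Dd%26e
print(url)
```

Decoding the value with `unquote(encoded)` returns the original string unchanged.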

Unsafe Characters

Some characters may be misinterpreted or altered when they appear in URLs. These characters should always be encoded.

Character                      Code (Hex)   Code (Dec)   Why encode?
Space                          20           32           Spaces, especially runs of multiple spaces, may be lost or altered when URLs are transcribed or typeset.
Quotation mark ('"')           22           34           Often used to delimit URLs in plain text.
'Less Than' symbol ("<")       3C           60           Often used to delimit URLs in plain text.
'Greater Than' symbol (">")    3E           62           Often used to delimit URLs in plain text.
'Pound' character ("#")        23           35           Marks where a fragment identifier (a bookmark/anchor in HTML) begins.
Percent character ("%")        25           37           Used to encode/escape other characters, so it must itself be encoded.
Left Curly Brace ("{")         7B           123          Some systems (e.g. gateways) may modify it.
Right Curly Brace ("}")        7D           125          Some systems (e.g. gateways) may modify it.
Vertical Bar/Pipe ("|")        7C           124          Some systems (e.g. gateways) may modify it.
Backslash ("\")                5C           92           Some systems (e.g. gateways) may modify it.
Caret ("^")                    5E           94           Some systems (e.g. gateways) may modify it.
Tilde ("~")                    7E           126          Some systems (e.g. gateways) may modify it.
Left Square Bracket ("[")      5B           91           Some systems (e.g. gateways) may modify it.
Right Square Bracket ("]")     5D           93           Some systems (e.g. gateways) may modify it.
Grave Accent ("`")             60           96           Some systems (e.g. gateways) may modify it.
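Putting the four classes together, a minimal hand-rolled encoder could look like the Python sketch below. The names `RESERVED`, `UNSAFE`, `needs_encoding` and `encode_component` are hypothetical; the two sets are copied from the reserved and unsafe tables on this page, and UTF-8 is assumed as the default charset (pass "latin-1" to match the ISO-Latin range described earlier).

```python
# Character sets copied from the tables on this page (names are hypothetical).
RESERVED = set("$&+,/:;=?@")
UNSAFE = set(' "<>#%{}|\\^~[]`')


def needs_encoding(byte: int) -> bool:
    """Apply the four rules: control, non-ASCII, reserved, unsafe."""
    if byte <= 0x1F or byte == 0x7F:  # ASCII control characters
        return True
    if byte >= 0x80:                  # non-ASCII bytes (80-FF hex)
        return True
    ch = chr(byte)
    return ch in RESERVED or ch in UNSAFE


def encode_component(text: str, charset: str = "utf-8") -> str:
    """Percent-encode every byte of text that falls in one of the four classes."""
    return "".join(
        f"%{b:02X}" if needs_encoding(b) else chr(b)
        for b in text.encode(charset)
    )


print(encode_component("50% off {today} ~ <now>"))
# 50%25%20off%20%7Btoday%7D%20%7E%20%3Cnow%3E
```

Because "%" is itself encoded, running the encoder twice turns "%20" into "%2520", so each value should be encoded exactly once. Note also that Python's `urllib.parse.quote` follows RFC 3986 (since Python 3.7) and leaves "~" unencoded, so its output differs from this table for the tilde.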

See Also