Tuesday, April 29, 2008

Convert text to ASCII and ASCII to text - Python code

Python has a very simple way to convert text to ASCII and ASCII to text.

To find the ASCII value of a character, use the ord() function.
>>> ord('B')
66

To get a character from its ASCII value, use the chr() function.
>>> chr(65)
'A'
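
As a quick check that the two functions undo each other, here is a minimal sketch. It is written in Python 3 syntax (an assumption, since the rest of this post uses Python 2), where chr() accepts any Unicode code point and not just 0-255.

```python
# ord() and chr() are inverses of each other (Python 3 syntax).
code = ord('B')    # character -> integer code
char = chr(code)   # integer code -> character

print(code, char)

# The round trip holds for every 7-bit ASCII code.
round_trip_ok = all(ord(chr(n)) == n for n in range(128))
print(round_trip_ok)
```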

Here is a program to get the list of all characters along with their values:

# program to print the list of characters and their ASCII values
for value in range(0, 255):
    print "ASCII Value:", value, "\t", "Character:", chr(value), "\n"
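
Strictly speaking, ASCII only defines codes 0-127, and the codes below 32 (plus 127) are control characters that do not print visibly. A possible variation that lists only the printable range is sketched below in Python 3 syntax. The name rows is just an illustrative choice.

```python
# Build one line per printable ASCII character (codes 32 to 126).
rows = ["%3d  %s" % (value, chr(value)) for value in range(32, 127)]

print("\n".join(rows))
```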

Here is another simple ASCII encoding-decoding program:

# Conversion from text to ASCII codes

message = raw_input("Enter message to encode: ")

print "Encoded string (in ASCII):"
for ch in message:
    print ord(ch),
print "\n\n"

# Conversion from ASCII codes to text

message = raw_input("Enter ASCII codes: ")

decodedMessage = ""

for item in message.split():
    decodedMessage += chr(int(item))

print "Decoded message:", decodedMessage
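
The same two steps can also be wrapped in small functions, which makes them easier to reuse and test without typing input by hand. This is only a sketch in Python 3 syntax; the function names text_to_codes and codes_to_text are made up for illustration.

```python
def text_to_codes(text):
    """Return the character codes of text as a space-separated string."""
    return " ".join(str(ord(ch)) for ch in text)


def codes_to_text(codes):
    """Rebuild a string from space-separated character codes."""
    return "".join(chr(int(item)) for item in codes.split())


encoded = text_to_codes("Hi!")
decoded = codes_to_text(encoded)
print(encoded)
print(decoded)
```

Feeding the output of the first function into the second should give back the original message.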

Hope you got a clear understanding of how to convert between text and ASCII code.


aatiis said...

I used to use chr() and ord() to store huge numbers (like 512bit integers) by bit-shifting by 8 and &ing. Actually I used '%c' % 65, but I think it's pretty much the same as chr(65).
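
The comment does not show its code, so the following is only a guess at the idea: mask off the lowest 8 bits with & 0xFF, turn them into a character with chr(), shift right by 8, and repeat. It is a sketch in Python 3 syntax, and the names int_to_chars and chars_to_int are hypothetical.

```python
def int_to_chars(number):
    """Pack a non-negative integer into a string, one character per byte, big-endian."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    chars = []
    while True:
        chars.append(chr(number & 0xFF))  # keep the lowest 8 bits
        number >>= 8                      # drop those 8 bits
        if number == 0:
            break
    return "".join(reversed(chars))


def chars_to_int(packed):
    """Rebuild the integer by shifting left by 8 and OR-ing in each character code."""
    number = 0
    for ch in packed:
        number = (number << 8) | ord(ch)
    return number


big = 2 ** 511 + 12345        # a 512-bit integer
packed = int_to_chars(big)
print(len(packed))
print(chars_to_int(packed) == big)
```

In Python 3 the built-in int.to_bytes() and int.from_bytes() methods cover the same ground with bytes objects, so a hand-written loop like this is mostly of interest for seeing how the shifting works.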

Steven Kehlet said...

Thanks, very helpful.

Jabba Laci said...

If you use range(0,255), then you'll miss the value 255. I suggest range(256). Another solution for producing the ASCII table:

for char in range(256):
    print "%d: %c" % (char, char)

I started to learn Python a week ago. I'm glad that I found your blog, it'll be useful for me.

Attila Oláh said...

Yup, string formatting using "%c" is an elegant solution. And you're right about the 256 too, since 256 = len(range(256)), of course :)

Naveen said...

Nice post. It pretty much clarifies the ASCII chars (0-255). How about the UTF-8 charset? For a given character (for example ò), how do I find its Unicode representation?

Attila Oláh said...

What do you mean by "unicode representation"? u'ó' *is* the Unicode representation. You can encode/decode, of course. I.e.

>>> [ord(c) for c in u'ó'.encode('utf-8')]
[194, 179]

On the other hand:
>>> '%c%c' % (194, 179)
>>> 'ó'

Can you tell me, why '%c%c' % (194, 179) != 'ó'? And why:

>>> print '%c%c' % (195, 179)


Naveen said...

'%c%c' % (194, 179) != 'ó'
Maybe because '%c%c' took the default code page of the underlying stream, which is why we did not get ó.

I just took a wild guess. Correct me if I am wrong.
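
One way to look into the question without depending on any terminal's code page is to compare the code point of ó with its UTF-8 bytes, and to decode both byte pairs from the thread explicitly. The sketch below uses Python 3 syntax, where strings are Unicode by default. It does not show what the original terminal did, only what the byte values mean as UTF-8.

```python
import unicodedata

char = "\u00f3"                           # the letter ó, written as an escape
code_point = ord(char)                    # Unicode code point of the character
utf8_bytes = list(char.encode("utf-8"))   # byte values of its UTF-8 encoding

print(code_point)
print(utf8_bytes)

# Decode the two byte pairs discussed above as UTF-8.
from_194_179 = bytes([194, 179]).decode("utf-8")
from_195_179 = bytes([195, 179]).decode("utf-8")
print(unicodedata.name(from_194_179))
print(unicodedata.name(from_195_179))
```

Under UTF-8, the pair (195, 179) decodes to ó, while (194, 179) decodes to a different character, the superscript three.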

Attila Oláh said...

The unicodedata module might be helpful in this case. This is taken from http://www.python.org/doc/2.6.4/library/unicodedata.html:

>>> import unicodedata
>>> unicodedata.lookup('LEFT CURLY BRACKET')
u'{'
>>> unicodedata.name(u'/')
'SOLIDUS'
>>> unicodedata.decimal(u'9')
9
>>> unicodedata.decimal(u'a')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: not a decimal
>>> unicodedata.category(u'A')  # 'L'etter, 'u'ppercase
'Lu'
>>> unicodedata.bidirectional(u'\u0660')  # 'A'rabic, 'N'umber
'AN'

Kyria Kalokairi said...

Thank you so much! I wasted an hour on this - and it's so easy once you know how to do it.