Strip HTML tags using Python
We often need to strip HTML tags from a string (or from HTML source). I usually do it with a simple regular expression in Python. Here is my function to strip HTML tags:

    import re

    def remove_html_tags(data):
        # Non-greedy match of everything between '<' and the next '>'
        p = re.compile(r'<.*?>')
        return p.sub('', data)

Here is another function that collapses runs of consecutive whitespace into a single space:

    def remove_extra_spaces(data):
        # \s+ matches one or more whitespace characters (spaces, tabs, newlines)
        p = re.compile(r'\s+')
        return p.sub(' ', data)

Note that the re module needs to be imported in order to use regular expressions.

Here you can find updated code that gets the text from HTML: http://love-python.blogspot.com/2011/04/html-to-text-in-python.html
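To show how the two functions fit together, here is a minimal sketch that chains them on a small sample. The helper name html_to_plain, the sample string, and the re.DOTALL flag are assumptions added for illustration; they are not part of the original functions. With re.DOTALL the dot also matches newlines, so a tag that spans several lines is removed as well.

```python
import re

# Compile the patterns once so repeated calls reuse them.
# re.DOTALL is an addition here: it lets '.' match newlines inside a tag.
TAG_RE = re.compile(r'<.*?>', re.DOTALL)
SPACE_RE = re.compile(r'\s+')


def remove_html_tags(data):
    """Delete anything that looks like an HTML tag."""
    return TAG_RE.sub('', data)


def remove_extra_spaces(data):
    """Collapse each run of whitespace into a single space."""
    return SPACE_RE.sub(' ', data)


def html_to_plain(data):
    """Hypothetical helper: strip tags, then tidy up the whitespace."""
    return remove_extra_spaces(remove_html_tags(data)).strip()


sample = '<p>Hello,   <b>world</b>!</p>\n<p>Second   line</p>'
print(html_to_plain(sample))  # prints: Hello, world! Second line
```

Keep in mind that this is a quick approach rather than a real HTML parser. The pattern treats anything between '<' and '>' as a tag, so plain text such as "a < b and c > d" loses its middle part, and the contents of script and style elements stay in the output.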