Saturday, April 16, 2011

Replace consecutive whitespace with a single space

Sometimes we need to replace consecutive whitespace in a string with a single space. This is often useful when parsing HTML files. Let me show you two ways of doing this.

The first way is to split the string and join the parts back together with a single space. Here is the code snippet:
>>> s = "a  b   c d    e f"
>>> " ".join(s.split())
'a b c d e f'
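Calling str.split() without arguments treats any run of whitespace (spaces, tabs, newlines) as a single separator and discards empty strings. As a result, the split-and-join approach also removes leading and trailing whitespace. Below is a minimal sketch that wraps it in a function. The name collapse_whitespace is an illustrative choice and is not part of any library:

```python
def collapse_whitespace(text):
    # str.split() with no argument splits on runs of any whitespace
    # (spaces, tabs, newlines) and drops empty strings, so leading and
    # trailing whitespace disappear as well.
    return " ".join(text.split())

messy = "  a \t b\n\nc   d  "
print(repr(collapse_whitespace(messy)))  # 'a b c d'
```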

You can find more string methods here: http://docs.python.org/release/2.5.2/lib/string-methods.html

The second way is to use a regular expression. Here is the code:
>>> import re
>>> s = "a  b   c d    e f"
>>> p = re.compile(r'\s+')
>>> data = p.sub(' ', s)
>>> data
'a b c d e f'
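The regular expression differs from split-and-join in one way. It replaces every run of whitespace with one space, and that includes runs at the very start and end of the string, so they are kept as a single space rather than removed. The sketch below shows this behavior (WHITESPACE_RUN is just an illustrative name). It then adds strip() to get the same result as " ".join(s.split()) for this string:

```python
import re

# \s+ matches one or more whitespace characters (space, tab, newline, ...).
WHITESPACE_RUN = re.compile(r"\s+")

messy = "  a \t b\n\nc   d  "

# Every run becomes one space, including the runs at both ends.
print(repr(WHITESPACE_RUN.sub(" ", messy)))          # ' a b c d '

# Stripping afterwards removes the leftover leading/trailing space.
print(repr(WHITESPACE_RUN.sub(" ", messy).strip()))  # 'a b c d'
```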

3 comments:

JustGlowing said...

I prefer to use the replace() method to replace a double space with a single space:
>>> s = "a  b  c  d  e"
>>> s.replace("  ", " ")
'a b c d e'

Tamim Shahriar (Subeen) said...

But if there is an odd number of spaces between a and b (say a[space][space][space]b), then using replace("[space][space]", "[space]") will leave two spaces between a and b (a[space][space]b).
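Here is the problem Subeen describes, reproduced with a single replace() pass on three spaces. One pass replaces non-overlapping pairs from left to right, so it only shortens each run. A run of four spaces also comes out as two, which means any run longer than two needs more than one pass:

```python
s = "a   b"                  # three spaces between a and b
once = s.replace("  ", " ")  # replaces non-overlapping pairs, left to right
print(repr(once))            # 'a  b' -> two spaces are still left

four = "a    b".replace("  ", " ")  # four spaces: two pairs, each becomes one space
print(repr(four))                   # 'a  b'
```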

Sagar said...

You can run the replace() method multiple times.
Let's say s = "a[space][space][space]b"
After iteration 1,
s = "a[space][space]b"
After iteration 2,
s = "a[space]b"
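Sagar's idea can be written as a loop that keeps calling replace() until no double space is left. This is a minimal sketch, and squeeze_spaces is a hypothetical name. Unlike the split-and-join and regex versions, it only handles the space character, not tabs or newlines, and it does not trim the ends of the string:

```python
def squeeze_spaces(text):
    # Repeat the replace() call until no pair of spaces remains.
    # Each pass roughly halves every run of spaces, so the loop ends quickly.
    while "  " in text:
        text = text.replace("  ", " ")
    return text

print(repr(squeeze_spaces("a   b")))       # 'a b'
print(repr(squeeze_spaces("a     b  c")))  # 'a b c'
```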