Extract domain name from URL

Sometimes I need to extract the domain name from a URL in my programs (most often in my crawlers). Until now I used the following function, which takes a URL starting with http:// and returns the domain name:


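def find_domain(url):
    # Skip the leading 'http://' (7 characters).
    rest = url[7:]
    pos = rest.find('/')
    if pos == -1:
        # No path: the host ends at the query string, if there is one.
        pos = rest.find('?')
        if pos == -1:
            return rest
    return rest[:pos]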


But today I found a module named urlparse. So my function now looks like this:


from urlparse import urlparse

def find_domain2(url):
    return urlparse(url)[1]  # index 1 is the network location (netloc)

I think the new one is much better: it does not depend on the URL starting with http://.

See the urlparse documentation for details.
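
To see what urlparse actually hands back, here is a small sketch. The sample URL and the helper name find_hostname are made up for illustration, and the try/except import is only there so the snippet runs on both Python 2 (urlparse module) and Python 3 (where the same function lives in urllib.parse).

```python
try:
    from urllib.parse import urlparse  # Python 3 location
except ImportError:
    from urlparse import urlparse      # Python 2 module used in this post


def find_hostname(url):
    # Hypothetical helper: return the lowercased host, without
    # any user info or port. Returns None when there is no host.
    return urlparse(url).hostname


# Made-up URL with user info, mixed case and a port.
url = 'http://user@www.Example.com:8080/path?q=1'
parts = urlparse(url)

netloc = parts.netloc      # 'user@www.Example.com:8080' (same as parts[1])
hostname = parts.hostname  # 'www.example.com'

print(netloc)
print(hostname)
```

So index 1 (netloc) can still carry a user name and a port, while the hostname attribute strips both and lowercases the result. Also note that a string without a scheme, such as 'example.com/path', is parsed as a path only, so its netloc comes back empty.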

Comments

alexandre said…
This will return the hostname, not the domain name. Anyway, thanks for sharing.
Unknown said…
Hey, I optimized your code:

from urlparse import urlparse
parsed = urlparse('http://example.com')
print parsed.hostname
Swagat said…
Exactly my point: urlparse gives the hostname, not the domain name.
Henri Salo said…
You should also try the Python module tldextract.
