Get Original URL
I once ran into trouble while crawling some websites: some of the URLs I had weren't the original URLs. Instead, they redirected to other URLs. So I wrote a function that follows the redirects and returns the original (final) URL, along with any cookies set on the way. Here I share it with you:

```python
import cookielib
import urllib2


def get_original_url(url):
    """Take a URL, follow any redirects, and return the original URL
    together with the cookie jar (holding cookies, if any).
    """
    cj = cookielib.CookieJar()
    # The cookie processor stores every cookie set along the redirect chain in cj
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    usock = opener.open(url)
    try:
        url = usock.geturl()  # URL after all redirects have been followed
    finally:
        usock.close()
    return url, cj
```

Please send me your comments on this piece of code.
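The function above is Python 2 code; in Python 3, `cookielib` and `urllib2` no longer exist. A rough Python 3 equivalent might look like the sketch below. It is not the original code: it swaps in the standard Python 3 replacements (`http.cookiejar` and `urllib.request`), and the `user_agent` and `timeout` parameters are my own hypothetical additions with assumed defaults.

```python
import http.cookiejar
import urllib.request


def get_original_url(url, user_agent="Mozilla/5.0", timeout=30):
    """Follow redirects from url and return (final_url, cookie_jar).

    Sketch of a Python 3 port: user_agent and timeout are hypothetical
    parameters added for illustration, not part of the original function.
    """
    cj = http.cookiejar.CookieJar()
    # build_opener() already includes HTTPRedirectHandler; adding
    # HTTPCookieProcessor makes cookies set along the redirect chain land in cj.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    opener.addheaders = [("User-Agent", user_agent)]
    # The response is a context manager in Python 3, so it is closed
    # automatically, even if an exception is raised inside the block.
    with opener.open(url, timeout=timeout) as resp:
        return resp.geturl(), cj
```

Usage stays the same: `final_url, cookies = get_original_url(some_url)`. One thing to keep in mind is that `opener.open` raises `urllib.error.HTTPError` when the final response has a 4xx or 5xx status, so a crawler may want to catch that.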