Get Original URL

Once I ran into trouble while crawling some websites. Some of the URLs I had weren't the original URLs; they redirected to other URLs. So I came up with a function to get the original URL, that is, the one you end up at after all the redirects have been followed. Here I share it with you:

    import cookielib
    import urllib2

    def get_original_url(url):
        """Take a URL and return the original URL along with the cookies (if any)."""
        # The cookie jar collects any cookies set while following redirects.
        cj = cookielib.CookieJar()
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        # Send a browser-like User-agent header with every request.
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        usock = opener.open(url)
        # geturl() reports the URL actually retrieved, after redirects.
        url = usock.geturl()
        usock.close()
        return url, cj

Please send me your comments on this piece of code.
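The function above is written for Python 2. In Python 3, urllib2 and cookielib were renamed to urllib.request and http.cookiejar, so a rough port might look like the sketch below. The timeout parameter and the with block are my additions for illustration, not part of the original function.

```python
import http.cookiejar
import urllib.request


def get_original_url(url, timeout=10):
    """Follow redirects from url and return (final_url, cookie_jar).

    The timeout argument is an assumption added for this sketch;
    the original function has no timeout.
    """
    # The cookie jar collects any cookies set while following redirects.
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cj)
    )
    # Send a browser-like User-agent header with every request.
    opener.addheaders = [("User-agent", "Mozilla/5.0")]
    # The with block closes the response even if something goes wrong.
    with opener.open(url, timeout=timeout) as response:
        # geturl() reports the URL actually retrieved, after redirects.
        return response.geturl(), cj
```

Calling it works the same way as before: final_url, cookies = get_original_url(some_url), where some_url stands for whatever URL you want to resolve. If the server cannot be reached, opener.open raises urllib.error.URLError, so a crawler would probably want to catch that around the call.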