Updated Python code for get_html_source()
Yesterday I made a small update to my function get_html_source(), which fetches the content of a page. I did so because I found that my previous version didn't support HTTP POST. Now the code supports both HTTP GET and HTTP POST, and it also returns the cookie jar along with the HTML content of the page.

import urllib2

def get_html_source(url, referer = '', data = 0, cj = 0, retry_counter = 0):
    # retry_counter is 0 on the first attempt; give up after three retries
    if retry_counter > 0:
        print 'Trying Again...'
    if retry_counter > 3:
        print 'Could not get source from url:', url
        return '', ''
    try:
        # Reuse the caller's cookie jar if one was passed in
        if cj:
            opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        else:
            opener = urllib2.build_opener()
        opener.addheaders = [('Referer', referer), ('Content-Type', 'application/x-www-form-urlencoded'), ('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/200