Use a user agent in your spider
Some websites won't let your spider scrape their pages unless the request includes a user-agent header. By setting a user agent, you can make a website believe the request is coming from a regular browser. Here is a piece of code that uses the user agent 'Mozilla/5.0' to get the HTML content of a website:

import urllib2

url = "http://www.example.com"  # write your url here

opener = urllib2.build_opener()
# Pretend to be a browser by sending a User-agent header
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

usock = opener.open(url)
url = usock.geturl()   # the final URL, after any redirects
data = usock.read()    # the raw HTML content
usock.close()
print data

You can use other user agents as well. For example, here is the user agent my Firefox browser uses:

"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20061201 Firefox/2.0.0.12 (Ubuntu-feisty)"

What is your user agent?
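If you are on Python 3, note that urllib2 was merged into urllib.request; a minimal sketch of the same pattern (the URL is the same placeholder as above, and the network fetch is left commented out so you can run the header check offline):

```python
from urllib.request import Request, build_opener

url = "http://www.example.com"  # write your url here

# Same idea as the urllib2 version: attach a User-agent header to the opener.
opener = build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]

# Uncomment to actually fetch the page (requires network access):
# response = opener.open(url)
# data = response.read().decode('utf-8')
# print(data)

# Build a Request object to confirm which User-Agent would be sent.
request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
print(request.get_header('User-agent'))  # header names are normalized by urllib
```

This should print Mozilla/5.0, confirming the header is set before any request goes out.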