Setting a timeout while spidering a site

I depend heavily on the urllib2 module to develop web crawlers, but sometimes the crawlers just get stuck ... :(. So it's necessary to set a timeout, but unfortunately urllib2 doesn't provide anything for this purpose (at least before Python 2.6, which added a timeout argument to urlopen). So we have to depend on the socket module. Here is the code that I use:


import socket

timeout = 300 # seconds
# Applies to every socket created after this call, including the ones urllib2 opens.
socket.setdefaulttimeout(timeout)
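
Setting the timeout only makes a stuck request raise an exception, so the crawler still has to catch it and move on. Here is a minimal sketch of how that might look. The fetch helper and its opener parameter are hypothetical names of mine, not part of urllib2, and the import fallback assumes you may be running on Python 3, where urllib2 was split into urllib.request and urllib.error.

```python
import socket

try:
    from urllib2 import urlopen, URLError  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3 equivalent of urllib2.urlopen
    from urllib.error import URLError

socket.setdefaulttimeout(300)  # seconds, same value as in the snippet above


def fetch(url, opener=urlopen):
    """Hypothetical helper: return the page body, or None on timeout or failure."""
    try:
        return opener(url).read()
    except socket.timeout:
        # A blocking socket operation (for example read) took longer than the timeout.
        return None
    except URLError:
        # Failures while connecting, timeouts included, typically arrive wrapped in URLError.
        return None
```

With this, a crawler loop can skip any URL for which fetch returns None instead of hanging on it.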

Comments

redbaron said…
What performance (sites/day) did you achieve with your crawlers?
