Set a timeout while spidering a site

Although I rely heavily on the urllib2 module to build web crawlers, sometimes the crawlers just get stuck... :(. So a timeout is necessary, but unfortunately urllib2 doesn't provide anything for this purpose (urllib2.urlopen only gained a timeout argument in Python 2.6). So we have to rely on the socket module. Here is the code that I use:


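import socket

timeout = 300  # seconds
# Every socket created after this call, including the ones urllib2 opens
# internally, raises socket.timeout if a single connect/read blocks longer
# than `timeout` seconds.
socket.setdefaulttimeout(timeout)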
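
For newer Pythons, here is a minimal sketch, assuming Python 3, where urllib2 was split into urllib.request and urllib.error. It is an illustration, not the crawler code above: `fetch` and `start_stalled_server` are hypothetical helper names. It shows the global default set with socket.setdefaulttimeout next to the per-request `timeout` argument, and uses a local server that accepts connections but never answers to show the request timing out instead of hanging.

```python
import socket
import threading
import urllib.error
import urllib.request

# Assumption: Python 3 (urllib2 became urllib.request / urllib.error).
# ProxyHandler({}) ignores *_proxy environment variables so the local demo
# below talks to 127.0.0.1 directly.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, timeout=None):
    """Hypothetical helper: return the response body, or None on timeout.

    timeout=None falls back to socket.getdefaulttimeout(), i.e. the value
    set with socket.setdefaulttimeout(); a number overrides it for this call.
    """
    try:
        if timeout is None:
            resp = OPENER.open(url)  # uses the global default timeout
        else:
            resp = OPENER.open(url, timeout=timeout)  # per-request timeout
        with resp:
            return resp.read()
    except socket.timeout:
        # Raised directly when waiting for the response times out.
        return None
    except urllib.error.URLError as exc:
        # Timeouts while connecting or sending arrive wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return None
        raise


def start_stalled_server():
    """Hypothetical test helper: accepts connections but never replies,
    like a site that leaves a crawler hanging. Returns the port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(None)  # keep accept() blocking despite the global default
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
    server.listen(5)
    held = []  # keep accepted connections open so clients wait for a reply

    def accept_forever():
        while True:
            conn, _ = server.accept()
            held.append(conn)

    threading.Thread(target=accept_forever, daemon=True).start()
    return server.getsockname()[1]


if __name__ == "__main__":
    socket.setdefaulttimeout(2)  # seconds; same idea as the 300 s above
    port = start_stalled_server()
    url = "http://127.0.0.1:%d/" % port
    print(fetch(url))               # gives up after about 2 s
    print(fetch(url, timeout=0.5))  # gives up after about 0.5 s
```

One design note: socket.setdefaulttimeout affects every socket the program creates afterwards (database clients included), so when the per-request `timeout` argument is available it is the narrower choice.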

Comments

redbaron said…
What performance (sites/day) did you achieve with your crawlers?
