set timeout while spidering a site

February 12, 2008

I depend heavily on the urllib2 module for developing web crawlers, but sometimes the crawlers just get stuck ... :(. So it's necessary to set a timeout, but unfortunately urllib2 doesn't provide anything for this purpose, so we have to rely on the socket module. Setting a default timeout there applies to every socket created afterwards, including the ones urllib2 opens internally. Here is the code that I use:


import socket

timeout = 300 # seconds
socket.setdefaulttimeout(timeout)
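Once the default timeout is set, a stuck request raises an exception instead of hanging forever, so the crawler has to catch it. Here is a minimal sketch of how that might look, assuming Python 3, where urllib2's urlopen lives in urllib.request. The fetch helper is a hypothetical name of mine, not something from the standard library:

```python
import socket
import urllib.error
import urllib.request  # Python 3 home of urllib2's urlopen

# Same idea as above: applies to every socket created after this call.
socket.setdefaulttimeout(300)  # seconds


def fetch(url):
    """Hypothetical helper: return the page body, or None on timeout/failure."""
    try:
        return urllib.request.urlopen(url).read()
    except (socket.timeout, urllib.error.URLError):
        # A timeout while connecting is wrapped in URLError; a timeout
        # while waiting for the response surfaces as socket.timeout.
        return None
```

With something like this, the crawl loop can skip any URL for which fetch returns None and move on to the next one instead of getting stuck.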

Comments

redbaron said…

What performance (sites/day) did you achieve with your crawlers?

September 7, 2008 at 2:36 AM
