Web Crawling

Pablo Hoffman

There is also Scrapy (Python-based), which is faster than Mechanize but not as scalable as Nutch or Heritrix. It is not meant for crawling the entire web, but it works well for crawling a large number of sites (5,000+), even huge ones like Amazon.