Scrapy Crawler

Scrapy Stats Collection

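The Stats Collector is a facility provided by Scrapy for collecting statistics as key/value pairs. It is accessed through the Crawler API (the Crawler gives access to all Scrapy core components). The stats collector keeps one stats table per spider: the table is opened automatically when the spider opens and closed when the spider closes.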

Common Stats Collector Uses

The following code accesses the stats collector through the stats attribute of the crawler.

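class ExtensionThatAccessStats:
    """Extension that keeps a reference to the crawler's stats collector."""

    def __init__(self, stats):
        # Stats collector handed over by from_crawler()
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        # The crawler exposes its stats collector as crawler.stats
        return cls(crawler.stats)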
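
To make the wiring above concrete, here is a minimal sketch that runs without Scrapy installed. The names fake_crawler and fake_stats are hypothetical stand-ins introduced only for illustration; a real Crawler object provides much more than a stats attribute.

```python
from types import SimpleNamespace


class ExtensionThatAccessStats:
    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.stats)


# Hypothetical stand-ins: only the `stats` attribute of a crawler is modelled.
fake_stats = object()
fake_crawler = SimpleNamespace(stats=fake_stats)

# from_crawler() passes crawler.stats straight into the constructor.
extension = ExtensionThatAccessStats.from_crawler(fake_crawler)
print(extension.stats is fake_stats)
```

The point of the pattern is that the extension never builds a stats collector itself; it only receives the one owned by the crawler.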

The following table shows the various operations that can be used with the stats collector:


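Sr. No.	Method call	Description
1	stats.set_value('hostname', socket.gethostname())	Sets the value of the given stat.
2	stats.inc_value('customized_count')	Increments the value of the given stat.
3	stats.max_value('max_items_scraped', value)	Sets the stat value only if it is greater than the previous value.
4	stats.min_value('min_free_memory_percent', value)	Sets the stat value only if it is lower than the previous value.
5	stats.get_value('customized_count')	Returns the value of the given stat.
6	stats.get_stats()	Returns all stats as a dict, for example {'customized_count': 1, 'start_time': datetime.datetime(2009, 7, 14, 21, 47, 28, 977139)}.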
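
The semantics in the table can be tried out with a small stand-in class. This is a sketch under stated assumptions, not Scrapy's implementation: SketchStatsCollector is a hypothetical name, it only mirrors the method names listed above, and it stores everything in a plain dict.

```python
class SketchStatsCollector:
    """Illustrative stand-in that mirrors the method names in the table."""

    def __init__(self):
        self._stats = {}

    def set_value(self, key, value):
        # Overwrite the stat unconditionally.
        self._stats[key] = value

    def inc_value(self, key, count=1, start=0):
        # Start from `start` if the key is missing, then add `count`.
        self._stats[key] = self._stats.get(key, start) + count

    def max_value(self, key, value):
        # Keep the larger of the stored value and the new one.
        self._stats[key] = max(self._stats.get(key, value), value)

    def min_value(self, key, value):
        # Keep the smaller of the stored value and the new one.
        self._stats[key] = min(self._stats.get(key, value), value)

    def get_value(self, key, default=None):
        return self._stats.get(key, default)

    def get_stats(self):
        return self._stats


stats = SketchStatsCollector()
stats.set_value("hostname", "example-host")
stats.inc_value("customized_count")
stats.inc_value("customized_count")
stats.max_value("max_items_scraped", 10)
stats.max_value("max_items_scraped", 7)         # ignored: 7 is not greater than 10
stats.min_value("min_free_memory_percent", 40)
stats.min_value("min_free_memory_percent", 25)  # kept: 25 is lower than 40
print(stats.get_stats())
```

After these calls, customized_count is 2, max_items_scraped stays at 10, and min_free_memory_percent drops to 25.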

Available Stats Collectors

Scrapy provides different types of stats collectors, which can be selected with the STATS_CLASS setting.

MemoryStatsCollector

This is the default stats collector. It maintains the stats of every spider used for scraping and stores the data in memory.

class scrapy.statscollectors.MemoryStatsCollector

DummyStatsCollector

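This stats collector is very efficient because it does nothing. It can be selected with the STATS_CLASS setting and used to disable stats collection in order to improve performance.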

class scrapy.statscollectors.DummyStatsCollector
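
As a configuration sketch, switching collectors could look like the fragment below. It assumes the setting lives in the project's settings module (conventionally settings.py) and that STATS_CLASS takes the dotted path of the collector class; both class paths are the ones listed above.

```python
# settings.py (assumed location of the project settings)

# Swap in the no-op collector to disable stats collection.
STATS_CLASS = "scrapy.statscollectors.DummyStatsCollector"

# To go back to the default in-memory collector, use instead:
# STATS_CLASS = "scrapy.statscollectors.MemoryStatsCollector"
```

With the dummy collector in place, calls such as stats.inc_value() are still safe to make; they simply record nothing.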