scrapy 为什么要用yield item 而不用yield dict来传输数据

MiracleZhao

2020-01-08

经过实践, yield dict和yield item一样有效果，不过为什么官方要用yield item ，以下是官方解释：

The main goal in scraping is to extract structured data from unstructured sources, typically, web pages. Scrapy spiders can return the extracted data as Python dicts. While convenient and familiar, Python dicts lack structure: it is easy to make a typo in a field name or return inconsistent data, especially in a larger project with many spiders.

To define common output data format Scrapy provides the Item class. Item objects are simple containers used to collect the scraped data. They provide a dictionary-like API with a convenient syntax for declaring their available fields.

Various Scrapy components use extra information provided by Items: exporters look at declared fields to figure out columns to export, serialization can be customized using Item fields metadata, trackref tracks Item instances to help find memory leaks (see Debugging memory leaks with trackref), etc.

简单的说，就是爬虫过多的时候，使用dict容易出现键字打错，而造成数据传输错误，使用item 系统可以通过key error来提示程序员从而避免这种问题。

scrapy

安科网

scrapy 为什么要用yield item 而不用yield dict来传输数据

MiracleZhao

经过实践, yield dict和yield item一样有效果，不过为什么官方要用yield item ，以下是官方解释：

MiracleZhao

相关推荐

如何利用Scrapy爬虫框架抓取网页全部文章信息（上篇）

一分钟搞定Scrapy分布式爬虫、队列和布隆过滤器

一篇文章教会你理解Scrapy网络爬虫框架的工作原理和数据采集过程

在Scrapy中如何利用Xpath选择器从网页中采集目标数据——详细教程（下篇）

在Scrapy中如何利用Xpath选择器从网页中采集目标数据——详细教程（上篇）

手把手教你进行Scrapy中item类的实例化操作

如何改造 Scrapy 从而实现多网站大规模爬取？

二十六、Scrapy自定义命令

scrapy 管理部署的爬虫项目的python类

分布式爬虫部署基于scrapy和scrapy-redis

8_3 scrapy模拟登录人人网

Python爬虫 - scrapy

Scrapy爬虫

用scrapy爬取图片

scrapy基本知识

Python爬虫 - scrapy框架的基本操作

十八、scrapy内置媒体（图片和文件）下载方式

Scrapy爬虫

Python Scrapy图片爬取原理及代码实例

scrapy 详解

MiracleZhao