
news163: error when crawling NetEase news with Scrapy [beginner]

items.py

import scrapy

class News163Item(scrapy.Item):
    title = scrapy.Field()
    url = scrapy.Field()
    source = scrapy.Field()
    content = scrapy.Field()

news_spider.py

#coding:utf-8
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule

class ExampleSpider(CrawlSpider):
    name = "news"
    allowed_Domains = ["news.163.com"]
    start_urls = ['http://news.163.com/']
    rules = [Rule(LinkExtractor(allow=r"/14/12\d+/\d+/*"), 'parse_news')]

    def parse_news(self, response):
        news = News163Item()
        news['title'] = response.xpath("//*[@id="h1title"]/text()").extract()
        news['source'] = response.xpath("//*[@id="ne_article_source"]/text()").extract()
        news['content'] = response.xpath("//*[@id="endText"]/text()").extract()
        news['url'] = response.url
        return news

After cd-ing into the project directory, I run from the command line:

scrapy crawl news -o news163.json

The following error is thrown:

Traceback (most recent call last):
  File "/usr/bin/scrapy", line 9, in <module>
    load_entry_point('Scrapy==0.24.4', 'console_scripts', 'scrapy')()
  File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/usr/lib/pymodules/python2.7/scrapy/commands/crawl.py", line 57, in run
    crawler = self.crawler_process.create_crawler()
  File "/usr/lib/pymodules/python2.7/scrapy/crawler.py", line 87, in create_crawler
    self.crawlers[name] = Crawler(self.settings)
  File "/usr/lib/pymodules/python2.7/scrapy/crawler.py", line 25, in __init__
    self.spiders = spman_cls.from_crawler(self)
  File "/usr/lib/pymodules/python2.7/scrapy/spidermanager.py", line 35, in from_crawler
    sm = cls.from_settings(crawler.settings)
  File "/usr/lib/pymodules/python2.7/scrapy/spidermanager.py", line 31, in from_settings
    return cls(settings.getlist('SPIDER_MODULES'))
  File "/usr/lib/pymodules/python2.7/scrapy/spidermanager.py", line 22, in __init__
    for module in walk_modules(name):
  File "/usr/lib/pymodules/python2.7/scrapy/utils/misc.py", line 68, in walk_modules
    submod = import_module(fullpath)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/home/gao/news/news/spiders/news_spider.py", line 15
    news['title'] = response.xpath("//*[@id="h1title"]/text()").extract()
                                              ^
SyntaxError: invalid syntax

Where is the mistake? I'm new to Python and only started using Scrapy recently, so I'm still very green. Any pointers would be appreciated.
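As it turned out, the SyntaxError comes from the quoting on line 15: each XPath expression is delimited by double quotes and then uses double quotes again around the id value, so the string literal ends early at @id= and the rest of the line is invalid syntax. A minimal sketch of the fix, switching the inner quotes to single quotes and leaving everything else unchanged:

# Use single quotes inside the XPath so they do not terminate
# the surrounding double-quoted string literal.
news['title'] = response.xpath("//*[@id='h1title']/text()").extract()
news['source'] = response.xpath("//*[@id='ne_article_source']/text()").extract()
news['content'] = response.xpath("//*[@id='endText']/text()").extract()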

Thanks to @捏造的信仰 for the answer, but after making that change there is still an error:

2014-12-02 20:13:02+0800 [news] ERROR: Spider error processing
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/task.py", line 638, in _tick
    taskObj._oneWorkUnit()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/task.py", line 484, in _oneWorkUnit
    result = next(self._iterator)
  File "/usr/lib/pymodules/python2.7/scrapy/utils/defer.py", line 57, in <genexpr>
    work = (callable(elem, *args, **named) for elem in iterable)
--- <exception caught here> ---
  File "/usr/lib/pymodules/python2.7/scrapy/utils/defer.py", line 96, in iter_errback
    yield next(it)
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/spidermiddleware/offsite.py", line 26, in process_spider_output
    for x in result:
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/spidermiddleware/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/spidermiddleware/urllength.py", line 33, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/spidermiddleware/depth.py", line 50, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib/pymodules/python2.7/scrapy/contrib/spiders/crawl.py", line 67, in _parse_response
    cb_res = callback(response, **cb_kwargs) or ()
  File "/home/gao/news/news/spiders/news_spider.py", line 14, in parse_news
    news = News163Item()
exceptions.NameError: global name 'News163Item' is not defined

What is the cause this time?
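The NameError means that News163Item is referenced in news_spider.py but never imported there; defining it in items.py does not make it visible inside the spider module. A minimal sketch of the fix, assuming the Scrapy project package is named news, as the path /home/gao/news/news/spiders/news_spider.py suggests:

# Assumed import path: the project package name "news" is inferred
# from the /home/gao/news/news/... directory layout shown above.
from news.items import News163Item

One more thing worth checking while editing the spider: Scrapy reads the attribute allowed_domains (all lowercase), so the allowed_Domains spelling above is silently ignored and off-site links will not be filtered.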

