
python - Crawler does not re-crawl a webpage after its data is updated

給我你的懷抱 2017-05-18 10:58:50


The page I crawl was updated with a new entry today, but when the crawler ran again it did not pick up the new entry.


from pyspider.libs.base_handler import *
from pyspider.database.mysql.mysqldb import SQL


class Handler(BaseHandler):
    crawl_config = {
    }

    # Schedule on_start once a day
    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://www.yxztb.net/yxweb/zypd/012001/012001001/', callback=self.index_page)

    # Pages handled by index_page are considered fresh for 10 days
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('.tdmoreinfosub a').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    @config(priority=2)
    def detail_page(self, response):
        return {
            "address": "宜興市",
            "url": response.url,
            "title": response.doc('font  span').text(),
            "date": response.doc('#tdTitle > .webfont').text()[8:17],
        }

    def on_result(self, result):
        print(result)
        # Skip empty results and results without a title
        if not result or not result['title']:
            return
        sql = SQL()
        sql.replace('zhaobiao', **result)

I would appreciate a detailed explanation from the experts here, and I am happy to discuss further.


All replies (2)

我想大聲告訴你 2017-05-18 11:00:50 (floor 2)

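The age set by @config(age=...) causes the execution of index_page to be skipped: while the previous crawl of the index URL is younger than that age, the page is treated as still valid and is not fetched again.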


迷茫 2017-05-18 11:00:50 (floor 1)

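Since @every on on_start is one day, setting age=12 * 60 * 60 (half a day) is more appropriate; it guarantees that a scheduled @every run is never blocked by age. Also, @config(age=10 * 24 * 60 * 60) on index_page becomes the default age for every self.crawl call that uses index_page as its callback, which means the index URL will not be crawled again within 10 days.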
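To make the interaction between @every and age concrete, here is a rough pure-Python model of the freshness check. It is only a sketch: `should_recrawl` and `simulate` are hypothetical names made up for this illustration, not pyspider functions, and the real scheduler has more conditions than this. It assumes a URL is fetched again only when its last crawl is older than `age` seconds.

```python
DAY = 24 * 60 * 60  # one day in seconds


def should_recrawl(last_crawl_time, now, age):
    """Simplified freshness check (an assumption, not pyspider's actual code)."""
    if last_crawl_time is None:  # never crawled before
        return True
    return now - last_crawl_time > age


def simulate(age, every=DAY, runs=4):
    """Trigger the same URL every `every` seconds; record which runs really fetch."""
    last_crawl_time = None
    fetched = []
    for run in range(runs):
        now = run * every
        if should_recrawl(last_crawl_time, now, age):
            last_crawl_time = now  # the page is fetched and the timestamp updated
            fetched.append(True)
        else:
            fetched.append(False)  # still "fresh", so the request is dropped
    return fetched


ten_days = simulate(age=10 * DAY)      # age from the question
half_day = simulate(age=12 * 60 * 60)  # age suggested above

print(ten_days)
print(half_day)
```

Under this model, the 10-day age lets only the first daily run fetch the index page, while the 12-hour age lets every daily run through. That matches the behavior described in the question.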
