
`KeyError: b'fingerprint'` thrown when running `to_fetch` in the `StatesContext` class

Open yujiaao opened this issue 5 years ago • 0 comments

https://github.com/scrapinghub/frontera/blob/master/frontera/core/manager.py I am using the 0.8.1 code base in LOCAL_MODE. A `KeyError` is thrown when execution reaches `to_fetch` in the `StatesContext` class:

from line 801:

```python
class StatesContext(object):
    ...
    def to_fetch(self, requests):
        requests = requests if isinstance(requests, Iterable) else [requests]
        for request in requests:
            fingerprint = request.meta[b'fingerprint']  # error occurred here!!!
```

I think the reason is that the meta key `b'fingerprint'` is used before it is set:

from line 302:

```python
class LocalFrontierManager(BaseContext, StrategyComponentsPipelineMixin, BaseManager):
    def page_crawled(self, response):
        ...
        self.states_context.to_fetch(response)  # here used b'fingerprint'
        self.states_context.fetch()
        self.states_context.states.set_states(response)
        super(LocalFrontierManager, self).page_crawled(response)  # but only here init!
        self.states_context.states.update_cache(response)
```

from line 233:

```python
class BaseManager(object):
    def page_crawled(self, response):
        ...
        # b'fingerprint' will be set when the pipeline goes through here
        self._process_components(method_name='page_crawled',
                                 obj=response,
                                 return_classes=self.response_model)
```
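The ordering problem can be reproduced in isolation. This is only a minimal sketch with stand-in classes: `FakeResponse`, `FakeStatesContext`, `fake_pipeline`, and `page_crawled_current_order` are hypothetical names, not Frontera APIs. It only mirrors the call order shown above.

```python
class FakeResponse:
    """Hypothetical stand-in for a response; only `url` and `meta` are modeled."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


class FakeStatesContext:
    """Hypothetical stand-in that mirrors the failing lookup in to_fetch."""

    def __init__(self):
        self._fingerprints = {}

    def to_fetch(self, response):
        # Raises KeyError if the fingerprint has not been set yet.
        fingerprint = response.meta[b'fingerprint']
        self._fingerprints[fingerprint] = response


def fake_pipeline(response):
    # Stand-in for the component pipeline that assigns the fingerprint.
    response.meta[b'fingerprint'] = b'abc123'


def page_crawled_current_order(response, ctx):
    ctx.to_fetch(response)   # the lookup happens first ...
    fake_pipeline(response)  # ... and the fingerprint is set only afterwards


def run_demo():
    """Return the missing key if the lookup fails, otherwise None."""
    try:
        page_crawled_current_order(FakeResponse("http://example.com"),
                                   FakeStatesContext())
    except KeyError as exc:
        return exc.args[0]
    return None
```

In this sketch the `KeyError` goes away if `fake_pipeline` runs before `to_fetch`. Whether reordering is safe in the real manager is a separate question that this sketch does not answer.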

My current workaround is to add a check to the `to_fetch` method of the `StatesContext` class:

```python
def to_fetch(self, requests):
    requests = requests if isinstance(requests, Iterable) else [requests]
    for request in requests:
        if b'fingerprint' not in request.meta:
            request.meta[b'fingerprint'] = sha1(request.url)
        fingerprint = request.meta[b'fingerprint']
        self._fingerprints[fingerprint] = request
```
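For reference, here is a self-contained sketch of the same fallback idea using only the standard library. `FakeRequest`, `fallback_fingerprint`, and `ensure_fingerprint` are hypothetical names, and the sketch assumes the fingerprint is the hex SHA-1 of the URL as bytes. Whether that matches what the pipeline would assign later depends on the configured fingerprint function, which is not modeled here.

```python
import hashlib


class FakeRequest:
    """Hypothetical stand-in for a request; only `url` and `meta` are modeled."""

    def __init__(self, url, meta=None):
        self.url = url
        self.meta = {} if meta is None else meta


def fallback_fingerprint(url):
    # hashlib.sha1 needs bytes on Python 3, so the URL is encoded first.
    # Assumption: the fingerprint is the hex digest, stored as bytes.
    return hashlib.sha1(url.encode("utf-8")).hexdigest().encode("ascii")


def ensure_fingerprint(request):
    """Return the request's fingerprint, computing a fallback if it is missing."""
    if b'fingerprint' not in request.meta:
        request.meta[b'fingerprint'] = fallback_fingerprint(request.url)
    return request.meta[b'fingerprint']
```

An existing fingerprint is never overwritten, so the fallback only applies to objects that reach `to_fetch` before the pipeline has processed them.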
What is the correct way to fix this?

yujiaao avatar Sep 04 '20 07:09 yujiaao