python-user-agents

is_bot property is not implemented well

Open dobestan opened this issue 10 years ago • 5 comments

@property
def is_bot(self):
    return True if self.device.family == 'Spider' else False

Is it okay to update the current implementation of the is_bot property? Any ideas?

dobestan Aug 23 '15 13:08

Any suggestions? :)

selwin Aug 23 '15 23:08

@selwin Here is the full bot list: http://www.robotstxt.org/db/all.txt. Alternatively, there is also this Python lib: https://pypi.python.org/pypi/robot-detection

fabiocaccamo Jul 11 '16 10:07

I have a supporting test from a corpus of 211,000 UA strings. This is a rare 'miss', but the attached file shows cases where the word 'bot' appears in the user agent string of bots that the parser does not catch.

agents_not_is_bot.txt

Also, code review on the function:

return True if self.device.family == 'Spider' else False

can be written as:

return self.device.family == 'Spider'
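
To illustrate the kind of miss described above, here is a minimal sketch of a keyword fallback layered on top of the existing check. It is not the library's implementation: `looks_like_bot` and `BOT_KEYWORDS` are hypothetical names, and the keyword list is an assumption chosen for illustration.

```python
import re

# Hypothetical keyword list, chosen for illustration only.
BOT_KEYWORDS = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def looks_like_bot(device_family, ua_string):
    """Return True if the parser flagged a spider or the UA string contains a bot keyword."""
    # Same condition as the current is_bot property.
    if device_family == "Spider":
        return True
    # Fallback: plain keyword search over the raw user agent string.
    return bool(BOT_KEYWORDS.search(ua_string))
```

A plain substring match like this can also flag legitimate devices whose names happen to contain 'bot', so it would need testing against a real corpus before being relied on.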

elisarver Jul 22 '16 14:07

Has there been any more thought about this?

Do you (we) consider this to be the appropriate place for building out more accurate bot detection or is it the responsibility of ua_parser (and the regexes in uap-core)?

How about an attempt to integrate https://pypi.python.org/pypi/robot-detection in some way? It should be reasonably simple. The only difficulty I see is reconciling what ua-parser considers a 'spider' with what robot-detection considers a bot; no doubt they will diverge. Is that a problem?
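
One possible way to handle the divergence is to treat each source as an independent detector and choose explicitly how to combine the verdicts. The sketch below assumes nothing about either library's API: `combine_verdicts` and the two stand-in detectors are hypothetical and exist only for illustration.

```python
def combine_verdicts(ua_string, detectors, require_all=False):
    """Combine several detector callables into a single bot verdict."""
    verdicts = [bool(detect(ua_string)) for detect in detectors]
    if not verdicts:
        return False  # no detectors configured, so nothing is flagged
    # require_all=False: any detector may flag a bot (broader coverage).
    # require_all=True: every detector must agree (fewer false positives).
    return all(verdicts) if require_all else any(verdicts)


# Hypothetical stand-ins for the two sources discussed above.
def parser_says_spider(ua_string):
    return "spider" in ua_string.lower()


def list_says_bot(ua_string):
    return ua_string in {"ExampleBot/1.0"}
```

With these stand-ins, `combine_verdicts("ExampleBot/1.0", [parser_says_spider, list_says_bot])` flags the agent because one detector does, while passing `require_all=True` does not, which makes the disagreement between the two sources visible to the caller.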

robcowie Mar 26 '17 10:03

Bump on this. It would be nice if this were configurable so we could register our own bots.
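
As a rough sketch of what "configurable" could mean here, a small registry of user-supplied patterns would be enough. `BotRegistry` is a hypothetical name and is not part of python-user-agents.

```python
import re


class BotRegistry:
    """Holds user-registered regex patterns that identify bots."""

    def __init__(self):
        self._patterns = []

    def register(self, pattern):
        # Compile once at registration; matching is case-insensitive.
        self._patterns.append(re.compile(pattern, re.IGNORECASE))

    def matches(self, ua_string):
        # True if any registered pattern occurs in the user agent string.
        return any(p.search(ua_string) for p in self._patterns)
```

For example, after `registry.register(r"internal-monitor")`, a call to `registry.matches("Internal-Monitor/2.1")` returns True, and an is_bot check could consult the registry in addition to the 'Spider' test.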

cr0mbly Mar 16 '21 18:03