[Feature Request] If target selector does not exist
I think X-Target-Selector is a fantastic addition to the API, but it also introduces a maintenance burden that is currently hard to manage: CSS selectors can change. I wonder if it would make sense for reader to indicate in the response whether the provided selector actually matched anything?
From what I can tell, reader currently falls back silently to returning all the content from the target URL. This is fine, but an explicit indication that the selector did not match would let users react to it. I think this would be highly valuable, since sites change and it is currently hard to know when they do.
Hi @MaSchVam, thanks for the suggestion, and sorry for the late reply. It's indeed a nice idea.
Let me check what I can do.
@nomagick I guess we need to handle stream and non-stream responses separately. Here is what I'm thinking:
- For stream responses, send a separate event at the end to indicate whether the target selector matched.
- For non-stream responses, use a response header to indicate it.
What do you think?
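To make the two options concrete, here is a minimal sketch of how each signal could be produced. The header name `X-Target-Selector-Matched` and the event name `target-selector` are hypothetical, chosen only for illustration; nothing here reflects what reader actually emits. The stream variant assumes a server-sent-events style framing.

```typescript
// Hypothetical names: reader does not necessarily use this header or event.
const MATCH_HEADER = "X-Target-Selector-Matched";
const MATCH_EVENT = "target-selector";

// Non-stream response: expose the match result as an extra response header.
function nonStreamHeaders(matched: boolean): Record<string, string> {
  return { [MATCH_HEADER]: String(matched) };
}

// Stream response: build one final event, in server-sent-events framing,
// to be written after all content events have been sent.
function finalStreamEvent(matched: boolean): string {
  return `event: ${MATCH_EVENT}\ndata: ${JSON.stringify({ matched })}\n\n`;
}
```

A client could then check the header on ordinary requests, or look at the last event of a stream, and raise an alert when the selector stops matching.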
@mapleeit Perhaps we should return blank or a 422 error when the target selector does not match anything.
The idea of the target selector was to use the selected elements to replace the original DOM tree. Previously, I didn't give it much thought and simply left the tree untouched when no elements were found: https://github.com/jina-ai/reader/blob/7e6c2fcf485ed7b69c44ebd44ca7bd20077929f5/backend/functions/src/services/jsdom.ts#L86-L88
But as this issue suggested, it might be a more uniform behaviour to return blank or a 422 error, as if the page was blank.
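As a rough sketch of that behaviour, assuming the matched elements have already been collected as HTML strings: an empty match list yields a 422 result instead of falling back to the full page. The function and type names below are hypothetical and are not taken from the reader codebase or from the fixing commit.

```typescript
// Hypothetical result shape, for illustration only.
type SelectionResult =
  | { status: 200; html: string }
  | { status: 422; error: string };

// matchedHtml: the outerHTML of each element the selector matched.
function applyTargetSelector(
  matchedHtml: string[],
  selector: string,
): SelectionResult {
  // No match: report an error rather than silently returning the whole page.
  if (matchedHtml.length === 0) {
    return { status: 422, error: `No elements matched selector: ${selector}` };
  }
  // Otherwise the selected elements replace the original document content.
  return { status: 200, html: matchedHtml.join("\n") };
}
```

Returning a distinct status keeps the failure visible to callers, who can then update a stale selector instead of unknowingly processing the full page.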
Fixed by c36aa730b4aa4c369f93eab7411b071f34ac0fd5