crawl4ai
[Bug]: `css_selector` is not working properly in version 0.5.0.post8
crawl4ai version
0.5.0.post8
Expected Behavior
After upgrading from v0.4.248 to v0.5.0.post8, I found that `css_selector` behaves differently than before. What I expected:
- CSS selectors such as `div[data-testid='ArticleBody']` should work.
- All matching elements should be returned.
Current Behavior
- When `css_selector` contains single quotes, for example `div[data-testid='ArticleBody']`, the following error occurs:
Warning: Could not get content for selector 'div[data-testid='ArticleBody']': Page.evaluate: SyntaxError: missing ) after argument list
at eval (<anonymous>)
at UtilityScript.evaluate (<anonymous>:234:30)
at UtilityScript.<anonymous> (<anonymous>:1:44)
- Only the first element matching the selector is returned, rather than all matching elements.
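A minimal reproduction of the quoting failure, using only Python string formatting. The 0.5.0.post8 line (the `<` line of the diff under Inputs Causing the Bug) interpolates the selector into a single-quoted JavaScript string literal with an f-string, so the first `'` inside the selector ends that literal early. Splitting on `'` is a deliberate simplification that ignores escape sequences, which is enough for this selector.

```python
selector = "div[data-testid='ArticleBody']"

# Same f-string template as the old line in async_crawler_strategy.py.
js = f"document.querySelector('{selector}')?.outerHTML || ''"

# What JavaScript treats as the first string literal: everything up to the
# next single quote, i.e. only part of the selector. The leftover
# ArticleBody']' tokens are what trigger "missing ) after argument list".
first_literal = js.split("'")[1]

print(js)
print(first_literal)  # div[data-testid=
```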
Is this reproducible?
Yes
Inputs Causing the Bug
Both issues can be resolved by changing the relevant code in `async_crawler_strategy.py` as follows:
821c821,826
< content = await page.evaluate(f"document.querySelector('{selector}')?.outerHTML || ''")
---
> # Use querySelectorAll to get all elements matching the selector
> # Use double quotes to wrap the selector, so the single quotes inside it won't cause problems
> content = await page.evaluate(
> f"""Array.from(document.querySelectorAll("{selector}"))
> .map(el => el.outerHTML)
> .join('')""")
I intended to submit a pull request, but couldn't find a branch corresponding to v0.5.0.post8.
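Wrapping the selector in double quotes, as in the patch above, moves the problem rather than removing it: a selector that itself contains double quotes, such as `a[title="Home"]`, would break the same way, and backslash escapes in CSS selectors would be reinterpreted by JavaScript. A sketch of a more robust variant, assuming only the Python standard library, serializes the selector with `json.dumps`, whose default (ASCII-escaped) output is a valid JavaScript string literal. `build_selector_js` is a hypothetical helper name, not part of crawl4ai.

```python
import json


def build_selector_js(selector: str) -> str:
    """Hypothetical helper: build a JS expression that returns the outerHTML
    of every element matching `selector`, concatenated.

    json.dumps escapes quotes, backslashes and control characters, so the
    selector always becomes one well-formed JavaScript string literal.
    """
    literal = json.dumps(selector)
    return (
        f"Array.from(document.querySelectorAll({literal}))"
        ".map(el => el.outerHTML)"
        ".join('')"
    )


for sel in [
    "div[data-testid='ArticleBody']",  # single quotes (the reported case)
    'a[title="Home"]',                 # double quotes would break the patch
    "#main\\:content",                 # CSS escape with a backslash
]:
    print(build_selector_js(sel))
```

Alternatively, Playwright's `page.evaluate(expression, arg)` accepts a serializable argument, so passing `selector` as `arg` to a function expression such as `sel => Array.from(document.querySelectorAll(sel)).map(el => el.outerHTML).join('')` avoids building JavaScript source from the selector at all; that is probably the cleaner option inside the crawler itself.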
Steps to Reproduce
Code snippets
OS
macOS
Python version
3.12.9
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response