
[Bug]: `css_selector` is not working properly in version 0.5.0.post8

Open maggie-edkey opened this issue 10 months ago • 1 comments

crawl4ai version

0.5.0.post8

Expected Behavior

After upgrading from v0.4.248 to v0.5.0.post8, I found that css_selector behaves differently than before. What I expected:

  1. CSS selectors that contain quotes, such as div[data-testid='ArticleBody'], should work
  2. All matching elements should be returned

Current Behavior

  1. When the css_selector contains single quotes, for example div[data-testid='ArticleBody'], an error occurs:
Warning: Could not get content for selector 'div[data-testid='ArticleBody']': Page.evaluate: SyntaxError: missing ) after argument list
    at eval (<anonymous>)
    at UtilityScript.evaluate (<anonymous>:234:30)
    at UtilityScript.<anonymous> (<anonymous>:1:44)
  2. Only the first element matching the CSS selector is returned, rather than all matching elements.
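
The SyntaxError in item 1 appears to come from the selector being interpolated into a single-quoted JavaScript string. The following is a minimal sketch that reproduces only the string-building step, reusing the f-string shape of the affected line in async_crawler_strategy.py as quoted in this issue. No browser is involved, and the variable names `script` and `first_literal` are illustrative, not taken from the library.

```python
# Selector from the report; it contains single quotes.
selector = "div[data-testid='ArticleBody']"

# Same f-string shape as the affected line: the selector is wrapped in
# single quotes inside the JavaScript source.
script = f"document.querySelector('{selector}')?.outerHTML || ''"
print(script)
# document.querySelector('div[data-testid='ArticleBody']')?.outerHTML || ''

# The JavaScript string literal ends at the first inner single quote,
# so the engine sees a truncated argument followed by stray tokens.
first_literal = script.split("'")[1]
print(first_literal)
# div[data-testid=
```

Because the string literal closes early, the rest of the selector is parsed as code, which is consistent with the "missing ) after argument list" message above.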

Is this reproducible?

Yes

Inputs Causing the Bug

These two issues can be resolved by modifying the relevant code in async_crawler_strategy.py like this:

821c821,826
<                             content = await page.evaluate(f"document.querySelector('{selector}')?.outerHTML || ''")
---
>                             # Use querySelectorAll to get all elements matching the selector
>                             # Use double quotes to wrap the selector, so the single quotes inside it won't cause problems
>                             content = await page.evaluate(
>                                 f"""Array.from(document.querySelectorAll("{selector}"))
>                                     .map(el => el.outerHTML)
>                                     .join('')""")
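
Wrapping the selector in double quotes fixes the reported case, but a selector that itself contains double quotes (for example a[title="x"]) would presumably break in the same way. One possible hardening, shown here as a sketch rather than as the project's actual fix, is to let `json.dumps` produce the JavaScript string literal. The helper name `build_outer_html_script` is hypothetical.

```python
import json


def build_outer_html_script(selector: str) -> str:
    """Build a JS expression that joins the outerHTML of every match.

    json.dumps emits a double-quoted literal with inner double quotes and
    backslashes escaped, which is also a valid JavaScript string literal.
    """
    js_selector = json.dumps(selector)
    return (
        f"Array.from(document.querySelectorAll({js_selector}))"
        ".map(el => el.outerHTML).join('')"
    )


# Single quotes pass through unchanged inside the double-quoted literal.
print(build_outer_html_script("div[data-testid='ArticleBody']"))

# Double quotes are escaped instead of terminating the literal.
print(build_outer_html_script('a[title="x"]'))
```

The resulting string could be passed to `page.evaluate(...)` in place of the f-string in the patch above. Playwright's `page.evaluate` also accepts an argument after the expression, so another option is to pass the selector as data, e.g. `await page.evaluate("sel => Array.from(document.querySelectorAll(sel)).map(el => el.outerHTML).join('')", selector)`, which avoids string interpolation altogether.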

I intended to submit a pull request, but couldn't find the code branch corresponding to v0.5.0.post8.

Steps to Reproduce


Code snippets


OS

macOS

Python version

3.12.9

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

maggie-edkey, Mar 29 '25 05:03

I intended to submit a pull request, but couldn't find the code branch corresponding to v0.5.0.post8.

2025-MAR-ALPHA-1

prokopis3, Mar 29 '25 06:03