[Bug]: 'async for' requires an object with __aiter__ method, got CrawlResultContainer
crawl4ai version
0.6.3
Expected Behavior
With stream=True, arun() returns an asynchronous generator that can be iterated with async for.
Current Behavior
Hello. I started a web service and ran the deep crawl function repeatedly. After the service had been running for a while, a request for a URL that had previously deep-crawled its subpages normally failed with this error: 'async for' requires an object with __aiter__ method, got CrawlResultContainer. I checked beforehand: CrawlResultContainer behaves like a regular list, not an asynchronous generator. So why does arun() return an asynchronous generator (and crawl subpages properly) on earlier runs, but not later ones? Restarting the service makes the problem go away until the service has again been running for some time. Could the browser be failing to close properly and causing a resource leak? This still happens in 0.7.4.
from fastapi import FastAPI
from pydantic import BaseModel
from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode
from crawl4ai import CrawlerRunConfig
from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy

app = FastAPI()

async def get_crawler():
    # one browser instance per request; started here, closed in close_crawler()
    browser_conf = BrowserConfig(
        browser_type='chromium',
        headless=True,
        verbose=True,
        user_agent_generator_config={"mode": "random"},
        extra_args=["--disable-gpu", "--disable-dev-shm-usage", "--no-sandbox"],
    )
    crawler = AsyncWebCrawler(config=browser_conf)
    await crawler.start()
    return crawler

async def close_crawler(crawler):
    await crawler.close()

async def bfs_crawl(url):
    config = CrawlerRunConfig(
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=5,
            include_external=False,
            max_pages=3000
        ),
        scraping_strategy=LXMLWebScrapingStrategy(),
        stream=True,  # arun() should return an async generator in this mode
        verbose=True,
        cache_mode=CacheMode.BYPASS,
        page_timeout=50000,
        # page_timeout=5000,
        # excluded_tags=EXCLUDE_TAGS,
        check_robots_txt=True,
    )
    crawler = await get_crawler()
    results = []
    try:
        # this is the line that intermittently fails with
        # "'async for' requires an object with __aiter__ method, got CrawlResultContainer"
        async for result in await crawler.arun(url, config=config):
            results.append(result)
        print("bfs crawl finished")
    except Exception as e:
        print(f"bfs crawl error: {e}")
    finally:
        await close_crawler(crawler)
    return results

class CrawlRequest(BaseModel):
    url: str

@app.post("/bfs_crawl")
async def crawl(req: CrawlRequest):
    await bfs_crawl(req.url)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app="tt:app", host='0.0.0.0', port=8813, workers=3)
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
Linux
Python version
3.11.11
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
'async for' requires an object with __aiter__ method, got CrawlResultContainer
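For now I can avoid the crash with a defensive guard around arun(). This is only a sketch that assumes CrawlResultContainer iterates like a regular list (which matches what I observed), not an official API contract; it is a drop-in replacement for the async for loop in the snippet above:

result = await crawler.arun(url, config=config)
if hasattr(result, "__aiter__"):
    # streaming path: stream=True was honored and we got an async generator
    async for r in result:
        results.append(r)
else:
    # fallback path: we got a CrawlResultContainer, which iterates like a list
    results.extend(result)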
@yumingmin88, hi there, could you share the URL that causes this issue? I have been trying to reproduce it, but I have not encountered the problem.
@Ahmed-Tawfik94 I'm very sorry, but this problem cannot be reproduced reliably. I'll let you know the steps next time I manage to reproduce it.
Try this url: https://www.technogym.com/en-INT/
However, I don't think it's related to a specific URL, since the issue has occurred with multiple different URLs. I suspect the root cause is that, for some reason, the crawler fails to continue deep crawling into subpages, which prevents the asynchronous generator from being produced. When the service is running normally, URL A can be deep-crawled successfully down to its subpages. But after the service has been running continuously for some time, attempting to deep-crawl URL A again fails: the page at URL A itself is crawled successfully, but the crawler does not proceed to its subpages. Could the stream parameter be getting lost somewhere along the way?
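One way to narrow this down would be to log what arun() actually returns before iterating over it. A small diagnostic sketch (the type check is plain Python, nothing crawl4ai-specific):

result = await crawler.arun(url, config=config)
# If stream=True survived, this should print an async generator type that
# has __aiter__; if the parameter was lost, it prints CrawlResultContainer.
print(type(result).__name__, "has __aiter__:", hasattr(result, "__aiter__"))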
Recently, I've encountered an error like this:
Task exception was never retrieved
future: <Task finished name='Task-888150037' coro=<Connection.run.<locals>.init() done, defined at /usr/local/lib/python3.11/site-packages/playwright/_impl/_connection.py:276> exception=Exception('Connection.init: Connection closed while reading from the driver')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 277, in init
    self.playwright_future.set_result(await self._root_object.initialize())
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 219, in initialize
    await self._channel.send(
  File "/usr/local/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 528, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
Exception: Connection.init: Connection closed while reading from the driver
I'm sorry that I couldn't obtain more detailed information. Thank you
@Ahmed-Tawfik94 @yumingmin88 Any updates on this bug? I am getting the same issue.
@HG2407 @yumingmin88 Have you tried running this with the new release, v0.7.6?
Yes, I tried 0.7.6 and it still happens occasionally. At first I thought the problem was the way I initialized and destroyed the crawler, so I switched to crawler_pool.py instead, but the problem persists.
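For reference, the equivalent per-request lifecycle can also be written with the async context manager form that AsyncWebCrawler supports. A minimal sketch, with browser_conf and config the same objects as in the snippet in the issue body; the error shows up either way:

async def bfs_crawl(url):
    # the context manager starts the crawler on entry and closes it on exit,
    # replacing the explicit get_crawler()/close_crawler() pair
    async with AsyncWebCrawler(config=browser_conf) as crawler:
        async for result in await crawler.arun(url, config=config):
            print(result.url)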