eliaweiss
Yes, it solved the problem :) I would recommend: 1. adding it to the documentation, 2. throwing an exception if it is not set.
Here is some more info:

1. `max_pages` is ignored:

```py
max_pages = 10

# Configure a 2-level deep crawl
config = CrawlerRunConfig(
    semaphore_count=1,
    deep_crawl_strategy=BFSDeepCrawlStrategy(
        max_depth=10,
        include_external=False,
        # Maximum number of pages...
```
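For reference, here is roughly the full configuration with the page cap wired into the strategy. A sketch only: I'm assuming `max_pages` is a `BFSDeepCrawlStrategy` parameter and that `arun()` returns a list of results in this mode; imports may differ by crawl4ai version.

```py
# Sketch of the intended config: max_pages passed into the strategy itself.
import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy

max_pages = 10

config = CrawlerRunConfig(
    semaphore_count=1,  # serialize page work; see the workaround below
    deep_crawl_strategy=BFSDeepCrawlStrategy(
        max_depth=10,
        include_external=False,
        max_pages=max_pages,  # expected to cap the crawl at 10 pages
    ),
)

async def main():
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun("https://example.com", config=config)
        print(f"crawled {len(results)} pages")

asyncio.run(main())
```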
Although it shows 500 crawled pages, it only saves 250. Does it know how to handle repeated links?
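If repeated links are the cause of the gap, deduplicating by URL before saving is an easy way to confirm it; a minimal sketch (assuming each result object exposes a `.url` attribute):

```py
# Cheap sanity check: drop results whose URL has already been seen
# before saving them.
def dedupe_results(results):
    seen = set()
    unique = []
    for result in results:
        if result.url in seen:
            continue
        seen.add(result.url)
        unique.append(result)
    return unique
```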
It seems that I was able to suppress this issue by setting `semaphore_count=1`.
I'm pretty sure the problem is in Playwright/Chromium rather than crawl4ai, and that it is a resource problem. Note that a similar problem is reported on the Playwright project.
@aravindkarnam See this issue: https://github.com/microsoft/playwright/issues/13038 The error message is different, but in my log there were a ton of error messages, and later I realized that the first one was...
There were 2 *monkey patches* suggested in this thread; the second one got garbled, so here it is again:

```py
# browser_patch.py
"""
Monkey patch for fixing the browser closure issue...
```
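The snippet above is truncated, so here is an illustration of the general shape only (my sketch, not the exact patch from the thread): wrap Playwright's async `Browser.close()` so that closing a browser that has already gone away is swallowed instead of raising.

```py
# browser_patch_sketch.py -- illustrative only, NOT the original patch.
# Wraps playwright's async Browser.close() so that closing a browser that
# has already died no longer raises "Browser has been closed".
from playwright.async_api import Browser, Error as PlaywrightError

_original_close = Browser.close

async def _safe_close(self, **kwargs):
    try:
        await _original_close(self, **kwargs)
    except PlaywrightError as exc:
        # Only swallow the already-closed error; anything else is re-raised.
        if "has been closed" not in str(exc):
            raise

def apply_patch():
    """Call once at startup, before any crawls are launched."""
    Browser.close = _safe_close
```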
@aravindkarnam Pretty sure @viraj-lunani is hitting the same issue: it's that "browser has been closed" error again. Makes sense that with concurrent crawls on a singleton instance, you could...
@aravindkarnam @viraj-lunani Here's a semaphore-based solution. It should be thread-safe, but without more context it's hard to judge the risks. I had to use the semaphore in both the close and start...
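In outline it looks like this; a sketch only, assuming the shared crawler exposes async `start()` and `close()` methods:

```py
import asyncio

# One semaphore shared by everything that touches the singleton crawler.
_browser_lock = asyncio.Semaphore(1)

class GuardedCrawler:
    """Thin wrapper around the shared crawler instance (names illustrative)."""

    def __init__(self, crawler):
        self._crawler = crawler

    async def start(self):
        # Taking the lock here keeps start() from racing a close() that is
        # tearing down the same underlying browser.
        async with _browser_lock:
            await self._crawler.start()

    async def close(self):
        async with _browser_lock:
            await self._crawler.close()
```

Holding the same semaphore in both start and close is the point: it stops one task's close() from killing the browser while another task is still bringing it up.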