[Bug]: A gist file link in the documentation returns a 404 error (sample_ecommerce.html)

Open weykon opened this issue 10 months ago • 3 comments

crawl4ai version

Documentation website

Expected Behavior

The link opens the sample HTML file.

Current Behavior

404 error

Is this reproducible?

Yes

Inputs Causing the Bug

While learning how to write a JSON schema, I got stuck and wanted to work through the entire example.

[post](https://docs.crawl4ai.com/extraction/no-llm-strategies/)

[404 error link](https://gist.githubusercontent.com/githubusercontent/2d7b8ba3cd8ab6cf3c8da771ddb36878/raw/1ae2f90c6861ce7dd84cc50d3df9920dee5e1fd2/sample_ecommerce.html)

Steps to Reproduce

1. Open the [post](https://docs.crawl4ai.com/extraction/no-llm-strategies/)

2. Find and open the [404 error link](https://gist.githubusercontent.com/githubusercontent/2d7b8ba3cd8ab6cf3c8da771ddb36878/raw/1ae2f90c6861ce7dd84cc50d3df9920dee5e1fd2/sample_ecommerce.html) on that page

Code snippets
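
The issue does not depend on any crawl4ai code, but the broken link can be checked from a script. The following is a minimal sketch using only the Python standard library; `link_status` and its `opener` parameter are hypothetical names introduced for illustration, and it assumes the server answers `HEAD` requests.

```python
import urllib.error
import urllib.request


def link_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for `url`, or None if the host is unreachable.

    `opener` is injectable (an assumption of this sketch) so the function
    can be exercised without network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, and similar transport errors.
        return None
```

Calling `link_status(...)` with the gist URL from this report returns whatever status the server currently sends; at the time of the report the link gave a 404.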


OS

macOS

Python version

fine

Browser

chrome

Browser version

No response

Error logs & Screenshots (if applicable)

No response

weykon, Mar 24 '25 07:03

@weykon I'm unable to understand what exactly the issue is here. Could you share a code snippet for the issue you are facing?

aravindkarnam, Mar 25 '25 06:03

In the official documentation at https://docs.crawl4ai.com/extraction/no-llm-strategies/, section 3, "Advanced Schema & Nested Structures", links to an HTML example, but that link is currently broken. (I assume the file used to exist.)

weykon, Mar 25 '25 07:03

@Ahmed-Tawfik94 We need to update the documentation here: https://docs.crawl4ai.com/extraction/no-llm-strategies/#3-advanced-schema-nested-structures

ntohidi, Nov 14 '25 11:11