Scrapegraph-ai openai.ContentFilterFinishReasonError

Describe the bug: When scraping pages, I randomly get this error:

Could not parse response content as the request was rejected by the content filter

response = self.root_client.beta.chat.completions.parse(**payload)
│ │ │ │ │ │ └ {'messages': [{'content': '\nYou are a website scraper and you have just scraped the\nfollowing content from a website conver...
│ │ │ │ │ └ <function Completions.parse at 0x7f075673b640>
│ │ │ │ └ <openai.resources.beta.chat.completions.Completions object at 0x7f0731467310>
│ │ │ └ <openai.resources.beta.chat.chat.Chat object at 0x7f0731466fe0>
│ │ └ <openai.resources.beta.beta.Beta object at 0x7f0731466d40>
│ └ <openai.OpenAI object at 0x7f073110f010>
└ ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7f07314658a0>, async_client=<openai.resources.ch...
File "/root/.cache/pypoetry/virtualenvs/platform-backend-data-tq7C0_9c-py3.10/lib/python3.10/site-packages/openai/resources/beta/chat/completions.py", line 140, in parse
return _parse_chat_completion(
└ <function parse_chat_completion at 0x7f0756700d30>
File "/root/.cache/pypoetry/virtualenvs/platform-backend-data-tq7C0_9c-py3.10/lib/python3.10/site-packages/openai/lib/_parsing/_completions.py", line 75, in parse_chat_completion
raise ContentFilterFinishReasonError()
└ <class 'openai.ContentFilterFinishReasonError'>

openai.ContentFilterFinishReasonError: Could not parse response content as the request was rejected by the content filter

To Reproduce: Try to scrape this page:

https://news.bms.com/news/details/2016/First-Presentation-of-Two-Year-Overall-Survival-Data-for-Opdivo-nivolumab-in-Combination-with-Yervoy-ipilimumab-Showed-Superior-Efficacy-Versus-Yervoy-Alone-in-Advanced-Melanoma/default.aspx

Expected behavior: The extracted content is returned.

Oct 04 '24 19:10 matheus-rossi

Can you share the code, please?

Oct 07 '24 08:10 VinciGit00

@matheus-rossi Can you share the code, please?

Nov 04 '24 08:11 VinciGit00

I’ve also opened a ticket with OpenAI and received the following response:

We understand that you received an error message when using the API, and we apologize for any inconvenience this may have caused. This is not the experience we want you to have.

Our filter is designed to detect text that may be sensitive or unsafe. While it isn’t perfect and may occasionally flag content incorrectly, we’ve currently configured it to err on the side of caution, which can lead to more false positives. We’re actively working on improving these filters, and over time, this issue should occur less frequently.

Additionally, we use Moderation models to ensure that content complies with OpenAI’s Usage Policies. These models help classify content across several categories, including hate, threats, self-harm, sexual content, sexual content involving minors, violence, and graphic violence.

Essentially, they’re blocking the scraping because the content is deemed “not safe.”

You can close this issue, as the cause is related to OpenAI.
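
For anyone hitting this in a batch job: since the rejection happens on OpenAI's side, one workaround is to catch the exception per URL and skip the page instead of letting the whole run crash. Below is a minimal sketch, not something from the Scrapegraph-ai codebase. The exception class is the one named in the traceback above; `scrape_or_skip` and `run_scraper` are hypothetical names, and `run_scraper` stands for whatever callable wraps your own graph and returns its result.

```python
from typing import Callable, Optional

try:
    # The exception class named in the traceback above.
    from openai import ContentFilterFinishReasonError
except ImportError:
    # Stand-in so the sketch still runs where openai is not installed.
    class ContentFilterFinishReasonError(Exception):
        pass


def scrape_or_skip(run_scraper: Callable[[str], dict], url: str) -> Optional[dict]:
    """Run a scraper for one URL; return None if the content filter rejects it.

    `run_scraper` is a hypothetical callable supplied by the caller.
    """
    try:
        return run_scraper(url)
    except ContentFilterFinishReasonError:
        # Retrying the same payload is unlikely to change the filter's
        # decision, so report the URL and move on to the next page.
        print(f"content filter rejected: {url}")
        return None
```

Any other exception still propagates, so genuine bugs are not hidden by the skip logic.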

Nov 04 '24 11:11 matheus-rossi

OK, thank you.

Nov 04 '24 12:11 VinciGit00