
langchain_core.exceptions.OutputParserException: Invalid json output: I'm ready to answer your question about the content I've scraped. What would you like to know?

mberman84 opened this issue 1 year ago · 2 comments

Most of the time when I use a Groq model, this is the output I get:

...
  File "/Users/matthewberman/miniconda3/envs/scrape/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 1626, in _call_with_config
    context.run(
  File "/Users/matthewberman/miniconda3/envs/scrape/lib/python3.10/site-packages/langchain_core/runnables/config.py", line 347, in call_func_with_variable_args
    return func(input, **kwargs)  # type: ignore[call-arg]
  File "/Users/matthewberman/miniconda3/envs/scrape/lib/python3.10/site-packages/langchain_core/output_parsers/base.py", line 170, in <lambda>
    lambda inner_input: self.parse_result(
  File "/Users/matthewberman/miniconda3/envs/scrape/lib/python3.10/site-packages/langchain_core/output_parsers/json.py", line 69, in parse_result
    raise OutputParserException(msg, llm_output=text) from e
langchain_core.exceptions.OutputParserException: Invalid json output: I'm ready to answer your question about the content I've scraped. What would you like to know?

Sometimes, after this, I get actual valid JSON, but most of the time I don't.

Here's my scraping code:

from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "groq/llama3-8b-8192",
        "api_key": "XXX",
        "temperature": 0
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",  # set Ollama URL
    },
    "verbose": True,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me ALL the AI projects with their descriptions",
    # also accepts a string with the already downloaded HTML code
    source="https://www.github.com/trending",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)
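
Since the failure is intermittent, the best workaround I can think of is retrying the run and catching the parser error, roughly along these lines (just a sketch, continuing from the snippet above; it assumes the OutputParserException from the traceback propagates out of run()):

from langchain_core.exceptions import OutputParserException

result = None
for attempt in range(3):  # the failure is intermittent, so retry a few times
    try:
        # rebuild the graph each attempt to start from a clean state
        smart_scraper_graph = SmartScraperGraph(
            prompt="List me ALL the AI projects with their descriptions",
            source="https://www.github.com/trending",
            config=graph_config,
        )
        result = smart_scraper_graph.run()
        break
    except OutputParserException as exc:
        print(f"Attempt {attempt + 1} failed: {exc}")

print(result)

But that's obviously more of a band-aid than a fix.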

mberman84 · May 16 '24 19:05

I'm having issues when prompting it to open a Google Maps link.

looniegroup · May 16 '24 20:05

Judging from the error message, it looks like the LLM replied conversationally (saying it's ready to analyze the scraped data) instead of outputting JSON, and that reply tripped up the LangChain JSON output parser.

If that's the case, it would be an error on the part of the LLM rather than the library itself.

A better prompt inside the nodes might fix this, but LLMs always bring a degree of non-determinism, even with the temperature set to 0, and especially when the same node prompt has to work across models from different providers.
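
For what it's worth, that failure mode can be reproduced in isolation with LangChain's JSON parser, which shows it's the conversational reply itself that breaks parsing (minimal sketch; the example strings are made up):

from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers import JsonOutputParser

parser = JsonOutputParser()

# A well-formed reply parses fine:
print(parser.parse('{"projects": [{"title": "example", "description": "..."}]}'))

# A chat-like reply, like the one in the traceback above, raises the same error:
try:
    parser.parse("I'm ready to answer your question about the content I've scraped.")
except OutputParserException as exc:
    print(f"Parser rejected the output: {exc}")

On the library side, something like LangChain's OutputFixingParser, which re-asks the model to repair malformed output, could be one mitigation, at the cost of an extra LLM call whenever parsing fails.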

f-aguzzi · May 17 '24 08:05