Invalid package entries in seed file

Open letmaik opened this issue 8 months ago • 10 comments

I'm not sure if this is a known issue, but the seed_2025_05_16.json file has a number of entries whose "packages" field points to a nonexistent package or uses an invalid package name, see below:

Not on PyPI registry

  • mcp-server-box
  • adfinmcp
  • mcp-aiven
  • inkeep-mcp-server
  • needle-mcp
  • ai-agent-marketplace-index-mcp
  • mcp-server-gravitino
  • mcp-ical
  • aws-cost-explorer-mcp
  • cfbd-mcp-server
  • chess-mcp
  • dify-mcp-server
  • mcp-local-rag
  • membase-mcp
  • qgis-mcp
  • mcp-server-rabbitmq
  • servicenow-mcp
  • videocapture-mcp

Not on npm registry

  • redis-mcp-server
  • mcp-server-applescript
  • crypto-sentiment-mcp
  • heurist-agent-framework
  • mcp-teams-server
  • whale-tracker-mcp
  • my-docs-site

Invalid package name

  • agentrpc/agentrpc
  • ppl-ai/modelcontextprotocol
  • riza-io/riza-mcp
  • zenml-io/mcp-zenml
  • baidubce/app-builder
  • samuelgursky/davinci-resolve-mcp
  • SaseQ/discord-mcp
  • stefanoamorelli/hyprmcp
  • lamaalrajih/kicad-mcp
  • TBXark/mcp-proxy
  • wilsonchenghy/ShaderToy-MCP
  • wanaku-ai/wanaku
  • jaw9c/awesome-remote-mcp-servers
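A check along these lines could reproduce the three lists above. This is a minimal sketch, not the script used for the report: the `classify` function, its `registry` strings, and the simplified npm name pattern are assumptions made for illustration. It relies on the public endpoints `https://pypi.org/pypi/<name>/json` and `https://registry.npmjs.org/<name>`, which return 404 for unknown packages.

```python
import re
import urllib.error
import urllib.request

# Project-name pattern from the Python packaging spec (case-insensitive).
PYPI_NAME = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)
# Simplified npm rule (assumption): optional "@scope/" prefix, lowercase,
# URL-safe characters, no leading "." or "_".
NPM_NAME = re.compile(r"^(@[a-z0-9~-][a-z0-9._~-]*/)?[a-z0-9~-][a-z0-9._~-]*$")


def http_status(url):
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def classify(registry, name, get_status=http_status):
    """Return 'invalid-name', 'missing', 'ok', or 'unknown' for one entry.

    registry is 'pypi' or 'npm' (hypothetical labels, not the seed schema).
    """
    pattern = PYPI_NAME if registry == "pypi" else NPM_NAME
    if not pattern.match(name):
        # e.g. a GitHub "owner/repo" slug used where a package name belongs
        return "invalid-name"
    if registry == "pypi":
        url = f"https://pypi.org/pypi/{name}/json"
    else:
        # Percent-encode the slash of a scoped name in the registry URL.
        url = "https://registry.npmjs.org/" + name.replace("/", "%2F")
    status = get_status(url)
    if status == 200:
        return "ok"
    if status == 404:
        return "missing"
    return "unknown"
```

Calling `classify("npm", "agentrpc/agentrpc")` returns `"invalid-name"` without touching the network, because an unscoped npm name cannot contain a slash. The `get_status` parameter exists only so the lookup can be stubbed out when testing offline.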

letmaik avatar May 18 '25 10:05 letmaik

I don't have prior context, but assuming the seed is scraped data, many more of these servers probably do not have a valid npmjs package.

One of the bigger problems for the Glama registry was that many servers that have a package.json or pyproject.toml are simply not published to the registry or, even worse, use the names of already published, unrelated packages.

We use a combination of scraping and LLMs to determine when it is safe to link to npmjs, but it is not perfect.

punkpeye avatar May 18 '25 10:05 punkpeye

You could also scrape https://glama.ai/mcp/reference. We provide all of the MCP registry data for free to the community.

punkpeye avatar May 18 '25 11:05 punkpeye

@punkpeye https://glama.ai/api/mcp/v1/servers returns 500 Internal Server Error, does it need some API token?

letmaik avatar May 18 '25 15:05 letmaik

Oops. Looks like one of the server URLs broke Zod validation and made the endpoint unavailable.

Fixed and added resilience, so that a single server validation failure doesn't nuke everything else.
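The general idea of that kind of resilience can be sketched as per-item validation that collects failures instead of aborting the whole response. The Glama endpoint uses Zod; the version below is plain Python with a hypothetical `validate_server` check and made-up field names, so it illustrates the pattern rather than the actual fix.

```python
from urllib.parse import urlparse


def validate_server(entry):
    """Hypothetical validator: raise ValueError if an entry is malformed."""
    name = entry.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("missing name")
    url = entry.get("url")
    parts = urlparse(url) if isinstance(url, str) else None
    if parts is None or parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"invalid url: {url!r}")
    return {"name": name, "url": url}


def validate_all(entries):
    """Validate entries one by one; record failures and keep going."""
    valid, errors = [], []
    for index, entry in enumerate(entries):
        try:
            valid.append(validate_server(entry))
        except ValueError as err:
            # One bad entry is reported but does not fail the whole batch.
            errors.append((index, str(err)))
    return valid, errors
```

With this shape, the endpoint can serve the `valid` list and log `errors` separately, so one malformed server URL no longer produces a 500 for every caller.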

punkpeye avatar May 18 '25 16:05 punkpeye

@punkpeye Thanks, I'll have a look. For context, I've created https://letmaik.github.io/mcp-provenance-monitor/, which illustrates the current state of supply chain provenance for MCP servers. To get to the provenance in the first place, I needed the npm/PyPI package names.

letmaik avatar May 18 '25 17:05 letmaik

We are in the middle of partnering with one of the better known security vendors to provide related insights (supply chain integrity).

I don't think we will be able to pass down all of these insights through the API, but there is probably some high-level summary that we can expose.

I am not super familiar with npm provenance, but the fact that I am not says something about its popularity. I am not sure how valuable these insights are if the vast majority of dependencies are not covered.

punkpeye avatar May 18 '25 17:05 punkpeye

Good to hear. It's a chicken-and-egg problem. Because there is no enforcement in clients yet (understandably) AND provenance doesn't provide immediate value, developers don't care much, but without reaching critical mass, clients will never be able to enforce it. The fact that provenance is still marked as experimental in both PyPI and npm also doesn't help. I think we'll get there eventually though. Related: https://github.com/modelcontextprotocol/modelcontextprotocol/issues/526#issuecomment-2889133104

letmaik avatar May 18 '25 18:05 letmaik

Thanks for flagging this. @sridharavinash @toby can confirm, but I believe this is intended to be example data that will never be added to production.

The official registry should be opt-in. Rather than putting seed data into production, I'm more inclined to do something like "for a period of 1-2 weeks, we are accepting publish requests in anticipation of opening read requests as an official launch on date X".

To solve the problem of incorrect package references, we should likely require a reference to repository.url to be present on the corresponding registry entry (e.g. the npm package is published with metadata indicating that it corresponds to GitHub URL XYZ). We should verify that this is possible on all the major registries, but I know it is on at least PyPI and npmjs.
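A rough sketch of such a cross-check, assuming the package metadata has already been fetched as JSON: for npm it reads the `repository` field of the package metadata, and for PyPI the `info.project_urls` mapping from the JSON API. The helper names and the `registry` labels are hypothetical, and shorthand forms such as `github:owner/repo` are not handled.

```python
import re


def normalize_repo(url):
    """Reduce a GitHub URL variant to lowercase 'owner/repo', or None."""
    if not url:
        return None
    match = re.search(r"github\.com[/:]([^/\s]+)/([^/\s#?]+)", url, re.IGNORECASE)
    if not match:
        return None
    owner, repo = match.group(1), match.group(2)
    if repo.endswith(".git"):
        repo = repo[:-4]
    return f"{owner}/{repo}".lower()


def npm_repo(metadata):
    """Repository declared in npm metadata (string or {'url': ...} form)."""
    repository = metadata.get("repository")
    if isinstance(repository, dict):
        repository = repository.get("url")
    return normalize_repo(repository) if isinstance(repository, str) else None


def pypi_repos(metadata):
    """All GitHub repositories listed under info.project_urls on PyPI."""
    urls = (metadata.get("info") or {}).get("project_urls") or {}
    return {repo for repo in map(normalize_repo, urls.values()) if repo}


def matches_claim(registry, metadata, claimed_url):
    """True if the package metadata points back at the claimed repository."""
    claimed = normalize_repo(claimed_url)
    if claimed is None:
        return False
    if registry == "npm":
        return npm_repo(metadata) == claimed
    return claimed in pypi_repos(metadata)
```

Both fields are declared by the publisher, so a match only shows that the package points at the repository. It does not prove that the repository owner published the package, which is the gap provenance attestations are meant to close.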

tadasant avatar May 19 '25 15:05 tadasant

> Thanks for flagging this. @sridharavinash @toby can confirm, but I believe this is intended to be example data that will never be added to production.

+1, the seed tooling is rather crude, as @punkpeye mentioned; this is just a way to get data into the server to help us shape the API data. More than happy to include better seed tooling if you want to add it to this repo. Currently the seed JSON is generated by parsing modelcontextprotocol/servers and using the GitHub APIs to augment some of the data.

sridharavinash avatar May 19 '25 16:05 sridharavinash

> > Thanks for flagging this. @sridharavinash @toby can confirm, but I believe this is intended to be example data that will never be added to production.

> +1, the seed tooling is rather crude, as @punkpeye mentioned; this is just a way to get data into the server to help us shape the API data. More than happy to include better seed tooling if you want to add it to this repo. Currently the seed JSON is generated by parsing modelcontextprotocol/servers and using the GitHub APIs to augment some of the data.

Happy for Glama to be the source. It could be a symbiotic relationship.

We spend thousands of compute hours monitoring GitHub for MCP mentions and indexing repositories, their content (we record every commit and the associated blobs), GitHub stars, and followers. We even use LLMs to analyze the composition of the registry and the contents of README.md. On top of that, we generate and build Docker containers and run introspection queries to extract tools, resources, etc.

punkpeye avatar May 19 '25 17:05 punkpeye