Remote Sync Strips Trailing Forward Slash - Results in 404
Summary
I have two Pulp servers: one serves as the primary (where I control package promotion, etc.) and the other simply syncs repositories from the primary. I am using nginx as my reverse proxy and am also using certguard (though I don't think this has an impact here). The problem is that my secondary server's rpm remote points to the primary's distribution and includes a trailing slash, but pulp rpm sync seems to strip that slash, resulting in 404s.
Steps to reproduce
[root@primary ~]# pulp rpm distribution show --name application-eng
{
"pulp_href": "/pulp/api/v3/distributions/rpm/rpm/ea16f53b-8a78-4395-8935-eaa7d96f06c7/",
"pulp_created": "2023-08-09T15:28:09.236805Z",
"base_path": "application-x86_64-eng",
"base_url": "https://primary.env.company.com/pulp/content/application-x86_64-eng/",
"content_guard": "/pulp/api/v3/contentguards/certguard/x509/b0fa42ca-5eff-4c0d-a53c-dfadf9268ae5/",
"pulp_labels": {},
"name": "application-eng",
"repository": null,
"publication": "/pulp/api/v3/publications/rpm/rpm/5f217017-ab21-4ed6-8b5e-96afab630321/"
}
[root@secondary ~]# pulp rpm remote show --name application-uri-eng
{
"pulp_href": "/pulp/api/v3/remotes/rpm/rpm/8c4ee2f0-e095-4967-bc48-fa1a041d37e2/",
"pulp_created": "2023-09-01T14:12:29.471030Z",
"name": "application-uri-eng",
"url": "https://primary.env.company.com/pulp/content/application-x86_64-eng/",
"ca_cert": <redacted>,
"client_cert": <redacted>,
"tls_validation": true,
"proxy_url": null,
"pulp_labels": {},
"pulp_last_updated": "2023-09-01T14:29:33.775680Z",
"download_concurrency": null,
"max_retries": null,
"policy": "on_demand",
"total_timeout": null,
"connect_timeout": null,
"sock_connect_timeout": null,
"sock_read_timeout": null,
"headers": null,
"rate_limit": null,
"hidden_fields": [
{
"name": "client_key",
"is_set": true
},
{
"name": "proxy_username",
"is_set": false
},
{
"name": "proxy_password",
"is_set": false
},
{
"name": "username",
"is_set": false
},
{
"name": "password",
"is_set": false
}
],
"sles_auth_token": null
}
[root@secondary ~]# pulp rpm repository show --name application-eng
{
"pulp_href": "/pulp/api/v3/repositories/rpm/rpm/36a955a4-f5d6-4b48-947f-46edbe396201/",
"pulp_created": "2023-09-01T14:12:31.994781Z",
"versions_href": "/pulp/api/v3/repositories/rpm/rpm/36a955a4-f5d6-4b48-947f-46edbe396201/versions/",
"pulp_labels": {},
"latest_version_href": "/pulp/api/v3/repositories/rpm/rpm/36a955a4-f5d6-4b48-947f-46edbe396201/versions/0/",
"name": "application-eng",
"description": "Mirror for: https://primary.env.company.com/pulp/content/application-x86_64-eng/",
"retain_repo_versions": null,
"remote": "/pulp/api/v3/remotes/rpm/rpm/8c4ee2f0-e095-4967-bc48-fa1a041d37e2/",
"autopublish": true,
"metadata_signing_service": null,
"retain_package_versions": 0,
"metadata_checksum_type": null,
"package_checksum_type": null,
"gpgcheck": 0,
"repo_gpgcheck": 0,
"sqlite_metadata": false
}
[root@secondary ~]# pulp rpm repository sync --name application-eng
Started background task /pulp/api/v3/tasks/676d1531-ba40-43f6-9c76-48bb3f4c4f87/
Error: Task /pulp/api/v3/tasks/676d1531-ba40-43f6-9c76-48bb3f4c4f87/ failed: '404, message='Not Found', url=URL('https://primary.env.company.com/pulp/content/application-x86_64-eng')'
[root@secondary ~]# curl --key pulp.key --cert pulp.pem https://primary.env.company.com/pulp/content/application-x86_64-eng/
<html>
<head><title>Index of /pulp/content/application-x86_64-eng/</title></head>
<body bgcolor="white">
<h1>Index of /pulp/content/application-x86_64-eng/</h1>
<hr><pre><a href="../">../</a>
<a href="Packages/">Packages/</a> 29-Jun-2022 03:54
<a href="config.repo">config.repo</a>
<a href="repodata/">repodata/</a> 25-Aug-2023 15:39
</pre><hr></body>
</html>
Expected behavior
I expected the call to pulp rpm repository sync on the secondary to retain the trailing forward slash as defined in the rpm remote.
Stacktrace/Error log
[root@primary ~]# tail -n2 /var/log/nginx/access.log
XXX.XXX.XXX.XXX - - [01/Sep/2023:15:19:58 +0000] "GET /pulp/content/application-x86_64-eng HTTP/1.1" 404 14 "-" "pulpcore/3.22.1 (cpython 3.8.11-final0, Linux x86_64) (aiohttp 3.8.1)"
XXX.XXX.XXX.XXX - - [01/Sep/2023:15:21:33 +0000] "GET /pulp/content/application-x86_64-eng/ HTTP/1.1" 200 649 "-" "curl/7.29.0"
Pulp and pulp-cli version info
[root@primary ~]# pulp status
{
"versions": [
{
"component": "core",
"version": "3.22.1",
"package": "pulpcore"
},
{
"component": "rpm",
"version": "3.19.7",
"package": "pulp-rpm"
},
...
[root@secondary ~]# pulp --version
pulp3 command line interface, version 0.19.2
Additional context
It looks to me like the remote is configured correctly. And since the trailing slash is part of the url there, I cannot see that the cli is to blame either. Would you be able to provide a full stacktrace of this failure?
You should get that either from pulp task show or from the server logs.
Sure thing, thanks for looking at this with me.
[root@secondary ~]# pulp task show --href /pulp/api/v3/tasks/676d1531-ba40-43f6-9c76-48bb3f4c4f87/
{
"pulp_href": "/pulp/api/v3/tasks/676d1531-ba40-43f6-9c76-48bb3f4c4f87/",
"pulp_created": "2023-09-01T15:19:58.646463Z",
"state": "failed",
"name": "pulp_rpm.app.tasks.synchronizing.synchronize",
"logging_cid": "6cc165caf2204c5b97bf55399a0b057b",
"started_at": "2023-09-01T15:19:58.769975Z",
"finished_at": "2023-09-01T15:19:59.018980Z",
"error": {
"traceback": " File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulpcore/tasking/pulpcore_worker.py\", line 444, in _perform_task\n result = func
(*args, **kwargs)\n File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulp_rpm/app/tasks/synchronizing.py\", line 486, in synchronize\n remote_url = f
etch_remote_url(remote, url)\n File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulp_rpm/app/tasks/synchronizing.py\", line 305, in fetch_remote_url\n
remote_url = fetch_mirror(remote)\n File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulp_rpm/app/tasks/synchronizing.py\", line 254, in fetch_mirror\
n result = downloader.fetch()\n File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulpcore/download/base.py\", line 175, in fetch\n return done.pop
().result()\n File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulpcore/download/http.py\", line 273, in run\n return await download_wrapper()\n Fil
e \"/usr/local/lib/pulp/lib64/python3.8/site-packages/backoff/_async.py\", line 151, in retry\n ret = await target(*args, **kwargs)\n File \"/usr/local/lib/p
ulp/lib64/python3.8/site-packages/pulpcore/download/http.py\", line 258, in download_wrapper\n return await self._run(extra_data=extra_data)\n File \"/usr/lo
cal/lib/pulp/lib64/python3.8/site-packages/pulp_rpm/app/downloaders.py\", line 117, in _run\n self.raise_for_status(response)\n File \"/usr/local/lib/pulp/li
b64/python3.8/site-packages/pulp_rpm/app/downloaders.py\", line 102, in raise_for_status\n response.raise_for_status()\n File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/aiohttp/client_reqrep.py\", line 1004, in raise_for_status\n raise ClientResponseError(\n",
"description": "404, message='Not Found', url=URL('https://primary.env.company.com/pulp/content/application-x86_64-eng')"
},
"worker": "/pulp/api/v3/workers/0cd785be-c0ad-4415-89e9-6a7c9d6161fb/",
"parent_task": null,
"child_tasks": [],
"task_group": null,
"progress_reports": [],
"created_resources": [],
"reserved_resources_record": [
"/pulp/api/v3/repositories/rpm/rpm/36a955a4-f5d6-4b48-947f-46edbe396201/",
"shared:/pulp/api/v3/remotes/rpm/rpm/8c4ee2f0-e095-4967-bc48-fa1a041d37e2/"
]
}
@dralley Does this look familiar to you? Would you agree to reassigning this issue to pulp_rpm?
I'm fine with reassigning it to pulp_rpm. I agree that it probably isn't a CLI issue, at least.
It appears that this line of code in synchronizing.py is the culprit. What is the purpose of stripping out the trailing forward slash deliberately?
downloader = remote.get_downloader(url=remote.url.rstrip("/"), urlencode=False)
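To illustrate what that call does to the remote URL from this report, here is a small standalone sketch. It is plain Python, not pulp_rpm code, and the variable names are only for illustration:

```python
# Standalone illustration of str.rstrip("/") on the remote URL from this issue.
remote_url = "https://primary.env.company.com/pulp/content/application-x86_64-eng/"

# rstrip("/") removes every trailing "/" character, not just one.
stripped = remote_url.rstrip("/")

print(stripped)
# https://primary.env.company.com/pulp/content/application-x86_64-eng
```

The stripped value matches the URL shown in the 404 error message and in the nginx access log on the primary.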
What happens here is that your URL has been identified as a mirrorlist, which it is not: https://github.com/pulp/pulp_rpm/blob/748b3dde057bfc90623e186842b14538a0162049/pulp_rpm/app/tasks/synchronizing.py#L301
Can you share the contents of /repodata with us? Is repomd.xml present?
Related: https://github.com/pulp/pulpcore/issues/3173
Hello @munkey01,
@ipanova has a point, this error is only raised if there is an error finding repodata/repomd.xml (see the context). It would be really helpful to know if repomd.xml is there or not.
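For readers following along, the control flow described in this thread can be modelled roughly as below. This is a simplified sketch based only on the discussion here, not the actual pulp_rpm implementation: `NotFound`, `resolve_remote_url`, and `fake_fetch` are hypothetical names, and the fake fetcher answers 404 for everything to reproduce the reported failure.

```python
# Simplified model of the fallback discussed in this thread (hypothetical names,
# not the real pulp_rpm code).

class NotFound(Exception):
    """Stand-in for an HTTP 404 error raised by a downloader."""


def resolve_remote_url(remote_url, fetch):
    """Try repodata/repomd.xml first; on failure, treat the URL as a mirrorlist."""
    normalized = remote_url.rstrip("/") + "/"
    try:
        fetch(normalized + "repodata/repomd.xml")
        return normalized  # metadata found: a regular repository URL
    except NotFound:
        # Fallback: assume a mirrorlist and fetch the URL without the slash.
        return fetch(remote_url.rstrip("/"))


requested = []


def fake_fetch(url):
    """Record each request and answer 404, as in the failing sync."""
    requested.append(url)
    raise NotFound(url)


try:
    resolve_remote_url(
        "https://primary.env.company.com/pulp/content/application-x86_64-eng/",
        fake_fetch,
    )
except NotFound as exc:
    failing_url = str(exc)

print(requested)
print(failing_url)
```

In this model the slash-less URL only appears in the error because the earlier repomd.xml request failed, which is why the availability of that file matters more than the slash itself.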
Another possible reason for get_repomd_file not "finding" the file (aka raising ClientResponseError) would be something related to the downloader configuration. It's a long shot, but I can look into that if the repomd.xml file is confirmed to be available at repodata/repomd.xml.
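One way to confirm whether repomd.xml is reachable from the secondary with the same client certificate is a short script like the following. It is a sketch using only the Python standard library; the function names are hypothetical, and it assumes the certificate and key files are the ones passed to curl earlier in this issue.

```python
import ssl
import urllib.error
import urllib.request


def repomd_url(remote_url):
    """Build the repomd.xml URL for a repository base URL (with or without a slash)."""
    return remote_url.rstrip("/") + "/repodata/repomd.xml"


def repomd_status(remote_url, certfile, keyfile):
    """Return the HTTP status code for repodata/repomd.xml using a client certificate."""
    ctx = ssl.create_default_context()
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    try:
        with urllib.request.urlopen(repomd_url(remote_url), context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError; report the code instead.
        return exc.code


print(repomd_url("https://primary.env.company.com/pulp/content/application-x86_64-eng/"))
```

Calling `repomd_status(url, "pulp.pem", "pulp.key")` on the secondary would then show whether the primary serves the file (200) or not (for example 404 or 403).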
I think this is not related to slashes, although it may look that way at first sight.
PS: as additional information, the 404 is expected when requesting https://primary.env.company.com/pulp/content/application-x86_64-eng (no slash) before pulpcore 3.40.0, but that is not meaningful here, because the sync task only hits this URL because it thinks the remote is a mirrorlist in the first place, as Ina already said.