[BUG] graphrag pods cannot reach CosmosDB when accelerator is deployed in Azure Government
Describe the bug
Deploying the accelerator to Azure Government causes both the -index and -query pods to enter CrashLoopBackOff with the following error:
azure.cosmos.exceptions.CosmosHttpResponseError: (Forbidden) Request originated from IP 52.XXX.XXX.XXX through public internet. This is blocked by your Cosmos DB account firewall settings. More info:
https://aka.ms/cosmosdb-tsg-forbidden
ActivityId: XXXX, Microsoft.Azure.Documents.Common/2.14.0
Code: Forbidden
Message: Request originated from IP 52.XXX.XXX.XXX through public internet. This is blocked by your Cosmos DB account firewall settings. More info:
https://aka.ms/cosmosdb-tsg-forbidden
ActivityId: XXXX, Microsoft.Azure.Documents.Common/2.14.0
This happens because the CosmosDB firewall has public network access disabled, but the pods in AKS need to reach CosmosDB via the AKS API Server public IP (PIP).
I'm not sure why this is not a problem in Azure Commercial.
To Reproduce
Steps to reproduce the behavior:
az cloud set --name "AzureUSGovernment"
az login
- Follow the Deployment guide but deploy to Azure Government instead of Azure Commercial. Do not use either the -d or -g option.
The following additional params are required in deploy.parameters.json:
"AISEARCH_ENDPOINT_SUFFIX": "search.azure.us",
"AISEARCH_AUDIENCE": "https://search.azure.us",
"CLOUD_NAME":"AzureUSGovernment",
"GRAPHRAG_COGNITIVE_SERVICES_ENDPOINT":"https://cognitiveservices.azure.us/.default"
Some notes/workarounds
- Deploying with -g or -d does not have this issue.
- You can manually add the AKS VNET to the CosmosDB Firewall via the Portal: Networking -> Public Access -> Selected Networks -> Existing Virtual Network -> (select your AKS vnet in the MC_xxx resource group created by the deployment) -> Save
- I believe this can also be done via the az CLI, but I have not tested that yet.
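
The CLI route mentioned above might look like the following untested sketch. It assumes the AKS subnet needs the Microsoft.AzureCosmosDB service endpoint before it can be added as a virtual network rule on the CosmosDB account. Every name in it (subscription ID, resource groups, account, VNET, subnet) is a hypothetical placeholder, not a value from the deployment. By default it runs in dry-run mode and only prints the az commands.

```shell
#!/usr/bin/env bash
# Untested sketch: allow the AKS subnet through the CosmosDB firewall.
# All values below are hypothetical placeholders; substitute your own.
set -euo pipefail

SUBSCRIPTION_ID="${SUBSCRIPTION_ID:-00000000-0000-0000-0000-000000000000}"
COSMOS_RG="${COSMOS_RG:-my-graphrag-rg}"        # resource group of the CosmosDB account
COSMOS_ACCOUNT="${COSMOS_ACCOUNT:-my-cosmos}"   # CosmosDB account name
AKS_NODE_RG="${AKS_NODE_RG:-MC_xxx}"            # MC_xxx resource group holding the AKS VNET
AKS_VNET="${AKS_VNET:-my-aks-vnet}"
AKS_SUBNET="${AKS_SUBNET:-my-aks-subnet}"
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print commands only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build the subnet resource ID; the VNET lives in a different resource
# group than the CosmosDB account, so a bare subnet name is not enough.
subnet_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/%s/subnets/%s' \
    "$SUBSCRIPTION_ID" "$AKS_NODE_RG" "$AKS_VNET" "$AKS_SUBNET"
}

allow_aks_subnet() {
  # 1. Enable the CosmosDB service endpoint on the AKS subnet.
  #    Note: this sets the subnet's service endpoint list, so include
  #    any endpoints that are already configured on it.
  run az network vnet subnet update \
    --resource-group "$AKS_NODE_RG" --vnet-name "$AKS_VNET" --name "$AKS_SUBNET" \
    --service-endpoints Microsoft.AzureCosmosDB

  # 2. Turn on virtual network filtering for the CosmosDB account.
  run az cosmosdb update \
    --resource-group "$COSMOS_RG" --name "$COSMOS_ACCOUNT" \
    --enable-virtual-network true

  # 3. Add the AKS subnet as a virtual network rule.
  run az cosmosdb network-rule add \
    --resource-group "$COSMOS_RG" --name "$COSMOS_ACCOUNT" \
    --subnet "$(subnet_id)"
}

allow_aks_subnet
```

Run it once as-is to review the printed commands, then rerun with DRY_RUN=0 after az cloud set --name "AzureUSGovernment" and az login. Whether this fully matches the Portal steps (in particular the Selected Networks setting) has not been verified.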
A proper fix is likely to deploy the AKS cluster in private cluster mode with public fqdn disabled, and establish a private endpoint between the AKS cluster and Cosmos (and the other resources it needs to reach).
Need to test this again after changes from #123 were introduced.
I believe this is OBE (overtaken by events) now.
Actually, no, this is still an issue when deploying GraphRAG in Azure Government without the -g option.