Creating data sources on the HTTP API from the command line, using HTTPie
Dear lovely people of Apache Superset,
first things first: Thanks a stack for conceiving and maintaining Apache Superset. It is truly a gem.
Foreword
This is not meant to be an actual bug report. Maybe you can slap an info label on it, or just tuck it away into the "Discussions" section?
Introduction
I am trying to create a data source using the HTTP API of Apache Superset without setting WTF_CSRF_ENABLED = False. I think I have taken into consideration all the input from #2488, #4018, #8382, #10354, #16003, #17206, #19343, #19356, and the further information referenced below.
#16003 was the most helpful of all resources, outlining how to send both Authorization and X-CSRFToken headers appropriately. However, people are still struggling to replicate this workflow from the command line, for example using curl.
In this post, I would like to demonstrate that, beyond properly sending the corresponding tokens, you also need to maintain a session between requests. I will use HTTPie for that purpose.
Walkthrough
This is meant to be exercised on a standard vanilla installation of Apache Superset, where the authentication credentials are still admin/admin and no other pieces have been modified. If you adjusted your installation, you will need to modify some bits accordingly.
You will need to install both HTTPie and jq, e.g. by typing {apt,brew,yum} install httpie jq.
# Authenticate and acquire a JWT token.
AUTH_TOKEN=$(http --session=superset http://localhost:8088/api/v1/security/login username=admin password=admin provider=db | jq -r .access_token)
# Acquire a CSRF token.
CSRF_TOKEN=$(http --session=superset http://localhost:8088/api/v1/security/csrf_token/ Authorization:"Bearer ${AUTH_TOKEN}" | jq -r .result)
# Create a data source item / database connection.
http --session=superset http://localhost:8088/api/v1/database/ database_name="PostgreSQL Example" engine=postgres sqlalchemy_uri=postgres://[email protected]:5432 Authorization:"Bearer ${AUTH_TOKEN}" X-CSRFToken:"${CSRF_TOKEN}"
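For readers who prefer Python over HTTPie, the same three steps can be sketched with only the standard library. This is a minimal sketch, assuming the vanilla setup described above (admin/admin on http://localhost:8088); the helper names `make_session_opener`, `_call`, and `create_database` are hypothetical. The `CookieJar` inside the opener is what stands in for HTTPie's `--session=superset`.

```python
import json
import urllib.request
from http.cookiejar import CookieJar


def make_session_opener():
    """Build an opener that keeps cookies between requests.

    The CookieJar plays the role of HTTPie's --session=superset: the session
    cookie set while fetching the CSRF token is sent back on the final POST.
    """
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def _call(opener, url, payload=None, headers=None):
    """GET (no payload) or POST a JSON payload, and decode the JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers=dict(headers or {}))
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with opener.open(req) as resp:  # raises HTTPError on 4xx/5xx
        return json.load(resp)


def create_database(opener, base_url, database, username="admin", password="admin"):
    """Log in, fetch a CSRF token, and create a database connection."""
    # 1. Authenticate and acquire a JWT access token.
    login = _call(opener, f"{base_url}/api/v1/security/login",
                  {"username": username, "password": password, "provider": "db"})
    auth = {"Authorization": f"Bearer {login['access_token']}"}
    # 2. Acquire a CSRF token; the response may also set a session cookie.
    csrf = _call(opener, f"{base_url}/api/v1/security/csrf_token/", headers=auth)
    # 3. Create the database connection, sending both tokens.
    headers = {**auth, "X-CSRFToken": csrf["result"]}
    return _call(opener, f"{base_url}/api/v1/database/", database, headers=headers)
```

For example, `create_database(make_session_opener(), "http://localhost:8088", {"database_name": "PostgreSQL Example", "engine": "postgres", "sqlalchemy_uri": "postgres://USER@HOST:5432"})`, where the URI is a placeholder. Passing a plain `urllib.request.build_opener()` instead drops the cookie, which, on the versions discussed here, should surface the 400 CSRF error as an `urllib.error.HTTPError`.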
Enquiry
Somehow, I would have expected this procedure to also work without maintaining a session. However, when running the commands from the example above while omitting the --session= option, the last command croaks with the venerable
400 Bad Request: The CSRF session token is missing.
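The error becomes plausible when looking at how Flask-WTF, which Superset uses for CSRF protection, validates tokens: as far as I understand its `generate_csrf` and `validate_csrf`, a random raw token is stored in the Flask session (by default a signed cookie), and the client receives a signed copy of it. On a write request, the `X-CSRFToken` header is checked against the token stored in the session, so without the session cookie there is nothing to compare against. The following is a simplified model of that logic, not Flask-WTF itself: it uses plain HMAC instead of itsdangerous, a plain dict as the session, and a hypothetical `SECRET_KEY`; only the error messages mirror Flask-WTF's.

```python
import hashlib
import hmac
import os

SECRET_KEY = b"hypothetical-secret-key"  # stands in for Flask's SECRET_KEY


class CSRFError(Exception):
    pass


def _sign(value):
    """Append an HMAC of the value (simplified stand-in for itsdangerous)."""
    mac = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"


def generate_csrf(session):
    """Store a random raw token in the session; hand out a signed copy."""
    if "csrf_token" not in session:
        session["csrf_token"] = hashlib.sha1(os.urandom(64)).hexdigest()
    return _sign(session["csrf_token"])


def validate_csrf(session, header_token):
    """Check an X-CSRFToken value against the per-session raw token."""
    if not header_token:
        raise CSRFError("The CSRF token is missing.")
    if "csrf_token" not in session:
        # No cookie, no session: this is the case hit without --session=.
        raise CSRFError("The CSRF session token is missing.")
    value = header_token.rpartition(".")[0]
    if not hmac.compare_digest(_sign(value), header_token):
        raise CSRFError("The CSRF token is invalid.")
    if not hmac.compare_digest(session["csrf_token"], value):
        raise CSRFError("The CSRF tokens do not match.")
```

In this model, a token obtained in one request only validates when the same session comes back with it, which is exactly what `--session=superset` provides.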
Conclusion
So, this post is meant to be both an informational reference for the community on how to actually create data source items using the HTTP API from the command line, and an enquiry to the developers as to whether my expectation of being able to converse with the API without maintaining a session is actually inappropriate.
Thank you in advance for taking the time to look into this topic.
With kind regards, Andreas.
Further references
https://stackoverflow.com/questions/66015739/use-apache-superset-api-to-feed-a-dataset
https://stackoverflow.com/questions/68614350/cannot-post-a-new-db-to-apache-superset-400-error-with-csrf
https://solveforum.com/forums/threads/solved-cannot-post-a-new-db-to-apache-superset-400-error-with-csrf.49375/
https://groups.google.com/g/airbnb_superset/c/3H7SZma4ZEE
Hello, I have the same problem when using curl to create a database.
[root@superset]# token=$(curl -X 'POST' \
'http://'${HOSTNAME}':'${PORT}'/api/v1/security/login' \
-H 'accept: */*' \
-H 'Content-Type: application/json' \
-d '{
"username": "admin",
"password": "admin",
"refresh": true,
"provider": "db"
}')
[root@superset]# function parse_json { echo "${1//\"/}" | sed "s/.*$2:\([^,}]*\).*/\1/" ; }
[root@superset]# csrf=$(curl -X 'GET' 'http://'${HOSTNAME}':'${PORT}'/api/v1/security/csrf_token/' -H 'Authorization: Bearer '$(parse_json $token "access_token")'')
[root@superset]# curl -vvvv -X 'POST' 'http://'${HOSTNAME}':'${PORT}'/api/v1/database/' -H 'Authorization: Bearer '$(parse_json $token "access_token")'' -H 'X-CSRFToken: '$(parse_json $csrf "result")'' -H 'accept: */*' -H 'Content-Type: application/json' -d '{
"database_name": "kyuubi-jdbc",
"sqlalchemy_uri": "hive://bcdp@dwh-htwsxrv9-kyuubi-kyuubi",
"expose_in_sqllab": true,
"allow_ctas": true,
"allow_cvas": true,
"allow_dml": true,
"allow_multi_schema_metadata_fetch": true
}'
* About to connect() to dwh-htwsxrv9-kyuubi-superset-dc959bbbd-lhkcf port 58093 (#0)
* Trying 192.168.11.173...
* Connected to dwh-htwsxrv9-kyuubi-superset-dc959bbbd-lhkcf (192.168.11.173) port 58093 (#0)
> POST /api/v1/database/ HTTP/1.1
> User-Agent: curl/7.29.0
> Host: dwh-htwsxrv9-kyuubi-superset-dc959bbbd-lhkcf:58093
> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE2NTcxODI2MTcsIm5iZiI6MTY1NzE4MjYxNywianRpIjoiMzYxMjA4YmEtMThjZC00MDY0LTgxOTQtNjdiZjI3ZmY1ZjI2IiwiZXhwIjoxNjU3MTgzNTEJlc2giOnRydWUsInR5cGUiOiJhY2Nlc3MifQ.a7sFispKsyUD3FDo47HuuCtq9jP7xpWy3ZaeI1bVpuc
> X-CSRFToken: ImY2ZmUxNDIzNGQ2YTUwYjI2NDg3ZDc0YjRjOGUxZGMwMDAzODA3Zjgi.YsaZsQ.SrP1_NXVfnSZ6uW16V25vPE7yqo
> accept: */*
> Content-Type: application/json
> Content-Length: 222
>
* upload completely sent off: 222 out of 222 bytes
* HTTP 1.0, assume close after body
< HTTP/1.0 400 BAD REQUEST
< Content-Type: text/html; charset=utf-8
< Content-Length: 150
< Server: Werkzeug/1.0.1 Python/3.7.10
< Date: Thu, 07 Jul 2022 08:33:01 GMT
<
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>400 Bad Request</title>
<h1>Bad Request</h1>
<p>The CSRF session token is missing.</p>
* Closing connection 0
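Note that the request above carries `Authorization` and `X-CSRFToken` but no `Cookie` header, which matches the "session token is missing" message: the session cookie set by the csrf_token call is never sent back. With curl, a cookie jar should carry it over (`-c cookies.txt` on the csrf_token request to save cookies, `-b cookies.txt` on the POST to send them). The following Python sketch shows what that amounts to, turning the `Set-Cookie` values of one response into the `Cookie` header of the next; `cookie_header` is a hypothetical helper name.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie_values):
    """Turn Set-Cookie header values into a single Cookie request header.

    Attributes such as Path or HttpOnly are dropped; only the name=value
    pairs are forwarded, which is what a browser (or curl -b) sends back.
    """
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
```

For instance, `cookie_header(["session=abc.def; HttpOnly; Path=/"])` yields `session=abc.def`, which would then be sent as `Cookie: session=abc.def` on the POST to /api/v1/database/.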
@amotl It's clearly a bug. I have tried to create a guest user token from my Rails app and I keep getting the error that the CSRF session token is missing. However, if I try from Postman, it works fine.
Hi again,
Using Superset 2.1.3 on a vanilla installation, I verified that maintaining a session and supplying a CSRF token are no longer needed to work with the HTTP API.
# Authenticate and acquire a JWT token.
AUTH_TOKEN=$(http http://localhost:8088/api/v1/security/login username=admin password=admin provider=db | jq -r .access_token)
# Create a data source item / database connection.
http http://localhost:8088/api/v1/database/ database_name="PostgreSQL Example" engine=postgres sqlalchemy_uri=postgres://[email protected]:5432 Authorization:"Bearer ${AUTH_TOKEN}"
Thanks a stack for improving the situation in this regard.
With kind regards, Andreas.
Hi again. After upgrading to the most recent Superset 3, the problem is back! Cheers, Andreas.
Request
http http://localhost:8088/api/v1/database/ database_name="PostgreSQL Example" engine=postgres sqlalchemy_uri=postgres://[email protected]:5432 Authorization:"Bearer ${AUTH_TOKEN}" --print hHbB
Response
{
  "errors": [
    {
      "error_type": "GENERIC_BACKEND_ERROR",
      "extra": {
        "issue_codes": [
          {
            "code": 1011,
            "message": "Issue 1011 - Superset encountered an unexpected error."
          }
        ]
      },
      "level": "error",
      "message": "400 Bad Request: The CSRF token is missing."
    }
  ]
}
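This message differs from the one in the original report: "The CSRF token is missing." versus "The CSRF session token is missing.". In Flask-WTF these come from two separate checks; the former means no token arrived in the request at all (expected here, since the request sends no `X-CSRFToken`), while the latter means a token arrived but the session did not. A small, hypothetical diagnostic helper, assuming the message text is passed through as in the response above:

```python
def diagnose_csrf_error(response_json):
    """Map a Superset 400 error body to a likely client-side cause.

    Looks at both a top-level "message" and every errors[].message, since
    the error shape may differ between Superset versions.
    """
    messages = [str(err.get("message", "")) for err in response_json.get("errors", [])]
    if "message" in response_json:
        messages.append(str(response_json["message"]))
    for msg in messages:
        if "The CSRF session token is missing." in msg:
            return "session cookie missing: keep cookies between requests"
        if "The CSRF token is missing." in msg:
            return "X-CSRFToken header missing: fetch /api/v1/security/csrf_token/ and send it"
    return None
```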
I see. With Superset 3, you need to set WTF_CSRF_ENABLED = False in superset_config.py. Then, communicating with the HTTP API works without a CSRF token. That's fine for my specific purpose, but I am wondering whether CSRF protection would then be turned off completely, including for requests from browsers?
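For reference, the setting is a one-liner in superset_config.py. As far as I can tell from Flask-WTF's `CSRFProtect`, which runs the check before each request, this flag disables the check for every request, so browser sessions would lose CSRF protection as well. Superset's default config also appears to ship a `WTF_CSRF_EXEMPT_LIST` of endpoints exempted from the check, which might be a narrower alternative; verify it against the config.py of your version before relying on it.

```python
# superset_config.py
# Disables Flask-WTF's CSRF check application-wide:
# API calls and browser form posts alike lose CSRF protection.
WTF_CSRF_ENABLED = False
```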
I see this with the latest Superset Docker image from Docker Hub.
Please, sort this out, this is ridiculous!
This has gone silent for upward of a year and is a bit confusing at this point, since it was originally reported against an older (unsupported) version. Maybe @dosu-bot can give us some advice and help summarize the current state of affairs.
Hi Evan. Unless anything has been fixed, I guess nothing has changed/improved in this regard.
After upgrading to the most recent Superset 3, the problem is back!
We had to set WTF_CSRF_ENABLED = False in order to make pure HTTP API conversations possible, see https://github.com/crate/cratedb-examples/commit/e49671eb6dc62ff0adb009160d5d5d5ecc57b532. We think it should not be necessary to turn that off, because, on the other hand, doing so would make web-based conversations more vulnerable, wouldn't it?
I got it working by persisting a session and updating its headers (including the 'Referer' header):
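import requests

BASE_URL = '...'  # your base Superset URL
LOGIN = '/api/v1/security/login'
CSRF_TOKEN = '/api/v1/security/csrf_token/'
DATASET = '/api/v1/dataset/'

# Credentials for the database auth provider (adjust to your installation).
auth_payload = {'username': 'admin', 'password': 'admin', 'provider': 'db'}

# A Session keeps cookies (including the Flask session cookie) between requests.
session = requests.Session()
# Flask-WTF checks the Referer header on HTTPS requests.
session.headers.update({'Referer': BASE_URL})

# Log in through the session, so any cookies it sets are retained as well.
res = session.post(BASE_URL + LOGIN, json=auth_payload)
res.raise_for_status()
AUTH_HEADER = {
    'Authorization': f'Bearer {res.json()["access_token"]}'
}
session.headers.update(AUTH_HEADER)

# Fetch a CSRF token; it is bound to the session cookie set on this response.
res = session.get(BASE_URL + CSRF_TOKEN)
res.raise_for_status()
CSRF_TOKEN_HEADER = {"X-CSRFToken": res.json()['result']}
session.headers.update(CSRF_TOKEN_HEADER)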
After this, creating a dataset via a POST request to /api/v1/dataset/ works.
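The dataset creation step itself is not shown in the snippet. A minimal sketch, assuming the `session` prepared there and a payload with the `database` (numeric id), `schema`, and `table_name` fields of the dataset API; `create_dataset` is a hypothetical helper, and the accepted fields may differ between Superset versions.

```python
DATASET = "/api/v1/dataset/"


def create_dataset(session, base_url, database_id, table_name, schema=None):
    """Create a dataset for an existing table using a prepared session.

    `session` is expected to behave like a requests.Session that already
    carries the Authorization, X-CSRFToken and Referer headers and keeps the
    session cookie from the CSRF token request.
    """
    payload = {"database": database_id, "table_name": table_name}
    if schema is not None:
        payload["schema"] = schema
    res = session.post(base_url + DATASET, json=payload)
    res.raise_for_status()  # surface 400 CSRF errors instead of ignoring them
    return res.json()
```

With placeholder values, a call would look like `create_dataset(session, BASE_URL, 1, "my_table", schema="public")`.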
Yeah, this works. However, it's difficult to maintain a session on the command line. HTTP sessions are mostly not in the same box as API-style access; they are meant for interactive users instead.