server-client-python icon indicating copy to clipboard operation
server-client-python copied to clipboard

How to download only extracts from tableau server

Open scarmati opened this issue 5 years ago β€’ 7 comments

Hello, im new with this and i want to know how to download only the extract from the server.

scarmati avatar Apr 02 '20 19:04 scarmati

@scarmati I'm having the same issue. I am able to download the .tdsx file but I cannot do anything with the file at the moment. I did see that a .hyper file can be converted to a pandas dataframe.

EDIT: Full solution to download a datasource from tableau server and convert to a pandas dataframe below:

import tableauserverclient as TSC
import numpy as np
import pandas as pd
# import tableauhyperapi
import pantab as pt
import zipfile

tableau_auth = TSC.TableauAuth(username, pw)
server = TSC.Server('myTableauSever.com')

#Download .tdsx file
with server.auth.sign_in(tableau_auth):
    server.datasources.download(datasource_id= mydatasourceid)

#Convert .tdsx file to .hyper
with zipfile.ZipFile('myFile.tdsx', 'r') as zip_ref:
    zip_ref.extractall('test')

#pantab is a wrapper that uses Tableau Hyper API to convert a hyper file into a dictionary.
test = pt.frames_from_hyper(database='myFile.hyper')
test = pd.DataFrame(list(test.values())[0])

datasurfergtx avatar Oct 05 '20 17:10 datasurfergtx

@scarmati I figured out the problem. You will need zipfile to unzip the .tdsx file.

import zipfile

with zipfile.ZipFile('myFile.tdsx', 'r') as zip_ref:
    zip_ref.extractall('test')

Hope this helps. I'll edit my original reply for the full code.

datasurfergtx avatar Oct 06 '20 20:10 datasurfergtx

Hi @datasurfergtx

I am new to python. In your code when you say myFile.tdsx or myFile.hyper, are you naming the currently downloaded file, or is it a general name, and I need to change it to the right tdsx and hyper file names. If so, how do I identify these names through code dynamically and use them.

keshavsainik avatar Mar 29 '23 02:03 keshavsainik

Hi @keshavsainik

The file naming is structure consistent. I don't remember 100% but I can share the code I wrote 3 years ago lol

#Connect to Server
tableau_auth = TSC.TableauAuth(os.environ["TABLEAU_USERNAME"], os.environ["TABLEAU_PASSWORD"])
server = TSC.Server('myTableauWebsiteUrl',use_server_version=True)

#Create a list of mtv datasources
def datasources():
    with server.auth.sign_in(tableau_auth):
        all_datasources, pagination_item = server.datasources.get()
        my_datasources = [(datasource.name, datasource.id) for datasource in all_datasources]
        my_datasources = pd.DataFrame(my_datasources,columns= ['datasource','id'])
        return my_datasources

#datasource key generator
def datasource_key(datasource_name):
    my_datasources = datasources()
    ds_key = my_datasources.loc[my_datasources['datasource']== datasource_name, 'id'].values[0]
    ds_name = my_datasources.loc[my_datasources['datasource']== datasource_name, 'datasource'].values[0].replace(" ","_").replace(".","").replace(":","_").lower()
    return (ds_key, ds_name)

#Download a .tdsx file and unzip file
def download_ds(ds_key,ds_name):
    if not os.path.exists(f'../data/{ds_name}/'):
        os.makedirs(f'../data/{ds_name}/')
    with server.auth.sign_in(tableau_auth):
        server.datasources.download(filepath=f'../data/{ds_name}/{ds_name}_T', datasource_id=ds_key, include_extract=True)
    with zipfile.ZipFile(f'../data/{ds_name}/{ds_name}_T.tdsx', 'r') as zip_ref:
        zip_ref.extractall(f'../data/{ds_name}/{ds_name}_T')
    df = pt.frames_from_hyper(database = f'../data/{ds_name}/{ds_name}_T/Data/Extracts/hyper_0.hyper')
    df = pd.DataFrame(list(df.values())[0])
    return df
  1. datasources() function will generate a list of data sources in your tableau instance.
  2. Find the datasource you want to download and then use that name in the datasource_key() function. This will create the key value pair you will need to download your datasource.
  3. The last function download_ds will download the .tdx file and remove all the extra stuff and leave a clean file. The initial steps are used for automation. You can delete the references to the data directory if you don't need them. This will download the file and extract everything leaving a clean dataframe.

Here's an example of this in use:

ds = "My Tableau Report"
ds_key = datasource_key(ds)[0]
ds_name = datasource_key(ds)[1]
df = download_ds(ds_key,ds_name)

I hope this helps and Good luck.

datasurfergtx avatar Mar 29 '23 03:03 datasurfergtx

Hi @datasurfergtx ,

Thank you so much, I really appreciate itπŸ˜€.

keshavsainik avatar Mar 29 '23 03:03 keshavsainik

Hi @datasurfergtx, I am also looking into this and from the documentation (i.e. https://pantab.readthedocs.io/en/latest/api.html), I realised that instead of using "database" in the frames_from_hyper function, we are suppose to use "source"?

Screenshot 2023-03-30 at 3 49 35 PM

Screenshot 2023-03-30 at 3 47 54 PM

Currently, this is the outcome for using "source" and "database" respectively as shown below for your reference: Screenshot 2023-03-30 at 3 53 29 PM

Screenshot 2023-03-30 at 3 55 53 PM

This is the approach of how I have downloaded the .tdsx files and converting them into .hyper files as shown below for reference: Screenshot 2023-03-30 at 4 00 48 PM

Do you know if there's any solution to this? πŸ˜…πŸ™πŸ» (cc @keshavsainik, have you tried this out?)

fatraphael95 avatar Mar 30 '23 08:03 fatraphael95

Hi @fatraphael95 ,

You are right, database as a parameter in frames_from_hyper() didn't work, and the source as a parameter worked. I was able to extract data using source keyword and passing the location of the file as the value.

keshavsainik avatar Apr 04 '23 11:04 keshavsainik