In today's world, reliability is everything when it comes to data solutions. When you build a dashboard or report, people expect the numbers shown there to be correct and up to date. Based on those numbers, insights are drawn and actions are taken. If a dashboard breaks for unexpected reasons or the numbers are wrong, it turns into a fire-fighting exercise to fix everything. And if the problems are not fixed in time, it damages the trust placed in the data team and its solutions.
But why do dashboards break or show wrong numbers? If the dashboard was built correctly in the first place, 99% of the time the issue comes from the data (the data warehouse) that feeds the dashboard. Some possible scenarios are:
- A few ETL pipelines failed, so there is no fresh data yet
- A table was replaced by a new one
- Some columns in the table were dropped or renamed
- The data warehouse schema changed
- And many more.
It is still possible that the issue lies on the Tableau side, but in my experience, most of the time it comes down to some change in the data warehouse. Even when you know the root cause, it is not always easy to start fixing it. There is no central place where you can see which Tableau data sources depend on a specific table. If you have the Tableau Data Management add-on, that could help, but from what I know, it is hard to find dependencies for custom SQL queries used in data sources.
Besides, the add-on is expensive and most companies do not have it. The real pain begins when you have to manually go through all the data sources and start fixing them. On top of that, you have a queue of users breathing down your neck, waiting for a quick fix. The fix itself may not be difficult; it just takes time.
What if you could predict these issues and identify the affected data sources before anyone notices a problem? Wouldn't that be great? Well, there is now a way to do it: the Metadata API. The Metadata API uses GraphQL, a query language for APIs that returns only the data you are interested in. You can read more about what is possible with GraphQL at graphql.org.
In this blog post, I will show you how to connect to the Tableau Metadata API using the Tableau Server Client (TSC) Python library to proactively identify the data sources that use a specific table, so that you can act before any issues arise. Once you know which Tableau data sources are affected by a specific table, you can update them yourself or alert the owners of those data sources about the upcoming changes so they can be prepared.
Connecting to the Tableau Metadata API
Let's connect to the Tableau Server using TSC. First, import all the libraries we need for the exercise!
### Import all required libraries
import tableauserverclient as t
import pandas as pd
import json
import ast
import re
To connect to the Metadata API, you first need to create a personal access token in your Tableau Account settings. Then update the <API_TOKEN_NAME> & <TOKEN_KEY> with the token you just created. Also update <YOUR_SITE> with your Tableau site name. If the connection is established successfully, "Connected" will be printed in the output window.
### Connect to Tableau server using personal access token
tableau_auth = t.PersonalAccessTokenAuth("<API_TOKEN_NAME>", "<TOKEN_KEY>",
                                         site_id="<YOUR_SITE>")
server = t.Server("https://dub01.online.tableau.com/", use_server_version=True)

with server.auth.sign_in(tableau_auth):
    print("Connected")
Next, get a list of all the data sources published on your site. There are many attributes you can retrieve, but for the current use case, let's keep it simple and only get the id, name, and owner contact information for every data source. This will be our master list to which we add all the other information.
############### Get the list of all data sources on your Site

all_datasources_query = """ {
  publishedDatasources {
    name
    id
    owner {
      name
      email
    }
  }
}"""
with server.auth.sign_in(tableau_auth):
    result = server.metadata.query(
        all_datasources_query
    )
I want to keep this blog focused on how to proactively identify which data sources are affected by a specific table, so I won't go into the nuances of the Metadata API. To better understand how the query works, you can refer to Tableau's own very detailed Metadata API documentation.
One thing to note is that the Metadata API returns data in JSON format. Depending on what you are querying, you will end up with multiple nested JSON lists, which can be very tricky to convert into a Pandas dataframe. For the metadata query above, you will get a result that looks like this (this is mock data, just to give you an idea of what the output looks like):
{
  "data": {
    "publishedDatasources": [
      {
        "name": "Sales Performance DataSource",
        "id": "f3b1a2c4-1234-5678-9abc-1234567890ab",
        "owner": {
          "name": "Alice Johnson",
          "email": "[email protected]"
        }
      },
      {
        "name": "Customer Orders DataSource",
        "id": "a4d2b3c5-2345-6789-abcd-2345678901bc",
        "owner": {
          "name": "Bob Smith",
          "email": "[email protected]"
        }
      },
      {
        "name": "Product Returns and Profitability",
        "id": "c5e3d4f6-3456-789a-bcde-3456789012cd",
        "owner": {
          "name": "Alice Johnson",
          "email": "[email protected]"
        }
      },
      {
        "name": "Customer Segmentation Analysis",
        "id": "d6f4e5a7-4567-89ab-cdef-4567890123de",
        "owner": {
          "name": "Charlie Lee",
          "email": "[email protected]"
        }
      },
      {
        "name": "Regional Sales Trends (Custom SQL)",
        "id": "e7a5f6b8-5678-9abc-def0-5678901234ef",
        "owner": {
          "name": "Bob Smith",
          "email": "[email protected]"
        }
      }
    ]
  }
}
We need to convert this JSON response into a dataframe so that it is easier to manipulate. Note that we have to extract the owner's name and email from inside the nested owner object.
### We need to convert the response into a dataframe for easy data manipulation

col_names = result['data']['publishedDatasources'][0].keys()
master_df = pd.DataFrame(columns=col_names)

for i in result['data']['publishedDatasources']:
    tmp_dt = {k: v for k, v in i.items()}
    master_df = pd.concat([master_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])

# Extract the owner name and email from the owner object
master_df['owner_name'] = master_df['owner'].apply(lambda x: x.get('name') if isinstance(x, dict) else None)
master_df['owner_email'] = master_df['owner'].apply(lambda x: x.get('email') if isinstance(x, dict) else None)

master_df.reset_index(inplace=True)
master_df.drop(['index', 'owner'], axis=1, inplace=True)

print('There are ', master_df.shape[0], ' data sources on your site')
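As a side note, if you would rather avoid the loop, pandas' json_normalize can flatten the nested owner object in one call. A minimal equivalent sketch, assuming the same result dictionary as above:

# Alternative sketch: flatten the nested owner object with json_normalize
master_df_alt = pd.json_normalize(result['data']['publishedDatasources'])
master_df_alt = master_df_alt.rename(columns={'owner.name': 'owner_name', 'owner.email': 'owner_email'})
print('There are ', master_df_alt.shape[0], ' data sources on your site')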
This is how master_df looks:
Once the master list is ready, we can go ahead and start getting the names of the tables embedded in the data sources. If you are an avid Tableau user, you know there are two ways of selecting tables in a Tableau data source: one is to directly choose the tables and establish relationships between them, and the other is to use a custom SQL query with one or more tables to produce a new result table. Therefore, we need to address both cases.
Handling custom SQL query tables
Below is the query that gets the list of all custom SQL used on the site along with their data sources. Notice that I have filtered the list to get only the first 500 custom SQL queries. If your organization has more, you will have to use an offset to get the next set of custom SQL queries. There is also the option of using the cursor method for pagination when you want to fetch a large list of results (see here). For the sake of simplicity, I use the offset method since I know I have fewer than 500 custom SQL queries used on my site.
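If your site does exceed that limit, a cursor-based loop would look roughly like the sketch below. The helper name and the pageInfo fields follow the standard GraphQL connection pattern described in the Tableau Metadata API docs, but they are my assumptions here, so verify them before relying on this.

### Sketch: cursor-based pagination for sites with many custom SQL queries
### (pageInfo { hasNextPage, endCursor } is assumed per the standard GraphQL connection pattern)
def fetch_all_custom_sql_tables(server, page_size=100):
    nodes, cursor = [], None
    while True:
        after_clause = ', after: "' + cursor + '"' if cursor else ''
        paged_query = """ {
          customSQLTablesConnection(first: """ + str(page_size) + after_clause + """) {
            nodes {
              id
              name
              downstreamDatasources {
                name
              }
              query
            }
            pageInfo {
              hasNextPage
              endCursor
            }
          }
        }"""
        page = server.metadata.query(paged_query)['data']['customSQLTablesConnection']
        nodes.extend(page['nodes'])
        if not page['pageInfo']['hasNextPage']:
            return nodes
        cursor = page['pageInfo']['endCursor']

# Usage (inside an authenticated session):
# with server.auth.sign_in(tableau_auth):
#     all_custom_sql_nodes = fetch_all_custom_sql_tables(server)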
# Get the data sources and the table names from all the custom sql queries used on your Site

custom_table_query = """ {
  customSQLTablesConnection(first: 500){
    nodes {
        id
        name
        downstreamDatasources {
          name
        }
        query
    }
  }
}
"""
with server.auth.sign_in(tableau_auth):
    custom_table_query_result = server.metadata.query(
        custom_table_query
    )
Based on the mock data, this is what our output would look like:
{
  "data": {
    "customSQLTablesConnection": {
      "nodes": [
        {
          "id": "csql-1234",
          "name": "RegionalSales_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Regional Sales Trends (Custom SQL)"
            }
          ],
          "query": "SELECT r.region_name, SUM(s.sales_amount) AS total_sales FROM ecommerce.sales_data.Sales s JOIN ecommerce.sales_data.Regions r ON s.region_id = r.region_id GROUP BY r.region_name"
        },
        {
          "id": "csql-5678",
          "name": "ProfitabilityAnalysis_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Product Returns and Profitability"
            }
          ],
          "query": "SELECT p.product_category, SUM(s.revenue) AS total_profit FROM ecommerce.sales_data.Sales s JOIN ecommerce.sales_data.Products p ON s.product_id = p.product_id GROUP BY p.product_category"
        },
        {
          "id": "csql-9101",
          "name": "CustomerSegmentation_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Customer Segmentation Analysis"
            }
          ],
          "query": "SELECT c.customer_id, c.location, COUNT(o.order_id) AS total_orders FROM ecommerce.sales_data.Customers c JOIN ecommerce.sales_data.Orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.location"
        },
        {
          "id": "csql-3141",
          "name": "CustomerOrders_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Customer Orders DataSource"
            }
          ],
          "query": "SELECT o.order_id, o.customer_id, o.order_date, o.sales_amount FROM ecommerce.sales_data.Orders o WHERE o.order_status = 'Completed'"
        },
        {
          "id": "csql-3142",
          "name": "CustomerProfiles_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Customer Orders DataSource"
            }
          ],
          "query": "SELECT c.customer_id, c.customer_name, c.segment, c.location FROM ecommerce.sales_data.Customers c WHERE c.active_flag = 1"
        },
        {
          "id": "csql-3143",
          "name": "CustomerReturns_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Customer Orders DataSource"
            }
          ],
          "query": "SELECT r.return_id, r.order_id, r.return_reason FROM ecommerce.sales_data.Returns r"
        }
      ]
    }
  }
}
Just like before, when I created the master list of data sources, here too the downstream data sources are nested JSON, from which we need to extract only the "name" part. The query column dumps the entire custom SQL, and with a regex pattern we can easily search for the names of the tables used in each query.
We know that the table names always come after a FROM or a JOIN clause and generally follow the format <database_name>.<schema>.<table_name>. The <database_name> is optional and most of the time not used; there were only a few queries written in that format, and for those I ended up getting just the database and schema names rather than the complete table name. Once we have extracted the data source names and the table names, we need to merge the rows per data source, since multiple custom SQL queries can be used in a single data source.
### Convert the custom sql response into a dataframe
col_names = custom_table_query_result['data']['customSQLTablesConnection']['nodes'][0].keys()
cs_df = pd.DataFrame(columns=col_names)

for i in custom_table_query_result['data']['customSQLTablesConnection']['nodes']:
    tmp_dt = {k: v for k, v in i.items()}
    cs_df = pd.concat([cs_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])

# Extract the data source name where the custom sql query was used
cs_df['data_source'] = cs_df.downstreamDatasources.apply(lambda x: x[0]['name'] if x and 'name' in x[0] else None)
cs_df.reset_index(inplace=True)
cs_df.drop(['index', 'downstreamDatasources'], axis=1, inplace=True)

### We need to extract the table names from the sql query. We know the table name comes after a FROM or JOIN clause
# Note that the table name can be of the format <data_warehouse>.<schema>.<table_name>
# Depending on how the tables are referenced, you may have to modify the regex expression

def extract_tables(sql):
    # Regex to match database.schema.table or schema.table, avoiding aliases
    pattern = r'(?:FROM|JOIN)\s+((?:\[\w+\]|\w+)\.(?:\[\w+\]|\w+)(?:\.(?:\[\w+\]|\w+))?)\b'
    matches = re.findall(pattern, sql, re.IGNORECASE)
    return list(set(matches))  # Unique table names

cs_df['customSQLTables'] = cs_df['query'].apply(extract_tables)
cs_df = cs_df[['data_source', 'customSQLTables']]

# We need to merge data sources as there can be multiple custom sqls used in the same data source
cs_df = cs_df.groupby('data_source', as_index=False).agg({
    'customSQLTables': lambda x: list(set(item for sublist in x for item in sublist))  # Flatten & make unique
})

print('There are ', cs_df.shape[0], ' data sources with custom sqls used in them')
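If you want to sanity-check the regex before applying it to your whole site, you can run extract_tables on one of the mock queries shown earlier. Under the pattern above, it should return the two fully qualified table names:

# Quick check of the regex against one of the mock custom SQL queries
sample_sql = ("SELECT r.region_name, SUM(s.sales_amount) AS total_sales "
              "FROM ecommerce.sales_data.Sales s "
              "JOIN ecommerce.sales_data.Regions r ON s.region_id = r.region_id "
              "GROUP BY r.region_name")
print(extract_tables(sample_sql))
# Expected (order may differ because a set is used):
# ['ecommerce.sales_data.Sales', 'ecommerce.sales_data.Regions']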
After performing all the above operations, this is how cs_df looks:

Processing regular tables in data sources
Next, we need to get the list of all the regular tables used by the data sources that are not part of custom SQL. There are two ways to go about it: either use the publishedDatasources object and check upstreamTables, or use DatabaseTable and check upstreamDatasources. I want results at the data source level, so I will go the first way (basically, I want some code ready to reuse when I look at a specific data source in more detail). Here again, instead of going with pagination, for simplicity I loop through each data source to make sure I get everything. We get the upstreamTables nested inside the field object, so that has to be cleaned out.
############### Get the data sources with the regular table names used on your site

### It's best to extract the tables information for every data source and then merge the results.
# Since we only get the table information nested under fields, in case there are hundreds of fields
# used in a single data source, we will hit the response limits and will not be able to retrieve all the data.

data_source_list = master_df.name.tolist()

col_names = ['name', 'id', 'extractLastUpdateTime', 'fields']
ds_df = pd.DataFrame(columns=col_names)

with server.auth.sign_in(tableau_auth):
    for ds_name in data_source_list:
        query = """ {
            publishedDatasources (filter: { name: \"""" + ds_name + """\" }) {
                name
                id
                extractLastUpdateTime
                fields {
                    name
                    upstreamTables {
                        name
                    }
                }
            }
        } """
        ds_name_result = server.metadata.query(
            query
        )
        for i in ds_name_result['data']['publishedDatasources']:
            tmp_dt = {k: v for k, v in i.items() if k != 'fields'}
            tmp_dt['fields'] = json.dumps(i['fields'])
            ds_df = pd.concat([ds_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])

ds_df.reset_index(inplace=True)
This is how ds_df looks:

Now we need to flatten the fields object and extract the field names as well as the table names out of it. Since the table names repeat multiple times, we have to deduplicate them to keep only the unique set.
# Function to extract the values of fields and upstream tables from the json lists
def extract_values(json_list, key):
    values = []
    for item in json_list:
        values.append(item[key])
    return values

ds_df["fields"] = ds_df["fields"].apply(ast.literal_eval)
ds_df['field_names'] = ds_df.apply(lambda x: extract_values(x['fields'], 'name'), axis=1)
ds_df['upstreamTables'] = ds_df.apply(lambda x: extract_values(x['fields'], 'upstreamTables'), axis=1)

# Function to extract the unique table names
def extract_upstreamTable_values(table_list):
    values = set()
    for inner_list in table_list:
        for item in inner_list:
            if 'name' in item:
                values.add(item['name'])
    return list(values)

ds_df['upstreamTables'] = ds_df.apply(lambda x: extract_upstreamTable_values(x['upstreamTables']), axis=1)
ds_df.drop(["index", "fields"], axis=1, inplace=True)
After performing the above operations, the final structure of ds_df looks something like this:

We have all the pieces, and now we just have to merge them together:
###### Join all the data together
master_data = pd.merge(master_df, ds_df, how="left", on=["name", "id"])
master_data = pd.merge(master_data, cs_df, how="left", left_on="name", right_on="data_source")

# Save the results to analyse further
master_data.to_excel("Tableau Data Sources with Tables.xlsx", index=False)
This is our final master_data:

Table-level impact analysis
Let's say there is a schema change on the Sales table and you want to know which data sources are affected. You can simply write a small function that checks whether the table appears in either of the two columns, upstreamTables or customSQLTables, as follows:
def filter_rows_with_table(df, col1, col2, target_table):
    """
    Filters rows in df where target_table is part of any value in either col1 or col2 (supports partial match).
    Returns full rows (all columns retained).
    """
    return df[
        df.apply(
            lambda row:
                (isinstance(row[col1], list) and any(target_table in item for item in row[col1])) or
                (isinstance(row[col2], list) and any(target_table in item for item in row[col2])),
            axis=1
        )
    ]
# For example
filter_rows_with_table(master_data, 'upstreamTables', 'customSQLTables', 'Sales')
Below is the output. You can see that three data sources are affected by this change. You can also alert the data source owners, Alice and Bob, about it in advance, so they can start working on a fix before anything breaks on the Tableau dashboards.
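As a small follow-up, you can pull the unique owners out of the filtered rows so you know exactly who to alert. A short sketch using the columns built earlier:

# Collect the owners of the affected data sources so they can be alerted in advance
affected = filter_rows_with_table(master_data, 'upstreamTables', 'customSQLTables', 'Sales')
owners_to_notify = affected[['owner_name', 'owner_email']].drop_duplicates()
print(owners_to_notify)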

You can check out the complete version of the code in my GitHub repository here.
This is just one potential use case of the Tableau Metadata API. You can also extract the field names used in custom SQL queries and add them to the dataset to get a field-level impact analysis. You can monitor stale data sources with extractLastUpdateTime to see whether they have an issue or need to be archived if they are no longer used. You can also use the dashboards object to retrieve information at the dashboard level.
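For instance, a dashboard-level lineage query might look roughly like the sketch below. The upstreamDatasources field on dashboards is my assumption here, so check the Metadata API schema for the exact fields available before using it:

### Sketch only: dashboard-level lineage (verify field names against the Metadata API schema)
dashboards_query = """ {
  dashboards {
    name
    upstreamDatasources {
      name
    }
  }
}"""
with server.auth.sign_in(tableau_auth):
    dashboards_result = server.metadata.query(dashboards_query)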
Final thoughts
If you have come this far, kudos to you. This is just one use case for automating Tableau data management. It's time to reflect on your own work and think about which other tasks you could automate to make your life easier. I hope this mini-project served as an enjoyable learning experience in understanding the power of the Tableau Metadata API. If you liked reading this, you might also enjoy another one of my blog posts about Tableau.
In an earlier blog, I also looked at building interactive, database-driven apps using Python, Streamlit, and SQLite.
Before you go…
Follow me so that you don't miss any new posts I write in the future; you will find more of my articles there. You can also connect with me on LinkedIn or Twitter!