In today's world, reliability is everything when it comes to data solutions. When you build a dashboard or report, people expect the numbers shown there to be correct and up to date. Based on those numbers, insights are drawn and actions are taken. If a dashboard breaks for unexpected reasons or the numbers are wrong, it turns into a fire-fighting exercise to fix everything. And if the problems are not fixed in time, it damages the trust placed in the data team and its solutions.
But why do dashboards break or show wrong numbers? If the dashboard was built correctly in the first place, 99% of the time the issue comes from the data (the data warehouse) that feeds the dashboard. Some possible scenarios are:
- A few ETL pipelines failed, so there is no fresh data yet
- A table was replaced by a new one
- Some columns in the table were dropped or renamed
- The data warehouse schema changed
- And many more.
It is still possible that the issue lies on the Tableau side, but in my experience, most of the time it comes down to some change in the data warehouse. Even when you know the root cause, it is not always easy to start fixing it. There is no central place where you can see which Tableau data sources depend on a specific table. If you have the Tableau Data Management add-on, that could help, but from what I know, it is hard to find dependencies for custom SQL queries used in data sources.
Besides, the add-on is expensive and most companies do not have it. The real pain begins when you have to manually go through all the data sources and start fixing them. On top of that, you have a queue of users breathing down your neck, waiting for a quick fix. The fix itself may not be difficult; it just takes time.
What if you could predict these issues and identify the affected data sources before anyone notices a problem? Wouldn't that be great? Well, there is now a way to do it: the Metadata API. The Metadata API uses GraphQL, a query language for APIs that returns only the data you are interested in. You can read more about what is possible with GraphQL at graphql.org.
In this blog post, I will show you how to connect to the Tableau Metadata API using the Tableau Server Client (TSC) Python library to proactively identify the data sources that use a specific table, so that you can act before any issues arise. Once you know which Tableau data sources are affected by a specific table, you can update them yourself or alert the owners of those data sources about the upcoming changes so they can be prepared.
Connecting to the Tableau Metadata API
Let's connect to the Tableau Server using TSC. First, import all the libraries we need for the exercise!
### Import all required libraries
import tableauserverclient as t
import pandas as pd
import json
import ast
import re
To connect to the Metadata API, you first need to create a personal access token in your Tableau Account settings. Then update the <API_TOKEN_NAME> & <TOKEN_KEY> with the token you just created. Also update <YOUR_SITE> with your Tableau site name. If the connection is established successfully, "Connected" will be printed in the output window.
### Connect to Tableau server using personal access token
tableau_auth = t.PersonalAccessTokenAuth("<API_TOKEN_NAME>", "<TOKEN_KEY>",
                                         site_id="<YOUR_SITE>")
server = t.Server("https://dub01.online.tableau.com/", use_server_version=True)

with server.auth.sign_in(tableau_auth):
    print("Connected")
Next, get a list of all the data sources published on your site. There are many attributes you can retrieve, but for the current use case, let's keep it simple and only get the id, name, and owner contact information for every data source. This will be our master list to which we add all the other information.
############### Get the list of all data sources on your Site

all_datasources_query = """ {
  publishedDatasources {
    name
    id
    owner {
      name
      email
    }
  }
}"""
with server.auth.sign_in(tableau_auth):
    result = server.metadata.query(
        all_datasources_query
    )
I want to keep this blog focused on how to proactively identify which data sources are affected by a specific table, so I won't go into the nuances of the Metadata API. To better understand how the query works, you can refer to Tableau's own very detailed Metadata API documentation.
One thing to note is that the Metadata API returns data in JSON format. Depending on what you are querying, you will end up with multiple nested JSON lists, which can be very tricky to convert into a Pandas dataframe. For the metadata query above, you will get a result that looks like this (this is mock data, just to give you an idea of what the output looks like):
{
  "data": {
    "publishedDatasources": [
      {
        "name": "Sales Performance DataSource",
        "id": "f3b1a2c4-1234-5678-9abc-1234567890ab",
        "owner": {
          "name": "Alice Johnson",
          "email": "[email protected]"
        }
      },
      {
        "name": "Customer Orders DataSource",
        "id": "a4d2b3c5-2345-6789-abcd-2345678901bc",
        "owner": {
          "name": "Bob Smith",
          "email": "[email protected]"
        }
      },
      {
        "name": "Product Returns and Profitability",
        "id": "c5e3d4f6-3456-789a-bcde-3456789012cd",
        "owner": {
          "name": "Alice Johnson",
          "email": "[email protected]"
        }
      },
      {
        "name": "Customer Segmentation Analysis",
        "id": "d6f4e5a7-4567-89ab-cdef-4567890123de",
        "owner": {
          "name": "Charlie Lee",
          "email": "[email protected]"
        }
      },
      {
        "name": "Regional Sales Trends (Custom SQL)",
        "id": "e7a5f6b8-5678-9abc-def0-5678901234ef",
        "owner": {
          "name": "Bob Smith",
          "email": "[email protected]"
        }
      }
    ]
  }
}
We need to convert this JSON response into a dataframe so that it is easier to manipulate. Note that we have to extract the owner's name and email from inside the nested owner object.
### We need to convert the response into a dataframe for easy data manipulation

col_names = result['data']['publishedDatasources'][0].keys()
master_df = pd.DataFrame(columns=col_names)

for i in result['data']['publishedDatasources']:
    tmp_dt = {k: v for k, v in i.items()}
    master_df = pd.concat([master_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])

# Extract the owner name and email from the owner object
master_df['owner_name'] = master_df['owner'].apply(lambda x: x.get('name') if isinstance(x, dict) else None)
master_df['owner_email'] = master_df['owner'].apply(lambda x: x.get('email') if isinstance(x, dict) else None)

master_df.reset_index(inplace=True)
master_df.drop(['index', 'owner'], axis=1, inplace=True)

print('There are ', master_df.shape[0], ' data sources on your site')
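As a side note, if you would rather avoid the loop, pandas' json_normalize can flatten the nested owner object in one call. A minimal equivalent sketch, assuming the same result dictionary as above:

# Alternative sketch: flatten the nested owner object with json_normalize
master_df_alt = pd.json_normalize(result['data']['publishedDatasources'])
master_df_alt = master_df_alt.rename(columns={'owner.name': 'owner_name', 'owner.email': 'owner_email'})
print('There are ', master_df_alt.shape[0], ' data sources on your site')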
This is how master_df looks:
Once the master list is ready, we can go ahead and start getting the names of the tables embedded in the data sources. If you are an avid Tableau user, you know there are two ways of selecting tables in a Tableau data source: one is to directly choose the tables and establish relationships between them, and the other is to use a custom SQL query with one or more tables to produce a new result table. Therefore, we need to address both cases.
Handling custom SQL query tables
Below is the query that gets the list of all custom SQL used on the site along with their data sources. Notice that I have filtered the list to get only the first 500 custom SQL queries. If your organization has more, you will have to use an offset to get the next set of custom SQL queries. There is also the option of using the cursor method for pagination when you want to fetch a large list of results (see here). For the sake of simplicity, I use the offset method since I know I have fewer than 500 custom SQL queries used on my site.
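If your site does exceed that limit, a cursor-based loop would look roughly like the sketch below. The helper name and the pageInfo fields follow the standard GraphQL connection pattern described in the Tableau Metadata API docs, but they are my assumptions here, so verify them before relying on this.

### Sketch: cursor-based pagination for sites with many custom SQL queries
### (pageInfo { hasNextPage, endCursor } is assumed per the standard GraphQL connection pattern)
def fetch_all_custom_sql_tables(server, page_size=100):
    nodes, cursor = [], None
    while True:
        after_clause = ', after: "' + cursor + '"' if cursor else ''
        paged_query = """ {
          customSQLTablesConnection(first: """ + str(page_size) + after_clause + """) {
            nodes {
              id
              name
              downstreamDatasources {
                name
              }
              query
            }
            pageInfo {
              hasNextPage
              endCursor
            }
          }
        }"""
        page = server.metadata.query(paged_query)['data']['customSQLTablesConnection']
        nodes.extend(page['nodes'])
        if not page['pageInfo']['hasNextPage']:
            return nodes
        cursor = page['pageInfo']['endCursor']

# Usage (inside an authenticated session):
# with server.auth.sign_in(tableau_auth):
#     all_custom_sql_nodes = fetch_all_custom_sql_tables(server)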
# Get the data sources and the table names from all the custom sql queries used on your Site

custom_table_query = """ {
  customSQLTablesConnection(first: 500){
    nodes {
        id
        name
        downstreamDatasources {
          name
        }
        query
    }
  }
}
"""
with server.auth.sign_in(tableau_auth):
    custom_table_query_result = server.metadata.query(
        custom_table_query
    )
Based on the mock data, this is what our output would look like:
{
  "data": {
    "customSQLTablesConnection": {
      "nodes": [
        {
          "id": "csql-1234",
          "name": "RegionalSales_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Regional Sales Trends (Custom SQL)"
            }
          ],
          "query": "SELECT r.region_name, SUM(s.sales_amount) AS total_sales FROM ecommerce.sales_data.Sales s JOIN ecommerce.sales_data.Regions r ON s.region_id = r.region_id GROUP BY r.region_name"
        },
        {
          "id": "csql-5678",
          "name": "ProfitabilityAnalysis_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Product Returns and Profitability"
            }
          ],
          "query": "SELECT p.product_category, SUM(s.revenue) AS total_profit FROM ecommerce.sales_data.Sales s JOIN ecommerce.sales_data.Products p ON s.product_id = p.product_id GROUP BY p.product_category"
        },
        {
          "id": "csql-9101",
          "name": "CustomerSegmentation_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Customer Segmentation Analysis"
            }
          ],
          "query": "SELECT c.customer_id, c.location, COUNT(o.order_id) AS total_orders FROM ecommerce.sales_data.Customers c JOIN ecommerce.sales_data.Orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.location"
        },
        {
          "id": "csql-3141",
          "name": "CustomerOrders_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Customer Orders DataSource"
            }
          ],
          "query": "SELECT o.order_id, o.customer_id, o.order_date, o.sales_amount FROM ecommerce.sales_data.Orders o WHERE o.order_status = 'Completed'"
        },
        {
          "id": "csql-3142",
          "name": "CustomerProfiles_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Customer Orders DataSource"
            }
          ],
          "query": "SELECT c.customer_id, c.customer_name, c.segment, c.location FROM ecommerce.sales_data.Customers c WHERE c.active_flag = 1"
        },
        {
          "id": "csql-3143",
          "name": "CustomerReturns_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Customer Orders DataSource"
            }
          ],
          "query": "SELECT r.return_id, r.order_id, r.return_reason FROM ecommerce.sales_data.Returns r"
        }
      ]
    }
  }
}
Just like before, when I created the master list of data sources, here too the downstream data sources are nested JSON, from which we need to extract only the "name" part. The query column dumps the entire custom SQL, and with a regex pattern we can easily search for the names of the tables used in each query.
We know that the table names always come after a FROM or a JOIN clause and generally follow the format <database_name>.<schema>.<table_name>. The <database_name> is optional and most of the time not used; there were only a few queries written in that format, and for those I ended up getting just the database and schema names rather than the complete table name. Once we have extracted the data source names and the table names, we need to merge the rows per data source, since multiple custom SQL queries can be used in a single data source.
### Convert the custom sql response into a dataframe
col_names = custom_table_query_result['data']['customSQLTablesConnection']['nodes'][0].keys()
cs_df = pd.DataFrame(columns=col_names)

for i in custom_table_query_result['data']['customSQLTablesConnection']['nodes']:
    tmp_dt = {k: v for k, v in i.items()}
    cs_df = pd.concat([cs_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])

# Extract the data source name where the custom sql query was used
cs_df['data_source'] = cs_df.downstreamDatasources.apply(lambda x: x[0]['name'] if x and 'name' in x[0] else None)
cs_df.reset_index(inplace=True)
cs_df.drop(['index', 'downstreamDatasources'], axis=1, inplace=True)

### We need to extract the table names from the sql query. We know the table name comes after a FROM or JOIN clause
# Note that the table name can be of the format <data_warehouse>.<schema>.<table_name>
# Depending on how the tables are referenced, you may have to modify the regex expression

def extract_tables(sql):
    # Regex to match database.schema.table or schema.table, avoiding aliases
    pattern = r'(?:FROM|JOIN)\s+((?:\[\w+\]|\w+)\.(?:\[\w+\]|\w+)(?:\.(?:\[\w+\]|\w+))?)\b'
    matches = re.findall(pattern, sql, re.IGNORECASE)
    return list(set(matches))  # Unique table names

cs_df['customSQLTables'] = cs_df['query'].apply(extract_tables)
cs_df = cs_df[['data_source', 'customSQLTables']]

# We need to merge data sources as there can be multiple custom sqls used in the same data source
cs_df = cs_df.groupby('data_source', as_index=False).agg({
    'customSQLTables': lambda x: list(set(item for sublist in x for item in sublist))  # Flatten & make unique
})

print('There are ', cs_df.shape[0], ' data sources with custom sqls used in them')
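If you want to sanity-check the regex before applying it to your whole site, you can run extract_tables on one of the mock queries shown earlier. Under the pattern above, it should return the two fully qualified table names:

# Quick check of the regex against one of the mock custom SQL queries
sample_sql = ("SELECT r.region_name, SUM(s.sales_amount) AS total_sales "
              "FROM ecommerce.sales_data.Sales s "
              "JOIN ecommerce.sales_data.Regions r ON s.region_id = r.region_id "
              "GROUP BY r.region_name")
print(extract_tables(sample_sql))
# Expected (order may differ because a set is used):
# ['ecommerce.sales_data.Sales', 'ecommerce.sales_data.Regions']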
After performing all the above operations, this is how cs_df looks:

Processing regular tables in data sources
Next, we need to get the list of all the regular tables used by the data sources that are not part of custom SQL. There are two ways to go about it: either use the publishedDatasources object and check upstreamTables, or use DatabaseTable and check upstreamDatasources. I want results at the data source level, so I will go the first way (basically, I want some code ready to reuse when I look at a specific data source in more detail). Here again, instead of going with pagination, for simplicity I loop through each data source to make sure I get everything. We get the upstreamTables nested inside the field object, so that has to be cleaned out.
############### Get the data sources with the regular table names used on your site

### It's best to extract the tables information for every data source and then merge the results.
# Since we only get the table information nested under fields, in case there are hundreds of fields
# used in a single data source, we will hit the response limits and will not be able to retrieve all the data.

data_source_list = master_df.name.tolist()

col_names = ['name', 'id', 'extractLastUpdateTime', 'fields']
ds_df = pd.DataFrame(columns=col_names)

with server.auth.sign_in(tableau_auth):
    for ds_name in data_source_list:
        query = """ {
            publishedDatasources (filter: { name: \"""" + ds_name + """\" }) {
                name
                id
                extractLastUpdateTime
                fields {
                    name
                    upstreamTables {
                        name
                    }
                }
            }
        } """
        ds_name_result = server.metadata.query(
            query
        )
        for i in ds_name_result['data']['publishedDatasources']:
            tmp_dt = {k: v for k, v in i.items() if k != 'fields'}
            tmp_dt['fields'] = json.dumps(i['fields'])
            ds_df = pd.concat([ds_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])

ds_df.reset_index(inplace=True)
This is how ds_df looks:

Now we need to flatten the fields object and extract the field names as well as the table names out of it. Since the table names repeat multiple times, we have to deduplicate them to keep only the unique set.
# Function to extract the values of fields and upstream tables from the json lists
def extract_values(json_list, key):
    values = []
    for item in json_list:
        values.append(item[key])
    return values

ds_df["fields"] = ds_df["fields"].apply(ast.literal_eval)
ds_df['field_names'] = ds_df.apply(lambda x: extract_values(x['fields'], 'name'), axis=1)
ds_df['upstreamTables'] = ds_df.apply(lambda x: extract_values(x['fields'], 'upstreamTables'), axis=1)

# Function to extract the unique table names
def extract_upstreamTable_values(table_list):
    values = set()
    for inner_list in table_list:
        for item in inner_list:
            if 'name' in item:
                values.add(item['name'])
    return list(values)

ds_df['upstreamTables'] = ds_df.apply(lambda x: extract_upstreamTable_values(x['upstreamTables']), axis=1)
ds_df.drop(["index", "fields"], axis=1, inplace=True)
After performing the above operations, the final structure of ds_df looks something like this:

We have all the pieces, and now we just have to merge them together:
###### Join all the data together
master_data = pd.merge(master_df, ds_df, how="left", on=["name", "id"])
master_data = pd.merge(master_data, cs_df, how="left", left_on="name", right_on="data_source")

# Save the results to analyse further
master_data.to_excel("Tableau Data Sources with Tables.xlsx", index=False)
This is our final master_data:

Table-level impact analysis
Let's say there is a schema change on the Sales table and you want to know which data sources are affected. You can simply write a small function that checks whether the table appears in either of the two columns, upstreamTables or customSQLTables, as follows:
def filter_rows_with_table(df, col1, col2, target_table):
    """
    Filters rows in df where target_table is part of any value in either col1 or col2 (supports partial match).
    Returns full rows (all columns retained).
    """
    return df[
        df.apply(
            lambda row:
                (isinstance(row[col1], list) and any(target_table in item for item in row[col1])) or
                (isinstance(row[col2], list) and any(target_table in item for item in row[col2])),
            axis=1
        )
    ]
# For example
filter_rows_with_table(master_data, 'upstreamTables', 'customSQLTables', 'Sales')
Below is the output. You can see that three data sources are affected by this change. You can also alert the data source owners, Alice and Bob, about it in advance, so they can start working on a fix before anything breaks on the Tableau dashboards.
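As a small follow-up, you can pull the unique owners out of the filtered rows so you know exactly who to alert. A short sketch using the columns built earlier:

# Collect the owners of the affected data sources so they can be alerted in advance
affected = filter_rows_with_table(master_data, 'upstreamTables', 'customSQLTables', 'Sales')
owners_to_notify = affected[['owner_name', 'owner_email']].drop_duplicates()
print(owners_to_notify)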

You can check out the complete version of the code in my GitHub repository here.
This is just one potential use case of the Tableau Metadata API. You can also extract the field names used in custom SQL queries and add them to the dataset to get a field-level impact analysis. You can monitor stale data sources with extractLastUpdateTime to see whether they have an issue or need to be archived if they are no longer used. You can also use the dashboards object to retrieve information at the dashboard level.
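For instance, a dashboard-level lineage query might look roughly like the sketch below. The upstreamDatasources field on dashboards is my assumption here, so check the Metadata API schema for the exact fields available before using it:

### Sketch only: dashboard-level lineage (verify field names against the Metadata API schema)
dashboards_query = """ {
  dashboards {
    name
    upstreamDatasources {
      name
    }
  }
}"""
with server.auth.sign_in(tableau_auth):
    dashboards_result = server.metadata.query(dashboards_query)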
Final thoughts
If you have come this far, kudos to you. This is just one use case for automating Tableau data management. It's time to reflect on your own work and think about which other tasks you could automate to make your life easier. I hope this mini-project served as an enjoyable learning experience in understanding the power of the Tableau Metadata API. If you liked reading this, you might also enjoy another one of my blog posts about Tableau.
In an earlier blog, I also looked at building interactive, database-driven apps using Python, Streamlit, and SQLite.
Before you go…
Follow me so that you don't miss any new posts I write in the future; you will find more of my articles there. You can also connect with me on LinkedIn or Twitter!