As the size and complexity of information dealt with by organizations enhance, conventional rules-based approaches to analyzing the info alone are now not viable. As an alternative, organizations are more and more trying to reap the benefits of transformative applied sciences like machine studying (ML) and synthetic intelligence (AI) to ship progressive merchandise, enhance outcomes, and achieve operational efficiencies at scale. Moreover, the democratization of AI and ML by way of AWS and AWS Companion options is accelerating its adoption throughout all industries.
For instance, a health-tech firm could also be trying to enhance affected person care by predicting the chance that an aged affected person could turn out to be hospitalized by analyzing each scientific and non-clinical knowledge. This can permit them to intervene early, personalize the supply of care, and take advantage of environment friendly use of current sources, comparable to hospital mattress capability and nursing employees.
AWS affords the broadest and deepest set of AI and ML providers and supporting infrastructure, comparable to Amazon SageMaker and Amazon Bedrock, that will help you at each stage of your AI/ML adoption journey, together with adoption of generative AI. Splunk, an AWS Companion, affords a unified safety and observability platform constructed for pace and scale.
As the range and quantity of information will increase, it’s important to grasp how they are often harnessed at scale by utilizing complementary capabilities of the 2 platforms. For organizations trying past the usage of out-of-the-box Splunk AI/ML features, this publish explores how Amazon SageMaker Canvas, a no-code ML growth service, can be utilized at the side of knowledge collected in Splunk to drive actionable insights. We additionally show tips on how to use the generative AI capabilities of SageMaker Canvas to hurry up your knowledge exploration and assist you construct higher ML fashions.
Use case overview
On this instance, a health-tech firm providing distant affected person monitoring is gathering operational knowledge from wearables utilizing Splunk. These gadget metrics and logs are ingested into and saved in a Splunk index, a repository of incoming knowledge. Inside Splunk, this knowledge is used to satisfy context-specific safety and observability use circumstances by Splunk customers, comparable to monitoring the safety posture and uptime of gadgets and performing proactive upkeep of the fleet.
Individually, the corporate makes use of AWS knowledge providers, comparable to Amazon Easy Storage Service (Amazon S3), to retailer knowledge associated to sufferers, comparable to affected person info, gadget possession particulars, and scientific telemetry knowledge obtained from the wearables. These may embrace exports from buyer relationship administration (CRM), configuration administration database (CMDB), and digital well being document (EHR) programs. On this instance, they’ve entry to an extract of affected person info and hospital admission data that reside in an S3 bucket.
The next desk illustrates the completely different knowledge explored on this instance use case.
|
Description |
Characteristic Identify |
Storage |
Instance Supply |
|
|
Age of affected person |
|
AWS |
EHR |
|
|
Models of alcohol consumed by affected person each week |
|
AWS |
EHR |
|
|
Tobacco utilization by affected person per week |
|
AWS |
EHR |
|
|
Common systolic blood stress of affected person |
|
AWS |
Wearables |
|
|
Common diastolic blood stress of affected person |
|
AWS |
Wearables |
|
|
Common resting coronary heart charge of affected person |
|
AWS |
Wearables |
|
|
Affected person admission document |
|
AWS |
EHR |
|
|
Variety of days the gadget has been energetic over a interval |
|
Splunk |
Wearables |
|
|
Common finish of the day battery stage over a interval |
|
Splunk |
Wearables |
|
This publish describes an method with two key parts:
- The 2 knowledge sources are saved alongside one another utilizing a standard AWS knowledge engineering pipeline. Information is introduced to the personas that want entry utilizing a unified interface.
- An ML mannequin to foretell hospital admissions (
admitted) is developed utilizing the mixed dataset and SageMaker Canvas. Professionals with out a background in ML are empowered to research the info utilizing no-code tooling.
The answer permits customized ML fashions to be developed from a broader number of scientific and non-clinical knowledge sources to cater for various real-life eventualities. For instance, it may be used to reply questions comparable to “If sufferers will be apt to have their wearables turned off and there’s no scientific telemetry knowledge obtainable, can the probability that they’re hospitalized nonetheless be precisely predicted?”
AWS knowledge engineering pipeline
The adaptable method detailed on this publish begins with an automatic knowledge engineering pipeline to make knowledge saved in Splunk obtainable to a variety of personas, together with enterprise intelligence (BI) analysts, knowledge scientists, and ML practitioners, by way of a SQL interface. That is achieved by utilizing the pipeline to switch knowledge from a Splunk index into an S3 bucket, the place it will likely be cataloged.
The method is proven within the following diagram.
Determine 1: Structure overview of information engineering pipeline
The automated AWS knowledge pipeline consists of the next steps:
- Information from wearables is saved in a Splunk index the place it may be queried by customers, comparable to safety operations heart (SOC) analysts, utilizing the Splunk search processing language (SPL). Spunk’s out-of-the-box AI/ML capabilities, such because the Splunk Machine Learning Toolkit (Splunk MLTK) and purpose-built fashions for safety and observability use circumstances (for instance, for anomaly detection and forecasting), could be utilized contained in the Splunk Platform. Utilizing these Splunk ML options permits you to derive contextualized insights shortly with out the necessity for added AWS infrastructure or abilities.
- Some organizations could look to develop customized, differentiated ML fashions, or wish to construct AI-enabled functions utilizing AWS providers for his or her particular use circumstances. To facilitate this, an automatic knowledge engineering pipeline is constructed utilizing AWS Step Features. The Step Features state machine is configured with an AWS Lambda operate to retrieve knowledge from the Splunk index utilizing the Splunk Enterprise SDK for Python. The SPL question requested by way of this REST API name is scoped to solely retrieve the info of curiosity.
-
- Lambda helps container pictures. This answer makes use of a Lambda operate that runs a Docker container picture. This enables bigger knowledge manipulation libraries, comparable to pandas and PyArrow, to be included within the deployment package deal.
- If a big quantity of information is being exported, the code could must run for longer than the utmost attainable period, or require extra reminiscence than supported by Lambda features. If that is so, Step Features could be configured to straight run a container job on Amazon Elastic Container Service (Amazon ECS).
-
- For authentication and authorization, the Spunk bearer token is securely retrieved from AWS Secrets and techniques Supervisor by the Lambda operate earlier than calling the Splunk
/searchREST API endpoint. This bearer authentication token lets customers entry the REST endpoint utilizing an authenticated identification. - Information retrieved by the Lambda operate is reworked (if required) and uploaded to the designated S3 bucket alongside different datasets. This knowledge is partitioned and compressed, and saved in storage and performance-optimized Apache Parquet file format.
- As its final step, the Step Features state machine runs an AWS Glue crawler to deduce the schema of the Splunk knowledge residing within the S3 bucket, and catalogs it for wider consumption as tables utilizing the AWS Glue Information Catalog.
- Wearable knowledge exported from Splunk is now obtainable to customers and functions by way of the Information Catalog as a desk. Analytics tooling comparable to Amazon Athena can now be used to question the info utilizing SQL.
- As knowledge saved in your AWS setting grows, it’s important to have centralized governance in place. AWS Lake Formation permits you to simplify permissions administration and knowledge sharing to keep up safety and compliance.
An AWS Serverless Software Mannequin (AWS SAM) template is obtainable to deploy all AWS sources required by this answer. This template could be discovered within the accompanying GitHub repository.
Discuss with the README file for required conditions, deployment steps, and the method to check the info engineering pipeline answer.
AWS AI/ML analytics workflow
After the info engineering pipeline’s Step Features state machine efficiently completes and wearables knowledge from Splunk is accessible alongside affected person healthcare knowledge utilizing Athena, we use an instance method primarily based on SageMaker Canvas to drive actionable insights.
SageMaker Canvas is a no-code visible interface that empowers you to arrange knowledge, construct, and deploy extremely correct ML fashions, streamlining the end-to-end ML lifecycle in a unified setting. You’ll be able to put together and remodel knowledge by way of point-and-click interactions and pure language, powered by Amazon SageMaker Information Wrangler. You can even faucet into the ability of automated machine studying (AutoML) and mechanically construct customized ML fashions for regression, classification, time sequence forecasting, pure language processing, and laptop imaginative and prescient, supported by Amazon SageMaker Autopilot.
On this instance, we use the service to categorise whether or not a affected person is more likely to be admitted to a hospital over the subsequent 30 days primarily based on the mixed dataset.
The method is proven within the following diagram.
Determine 2: Structure overview of ML growth
The answer consists of the next steps:
- An AWS Glue crawler crawls the info saved in S3 bucket. The Information Catalog exposes this knowledge discovered within the folder construction as tables.
- Athena supplies a question engine to permit individuals and functions to work together with the tables utilizing SQL.
- SageMaker Canvas makes use of Athena as an information supply to permit the info saved within the tables for use for ML mannequin growth.
Resolution overview
SageMaker Canvas permits you to construct a customized ML mannequin utilizing a dataset that you’ve imported. Within the following sections, we show tips on how to create, discover, and remodel a pattern dataset, use pure language to question the info, test for knowledge high quality, create extra steps for the info move, and construct, check, and deploy an ML mannequin.
Conditions
Earlier than continuing, confer with Getting began with utilizing Amazon SageMaker Canvas to be sure to have the required conditions in place. Particularly, validate that the AWS Identification and Entry Administration (IAM) position your SageMaker area is utilizing has a coverage hooked up with adequate permissions to entry Athena, AWS Glue, and Amazon S3 sources.
Create the dataset
SageMaker Canvas helps Athena as an information supply. Information from wearables and affected person healthcare knowledge residing throughout your S3 bucket is accessed utilizing Athena and the Information Catalog. This enables this tabular knowledge to be straight imported into SageMaker Canvas to start out your ML growth.
To create your dataset, full the next steps:
- On the SageMaker Canvas console, select Information Wrangler within the navigation pane.
- On the Import and put together dropdown menu, select Tabular because the dataset kind to indicate that the imported knowledge consists of rows and columns.
Determine 3: Importing tabular knowledge utilizing SageMaker Information Wrangler
- For Choose an information supply, select Athena.
On this web page, you will note your Information Catalog database and tables listed, named patient_data and splunk_ops_data.
- Be part of (inside be part of) the tables collectively utilizing the
user_idandidto create one overarching dataset that can be utilized throughout ML mannequin growth. - Below Import settings, enter
unprocessed_datafor Dataset identify. - Select Import to finish the method.
Determine 4: Becoming a member of knowledge utilizing SageMaker Information Wrangler
The mixed dataset is now obtainable to discover and remodel utilizing SageMaker Information Wrangler.
Discover and remodel the dataset
SageMaker Information Wrangler allows you to remodel and analyze the supply dataset by way of knowledge flows whereas nonetheless sustaining a no-code method.
The earlier step mechanically created an information move within the SageMaker Canvas console which now we have renamed to data_prep_data_flow.move. Moreover, two steps are mechanically generated, as listed within the following desk.
|
Step |
Identify |
Description |
|
1 |
Athena Supply |
Units the |
|
2 |
Information varieties |
Units column forms of |
Earlier than we create extra remodel steps, let’s discover two SageMaker Canvas options that may assist us deal with the suitable actions.
Use pure language to question the info
SageMaker Information Wrangler additionally supplies generative AI capabilities referred to as Chat for knowledge prep powered by a big language mannequin (LLM). This function permits you to discover your knowledge utilizing pure language with none background in ML or SQL. Moreover, any contextualized suggestions returned by the generative AI mannequin could be launched straight again into the info move with out writing any code.
On this part, we current some instance prompts to show this in motion. These examples have been chosen as an instance the artwork of the attainable. We advocate that you simply experiment with completely different prompts to achieve one of the best outcomes to your explicit use circumstances.
Instance 1: Establish Splunk default fields
On this first instance, we wish to know whether or not there are Splunk default fields that we may doubtlessly exclude from our dataset previous to ML mannequin growth.
- In SageMaker Information Wrangler, open your knowledge move.
- Select Step 2 Information varieties, and select Chat for knowledge prep.
- Within the Chat for knowledge prep pane, you possibly can enter prompts in pure language to discover and remodel the info. For instance:
On this instance, the generative AI LLM has appropriately recognized Splunk default fields that might be safely dropped from the dataset.
- Select Add to steps so as to add this recognized transformation to the info move.
Determine 5: Utilizing SageMaker Information Wrangler’s chat for knowledge prep to establish Splunk’s default fields
Instance 2: Establish extra columns that might be dropped
We now wish to establish any additional columns that might be dropped with out being too particular about what we’re searching for. We wish the LLM to make the ideas primarily based on the info, and supply us with the rationale. For instance:
Along with the Splunk default fields recognized earlier, the generative AI mannequin is now proposing the removing of columns comparable to timestamp, punct, id, index, and linecount that don’t seem like conducive to ML mannequin growth.
Determine 6: Utilizing SageMaker Information Wrangler’s chat for knowledge prep to establish extra fields that may be dropped
Instance 3: Calculate common age column in dataset
You can even use the generative AI mannequin to carry out Text2SQL duties in which you’ll merely ask questions of the info utilizing pure language. That is helpful if you wish to validate the content material of the dataset.
On this instance, we wish to know what the common affected person age worth is inside the dataset:
By increasing View code, you possibly can see what SQL statements the LLM has constructed utilizing its Text2SQL capabilities. This provides you full visibility into how the outcomes are being returned.
Determine 7: Utilizing SageMaker Information Wrangler’s chat for knowledge prep to run SQL statements
Examine for knowledge high quality
SageMaker Canvas additionally supplies exploratory knowledge evaluation (EDA) capabilities that can help you achieve deeper insights into the info previous to the ML mannequin construct step. With EDA, you possibly can generate visualizations and analyses to validate whether or not you’ve gotten the suitable knowledge, and whether or not your ML mannequin construct is more likely to yield outcomes which can be aligned to your group’s expectations.
Instance 1: Create a Information High quality and Insights Report
Full the next steps to create a Information High quality and Insights Report:
- Whereas within the knowledge move step, select the Analyses tab.
- For Evaluation kind, select Information High quality and Insights Report.
- For Goal column, select
admitted. - For Downside kind, select Classification.
This performs an evaluation of the info that you’ve and supplies info such because the variety of lacking values and outliers.
Determine 8: Working SageMaker Information Wrangler’s knowledge high quality and insights report
Discuss with Get Insights On Information and Information High quality for particulars on tips on how to interpret the outcomes of this report.
Instance 2: Create a Fast Mannequin
On this second instance, select Fast Mannequin for Evaluation kind and for Goal column, select admitted. The Fast Mannequin estimates the anticipated predicted high quality of the mannequin.
By working the evaluation, the estimated F1 score (a measure of predictive efficiency) of the mannequin and have significance scores are displayed.
Determine 9: Working SageMaker Information Wrangler’s fast mannequin function to evaluate the potential accuracy of the mannequin
SageMaker Canvas helps many different evaluation varieties. By reviewing these analyses upfront of your ML mannequin construct, you possibly can proceed to engineer the info and options to achieve adequate confidence that the ML mannequin will meet what you are promoting goals.
Create extra steps within the knowledge move
On this instance, now we have determined to replace our data_prep_data_flow.move knowledge move to implement extra transformations. The next desk summarizes these steps.
|
Step |
Remodel |
Description |
|
3 |
Chat for knowledge prep |
Removes Splunk default fields recognized. |
|
4 |
Chat for knowledge prep |
Removes extra fields recognized as being unhelpful to ML mannequin growth. |
|
5 |
Group by |
Teams collectively the rows by user_id and calculates a mean |
|
6 |
Drop column (handle columns) |
Drops remaining columns which can be pointless for our ML growth, comparable to columns with excessive cardinality (for instance, |
|
7 |
Parse column as kind |
Converts numerical worth varieties, for instance from |
|
8 |
Parse column as kind |
Converts extra columns that have to be parsed (every column requires a separate step). |
|
9 |
Drop duplicates (handle rows) |
Drops duplicate rows to keep away from overfitting. |
To create a brand new remodel, view the info move, then select Add remodel on the final step.
Determine 10: Utilizing SageMaker Information Wrangler so as to add a remodel to an information move
Select Add remodel, and proceed to decide on a remodel kind and its configuration.
Determine 11: Utilizing SageMaker Information Wrangler so as to add a remodel to an information move
The next screenshot reveals our newly up to date end-to-end knowledge move that includes a number of steps. On this instance, we ran the analyses on the finish of the info move.
Determine 12: Displaying the end-to-end SageMaker Canvas Information Wrangler knowledge move
If you wish to incorporate this knowledge move right into a productionized ML workflow, SageMaker Canvas can create a Jupyter pocket book that exports your knowledge move to Amazon SageMaker Pipelines.
Develop the ML mannequin
To get began with ML mannequin growth, full the next steps:
- Select Create mannequin straight from the final step of the info move.
Determine 13: Making a mannequin from the SageMaker Information Wrangler knowledge move
- For Dataset identify, enter a reputation to your reworked dataset (for instance,
processed_data). - Select Export.
Determine 14: Naming the exported dataset for use by the mannequin in SageMaker Information Wrangler
This step will mechanically create a brand new dataset.
- After the dataset has been created efficiently, select Create mannequin to start the ML mannequin creation.
Determine 15: Creating the mannequin in SageMaker Information Wrangler
- For Mannequin identify, enter a reputation for the mannequin (for instance,
my_healthcare_model). - For Downside kind, choose Predictive evaluation.
- Select Create.
Determine 16: Naming the mannequin in SageMaker Canvas and choosing the predictive evaluation kind
You at the moment are able to progress by way of the Construct, Analyze, Predict, and Deploy levels to develop and operationalize the ML mannequin utilizing SageMaker Canvas.
- On the Construct tab, for Goal column, select the column you wish to predict (
admitted). - Select Fast construct to construct the mannequin.
The Fast construct choice has a shorter construct time, however the Commonplace construct choice usually enjoys larger accuracy.
Determine 17: Choosing the goal column to foretell in SageMaker Canvas
After a couple of minutes, on the Analyze tab, it is possible for you to to view the accuracy of the mannequin, together with column influence, scoring, and different superior metrics. For instance, we are able to see {that a} function from the wearables knowledge captured in Splunk—average_num_days_device_active—has a robust influence on whether or not the affected person is more likely to be admitted or not, together with their age. As such, the health-tech firm could proactively attain out to aged sufferers who are inclined to hold their wearables off to attenuate the danger of their hospitalization.
Determine 18: Displaying the outcomes from the mannequin fast construct in SageMaker Canvas
Whenever you’re pleased with the outcomes from the Fast construct, repeat the method with a Commonplace construct to be sure to have an ML mannequin with larger accuracy that may be deployed.
Take a look at the ML mannequin
Our ML mannequin has now been constructed. Should you’re happy with its accuracy, you can also make predictions utilizing this ML mannequin utilizing internet new knowledge on the Predict tab. Predictions could be carried out both utilizing batch (listing of sufferers) or for a single entry (one affected person).
Experiment with completely different values and select Replace prediction. The ML mannequin will reply with a prediction for the brand new values that you’ve entered.
On this instance, the ML mannequin has recognized a 64.5% chance that this explicit affected person will likely be admitted to hospital within the subsequent 30 days. The health-tech firm will probably wish to prioritize the care of this affected person.
Determine 19: Displaying the outcomes from a single prediction utilizing the mannequin in SageMaker Canvas
Deploy the ML mannequin
It’s now attainable for the health-tech firm to construct functions that may use this ML mannequin to make predictions. ML fashions developed in SageMaker Canvas could be operationalized utilizing a broader set of SageMaker providers. For instance:
To deploy the ML mannequin, full the next steps:
- On the Deploy tab, select Create Deployment.
- Specify Deployment identify, Occasion kind, and Occasion depend.
- Select Deploy to make the ML mannequin obtainable as a SageMaker endpoint.
On this instance, we diminished the occasion kind to ml.m5.4xlarge and occasion depend to 1 earlier than deployment.
Determine 20: Deploying the utilizing SageMaker Canvas
At any time, you possibly can straight check the endpoint from SageMaker Canvas on the Take a look at deployment tab of the deployed endpoint listed below Operations on the SageMaker Canvas console.
Discuss with the Amazon SageMaker Canvas Developer Information for detailed steps to take your ML mannequin growth by way of its full growth lifecycle and construct functions that may devour the ML mannequin to make predictions.
Clear up
Discuss with the directions within the README file to wash up the sources provisioned for the AWS knowledge engineering pipeline answer.
SageMaker Canvas payments you all through the session, and we advocate logging out of SageMaker Canvas when you’re not utilizing it. Discuss with Logging out of Amazon SageMaker Canvas for extra particulars. Moreover, if you happen to deployed a SageMaker endpoint, be sure to have deleted it.
Conclusion
This publish explored a no-code method involving SageMaker Canvas that may drive actionable insights from knowledge saved throughout each Splunk and AWS platforms utilizing AI/ML methods. We additionally demonstrated how you should use the generative AI capabilities of SageMaker Canvas to hurry up your knowledge exploration and construct ML fashions which can be aligned to what you are promoting’s expectations.
Study extra about AI on Splunk and ML on AWS.
In regards to the Authors

Alan Peaty is a Senior Companion Options Architect, serving to World Techniques Integrators (GSIs), World Impartial Software program Distributors (GISVs), and their clients undertake AWS providers. Previous to becoming a member of AWS, Alan labored as an architect at programs integrators comparable to IBM, Capita, and CGI. Outdoors of labor, Alan is a eager runner who likes to hit the muddy trails of the English countryside, and is an IoT fanatic.

Brett Roberts is the World Companion Technical Supervisor for AWS at Splunk, main the technical technique to assist clients higher safe and monitor their vital AWS environments and functions utilizing Splunk. Brett was a member of the Splunk Belief and holds a number of Splunk and AWS certifications. Moreover, he co-hosts a group podcast and weblog referred to as Large Information Beard, exploring developments and applied sciences within the analytics and AI house.

Arnaud Lauer is a Principal Companion Options Architect within the Public Sector staff at AWS. He permits companions and clients to grasp tips on how to greatest use AWS applied sciences to translate enterprise wants into options. He brings greater than 18 years of expertise in delivering and architecting digital transformation initiatives throughout a variety of industries, together with public sector, vitality, and shopper items.

