Friday, May 8, 2026

Amazon SageMaker Studio is the latest web-based experience for running end-to-end machine learning (ML) workflows. SageMaker Studio offers a suite of integrated development environments (IDEs), which includes JupyterLab, Code Editor, as well as RStudio. Data scientists and ML engineers can spin up SageMaker Studio private and shared spaces. These spaces manage the storage and resource needs of the JupyterLab and Code Editor applications, let you stop the applications when not in use to save on compute costs, and let you resume work from where you stopped.

The storage resources for SageMaker Studio spaces are Amazon Elastic Block Store (Amazon EBS) volumes, which offer low-latency access to user data like notebooks, sample data, or Python/Conda virtual environments. However, there are several scenarios where a distributed file system shared across private JupyterLab and Code Editor spaces is convenient. You enable this by configuring an Amazon Elastic File System (Amazon EFS) file system in SageMaker Studio. Amazon EFS provides a scalable, fully managed, elastic NFS file system for AWS compute instances.

Amazon SageMaker supports automatically mounting a folder in an EFS volume for each user in a domain. Using this folder, users can share data between their own private spaces. However, users can't share data with other users in the domain; they only have access to their own folder user-default-efs in the $HOME directory of the SageMaker Studio application.

In this post, we explore three distinct scenarios that demonstrate the versatility of integrating a custom Amazon EFS file system with SageMaker Studio.

For further information on configuring Amazon EFS in SageMaker Studio, refer to Attaching a custom file system to a domain or user profile.

Solution overview

In the first scenario, an AWS infrastructure admin wants to set up an EFS file system that can be shared across the private spaces of a given user profile in SageMaker Studio. This means that each user within the domain will have their own private space on the EFS file system, allowing them to store and access their own data and files. The automation described in this post enables new team members who join the data science team to quickly set up their private space on the EFS file system and access the necessary resources to start contributing to the ongoing project.

The following diagram illustrates this architecture.

This scenario offers the following benefits:

  • Individual data storage and analysis – Users can store their personal datasets, models, and other files in their private spaces, allowing them to work on their own projects independently. Data is segregated by user profile.
  • Centralized data management – The administrator can manage the EFS file system centrally, maintaining data security, backup, and direct access for all users. By setting up an EFS file system with a private space, users can effortlessly track and maintain their work.
  • Cross-instance file sharing – Users can access their files from multiple SageMaker Studio spaces, because the EFS file system provides a persistent storage solution.

The second scenario is related to the creation of a single EFS directory that is shared across all the spaces of a given SageMaker Studio domain. This means that all users within the domain can access and use the same shared directory on the EFS file system, allowing for better collaboration and centralized data management (for example, to share common artifacts). This is a more generic use case, because there is no specific segregated folder for each user profile.

The following diagram illustrates this architecture.

Second scenario architecture

This scenario offers the following benefits:

  • Shared project directories – Suppose the data science team is working on a large-scale project that requires collaboration among multiple team members. By setting up a shared EFS directory at the project level, the team can collaborate on the same projects by accessing and working on files in the shared directory. The data science team can, for example, use the shared EFS directory to store their Jupyter notebooks, analysis scripts, and other project-related files.
  • Simplified file management – Users don't need to manage their own private file storage, because they can rely on the shared directory for their file-related needs.
  • Improved data governance and security – The shared EFS directory, being centrally managed by the AWS infrastructure admin, can provide improved data governance and security. The admin can implement access controls and other data management policies to maintain the integrity and security of the shared resources.

The third scenario explores the configuration of an EFS file system that can be shared across multiple SageMaker Studio domains within the same VPC. This allows users from different domains to access and work with the same set of files and data, enabling cross-domain collaboration and centralized data management.

The following diagram illustrates this architecture.

Third scenario architecture

This scenario offers the following benefits:

  • Enterprise-level data science collaboration – Imagine a large organization with multiple data science teams working on various projects across different departments or business units. By setting up a shared EFS file system accessible across the organization's SageMaker Studio domains, these teams can collaborate on cross-functional projects, share artifacts, and use a centralized data repository for their work.
  • Shared infrastructure and resources – The EFS file system can be used as a shared resource across multiple SageMaker Studio domains, promoting efficiency and cost-effectiveness.
  • Scalable data storage – As the number of users or domains increases, the EFS file system automatically scales to accommodate the growing storage and access requirements.
  • Data governance – The shared EFS file system, being managed centrally, can be subject to stricter data governance policies, access controls, and compliance requirements. This can help the organization meet regulatory and security standards while still enabling cross-domain collaboration and data sharing.

Prerequisites

This post provides an AWS CloudFormation template to deploy the main resources for the solution. In addition to this, the solution expects that the AWS account in which the template is deployed already has the following configuration and resources:

Refer to Attaching a custom file system to a domain or user profile for additional prerequisites.

Configure an EFS directory shared across private spaces of a given user profile

In this scenario, an administrator wants to provision an EFS file system for all users of a SageMaker Studio domain, creating a private file system directory for each user. We can distinguish two use cases:

  • Create new SageMaker Studio user profiles – A new team member joins a preexisting SageMaker Studio domain and wants to attach a custom EFS file system to the JupyterLab or Code Editor spaces.
  • Use preexisting SageMaker Studio user profiles – A team member is already working on a particular SageMaker Studio domain and wants to attach a custom EFS file system to the JupyterLab or Code Editor spaces.

The solution provided in this post focuses on the first use case. We discuss how to adapt the solution for preexisting SageMaker Studio domain user profiles later in this post.

The following diagram illustrates the high-level architecture of the solution.

AWS Architecture

In this solution, we use AWS CloudTrail, Amazon EventBridge, and AWS Lambda to automatically create a private EFS directory when a new SageMaker Studio user profile is created. The high-level steps to set up this architecture are as follows:

  1. Create an EventBridge rule that invokes the Lambda function when a new SageMaker user profile is created and logged in CloudTrail.
  2. Create an EFS file system with an access point for the Lambda function and with a mount target in every Availability Zone in which the SageMaker Studio domain is located.
  3. Use a Lambda function to create a private EFS directory with the required POSIX permissions for the profile. The function also updates the profile with the new file system configuration.
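The first step can be sketched with boto3 as follows. This is a minimal sketch, not the rule the CloudFormation template actually defines: it assumes CloudTrail management events are enabled and matches the standard "AWS API Call via CloudTrail" event emitted for the SageMaker CreateUserProfile API. The helper names, the rule name you pass in, and the target ID efs-directory-lambda are hypothetical. The request builder is kept separate from the API calls so it can be checked without AWS credentials.

```python
import json

# Event pattern for CloudTrail records of the SageMaker CreateUserProfile API call
CREATE_USER_PROFILE_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["sagemaker.amazonaws.com"],
        "eventName": ["CreateUserProfile"],
    },
}


def build_rule_requests(rule_name, lambda_arn):
    """Return keyword arguments for EventBridge put_rule and put_targets."""
    put_rule_args = {
        "Name": rule_name,
        "EventPattern": json.dumps(CREATE_USER_PROFILE_PATTERN),
        "State": "ENABLED",
    }
    put_targets_args = {
        "Rule": rule_name,
        # Hypothetical target ID; any ID unique within the rule works
        "Targets": [{"Id": "efs-directory-lambda", "Arn": lambda_arn}],
    }
    return put_rule_args, put_targets_args


def create_user_profile_rule(events_client, rule_name, lambda_arn):
    """Create or update the rule and attach the Lambda function as its target."""
    put_rule_args, put_targets_args = build_rule_requests(rule_name, lambda_arn)
    events_client.put_rule(**put_rule_args)
    events_client.put_targets(**put_targets_args)
```

For example, call create_user_profile_rule(boto3.client("events"), "new-user-profile-rule", lambda_arn). EventBridge also needs permission to invoke the function, which you can grant with the Lambda add_permission API using the principal events.amazonaws.com.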

Deploy the solution using AWS CloudFormation

To use the solution, you can deploy the infrastructure using the following CloudFormation template. This template deploys three main resources in your account: Amazon EFS resources (file system, access points, mount targets), an EventBridge rule, and a Lambda function.

Refer to Create a stack from the CloudFormation console for additional information. The input parameters for this template are:

  • SageMakerDomainId – The SageMaker Studio domain ID that will be associated with the EFS file system.
  • SageMakerStudioVpc – The VPC associated with the SageMaker Studio domain.
  • SageMakerStudioSubnetId – One or multiple subnets associated with the SageMaker Studio domain. The template deploys its resources in these subnets.
  • SageMakerStudioSecurityGroupId – The security group associated with the SageMaker Studio domain. The template configures the Lambda function with this security group.
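If you prefer to deploy from code rather than the console, the parameter values can be read from the domain itself through the SageMaker DescribeDomain API. The following is a minimal sketch under several assumptions: the subnet parameter is a list type (which CloudFormation accepts as a comma-separated string), the first security group in the domain's DefaultUserSettings is the right one to pass, and the template creates IAM resources. The helper names, stack name, and template URL are placeholders you supply, not values from this post.

```python
def build_stack_parameters(domain_description, domain_id):
    """Map a SageMaker DescribeDomain response to the template's input parameters."""
    security_groups = domain_description.get("DefaultUserSettings", {}).get("SecurityGroups", [])
    if not security_groups:
        raise ValueError("Domain has no default security group; choose one explicitly")
    values = {
        "SageMakerDomainId": domain_id,
        "SageMakerStudioVpc": domain_description["VpcId"],
        # Assumes a list-type parameter, passed as a comma-separated string
        "SageMakerStudioSubnetId": ",".join(domain_description["SubnetIds"]),
        # Assumption: the first security group configured for the domain's users
        "SageMakerStudioSecurityGroupId": security_groups[0],
    }
    return [{"ParameterKey": key, "ParameterValue": value} for key, value in values.items()]


def deploy_stack(sm_client, cfn_client, domain_id, stack_name, template_url):
    """Create the stack using values looked up from the SageMaker Studio domain."""
    description = sm_client.describe_domain(DomainId=domain_id)
    cfn_client.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=build_stack_parameters(description, domain_id),
        # Assumes the template creates IAM roles (for example, for the Lambda function)
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
```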

Amazon EFS resources

After you deploy the template, navigate to the Amazon EFS console and confirm that the EFS file system has been created. The file system has a mount target in every Availability Zone that your SageMaker domain connects to.

Note that each mount target uses the EC2 security group that SageMaker created in your AWS account when you first created the domain, which allows NFS traffic on port 2049. The provided template automatically retrieves this security group when it's first deployed, using a Lambda-backed custom resource.
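The custom resource's code isn't shown in this post. As an illustration only, a lookup of that security group could filter by its name, which SageMaker derives from the domain ID in the format security-group-for-inbound-nfs-domain-id, and by the domain's VPC. The function names are hypothetical; describe_security_groups and its group-name and vpc-id filters are standard EC2 API features.

```python
def nfs_security_group_name(domain_id):
    """Name of the security group SageMaker creates for inbound NFS traffic."""
    return f"security-group-for-inbound-nfs-{domain_id}"


def find_nfs_security_group(ec2_client, domain_id, vpc_id):
    """Return the ID of the domain's NFS security group in the given VPC."""
    response = ec2_client.describe_security_groups(
        Filters=[
            {"Name": "group-name", "Values": [nfs_security_group_name(domain_id)]},
            {"Name": "vpc-id", "Values": [vpc_id]},
        ]
    )
    groups = response["SecurityGroups"]
    if not groups:
        raise LookupError(f"No NFS security group found for domain {domain_id}")
    return groups[0]["GroupId"]
```

For example, find_nfs_security_group(boto3.client("ec2"), domain_id, vpc_id) returns the group ID to attach to each mount target.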

You can also observe that the file system has an EFS access point. This access point grants root access on the file system to the Lambda function that creates the directories for the SageMaker Studio user profiles.
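The template's exact access point settings aren't shown here. A minimal sketch of an equivalent request, assuming root is granted by setting the access point's POSIX user to UID and GID 0 and exposing the whole file system, could look like the following; the helper name is hypothetical, while the keyword arguments match the EFS CreateAccessPoint API.

```python
import uuid


def build_root_access_point_request(file_system_id, client_token=None):
    """Keyword arguments for the EFS create_access_point API with a root POSIX identity."""
    return {
        # Idempotency token; reuse the same value to safely retry the call
        "ClientToken": client_token or str(uuid.uuid4()),
        "FileSystemId": file_system_id,
        # UID/GID 0: every request through this access point is made as root
        "PosixUser": {"Uid": 0, "Gid": 0},
        # Expose the whole file system so the function can create a directory per user
        "RootDirectory": {"Path": "/"},
    }
```

Usage: boto3.client("efs").create_access_point(**build_root_access_point_request("fs-0123456789abcdef0")). Because this identity can change ownership of any file, restrict which functions are allowed to mount this access point.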

EventBridge rule

The second main resource is an EventBridge rule invoked when a new SageMaker Studio user profile is created. Its target is the Lambda function that creates the folder in the EFS file system and updates the profile that has just been created. The input of the Lambda function is the matched event, from which you can get the SageMaker Studio domain ID and the SageMaker user profile name.
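A minimal sketch of reading those two values from the matched event follows. It assumes the event is the CloudTrail record delivered by EventBridge and that CloudTrail stores the CreateUserProfile request fields under detail.requestParameters with the camelCase keys domainId and userProfileName; verify these keys against an event from your own account. The function name is hypothetical.

```python
def extract_profile_identifiers(event):
    """Return (domain_id, user_profile_name) from a CreateUserProfile CloudTrail event."""
    # Assumption: CloudTrail records the API request fields under detail.requestParameters
    request = event["detail"]["requestParameters"]
    return request["domainId"], request["userProfileName"]
```

Inside the handler, domain_id, user_profile_name = extract_profile_identifiers(event) gives the values the function needs for the directory name and the update_user_profile call.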

Lambda function

Finally, the template creates a Lambda function that creates a directory in the EFS file system with the required POSIX permissions for the user profile and updates the user profile with the new file system configuration.

At the POSIX permissions level, you can control which users can access the file system and which files or data they can access. The POSIX user and group IDs for SageMaker apps are:

  • UID – The POSIX user ID. The default is 200001. The valid range is from a minimum of 10000 to a maximum of 4000000.
  • GID – The POSIX group ID. The default is 1001. The valid range is from a minimum of 1001 to a maximum of 4000000.
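The directory step can be sketched as follows. This is a minimal sketch, assuming the file system is mounted at /mnt/efs inside the Lambda function (as in the code later in this post) and using the default IDs and ranges listed above. The helper names are hypothetical, and chown is passed in as a parameter (defaulting to os.chown) only so the logic can be exercised without root.

```python
import os

# Valid ranges for SageMaker Studio POSIX IDs, as listed above
UID_RANGE = (10000, 4000000)
GID_RANGE = (1001, 4000000)


def posix_ids_are_valid(uid, gid):
    """Check a UID/GID pair against the documented SageMaker ranges."""
    return UID_RANGE[0] <= uid <= UID_RANGE[1] and GID_RANGE[0] <= gid <= GID_RANGE[1]


def create_user_directory(mount_path, user_profile_name, uid=200001, gid=1001, chown=os.chown):
    """Create <mount_path>/<user_profile_name> and give it to the SageMaker POSIX user."""
    if not posix_ids_are_valid(uid, gid):
        raise ValueError(f"UID {uid} or GID {gid} is outside the allowed range")
    path = os.path.join(mount_path, user_profile_name)
    # exist_ok avoids failing when the function runs twice for the same profile
    os.makedirs(path, exist_ok=True)
    # Changing ownership requires root, which the EFS access point provides
    chown(path, uid, gid)
    return path
```

In the Lambda function this would be called as create_user_directory("/mnt/efs", user_profile_name), and the profile's FileSystemPath is then /user_profile_name relative to the file system root.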

The Lambda function runs in the same VPC as the EFS file system, and the file system and access point created earlier are attached to it.
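For reference, attaching the access point to a function is done through the FileSystemConfigs setting of the Lambda UpdateFunctionConfiguration API, together with a VPC configuration. The sketch below assumes the /mnt/efs mount path used by the code in this post; the helper names and the subnet and security group values are placeholders.

```python
def build_lambda_efs_config(access_point_arn, local_mount_path="/mnt/efs"):
    """FileSystemConfigs value for the Lambda UpdateFunctionConfiguration API."""
    # Lambda requires the local mount path to start with /mnt/
    if not local_mount_path.startswith("/mnt/"):
        raise ValueError("local_mount_path must start with /mnt/")
    return [{"Arn": access_point_arn, "LocalMountPath": local_mount_path}]


def attach_efs_to_lambda(lambda_client, function_name, access_point_arn, subnet_ids, security_group_ids):
    """Place the function in the VPC and mount the EFS access point at /mnt/efs."""
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        VpcConfig={"SubnetIds": subnet_ids, "SecurityGroupIds": security_group_ids},
        FileSystemConfigs=build_lambda_efs_config(access_point_arn),
    )
```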

Lambda function configuration

Adapt the solution for preexisting SageMaker Studio domain user profiles

We can reuse the previous solution for scenarios in which the domain already has user profiles. For that, you can create an additional Lambda function in Python that lists all the user profiles for the given SageMaker Studio domain and creates a dedicated EFS directory for each user profile.

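The Lambda function should be in the same VPC as the EFS file system, with the file system and access point created earlier attached to it. You need to add the efs_id and domain_id values as environment variables for the function. Its execution role also needs the sagemaker:ListUserProfiles and sagemaker:UpdateUserProfile permissions, because the function calls those APIs.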

You can include the following code as part of this new Lambda function and run it manually:

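import os

import boto3

sm_client = boto3.client('sagemaker')

# Default POSIX user and group IDs of SageMaker Studio apps
SAGEMAKER_UID = 200001
SAGEMAKER_GID = 1001

def lambda_handler(event, context):

    # Get EFS and domain ID from the function's environment variables
    file_system = os.environ['efs_id']
    domain_id = os.environ['domain_id']

    # Get all domain user profiles, following NextToken across result pages
    domain_users = []
    list_kwargs = {'DomainIdEquals': domain_id}
    while True:
        list_user_profiles_response = sm_client.list_user_profiles(**list_kwargs)
        domain_users.extend(list_user_profiles_response['UserProfiles'])
        next_token = list_user_profiles_response.get('NextToken')
        if not next_token:
            break
        list_kwargs['NextToken'] = next_token

    # Create a private directory for each user
    for user in domain_users:

        user_profile_name = user['UserProfileName']

        # Create the directory (no error if it already exists) and give it to the SageMaker POSIX user
        repository = f'/mnt/efs/{user_profile_name}'
        os.makedirs(repository, exist_ok=True)
        os.chown(repository, SAGEMAKER_UID, SAGEMAKER_GID)

        # Update the SageMaker user profile so its spaces mount the private directory
        sm_client.update_user_profile(
            DomainId=domain_id,
            UserProfileName=user_profile_name,
            UserSettings={
                'CustomFileSystemConfigs': [
                    {
                        'EFSFileSystemConfig': {
                            'FileSystemId': file_system,
                            'FileSystemPath': f'/{user_profile_name}'
                        }
                    }
                ]
            }
        )

    return {'updated_user_profiles': [user['UserProfileName'] for user in domain_users]}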

Configure an EFS directory shared across all spaces of a given domain

In this scenario, an administrator wants to provision an EFS file system for all users of a SageMaker Studio domain, using the same file system directory for all the users.

To achieve this, in addition to the prerequisites described earlier in this post, you need to complete the following steps.

Create the EFS file system

The file system needs to be in the same VPC as the SageMaker Studio domain. Refer to Creating EFS file systems for additional information.
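An EFS file system is not tied to a VPC when you create it; it becomes reachable from a VPC through its mount targets, and all mount targets of one file system must be in the same VPC. A minimal sketch of the creation request follows; the encryption and performance mode values are illustrative choices rather than requirements from this post, and the helper name is hypothetical.

```python
import uuid


def build_file_system_request(name, creation_token=None):
    """Keyword arguments for the EFS create_file_system API."""
    return {
        # Idempotency token; reuse it when retrying to avoid creating a second file system
        "CreationToken": creation_token or str(uuid.uuid4()),
        "PerformanceMode": "generalPurpose",
        "Encrypted": True,
        # The Name tag is what the Amazon EFS console displays
        "Tags": [{"Key": "Name", "Value": name}],
    }
```

Usage: boto3.client("efs").create_file_system(**build_file_system_request("studio-shared-efs")) returns a description that includes the new FileSystemId.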

Add mount targets to the EFS file system

Before SageMaker Studio can access the new EFS file system, the file system must have a mount target in each of the subnets associated with the domain. For more information about assigning mount targets to subnets, see Managing mount targets. You can find the subnets associated with the domain on the SageMaker Studio console under Network. You need to create a mount target for each subnet.

Networking used

Additionally, for each mount target, you must add the security group that SageMaker created in your AWS account when you created the SageMaker Studio domain. The security group name has the format security-group-for-inbound-nfs-domain-id.
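The mount target steps can also be scripted. This minimal sketch reads the domain's subnets from DescribeDomain and builds one create_mount_target request per subnet with the NFS security group; the helper names are hypothetical, and the security group ID is assumed to have been looked up already.

```python
def build_mount_target_requests(file_system_id, subnet_ids, nfs_security_group_id):
    """One create_mount_target request per domain subnet, each with the NFS security group."""
    return [
        {"FileSystemId": file_system_id, "SubnetId": subnet_id, "SecurityGroups": [nfs_security_group_id]}
        for subnet_id in subnet_ids
    ]


def create_mount_targets(efs_client, sm_client, file_system_id, domain_id, nfs_security_group_id):
    """Create a mount target in every subnet associated with the SageMaker Studio domain."""
    subnet_ids = sm_client.describe_domain(DomainId=domain_id)["SubnetIds"]
    for request in build_mount_target_requests(file_system_id, subnet_ids, nfs_security_group_id):
        efs_client.create_mount_target(**request)
```

Amazon EFS allows only one mount target per Availability Zone, so if two of the domain's subnets share an Availability Zone, the second call fails and the existing mount target in that zone serves both subnets.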

The following screenshot shows an example of an EFS file system with two mount targets for a SageMaker Studio domain associated with two subnets. Note the security group associated with both mount targets.

EFS file system

Create an EFS access point

The Lambda function accesses the EFS file system as root using this access point. See Creating access points for additional information.

EFS access point

Create a new Lambda function

Define a new Lambda function named LambdaManageEFSUsers. This function updates the default user settings of the SageMaker Studio domain, configuring the file system settings to use a specific shared repository path on the EFS file system. This configuration is automatically applied to all spaces within the domain.

The Lambda function runs in the same VPC as the EFS file system, and the file system and access point created earlier are attached to it. Additionally, you need to add efs_id and domain_id as environment variables for the function.

At the POSIX permissions level, you can control which users can access the file system and which files or data they can access. The POSIX user and group IDs for SageMaker apps are:

  • UID – The POSIX user ID. The default is 200001.
  • GID – The POSIX group ID. The default is 1001.

The function updates the default user settings of the SageMaker Studio domain, configuring the EFS file system to be used by all users. See the following code:

import json
import logging
import os

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)
sm_client = boto3.client('sagemaker')

# Default POSIX user and group IDs of SageMaker Studio apps
SAGEMAKER_UID = 200001
SAGEMAKER_GID = 1001

def lambda_handler(event, context):

    # Environment variables
    file_system = os.environ['efs_id']
    domain_id = os.environ['domain_id']

    # EFS directory name (the file system is mounted at /mnt/efs in the Lambda function)
    repository_name = "shared_repository"
    repository = f'/mnt/efs/{repository_name}'

    # Create the directory if needed and add permissions for the SageMaker POSIX user
    if os.path.isdir(repository):
        logger.info("Repository already created")
    os.makedirs(repository, exist_ok=True)
    os.chown(repository, SAGEMAKER_UID, SAGEMAKER_GID)

    # Update the SageMaker domain to enable access to the new directory
    sm_client.update_domain(
        DomainId=domain_id,
        DefaultUserSettings={
            'CustomFileSystemConfigs': [
                {
                    'EFSFileSystemConfig': {
                        'FileSystemId': file_system,
                        'FileSystemPath': f'/{repository_name}'
                    }
                }
            ]
        }
    )
    logger.info(f"Updated Studio Domain {domain_id} and EFS {file_system}")
    return {
        'statusCode': 200,
        'body': json.dumps(f"Created dir and modified permissions for Studio Domain {domain_id}")
    }
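To confirm that the update took effect, you can inspect the domain's DescribeDomain response from a workstation whose credentials allow sagemaker:DescribeDomain (the Lambda role in this section doesn't need that permission). This is a minimal sketch; the helper name is hypothetical and the default path assumes the shared_repository directory used above.

```python
def domain_uses_efs(domain_description, file_system_id, file_system_path="/shared_repository"):
    """True if the domain's default user settings mount the given EFS path."""
    configs = domain_description.get("DefaultUserSettings", {}).get("CustomFileSystemConfigs", [])
    for config in configs:
        efs_config = config.get("EFSFileSystemConfig", {})
        if efs_config.get("FileSystemId") == file_system_id and efs_config.get("FileSystemPath") == file_system_path:
            return True
    return False
```

For example, domain_uses_efs(boto3.client("sagemaker").describe_domain(DomainId=domain_id), "fs-0123456789abcdef0") should return True after the function has run.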

The execution role of the Lambda function needs permissions to update the SageMaker Studio domain:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:UpdateDomain"
            ],
            "Resource": "*"
        }
    ]
}

Configure an EFS directory shared across multiple domains in the same VPC

In this scenario, an administrator wants to provision an EFS file system for all users of multiple SageMaker Studio domains, using the same file system directory for all the users. The idea in this case is to assign the same EFS file system to all users of all domains that are within the same VPC. To test the solution, the account should ideally have two SageMaker Studio domains inside the VPC and subnet.

Create the EFS file system, add mount targets, and create an access point

Complete the steps in the previous section to set up your file system, mount targets, and access point.

Create a new Lambda function

Define a Lambda function called LambdaManageEFSUsers. This function automates the configuration of SageMaker Studio domains to use a shared EFS file system within a specific VPC. This can be useful for organizations that want to provide a centralized storage solution for their ML projects across multiple SageMaker Studio domains. The function reads efs_id and vpc_id from its environment variables. See the following code:

import json
import logging
import os

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

sm_client = boto3.client('sagemaker')

# Default POSIX user and group IDs of SageMaker Studio apps
SAGEMAKER_UID = 200001
SAGEMAKER_GID = 1001

def lambda_handler(event, context):

    # Environment variables
    file_system = os.environ['efs_id']
    env_vpc_id = os.environ['vpc_id']

    # Shared directory (the file system is mounted at /mnt/efs in the Lambda function)
    repository_name = "shared_repository"
    repository = f'/mnt/efs/{repository_name}'
    domains = []

    # List all SageMaker domains, following NextToken, and keep those in the specified VPC
    list_kwargs = {}
    while True:
        response = sm_client.list_domains(**list_kwargs)
        for domain in response['Domains']:
            domain_id = domain['DomainId']
            data = sm_client.describe_domain(DomainId=domain_id)
            if data.get('VpcId') == env_vpc_id:
                domains.append(domain_id)
        next_token = response.get('NextToken')
        if not next_token:
            break
        list_kwargs['NextToken'] = next_token

    # Create the directory (no error if it already exists) and add the permissions
    os.makedirs(repository, exist_ok=True)
    os.chown(repository, SAGEMAKER_UID, SAGEMAKER_GID)

    # Update every SageMaker domain found in the VPC
    if domains:
        for domain_id in domains:
            sm_client.update_domain(
                DomainId=domain_id,
                DefaultUserSettings={
                    'CustomFileSystemConfigs': [
                        {
                            'EFSFileSystemConfig': {
                                'FileSystemId': file_system,
                                'FileSystemPath': f'/{repository_name}'
                            }
                        }
                    ]
                }
            )

        logger.info(f"Updated Studio for Domains {domains} and EFS {file_system}")
        return {
            'statusCode': 200,
            'body': json.dumps(f"Created dir and modified permissions for Domains {domains}")
        }

    return {
        'statusCode': 400,
        'body': json.dumps(f"No SageMaker Studio domain found in the configured VPC {env_vpc_id}")
    }

The execution role of the Lambda function needs permissions to list, describe, and update the SageMaker Studio domains:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:ListDomains",
                "sagemaker:DescribeDomain",
                "sagemaker:UpdateDomain"
            ],
            "Resource": "*"
        }
    ]
}
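Once the function and its role are in place, you can run it on demand instead of from the console. The following minimal sketch invokes LambdaManageEFSUsers synchronously with an empty event, assuming your version of the function reads its inputs from environment variables (pass a different payload if it expects fields in the event), and decodes the statusCode and JSON-encoded body it returns. The helper names are hypothetical; the caller needs lambda:InvokeFunction on the function.

```python
import json


def parse_invoke_payload(raw_payload):
    """Return (statusCode, body) from the JSON payload returned by LambdaManageEFSUsers."""
    result = json.loads(raw_payload)
    # The function serializes its message with json.dumps, so decode the body as well
    return result["statusCode"], json.loads(result["body"])


def run_manage_efs_users(lambda_client, function_name="LambdaManageEFSUsers"):
    """Invoke the function synchronously with an empty event and parse its response."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps({}).encode("utf-8"),
    )
    return parse_invoke_payload(response["Payload"].read())
```

For example, status, message = run_manage_efs_users(boto3.client("lambda")) returns 200 and a confirmation message when at least one domain was updated.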

Clean up

To clean up the solution you implemented and avoid further costs, delete the CloudFormation stack you deployed in your AWS account. When you delete the stack, you also delete the EFS file system and its storage. For additional information, refer to Delete a stack from the CloudFormation console.
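The same cleanup can be scripted with the CloudFormation DeleteStack API and the built-in stack_delete_complete waiter. This minimal sketch only removes the stack; the stack name is a placeholder, and resources you created manually for the other scenarios are not part of the stack.

```python
def delete_solution_stack(cfn_client, stack_name):
    """Delete the CloudFormation stack and block until deletion finishes."""
    cfn_client.delete_stack(StackName=stack_name)
    # Built-in waiter that polls the stack status until deletion completes
    waiter = cfn_client.get_waiter("stack_delete_complete")
    waiter.wait(StackName=stack_name)
```

For example, delete_solution_stack(boto3.client("cloudformation"), "efs-studio-stack").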

Conclusion

In this post, we explored three scenarios demonstrating the versatility of integrating Amazon EFS with SageMaker Studio. These scenarios highlight how Amazon EFS can provide a scalable, secure, and collaborative data storage solution for data science teams.

The first scenario focused on configuring an EFS directory with private spaces for individual user profiles, allowing users to store and access their own data while the administrator manages the EFS file system centrally.

The second scenario showcased a shared EFS directory across all spaces within a SageMaker Studio domain, enabling better collaboration and centralized data management.

The third scenario explored an EFS file system shared across multiple SageMaker Studio domains, empowering enterprise-level data science collaboration and promoting efficient use of shared resources.

By implementing these Amazon EFS integration scenarios, organizations can unlock the full potential of their data science teams, improve data governance, and enhance the overall efficiency of their data-driven initiatives. The integration of Amazon EFS with SageMaker Studio provides a versatile platform for data science teams to thrive in the evolving landscape of ML and AI.


About the Authors

Irene Arroyo Delgado is an AI/ML and GenAI Specialist Solutions Architect at AWS. She focuses on bringing out the potential of generative AI for each use case and productionizing ML workloads, to achieve customers' desired business outcomes by automating end-to-end ML lifecycles. In her free time, Irene enjoys traveling and hiking.

Itziar Molina Fernandez is an AI/ML Consultant in the AWS Professional Services team. In her role, she works with customers building large-scale machine learning platforms and generative AI use cases on AWS. In her free time, she enjoys exploring new places.

Matteo Amadei is a Data Scientist Consultant in the AWS Professional Services team. He uses his expertise in artificial intelligence and advanced analytics to extract valuable insights and drive meaningful business outcomes for customers. He has worked on a wide range of projects spanning NLP, computer vision, and generative AI. He also has experience with building end-to-end MLOps pipelines to productionize analytical models. In his free time, Matteo enjoys traveling and reading.

Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering and an ML background, he works with customers of any size to understand their business and technical needs and design AI and ML solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.
