This post is co-authored by Santosh Waddi and Nanda Kishore Thatikonda from BigBasket.
BigBasket is India’s largest online food and grocery store. It operates across multiple ecommerce channels, including quick commerce, slotted delivery, and daily subscriptions. Customers can also shop at its physical stores and vending machines. BigBasket offers a wide selection of over 50,000 products from 1,000 brands, operates in more than 500 cities and towns, and serves over 10 million customers.
This post describes how BigBasket used Amazon SageMaker to train a computer vision model for fast-moving consumer goods (FMCG) product identification, reducing training time by approximately 50% and costs by 20%.
Customer challenges
Today, most supermarkets and brick-and-mortar stores in India offer manual checkout at the cashier counter. This poses two problems:
- As the business scales, additional personnel, weighing scales, and repeated training for the store operations teams are needed.
- Most stores have separate checkout and weighing counters, which creates friction in the customer journey. Customers often misplace their weight stickers and have to return to the weighing counter to get new ones before proceeding with checkout.
Self-checkout process
BigBasket has introduced an AI-powered checkout system in its physical stores that uses cameras to uniquely distinguish products. The following diagram provides an overview of the checkout process.
The BigBasket team was running open source, in-house ML algorithms for computer vision object recognition to power AI-enabled checkout at its Fresho physical stores. The team faced the following challenges in operationalizing the existing setup:
- Because new products are launched continually, the computer vision model needed to continuously incorporate new product information. The system had to handle a large catalog of over 12,000 stock-keeping units (SKUs), with new SKUs being added at a rate of over 600 per month.
- New models were created every month using the latest training data to accommodate the new products. Frequently training models to adapt to new products was costly and time consuming.
- BigBasket wanted to reduce training cycle times to shorten time to market. As the number of SKUs grew, model training time increased linearly, impacting time to market because training became more frequent and took longer.
- Data augmentation for model training and manually managing the entire end-to-end training cycle added significant overhead. BigBasket was running this on a third-party platform, which incurred significant costs.
Solution overview
To address these challenges, we recommended that BigBasket rebuild its existing FMCG product detection and classification solution using SageMaker. Before moving to full-scale operations, BigBasket piloted SageMaker to evaluate performance, cost, and usability metrics.
The goal was to fine-tune an existing computer vision machine learning (ML) model for SKU detection. The solution used a convolutional neural network (CNN) architecture, ResNet-152, for image classification. A sizable dataset of around 300 images per SKU was estimated for model training, resulting in over 4 million total training images. For certain SKUs, the data was augmented to cover a wider range of environmental conditions.
The following diagram shows the solution architecture.
The overall process can be summarized into the following high-level steps:
- Perform data cleansing, annotation, and augmentation.
- Store the data in an Amazon Simple Storage Service (Amazon S3) bucket.
- Use SageMaker and Amazon FSx for Lustre for efficient data augmentation.
- Split the data into train, validation, and test sets. FSx for Lustre and Amazon Relational Database Service (Amazon RDS) were used for fast parallel data access.
- Use a custom PyTorch Docker container including other open source libraries.
- Use SageMaker Distributed Data Parallelism (SMDDP) for accelerated distributed training.
- Log model training metrics.
- Copy the final model to the S3 bucket.
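As an illustration of these steps, a SageMaker training job with SMDDP enabled and an FSx for Lustre input channel can be launched from a notebook along the following lines. This is a minimal sketch under stated assumptions: the entry-point script name, file system ID, directory path, framework versions, and hyperparameters are placeholders, not BigBasket’s actual configuration.

```python
# Minimal sketch: launch an SMDDP distributed training job reading from
# FSx for Lustre. All identifiers below are illustrative placeholders.
import sagemaker
from sagemaker.pytorch import PyTorch
from sagemaker.inputs import FileSystemInput

role = sagemaker.get_execution_role()

estimator = PyTorch(
    entry_point="train.py",           # script with the ResNet-152 fine-tuning loop
    role=role,
    framework_version="1.12",
    py_version="py38",
    instance_type="ml.p4d.24xlarge",  # 8 GPUs per instance
    instance_count=2,
    # Enable the SageMaker data parallel (SMDDP) library
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    hyperparameters={"epochs": 10, "batch-size": 64},
)

# High-throughput parallel reads from FSx for Lustre (IDs/paths are placeholders)
train_fs = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",
    file_system_type="FSxLustre",
    directory_path="/fsx/train",
    file_system_access_mode="ro",
)

estimator.fit({"train": train_fs})  # final model artifacts land in S3
```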
BigBasket used SageMaker notebooks to train its ML models, and was able to easily port its existing open source PyTorch code and other open source dependencies to a SageMaker PyTorch container and run the pipeline seamlessly. This was the first benefit the BigBasket team realized: very few code changes were needed to make the code compatible with running in the SageMaker environment.
The model network consists of a ResNet-152 architecture followed by fully connected layers. The low-level feature layers were frozen, retaining the weights acquired through transfer learning from the ImageNet model. The total model parameters were 66 million, of which 23 million were trainable. This transfer learning-based approach required fewer images during training and also enabled faster convergence, reducing total training time.
Building and training the model within Amazon SageMaker Studio provided an integrated development environment (IDE) with everything needed to prepare, build, train, and tune the model. Augmenting the training data using techniques such as cropping, rotating, and flipping images helped improve the model training data and model accuracy.
Model training was accelerated by 50% through the use of the SMDDP library, which includes optimized communication algorithms designed specifically for AWS infrastructure. To improve data read/write performance during model training and data augmentation, we used FSx for Lustre for high-performance throughput.
The starting training data size was over 1.5 TB. We used two Amazon Elastic Compute Cloud (Amazon EC2) p4d.24xlarge instances, each with 8 GPUs and 40 GB of memory per GPU. For SageMaker distributed training, the instances must be in the same AWS Region and Availability Zone. Also, training data stored in an S3 bucket needs to be in the same Availability Zone. This architecture also allows BigBasket to change to other instance types or add more instances to the current architecture to accommodate significant data growth or to further reduce training time.
How the SMDDP library helped reduce training time, cost, and complexity
In traditional distributed data training, the training framework assigns ranks to GPUs (workers) and creates a replica of the model on each GPU. During each training iteration, the global data batch is divided into pieces (batch shards), and a piece is distributed to each worker. Each worker then proceeds with the forward and backward pass defined in the training script on each GPU. Finally, the model weights and gradients from the different model replicas are synced at the end of the iteration through a collective communication operation called AllReduce. After each worker and GPU has a synced replica of the model, the next iteration begins.
The SMDDP library is a collective communication library that improves the performance of this distributed data parallel training process. The SMDDP library reduces the communication overhead of key collective communication operations such as AllReduce. Its AllReduce implementation is designed for AWS infrastructure and can speed up training by overlapping the AllReduce operation with the backward pass. This approach achieves near-linear scaling efficiency and faster training speed by optimizing kernel operations between CPUs and GPUs.
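In a PyTorch training script, adopting SMDDP amounts to only a few lines, which is consistent with the minimal code changes noted earlier. The following sketch assumes it runs inside a SageMaker training job with the data parallel distribution enabled; the model here is a stand-in, not the actual ResNet-152 setup.

```python
# Sketch: enabling the SMDDP AllReduce backend in a PyTorch DDP script.
# Runs only inside a SageMaker job with smdistributed dataparallel enabled.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

import smdistributed.dataparallel.torch.torch_smddp  # noqa: F401  registers "smddp"

# Use SMDDP's optimized AllReduce instead of NCCL
dist.init_process_group(backend="smddp")

# Pin each worker process to its own GPU
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = nn.Linear(10, 2).cuda()  # stand-in for the real model
model = DDP(model)               # gradients sync via SMDDP AllReduce,
                                 # overlapped with the backward pass

# Each worker reads only its own batch shard, e.g. via
# torch.utils.data.distributed.DistributedSampler(dataset)
```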
Note the following calculations:
- The global batch size is (number of nodes in a cluster) * (number of GPUs per node) * (per-GPU batch shard size).
- A batch shard (mini-batch) is the subset of the dataset assigned to each GPU (worker) in each iteration.
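Plugging in the two-instance p4d.24xlarge cluster described above gives a quick worked example of this formula; the per-GPU shard size of 64 is an assumption for illustration.

```python
# Worked example of the global batch size formula; batch_shard is assumed.
nodes = 2          # two p4d.24xlarge instances
gpus_per_node = 8  # GPUs per p4d.24xlarge
batch_shard = 64   # mini-batch processed by each GPU per iteration (assumed)

global_batch_size = nodes * gpus_per_node * batch_shard
print(global_batch_size)  # 1024
```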
BigBasket used the SMDDP library to reduce overall training time. With FSx for Lustre, data read/write throughput improved during model training and data augmentation. With data parallelism, BigBasket was able to achieve almost 50% faster and 20% cheaper training compared to other alternatives, delivering the best performance on AWS. SageMaker automatically shuts down the training pipeline after completion. The project completed successfully, with training time reduced by 50% on AWS (4.5 days on AWS vs. 9 days on the legacy platform).
At the time of writing, BigBasket has been running the entire solution in production for over 6 months and is scaling the system by serving new cities and adding new stores every month.
“Our partnership with AWS in moving to distributed training using the SMDDP offering has been a great success. Not only did it cut our training times by 50%, it was also 20% cheaper. Throughout our partnership, AWS has set the bar on customer focus and delivering results, working with us all the way to realize the promised benefits.”
– Keshav Kumar, Head of Engineering, BigBasket.
Conclusion
In this post, we saw how BigBasket used SageMaker to train a computer vision model for FMCG product identification. The implementation of an AI-powered automated self-checkout system delivers an improved retail customer experience through innovation while eliminating human error in the checkout process. Accelerating new product onboarding using SageMaker distributed training reduces SKU onboarding time and cost. Integrating FSx for Lustre enables fast parallel data access for efficient model retraining with hundreds of new SKUs every month. Overall, this AI-based self-checkout solution provides an enhanced shopping experience free of front-end checkout errors. Automation and innovation have transformed BigBasket’s retail checkout and onboarding operations.
SageMaker provides end-to-end ML development, deployment, and monitoring capabilities, including the SageMaker Studio notebook environment for writing code, data acquisition, data tagging, model training, model tuning, deployment, monitoring, and much more. If your business is facing any of the challenges described in this post and wants to improve time to market and reduce costs, reach out to the AWS account team in your Region and get started with SageMaker.
About the authors
Santosh Waddi is a Principal Engineer at BigBasket, bringing over a decade of expertise in solving AI challenges. He has a strong background in computer vision, data science, and deep learning, and holds a postgraduate degree from IIT Bombay. Santosh has authored notable IEEE publications and, as a seasoned tech blog author, has also made significant contributions to the development of computer vision solutions during his tenure at Samsung.
Nanda Kishore Thatikonda is an Engineering Manager leading Data Engineering and Analytics at BigBasket. Nanda has built multiple applications for anomaly detection and has filed patents in related areas. He has worked on building enterprise-grade applications, building data platforms in multiple organizations, and reporting platforms to streamline data-driven decision making. Nanda has over 18 years of experience working in Java/J2EE, Spring technologies, and big data frameworks using Hadoop and Apache Spark.
Sudhanshu Hate is a Principal AI and ML Specialist at AWS, working with clients to advise them on their MLOps and generative AI journey. In his previous role, he conceptualized, created, and led a team to build a ground-up, open source-based AI and gamification platform, successfully commercializing it with over 100 clients. Sudhanshu has several patents to his credit, has written two books as well as several papers and blogs, and has presented his point of view in various forums. He is a thought leader and speaker with nearly 25 years in the industry. He has worked with Fortune 1000 clients across the globe and, more recently, with digital native clients in India.
Ayush Kumar is a Solutions Architect at AWS. He works with a variety of AWS customers, helping them adopt the latest modern applications and innovate faster with cloud-native technologies. You’ll find him experimenting in the kitchen in his spare time.


