This post was written with Dian Xu and Joel Hawkins of Rocket Companies.
Rocket Companies is a Detroit-based FinTech company with a mission to “Help Everyone Home.” With the current housing shortage and affordability concerns, Rocket simplifies the homeownership process through an intuitive, AI-driven experience. This comprehensive framework streamlines every step of the homeownership journey, empowering clients to search, purchase, and manage home financing effortlessly. Rocket integrates home search, financing, and servicing in a single environment, providing a seamless and efficient experience.
The Rocket brand is synonymous with simple, fast, and trustworthy digital solutions for complex transactions. Rocket is dedicated to helping clients realize their dream of homeownership and financial freedom. Since its inception, Rocket has grown from a single mortgage lender to a network of businesses that creates new opportunities for its clients.
Rocket takes a complicated process and uses technology to make it simpler. Applying for a mortgage can be complex and time-consuming. That’s why we use advanced technology and data analytics to streamline every step of the homeownership experience, from application to closing. By analyzing a wide range of data points, we can quickly and accurately assess the risk associated with a loan, enabling us to make more informed lending decisions and get our clients the financing they need.
Our goal at Rocket is to provide a personalized experience for both current and prospective clients. Rocket’s diverse product offerings can be customized to meet specific client needs, while our team of skilled bankers must be matched with the best client opportunities that align with their skills and knowledge. Maintaining strong relationships with our large, loyal client base and hedge positions to cover financial obligations is key to our success. With the volume of business we do, even small improvements can have a significant impact.
In this post, we share how we modernized Rocket’s data science solution on AWS to increase the speed to delivery from eight weeks to under one hour, improve operational stability and support by reducing incident tickets by over 99% in 18 months, power 10 million automated data science and AI decisions made daily, and provide a seamless data science development experience.
Rocket’s legacy data science environment challenges
Rocket’s previous data science solution was built around Apache Spark and combined the use of a legacy version of the Hadoop environment with vendor-provided Data Science Experience development tools. The Hadoop environment was hosted on Amazon Elastic Compute Cloud (Amazon EC2) servers managed in-house by Rocket’s technology team, while the data science experience infrastructure was hosted on premises. Communication between the two systems was established through Kerberized Apache Livy (HTTPS) connections over AWS PrivateLink.
Data exploration and model development were conducted using well-known machine learning (ML) tools such as Jupyter or Apache Zeppelin notebooks. Apache Hive provided a tabular interface to data stored in HDFS and integrated with Apache Spark SQL. Apache HBase provided real-time key-based access to data. Model training and scoring were performed either from Jupyter notebooks or through jobs scheduled by Apache’s Oozie orchestration tool, which was part of the Hadoop implementation.
Despite the benefits of this architecture, Rocket faced challenges that limited its effectiveness:
- Accessibility limitations: The data lake was stored in HDFS and only accessible from the Hadoop environment, hindering integration with other data sources. This also led to a backlog of data waiting to be ingested.
- Steep learning curve for data scientists: Many of Rocket’s data scientists did not have experience with Spark, which had a more nuanced programming model compared to other popular ML solutions like scikit-learn. This made it harder for data scientists to become productive.
- Responsibility for maintenance and troubleshooting: Rocket’s DevOps/Technology team was responsible for all upgrades, scaling, and troubleshooting of the Hadoop cluster, which was installed on bare EC2 instances. This resulted in a backlog of unresolved issues with both vendors.
- Balancing development vs. production demands: Rocket had to manage work queues between development and production, which were always competing for the same resources.
- Deployment challenges: Rocket sought to support more real-time and streaming inferencing use cases, but this was limited by the capabilities of MLeap for real-time models and Spark Streaming for streaming use cases, both of which were still experimental at the time.
- Inadequate data security and DevOps support: The previous solution lacked robust security measures, and there was limited support for the development and operations of the data science work.
Rocket’s legacy data science architecture is shown in the following diagram.
The diagram depicts the flow; the key components are detailed below:
- Data ingestion: Data is ingested into the system using Attunity data ingestion in Spark SQL.
- Data storage and processing: All compute is done as Spark jobs inside a Hadoop cluster using Apache Livy and Spark. Data is stored in HDFS and accessed through Hive, which provides a tabular interface to the data and integrates with Spark SQL. HBase provides real-time key-based access to data.
- Model development: Data exploration and model development are conducted using tools such as Jupyter or Apache Zeppelin notebooks, which communicate with the Spark server over a Kerberized Livy connection (see the sketch after this list).
- Model training and scoring: Model training and scoring are performed either from Jupyter notebooks or through jobs scheduled by Apache’s Oozie orchestration tool, which is part of the Hadoop implementation.
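To make this legacy access pattern concrete, the following is a minimal sketch, assuming a hypothetical Kerberized Livy endpoint and Hive table, of how a notebook could open a Livy session and run a Spark SQL statement on the cluster; it is illustrative only, not Rocket’s actual code.

```python
# Minimal sketch (illustrative, not Rocket's code): open a PySpark session on the
# legacy Hadoop cluster through the Kerberized Livy (HTTPS) endpoint and run a
# Spark SQL statement against a Hive table. Endpoint and table names are hypothetical.
import json
import time

import requests
from requests_kerberos import HTTPKerberosAuth, REQUIRED

LIVY_URL = "https://livy.example.internal:8998"  # hypothetical Livy endpoint
auth = HTTPKerberosAuth(mutual_authentication=REQUIRED)
headers = {"Content-Type": "application/json"}

# Create a PySpark session on the cluster
session = requests.post(f"{LIVY_URL}/sessions",
                        data=json.dumps({"kind": "pyspark"}),
                        headers=headers, auth=auth).json()
session_url = f"{LIVY_URL}/sessions/{session['id']}"

# Wait for the session to become idle, then submit a Spark SQL statement over Hive
while requests.get(session_url, headers=headers, auth=auth).json()["state"] != "idle":
    time.sleep(5)

code = "spark.sql('SELECT COUNT(*) FROM loans.applications').show()"  # hypothetical table
requests.post(f"{session_url}/statements",
              data=json.dumps({"code": code}),
              headers=headers, auth=auth)
```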
Rocket’s migration journey
At Rocket, we believe in the power of continuous improvement and constantly seek out new opportunities. One such opportunity is applying data science solutions, but to do so, we must have a strong and flexible data science environment.
To address the challenges of the legacy data science environment, Rocket decided to migrate its ML workloads to the Amazon SageMaker AI suite. This would allow us to deliver more personalized experiences and understand our customers better. To promote the success of this migration, we collaborated with the AWS team to create automated and intelligent digital experiences that demonstrated Rocket’s understanding of its clients and kept them connected.
We implemented an AWS multi-account strategy, standing up Amazon SageMaker Studio in a build account using a network-isolated Amazon VPC. This allows us to separate development and production environments, while also improving our security posture.
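As an illustration of this setup, the following is a minimal sketch, with hypothetical IDs and role names, of provisioning a SageMaker Studio domain in VPC-only mode inside a network-isolated VPC in the build account; it is a sketch under those assumptions, not Rocket’s actual configuration.

```python
# Minimal sketch: create a SageMaker Studio domain whose traffic stays inside a
# network-isolated VPC (VPC-only mode). All IDs, names, and the role ARN are
# hypothetical placeholders.
import boto3

sm = boto3.client("sagemaker", region_name="us-east-2")

response = sm.create_domain(
    DomainName="ds-build-domain",                      # hypothetical domain name
    AuthMode="IAM",
    VpcId="vpc-0123456789abcdef0",                     # network-isolated VPC
    SubnetIds=["subnet-0aaa1111", "subnet-0bbb2222"],  # private subnets only
    AppNetworkAccessType="VpcOnly",                    # no direct internet access from Studio
    DefaultUserSettings={
        "ExecutionRole": "arn:aws:iam::111122223333:role/sagemaker-build-role"
    },
)
print(response["DomainArn"])
```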
We moved our new work to SageMaker Studio and our legacy Hadoop workloads to Amazon EMR, connecting to the old Hadoop cluster using Livy and SageMaker notebooks to ease the transition. This gives us access to a wider range of tools and technologies, enabling us to choose the most appropriate ones for each problem we’re trying to solve.
In addition, we moved our data from HDFS to Amazon Simple Storage Service (Amazon S3), and now use Amazon Athena and AWS Lake Formation to provide proper access controls to production data. This makes it easier to access and analyze the data, and to integrate it with other systems. The team also provides secure interactive integration through Amazon Elastic Kubernetes Service (Amazon EKS), further strengthening the company’s security posture.
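For example, a data scientist in the build account might query governed production data through Athena roughly as follows; the database, table, and results bucket names are illustrative assumptions, not Rocket’s actual resources.

```python
# Minimal sketch: run an Athena query against a Glue Data Catalog table governed
# by Lake Formation, then fetch the results. Names and locations are hypothetical.
import time

import boto3

athena = boto3.client("athena", region_name="us-east-2")

query = athena.start_query_execution(
    QueryString="SELECT loan_id, status FROM loan_applications LIMIT 10",
    QueryExecutionContext={"Database": "conformed"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = query["QueryExecutionId"]

# Poll until the query finishes, then read the result rows
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(rows[:3])
```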
SageMaker AI has been instrumental in empowering our data science community with the flexibility to choose the most appropriate tools and technologies for each problem, resulting in faster development cycles and higher model accuracy. With SageMaker Studio, our data scientists can seamlessly develop, train, and deploy models without the need for additional infrastructure management.
As a result of this modernization effort, SageMaker AI enabled Rocket to scale our data science solution across Rocket Companies and integrate using a hub-and-spoke model. The ability of SageMaker AI to automatically provision and manage instances has allowed us to focus on our data science work rather than infrastructure management, increasing the number of models in production fivefold and data scientists’ productivity by 80%.
Our data scientists are empowered to use the most appropriate technology for the problem at hand, and our security posture has improved. Rocket can now compartmentalize data and compute, as well as development and production. Additionally, we can provide model tracking and lineage using Amazon SageMaker Experiments, and make artifacts discoverable using the SageMaker Model Registry and Amazon SageMaker Feature Store. All the data science work has now been migrated onto SageMaker, and all the old Hadoop work has been migrated to Amazon EMR.
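As an example of how a trained model becomes discoverable, the following is a minimal sketch of registering a model package in the SageMaker Model Registry with the SageMaker Python SDK; the image URI, model artifact location, role, and model package group name are illustrative assumptions.

```python
# Minimal sketch: register a trained model in the SageMaker Model Registry so it
# is discoverable with its lineage. All URIs, ARNs, and names are hypothetical.
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()

model = Model(
    image_uri="123456789012.dkr.ecr.us-east-2.amazonaws.com/example-inference:1.0",  # hypothetical
    model_data="s3://example-build-bucket/models/credit-risk/model.tar.gz",           # hypothetical
    role="arn:aws:iam::111122223333:role/sagemaker-build-role",
    sagemaker_session=session,
)

model.register(
    content_types=["text/csv"],
    response_types=["application/json"],
    inference_instances=["ml.m5.large"],
    transform_instances=["ml.m5.large"],
    model_package_group_name="credit-risk-models",   # hypothetical group
    approval_status="PendingManualApproval",
)
```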
Overall, SageMaker AI has played a critical role in enabling Rocket’s modernization journey by building a more scalable and flexible ML framework, reducing operational burden, improving model accuracy, and accelerating deployment times.
The successful modernization allowed Rocket to overcome our previous limitations and better support our data science efforts. We were able to improve our security posture, make work more traceable and discoverable, and give our data scientists the flexibility to choose the most appropriate tools and technologies for each problem. This has helped us better serve our customers and drive business growth.
Rocket’s new data science solution architecture on AWS is shown in the following diagram.

The solution consists of the following components:
- Data ingestion: Data is ingested into the data account from on-premises and external sources.
- Data refinement: Raw data is refined into consumable layers (raw, processed, conformed, and analytical) using a combination of AWS Glue extract, transform, and load (ETL) jobs and EMR jobs.
- Data access: Refined data is registered in the data account’s AWS Glue Data Catalog and exposed to other accounts through Lake Formation. Analytic data is stored in Amazon Redshift. Lake Formation makes this data available to both the build and compute accounts. For the build account, access to production data is restricted to read-only.
- Development: Data science development is done using SageMaker Studio. Data engineering development is done using AWS Glue Studio. Both disciplines have access to Amazon EMR for Spark development. Data scientists have access to the entire SageMaker ecosystem in the build account.
- Deployment: Models trained in the build account with SageMaker are registered with an MLflow instance. Code artifacts for both data science and data engineering activities are stored in Git. Deployment initiation is managed as part of CI/CD.
- Workflows: We have various workflow triggers. For online scoring, we typically provide an external-facing endpoint using Amazon EKS with Istio. We also have numerous jobs that are launched by AWS Lambda functions, which in turn are triggered by timers or events. These processes may include AWS Glue ETL jobs, EMR jobs for additional data transformations or model training and scoring activities, or SageMaker pipelines and jobs performing training or scoring actions (see the sketch after this list).
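The following is a minimal sketch of one such event-driven trigger: an AWS Lambda function that starts a SageMaker pipeline execution when invoked by a timer or event. The pipeline name and parameter are illustrative assumptions.

```python
# Minimal sketch: Lambda handler that kicks off a SageMaker pipeline execution.
# The pipeline name and parameter are hypothetical placeholders.
import boto3

sm = boto3.client("sagemaker")

def lambda_handler(event, context):
    """Start a scoring pipeline run; the event payload can carry overrides."""
    response = sm.start_pipeline_execution(
        PipelineName="daily-scoring-pipeline",  # hypothetical pipeline
        PipelineParameters=[
            {"Name": "ScoringDate", "Value": event.get("scoring_date", "latest")},
        ],
    )
    return {"executionArn": response["PipelineExecutionArn"]}
```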
Migration impact
We’ve come a long way in modernizing our infrastructure and workloads. We started our journey supporting six business channels and 26 models in production, with dozens more in development. Deployment times stretched for months and required a team of three system engineers and four ML engineers to keep everything running smoothly. Despite the support of our internal DevOps team, our issue backlog with the vendor was an unenviable 200+.
Today, we’re supporting nine organizations and over 20 business channels, with a whopping 210+ models in production and many more in development. Our average deployment time has gone from months to just weeks, and sometimes even down to mere days. With just one part-time ML engineer for support, our average issue backlog with the vendor is practically non-existent. We now support over 120 data scientists, ML engineers, and analytical roles. Our framework mix has expanded to include 50% SparkML models and a diverse range of other ML frameworks, such as PyTorch and scikit-learn. These advancements have given our data science community the power and flexibility to tackle even more complex and challenging projects with ease.
The following table compares some of our metrics before and after migration.
| Metric | Before Migration | After Migration |
|---|---|---|
| Speed to delivery | New data ingestion project took 4–8 weeks | Data-driven ingestion takes under one hour |
| Operational stability and supportability | Over 100 incidents and tickets in 18 months | Fewer incidents: one per 18 months |
| Data science | Data scientists spent 80% of their time waiting for their jobs to run | Seamless data science development experience |
| Scalability | Unable to scale | Powers 10 million automated data science and AI decisions made daily |
Lessons learned
Throughout the journey of modernizing our data science solution, we’ve learned valuable lessons that we believe could be of great help to other organizations planning to undertake similar endeavors.
First, we’ve come to realize that managed services can be a game changer in optimizing your data science operations.
Isolating development into its own account while providing read-only access to production data is a highly effective way of enabling data scientists to experiment and iterate on their models without putting your production environment at risk. This is something that we’ve achieved through the combination of SageMaker AI and Lake Formation.
Another lesson we learned is the importance of training and onboarding for teams. This is particularly true for teams that are moving to a new environment like SageMaker AI. It’s crucial to understand the best practices for using the resources and features of SageMaker AI, and to have a solid understanding of how to move from notebooks to jobs.
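For example, moving from notebooks to jobs often means wrapping the notebook’s training code in a script and launching it as a managed SageMaker training job; the following minimal sketch assumes a hypothetical train.py script, S3 paths, and IAM role.

```python
# Minimal sketch: run a training script as a managed SageMaker training job
# instead of inside the notebook kernel. Script, paths, and role are hypothetical.
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    entry_point="train.py",                 # hypothetical training script
    framework_version="1.2-1",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    role="arn:aws:iam::111122223333:role/sagemaker-build-role",
    sagemaker_session=sagemaker.Session(),
)

# SageMaker provisions the instance, runs the script, and tears the instance down.
estimator.fit({"train": "s3://example-build-bucket/datasets/credit-risk/train/"})
```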
Finally, we learned that although Amazon EMR still requires some tuning and optimization, the administrative burden is much lighter compared to hosting directly on Amazon EC2. This makes Amazon EMR a more scalable and cost-effective solution for organizations that need to manage large data processing workloads.
Conclusion
This post provided an overview of the successful partnership between AWS and Rocket Companies. Through this collaboration, Rocket Companies was able to migrate many ML workloads and implement a scalable ML framework. Continuing its work with AWS, Rocket Companies remains committed to innovation and staying at the forefront of customer satisfaction.
Don’t let legacy systems hold back your organization’s potential. Discover how AWS can assist you in modernizing your data science solution and achieving remarkable results, similar to those achieved by Rocket Companies.
About the Authors
Dian Xu is the Senior Director of Engineering in Data at Rocket Companies, where she leads transformative initiatives to modernize enterprise data platforms and foster a collaborative, data-first culture. Under her leadership, Rocket’s data science, AI, and ML platforms power billions of automated decisions annually, driving innovation and industry disruption. A passionate advocate for Gen AI and cloud technologies, Xu is also a sought-after speaker at global forums, inspiring the next generation of data professionals. Outside of work, she channels her love of rhythm into dancing, embracing styles from Bollywood to Bachata as a celebration of cultural diversity.
Joel Hawkins is a Principal Data Scientist at Rocket Companies, where he is responsible for the data science and MLOps platform. Joel has decades of experience developing sophisticated tooling and working with data at large scales. A driven innovator, he works hand in hand with data science teams to ensure that we have the latest technologies available to provide cutting-edge solutions. In his spare time, he is an avid cyclist and has been known to dabble in vintage sports car restoration.
Venkata Santosh Sajjan Alla is a Senior Solutions Architect at AWS Financial Services. He partners with North American FinTech companies like Rocket and other financial services organizations to drive cloud and AI strategy, accelerating AI adoption at scale. With deep expertise in AI and ML, generative AI, and cloud-native architecture, he helps financial institutions unlock new revenue streams, optimize operations, and drive impactful business transformation. Sajjan collaborates closely with Rocket Companies to advance its mission of building an AI-fueled homeownership platform to Help Everyone Home. Outside of work, he enjoys traveling, spending time with his family, and is a proud father to his daughter.
Alak Eswaradass is a Principal Solutions Architect at AWS based in Chicago, IL. She is passionate about helping customers design cloud architectures using AWS services to solve business challenges and is enthusiastic about solving a variety of ML use cases for AWS customers. When she’s not working, Alak enjoys spending time with her daughters and exploring the outdoors with her dog.

