If you work on a service with a non-zero number of customers, chances are your projects involve migrating from old to new while keeping the service running. The Strangler Migration pattern is a common model for gradually migrating an existing service to a new system or technology stack. The key idea is to "strangle" the old system by incrementally replacing its functionality with the new one, much like a strangler fig grows around and eventually takes over its host tree. This approach lets the migration happen in a controlled, iterative manner, minimizing disruption to the existing application and its users. It involves creating a facade or proxy layer that routes requests to either the old or the new system, gradually shifting more traffic to the new system over time. The Strangler Migration pattern is often used when the existing service is large, complex, or tightly coupled, and when service downtime is unacceptable or must be minimized, making a big-bang migration risky or impractical. It allows the new system to be developed and tested in parallel while the old system continues to operate.

Here are the key steps of the Strangler Migration process, tailored for online services:

1. Prevention of New Dependencies
* Stop new services from integrating with the legacy system
* Ensure all new development connects to the new system
* Establish clear guidelines for new development teams

2. Incremental Migration with Fallback
* Gradually move existing dependencies from the old system to the new one
* Implement a "kill switch" mechanism for safety
* Allow quick rollback to the old system if issues arise
* Test each migration phase thoroughly
* Monitor system behavior during the transition

3. Complete Transition with Shadow Mode
* Switch all use cases to the new system
* Keep the old system running in parallel (shadow mode)
* Verify all functionality works correctly in the new system
* Compare outputs between the old and new systems
* Ensure no regression in business processes

4. Legacy System Decommissioning
* Confirm all functionality works in the new system
* Verify no remaining dependencies on the old system
* Plan and execute resource cleanup
* Document the system retirement
* Remove the old system infrastructure

If you are a philosophy junkie like me, here is a bonus note: the Ship of Theseus paradox and the Strangler Fig pattern are closely related ideas about gradual replacement and identity. The Ship of Theseus is an ancient philosophical paradox about whether an object remains the same after all of its components have been gradually replaced. It comes from the story of a ship whose parts were all replaced over time, raising the question of whether it was still the same ship. The philosopher Thomas Hobbes went further and asked: if someone collected all the old parts and built another ship, which one would be the "original"? Whatever your answer, migration is the only constant!
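The facade described above can start as something as small as a routing function with a rollout percentage and a kill switch. Below is a minimal sketch under those assumptions: the backend URLs, flag names, and rollout threshold are all hypothetical placeholders, and in practice this logic would usually live in an API gateway or reverse proxy with the flag read from a configuration service rather than a module constant.

```python
import hashlib

# Illustrative strangler-facade sketch (all names and thresholds are
# assumptions, not from the original text): route each request to either
# the legacy or the new backend, with a kill switch for instant rollback.

LEGACY_BACKEND = "https://legacy.internal.example.com"   # hypothetical URL
NEW_BACKEND = "https://new.internal.example.com"         # hypothetical URL

ROLLOUT_PERCENT = 10      # share of traffic sent to the new system
KILL_SWITCH = False       # flip to True to send everything back to legacy


def choose_backend(user_id: str) -> str:
    """Pick a backend for this user, keeping the choice stable per user."""
    if KILL_SWITCH:
        return LEGACY_BACKEND
    # Hash the user id so the same user always lands on the same backend
    # while the rollout percentage stays constant.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return NEW_BACKEND if bucket < ROLLOUT_PERCENT else LEGACY_BACKEND


if __name__ == "__main__":
    for uid in ("alice", "bob", "carol"):
        print(uid, "->", choose_backend(uid))
```

Raising ROLLOUT_PERCENT step by step is one way to realize the incremental migration in step 2, and flipping KILL_SWITCH gives the quick rollback that the same step calls for.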
Minimizing Downtime During Cloud Migration
Summary
Minimizing downtime during cloud migration involves strategies and practices to ensure that services remain accessible and uninterrupted while transitioning to a new cloud environment or system. This process is essential for maintaining user satisfaction and business operations during migrations.
- Adopt gradual migration strategies: Use approaches like the Strangler Fig pattern or blue/green deployments to transition systems incrementally, allowing for thorough testing and fallback options without disrupting user access.
- Implement robust testing: Conduct pre-deployment tests, such as integration and smoke tests, to identify potential issues before moving updates or new systems into a production environment.
- Set up monitoring and rollback plans: Use health checks and automated rollback mechanisms to quickly identify and resolve post-deployment issues, ensuring minimal impact on users.
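As a concrete illustration of the last point, here is a minimal, hedged sketch of a post-deployment health gate. The endpoint, thresholds, and the rollback hook are placeholder assumptions rather than any particular platform's API; real setups would wire the rollback step to their deployment tooling.

```python
import time
import urllib.request

# Minimal post-deployment health gate (endpoint, thresholds, and the
# rollback hook are illustrative assumptions): poll a health endpoint
# after a release and trigger a rollback if too many checks fail.

HEALTH_URL = "https://app.example.com/healthz"   # hypothetical endpoint
CHECKS = 10
INTERVAL_SECONDS = 30
MAX_FAILURES = 3


def health_ok() -> bool:
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False


def rollback() -> None:
    # Placeholder: in practice this would call your deployment tooling,
    # e.g. re-pointing traffic at the previous environment.
    print("Health gate failed, rolling back to the previous version")


def run_health_gate() -> None:
    failures = 0
    for _ in range(CHECKS):
        if not health_ok():
            failures += 1
            if failures >= MAX_FAILURES:
                rollback()
                return
        time.sleep(INTERVAL_SECONDS)
    print("Deployment looks healthy")


if __name__ == "__main__":
    run_health_gate()
```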
Post 13: Real-Time Cloud & DevOps Scenario

Scenario: Your organization hosts a critical application on AWS Elastic Beanstalk. Recently, the application experienced downtime due to an untested update that caused compatibility issues. The rollback process took longer than expected, resulting in customer complaints. As a DevOps engineer, your task is to implement a robust deployment strategy and minimize downtime for future updates.

Step-by-Step Solution:

1. Adopt Blue/Green Deployments: Deploy the updated version of the application to a separate environment while keeping the existing environment live. Once verified, switch traffic to the updated environment using Elastic Beanstalk's Swap Environment CNAMEs feature. Rollback becomes simple by reverting the CNAME to the previous environment.
2. Implement Canary Deployments: Gradually route a small percentage of traffic to the new version using tools like AWS App Runner or AWS CodeDeploy. Monitor performance and roll back if issues are detected during the initial phase.
3. Set Up Pre-Deployment Testing: Automate integration and smoke tests using AWS CodePipeline and CodeBuild to ensure updates pass all tests before deployment. Integrate the tests into the Elastic Beanstalk deployment pipeline.
4. Enable Application Health Monitoring: Configure Elastic Beanstalk's health checks to detect and alert on degraded performance after deployment. Use CloudWatch Alarms to trigger notifications for anomalies.
5. Use Immutable Deployments: Choose immutable updates in Elastic Beanstalk to deploy the new version on a fresh set of instances. This ensures the old version remains untouched during the update process.
6. Leverage Deployment Policies: Configure deployment settings in Elastic Beanstalk:
* All at Once: quick but risky; use only for non-critical updates.
* Rolling: updates instances in batches, balancing risk and speed.
* Rolling with Additional Batch: adds a new batch to minimize downtime.
* Immutable: creates a completely new environment.
7. Automate Rollbacks: Use AWS CodeDeploy automatic rollbacks for Elastic Beanstalk to revert to the previous version if deployment health checks fail. Define failure thresholds for automatic rollback triggers.
8. Document and Train: Document the deployment process and conduct regular training sessions for the team to ensure smooth updates. Perform mock scenarios to practice rollbacks and disaster recovery.

Outcome: Improved deployment reliability with minimal downtime and faster rollback mechanisms. Enhanced customer satisfaction through consistent application availability.

💬 What deployment strategies have worked best for your teams? Let's exchange ideas in the comments!
✅ Follow Thiruppathi Ayyavoo for daily real-time scenarios in Cloud and DevOps. Let's grow and innovate together!

#DevOps #AWS #ElasticBeanstalk #CloudComputing #BlueGreenDeployment #CanaryDeployment #RealTimeScenarios #CloudEngineering #TechSolutions #LinkedInLearning #careerbytecode #thirucloud #linkedin #USA CareerByteCode
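To make the blue/green and immutable-deployment steps above more tangible, here is a hedged sketch using boto3. The environment names are placeholders, enhanced health reporting is assumed to be enabled, and a real pipeline would add waiters, error handling, and alarm-based gating around the swap.

```python
import boto3

# Sketch of the blue/green pieces described above (environment names are
# placeholders): set an immutable deployment policy, verify the green
# environment's health, then swap CNAMEs to shift production traffic.

eb = boto3.client("elasticbeanstalk", region_name="us-east-1")

BLUE_ENV = "myapp-blue"    # hypothetical: currently serving production traffic
GREEN_ENV = "myapp-green"  # hypothetical: environment running the new version

# 1. Prefer immutable deployments so updates land on a fresh set of instances.
eb.update_environment(
    EnvironmentName=GREEN_ENV,
    OptionSettings=[{
        "Namespace": "aws:elasticbeanstalk:command",
        "OptionName": "DeploymentPolicy",
        "Value": "Immutable",
    }],
)

# 2. Check the green environment's reported health before cutting over
#    (requires enhanced health reporting on the environment).
health = eb.describe_environment_health(
    EnvironmentName=GREEN_ENV, AttributeNames=["HealthStatus"]
)
if health.get("HealthStatus") == "Ok":
    # 3. Swap CNAMEs so production traffic moves to the green environment.
    eb.swap_environment_cnames(
        SourceEnvironmentName=BLUE_ENV,
        DestinationEnvironmentName=GREEN_ENV,
    )
else:
    print("Green environment not healthy; keeping traffic on", BLUE_ENV)
```

Rolling back is the same swap call with the environment names reversed, which is what makes the CNAME-swap approach attractive for minimizing downtime.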
Had to do an index migration yesterday to re-embed all of the stored text chunks from our users 😬. This used to be really easy back when we were running a closed beta: we could intentionally serve a maintenance page after informing our closed beta users. Today it's harder, because users are constantly on Beloga and rely on it for their day-to-day work, so serving a maintenance page and blocking access to the platform was no longer an option. Re-embedding all of the stored chunks could cause a period of semantic search inaccuracy if not carried out carefully. Here's how I tried to minimize that window during the migration:

1. Created a backup of the existing search instance (server A), including the files stored on its SSD.
2. Once done, ran a script (prepped beforehand) to re-embed all of the chunks on the backup instance (server B). This leaves a period where server A is ahead, since users are actively storing and searching on Beloga.
3. Ran a script to sync all updates periodically (near real-time) from server A to server B while changes were still streaming into server A.
4. Once the servers were in sync, switched the hostnames within the application and deployed.
5. After the changes went live in production (~4 minutes of deployment time), tested to ensure data integrity was preserved.
6. Terminated server A after 24 hours of stability.

This is by no means the best approach to minimizing service disruptions. There was another idea in mind where we could roll the changes out in stages depending on our users' timezones and current active usage, but that requires a more complex strategy, which we will definitely utilize as we scale. 🚀

#buildinpublic #engineering #migration
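For readers curious what the catch-up sync in step 3 might look like, here is a small sketch. Everything in it is a hypothetical stand-in (in-memory dicts instead of real search servers, a fake embedding function); it only illustrates the watermark-based "replay until quiet" idea before switching hostnames, not Beloga's actual code.

```python
import time
from datetime import datetime, timezone, timedelta

# Hypothetical stand-ins: in-memory stores play the role of server A / B,
# and fake_embed replaces the real embedding model.

def fake_embed(text):
    return [float(len(text))]  # placeholder for the real embedding model


def sync_until_caught_up(fetch_updated_since, write_chunk,
                         poll_seconds=1, quiet_rounds=2):
    """Replay writes from the live server onto the new one until no changes arrive."""
    watermark = datetime.min.replace(tzinfo=timezone.utc)
    quiet = 0
    while quiet < quiet_rounds:
        changed = fetch_updated_since(watermark)
        if changed:
            for chunk_id, text, updated_at in changed:
                # Re-embed with the new model before writing to server B.
                write_chunk(chunk_id, fake_embed(text), text)
                watermark = max(watermark, updated_at)
            quiet = 0
        else:
            quiet += 1  # several empty polls in a row means the servers are in sync
        time.sleep(poll_seconds)
    # At this point the application hostname can be switched to server B.


# Tiny in-memory demo of the interface the loop expects.
now = datetime.now(timezone.utc)
server_a = {"c1": ("hello world", now - timedelta(minutes=5))}
server_b = {}

def fetch_updated_since(ts):
    return [(cid, text, t) for cid, (text, t) in server_a.items() if t > ts]

def write_chunk(cid, vector, text):
    server_b[cid] = (vector, text)

sync_until_caught_up(fetch_updated_since, write_chunk)
print("server B now holds:", server_b)
```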