Six months ago, we completed a major migration to Sitecore XM Cloud, a project that transformed the way we manage and deliver digital experiences. In my previous blog post, I walked through the challenges, processes, and solutions we implemented to ensure a smooth transition to the cloud. We rethought our architecture, overhauled forms, rebuilt components, and set up CI/CD pipelines to streamline deployments. At the time, we were excited about the possibilities and optimistic about the benefits we would gain.
Now, half a year later, it’s time to reflect on the journey so far. In this post, I’ll explore what’s changed since the migration, the impact it has had on performance, our operational practices, and whether our initial expectations were met. Additionally, I’ll share lessons learned and what’s next on our roadmap for continuing to evolve our digital ecosystem on XM Cloud.
What have we changed?
Before going live, we encountered several challenges. One significant issue arose during our deployments when the editing host broke on both the preview and production environments immediately after triggering a new deployment on the XM Cloud development environment. After extensive debugging and support from Sitecore, we discovered that during the build process, the public_url environment variable was set to the editing host. However, when promoting to the next environment, this variable wasn’t updated. This wasn’t a problem as long as development, preview, and production were running the same version, but it caused issues during our development cycle.
Although the issue hasn’t been fully resolved at the time of writing, we’ve implemented a workable solution. We decided to bypass the "promote" feature of XM Cloud and instead deploy separately to each environment. While this approach takes longer, it ensures that the editing host continues to function properly.
After going live, we faced another challenge in production. A specific API route on the head application came under such high load that Sitecore Experience Edge began throttling our requests, which led to site outages. Sitecore applies this throttling per Experience Edge, that is, per XM Cloud project. Unfortunately, at the time of writing, Sitecore doesn’t provide visibility into Experience Edge statistics, making it difficult to determine what’s triggering the rate limit. However, during Sitecore Symposium, XM Cloud Product Lead Liz Nelson announced that Sitecore is working on giving developers the insights they need to take corrective action.
The throttling issue stemmed from two main factors. First, we were querying Experience Edge without proper caching, which led to excessive requests being sent under heavy load. To address this, we implemented caching layers in both our Azure Functions and the frontend head application. Although this alleviated the problem somewhat, we still encountered rate limiting during peak traffic.
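To make the idea of a caching layer concrete, here is a minimal sketch of a time-based cache placed in front of an expensive query. It is written in Python for brevity and is not our production code: the class name TTLCache, the method get_or_load, and the loader callback that stands in for an Experience Edge query are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Keeps the result of an expensive query for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling loader only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no upstream request is made
        value = loader()  # hypothetical stand-in for the Experience Edge query
        self._entries[key] = (now + self._ttl, value)
        return value
```

With a cache like this, repeated requests for the same key within the time window result in a single upstream query. Under peak traffic, though, every expiry still triggers a fresh query on the request path, which fits with the rate limiting we continued to see.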
Upon further investigation, we found the second factor: the frontend API was simply proxying requests to the backend API, adding an unnecessary layer. To streamline the process, we removed this middle layer and sent the requests directly to the backend, which is handled by an Azure Function.
But we didn’t stop there. Since the content from XM Cloud changes infrequently, we refactored the cache refresh mechanism. A scheduled job now queries Experience Edge once a day, regardless of how many requests come in. The API endpoint simply reads from the Redis cache and serves the cached version until the next scheduled refresh updates the query results.
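The split between the scheduled refresh and the read-only endpoint can be sketched roughly as follows. This is an illustrative outline in Python, not our actual Azure Function: the key name, the function names, and the InMemoryStore class (a stand-in exposing only get and set, in place of a real Redis client) are assumptions.

```python
import json
from typing import Any, Callable, Dict, Optional

CACHE_KEY = "edge:content"  # hypothetical key name


class InMemoryStore:
    """Stand-in for a Redis client; only get and set are used here."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def refresh_cache(store: InMemoryStore, query_edge: Callable[[], Any]) -> None:
    """Run by the scheduler: the only code path that queries Experience Edge."""
    result = query_edge()  # if this raises, the previous cached value is kept
    store.set(CACHE_KEY, json.dumps(result))


def handle_request(store: InMemoryStore) -> Optional[Any]:
    """Run per API request: reads the cached copy and never calls Experience Edge."""
    raw = store.get(CACHE_KEY)
    return None if raw is None else json.loads(raw)
```

The point of this design is that the number of Experience Edge queries depends only on the schedule, not on traffic. Because the cache is overwritten only after a successful query, a failed refresh leaves the previous content in place.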
Additionally, there were smaller issues that the team was able to resolve with time and effort.
In conclusion, while a lift-and-shift migration might seem straightforward, it’s important to factor in the time needed to address post-migration issues. These can be just as crucial to a successful deployment as the initial migration itself.