Zero-Downtime Cutover Plans
Context
Zero-downtime cutovers require strict environment parity, precise cache expiration controls, and atomic data synchronisation. This plan is written for webmasters, SEO engineers, site architects, and technical project managers who must move a live site between hosts without a visible outage or a ranking dip. Infrastructure drift or misaligned resolver caches split traffic between origins and trigger immediate ranking volatility, so the plan enforces deterministic failover paths and preserves crawl budget through the transition window. It orchestrates the other phases of the DNS Configuration & Hosting Cutover sequence into a single rehearsed switch with a rehearsed reversal.
Zero downtime is not the absence of risk; it is the presence of a reversible, gradual switch. Two mechanisms make that possible. Blue-green keeps the legacy origin (blue) fully warm while the new origin (green) takes a slice of live traffic, so you can swing back instantly by shifting weight rather than waiting for DNS to re-propagate. Weighted DNS does the same at the resolver layer, advancing traffic from 10% to 100% in stages and watching error rates at each increment. Both depend on the same precondition: an atomic data switch, because you cannot have writes landing on two origins at once. The execution sequence therefore freezes writes first, proves parity, then moves traffic in controlled steps, each one cheap to reverse.
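The staged advance can be sketched as a small control loop. This is an illustrative outline, not a provider integration: `run_staged_shift`, the stage list, and the `error_rate_at` probe are hypothetical names, and the 2% ceiling is borrowed from the 5xx rollback threshold later in this plan.

```python
from __future__ import annotations

from typing import Callable, Sequence


def run_staged_shift(
    stages: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.02,
) -> tuple[int, bool]:
    """Walk green's traffic share through `stages` (percentages).

    Returns (final_green_weight, completed). If the observed error rate
    at any stage exceeds `max_error_rate`, the weight goes back to 0 so
    that blue serves all traffic again.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    for weight in stages:
        # A real run would apply the weight, wait, then read live metrics;
        # here the probe callback stands in for that measurement.
        if error_rate_at(weight) > max_error_rate:
            return 0, False
    return stages[-1], True


if __name__ == "__main__":
    # Simulated probe: errors spike once green carries half the traffic.
    def probe(weight: int) -> float:
        return 0.05 if weight >= 50 else 0.001

    print(run_staged_shift([10, 25, 50, 100], probe))  # (0, False)
```

Returning to weight 0 on the first breach mirrors the blue-green property described above: the reversal is a weight change, not a new DNS propagation cycle.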
Pre-flight Checks
Establish identical staging and production baselines before modifying any authoritative record. Configuration drift guarantees split-brain states and unpredictable routing.
- Mirror OS kernels, runtime versions (PHP/Node/Python), and database schemas using infrastructure-as-code templates.
- Validate network ACLs, firewall rules, and WAF policies across both environments.
- Reduce authoritative A/AAAA/CNAME TTLs to 300 seconds 48 to 72 hours before execution using TTL Optimization Strategies.
- Pre-stage new DNS records with identical TTLs pointing at staging IPs for dry-run validation.
- Confirm data and configuration parity through Staging to Production Sync before any traffic shifts.
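One way to make the parity check concrete for static assets is to compare SHA-256 digests of two directory trees. The sketch below assumes both trees are reachable as local paths (for example, over a mount); `tree_digests` and `parity_report` are hypothetical helper names, and it does not replace the process described in Staging to Production Sync.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            hasher = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large assets do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    hasher.update(chunk)
            digests[path.relative_to(root).as_posix()] = hasher.hexdigest()
    return digests


def parity_report(source: Path, target: Path) -> dict[str, list[str]]:
    """List files that are missing, extra, or different on the target."""
    src, dst = tree_digests(source), tree_digests(target)
    return {
        "missing_on_target": sorted(src.keys() - dst.keys()),
        "extra_on_target": sorted(dst.keys() - src.keys()),
        "content_mismatch": sorted(
            name for name in src.keys() & dst.keys() if src[name] != dst[name]
        ),
    }
```

An empty list under all three keys is the condition to look for before any traffic shifts; anything else points at drift that should be resolved first.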
Execution Steps
Execute the switch in strict sequence. Do not parallelise database locks with DNS updates; maintain write-block windows to guarantee atomic consistency.
1. Lock Source Writes
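Enforce application-level write locks or run SET GLOBAL read_only = ON; on the source database 15 minutes pre-cutover (MySQL 5.7+/8.0). Privileged accounts bypass read_only, so also set super_read_only = ON if the application connects with such an account. This freezes the transactional state so the final sync is a clean snapshot with no in-flight writes.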
2. Finalise Data and Asset Sync
Take a final consistent dump with mysqldump --single-transaction --routines --triggers (a full logical dump, not an incremental one) or let an AWS DMS continuous replication task drain its remaining changes, then run rsync -avz --checksum --delete on static assets for byte-perfect parity. Capture the final delta exactly as described in Staging to Production Sync to avoid primary-key collisions on merge.
3. Reconfigure the Edge and Bust Caches
Update CDN origin pull endpoints to the new hosting IP and configure origin shield routing to absorb backend load spikes. Inject Cache-Control: no-store, max-age=0 and Pragma: no-cache on critical HTML/CSS/JS so the edge does not serve stale assets during the swap.
4. Swap Records and Shift Traffic Gradually
Update A/AAAA records to production IPs, monitor SOA serial increments on secondaries, then shift load balancer weight in stages, following Using Weighted DNS for Gradual Traffic Migration and the session-persistence approach in Implementing Blue-Green Deployments for Site Migrations.
5. Invalidate Edge Caches and Verify
Trigger a full edge cache invalidation via the provider API immediately after propagation completes, then run DNS Propagation Tracking across 10+ global resolver nodes to confirm adoption before retiring the legacy origin.
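Adoption across resolver nodes can be reduced to a single ratio once the answers are collected. The sketch below assumes the answers have already been gathered (for instance with `dig @resolver yourdomain.com A +short`) into a dictionary; `adoption_ratio` is a hypothetical helper, `198.51.100.7` is a placeholder for the legacy IP, and a resolver counts as adopted only when it returns the new IP and nothing else.

```python
from __future__ import annotations


def adoption_ratio(answers: dict[str, list[str]], expected_ip: str) -> float:
    """Fraction of resolvers whose A answers consist solely of expected_ip."""
    if not answers:
        raise ValueError("no resolver answers supplied")
    adopted = sum(1 for ips in answers.values() if set(ips) == {expected_ip})
    return adopted / len(answers)


if __name__ == "__main__":
    # Placeholder data: three resolvers see the new IP, one still sees the old.
    sample = {
        "8.8.8.8": ["203.0.113.10"],
        "1.1.1.1": ["203.0.113.10"],
        "9.9.9.9": ["203.0.113.10"],
        "208.67.222.222": ["198.51.100.7"],
    }
    print(adoption_ratio(sample, "203.0.113.10"))  # 0.75
```

Comparing this ratio with the 85% figure under Rollback Triggers gives a simple go/no-go signal before the legacy origin is retired.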
Configs / Commands
Zone file excerpt (use the exact production IP):
@ 300 IN A 203.0.113.10 ; low TTL set during the pre-flight reduction
www 300 IN CNAME origin.newhost.com.
Database sync pipeline (MySQL):
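# --single-transaction takes a consistent snapshot without locking InnoDB tables.
# Two interactive -p prompts would collide on a single pipeline, so credentials
# are read from login paths created with mysql_config_editor
# ("source" and "target" are placeholder login-path names).
mysqldump --login-path=source \
--single-transaction \
--set-gtid-purged=OFF \
--routines --triggers \
production_db \
| mysql --login-path=target -h new_host_ip production_db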
CDN origin headers (Nginx):
# Force fresh fetches during the transition; relax after adoption is global
add_header Cache-Control "no-store, no-cache, must-revalidate, proxy-revalidate, max-age=0" always;
add_header X-Frame-Options "SAMEORIGIN" always;
Validation command chain (dig / openssl / curl):
# Verify DNS resolution, NS delegation, TLS chain, and HTTP status in one pass
dig @8.8.8.8 yourdomain.com A +short
nslookup -type=NS yourdomain.com 8.8.8.8
openssl s_client -connect yourdomain.com:443 -servername yourdomain.com < /dev/null 2>/dev/null | grep -E "subject|issuer|Verify"
curl -s -o /dev/null -w '%{http_code}\n' https://yourdomain.com
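Step 4 asks you to watch SOA serial increments on secondaries. A rough way to compare them is to parse the output of `dig @<nameserver> yourdomain.com SOA +short` for each server. The sketch below works on captured output strings rather than running dig itself; `soa_serial`, `out_of_sync`, and the nameserver names in the usage note are hypothetical.

```python
from __future__ import annotations


def soa_serial(dig_short_output: str) -> int:
    """Extract the serial from one line of `dig +short SOA` output.

    The line has seven fields: MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM.
    """
    fields = dig_short_output.split()
    if len(fields) != 7:
        raise ValueError(f"unexpected SOA answer: {dig_short_output!r}")
    return int(fields[2])


def out_of_sync(primary_output: str, secondary_outputs: dict[str, str]) -> list[str]:
    """Return the secondaries whose serial differs from the primary's."""
    expected = soa_serial(primary_output)
    return sorted(
        name
        for name, output in secondary_outputs.items()
        if soa_serial(output) != expected
    )
```

For example, passing the primary's answer plus a dictionary such as `{"ns2.example.net": "<dig output>"}` returns the names that have not yet picked up the new zone, which is a reason to hold the weight shift.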
Validation
Verify technical integrity immediately after the switch; confirm global resolver adoption and protect organic search equity.
- Validate redirect maps from URL Mapping & Redirect Architecture, substituting each legacy URL for the placeholder path:
curl -I -L -s -o /dev/null -w '%{http_code} %{url_effective}\n' https://yourdomain.com/legacy-path
- Verify SSL certificate chain validity, HSTS enforcement, and canonical tag consistency across migrated URLs.
- Monitor real-time error rates, TLS handshake failures, and connection pool exhaustion via APM dashboards.
- Confirm global resolver adoption with DNS Propagation Tracking across 10+ nodes.
- Begin the Search Console Handover so the new property starts collecting crawl and coverage data.
Critical Failure Modes:
- TTL left at the default 86400 s causes 24 to 48 hour propagation delays and split traffic.
- Failing to lock database writes during the final sync results in data loss or duplicate entries.
- CDN origin shields pointed at decommissioned IPs trigger persistent 502/504 errors.
- Omitting HSTS preload updates or SSL chain validation causes browser security warnings.
- Skipping 301 mapping for legacy URL parameters causes immediate ranking decay and crawl waste.
Rollback Triggers
Abort the cutover and revert to the previous authoritative configuration if any threshold is breached. Do not patch a failing live switch; follow Migration Rollback Playbooks.
- Propagation Stagnation: global resolver adoption is still below 85% after 60 minutes.
- Database Divergence: checksums do not match, or replication lag exceeds 300 seconds during final sync.
- CDN Origin Failure: 5xx error rate exceeds 2% or origin timeout spikes above 5 seconds.
- TLS/SSL Breakage: certificate chain validation fails or HSTS headers drop unexpectedly.
- Traffic Drop: organic session volume falls more than 15% within the first 30 minutes post-switch.
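The thresholds above lend themselves to a mechanical check. The following sketch encodes them as a single function; `CutoverMetrics` and `breached_triggers` are hypothetical names, and how each metric is collected (APM, analytics, resolver probes) is left as an assumption outside the code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CutoverMetrics:
    minutes_since_switch: float
    resolver_adoption: float        # 0.0 to 1.0
    checksums_match: bool
    replication_lag_s: float
    error_rate_5xx: float           # 0.0 to 1.0
    origin_timeout_s: float
    tls_chain_valid: bool
    hsts_present: bool
    organic_sessions_change: float  # -0.20 means a 20% drop


def breached_triggers(m: CutoverMetrics) -> list[str]:
    """Return the names of every rollback trigger the metrics breach."""
    breached: list[str] = []
    if m.minutes_since_switch >= 60 and m.resolver_adoption < 0.85:
        breached.append("Propagation Stagnation")
    if not m.checksums_match or m.replication_lag_s > 300:
        breached.append("Database Divergence")
    if m.error_rate_5xx > 0.02 or m.origin_timeout_s > 5:
        breached.append("CDN Origin Failure")
    if not m.tls_chain_valid or not m.hsts_present:
        breached.append("TLS/SSL Breakage")
    if m.minutes_since_switch <= 30 and m.organic_sessions_change < -0.15:
        breached.append("Traffic Drop")
    return breached
```

A non-empty result is the signal to stop and follow Migration Rollback Playbooks rather than to debug in place.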
FAQ
What is the minimum safe TTL value before initiating a cutover?
Set TTL to 300 seconds at least 48 hours in advance so global ISP and recursive resolver caches expire, allowing near-instant propagation when authoritative records are updated.
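The 48-hour lead time follows from the old TTL: a resolver that cached the record just before the TTL was lowered may keep it for up to the previous TTL. As a rough sketch, the earliest sensible cutover time can be computed from that value; `earliest_cutover` and the `safety_factor` of 2 are assumptions chosen so that a default 86400-second TTL yields the 48-hour figure.

```python
from datetime import datetime, timedelta, timezone


def earliest_cutover(
    ttl_lowered_at: datetime,
    previous_ttl_s: int,
    safety_factor: float = 2.0,
) -> datetime:
    """Time after which caches holding the old TTL should have expired.

    safety_factor pads the wait for resolvers that hold records longer
    than the advertised TTL (an assumed margin, not a standard value).
    """
    return ttl_lowered_at + timedelta(seconds=previous_ttl_s * safety_factor)


if __name__ == "__main__":
    lowered = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)  # example date
    print(earliest_cutover(lowered, 86400).isoformat())  # 2024-01-03T00:00:00+00:00
```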
How do I prevent database inconsistencies during the final sync window?
Enable read-only mode on the source (SET GLOBAL read_only = ON;), run a final incremental sync, then atomically switch application connection strings, using maintenance flags to block writes during the transition.
Should I purge the CDN cache before or after updating DNS?
Purge immediately after records propagate globally. Purging beforehand forces edges to re-fetch from the old origin, while purging post-cutover ensures fresh assets from the new infrastructure.
How do I verify SSL/TLS integrity post-cutover?
Run openssl s_client -connect domain.com:443 -servername domain.com < /dev/null to validate the chain, expiry, and SNI, then cross-check with curl -I -v https://domain.com for HTTP/2 and HSTS delivery.
Related
- Implementing Blue-Green Deployments for Site Migrations
- Using Weighted DNS for Gradual Traffic Migration
- Staging to Production Sync
- DNS Propagation Tracking
- Migration Rollback Playbooks