Disaster recovery
Cooperation with Amazon Web Services, Google Cloud Platform, Yandex Cloud, and OVH
To begin with, we run on several platforms that duplicate each other. All of the companies named above are among the world's major cloud computing providers. Their state-of-the-art data centers are designed to withstand disasters, and they use the same technology and hardware that power Amazon, Google apps, Yandex, and many other large, well-known companies worldwide.
Automated snapshots and backup of every project
We create snapshots of all hosted websites every 24 hours, 7 days a week, or on a custom schedule if your business requires it. This means that even if your site is affected by an incident unrelated to our infrastructure, you will always have website backups. We restore them at your request as soon as possible.
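As an illustration of how a restore point could be chosen from daily snapshots, here is a minimal sketch. It assumes the restore uses the most recent snapshot taken before the incident; the function name `latest_snapshot_before` and the example timestamps are hypothetical and do not describe our actual tooling.

```python
from datetime import datetime


def latest_snapshot_before(snapshot_times, incident_time):
    """Return the most recent snapshot taken strictly before the incident, or None."""
    candidates = [ts for ts in snapshot_times if ts < incident_time]
    return max(candidates, default=None)


# Example data (assumed): one snapshot per day at 03:00.
daily = [datetime(2024, 5, day, 3, 0) for day in (1, 2, 3)]
incident = datetime(2024, 5, 3, 1, 30)

print(latest_snapshot_before(daily, incident))  # 2024-05-02 03:00:00
```

If no snapshot predates the incident, the function returns `None`, so the caller can fall back to another source such as a machine-level snapshot.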
Machine-level snapshots
In addition to automatic backups of each client website, we create persistent disk snapshots of every server and component in our infrastructure every 12 hours and keep them for 24 hours. This means that if your website snapshots cannot be used for some reason, we still have snapshots of our entire infrastructure from which your data can be restored.
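The 12-hour interval and 24-hour retention can be expressed as a simple rule. The sketch below is only an illustration under the assumption that each snapshot is identified by its creation time; the names `expired_snapshots` and `snapshot_due` are hypothetical.

```python
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(hours=12)  # from the text: one snapshot every 12 hours
RETENTION = timedelta(hours=24)          # from the text: snapshots are kept for 24 hours


def expired_snapshots(created_at, now):
    """Return the snapshot timestamps that are older than the retention window."""
    return [ts for ts in created_at if now - ts > RETENTION]


def snapshot_due(created_at, now):
    """Return True when no snapshot exists or the latest one is at least 12 hours old."""
    if not created_at:
        return True
    return now - max(created_at) >= SNAPSHOT_INTERVAL


now = datetime(2024, 1, 3, 12, 0)
snapshots = [
    datetime(2024, 1, 2, 0, 0),   # 36 hours old: past retention
    datetime(2024, 1, 2, 12, 0),  # 24 hours old: still inside the window
    datetime(2024, 1, 3, 0, 0),   # 12 hours old
]

print(expired_snapshots(snapshots, now))
print(snapshot_due(snapshots, now))
```

With these settings, only snapshots from roughly the last day are available for a machine-level restore.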
We continuously monitor the health of all websites and servers
We check the status of every website we host once a minute. We also monitor about 300 operating system and server software metrics and 50 hardware parameters every 1 to 5 minutes, as required. That translates to 1,440 web checks per day for each of your websites and 250,000 server-side metric readings every day. We apply forecasting analytics and AI learning procedures to this data to set alert triggers. If we notice something wrong on your site, our team will typically reach out to you before you even realize there is a problem. We also constantly improve our monitoring system to handle future incidents.
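The daily totals follow from simple arithmetic, reproduced in the sketch below. The text gives only a 1 to 5 minute range for server metrics, so the intervals tried here are assumptions; a 2-minute average happens to land closest to the quoted daily total.

```python
MINUTES_PER_DAY = 24 * 60

# One status check per website per minute (from the text).
web_checks_per_day = MINUTES_PER_DAY // 1

# ~300 OS/server software metrics plus 50 hardware parameters (from the text).
metrics_per_sample = 300 + 50


def readings_per_day(interval_minutes):
    """Metric readings collected per day at a fixed sampling interval."""
    return metrics_per_sample * (MINUTES_PER_DAY // interval_minutes)


print(web_checks_per_day)  # 1440
for interval in (1, 2, 5):  # assumed sampling intervals within the 1-5 minute range
    print(interval, readings_per_day(interval))
# 1 504000
# 2 252000
# 5 100800
```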
We keep you informed
You will be notified by email and by a ticket message in your Managed Hosting dashboard (MyCloud) about any issues affecting your websites. We also provide Slack and Telegram notifications for the Cloud and Scalable hosting solutions. If there is a system-wide outage or network event, it will be posted as a notice at the top of the page in your account.
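The channel rules above can be summarized in a few lines. This is a hedged sketch only: the plan identifiers, the channel labels, and the function `notification_channels` are hypothetical names introduced for the example, and "basic" stands in for any plan other than Cloud or Scalable.

```python
# Hypothetical plan identifiers; "cloud" and "scalable" mirror the solutions named in the text.
CHAT_ENABLED_PLANS = {"cloud", "scalable"}


def notification_channels(plan):
    """Return the channels used to report an issue for a given hosting plan."""
    channels = ["email", "mycloud_ticket"]  # sent for every plan
    if plan.lower() in CHAT_ENABLED_PLANS:
        channels += ["slack", "telegram"]   # extra channels for Cloud and Scalable
    return channels


print(notification_channels("Cloud"))  # ['email', 'mycloud_ticket', 'slack', 'telegram']
print(notification_channels("basic"))  # ['email', 'mycloud_ticket']
```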
Disaster recovery plan (step-by-step)
Step 0: The beginning of the incident (00:00)
- Attempt to correct the situation by self-healing methods
Step 1: Notification of a specialist on duty (01:00)
- Start the incident response and begin determining the root cause of the problem
- Alert the client about critical issues
Step 2: Determining the root cause, timing, and planned work (05:00)
- Client notification
Step 3: Generation of the final disaster recovery plan (15:00)
- Providing recommendations
- Obtaining the necessary access from the client, if it is not already available
Step 4: Pre-final assessment of the situation (30:00)
- Pre-final assessment of the situation based on monitoring data and on information (or the lack of it) from the service provider
- Waiting for access, if it is needed for disaster recovery
- Recovery from natural disasters moves to "pending repair" status
Step 5: Start of recovery work (45:00)
- Start of recovery work on a new server from the most recent snapshot or backup
Step 6: After the incident (45:00+)
- Description of the causes and of what was done
- Recommendations and follow-up work to prevent a recurrence
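The timeline above can also be read as a lookup table from elapsed time to the active step. The sketch below is illustrative only: it assumes the offsets in parentheses are minutes:seconds (the plan does not state the unit), shortens the step names, and leaves out Step 6 because it has no fixed start time. The name `current_step` is hypothetical.

```python
from datetime import timedelta

# (start offset, short step name); offsets are read as minutes:seconds, which is an assumption.
PLAN = [
    (timedelta(minutes=0), "Step 0: self-healing attempt"),
    (timedelta(minutes=1), "Step 1: on-duty specialist notified"),
    (timedelta(minutes=5), "Step 2: root cause, timing, and planned work"),
    (timedelta(minutes=15), "Step 3: final disaster recovery plan"),
    (timedelta(minutes=30), "Step 4: pre-final assessment"),
    (timedelta(minutes=45), "Step 5: recovery work starts"),
]


def current_step(elapsed):
    """Return the latest step whose start offset has been reached."""
    if elapsed < timedelta(0):
        raise ValueError("elapsed time cannot be negative")
    reached = [name for start, name in PLAN if start <= elapsed]
    return reached[-1]


print(current_step(timedelta(minutes=12)))  # Step 2: root cause, timing, and planned work
```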