Proactive Tasks Before Performing a Failover of AlwaysON Availability Group
AlwaysON is the most popular feature in the release of SQL Server 2012 and the concept at high level which most people understand is that, it provides high availability (HA) and disaster recovery (DR) solution for databases. It is not at all difficult to setup AlwaysON Availability Groups and there are plenty of step-by-step articles available found by a quick search.
It is common that most DBA’s setup AlwaysON Availability Groups and are happy after they see all Green in the AlwaysON Dashboard, but when there is a requirement where they have to failover a Availability Group, they run into issues and get complains from Client, Developers and Managers. So, it would be better to understand tasks which DBA’s need to plan and take care before and after performing a AG failover.
Let’s consider a test environment setup, where we have two server Server01 and Server02 running on Windows Server 2012 Operating System and one default standalone SQL Server instance, installed on each server. Now we have configured AlwaysON Availability Group with Listener name as “PROD_LSTRN” with Automatic failover.
On a bad day, the primary server Server01 has gone down and AlwaysON detected the issue and performed an automatic failover of Availability Group PROD_LSTRN from Primary Server Server01 to secondary server Server02 and it completed successfully and databases are now online on secondary server Server02. DBA is happy to see that all databases are accessible, but not for long as the Developers/Client and Managers started sending E-Mails that they are unable to access the some Applications. Now, we have to fix the issues ASAP as we configured AlwaysON to have high availability, but now we ended up unplanned downtime.
To dig deeper into the issue, we need to understand what issue is the Developers/Client is facing. One of the issue which they may be complaining that they get login failure from the Application when tried to connect, it is because logins are in Master database and are not synchronized between Server01 and Server02 as part of AlwaysON. If we are lucky, we can script logins from Server01 and create Server02, but if the Server01 is still down, then we are in bigger trouble. If the login failed is for domain accounts, then we can create them as logins and grant some read/write/execute permissions temporarily, but if there are SQL Authentication accounts, they we have to create them with password, which we do not have and need to speak to the Application Team and if lucky we can get it quickly or may take long time. If we have backup of Master database from Server01, we can restore on a different test server and get the login details from there, but this is a time consuming process.
Another issue that can occur is that there may be some SQL Agent jobs, related to Application or maintenance jobs like Backups, Index Maintenance which are created over a period of time. If Primary Server Server01 is down, and if some or all of these jobs are not present on the secondary server Server02, then we cannot create these jobs on Server02 as primary server is down. If we are lucky and already the jobs are present on secondary server Server02 as well, then we need to know which jobs need to be enabled and which are to be left as disabled. If we have backup of MSDB database, then We would need to restore it on another test server and extract the job details and create them on Server02 and this is a time taking process.
To avoid these kind of issues, it is important that we need to be proactive and have some process like a SQL job or triggers in place to Synchronizes the logins from Primary Server to all the Secondary Servers. When we get a request to create a SQL Agent job, we Need to have some process in place where we need to create jobs on Primary and all secondary servers and document the details of all the SQL Server Agent jobs, so that they can be recreated if required. IN some AlwaysON Environment, DBA’s create the jobs on central server and configure those jobs to accesses the databases in Availability Group via linked server or other remote query methods.
Another issue that can happen when Availability Group is set to Asynchronous and automatic failover is not supported. You may get a request from Client/Developers that we need to plan and failover the Availability Group during downtime from primary server Server01 to secondary server Server02, may be because they want to perform some maintenance on Server01. When we try to perform the Failover, you see warning message that failover will result in Data loss and you will then be directed to Click here to confirm failover with potential data loss. At this point, you would be in dilemma, on how much will be the data loss.
As a DBA you need to monitor the AlwaysON and create some alert mechanisms, which sends alerts when there is latency identified between primary and secondary and you need to find the cause on why there is latency and take appropriate steps to fix latency issues to never happen.
Hope this was helpful.
In-Depth Blogs on SQL Server, Information about SQL Server Conferences and Events, Frequently asked questions, SQL Server Trainings, SQL Server Logshipping on Databases which are part of AlwaysON Availability Groups (AG)