Monitor Office 365 Outages with Twitter
Office 365 has a high SLA (the latest English version), backed by Microsoft’s excellent Azure cloud. However, as with every other cloud service, there is always a chance that something unexpected will happen.
In April this year, Office 365 had a major service hiccup. Its Asia Pacific Azure AD authentication backend went haywire. As a result, users lost access to all O365 services. To make matters worse, the usual Office 365 monitoring channel, the Office 365 dashboard, was also inaccessible because of the fault.
To track the issue at the time, I ended up relying on the official MS Twitter account @Office365Status for updates. The account posts all Office 365 service outage notifications, which made it the only source people could use to follow the issue at that time.
After the incident, it became obvious that the Twitter account is a pretty reliable source for monitoring Office 365 service status. Based on this, I developed a solution that uses a PowerShell script to check for new tweets from the @Office365Status Twitter account. The script runs on a schedule from an AWS EC2 instance, which completely eliminates any dependence on Microsoft services.
OK, let’s go through the script in detail.
To allow our script to read tweets, we first need to create an app at https://apps.twitter.com. Please be aware that the site will be consolidated into https://developer.twitter.com soon. Also, as of July 2018, you need to apply for a Twitter developer account before you can create a new app.
After creating the app, go to Permissions and set the permission to Read only, as the monitoring script has no need to write.
Under Keys and Access Tokens, copy the following key values to Notepad. These values are needed in the script to configure access to Twitter.
- Consumer Key (API Key)
- Consumer Secret (API Secret)
- Access Token
- Access Token Secret
Now we just need a PowerShell module that uses the Twitter API to pull tweets. Among all the PowerShell modules I tested, MyTwitter, written by Adam Bertram, is the only one that worked for me. You can find the module here.
First, we create an authenticated session with the generated app tokens.
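A minimal sketch of this step, assuming the MyTwitter module is installed where PowerShell can find it and that it exposes New-MyTwitterConfiguration with the parameter names shown (worth confirming with Get-Help before relying on it). The wrapper name Initialize-TwitterSession and the environment variable names are hypothetical and are not part of the module or the original script.

```powershell
# Sketch only: Initialize-TwitterSession is a hypothetical wrapper name.
# Assumption: MyTwitter provides New-MyTwitterConfiguration with the
# -APIKey, -APISecret, -AccessToken and -AccessTokenSecret parameters.
function Initialize-TwitterSession {
    param(
        [Parameter(Mandatory)][string]$ConsumerKey,
        [Parameter(Mandatory)][string]$ConsumerSecret,
        [Parameter(Mandatory)][string]$AccessToken,
        [Parameter(Mandatory)][string]$AccessTokenSecret
    )

    Import-Module MyTwitter -ErrorAction Stop

    # Pass the four values copied from the Twitter app page to the module.
    $config = @{
        APIKey            = $ConsumerKey
        APISecret         = $ConsumerSecret
        AccessToken       = $AccessToken
        AccessTokenSecret = $AccessTokenSecret
    }
    New-MyTwitterConfiguration @config
}

# Example call (environment variable names are made up for illustration):
# Initialize-TwitterSession -ConsumerKey $env:TWITTER_CONSUMER_KEY `
#     -ConsumerSecret $env:TWITTER_CONSUMER_SECRET `
#     -AccessToken $env:TWITTER_ACCESS_TOKEN `
#     -AccessTokenSecret $env:TWITTER_ACCESS_TOKEN_SECRET
```

Reading the values from environment variables rather than hard-coding them keeps the secrets out of the script file itself.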
Next, we use Get-TweetTimeline to get recent tweets from the @Office365Status account. Then we grab the first item from the array, which is the latest tweet from the account.
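The selection step could look roughly like the sketch below. Get-LatestStatusTweet and its Fetch parameter are hypothetical names introduced here so the logic can be tried without Twitter access, and the -Username parameter of Get-TweetTimeline is an assumption about the MyTwitter module that should be checked against its help.

```powershell
# Sketch only: Get-LatestStatusTweet and -Fetch are hypothetical names.
# Assumption: Get-TweetTimeline accepts -Username and returns newest tweets first.
function Get-LatestStatusTweet {
    param(
        [string]$UserName = 'Office365Status',
        # Script block that fetches the timeline; replaceable for dry runs.
        [scriptblock]$Fetch = { param($u) Get-TweetTimeline -Username $u }
    )

    # Force an array so a single tweet and many tweets are handled the same way.
    $tweets = @(& $Fetch $UserName)
    if ($tweets.Count -eq 0) { return $null }

    # The first element is the most recent tweet.
    $tweets[0]
}

# Dry run with made-up objects instead of a real Twitter call:
# Get-LatestStatusTweet -Fetch { @([pscustomobject]@{ text = 'newest' }, [pscustomobject]@{ text = 'older' }) }
```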
The script then gets the creation time of the latest tweet. Twitter uses a non-standard time format, so we first need to convert the time to a conventional format. This allows us to compare the creation time with the current time on the server.
Twitter original creation time format:
To convert the time we use a Convert-DateString function.
- ddd = first three letters of the weekday
- MMM = first three letters of the month
- dd = day of the month
- HH:mm:ss = hours:minutes:seconds
- zzzz = time zone offset from UTC
- yyyy = year
The function uses the .NET method TryParseExact to convert the time into something like the format below.
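A function of this kind could be sketched as follows. This is not the original Convert-DateString, only an illustration built from the format specifiers listed above; the sample date string is made up, and the choice to return the result in UTC (via AdjustToUniversal) is an assumption made here to keep comparisons independent of the server’s time zone.

```powershell
# Sketch only: an illustrative Convert-DateString, not the original function.
function Convert-DateString {
    param(
        [Parameter(Mandatory)][string]$Date,
        # Format assembled from the specifiers described above.
        [string]$Format = 'ddd MMM dd HH:mm:ss zzzz yyyy'
    )

    $parsed = [DateTime]::MinValue
    $ok = [DateTime]::TryParseExact(
        $Date,
        $Format,
        [System.Globalization.CultureInfo]::InvariantCulture,   # English day and month names
        [System.Globalization.DateTimeStyles]::AdjustToUniversal, # return the value in UTC
        [ref]$parsed)

    if (-not $ok) { throw "Could not parse '$Date' with format '$Format'." }
    $parsed
}

# Made-up sample in Twitter's created_at layout:
# Convert-DateString -Date 'Wed Aug 01 09:15:00 +0000 2018'
```

Note that the weekday in the input has to match the actual date, otherwise TryParseExact reports a failure.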
Once we have the correct creation time, the script checks whether the tweet is a reply. This filters out reply tweets, as we only care about the incident updates.
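One way to express the reply check is sketched below. It assumes the tweet object carries the Twitter API v1.1 fields in_reply_to_status_id_str and in_reply_to_screen_name, which are empty for a standalone tweet; whether the module passes these fields through unchanged is an assumption. Test-IsReplyTweet is a hypothetical helper name.

```powershell
# Sketch only: Test-IsReplyTweet is a hypothetical helper name.
# Assumption: the tweet object exposes the API v1.1 reply fields.
function Test-IsReplyTweet {
    param([Parameter(Mandatory)]$Tweet)

    # Casting to [string] turns a missing or null field into an empty string.
    $replyToStatus = [string]$Tweet.in_reply_to_status_id_str
    $replyToUser   = [string]$Tweet.in_reply_to_screen_name

    # A tweet counts as a reply if either field is filled in.
    ($replyToStatus -ne '') -or ($replyToUser -ne '')
}

# Example with made-up objects:
# Test-IsReplyTweet ([pscustomobject]@{ text = 'Update'; in_reply_to_screen_name = $null })
```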
The script then checks whether the tweet was created within the last hour. This window depends on the interval at which you schedule the script, so you can set it to whatever frequency you like. In my case, I schedule the script to run every hour, hence the one-hour window.
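The time-window check can be sketched as a small helper. Test-IsRecentTweet and its parameter names are hypothetical; the sketch assumes the creation time has already been converted to UTC and compares it with the current UTC time, with the window length matching the schedule interval.

```powershell
# Sketch only: Test-IsRecentTweet is a hypothetical helper name.
function Test-IsRecentTweet {
    param(
        [Parameter(Mandatory)][DateTime]$CreatedUtc,
        # Should match how often the script is scheduled to run.
        [int]$WindowMinutes = 60,
        # Overridable so the check can be tried against a fixed point in time.
        [DateTime]$NowUtc = [DateTime]::UtcNow
    )

    $age = $NowUtc - $CreatedUtc

    # Recent means: not in the future, and no older than the window.
    ($age -ge [TimeSpan]::Zero) -and ($age.TotalMinutes -le $WindowMinutes)
}

# Example: Test-IsRecentTweet -CreatedUtc $created -WindowMinutes 60
```

If the script runs every 15 minutes instead, passing -WindowMinutes 15 keeps the window and the schedule aligned so that each tweet triggers only one notification.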
Once it has confirmed that the tweet was posted within the last hour and is not a reply, the script sends an email to a monitoring mailbox (not your Office 365 mailbox, of course!).
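The notification step might be sketched as below, using the built-in Send-MailMessage cmdlet. The helper name New-OutageMailParameters, the addresses, the SMTP server and the subject line are all placeholders invented for illustration; the original script’s mail settings are not shown in this post.

```powershell
# Sketch only: helper name, addresses and SMTP server are placeholders.
function New-OutageMailParameters {
    param(
        [Parameter(Mandatory)][string]$TweetText,
        [Parameter(Mandatory)][DateTime]$CreatedUtc,
        [string]$From = 'o365-monitor@example.com',
        [string]$To = 'alerts@example.com',
        [string]$SmtpServer = 'smtp.example.com'
    )

    # Format the timestamp culture-independently so the mail reads the same on any server.
    $stamp = $CreatedUtc.ToString('yyyy-MM-dd HH:mm', [System.Globalization.CultureInfo]::InvariantCulture)

    # Hashtable suitable for splatting into Send-MailMessage.
    @{
        From       = $From
        To         = $To
        Subject    = 'Office 365 status update from @Office365Status'
        Body       = "Posted (UTC): $stamp`r`n`r`n$TweetText"
        SmtpServer = $SmtpServer
    }
}

# Sending the notification:
# $mail = New-OutageMailParameters -TweetText $latest.text -CreatedUtc $created
# Send-MailMessage @mail
```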
The PowerShell monitoring script can be downloaded from my GitHub repo here.
The last part of the solution is to upload the script to a server (EC2 in AWS in my case) and schedule it to run every hour!
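For the scheduling step, a sketch assuming a Windows instance and the built-in schtasks.exe tool is shown below. The script path, task name and helper name Get-MonitorTaskArguments are hypothetical, and the path is assumed to contain no spaces.

```powershell
# Sketch only: path, task name and helper name are hypothetical.
# Assumption: the server runs Windows and the script path has no spaces.
function Get-MonitorTaskArguments {
    param(
        [string]$ScriptPath = 'C:\Scripts\Check-O365Status.ps1',
        [string]$TaskName = 'O365StatusMonitor'
    )

    # Command line the scheduled task will run.
    $action = "powershell.exe -NoProfile -ExecutionPolicy Bypass -File $ScriptPath"

    # /SC HOURLY runs the task every hour; /F overwrites an existing task of the same name.
    @('/Create', '/TN', $TaskName, '/SC', 'HOURLY', '/TR', $action, '/RU', 'SYSTEM', '/F')
}

# On the server, from an elevated PowerShell prompt:
# $taskArgs = Get-MonitorTaskArguments
# & schtasks.exe @taskArgs
```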
Below is an email I received for a recent incident. Hope you find this useful!