Introduction
Recently I was involved in troubleshooting a load-balanced web service. That led me to quickly write a PowerShell script to monitor a web service. The original web service was actually not the problem (well, not this time, not after we’d set it to recycle a lot more often until someone fixes the code, and barring the fact that there is no health check on the load balancer!?). It “failed” because it didn’t handle the failure of another web service it depends on very well, which made things unclear during the initial troubleshooting. That other web service is not highly available, bar manually switching over to a “stand by” server via ARR; there is no load balancing. But that’s another discussion.
The culprit & the state of our industry
When we found the problematic web service and saw it ran on Tomcat, we tried restarting the Tomcat service, but that didn’t help. Rebooting the servers on which it ran didn’t help either. Then someone sent us a document with the restart procedure for those servers. It stated that the Catalina folder needs to be deleted for the restart to work and get the service back up and running. It also stated they often needed to do this twice. Well, OK … Based on that note we worked under their assumption that nothing in that folder is needed, as nothing was said about safeguarding any of it.
Having said that, why on earth, over all those years, the developers did not find out what is causing the issue and fix it beats me. For years and years they’ve been doing this manually. Sometimes several days a week, sometimes multiple times a day. On several servers. Good luck when no one is around to do so, or when no one around knows the process. The doc was from a developer. A developer in what is supposed to be a DevOps environment. No one ever made the effort to find out what makes the web service crash or to automate recovery.
PowerShell Script to Monitor a web service
I think it’s safe to say I won’t get them to any form of site resilience engineering soon. But I did leave them with a script they can schedule to automate their manual actions. Note that, because the Catalina folder has to be deleted, an “ordinary” restart of the server does not fix any issues with the web service. So, ideally this script is also run at server startup!
The script has basic error handling and logging, but it’s a quick fix for a manual process, so it’s not a spick-and-span script. Still, it’s enough to do the job for now and will hopefully inspire them to do better. It is 2018 after all, and even Site Resilience Engineering needs a new incarnation in this fashion-driven industry.
I’ve included this PowerShell script to monitor a web service below as an example and as a reference for my future self. Enjoy.
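As a rough sketch of what that scheduling could look like, the two commands below register one task that runs the check every 15 minutes and one that runs it at server startup. The script path and the first task name are taken from the example in the script header; the second task name and the -NoProfile / -ExecutionPolicy Bypass switches are my assumptions, and I use /SC MINUTE /MO 15 here instead of the /SC DAILY /RI 15 variant from the header. Run this from an elevated PowerShell prompt and adjust it to your environment.

```powershell
# Assumed values, based on the example in the script header; adjust as needed.
$ScriptPath  = 'C:\SysAdmin\Scripts\MonitorMyWebService.ps1'
$TaskCommand = "powershell.exe -NoProfile -ExecutionPolicy Bypass -File $ScriptPath"

# Task 1: run the web service check every 15 minutes as SYSTEM.
schtasks.exe /Create /TN 'MonitorMyWebService' /TR $TaskCommand /SC MINUTE /MO 15 /RU SYSTEM /RL HIGHEST /F

# Task 2 (hypothetical task name): run the same check at every server startup,
# since an ordinary reboot alone does not clean out the Catalina folder.
schtasks.exe /Create /TN 'MonitorMyWebServiceAtStartup' /TR $TaskCommand /SC ONSTART /RU SYSTEM /RL HIGHEST /F
```

Two separate tasks keep the example simple; a single task with two triggers would work as well, but that is easier to set up in the Task Scheduler GUI than with schtasks.exe.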
<#
Author: Didier Van Hoye
Date: 2018/09/24
version: 0.9.1
Blog: https://blog.workinghardinit.work
Twitter: @WorkingHardInIT
This PowerShell script automates the restart of Tomcat7 when needed. The need is based
on whether the web service running on Tomcat7 returns HTTP status 200 or not.
The work this script does is based on the memo that describes the manual procedure.
It takes away the manual reactive actions that they did multiple days per week, sometimes
multiple times per day.
You can register this script as a scheduled task to run every X minutes.
Below is an example. NOTE LINE WRAPS!!!
Schtasks.exe /CREATE /TN MonitorMyWebService /TR "Powershell.exe C:\SysAdmin\Scripts\MonitorMyWebService.ps1"
/RU SYSTEM /RL HIGHEST /F /SC DAILY /RI 15 /ST 00:00
Having said that, why on earth over all those years the developers did not find out
what was causing the issue and fix that beats me. Also, since the catalina folder needs
to be deleted for this to work, we work under their assumption that nothing in there is needed.
This does mean that an "ordinary" restart of the server does not fix any issues with the web service.
So, ideally this script is also run at server startup.
The script logs its findings and actions to a log file in the script directory.
#>
$ErrorActionPreference = "Stop"
$FolderToDelete = "C:\Program Files\Apache Software Foundation\Tomcat 7.0\work\Catalina"
$MystatusRunning = 'Running'
$MyStatusStopped = 'Stopped'
$MyService = "Tomcat7"
$MyWebService = "https://mywebservice.company.com/metadatasearch"
$MyWebServiceStatus = 0
$MyServiceCheckLogFile = "MetaDataMonitor"
#region CheckWebServiceStatus
Function CheckWebService {
    [CmdletBinding()]
    param(
        [String]$WebService
    )
    $HTTP_Response = $null
    try {
        # Create new web request.
        $HTTP_Request = [System.Net.WebRequest]::Create($WebService)
        # Get a response from the site
        $HTTP_Response = $HTTP_Request.GetResponse()
        # Cast the status of the service to an integer
        $HTTP_Status = [int]$HTTP_Response.StatusCode
        # The caller decides what to do with anything other than 200
        Return $HTTP_Status
    }
    Catch {
        #Write-Host $_.Exception.InnerException
        # GetResponse() throws on HTTP error statuses. PowerShell wraps the
        # WebException, so the real status code sits on the inner exception.
        $WebException = $_.Exception.InnerException
        if ($WebException -is [System.Net.WebException] -and $null -ne $WebException.Response) {
            # For example 500 when the server returned an Internal Server Error
            $HTTP_Status = [int]$WebException.Response.StatusCode
        }
        else {
            # No HTTP response at all (DNS failure, timeout, connection refused ...)
            $HTTP_Status = [int]999
        }
        Return $HTTP_Status
    }
    Finally {
        # Don't litter :-) This also runs after a Return, so the response always gets closed.
        if ($null -ne $HTTP_Response) {
            $HTTP_Response.Close()
        }
    }
}
#endregion
#region Write-2-Log
function Write-2-Log {
    [CmdletBinding()]
    param(
        [Parameter()]
        [ValidateNotNullOrEmpty()]
        [string]$Message,
        [Parameter()]
        [ValidateNotNullOrEmpty()]
        [ValidateSet('Information','Warning','Error')]
        [string]$Severity = 'Information'
    )
    $Date = get-date -format "yyyyMMdd"
    [pscustomobject]@{
        Time = (Get-Date -f g)
        Message = $Message
        Severity = $Severity
    } | Export-Csv -Path "$PSScriptRoot\$MyServiceCheckLogFile-$Date.log" -Append -NoTypeInformation
}
#endregion
Write-2-Log -Message "Starting Web Service Status check" -Severity Information
$MyWebServiceStatus = CheckWebService $MyWebService
<#
For some reason once might not be enough
So we strafe them twice - what's worth shooting once is worth shooting twice.
#>
For ($counter = 1; $counter -le 2; $counter++) {
    Try {
        If ($MyWebServiceStatus -ne 200) {
            #Write-Host "The webservice has a problem"
            Write-2-Log -message "The webservice has a problem as it did not return HTTP Status 200" -Severity Warning
            #Stop the TomCat service if it is running
            $ServiceObject = $Null
            $ServiceObject = Get-Service $MyService
            If ($ServiceObject | Where-Object {$_.Status -eq $myStatusRunning}) {
                #Write-Host "Running"
                Write-2-Log -Message "Stopping $MyService ..." -Severity Information
                Stop-Service $MyService
                $ServiceObject.WaitForStatus($MyStatusStopped, "00:00:05")
                #Get-Service $MyService
                Write-2-Log -Message "$MyService has been stopped..." -Severity Information
                #Write-Host "Stopped"
            }
            #Write-Host "Delete folder"
            if (Test-path $FolderToDelete -PathType Container) {
                #Write-Host Folder Exists
                Write-2-Log -Message "The $FolderToDelete exists ..." -Severity Information
                #Delete the folder & its contents
                Get-ChildItem -Path "$FolderToDelete\*" -Recurse | Remove-Item -Force -Recurse
                Write-2-Log -Message "$FolderToDelete content has been recursively deleted ..." -Severity Information
                Remove-Item $FolderToDelete -Recurse -force
                Write-2-Log -Message "$FolderToDelete has been deleted ..." -Severity Information
                #Write-Host Folder Deleted
            }
            #Start the TomCat service
            $ServiceObject = $Null
            $ServiceObject = Get-Service $MyService
            if ($ServiceObject | Where-Object {$_.status -eq $MyStatusStopped} ) {
                Write-2-Log -Message "Starting $MyService ..." -Severity Information
                Start-Service $MyService
                $ServiceObject.WaitForStatus($MyStatusRunning, "00:00:05")
                #Get-Service $MyService
                Write-2-Log -Message "$MyService has been started..." -Severity Information
                #Write-Host Started
            }
        }
    }
    catch {
        Write-2-Log -Message "Bailing out of the script due to error" -Severity Warning
        Write-2-Log -Message $_.Exception.Message -Severity Error
        Write-2-Log -Message $_.InvocationInfo.PositionMessage -Severity Error
        Write-2-Log -Message $_.FullyQualifiedErrorID -Severity Error
        Write-2-Log -Message "Stopping Web Service Status check" -Severity Information
    }
    Finally {
        # Give the service a moment before the next pass
        Start-Sleep -Seconds 5
    }
}
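The loop above runs the restart procedure twice without checking in between whether the first pass already fixed things. A possible refinement, sketched here and not part of the original script, is to poll the web service for a while after a restart and only go for the second pass when it still does not return HTTP 200. The function name Wait-ForHttpOk and its parameters are hypothetical; the probe is passed in as a script block so that the existing CheckWebService function can be reused as is.

```powershell
# Hypothetical helper (not in the original script): polls a status probe
# until it returns 200 or the attempts run out.
function Wait-ForHttpOk {
    [CmdletBinding()]
    param(
        # Script block that returns an HTTP status code as an integer,
        # for example { CheckWebService $MyWebService }
        [Parameter(Mandatory)]
        [scriptblock]$Probe,
        # Assumed defaults; tune them to how long Tomcat needs to come up.
        [int]$MaxAttempts = 6,
        [int]$DelaySeconds = 10
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        $status = & $Probe
        if ($status -eq 200) {
            # The service answers normally again
            return $true
        }
        if ($attempt -lt $MaxAttempts) {
            # Wait before probing again, but not after the last attempt
            Start-Sleep -Seconds $DelaySeconds
        }
    }
    # Still no HTTP 200 after all attempts
    return $false
}

# Usage sketch inside the restart loop:
# if (Wait-ForHttpOk -Probe { CheckWebService $MyWebService }) { break }
```

With the default values this waits up to roughly a minute per pass, which is an assumption on my side and not something the restart memo specifies.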