Azure Cloud Role

Available Metrics

Possible metric types that can be tracked on a given resource.

Metric Type Data Type Description
AggregatedMetric
Object
Allows for aggregating existing metrics over a period of time. Useful for running sums or averages, and for extracting a single data item from collection-based metrics.
AzureCloudRoleInstanceDetails
AzureCloudRoleInstanceDetails[]
Instance String
Size String
State String
StateDetails String
ErrorCode String
Tracks detailed information about Azure role instances as a list.
AzureVirtualMachineState
String
Specifies the current status of a role instance as reported by Azure. Possible values for this metric are listed in this MSDN article in the RoleInstanceList section.
DerivedMetric
Double
Allows for deriving new metrics from existing ones. Useful for combining existing metrics or for multiplying a metric by a factor.
InternalUrlResponseCode
String
Tracks the HTTP result of testing the internal IP address. Possible values are HTTP status codes: OK, Unauthorized, etc.
InternalUrlResponseTime
Double
Tracks the response time of an HTTP request to the internal IP address.
InternalUrlState
UrlStatus[]
Host String
Down Boolean
ResponseTime Double
StatusCode Int32
ErrorMessage String
Timestamp String
Tracks the results of an HTTP request to the internal IP address.
LinkedMetric
Object
Allows for tracking metrics from other resources. Useful when it is important to evaluate metrics from different resources side by side.
PowershellArrayMetric
Object[]
ResourceInstanceCount
Int32
Tracks current number of compute instances
ResourceStatus
String
Tracks overall status of the resource. This is an important metric as it is used to drive Uptime reports. Possible values: Ready, Down, Unknown and in some cases Stopped
ScheduledTaskLastRunInMinutes
Double
Tracks the number of minutes since a particular Windows scheduled task last executed.
ScheduledTaskLastStatus
Int32
Tracks the last status of a particular Windows scheduled task. Status of 0 indicates a successful run.
WindowsCustomEventLogEntry
EventEntity
UniqueId Guid
EventId Int64
MachineName String
Message String
Source String
UserName String
EntryType String
Timestamp DateTime
Tracks entries from the Windows Event Log.
WindowsEventLogEntry
EventEntity
UniqueId Guid
EventId Int64
MachineName String
Message String
Source String
UserName String
EntryType String
Timestamp DateTime
Tracks entries from the Windows Event Log.
WindowsPerformanceCounter
Double
Tracks performance counters defined as individual metrics. Any performance counter might be tracked.
WindowsPerformanceCounterMultiInstance
PerformanceCounterInstance[]
Server String
Instance String
Value Double
Tracks multi-instance performance counters. Returns an array of PerformanceCounterInstance objects, one per counter instance.
WindowsProcessList
ProcessEntity[]
UniqueId Guid
Name String
IsResponding Boolean
MemorySize Double
Cpu Double
Tracks a list of currently running processes.
WindowsServiceState
String
Tracks the last known status of a particular Windows service.
WindowsUpdatesDrivers
WindowsUpdate[]
Title String
Url String
Mandatory Boolean
Priority String
Date DateTime
Tracks available Windows Driver Updates. Used for ensuring all important updates are installed regularly.
WindowsUpdatesSoftware
WindowsUpdate[]
Title String
Url String
Mandatory Boolean
Priority String
Date DateTime
Tracks available Windows Updates. Used for ensuring all important updates are installed regularly.
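
The AggregatedMetric and DerivedMetric types described above can be illustrated with a small sketch. This is not CloudMonix code: the function names, the sample values, and the dictionary layout of the collection-based metric are all assumptions made for the example. It shows a trailing-window average, the extraction of one item from a collection-based metric, and a metric derived by multiplying by a factor.

```python
from statistics import mean


def aggregated_average(samples, window_minutes, now):
    """Average of (minute, value) samples that fall inside the trailing window."""
    recent = [value for minute, value in samples
              if 0 <= now - minute < window_minutes]
    return mean(recent) if recent else None


def extract_item(collection, key, wanted, field):
    """Pull one field out of a collection-based metric (a list of dicts here)."""
    for item in collection:
        if item[key] == wanted:
            return item[field]
    return None


def derived_scaled(value, factor):
    """Derive a new metric by multiplying an existing one by a factor."""
    return value * factor


# Hypothetical sample data, invented for illustration only.
cpu_samples = [(0, 40.0), (10, 60.0), (20, 80.0), (45, 100.0)]
counters = [
    {"Instance": "C:", "Value": 2048.0},
    {"Instance": "D:", "Value": 512.0},
]

# Only the samples taken at minutes 20 and 45 fall inside the 30-minute window.
cpu_30_min = aggregated_average(cpu_samples, window_minutes=30, now=45)
free_d_mb = extract_item(counters, "Instance", "D:", "Value")
free_d_gb = derived_scaled(free_d_mb, 1 / 1024)  # megabytes to gigabytes
```

With the sample data above, the 30-minute average ignores the two oldest samples, and the derived value converts the extracted megabyte figure to gigabytes.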

Available Commands

Possible commands that can be executed on a given resource. An Ultimate subscription is required.

Command Type Description
AzureCloudServiceInstanceReboot Reboots specified Azure Cloud Service instance
AzureCloudServiceInstanceReimage Reimages specified Azure Cloud Service instance
CustomPowershellScript Runs custom Powershell script on specified resource
PowershellRestartService Restarts specified Windows Service using Powershell script
WebRequest Runs custom WebRequest to specified URL

Default Templates

Default monitoring templates provided by CloudMonix.

Pre-configured Metrics

Metric Name Metric Type Description
ApplicationEventLogs WindowsEventLogEntry Tracks entries from the Windows Event Log (Application source)
AspNetApplicationRestarts WindowsPerformanceCounter Tracks the number of times that an application has been restarted during the Web server's lifetime. The counter is incremented each time an Application_OnEnd event is raised. An application restart can occur because of changes to the Web.config file, changes to assemblies stored in the application's Bin directory, or when an application must be recompiled due to numerous changes in ASP.NET Web pages. Unexpected increases in this counter can mean that problems are causing the Web application to recycle.
AspNetBytesOut WindowsPerformanceCounter Tracks total size in bytes of responses sent to a client. Does not include HTTP response headers.
AspNetErrors WindowsPerformanceCounter Tracks the average number of errors that occurred per second during the execution of HTTP requests. Includes any parser, compilation, or run-time errors.
AspNetRequests WindowsPerformanceCounter Tracks the number of requests executed per second. This represents the current throughput of the application. Under constant load, this number should remain within a certain range, barring other server work (such as garbage collection, cache cleanup thread, external server tools, and so on).
AspNetRequestsQueued WindowsPerformanceCounter Tracks the number of requests waiting for service from the queue. When this number starts to increment linearly with increased client load, the Web server computer has reached the limit of concurrent requests that it can process.
AspNetRequestsRejected WindowsPerformanceCounter Tracks the total number of requests not executed because of insufficient server resources to process them. This counter represents the number of requests that return a 503 HTTP status code, indicating that the server is too busy.
AspNetRequestWaitTime WindowsPerformanceCounter Tracks the number of milliseconds that the most recent request waited in the queue for processing
CpuTime WindowsPerformanceCounter Tracks overall CPU utilization on the monitored server
CpuTime30MinAverage AggregatedMetric Tracks 30-minute CPU utilization average across all instances within monitored Cloud Role
DiskFree WindowsPerformanceCounter Tracks amount of free space across all drives in megabytes
DiskIdleTime WindowsPerformanceCounter Tracks the percentage of time when the disk is idle. Sustained numbers below 20% indicate an over-saturated disk.
DiskReadSpeed WindowsPerformanceCounter Tracks average time, in seconds, it takes to read data from the disk
DiskWriteSpeed WindowsPerformanceCounter Tracks average time, in seconds, it takes to write data to the disk
InstanceList AzureCloudRoleInstanceDetails Tracks the detailed status of monitored cloud role instances
MemoryCommittedPct WindowsPerformanceCounter Tracks the amount of virtual memory in use. It is the ratio of Committed Bytes to the Commit Limit
MemoryFree WindowsPerformanceCounter Tracks free memory (in MBs) on the monitored server
Status ResourceStatus Tracks the overall readiness status of the monitored resource. Possible values are: Ready, Down, Stopped and Unknown
SystemEventLogs WindowsEventLogEntry Tracks entries from the Windows Event Log (System source)
SystemUptime WindowsPerformanceCounter

Pre-configured Alerts

Alert Name Expression Severity Description
High CPU CpuTime30MinAverage > 70 Warning Raises an alert when average CPU utilization for the last 30 minutes across all instances is over 70%
Instance Was Rebooted SystemUptime < 600 Warning
Low Memory MemoryFree < 100 Warning Raises an alert if the amount of available physical memory falls below 100 MB for a sustained period of time
Requests are Queueing Up AspNetRequestsQueued > 10 Warning Raises an alert when the number of queued requests exceeds 10, sustained for 5 minutes. Queued requests indicate that IIS or backend processes are not able to process the requests quickly enough
Resource Outage Status == "Down" Error Raises an alert when monitored server is reported as Down by Azure and if no metrics come through from diagnostic agents, for a sustained period of time
Role has NO Ready Instances Count(InstanceList, "State == \"ReadyRole\"") == 0 && Any(InstanceList, "State != \"ReadyRole\"") Error No 'ReadyRole' instances have been detected for a sustained period of time
Role has some Non-Ready Instances Any(InstanceList, "State != \"ReadyRole\"") && Count(InstanceList, "State == \"ReadyRole\"") > 0 Warning Some 'ReadyRole' and some non-'ReadyRole' instances have been detected for a sustained period of time
Slow Disk DiskReadSpeed > 0.025 || DiskWriteSpeed > 0.025 || DiskIdleTime < 20 Warning Disabled by default. Raises an alert if the average disk read or write time exceeds 25 milliseconds, or if the disk is idle less than 20% of the time, sustained for 5 minutes. For mission-critical servers, disk read and write times should not exceed 10 milliseconds.
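
The two role-instance alerts in the table above combine Count and Any over InstanceList. The following sketch mimics that logic in Python to show how the two expressions divide the cases. The helper names and the sample instance lists are assumptions for illustration; the actual expression engine is not shown in this document.

```python
def count(items, predicate):
    """Number of items for which the predicate holds."""
    return sum(1 for item in items if predicate(item))


def any_match(items, predicate):
    """True if at least one item satisfies the predicate."""
    return any(predicate(item) for item in items)


def is_ready(instance):
    return instance["State"] == "ReadyRole"


def is_not_ready(instance):
    return instance["State"] != "ReadyRole"


def classify_role(instance_list):
    """Return the severity the two alert expressions would select, or None."""
    # Role has NO Ready Instances
    if count(instance_list, is_ready) == 0 and any_match(instance_list, is_not_ready):
        return "Error"
    # Role has some Non-Ready Instances
    if any_match(instance_list, is_not_ready) and count(instance_list, is_ready) > 0:
        return "Warning"
    return None


# Hypothetical instance lists, invented for illustration only.
all_ready = [{"State": "ReadyRole"}, {"State": "ReadyRole"}]
mixed = [{"State": "ReadyRole"}, {"State": "BusyRole"}]
none_ready = [{"State": "StoppedVM"}, {"State": "BusyRole"}]
```

One consequence visible in the sketch: because the Error expression also requires Any(...) to match, an empty instance list satisfies neither expression and selects no severity.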

Pre-configured Actions

Action Name Command Type Expression Severity Description
Daily reboot AzureCloudServiceInstanceReboot CheckTimeUtc.Hour == (int.Parse(InstanceName.Substring(InstanceName.LastIndexOf("_") + 1)) % 24) Information Disabled by default. Reboots cloud role instances once per day. The reboot happens when the instance's index, modulo 24, matches the current clock hour (in UTC). For example, the 1st instance is rebooted at UTC midnight, the 2nd instance at 1am UTC, etc. For deployments with 25+ instances, this action reboots every 24th instance in the same hour. For example, for a deployment with 100 instances, at UTC midnight the 1st, 25th, 49th, 73rd and 97th instances will be rebooted; at 1am UTC, the 2nd, 26th, 50th, 74th and 98th instances will be rebooted; etc. More information here: http://support.cloudmonix.com/support/solutions/articles/5000629071
Low Ram Reboot AzureCloudServiceInstanceReboot MemoryFree < 100 Warning Disabled by default. Reboots the Cloud Role instance if available memory drops below 100 MB, sustained for 5 minutes. This action will not be executed more than once per hour due to the Suspended period setting.
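
The Daily reboot expression can be traced with a short Python sketch. It assumes instance names end in an underscore followed by a zero-based index (the "WebRole_IN_<n>" pattern used here is an assumption), so the 1st instance carries index 0. Under that assumption, the instances rebooted at UTC midnight in a 100-instance deployment are the 1st, 25th, 49th, 73rd and 97th.

```python
def instance_index(instance_name: str) -> int:
    # Mirrors InstanceName.Substring(InstanceName.LastIndexOf("_") + 1)
    return int(instance_name[instance_name.rfind("_") + 1:])


def reboots_at(instance_name: str, utc_hour: int) -> bool:
    # Mirrors CheckTimeUtc.Hour == (index % 24)
    return utc_hour == instance_index(instance_name) % 24


def ordinals_rebooted_at(instance_names, utc_hour):
    """1-based positions of the instances that reboot at the given UTC hour."""
    return [instance_index(name) + 1
            for name in instance_names
            if reboots_at(name, utc_hour)]


# Hypothetical 100-instance deployment; the naming pattern and the
# zero-based suffix are assumptions made for this example.
names = [f"WebRole_IN_{n}" for n in range(100)]
midnight = ordinals_rebooted_at(names, 0)  # indices 0, 24, 48, 72, 96
one_am = ordinals_rebooted_at(names, 1)    # indices 1, 25, 49, 73, 97
```

Each instance matches exactly one hour of the day, so no instance is rebooted more than once per day by this expression.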

Pre-configured Metrics

Metric Name Metric Type Description
ApplicationEventLogs WindowsEventLogEntry Tracks entries from the Windows Event Log (Application source)
CpuTime WindowsPerformanceCounter Tracks overall CPU utilization on the monitored server
CpuTime30MinAverage AggregatedMetric Tracks 30-minute CPU utilization average across all instances within monitored Cloud Role
DiskFree WindowsPerformanceCounter Tracks amount of free space across all drives in megabytes
DiskIdleTime WindowsPerformanceCounter Tracks the percentage of time when the disk is idle. Sustained numbers below 20% indicate an over-saturated disk.
DiskReadSpeed WindowsPerformanceCounter Tracks average time, in seconds, it takes to read data from the disk
DiskWriteSpeed WindowsPerformanceCounter Tracks average time, in seconds, it takes to write data to the disk
InstanceList AzureCloudRoleInstanceDetails Tracks the detailed status of monitored cloud role instances
MemoryCommittedPct WindowsPerformanceCounter Tracks the amount of virtual memory in use. It is the ratio of Committed Bytes to the Commit Limit
MemoryFree WindowsPerformanceCounter Tracks free memory (in MBs) on the monitored server
Status ResourceStatus Tracks the overall readiness status of the monitored resource. Possible values are: Ready, Down, Stopped and Unknown
SystemEventLogs WindowsEventLogEntry Tracks entries from the Windows Event Log (System source)
SystemUptime WindowsPerformanceCounter

Pre-configured Alerts

Alert Name Expression Severity Description
High CPU CpuTime30MinAverage > 70 Warning Raises an alert when average CPU utilization for the last 30 minutes across all instances is over 70%
Instance Was Rebooted SystemUptime < 600 Warning
Low Memory MemoryFree < 100 Warning Raises an alert if the amount of available physical memory falls below 100 MB for a sustained period of time
Resource Outage Status == "Down" Error Raises an alert when monitored server is reported as Down by CloudMonix for sustained period of time
Role has NO Ready Instances Count(InstanceList, "State == \"ReadyRole\"") == 0 && Any(InstanceList, "State != \"ReadyRole\"") Error No 'ReadyRole' instances have been detected for a sustained period of time
Role has some Non-Ready Instances Any(InstanceList, "State != \"ReadyRole\"") && Count(InstanceList, "State == \"ReadyRole\"") > 0 Warning Some 'ReadyRole' and some non-'ReadyRole' instances have been detected for a sustained period of time
Slow Disk DiskReadSpeed > 0.025 || DiskWriteSpeed > 0.025 || DiskIdleTime < 20 Warning Disabled by default. Raises an alert if the average disk read or write time exceeds 25 milliseconds, or if the disk is idle less than 20% of the time, sustained for 5 minutes. For mission-critical servers, disk read and write times should not exceed 10 milliseconds.
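
Several alerts above fire only when their expression holds for a sustained period. How CloudMonix evaluates this internally is not described here; the sketch below is one plausible reading, in which every sample inside the trailing window must satisfy the expression. The function names, the one-sample-per-minute cadence, and the sample values are assumptions.

```python
def slow_disk(metrics):
    """Python rendering of the Slow Disk expression (times are in seconds)."""
    return (metrics["DiskReadSpeed"] > 0.025
            or metrics["DiskWriteSpeed"] > 0.025
            or metrics["DiskIdleTime"] < 20)


def is_sustained(samples, predicate, duration_minutes, now):
    """True if the window has samples and all of them satisfy the predicate."""
    window = [value for minute, value in samples
              if now - duration_minutes <= minute <= now]
    return bool(window) and all(predicate(value) for value in window)


# Hypothetical per-minute samples: healthy at minute 0, slow reads afterwards.
healthy = {"DiskReadSpeed": 0.010, "DiskWriteSpeed": 0.012, "DiskIdleTime": 85}
slow = {"DiskReadSpeed": 0.030, "DiskWriteSpeed": 0.012, "DiskIdleTime": 60}
samples = [(0, healthy)] + [(minute, slow) for minute in range(1, 7)]

alert_at_minute_5 = is_sustained(samples, slow_disk, duration_minutes=5, now=5)
alert_at_minute_6 = is_sustained(samples, slow_disk, duration_minutes=5, now=6)
```

In this reading the alert stays quiet at minute 5, because the healthy sample from minute 0 is still inside the window, and fires at minute 6 once every sample in the window is slow.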

Pre-configured Actions

Action Name Command Type Expression Severity Description
Daily reboot AzureCloudServiceInstanceReboot CheckTimeUtc.Hour == (int.Parse(InstanceName.Substring(InstanceName.LastIndexOf("_") + 1)) % 24) Information Disabled by default. Reboots cloud role instances once per day. The reboot happens when the instance's index, modulo 24, matches the current clock hour (in UTC). For example, the 1st instance is rebooted at UTC midnight, the 2nd instance at 1am UTC, etc. For deployments with 25+ instances, this action reboots every 24th instance in the same hour. For example, for a deployment with 100 instances, at UTC midnight the 1st, 25th, 49th, 73rd and 97th instances will be rebooted; at 1am UTC, the 2nd, 26th, 50th, 74th and 98th instances will be rebooted; etc. More information here: http://support.cloudmonix.com/support/solutions/articles/5000629071
Low Ram Reboot AzureCloudServiceInstanceReboot MemoryFree < 100 Warning Disabled by default. Reboots the Cloud Role instance if available memory drops below 100 MB, sustained for 5 minutes. This action will not be executed more than once per hour due to the Suspended period setting.
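
The Low Ram Reboot action is limited to one execution per hour by a Suspended period setting. The sketch below shows one way such a cooldown could behave. The ActionGate class and its method are hypothetical names, and the real setting may work differently (for example, it may measure the period from when the action completes rather than when it starts).

```python
from datetime import datetime, timedelta


class ActionGate:
    """Lets an action fire only if the suspended period has elapsed."""

    def __init__(self, suspended_period: timedelta):
        self.suspended_period = suspended_period
        self.last_fired = None

    def should_fire(self, condition_met: bool, now: datetime) -> bool:
        if not condition_met:
            return False
        if (self.last_fired is not None
                and now - self.last_fired < self.suspended_period):
            return False  # still inside the suspended period
        self.last_fired = now
        return True


# Hypothetical timeline: memory stays low for the whole period.
gate = ActionGate(suspended_period=timedelta(hours=1))
start = datetime(2024, 1, 1, 0, 0)
first = gate.should_fire(True, start)
during = gate.should_fire(True, start + timedelta(minutes=30))
after = gate.should_fire(True, start + timedelta(minutes=60))
```

Here the action fires at the start, is suppressed 30 minutes later even though memory is still low, and is allowed again once a full hour has passed.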