Possible metric types that can be tracked on a given resource.
Metric Type | Data Type | Description |
---|---|---|
AggregatedMetric | Object | Aggregates existing metrics over a period of time. Useful for running sums or averages, and for extracting a single data item from collection-based metrics. |
AzureCloudRoleInstanceDetails | | Tracks detailed information about Azure role instances as a list. |
AzureVirtualMachineState | String | Specifies the current status of a role instance as reported by Azure. Possible values for this metric are listed in this MSDN article in the RoleInstanceList section. |
DerivedMetric | Double | Derives new metrics from existing ones. Useful for combining existing metrics or for multiplying metrics by a factor. |
InternalUrlResponseCode | String | Tracks the HTTP result of a test request to the internal IP address. Possible values are HTTP status names: OK, Unauthorized, etc. |
InternalUrlResponseTime | Double | Tracks the response time of an HTTP request to the internal IP address. |
InternalUrlState | | Tracks the results of an HTTP request to the internal IP address. |
LinkedMetric | Object | Tracks metrics from other resources. Useful when metrics from different resources need to be evaluated side by side. |
PowershellArrayMetric | Object[] | |
ResourceInstanceCount | Int32 | Tracks the current number of compute instances. |
ResourceStatus | String | Tracks the overall status of the resource. This is an important metric because it drives Uptime reports. Possible values: Ready, Down, Unknown and, in some cases, Stopped. |
ScheduledTaskLastRunInMinutes | Double | Tracks the number of minutes since a particular Windows scheduled task last executed. |
ScheduledTaskLastStatus | Int32 | Tracks the last status of a particular Windows scheduled task. A status of 0 indicates a successful run. |
WindowsCustomEventLogEntry | | Tracks entries from the Windows Event Log. |
WindowsEventLogEntry | | Tracks entries from the Windows Event Log. |
WindowsPerformanceCounter | Double | Tracks performance counters defined as individual metrics. Any performance counter can be tracked. |
WindowsPerformanceCounterMultiInstance | | Tracks multi-instance performance counters. It returns an array of PerformanceCounterInstance objects for each counter instance. |
WindowsProcessList | | Tracks a list of currently running processes. |
WindowsServiceState | String | Tracks the last known status of a particular Windows service. |
WindowsUpdatesDrivers | | Tracks available Windows Driver Updates. Used for ensuring all important updates are installed regularly. |
WindowsUpdatesSoftware | | Tracks available Windows Updates. Used for ensuring all important updates are installed regularly. |
Possible commands that can be executed on a given resource. An Ultimate subscription is required.
Command Type | Description |
---|---|
AzureCloudServiceInstanceReboot | Reboots the specified Azure Cloud Service instance |
AzureCloudServiceInstanceReimage | Reimages the specified Azure Cloud Service instance |
CustomPowershellScript | Runs a custom PowerShell script on the specified resource |
PowershellRestartService | Restarts the specified Windows Service using a PowerShell script |
WebRequest | Runs a custom WebRequest to the specified URL |
Default monitoring templates provided by CloudMonix.
Metric Name | Metric Type | Description |
---|---|---|
ApplicationEventLogs | WindowsEventLogEntry | Tracks entries from the Windows Event Log (Application source) |
AspNetApplicationRestarts | WindowsPerformanceCounter | Tracks the number of times that an application has been restarted during the Web server's lifetime. The counter is incremented each time an Application_OnEnd event is raised. An application restart can occur because of changes to the Web.config file, changes to assemblies stored in the application's Bin directory, or when an application must be recompiled due to numerous changes in ASP.NET Web pages. Unexpected increases in this counter can mean that problems are causing the Web application to recycle. |
AspNetBytesOut | WindowsPerformanceCounter | Tracks total size in bytes of responses sent to a client. Does not include HTTP response headers. |
AspNetErrors | WindowsPerformanceCounter | Tracks the average number of errors that occurred per second during the execution of HTTP requests. Includes any parser, compilation, or run-time errors. |
AspNetRequests | WindowsPerformanceCounter | Tracks the number of requests executed per second. This represents the current throughput of the application. Under constant load, this number should remain within a certain range, barring other server work (such as garbage collection, cache cleanup thread, external server tools, and so on). |
AspNetRequestsQueued | WindowsPerformanceCounter | Tracks the number of requests waiting for service from the queue. When this number starts to increment linearly with increased client load, the Web server computer has reached the limit of concurrent requests that it can process. |
AspNetRequestsRejected | WindowsPerformanceCounter | Tracks the total number of requests not executed because of insufficient server resources to process them. This counter represents the number of requests that return a 503 HTTP status code, indicating that the server is too busy. |
AspNetRequestWaitTime | WindowsPerformanceCounter | Tracks the number of milliseconds that the most recent request waited in the queue for processing |
CpuTime | WindowsPerformanceCounter | Tracks overall CPU utilization on the monitored server |
CpuTime30MinAverage | AggregatedMetric | Tracks 30-minute CPU utilization average across all instances within monitored Cloud Role |
DiskFree | WindowsPerformanceCounter | Tracks amount of free space across all drives in megabytes |
DiskIdleTime | WindowsPerformanceCounter | Tracks the percentage of time when the disk is idle. Sustained numbers below 20% indicate an over-saturated disk. |
DiskReadSpeed | WindowsPerformanceCounter | Tracks average time, in seconds, it takes to read data from the disk |
DiskWriteSpeed | WindowsPerformanceCounter | Tracks average time, in seconds, it takes to write data to the disk |
InstanceList | AzureCloudRoleInstanceDetails | Tracks the detailed status of monitored cloud role instances |
MemoryCommittedPct | WindowsPerformanceCounter | Tracks the amount of virtual memory in use. It is the ratio of Committed Bytes to the Commit Limit |
MemoryFree | WindowsPerformanceCounter | Tracks free memory (in MBs) on the monitored server |
Status | ResourceStatus | Tracks the overall readiness status of the monitored resource. Possible values are: Ready, Down, Stopped and Unknown |
SystemEventLogs | WindowsEventLogEntry | Tracks entries from the Windows Event Log (System source) |
SystemUptime | WindowsPerformanceCounter |
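
The CpuTime30MinAverage metric above is an AggregatedMetric, that is, an existing metric aggregated over a period of time. As an illustration only, the following minimal Python sketch shows one way a trailing 30-minute average could be computed for a single series of samples. The `RollingAverage` class, the sample timestamps, and the readings are all hypothetical; the sketch does not describe how CloudMonix computes the metric internally, and it ignores averaging across multiple instances.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingAverage:
    """Average of the samples that fall inside a trailing time window."""

    def __init__(self, window: timedelta):
        self.window = window
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp: datetime, value: float) -> float:
        """Record a sample and return the average over the current window."""
        self.samples.append((timestamp, value))
        cutoff = timestamp - self.window
        # Drop samples that have aged out of the window.
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()
        return sum(v for _, v in self.samples) / len(self.samples)


# Hypothetical CPU readings: (minutes after start, CPU utilization in percent).
start = datetime(2024, 1, 1, 12, 0)
cpu_30_min = RollingAverage(timedelta(minutes=30))
readings = [(0, 60.0), (10, 80.0), (20, 100.0), (40, 40.0)]
averages = [cpu_30_min.add(start + timedelta(minutes=m), v) for m, v in readings]

# A threshold check in the spirit of the "High CPU" alert expression.
high_cpu = averages[-1] > 70
```

With these readings, the last sample pushes the two oldest readings out of the window, so a short spike stops influencing the average once it is more than 30 minutes old.
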
Alert Name | Expression | Severity | Description |
---|---|---|---|
High CPU | `CpuTime30MinAverage > 70` | Warning | Raises an alert when average CPU utilization for the last 30 minutes across all instances is over 70% |
Instance Was Rebooted | `SystemUptime < 600` | Warning | |
Low Memory | `MemoryFree < 100` | Warning | Raises an alert if the amount of available physical memory falls below 100 MB for a sustained amount of time |
Requests are Queueing Up | `AspNetRequestsQueued > 10` | Warning | Raises an alert when the number of queued requests exceeds 10, sustained for 5 minutes. Queued requests indicate that IIS or backend processes are not able to process the requests quickly enough |
Resource Outage | `Status == "Down"` | Error | Raises an alert when the monitored server is reported as Down by Azure and no metrics come through from diagnostic agents for a sustained period of time |
Role has NO Ready Instances | `Count(InstanceList, "State == \"ReadyRole\"") == 0 && Any(InstanceList, "State != \"ReadyRole\"")` | Error | No 'ReadyRole' instances have been detected for a sustained period of time |
Role has some Non-Ready Instances | `Any(InstanceList, "State != \"ReadyRole\"") && Count(InstanceList, "State == \"ReadyRole\"") > 0` | Warning | Some 'ReadyRole' and some non-'ReadyRole' instances have been detected for a sustained period of time |
Slow Disk | `DiskReadSpeed > 0.025 \|\| DiskWriteSpeed > 0.025 \|\| DiskIdleTime < 20` | Warning | Disabled by default. Raises an alert if the average disk read or write time exceeds 25 milliseconds or if the disk is idle for less than 20% of the time, sustained for 5 minutes. For mission-critical servers, disk read and write times should not exceed 10 milliseconds. |
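
The two instance-readiness alerts above combine `Count` and `Any` over InstanceList. The Python sketch below mirrors that logic to show which alert would match for a given list. It assumes that `Count` returns the number of items matching the condition and that `Any` is true when at least one item matches; the helpers `count`, `any_match`, and `classify_role`, and the sample instance records, are hypothetical and are not part of CloudMonix.

```python
def count(items, predicate):
    """Number of items for which the predicate holds (assumed Count semantics)."""
    return sum(1 for item in items if predicate(item))


def any_match(items, predicate):
    """True if at least one item satisfies the predicate (assumed Any semantics)."""
    return any(predicate(item) for item in items)


def classify_role(instance_list):
    """Return the name of the alert whose expression matches, or None."""
    def is_ready(instance):
        return instance["State"] == "ReadyRole"

    ready = count(instance_list, is_ready)
    has_non_ready = any_match(instance_list, lambda i: not is_ready(i))

    if ready == 0 and has_non_ready:
        return "Role has NO Ready Instances"
    if has_non_ready and ready > 0:
        return "Role has some Non-Ready Instances"
    return None


# Hypothetical instance records; only the State field is used here.
all_down = [{"State": "StoppedVM"}, {"State": "BusyRole"}]
mixed = [{"State": "ReadyRole"}, {"State": "BusyRole"}]
healthy = [{"State": "ReadyRole"}, {"State": "ReadyRole"}]

results = [classify_role(x) for x in (all_down, mixed, healthy, [])]
```

Under these assumptions, the `Any` clause in the first expression keeps the alert from matching when the instance list is empty, because a count of zero alone would otherwise satisfy it.
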
Action Name | Command Type | Expression | Severity | Description |
---|---|---|---|---|
Daily reboot | AzureCloudServiceInstanceReboot | `CheckTimeUtc.Hour == (int.Parse(InstanceName.Substring(InstanceName.LastIndexOf("_") + 1)) % 24)` | Information | Disabled by default. Reboots cloud role instances once per day. The reboot happens when the instance's index matches the current clock hour (in UTC). For example, the 1st instance is rebooted at UTC midnight, the 2nd instance is rebooted at 1am UTC, etc. For deployments with 25+ instances, this action reboots every 24th instance. For example, for a deployment with 100 instances, the 1st, 25th, 49th, 73rd and 97th instances are rebooted at UTC midnight; the 2nd, 26th, 50th, 74th and 98th instances are rebooted at 1am UTC; etc. More information here: http://support.cloudmonix.com/support/solutions/articles/5000629071 |
Low Ram Reboot | AzureCloudServiceInstanceReboot | `MemoryFree < 100` | Warning | Disabled by default. Reboots the Cloud Role instance if available memory drops below 100 MB, sustained for 5 minutes. This action is not executed more than once per hour because of the Suspended period setting. |
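
The Daily reboot expression above takes the number after the last underscore in the instance name and reduces it modulo 24 to get the reboot hour. The following Python sketch reproduces that arithmetic. The instance names of the form `WebRole_IN_0` are an assumption made for illustration, and `reboot_hour` is a hypothetical helper, not a CloudMonix function.

```python
def reboot_hour(instance_name: str) -> int:
    """UTC hour at which the instance matches the Daily reboot expression."""
    # Same steps as the expression: text after the last "_", parsed as an
    # integer, then taken modulo 24.
    index = int(instance_name[instance_name.rfind("_") + 1:])
    return index % 24


# Hypothetical deployment with 100 instances, indexed from 0.
names = [f"WebRole_IN_{i}" for i in range(100)]

midnight = [n for n in names if reboot_hour(n) == 0]
one_am = [n for n in names if reboot_hour(n) == 1]
```

Because the index in these names is zero-based, the instance with index 0 is the 1st instance and the instance with index 96 is the 97th, so the ordinal position of each rebooted instance is its index plus one.
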
Metric Name | Metric Type | Description |
---|---|---|
ApplicationEventLogs | WindowsEventLogEntry | Tracks entries from the Windows Event Log (Application source) |
CpuTime | WindowsPerformanceCounter | Tracks overall CPU utilization on the monitored server |
CpuTime30MinAverage | AggregatedMetric | Tracks 30-minute CPU utilization average across all instances within monitored Cloud Role |
DiskFree | WindowsPerformanceCounter | Tracks amount of free space across all drives in megabytes |
DiskIdleTime | WindowsPerformanceCounter | Tracks the percentage of time when the disk is idle. Sustained numbers below 20% indicate an over-saturated disk. |
DiskReadSpeed | WindowsPerformanceCounter | Tracks average time, in seconds, it takes to read data from the disk |
DiskWriteSpeed | WindowsPerformanceCounter | Tracks average time, in seconds, it takes to write data to the disk |
InstanceList | AzureCloudRoleInstanceDetails | Tracks the detailed status of monitored cloud role instances |
MemoryCommittedPct | WindowsPerformanceCounter | Tracks the amount of virtual memory in use. It is the ratio of Committed Bytes to the Commit Limit |
MemoryFree | WindowsPerformanceCounter | Tracks free memory (in MBs) on the monitored server |
Status | ResourceStatus | Tracks the overall readiness status of the monitored resource. Possible values are: Ready, Down, Stopped and Unknown |
SystemEventLogs | WindowsEventLogEntry | Tracks entries from the Windows Event Log (System source) |
SystemUptime | WindowsPerformanceCounter |
Alert Name | Expression | Severity | Description |
---|---|---|---|
High CPU | `CpuTime30MinAverage > 70` | Warning | Raises an alert when average CPU utilization for the last 30 minutes across all instances is over 70% |
Instance Was Rebooted | `SystemUptime < 600` | Warning | |
Low Memory | `MemoryFree < 100` | Warning | Raises an alert if the amount of available physical memory falls below 100 MB for a sustained amount of time |
Resource Outage | `Status == "Down"` | Error | Raises an alert when the monitored server is reported as Down by CloudMonix for a sustained period of time |
Role has NO Ready Instances | `Count(InstanceList, "State == \"ReadyRole\"") == 0 && Any(InstanceList, "State != \"ReadyRole\"")` | Error | No 'ReadyRole' instances have been detected for a sustained period of time |
Role has some Non-Ready Instances | `Any(InstanceList, "State != \"ReadyRole\"") && Count(InstanceList, "State == \"ReadyRole\"") > 0` | Warning | Some 'ReadyRole' and some non-'ReadyRole' instances have been detected for a sustained period of time |
Slow Disk | `DiskReadSpeed > 0.025 \|\| DiskWriteSpeed > 0.025 \|\| DiskIdleTime < 20` | Warning | Disabled by default. Raises an alert if the average disk read or write time exceeds 25 milliseconds or if the disk is idle for less than 20% of the time, sustained for 5 minutes. For mission-critical servers, disk read and write times should not exceed 10 milliseconds. |
Action Name | Command Type | Expression | Severity | Description |
---|---|---|---|---|
Daily reboot | AzureCloudServiceInstanceReboot | `CheckTimeUtc.Hour == (int.Parse(InstanceName.Substring(InstanceName.LastIndexOf("_") + 1)) % 24)` | Information | Disabled by default. Reboots cloud role instances once per day. The reboot happens when the instance's index matches the current clock hour (in UTC). For example, the 1st instance is rebooted at UTC midnight, the 2nd instance is rebooted at 1am UTC, etc. For deployments with 25+ instances, this action reboots every 24th instance. For example, for a deployment with 100 instances, the 1st, 25th, 49th, 73rd and 97th instances are rebooted at UTC midnight; the 2nd, 26th, 50th, 74th and 98th instances are rebooted at 1am UTC; etc. More information here: http://support.cloudmonix.com/support/solutions/articles/5000629071 |
Low Ram Reboot | AzureCloudServiceInstanceReboot | `MemoryFree < 100` | Warning | Disabled by default. Reboots the Cloud Role instance if available memory drops below 100 MB, sustained for 5 minutes. This action is not executed more than once per hour because of the Suspended period setting. |