Possible metric types that can be tracked on a given resource.
Metric Type | Data Type | Description |
---|---|---|
AggregatedMetric | Object | Allows for aggregating existing metrics over a period of time. Useful for running sums or averages, and for extracting a single data item from collection-based metrics |
AzureAdvisorHealthMetric | | Tracks Azure health using the latest Azure Health API. More info: https://docs.microsoft.com/en-us/rest/api/resourcehealth/availabilitystatuses/getbyresource/ |
AzureAdvisorRecommendationMetric | | Tracks Azure recommendations using the latest Azure Advisor API. More info: https://docs.microsoft.com/en-us/rest/api/advisor/ |
AzureMonitorMetric | Double | Tracks Azure metrics using the latest Azure Monitor API. More info: https://docs.microsoft.com/en-us/rest/api/monitor/ |
AzureServiceFabricStatus | String | Indicates the state of the Azure Service Fabric |
AzureVirtualMachineOperations | | |
AzureVmssInstanceDetails | | Tracks detailed information about Azure VM instances as a list. |
DerivedMetric | Double | Allows for deriving new metrics from existing ones. Useful for combining existing metrics or for multiplying metrics by a factor |
InternalUrlResponseCode | String | Tracks the HTTP result of testing an internal IP address. Possible values are HTTP status names: OK, Unauthorized, etc. |
InternalUrlResponseTime | Double | Tracks the response time of an HTTP request to an internal IP address |
InternalUrlState | | Tracks the results of an HTTP request to an internal IP address |
LinkedMetric | Object | Allows for tracking of metrics from other resources. Useful when it is important to evaluate metrics from different resources side-by-side |
ResourceInstanceCount | Int32 | Tracks the current number of compute instances |
ResourceStatus | String | Tracks overall status of the resource. This is an important metric as it is used to drive Uptime reports. Possible values: Ready, Down, Unknown and in some cases Stopped |
WindowsCustomEventLogEntry | | Tracks entries from the Windows Event Log. |
WindowsEventLogEntry | | Tracks entries from the Windows Event Log. |
WindowsPerformanceCounter | Double | Tracks performance counters defined as individual metrics. Any performance counter can be tracked. |
WindowsPerformanceCounterMultiInstance | | Tracks multi-instance performance counters. Returns an array of PerformanceCounterInstance objects, one for each counter instance. |
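To make the idea behind an AggregatedMetric more concrete, the following Python sketch averages the samples of an existing metric over a trailing time window. It is only an illustration: the `Sample` type, the `window_average` function and the sample values are hypothetical, and nothing here describes how CloudMonix computes aggregates internally.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Sample:
    """One data point of an existing metric (hypothetical shape)."""
    taken_at: datetime
    value: float


def window_average(samples: Sequence[Sample], now: datetime,
                   window: timedelta) -> Optional[float]:
    """Average the samples taken within `window` before `now`.

    Returns None when the window holds no samples.
    """
    cutoff = now - window
    recent = [s.value for s in samples if cutoff < s.taken_at <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


now = datetime(2024, 1, 1, 12, 0)
cpu_samples = [
    Sample(now - timedelta(minutes=40), 90.0),  # older than the window, ignored
    Sample(now - timedelta(minutes=20), 60.0),
    Sample(now - timedelta(minutes=10), 80.0),
    Sample(now, 70.0),
]
# Average of the three samples that fall inside the last 30 minutes.
cpu_30_min_average = window_average(cpu_samples, now, timedelta(minutes=30))
```

A running sum would follow the same pattern with `sum(recent)` in place of the average.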
Possible commands that can be executed on a given resource. Ultimate subscription is required.
Command Type | Description |
---|---|
AzureVmScaleSetInstanceReboot | Reboots specified VM ScaleSet instance |
AzureVmScaleSetInstanceReimage | Reimages specified VM ScaleSet instance |
AzureVmScaleSetStart | Starts specified VM ScaleSet instance |
AzureVmScaleSetStopDeallocate | Deallocates specified VM ScaleSet instance |
WebRequest | Runs custom WebRequest to specified URL |
Default monitoring templates provided by CloudMonix.
Metric Name | Metric Type | Description |
---|---|---|
ApplicationsEventLogs | WindowsEventLogEntry | Tracks entries from the Windows Event Log (Application source) |
AvgDiskReadPerSec | WindowsPerformanceCounter | Tracks the average time it takes to make a read across all drives |
AvgDiskReadQueueLength | WindowsPerformanceCounter | Tracks the average Read queue length across all drives |
AvgDiskWritePerSec | WindowsPerformanceCounter | Tracks the average time it takes to make a write across all drives |
AvgDiskWriteQueueLength | WindowsPerformanceCounter | Tracks the average Write queue length across all drives |
CpuTime | WindowsPerformanceCounter | Tracks overall CPU utilization on the monitored server |
CpuTime5MinAverage | AggregatedMetric | Tracks 5-minute CPU utilization average across all instances within monitored VMSS |
DiskFreeSpace | WindowsPerformanceCounterMultiInstance | Tracks free space in Megabytes on each disk |
DiskFreeSpacePct | WindowsPerformanceCounterMultiInstance | Tracks amount of free space in % on each disk |
DiskFreeSpaceTotal | WindowsPerformanceCounter | Tracks total amount of free space across all drives |
DiskIdleTime | WindowsPerformanceCounter | Tracks the percentage of time when the disk is idle. Sustained numbers below 20% indicate an over-saturated disk. |
DiskReadsBytesPerSec | WindowsPerformanceCounter | Tracks the Read throughput across all drives |
DiskReadSpeed | WindowsPerformanceCounter | Tracks average time, in seconds, it takes to read data from the disk |
DiskWritesBytesPerSec | WindowsPerformanceCounter | Tracks the Write throughput across all drives |
DiskWriteSpeed | WindowsPerformanceCounter | Tracks average time, in seconds, it takes to write data to the disk |
FabricProcessDetected | AggregatedMetric | |
FabricProcessMemory | WindowsPerformanceCounter | |
InboundNetworkErrors | WindowsPerformanceCounterMultiInstance | Tracks the number of inbound packets that could not be transmitted because of errors |
Instances | AzureVmssInstanceDetails | |
MemoryCommittedPct | WindowsPerformanceCounter | Tracks the amount of virtual memory in use. It is the ratio of Committed Bytes to the Commit Limit |
MemoryFree | WindowsPerformanceCounter | Tracks free memory (in MB) on the monitored instance |
OutboundNetworkErrors | WindowsPerformanceCounterMultiInstance | Tracks the number of outbound packets that could not be transmitted because of errors |
PagingFileUsage | WindowsPerformanceCounter | Overall usage of all paging files in % |
ServiceFabricAvgInvocationTime | WindowsPerformanceCounterMultiInstance | Tracks the time taken to execute service methods in milliseconds |
ServiceFabricAvgRequestTime | WindowsPerformanceCounterMultiInstance | Tracks time taken (in milliseconds) by the service to process requests |
ServiceFabricExceptions | WindowsPerformanceCounterMultiInstance | Tracks the number of times that service methods throw an exception per second |
ServiceFabricInvocations | WindowsPerformanceCounterMultiInstance | Tracks the number of times that service methods are invoked per second |
ServiceFabricOutstandingRequests | WindowsPerformanceCounterMultiInstance | Tracks the number of requests being processed by services |
ServiceFabricStatus | AzureServiceFabricStatus | Tracks the Service Fabric status of the monitored Scale Set |
Status | ResourceStatus | Tracks the overall running status of the monitored instances within Scale Set |
SystemEventLogs | WindowsEventLogEntry | Tracks entries from the Windows Event Log (System source) |
SystemUptime | WindowsPerformanceCounter |
Alert Name | Expression | Severity | Description |
---|---|---|---|
Fabric Process Not Running | `FabricProcessDetected == 0` | Warning | |
High CPU | `CpuTime > 70` | Warning | Raises an alert when CPU utilization for a specific instance is sustained above 70% for the last 5 minutes |
Instance Was Rebooted | `SystemUptime < 600` | Warning | |
Low Disk Space | `(Any(DiskFreeSpace, "Value < 1024 && Contains(Instance,\":\")") \|\| Any(DiskFreeSpacePct, "Value < 20 && Contains(Instance,\":\")")) && DiskFreeSpaceTotal > 0` | Warning | Raises an alert when any of the disks has less than 1 GB or 20% of free space left |
Low Memory | `MemoryFree < 100` | Warning | Raises an alert if the amount of available physical memory on a specific instance falls below 100 MB, sustained for the last 2 monitoring cycles |
Resource Outage | `Status == "Down"` | Error | Raises an alert when the monitored server is reported as not-Ready by Azure or if no metrics come through from diagnostic agents for a sustained period of time |
Service Fabric Is Not Ready | `ServiceFabricStatus != "Ready"` | Warning | Raises an alert when the monitored Service Fabric resource is reported as not-Ready by Azure for a sustained period of time |
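The Low Disk Space expression is the most involved one in this table, so here is a rough Python rendering of its logic. This is a sketch under assumptions, not CloudMonix code: the `PerformanceCounterInstance` shape, `any_match` and `low_disk_space` are hypothetical names, and reading `Contains(Instance, ":")` as "only consider drive-letter instances such as `C:`, not an aggregate such as `_Total`" is an interpretation of the expression rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PerformanceCounterInstance:
    """One entry of a multi-instance counter (assumed shape)."""
    instance: str   # e.g. "C:" or "_Total"
    value: float


def any_match(items: Iterable[PerformanceCounterInstance],
              predicate: Callable[[PerformanceCounterInstance], bool]) -> bool:
    """Python stand-in for Any(collection, condition) in the expression."""
    return any(predicate(item) for item in items)


def low_disk_space(free_mb: Iterable[PerformanceCounterInstance],
                   free_pct: Iterable[PerformanceCounterInstance],
                   free_total_mb: float) -> bool:
    """Mirror of the Low Disk Space alert expression."""
    def is_drive(counter: PerformanceCounterInstance) -> bool:
        return ":" in counter.instance

    low_mb = any_match(free_mb, lambda c: c.value < 1024 and is_drive(c))
    low_pct = any_match(free_pct, lambda c: c.value < 20 and is_drive(c))
    # The total check suppresses the alert when no disk data was collected.
    return (low_mb or low_pct) and free_total_mb > 0


free_mb = [PerformanceCounterInstance("C:", 50000.0),
           PerformanceCounterInstance("D:", 800.0)]
free_pct = [PerformanceCounterInstance("C:", 60.0),
            PerformanceCounterInstance("D:", 40.0)]
alert = low_disk_space(free_mb, free_pct, free_total_mb=50800.0)
```

In this example drive `D:` has less than 1024 MB free, so `alert` is true even though both drives are above the 20% threshold.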
Action Name | Command Type | Expression | Severity | Description |
---|---|---|---|---|
Daily reboot | AzureVmScaleSetInstanceReboot | `CheckTimeUtc.Hour == (InstanceIndex % 24)` | Information | Disabled by default. Reboots VMSS instances once per day. The reboot happens when the instance's index matches the current clock hour (in UTC). For example, the 1st instance is rebooted at UTC midnight, the 2nd instance at 1am UTC, etc. For deployments with 25+ instances, this action reboots every 24th instance. For example, for a deployment with 100 instances, at UTC midnight the 1st, 25th, 49th, 73rd and 97th instances will be rebooted; at 1am UTC, the 2nd, 26th, 50th, 74th and 98th instances will be rebooted; etc. More information here: http://support.cloudmonix.com/support/solutions/articles/5000629071 |
Low Ram Reboot | AzureVmScaleSetInstanceReboot | `MemoryFree < 100` | Warning | Disabled by default. Reboots the VMSS instance if available memory stays below 100 MB for 5 minutes sustained. This action will not be executed more than once per hour due to the Suspended period setting. |
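The Daily reboot schedule follows directly from the modulo in its expression. The Python sketch below reproduces that arithmetic. It assumes that `InstanceIndex` is zero-based, so that the 1st instance has index 0 and is rebooted at UTC midnight as the description states; `reboot_due` and `instances_rebooted_at` are hypothetical helper names.

```python
from typing import List


def reboot_due(instance_index: int, utc_hour: int) -> bool:
    """Mirror of CheckTimeUtc.Hour == (InstanceIndex % 24)."""
    return utc_hour == instance_index % 24


def instances_rebooted_at(utc_hour: int, instance_count: int) -> List[int]:
    """1-based positions of the instances rebooted at the given UTC hour.

    Assumes InstanceIndex starts at 0 for the 1st instance.
    """
    return [index + 1
            for index in range(instance_count)
            if reboot_due(index, utc_hour)]


midnight = instances_rebooted_at(0, 100)  # [1, 25, 49, 73, 97]
one_am = instances_rebooted_at(1, 100)    # [2, 26, 50, 74, 98]
```

Every instance matches exactly one hour of the day, so each one is rebooted once per day regardless of the deployment size.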
Metric Name | Metric Type | Description |
---|---|---|
ApplicationsEventLogs | WindowsEventLogEntry | Tracks entries from the Windows Event Log (Application source) |
CpuTime | WindowsPerformanceCounter | Tracks overall CPU utilization on the monitored server |
CpuTime30MinAverage | AggregatedMetric | Tracks 30-minute CPU utilization average across all instances within monitored VMSS |
DiskFreeSpace | WindowsPerformanceCounterMultiInstance | Tracks free space in Megabytes on each disk |
DiskFreeSpaceTotal | WindowsPerformanceCounter | Tracks total amount of free space across all drives |
DiskIdleTime | WindowsPerformanceCounter | Tracks the percentage of time when the disk is idle. Sustained numbers below 20% indicate an over-saturated disk. |
DiskReadSpeed | WindowsPerformanceCounter | Tracks average time, in seconds, it takes to read data from the disk |
DiskWriteSpeed | WindowsPerformanceCounter | Tracks average time, in seconds, it takes to write data to the disk |
Instances | AzureVmssInstanceDetails | |
MemoryCommittedPct | WindowsPerformanceCounter | Tracks the amount of virtual memory in use. It is the ratio of Committed Bytes to the Commit Limit |
MemoryFree | WindowsPerformanceCounter | Tracks free memory (in MB) on the monitored instance |
RecommendedActions | AzureAdvisorRecommendationMetric | Tracks recommended actions for specified resource. |
Status | ResourceStatus | Tracks the overall running status of the monitored instances within Scale Set |
SystemEventLogs | WindowsEventLogEntry | Tracks entries from the Windows Event Log (System source) |
Alert Name | Expression | Severity | Description |
---|---|---|---|
High CPU | `CpuTime > 70` | Warning | Raises an alert when CPU utilization for a specific instance is sustained above 70% for the last 5 minutes |
Low Disk Space | `Any(DiskFreeSpace, "Value < 1024 && Contains(Instance,\":\")") && DiskFreeSpaceTotal > 0` | Warning | Raises an alert when any of the disks has less than 1 GB of free space left |
Low Memory | `MemoryFree < 100` | Warning | Raises an alert if the amount of available physical memory on a specific instance falls below 100 MB, sustained for the last 2 monitoring cycles |
Resource Outage | `Status == "Down"` | Error | Raises an alert when the monitored server is reported as not-Ready by Azure or if no metrics come through from diagnostic agents for a sustained period of time |
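Several alert descriptions require the condition to be sustained, for example over the last 2 monitoring cycles. One plausible reading is that the alert fires only when every one of the most recent checks satisfied the expression. The Python sketch below illustrates that reading; the `sustained` function and the sample values are hypothetical, and the actual evaluation logic in CloudMonix may differ.

```python
from typing import Callable, Sequence


def sustained(values: Sequence[float],
              condition: Callable[[float], bool],
              cycles: int) -> bool:
    """True when `condition` held for each of the last `cycles` samples."""
    if cycles <= 0 or len(values) < cycles:
        return False  # not enough history to call the condition sustained
    return all(condition(value) for value in values[-cycles:])


def memory_is_low(free_mb: float) -> bool:
    """Mirror of the Low Memory expression: MemoryFree < 100."""
    return free_mb < 100


# Free memory stayed under 100 MB for the last two cycles.
low_memory_alert = sustained([350.0, 120.0, 95.0, 80.0], memory_is_low, cycles=2)

# A single dip that recovers on the next cycle is not sustained.
brief_dip = sustained([350.0, 95.0, 120.0], memory_is_low, cycles=2)
```

Here `low_memory_alert` is true and `brief_dip` is false, which is the behavior that keeps short spikes from raising alerts.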
Action Name | Command Type | Expression | Severity | Description |
---|---|---|---|---|
Daily reboot | AzureVmScaleSetInstanceReboot | `CheckTimeUtc.Hour == (InstanceIndex % 24)` | Information | Disabled by default. Reboots VMSS instances once per day. The reboot happens when the instance's index matches the current clock hour (in UTC). For example, the 1st instance is rebooted at UTC midnight, the 2nd instance at 1am UTC, etc. For deployments with 25+ instances, this action reboots every 24th instance. For example, for a deployment with 100 instances, at UTC midnight the 1st, 25th, 49th, 73rd and 97th instances will be rebooted; at 1am UTC, the 2nd, 26th, 50th, 74th and 98th instances will be rebooted; etc. More information here: http://support.cloudmonix.com/support/solutions/articles/5000629071 |
Low Ram Reboot | AzureVmScaleSetInstanceReboot | `MemoryFree < 100` | Warning | Disabled by default. Reboots the VMSS instance if available memory stays below 100 MB for 5 minutes sustained. This action will not be executed more than once per hour due to the Suspended period setting. |
Metric Name | Metric Type | Description |
---|---|---|
ApplicationsEventLogs | WindowsEventLogEntry | Tracks entries from the Windows Event Log (Application source) |
AspNetApplicationRestarts | WindowsPerformanceCounter | Tracks the number of times that an application has been restarted during the Web server's lifetime. Application restarts are incremented each time an Application_OnEnd event is raised. An application restart can occur because of changes to the Web.config file, changes to assemblies stored in the application's Bin directory, or when an application must be recompiled due to numerous changes in ASP.NET Web pages. Unexpected increases in this counter can mean that problems are causing the Web application to recycle. |
AspNetBytesOut | WindowsPerformanceCounter | Tracks total size in bytes of responses sent to a client. Does not include HTTP response headers. |
AspNetErrors | WindowsPerformanceCounter | Tracks the average number of errors that occurred per second during the execution of HTTP requests. Includes any parser, compilation, or run-time errors. |
AspNetRequests | WindowsPerformanceCounter | Tracks the number of requests executed per second. This represents the current throughput of the application. Under constant load, this number should remain within a certain range, barring other server work (such as garbage collection, cache cleanup thread, external server tools, and so on). |
AspNetRequestsQueued | WindowsPerformanceCounter | Tracks the number of requests waiting for service from the queue. When this number starts to increment linearly with increased client load, the Web server computer has reached the limit of concurrent requests that it can process. |
AspNetRequestsRejected | WindowsPerformanceCounter | Tracks the total number of requests not executed because of insufficient server resources to process them. This counter represents the number of requests that return a 503 HTTP status code, indicating that the server is too busy |
AspNetRequestsWaitTime | WindowsPerformanceCounter | Tracks the number of milliseconds that the most recent request waited in the queue for processing |
CpuTime | WindowsPerformanceCounter | Tracks overall CPU utilization on the monitored server |
CpuTime30MinAverage | AggregatedMetric | Tracks 30-minute CPU utilization average across all instances within monitored VMSS |
DiskFreeSpace | WindowsPerformanceCounterMultiInstance | Tracks free space in Megabytes on each disk |
DiskFreeSpaceTotal | WindowsPerformanceCounter | Tracks total amount of free space across all drives |
DiskIdleTime | WindowsPerformanceCounter | Tracks the percentage of time when the disk is idle. Sustained numbers below 20% indicate an over-saturated disk. |
DiskReadSpeed | WindowsPerformanceCounter | Tracks average time, in seconds, it takes to read data from the disk |
DiskWriteSpeed | WindowsPerformanceCounter | Tracks average time, in seconds, it takes to write data to the disk |
Instances | AzureVmssInstanceDetails | |
MemoryCommittedPct | WindowsPerformanceCounter | Tracks the amount of virtual memory in use. It is the ratio of Committed Bytes to the Commit Limit |
MemoryFree | WindowsPerformanceCounter | Tracks free memory (in MB) on the monitored instance |
RecommendedActions | AzureAdvisorRecommendationMetric | Tracks recommended actions for specified resource. |
Status | ResourceStatus | Tracks the overall running status of the monitored instances within Scale Set |
SystemEventLogs | WindowsEventLogEntry | Tracks entries from the Windows Event Log (System source) |
Alert Name | Expression | Severity | Description |
---|---|---|---|
High CPU | `CpuTime > 70` | Warning | Raises an alert when CPU utilization for a specific instance is sustained above 70% for the last 5 minutes |
Low Disk Space | `Any(DiskFreeSpace, "Value < 1024 && Contains(Instance,\":\")") && DiskFreeSpaceTotal > 0` | Warning | Raises an alert when any of the disks has less than 1 GB of free space left |
Low Memory | `MemoryFree < 100` | Warning | Raises an alert if the amount of available physical memory on a specific instance falls below 100 MB, sustained for the last 2 monitoring cycles |
Requests are Queueing Up | `AspNetRequestsQueued > 10` | Warning | Raises an alert when the number of queued requests exceeds 10 for 5 minutes sustained. Queued requests indicate that IIS or backend processes are not able to process the requests quickly enough |
Resource Outage | `Status == "Down"` | Error | Raises an alert when the monitored server is reported as not-Ready by Azure or if no metrics come through from diagnostic agents for a sustained period of time |
Action Name | Command Type | Expression | Severity | Description |
---|---|---|---|---|
Daily reboot | AzureVmScaleSetInstanceReboot | `CheckTimeUtc.Hour == (InstanceIndex % 24)` | Information | Disabled by default. Reboots VMSS instances once per day. The reboot happens when the instance's index matches the current clock hour (in UTC). For example, the 1st instance is rebooted at UTC midnight, the 2nd instance at 1am UTC, etc. For deployments with 25+ instances, this action reboots every 24th instance. For example, for a deployment with 100 instances, at UTC midnight the 1st, 25th, 49th, 73rd and 97th instances will be rebooted; at 1am UTC, the 2nd, 26th, 50th, 74th and 98th instances will be rebooted; etc. More information here: http://support.cloudmonix.com/support/solutions/articles/5000629071 |
Low Ram Reboot | AzureVmScaleSetInstanceReboot | `MemoryFree < 100` | Warning | Disabled by default. Reboots the VMSS instance if available memory stays below 100 MB for 5 minutes sustained. This action will not be executed more than once per hour due to the Suspended period setting. |
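The Suspended period mentioned for Low Ram Reboot acts as a cooldown: once the action has run, it is held back until the period has elapsed, even if the condition is still met. The Python sketch below shows one way such a cooldown can work. `SuspendableAction` and `try_run` are hypothetical names used for illustration only, and the one-hour period is taken from the description above.

```python
from datetime import datetime, timedelta
from typing import Optional


class SuspendableAction:
    """An action that runs at most once per suspended period (illustrative)."""

    def __init__(self, suspended_period: timedelta) -> None:
        self.suspended_period = suspended_period
        self.last_run: Optional[datetime] = None

    def try_run(self, condition_met: bool, now: datetime) -> bool:
        """Return True when the action should execute at `now`."""
        if not condition_met:
            return False
        still_suspended = (
            self.last_run is not None
            and now - self.last_run < self.suspended_period
        )
        if still_suspended:
            return False
        self.last_run = now
        return True


start = datetime(2024, 1, 1, 8, 0)
low_ram_reboot = SuspendableAction(suspended_period=timedelta(hours=1))

first = low_ram_reboot.try_run(True, start)                         # runs
second = low_ram_reboot.try_run(True, start + timedelta(minutes=30))  # suspended
third = low_ram_reboot.try_run(True, start + timedelta(hours=1))     # runs again
```

With memory low the whole time, the reboot executes at 08:00, is skipped at 08:30, and executes again at 09:00.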