Wednesday, July 28, 2010

Monitoring your UCS faults with syslog

When you deploy your UCS environment, one of the first things you will want to do is integrate it into your monitoring system. One way to do that is through syslog. Here are some notes and tips.

When problems occur in your UCS environment, they appear as Faults inside the Administration area. Click on the screenshot below to see some.


One thing to know is that this page only shows you the current alerts; once they clear, they disappear.

Here is an example alert exported from my system.
Severity | Code | ID | Affected object | Cause | Last Transition | Description
major | F0207 | 225741 | sys/chassis-1/blade-4/adaptor-1/host-fc-2/fault-F0207 | link-down | 2010-07-28T12:18:59 | Adapter host interface 1/4/1/2 link state: down

One of the key bits of information you are looking for is the fault code; in the example above it's F0207. With that code you can look the fault up in the Cisco UCS Fault Reference.
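If you want to pull the fault code out of an exported row in a script, here is a minimal Python sketch. It assumes the export is pipe-delimited exactly as shown in the example above (your own export may be laid out differently), and `parse_fault_row` is just a helper name made up for this illustration.

```python
# Header and row copied from the exported example above.
# Assumption: columns are separated by "|" as in this post.
HEADER = "Severity | Code | ID | Affected object | Cause | Last Transition | Description"
ROW = (
    "major | F0207 | 225741 | "
    "sys/chassis-1/blade-4/adaptor-1/host-fc-2/fault-F0207 | link-down | "
    "2010-07-28T12:18:59 | Adapter host interface 1/4/1/2 link state: down"
)


def parse_fault_row(header, row):
    """Split a pipe-delimited export row into a dict keyed by column name."""
    keys = [k.strip() for k in header.split("|")]
    values = [v.strip() for v in row.split("|")]
    if len(keys) != len(values):
        raise ValueError("row does not have the same number of columns as the header")
    return dict(zip(keys, values))


fault = parse_fault_row(HEADER, ROW)
# The code is the value to search for in the fault reference.
print(fault["Code"], fault["Severity"], fault["Affected object"])
```

The `Code` field is then what you would search for in the fault reference.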

If you search the reference for that code, here are the details presented.

fltAdaptorHostIfLink-down

Fault Code: F0207

Message

Adapter [transport] host interface [chassisId]/[slotId]/[id]/[id] link state: [linkState]

Explanation

This fault typically occurs as a result of one of the following issues:

The fabric interconnect is in End-Host mode, and all uplink ports failed.

The server port to which the adapter is pinned failed.

A transient error that caused the link to fail.

Recommended Action

If you see this fault, take the following actions:


Step 1 If an uplink port is disabled, enable the port.

Step 2 If the server port to which the adapter is pinned is disabled, enable that port.

Step 3 Reacknowledge the server with the adapter that has the failed link.

Step 4 If the above actions did not resolve the issue, execute the show tech-support command and contact Cisco technical support.

Fault Details

Severity: major  
Cause: link-down  
mibFaultCode: 207  
mibFaultName: fltAdaptorHostIfLinkDown  
moClass: adaptor:HostIf  
Type: network

All codes are listed, and the fault reference can be a valuable resource the first time you come across an error.

From here you will typically want to send these alerts to your management platform for automated monitoring. A great way to do this is via syslog. Cisco has a good guide, "Set up Syslog for Cisco UCS", which you can follow for the configuration. Here is a shot of the page where you set it up.



Once this is configured, the alerts will appear in your syslog server.

Here is what a fault like our example above looks like as syslog entries.
Jul 26 01:05:01 192.168.128.16 : 2010 Jul 26 01:08:54 EST: %LOCAL0-3-SYSTEM_MSG: [F0207][major][link-down][sys/chassis-1/blade-4/adaptor-1/host-fc-1] Adapter  host interface 1/4/1/1 link state: down - svc_sam_dme[3250]
Jul 26 01:05:14 192.168.128.16 : 2010 Jul 26 01:09:07 EST: %LOCAL0-3-SYSTEM_MSG: [F0207][cleared][link-down][sys/chassis-1/blade-4/adaptor-1/host-fc-1] Adapter host interface 1/4/1/1 link state: down - svc_sam_dme[3250]
You can see the fault code F0207, which you can use as a reference. Also notice that I have copied in two entries. The first is the event where the fault occurred, with the severity level "major"; the second states "cleared". You will want to filter out the cleared ones or, if you have a smart system, get it to match the two so you know which events have been resolved.
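To give an idea of what matching the two might look like, here is a rough Python sketch that parses the two syslog entries above and tracks which faults are still open. It assumes every fault message carries the four bracketed fields in the order seen here (code, severity, cause, affected object); the regular expression and the function names are my own illustration, not anything provided by UCS.

```python
import re

# Assumption: fault messages contain [code][severity][cause][object],
# in that order, as in the two sample entries from this post.
FAULT_RE = re.compile(
    r"\[(?P<code>F\d+)\]\[(?P<severity>[^\]]+)\]"
    r"\[(?P<cause>[^\]]+)\]\[(?P<dn>[^\]]+)\]"
)

LINES = [
    "Jul 26 01:05:01 192.168.128.16 : 2010 Jul 26 01:08:54 EST: "
    "%LOCAL0-3-SYSTEM_MSG: [F0207][major][link-down]"
    "[sys/chassis-1/blade-4/adaptor-1/host-fc-1] "
    "Adapter  host interface 1/4/1/1 link state: down - svc_sam_dme[3250]",
    "Jul 26 01:05:14 192.168.128.16 : 2010 Jul 26 01:09:07 EST: "
    "%LOCAL0-3-SYSTEM_MSG: [F0207][cleared][link-down]"
    "[sys/chassis-1/blade-4/adaptor-1/host-fc-1] "
    "Adapter host interface 1/4/1/1 link state: down - svc_sam_dme[3250]",
]


def parse_line(line):
    """Return the bracketed fault fields as a dict, or None if absent."""
    match = FAULT_RE.search(line)
    return match.groupdict() if match else None


def open_faults(lines):
    """Replay syslog lines in order and return faults not yet cleared."""
    active = {}
    for line in lines:
        event = parse_line(line)
        if event is None:
            continue  # not a fault message in the assumed format
        key = (event["code"], event["dn"])
        if event["severity"] == "cleared":
            active.pop(key, None)  # the matching fault is resolved
        else:
            active[key] = event  # raised, or severity changed
    return active


print(open_faults(LINES[:1]))  # only the "major" entry has been seen
print(open_faults(LINES))      # the "cleared" entry resolves it
```

Keying on the fault code plus the affected object means a clear on one interface does not close the same fault on a different interface.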

Hopefully the examples assist some people.

Rodos

2 comments:

  1. Anonymous 12:59 am

    So what are Console and Monitor? I have yet to find a good definition in any of the Cisco materials.

    adam
  2. Console may be the serial console port.