Archive for the ‘Troubleshooting’ Category

System Center Operations Manager 2007 R2 CU3 & CU4 may cause non-SCOM 2007 services to restart

April 1, 2011

There has been a lot of discussion on the forums about this – Microsoft has released a new KB article which explains the issue in detail:

http://support.microsoft.com/kb/2526113

Troubleshooting Greyed Agents

November 4, 2010

A great place to start:
http://support.microsoft.com/kb/2288515

If it is the agent on a Domain Controller that is greyed out then you might want to check these out first:

http://support.microsoft.com/kb/946428

http://thoughtsonopsmgr.blogspot.com/2009/09/hslockdown-explained.html

Categories: Troubleshooting

Event 31552 – Exception ‘SqlException’: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

March 29, 2010

Kevin Holman posted some great information on this on the TechNet forums:

Kevin Holman
This is caused by your backup process – or your own in-house SQL maintenance – taking all the I/O on the warehouse and impacting the performance of SCOM’s built-in housekeeping.

This is all on the warehouse database.

You should first check your backup – if it runs nightly, set it to back up only once per week instead. You should also see where it is backing up to, and make sure it isn’t backing up directly to tape (which can often take too long) or to the same disk spindles that the DB resides on (which will cause more I/O contention).

Next – are you doing any in-house maintenance, such as running DBCC CHECKDB, update statistics, reindexing, etc.? If so – you should really not do this against the warehouse. This is all built into the product – and often SQL DBAs just apply their “standard package” of maintenance, which actually causes problems.

Look in your SQL job history – find which jobs are taking the longest – and find out why – or if you really should be running these at all.

Next – consider tuning all the stuff you are throwing in the warehouse. Look at your most common events, perf data, and state changes, and fix/tune/disable the unnecessary junk to reduce the load and storage impact, thereby reducing maintenance operations.

Lastly – improve the I/O by making sure you have enough spindles in a RAID 10 database, with distinct spindles for DB and logs, to provide the necessary I/O for maintenance operations on this large database.
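
As a rough illustration of the job-history step above, here is a minimal Python sketch that ranks jobs by their longest run. It assumes you have already exported rows of (job name, run_duration) from the SQL Agent job history, where run_duration is the HHMMSS-encoded integer stored in msdb.dbo.sysjobhistory. The job names, the sample values, and both function names are hypothetical.

```python
def hhmmss_to_seconds(run_duration):
    """Convert an HHMMSS-encoded integer (e.g. 13005 = 1h 30m 05s) to seconds."""
    hours, rest = divmod(run_duration, 10000)
    minutes, seconds = divmod(rest, 100)
    return hours * 3600 + minutes * 60 + seconds


def slowest_jobs(history, top=3):
    """Return (job_name, longest_run_seconds) pairs, slowest first.

    `history` is an iterable of (job_name, run_duration) tuples, where
    run_duration uses the HHMMSS integer encoding described above.
    """
    longest = {}
    for job_name, run_duration in history:
        seconds = hhmmss_to_seconds(run_duration)
        longest[job_name] = max(longest.get(job_name, 0), seconds)
    return sorted(longest.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical job names and durations, for illustration only.
sample_history = [
    ("Nightly full backup", 23015),
    ("Reindex all databases", 41500),
    ("Nightly full backup", 31000),
    ("Update statistics", 4530),
]

for name, seconds in slowest_jobs(sample_history):
    print(f"{name}: {seconds // 60} min")
```

Ranking by the longest single run (rather than the average) is a deliberate choice here: one very long run is usually what collides with the warehouse housekeeping window.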

Cumulative Update 1 – Process flow

March 29, 2010

I hope Kevin doesn’t mind me posting this … but this is excellent information from the forums that I think is useful to a wider audience.

Kevin Holman – From the patchlist state view – you can run a task – “Flush Healthservice cache and state” – for any *agents* that are not showing up. It is possible they have something stuck in their queue, are sick, etc.

The workflow responsible for populating this is a discovery:

“Discover the list of patches installed on Agents”

It targets Health Service, and runs a script (DiscoverAgentPatches.vbs) every 6 hours. Normally – when a patch is applied – and we bounce the Healthservice – this patch data will be available almost immediately, because this discovery runs on startup. However, if there is a timing issue on that initial run, you should see the patch data on the next run of the script – which will be in 6 hours. If you don’t see it after 6 hours, you likely have some issue.

It runs the following script (at the end of this post). This script is very simple – it connects to the Windows Installer scripting object – and requests the patchlist for each patch attached to the MOM agent component ID.

http://msdn.microsoft.com/en-us/library/aa369432(VS.85).aspx

If yours isn’t working, it might just need the Healthservice bounced again, or it might need the HS cache cleared on the agent. The agent’s OS might also have a more serious issue, such as exhausted resources, a broken/disabled Windows Installer service, a pending reboot, etc.

You can test this by running it manually.

Find the script by name in your HealthServiceState subdirectories – copy it out to a local directory – and run it manually from a command line:

C:\bin>cscript DiscoverAgentPatches.vbs {00000000-0000-0000-0000-000000000000} {00000000-0000-0000-0000-000000000000} agentname.domain.com

Running this manually should return a blob of XML to the command line which has CU1 in it.

The outcome of the script should give you ideas on what to do.
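
To help with the “find the script by name” step above, here is a small Python sketch that searches a directory tree for the script. The function name `find_script` is hypothetical, and the location of the Health Service State directory is not given in this post, so it is passed in as a parameter rather than assumed.

```python
from pathlib import Path


def find_script(state_dir, script_name="DiscoverAgentPatches.vbs"):
    """Return every file under state_dir whose name matches script_name.

    The comparison is case-insensitive because Windows file names are.
    """
    wanted = script_name.lower()
    return sorted(
        path
        for path in Path(state_dir).rglob("*")
        if path.is_file() and path.name.lower() == wanted
    )
```

Calling `find_script` with the path to your agent’s Health Service State directory returns a list of matching paths; copy one of them to a local directory before running it with cscript as shown above.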

——————————————————

DiscoverAgentPatches.vbs
$MPElement$ $Target/Id$ $Config/ComputerName$

' A script that enumerates Software Updates for MOM Agent (not MOM 2005 but higher versions)
' For use with Windows Scripting Host, CScript.exe or WScript.exe
' Copyright (c) Microsoft Corporation. All rights reserved.
' does NOT work agentlessly

Option Explicit

Const msiInstallStateNotUsed = -7
Const msiInstallStateBadConfig = -6
Const msiInstallStateIncomplete = -5
Const msiInstallStateSourceAbsent = -4
Const msiInstallStateInvalidArg = -2
Const msiInstallStateUnknown = -1
Const msiInstallStateBroken = 0
Const msiInstallStateAdvertised = 1
Const msiInstallStateRemoved = 1
Const msiInstallStateAbsent = 2
Const msiInstallStateLocal = 3
Const msiInstallStateSource = 4
Const msiInstallStateDefault = 5

Const momAgentComponentID = "{60ADDA03-1710-4954-B299-A9F42DD889A6}"

Dim oArgs
Set oArgs = WScript.Arguments
If oArgs.Count <> 3 Then
    Wscript.Quit -1
End If

Dim SourceId, ManagedEntityId, TargetComputerID

SourceId = oArgs(0)
ManagedEntityId = oArgs(1)
TargetComputerID = oArgs(2)

Dim oAPI, oAgentPatchDiscoveryData
Set oAPI = CreateObject("MOM.ScriptAPI")
set oAgentPatchDiscoveryData = oAPI.CreateDiscoveryData(0, SourceId, ManagedEntityId)

Call DoPatchDiscovery(oAgentPatchDiscoveryData )
Call oAPI.Return(oAgentPatchDiscoveryData)
WScript.Quit

' Connect to Windows Installer object
Function DoPatchDiscovery(ByVal oDisc)
    On Error Resume Next
    Dim installer : Set installer = Nothing
    Set installer = Wscript.CreateObject("WindowsInstaller.Installer") : CheckError

    Dim product, productList, count, path, patch, patchList, patchListString, oHealthServiceinstance
    On Error Resume Next
    Set productList = installer.ProductsEx("", "", 4) : CheckError

    patchListString = ""
    For count = 0 to (productList.Count-1)
        Set product = productList.Item(count)
        path = installer.ComponentPath(product.ProductCode, momAgentComponentID)
        If path <> "" Then
            Set patchList = installer.PatchesEx(product.productCode, "", 7, 1)
            For Each patch In patchList
                patchListString = patchListString & patch.PatchProperty("DisplayName") & "; "
            Next
        End If
    Next

    Set productList = Nothing

    Set oHealthServiceinstance = oDisc.CreateClassInstance("$MPElement[Name='SCLibrary!Microsoft.SystemCenter.HealthService']$")
    With oHealthServiceinstance
        .AddProperty "$MPElement[Name='Windows!Microsoft.Windows.Computer']/PrincipalName$", TargetComputerID
        .AddProperty "$MPElement[Name='SCLibrary!Microsoft.SystemCenter.HealthService']/PatchList$", patchListString
    End With
    Call oDisc.AddInstance(oHealthServiceinstance)
End Function

Sub CheckError
    Dim message, errRec
    If Err = 0 Then Exit Sub
    message = Err.Source & " : " & Hex(Err) & ": " & Err.Description
    If Not installer Is Nothing Then
        Set errRec = installer.LastErrorRecord
        If Not errRec Is Nothing Then message = message & vbNewLine & errRec.FormatText
    End If
    Wscript.Echo message
    Wscript.Quit 2
End Sub
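
The script above builds the PatchList property by appending each patch display name followed by “; ”. As a rough sketch, assuming you have copied that string out of the XML the script returns, the following Python splits it back into names and checks for a given update. The function names and the sample display names are hypothetical; real names come from Windows Installer on the agent.

```python
def parse_patch_list(patch_list_string):
    """Split the '; '-joined PatchList value into a list of display names."""
    return [name.strip() for name in patch_list_string.split(";") if name.strip()]


def has_patch(patch_list_string, fragment):
    """Return True if any patch display name contains fragment (case-insensitive)."""
    needle = fragment.lower()
    return any(needle in name.lower() for name in parse_patch_list(patch_list_string))


# Hypothetical value for illustration; note the trailing "; " the script leaves behind.
sample = "Example Update A; Example Update B (CU1); "

print(parse_patch_list(sample))
print(has_patch(sample, "CU1"))
```

An empty PatchList string yields an empty list, which would correspond to an agent where the discovery ran but found no patches attached to the agent component.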

Network Adapter disconnects fail to alert in OpsMgr 2007

March 29, 2010

This came up in the forums and Jonathan Almquist gave a great answer.

“There is a problem we’re currently investigating in the Data Source in the Network Adapter Connection Health monitor. This monitor type uses an event log detection before pass-thru to a script diagnostic. The event log provider is not configured correctly, in that we will never detect the events and allow the script to process the diagnostics. The reason why this monitor works when you perform a recalculation from Health Explorer is because we don’t use the Data Source in recalculations. We pass-thru directly to the script in this case. The script works fine, and will detect health of the adapter. Keep in mind that there are additional parameters the script takes to diagnose certain adapter “disconnect” conditions.

The logic of the script is as follows:

Select * from Win32_NetworkAdapter

If ((MediaDisconnectionFlag = true) And (oWmiObj.NetConnectionStatus = 7)) or
((DisconnectionFlag = true) And (oWmiObj.NetConnectionStatus = 0)) or
((HardwareFlag = true) And (oWmiObj.NetConnectionStatus = 6))

Then set state and generate alert.”
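
The quoted condition can be sketched in Python as follows. This is only an illustration of the boolean logic, not the management pack’s actual script: the function name is hypothetical, the three flags stand in for the script parameters mentioned in the quote, and the constant names reflect the documented Win32_NetworkAdapter NetConnectionStatus meanings for 0, 6 and 7.

```python
# NetConnectionStatus values referenced by the quoted logic.
DISCONNECTED = 0
HARDWARE_MALFUNCTION = 6
MEDIA_DISCONNECTED = 7


def adapter_should_alert(status, *, media_disconnection_flag,
                         disconnection_flag, hardware_flag):
    """Mirror the quoted condition: alert only when a flag is enabled
    and the adapter reports the matching NetConnectionStatus."""
    return bool(
        (media_disconnection_flag and status == MEDIA_DISCONNECTED)
        or (disconnection_flag and status == DISCONNECTED)
        or (hardware_flag and status == HARDWARE_MALFUNCTION)
    )


# A media-disconnected adapter alerts only if that flag is switched on.
print(adapter_should_alert(MEDIA_DISCONNECTED, media_disconnection_flag=True,
                           disconnection_flag=False, hardware_flag=False))
print(adapter_should_alert(MEDIA_DISCONNECTED, media_disconnection_flag=False,
                           disconnection_flag=True, hardware_flag=True))
```

Each flag gates exactly one status value, so enabling a flag never causes alerts for the other two conditions.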

Categories: Troubleshooting

High CPU Usage

November 12, 2009

There are a number of items to consider if you are finding agents that are being impacted by high CPU usage:

– there are a couple of hotfixes available:
http://support.microsoft.com/kb/974051
http://support.microsoft.com/kb/954903/en-us

– upgrade Windows Script Host on the agent to version 5.7 (WSH 5.7):
http://www.microsoft.com/downloads/details.aspx?FamilyID=f00cb8c0-32e9-411d-a896-f2cd5ef21eb4&displaylang=en

Can’t remotely manage manually installed agents with R2

July 28, 2009

Note that in Operations Manager 2007 R2, Microsoft have blocked the ability to remotely manage manually installed agents.

Reason – “This was decided due to issues customers ran into while trying to manage AD Integrated agents and update agents behind firewalls.”