Update Rollup 6 is out! :)
http://support.microsoft.com/kb/3039363
We are constantly improving SC Service Manager and we're not kidding about it ;)
The fixes and performance improvements in Update Rollup 6 are proof of this!
Great job guys!
Enjoy!
When we install and use Global Service Monitor (GSM) in System Center Operations Manager 2012, the Management Server(s) need to access the GSM site in Azure, which uses a certificate issued by Microsoft. The Management Server needs to trust this certificate, so its computer certificate store must contain the relevant Microsoft root CA certificates in the Trusted Root Certification Authorities store.
So, let’s say that you just installed GSM and you get this Warning Event in the Operations Manager Event Log on the Management Server:
Global Service Monitor Modules: Failed to discover Global Service Monitor locations.
Failure step: 'Couldn't get the ACS endpoint from discovery service. SubscriptionId: 'SOME_ID', OutsideInServiceBaseUri: 'https://gsm-prod.systemcenter.microsoft.com/''
Message: 'Could not establish trust relationship for the SSL/TLS secure channel with authority 'gsm-prod.systemcenter.microsoft.com'.'
Details: 'System.ServiceModel.Security.SecurityNegotiationException: Could not establish trust relationship for the SSL/TLS secure channel with authority 'gsm-prod.systemcenter.microsoft.com'.
---> System.Net.WebException: The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel.
---> System.Security.Authentication.AuthenticationException: The remote certificate is invalid according to the validation procedure.
In this particular case, when we checked the Trusted Root Certification Authorities store on this Management Server, we noticed that some of the root CA certificates that Microsoft distributes were missing, for example one of the most important ones for GSM, the Baltimore CyberTrust Root certificate.
All these certificates should be imported on your computers through Windows Update, specifically KB931125. This update is refreshed frequently with new certificates, so it is a good idea to check for a newer version from time to time.
So here we installed KB931125 from this download link, after which the error was gone and GSM started working again: http://www.microsoft.com/en-us/download/details.aspx?id=6149
After waiting a couple of minutes, we could see data coming in from the GSM Web Tests that were created in OpsMgr. Yuhuuu!
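If you want a quick way to tell a trust problem apart from a plain connectivity problem before digging through certificate stores, something like the following can help. This is only a minimal sketch and not part of GSM or OpsMgr: the function name can_establish_trust is made up for illustration, and it assumes that Python's default TLS context sees roughly the same trusted roots as the machine store on the Management Server.

```python
import socket
import ssl


def can_establish_trust(host, port=443, timeout=10.0):
    """Return (True, issuer) if the TLS handshake validates, else (False, reason)."""
    # create_default_context() loads the system trust store and turns on
    # certificate and hostname verification.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                cert = tls.getpeercert()
                return True, str(cert.get("issuer"))
    except ssl.SSLCertVerificationError as exc:
        # Same class of failure as "The remote certificate is invalid".
        return False, "certificate not trusted: %s" % exc
    except OSError as exc:
        # DNS, firewall, proxy or timeout problems, not a trust problem.
        return False, "connection failed: %s" % exc


# Example call (performs a real network connection when run):
# print(can_establish_trust("gsm-prod.systemcenter.microsoft.com"))
```

A result starting with "certificate not trusted" points at a missing root certificate like in the case above, while "connection failed" means the server never got as far as certificate validation.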
After upgrading SC Operations Manager 2012 SP1 to R2, you might notice general problems related to the Data Warehouse component, including Reporting of course. This can happen when there is more than one AlertDetail_GUID table in the DW database at upgrade time and not all of them are upgraded with the 2 new columns, TfsWorkItemId and TfsWorkItemOwner.
If you look in the Operations Manager event log, you will notice error event 31565 on one of the Management Servers:
Failed to deploy Data Warehouse component. The operation will be retried.
Exception 'DeploymentException': Failed to perform Data Warehouse component deployment operation: Install; Component: DataSet, Id: '0d698dff-9b7e-24d1-8a74-4657b86a59f8', Management Pack Version-dependent Id: '29a3dd22-8645-bae5-e255-9b56bf0b12a8'; Target: DataSet, Id: '23ee52b1-51fb-469b-ab18-e6b4be37ab35'. Batch ordinal: 3;
Exception: Sql execution failed. Error 207, Level 16, State 1, Procedure vAlertDetail, Line 18,
Message: Invalid column name 'TfsWorkItemId'.
One or more workflows were affected by this.
Workflow name: Microsoft.SystemCenter.DataWarehouse.Deployment.Component
To resolve the issue, execute this SQL query on the OperationsManagerDW database:
DECLARE
    @GuidString NVARCHAR(50),
    @Guid UNIQUEIDENTIFIER,
    @StandardDatasetTableMapRowId INT,
    @Statement NVARCHAR(MAX),
    @SchemaName SYSNAME,
    @TableNameSuffix SYSNAME,
    @BaseTableName SYSNAME,
    @FullTableName SYSNAME
SELECT @GuidString = DatasetId
FROM StandardDataset WITH(NOLOCK)
WHERE SchemaName = 'Alert'
SET @StandardDatasetTableMapRowId = 0
WHILE EXISTS (
    SELECT *
    FROM StandardDatasetTableMap AS TM
    WHERE (tm.StandardDatasetTableMapRowId > @StandardDatasetTableMapRowId)
        AND (tm.DatasetId = @GuidString)
) BEGIN
    SELECT TOP 1
        @StandardDatasetTableMapRowId = TM.StandardDatasetTableMapRowId,
        @SchemaName = SD.SchemaName,
        @TableNameSuffix = TM.TableNameSuffix,
        @BaseTableName = SDAS.BaseTableName
    FROM StandardDatasetTableMap AS TM
    JOIN StandardDataset AS SD
        ON TM.DatasetId = sd.DatasetId
    JOIN StandardDatasetAggregationStorage AS SDAS
        ON SDAS.DatasetId = TM.DatasetId AND SDAS.AggregationTypeId = TM.AggregationTypeId
    WHERE
        TM.StandardDatasetTableMapRowId > @StandardDatasetTableMapRowId AND
        TM.DatasetId = @GUIDString AND
        SDAS.TableTag = 'detail' AND
        SDAS.DependentTableInd = 1
    ORDER BY tm.StandardDatasetTableMapRowId
    SET @FullTableName = @BaseTableName + '_' + @TableNameSuffix
    IF NOT EXISTS (
        SELECT *
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE
            TABLE_NAME = @FullTableName AND
            TABLE_SCHEMA = @SchemaName AND
            COLUMN_NAME = N'TfsWorkItemId'
    ) BEGIN
        SET @Statement = 'ALTER TABLE ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@FullTableName) + ' ADD TfsWorkItemId NVARCHAR(256) NULL'
        EXECUTE (@Statement)
    END
    IF NOT EXISTS (
        SELECT *
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE
            TABLE_NAME = @FullTableName AND
            TABLE_SCHEMA = @SchemaName AND
            COLUMN_NAME = N'TfsWorkItemOwner'
    ) BEGIN
        SET @Statement = 'ALTER TABLE ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@FullTableName) + ' ADD TfsWorkItemOwner NVARCHAR(256) NULL'
        EXECUTE (@Statement)
    END
END
EXEC StandardDatasetBuildCoverView @GUIDString, 0
This should get you on your way.
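If you would rather see which tables are affected before (or after) running the script, the check it performs is easy to reproduce read-only. The sketch below is just an illustration and not something that ships with the product: it assumes you have already exported (table name, column name) pairs from INFORMATION_SCHEMA.COLUMNS for the Alert schema by whatever means you prefer, and the table names in the usage comment are invented.

```python
REQUIRED_COLUMNS = ("TfsWorkItemId", "TfsWorkItemOwner")


def tables_missing_columns(column_rows, prefix="AlertDetail_"):
    """Map each matching table name to the required columns it lacks.

    column_rows is an iterable of (table_name, column_name) pairs, for
    example the result of selecting TABLE_NAME, COLUMN_NAME from
    INFORMATION_SCHEMA.COLUMNS where TABLE_SCHEMA = 'Alert'.
    Names are compared case-sensitively here, which is stricter than a
    case-insensitive SQL Server collation.
    """
    seen = {}
    for table, column in column_rows:
        if table.startswith(prefix):
            seen.setdefault(table, set()).add(column)
    missing = {}
    for table, columns in seen.items():
        lacking = [c for c in REQUIRED_COLUMNS if c not in columns]
        if lacking:
            missing[table] = lacking
    return missing


# Usage with made-up table names:
# tables_missing_columns([("AlertDetail_A", "TfsWorkItemId"),
#                         ("AlertDetail_A", "TfsWorkItemOwner"),
#                         ("AlertDetail_B", "AlertGuid")])
```

An empty result means every AlertDetail table already has both columns, which is what you should see once the SQL script has been run.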
There are some questions about how object GUIDs are calculated in Operations Manager and Service Manager when thinking about moving MPs to different Management Groups and/or side-by-side migration scenarios. Would the Groups created have the same GUIDs?
When I talk about the GUID of an object, I am talking about the BaseManagedEntityId value of the BaseManagedEntity table and the TypedManagedEntityId from the TypedManagedEntity table in the operational database.
For singleton classes like Groups, the string which is hashed is the same one used for the class ManagedTypeId in the ManagedType table. For a singleton class we assume that the InstanceId is the same as the TypeId, because there will always be *only* one object of that class.
So to get the InstanceId of a singleton class like a Group, we need to know how the TypeId was calculated. It is a hash (SHA1) of a string in this format:
MPName=<MPName>,[KeyToken=<KeyToken>,]ObjectId=<ObjectId>
Here is an example for the All Windows Computers Group:
Now for any non-singleton class like the HealthService or WindowsComputer classes, this is the string used to calculate the InstanceId:
TypeId={<TypeGuid>}[,{<KeyPropertyId>}=<KeyValue>][,{<KeyPropertyId_2>}=<KeyValue_2> …]
Here is an example of how the HealthService GUID is calculated for an Agent based on its FQDN:
I hope this answers some questions about what happens in different scenarios like upgrading the OS on an Agent, migration, moving groups from one Management Group to another and so on.
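To make the two string formats above a bit more tangible, here is a rough sketch of how such a calculation could look. Please treat it as an illustration under assumptions rather than the product's actual code: the post only says the GUID is a SHA1 hash of the string, so the UTF-16LE encoding, the use of the first 16 bytes of the hash, the .NET-style byte order and the letter case of GUIDs inside the string are all my assumptions, and every name and value in the usage comments is a placeholder.

```python
import hashlib
import uuid


def guid_from_string(text):
    """Hash text with SHA-1 and build a GUID from the first 16 bytes.

    Assumptions (not stated in the post): the string is encoded as
    UTF-16LE, and the 16 bytes are interpreted the way .NET's
    Guid(byte[]) constructor does (little-endian leading fields).
    """
    digest = hashlib.sha1(text.encode("utf-16-le")).digest()
    return uuid.UUID(bytes_le=digest[:16])


def singleton_type_string(mp_name, object_id, key_token=None):
    """Build 'MPName=<MPName>,[KeyToken=<KeyToken>,]ObjectId=<ObjectId>'."""
    parts = ["MPName=%s" % mp_name]
    if key_token:
        parts.append("KeyToken=%s" % key_token)
    parts.append("ObjectId=%s" % object_id)
    return ",".join(parts)


def instance_string(type_guid, key_values=()):
    """Build 'TypeId={<TypeGuid>}[,{<KeyPropertyId>}=<KeyValue> ...]'."""
    text = "TypeId={%s}" % type_guid
    for key_property_id, value in key_values:
        text += ",{%s}=%s" % (key_property_id, value)
    return text


# Placeholder values only; substitute the real MP name, key token and ids:
# group_id = guid_from_string(
#     singleton_type_string("Contoso.Example.MP", "Contoso.Example.Group",
#                           key_token="0123456789abcdef"))
# agent_id = guid_from_string(
#     instance_string(type_guid="<class TypeId>",
#                     key_values=[("<key property id>", "agent01.contoso.com")]))
```

The point to take away is that nothing in the input depends on the Management Group itself: the same MP name, key token, object id and key property values should always hash to the same GUID. If the result does not match what you see in the database, the encoding or byte-order assumptions above are the first things to revisit.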
Here is an interesting issue I came across. I did not encounter this too often but still, it's very interesting and it was very fun to troubleshoot!
In case you get the same error I suggest opening a case with Microsoft Support because it will be easier. Still, let me tell the story just for fun.
Some Alerts were not appearing in Reports, and when we checked, they had definitely not been written to the Data Warehouse database. They were missing from the AlertStage table, for example, which is where Alerts end up first when they are synced from the operational database to the data warehouse database. So the first thing to check is the Operations Manager event log. On OM 2012, check it on all Management Servers, because any Management Server in the All Management Servers Resource Pool can hold the Data Warehouse Synchronization Server role. On OM 2007, *only* the Root Management Server has this role.
Interestingly enough, we were seeing error events 31551 in the Operations Manager event log on the Server which had the Data Warehouse Synchronization Server role:
Failed to store data in the Data Warehouse. The operation will be retried.
Exception 'InvalidOperationException': The given value of type String from the data source cannot be converted to type nvarchar of the specified target column.
So what does this actually mean?!
Alert data is inserted into the Data Warehouse, when synchronized from the operational database, by using SQL Bulk Insert because of the amount of data that needs to be inserted. When Bulk Insert is used, the .NET class doing the work checks the values being inserted against the field types and limits of the target table. In this case it checks that all the values of an Alert (item) will "fit" into the fields of the AlertStage table of the data warehouse database. If any value (like the Alert Title) is longer than the maximum length of its field in AlertStage, the insert fails with that error message. The AlertStage Title field is NVARCHAR(255), so an Alert with a title longer than 255 characters will trigger this issue, as will any other value which does not fit, such as the CustomField values of an Alert.
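The check itself is conceptually very simple. The following is a minimal sketch of the same idea, not the actual code of the .NET class: the function name and the dictionary layout are made up, and the limits you pass in have to come from the real AlertStage column definitions.

```python
def oversized_fields(alert, max_lengths):
    """Return (field, actual_length, limit) for every value that is too long.

    alert maps field names to string values (or None); max_lengths maps
    field names to the character limit of the target column. Length is
    counted in UTF-16 code units, which is how NVARCHAR counts.
    """
    problems = []
    for field, limit in max_lengths.items():
        value = alert.get(field)
        if value is None:
            continue
        length = len(value.encode("utf-16-le")) // 2
        if length > limit:
            problems.append((field, length, limit))
    return problems


# Usage with invented data; the limits are the ones mentioned in the text:
# oversized_fields({"Title": "x" * 300, "CustomField9": "ok"},
#                  {"Title": 255, "CustomField9": 255})
```

One value that is too long in one Alert is enough to make the whole batch fail, which is why the rest of the Alerts in the same batch do not arrive in the Data Warehouse either.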
Here is a screenshot as FYI for the field types and definitions of the AlertStage table from the data warehouse database:
So now the question is … how do we figure out *which* Alert has a value that is too long, and for *which* property (field)?! Because the Bulk Insert is done from the .NET SQL class, we cannot see this by tracing SQL Server. The data never reaches SQL, because the check fails locally in the code on the Management Server which is performing this action. Right … so how do we figure this one out?!
Debugging my friends, the answer to everything, hehe!
Ok, so we need to look at … what process here? Well, as you know (or will find out now), the Management Servers as well as the Agents have the "core" service called HealthService.exe. This process (service), however, does *NOT* do the heavy lifting. It spawns MonitoringHost.exe processes which do the work for it, after loading into them the modules (DLLs) needed for the specific tasks they are assigned.
So which MonitoringHost.exe process will it be which runs the Data Warehouse Synchronization workflow? Well, this would have to be a MonitoringHost.exe process which is running under the Data Warehouse Writer Account, would it not?
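One way to find that process without clicking through Task Manager is to let tasklist list the candidates, for example with tasklist /V /FO CSV /FI "IMAGENAME eq MonitoringHost.exe" from an elevated prompt, and then filter on the user name. The snippet below is only a small sketch for that filtering step: it assumes the English column headers "PID" and "User Name", and the account name in the usage comment is invented.

```python
import csv
import io


def pids_for_account(tasklist_csv, account):
    """Return the PIDs of the listed processes running under the given account."""
    reader = csv.DictReader(io.StringIO(tasklist_csv))
    wanted = account.lower()
    return [int(row["PID"]) for row in reader
            if (row.get("User Name") or "").lower() == wanted]


# Usage, with the tasklist output saved to a text variable first:
# pids_for_account(output, r"CONTOSO\svc-dwwrite")
```

If more than one PID comes back, the account is used by several MonitoringHost.exe processes and you will have to look at each of them.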
Because the process is *NOT* crashing when it throws this Exception (it is handled), we cannot simply get a crash dump of that process. So we either attach the debugger (WinDbg) live in the environment *OR* use another nifty tool called Time Travel Tracing to record the process execution so that we can "step" through it afterwards. We could also use a tool like DebugDiag, which can be set to write a memory dump of a process on a specific .NET Exception like the one in this example.
Either way, whichever method we use, we end up in the "same" scenario where we need to set a breakpoint on the Exception we are interested in (unless we use DebugDiag, which writes the memory dump on exactly that Exception). So I will continue with the example of attaching to the MonitoringHost.exe process in question directly in the live environment.
Let's start shall we? We are in the debugger now and let's make sure we have loaded the correct SOS .NET Debugger Extension version: .cordll -ve -u -l
Now we can continue by creating the breakpoint to stop when hitting the exception we are interested in which in this case is System.InvalidOperationException: !StopOnException -create System.InvalidOperationException 1
And then we enter g in the debugger so that the process continues execution until we hit the breakpoint (Exception). When it stops and the debugger breaks in, we use !PrintException to see the exception details:
0:008> !PrintException
Exception object: 0000000201f1dc38
Exception type: System.InvalidOperationException
Message: The given value of type String from the data source cannot be converted to type nvarchar of the specified target column.
InnerException: System.InvalidOperationException, Use !PrintException 0000000201f1d6d8 to see more.
StackTrace (generated):
SP IP Function
00000000075FC7A0 000007FEF338D5A8 system_data_ni!System.Data.SqlClient.SqlBulkCopy.ConvertValue(System.Object, System.Data.SqlClient._SqlMetaData)+0x19ec38
00000000075FED10 000007FEF31EDD39 system_data_ni!System.Data.SqlClient.SqlBulkCopy.WriteToServerInternal()+0x989
00000000075FEE00 000007FEF31EE263 system_data_ni!System.Data.SqlClient.SqlBulkCopy.WriteRowSourceToServer(Int32)+0x463
00000000075FEED0 000007FEF31EE945 system_data_ni!System.Data.SqlClient.SqlBulkCopy.WriteToServer(System.Data.IDataReader)+0x125
00000000075FEF40 000007FF0018DA89 microsoft_enterprisemanagement_datawarehouse_dataaccess!Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.Commands.DataWarehouseSqlBulkInsertCommand.Execute()+0x99
StackTraceString: <none>
HResult: 80131509
As we can see we have a nested Exception here, so let's have a look at this as well:
0:008> !PrintException 0000000201f1d6d8
Exception object: 0000000201f1d6d8
Exception type: System.InvalidOperationException
Message: String or binary data would be truncated.
InnerException: <none>
StackTrace (generated):
SP IP Function
00000000075FEBB0 000007FEF31EEF5A system_data_ni!System.Data.SqlClient.SqlBulkCopy.ConvertValue(System.Object, System.Data.SqlClient._SqlMetaData)+0x5ea
StackTraceString: <none>
HResult: 80131509
Ok, so this is interesting. Now how do we find out which table field is affected here? It is going to be a System.Data.SqlClient._SqlMetaData object, and it should be referenced from the stack of the thread we stopped on. Thus we can use !DumpStackObjects to see all the objects:
0:008> !DumpStackObjects
OS Thread Id: 0x3cb4 (8)
RSP/REG Object Name
[…SNIPPED…]
00000000075FA218 00000001c01beca0 System.Data.SqlClient.TdsParserStateObject
00000000075FA370 00000001c01beb90 System.Data.SqlClient.TdsParser
00000000075FA490 0000000201f1dc38 System.InvalidOperationException
00000000075FA8B0 0000000201f12240 System.Data.SqlClient._SqlMetaData
00000000075FA8C8 0000000201f1d400 System.Int32[]
00000000075FA8F8 0000000201f1d2c0 System.Object[] (System.Object[])
00000000075FABD0 0000000201ee0948 System.Data.SqlClient.SqlBulkCopy
00000000075FABF0 0000000201ee0be8 System.Data.SqlClient.SqlBulkCopyColumnMappingCollection
00000000075FAE00 0000000201ee0948 System.Data.SqlClient.SqlBulkCopy
00000000075FAE20 0000000201ee0be8 System.Data.SqlClient.SqlBulkCopyColumnMappingCollection
00000000075FB128 0000000201f1dc38 System.InvalidOperationException
00000000075FB410 00000001bfe40488 System.String
00000000075FB4D0 0000000201ee0948 System.Data.SqlClient.SqlBulkCopy
00000000075FB4F0 0000000201ee0be8 System.Data.SqlClient.SqlBulkCopyColumnMappingCollection
00000000075FB880 0000000201f1eaf0 System.Environment+ResourceHelper+GetResourceStringUserData
00000000075FBA70 0000000201f1eb68 System.Runtime.CompilerServices.RuntimeHelpers+CleanupCode
00000000075FBA78 0000000201f1eaf0 System.Environment+ResourceHelper+GetResourceStringUserData
00000000075FBAA0 0000000201f1eb68 System.Runtime.CompilerServices.RuntimeHelpers+CleanupCode
00000000075FBAB0 0000000201f1eb68 System.Runtime.CompilerServices.RuntimeHelpers+CleanupCode
00000000075FBB08 0000000201f1eb28 System.Runtime.CompilerServices.RuntimeHelpers+TryCode
00000000075FBB10 0000000201f1eaf0 System.Environment+ResourceHelper+GetResourceStringUserData
[…SNIPPED…]
Ok, from that big list, let's grab the address of the object we are interested in (System.Data.SqlClient._SqlMetaData). There may be more than one entry for it, but they should all show the same object address: !DumpObj 0000000201f12240
0:008> !DumpObj 0000000201f12240
Name: System.Data.SqlClient._SqlMetaData
MethodTable: 000007fef2d98b90
EEClass: 000007fef2c36e60
Size: 224(0xe0) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_64\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll
Fields:
MT Field Offset Type VT Attr Value Name
000007fef2d91c50 4001bc9 80 System.Int32 1 instance 12 type
000007fef880c158 4001bca 8c System.Byte 1 instance 231 tdsType
000007fef880c158 4001bcb 8d System.Byte 1 instance 255 precision
000007fef880c158 4001bcc 8e System.Byte 1 instance 255 scale
000007fef880c7d8 4001bcd 84 System.Int32 1 instance 512 length
000007fef2d925a0 4001bce 8 …ient.SqlCollation 0 instance 0000000201f139c0 collation
000007fef880c7d8 4001bcf 88 System.Int32 1 instance 1251 codePage
0000000000000000 4001bd0 10 0 instance 000000017fe6eb28 encoding
000007fef880d608 4001bd1 8f System.Boolean 1 instance 1 isNullable
000007fef880d608 4001bd2 90 System.Boolean 1 instance 0 isMultiValued
000007fef88068f0 4001bd3 18 System.String 0 instance 0000000000000000 udtDatabaseName
000007fef88068f0 4001bd4 20 System.String 0 instance 0000000000000000 udtSchemaName
000007fef88068f0 4001bd5 28 System.String 0 instance 0000000000000000 udtTypeName
000007fef88068f0 4001bd6 30 System.String 0 instance 0000000000000000 udtAssemblyQualifiedName
000007fef8808278 4001bd7 38 System.Type 0 instance 0000000000000000 udtType
000007fef88068f0 4001bd8 40 System.String 0 instance 0000000000000000 xmlSchemaCollectionDatabase
000007fef88068f0 4001bd9 48 System.String 0 instance 0000000000000000 xmlSchemaCollectionOwningSchema
000007fef88068f0 4001bda 50 System.String 0 instance 0000000000000000 xmlSchemaCollectionName
000007fef2d928b8 4001bdb 58 …qlClient.MetaType 0 instance 000000017fe701c0 metaType
000007fef88068f0 4001bdc 60 System.String 0 instance 0000000000000000 structuredTypeDatabaseName
000007fef88068f0 4001bdd 68 System.String 0 instance 0000000000000000 structuredTypeSchemaName
000007fef88068f0 4001bde 70 System.String 0 instance 0000000000000000 structuredTypeName
0000000000000000 4001bdf 78 0 instance 0000000000000000 structuredFields
000007fef88068f0 4001be0 98 System.String 0 instance 0000000201f139f0 column
000007fef88068f0 4001be1 a0 System.String 0 instance 0000000000000000 baseColumn
000007fef32c3cb0 4001be2 b0 …ultiPartTableName 1 instance 0000000201f122f0 multiPartTableName
000007fef880c7d8 4001be3 94 System.Int32 1 instance 25 ordinal
000007fef880c158 4001be4 91 System.Byte 1 instance 2 updatability
000007fef880c158 4001be5 a8 System.Byte 1 instance 0 tableNum
000007fef880d608 4001be6 a9 System.Boolean 1 instance 0 isDifferentName
000007fef880d608 4001be7 aa System.Boolean 1 instance 0 isKey
000007fef880d608 4001be8 ab System.Boolean 1 instance 0 isHidden
000007fef880d608 4001be9 ac System.Boolean 1 instance 0 isExpression
000007fef880d608 4001bea ad System.Boolean 1 instance 0 isIdentity
000007fef880d608 4001beb ae System.Boolean 1 instance 0 isColumnSet
000007fef880c158 4001bec af System.Byte 1 instance 0 op
000007fef88142d0 4001bed 92 System.UInt16 1 instance 0 operand
Ok, so using the column field of this object we can see that it is the CustomField9 column, which we know has a 255 character limit. To read it, we dump the object that the pointer at the object address + the offset of the column field refers to:
0:008> !DumpObj poi(0000000201f12240+98)
Name: System.String
MethodTable: 000007fef88068f0
EEClass: 000007fef838ed78
Size: 50(0x32) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String: CustomField9
Fields:
MT Field Offset Type VT Attr Value Name
000007fef880c7d8 4000103 8 System.Int32 1 instance 12 m_stringLength
000007fef880b318 4000104 c System.Char 1 instance 43 m_firstChar
000007fef88068f0 4000105 10 System.String 0 shared static Empty
>> Domain:Value 00000000003e9470:00000001bfe40488 <<
So now we need to get the list of Alerts which the process is trying to insert, check CustomField9 on all of them and find the one which is longer than 255 characters. We need to dump the object which holds all the alerts, which is System.Data.SqlClient.SqlBulkCopy. Just like before, we run !DumpStackObjects and look for the object address:
0:008> !DumpStackObjects
OS Thread Id: 0x3cb4 (8)
RSP/REG Object Name
00000000075F5908 0000000201f1dc38 System.InvalidOperationException
00000000075F59A8 0000000201f1dc38 System.InvalidOperationException
00000000075F5A30 0000000201ee0080 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseCommandExecutionContext
00000000075F5A38 0000000201f1dc38 System.InvalidOperationException
00000000075F5A48 0000000201ee0018 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseSingleCommand+ExecutionStep
00000000075F5A50 0000000201ee0098 System.Collections.Generic.Dictionary`2[[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnectionDescriptor, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess],[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnection, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess]]
00000000075F5C90 0000000201f1dc38 System.InvalidOperationException
00000000075F60B0 00000002000b13d8 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.Commands.DataWarehouseSqlBulkInsertCommand
00000000075F60C8 0000000201ee0018 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseSingleCommand+ExecutionStep
00000000075F60D0 0000000201ee0098 System.Collections.Generic.Dictionary`2[[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnectionDescriptor, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess],[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnection, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess]]
00000000075F63D0 0000000201ee0080 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseCommandExecutionContext
00000000075F63E8 0000000201ee0018 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseSingleCommand+ExecutionStep
00000000075F63F0 0000000201ee0098 System.Collections.Generic.Dictionary`2[[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnectionDescriptor, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess],[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnection, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess]]
00000000075F6600 0000000201ee0080 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseCommandExecutionContext
00000000075F6618 0000000201ee0018 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseSingleCommand+ExecutionStep
00000000075F6620 0000000201ee0098 System.Collections.Generic.Dictionary`2[[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnectionDescriptor, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess],[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnection, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess]]
00000000075F6928 0000000201f1dc38 System.InvalidOperationException
00000000075F6CD0 0000000201ee0080 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseCommandExecutionContext
00000000075F6CE8 0000000201ee0018 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseSingleCommand+ExecutionStep
00000000075F6CF0 0000000201ee0098 System.Collections.Generic.Dictionary`2[[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnectionDescriptor, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess],[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnection, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess]]
00000000075F7640 0000000201f12240 System.Data.SqlClient._SqlMetaData
00000000075F7968 0000000201f1dc38 System.InvalidOperationException
00000000075F7B10 0000000201ee0948 System.Data.SqlClient.SqlBulkCopy
00000000075F7B40 0000000201f1dc38 System.InvalidOperationException
00000000075F7B58 000000014007fa40 System.RuntimeType
[…SNIPPED…]
Ok, now let's have a look at the list, shall we? We dump the object first, as before, using !DumpObj and the object address:
0:008> !DumpObj 0000000201ee0948
<Note: this object has an invalid CLASS field>
Name: System.Data.SqlClient.SqlBulkCopy
MethodTable: 000007fef32c3868
EEClass: 000007fef2c4eae0
Size: 176(0xb0) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_64\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll
Fields:
MT Field Offset Type VT Attr Value Name
000007fef880c7d8 40016e9 78 System.Int32 1 instance 0 _batchSize
000007fef880d608 40016ea a0 System.Boolean 1 instance 0 _ownConnection
000007fef32a4370 40016eb 7c System.Int32 1 instance 0 _copyOptions
000007fef880c7d8 40016ec 80 System.Int32 1 instance 30 _timeout
000007fef88068f0 40016ed 8 System.String 0 instance 000000013fe4ea28 _destinationTableName
000007fef880c7d8 40016ee 84 System.Int32 1 instance 0 _rowsCopied
000007fef880c7d8 40016ef 88 System.Int32 1 instance 0 _notifyAfter
000007fef880c7d8 40016f0 8c System.Int32 1 instance 0 _rowsUntilNotification
000007fef880d608 40016f1 a1 System.Boolean 1 instance 32 _insideRowsCopiedEvent
000007fef8805a48 40016f2 10 System.Object 0 instance 00000002000b4c10 _rowSource
000007fef2d91038 40016f3 18 …ent.SqlDataReader 0 instance 0000000000000000 _SqlDataReaderRowSource
000007fef32c3a48 40016f4 20 …MappingCollection 0 instance 0000000201ee09f8 _columnMappings
000007fef32c3a48 40016f5 28 …MappingCollection 0 instance 0000000201ee0be8 _localColumnMappings
000007fef2d8fba8 40016f6 30 …ent.SqlConnection 0 instance 0000000201ee0610 _connection
000007fef32c3b70 40016f7 38 …nt.SqlTransaction 0 instance 0000000000000000 _internalTransaction
000007fef32c3b70 40016f8 40 …nt.SqlTransaction 0 instance 0000000000000000 _externalTransaction
000007fef32a4058 40016f9 90 System.Int32 1 instance 1 _rowSourceType
000007fef2d9b880 40016fa 48 System.Data.DataRow 0 instance 74042e00b98a6449 _currentRow
000007fef880c7d8 40016fb 94 System.Int32 1 instance -1738182535 _currentRowLength
000007fef3290ab0 40016fc 98 System.Int32 1 instance 1765434373 _rowState
000007fef880f240 40016fd 50 …tions.IEnumerator 0 instance 693a640598657079 _rowEnumerator
000007fef2d973e8 40016fe 58 …lClient.TdsParser 0 instance 00000001c01beb90 _parser
000007fef2d97ac8 40016ff 60 …ParserStateObject 0 instance 0000000000000000 _stateObj
000007fef8810158 4001700 68 …ections.ArrayList 0 instance 0000000201f18ca8 _sortedColumnMappings
000007fef32c3630 4001701 70 …opiedEventHandler 0 instance 4c4d582f31303032 _rowsCopiedEventHandler
000007fef880c7d8 4001703 9c System.Int32 1 instance 212855 _objectID
000007fef880c7d8 4001702 9d8 System.Int32 1 static 212855 _objectTypeCount
So the _rowSource is what we are interested in – let's dump this shall we? Again using !DumpObj with the object address + field offset:
0:008> !DumpObj poi(0000000201ee0948+10)
Name: Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.Commands.BulkDataReader
MethodTable: 000007ff002c4338
EEClass: 000007ff002a4d90
Size: 64(0x40) bytes
File: C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.dll
Fields:
MT Field Offset Type VT Attr Value Name
000007fef880d608 40000e5 10 System.Boolean 1 instance 0 isClosed
000007ff002c3b70 40000e6 8 …DataReaderAdaptor 0 instance 00000002000b1480 dataAdaptor
000007fef8828658 40000e7 18 System.Guid 1 instance 00000002000b4c28 dataSetId
000007fef8828658 40000e8 28 System.Guid 1 instance 00000002000b4c38 managementGroupId
Ok, what now? Now we need to look at the dataAdaptor field, like before. But wait … we used poi() before. Well, we'll use it twice this time:
0:008> !DumpObj poi(poi(0000000201ee0948+10)+8)
Name: Microsoft.EnterpriseManagement.HealthService.Modules.DataWarehouse.AlertDataReaderAdaptor
MethodTable: 000007ff002cfa98
EEClass: 000007ff00301318
Size: 48(0x30) bytes
File: C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Microsoft.EnterpriseManagement.HealthService.Modules.DataWarehouse.dll
Fields:
MT Field Offset Type VT Attr Value Name
000007fef2d9a548 4000035 8 …em.Data.DataTable 0 instance 00000002000b1928 schemaTable
000007fef880adf8 4000036 10 System.Object[] 0 instance 00000002001d89e8 dataItems
000007fef880c7d8 4000037 18 System.Int32 1 instance 0 currentIndex
000007fef880c7d8 4000038 1c System.Int32 1 instance 0 startIndex
000007fef880c7d8 4000039 20 System.Int32 1 instance 1000 itemsToProcessCount
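The nested poi() expressions used above all follow the same pattern: start from an object address, add the offset of a field, dereference, and repeat for the next field. If you lose track of the brackets, a tiny helper can build the expression for you. This is just a convenience sketch with a made-up function name, it is not a debugger feature.

```python
def poi_chain(base_address, offsets):
    """Build a nested WinDbg poi() expression.

    base_address is the hex address as printed by the debugger; offsets
    are the field offsets (integers) to follow, in order. Offsets are
    written in hex, which matches the debugger's default radix.
    """
    expression = base_address
    for offset in offsets:
        expression = "poi(%s+%x)" % (expression, offset)
    return expression


# poi_chain("0000000201ee0948", [0x10, 0x8]) gives
# "poi(poi(0000000201ee0948+10)+8)", the expression used with !DumpObj above.
```

The offsets come straight from the Offset column of each !DumpObj output, so every extra field you follow is just one more entry in the list.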
We are interested in the dataItems object. But wait … it is an array, as we can see from the [] characters in System.Object[], so we can use another nifty command called !DumpArray here, again with poi(), this time nested three levels deep:
0:008> !DumpArray poi(poi(poi(0000000201ee0948+10)+8)+10)
Name: Microsoft.EnterpriseManagement.HealthService.DataItemBase[]
MethodTable: 000007fef880adf8
EEClass: 000007fef838fc68
Size: 8032(0x1f60) bytes
Array: Rank 1, Number of elements 1000, Type CLASS
Element Methodtable: 000007ff000474b8
[0] 0000000140b851a0
[1] 0000000140b8a950
[2] 0000000140b8d420
[3] 0000000140b8feb8
[4] 0000000140b92950
[5] 0000000140b94eb0
[6] 0000000140b97428
[7] 0000000140b999a0
[8] 0000000140b9aba8
[9] 0000000140b9d6b8
[10] 0000000140ba0150
[11] 0000000140ba2be8
[12] 0000000140ba5648
[…SNIPPED…]
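With 1000 elements in that array, typing one command per address by hand is not much fun. A small script can turn the !DumpArray listing into the list of commands to run. Again, this is only a sketch with made-up function names: it assumes the listing has been copied into a text variable, and the optional field offset is whatever offset you read from a !DumpObj of one element.

```python
import re

# Matches element lines such as "[12] 0000000140ba5648".
_ELEMENT = re.compile(r"^\s*\[(\d+)\]\s+([0-9a-fA-F]+)\s*$")


def element_addresses(dump_array_output):
    """Return the element addresses listed by !DumpArray, in listing order."""
    addresses = []
    for line in dump_array_output.splitlines():
        match = _ELEMENT.match(line)
        # Skip anything that is not an address, and all-zero (null) entries.
        if match and match.group(2).strip("0"):
            addresses.append(match.group(2))
    return addresses


def dump_commands(addresses, field_offset=None):
    """Emit one !DumpObj per element, or per field when an offset is given."""
    if field_offset is None:
        return ["!DumpObj %s" % a for a in addresses]
    return ["!DumpObj poi(%s+%x)" % (a, field_offset) for a in addresses]


# Usage:
# for command in dump_commands(element_addresses(listing)):
#     print(command)
```

The printed commands can then be pasted into the debugger. Passing a field offset uses the same poi(address+offset) trick as before to jump straight to one string field of every element.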
Ok interesting, so this is an Array which is holding items of type Microsoft.EnterpriseManagement.HealthService.DataItemBase but when we dump the array we only get the address of each such object, so we would need to dump each of them using !DumpObj and check them out – wow! … really? there are a looot of them … Ok, let's worry about that a little later and now just dump one of the items (first one as example) just to see what we need to check there for each one:
0:008> !DumpObj 0000000140b851a0
Name: Microsoft.EnterpriseManagement.Mom.AlertSubscriptionModule.DataItemAlertSubscription
MethodTable: 000007ff002cdc70
EEClass: 000007ff002fe738
Size: 568(0x238) bytes
File: C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Microsoft.Mom.AlertSubscriptionDataSourceModule.dll
Fields:
MT Field Offset Type VT Attr Value Name
000007fef88296c8 400001a 18 System.DateTime 1 instance 0000000140b851b8 dateCreated
000007fef8828658 400001b 20 System.Guid 1 instance 0000000140b851c0 sourceHealthServiceId
000007fef88068f0 400001c 8 System.String 0 instance 00000001400a54d8 itemXml
000007fef88068f0 400001d 10 System.String 0 instance 0000000140b862b8 dataTypeName
000007fef8828658 4000019 8 System.Guid 1 static 000000023fe46128 localHealthServiceId
000007ff002cd6f8 400009b 118 System.Int32 1 instance 0 subscriptionType
000007ff002cd850 400009c 11c System.Int32 1 instance 0 subscriptionProperty
000007fef8828658 400009d 138 System.Guid 1 instance 0000000140b852d8 id
000007fef88068f0 400009e 30 System.String 0 instance 0000000140b864b8 name
000007fef88068f0 400009f 38 System.String 0 instance 0000000140b86520 description
000007fef8828658 40000a0 148 System.Guid 1 instance 0000000140b852e8 baseManagedEntityId
000007fef8828658 40000a1 158 System.Guid 1 instance 0000000140b852f8 problemId
000007fef880d608 40000a2 130 System.Boolean 1 instance 0 createdByMonitor
000007fef8828658 40000a3 168 System.Guid 1 instance 0000000140b85308 workflowId
000007fef8827af0 40000a4 120 System.UInt32 1 instance 0 resolutionState
000007ff002cd9a8 40000a5 124 System.Int32 1 instance 2 priority
000007ff002cdb00 40000a6 128 System.Int32 1 instance 1 severity
000007fef88068f0 40000a7 40 System.String 0 instance 0000000140b86778 category
000007fef88068f0 40000a8 48 System.String 0 instance 0000000000000000 owner
000007fef88068f0 40000a9 50 System.String 0 instance 0000000000000000 resolvedBy
000007fef88296c8 40000aa 178 System.DateTime 1 instance 0000000140b85318 timeRaised
000007fef88296c8 40000ab 180 System.DateTime 1 instance 0000000140b85320 timeAdded
000007fef88296c8 40000ac 188 System.DateTime 1 instance 0000000140b85328 lastModified
000007fef88296c8 40000ad 190 System.DateTime 1 instance 0000000140b85330 lastModifiedExceptRepeatCount
000007fef88068f0 40000ae 58 System.String 0 instance 0000000140b86f70 lastModifiedBy
000007fef88296c8 40000af 198 System.DateTime 1 instance 0000000140b85338 timeResolved
000007fef88296c8 40000b0 1a0 System.DateTime 1 instance 0000000140b85340 timeResolutionStateLastModified
000007fef88296c8 40000b1 1a8 System.DateTime 1 instance 0000000140b85348 timeResolutionStateLastModifiedInDB
000007fef88068f0 40000b2 60 System.String 0 instance 0000000140b86f98 customField1
000007fef88068f0 40000b3 68 System.String 0 instance 0000000140b86fe0 customField2
000007fef88068f0 40000b4 70 System.String 0 instance 0000000140b87020 customField3
000007fef88068f0 40000b5 78 System.String 0 instance 0000000140b87060 customField4
000007fef88068f0 40000b6 80 System.String 0 instance 0000000140b870a0 customField5
000007fef88068f0 40000b7 88 System.String 0 instance 0000000140b870f8 customField6
000007fef88068f0 40000b8 90 System.String 0 instance 0000000140b87128 customField7
000007fef88068f0 40000b9 98 System.String 0 instance 0000000140b87160 customField8
000007fef88068f0 40000ba a0 System.String 0 instance 0000000140b871a0 customField9
000007fef88068f0 40000bb a8 System.String 0 instance 0000000140b873c0 customField10
000007fef88068f0 40000bc b0 System.String 0 instance 0000000000000000 ticketId
000007fef88068f0 40000bd b8 System.String 0 instance 0000000140b87438 context
000007fef88296c8 40000be 1b0 System.DateTime 1 instance 0000000140b85350 lastModifiedByNonConnector
000007fef8828658 40000bf 1b8 System.Guid 1 instance 0000000140b85358 connectorId
000007fef880c7d8 40000c0 12c System.Int32 1 instance 0 repeatCount
000007fef8828658 40000c1 1c8 System.Guid 1 instance 0000000140b85368 alertStringId
000007fef88068f0 40000c2 c0 System.String 0 instance 0000000140b88848 alertParams
000007fef88068f0 40000c3 c8 System.String 0 instance 0000000000000000 siteName
000007fef88068f0 40000c4 d0 System.String 0 instance 0000000140b88b30 baseManagedEntityFullName
000007fef88068f0 40000c5 d8 System.String 0 instance 0000000000000000 baseManagedEntityPath
000007fef88068f0 40000c6 e0 System.String 0 instance 0000000140b88bb8 baseManagedEntityName
000007fef88068f0 40000c7 e8 System.String 0 instance 0000000140b88bf8 baseManagedEntityDisplayName
000007fef88068f0 40000c8 f0 System.String 0 instance 0000000140b88c30 resolutionStateName
000007fef88068f0 40000c9 f8 System.String 0 instance 0000000000000000 timeZone
000007fef88068f0 40000ca 100 System.String 0 instance 0000000140b88c50 languageCode
000007fef88296c8 40000cb 1d8 System.DateTime 1 instance 0000000140b85378 timeRaisedLocal
000007fef88296c8 40000cc 1e0 System.DateTime 1 instance 0000000140b85380 timeAddedLocal
000007fef88296c8 40000cd 1e8 System.DateTime 1 instance 0000000140b85388 lastModifiedLocal
000007fef88296c8 40000ce 1f0 System.DateTime 1 instance 0000000140b85390 lastModifiedExceptRepeatCountLocal
000007fef88296c8 40000cf 1f8 System.DateTime 1 instance 0000000140b85398 timeResolvedLocal
000007fef88296c8 40000d0 200 System.DateTime 1 instance 0000000140b853a0 timeResolutionStateLastModifiedLocal
000007fef88296c8 40000d1 208 System.DateTime 1 instance 0000000140b853a8 timeResolutionStateLastModifiedInDBLocal
000007fef88296c8 40000d2 210 System.DateTime 1 instance 0000000140b853b0 lastModifiedByNonConnectorLocal
000007fef88292a0 40000d3 218 System.TimeSpan 1 instance 0000000140b853b8 queryExecutionTimeSpan
000007fef88296c8 40000d4 220 System.DateTime 1 instance 0000000140b853c0 dataItemCreateTime
000007fef88296c8 40000d5 228 System.DateTime 1 instance 0000000140b853c8 dataItemCreateTimeLocal
000007fef88068f0 40000d6 108 System.String 0 instance 0000000000000000 tfsWorkItemId
000007fef88068f0 40000d7 110 System.String 0 instance 0000000000000000 tfsWorkItemOwner
Alright, so we know that customField9, the field we are interested in, is at offset a0 (hex), so now we can dump it directly:
0:008> !DumpObj poi(0000000140b851a0+a0)
Name: System.String
MethodTable: 000007fef88068f0
EEClass: 000007fef838ed78
Size: 540(0x21c) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String: SOME_STRING_HERE_WON'T_TELL_YOU_THE_VALUE_HA!
Fields:
MT Field Offset Type VT Attr Value Name
000007fef880c7d8 4000103 8 System.Int32 1 instance 257 m_stringLength
000007fef880b318 4000104 c System.Char 1 instance 57 m_firstChar
000007fef88068f0 4000105 10 System.String 0 shared static Empty
>> Domain:Value 00000000003e9470:00000001bfe40488 <<
Cool, so now we know what we want and need to dump here. So what's next? Well, we have the nifty .foreach and .shell commands: .foreach lets us run !DumpObj on every item of the array, and .shell lets us filter the output with the Windows CMD FIND command, as we can see in these examples: WinDbg Scripting
0:008> .shell -ci ".foreach (obj { !DumpArray poi(poi(poi(0000000201ee0948+10)+8)+10) }) { !DumpObj poi(${obj}+a0) }" FIND /V "Invalid parameter"
<Note: this object has an invalid CLASS field>
Invalid object
Integrated managed debugging does not support enumeration of symbols.
See http://dbg/managed.htm for more details.
<Note: this object has an invalid CLASS field>
Invalid object
Name: System.String
MethodTable: 000007fef88068f0
EEClass: 000007fef838ed78
Size: 540(0x21c) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String: SOME_STRING_HERE_WON'T_TELL_YOU_THE_VALUE_HA!
Fields:
MT Field Offset Type VT Attr Value Name
000007fef880c7d8 4000103 8 System.Int32 1 instance 257 m_stringLength
000007fef880b318 4000104 c System.Char 1 instance 57 m_firstChar
000007fef88068f0 4000105 10 System.String 0 shared static Empty
>> Domain:Value 00000000003e9470:00000001bfe40488 <<
Name: System.String
MethodTable: 000007fef88068f0
EEClass: 000007fef838ed78
Size: 540(0x21c) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String: SOME_STRING_HERE_WON'T_TELL_YOU_THE_VALUE_HA!
Fields:
MT Field Offset Type VT Attr Value Name
000007fef880c7d8 4000103 8 System.Int32 1 instance 257 m_stringLength
000007fef880b318 4000104 c System.Char 1 instance 57 m_firstChar
000007fef88068f0 4000105 10 System.String 0 shared static Empty
>> Domain:Value 00000000003e9470:00000001bfe40488 <<
.shell: Process exited
You do know the FIND command in CMD, right?
Now, in the output, check the length of each string (the m_stringLength field): when you find one that is longer than 255, that's your hit!
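If the output is long, eyeballing every m_stringLength gets tedious. As a rough sketch (not part of the original debugging session), the output of the command above could be saved to a text file and scanned with a few lines of Python. The function name find_long_strings and its limit parameter are only illustrative, and the parsing assumes the !DumpObj layout shown above, where the length is the value printed right before the m_stringLength field name.

```python
def find_long_strings(dump_text, limit=255):
    """Return (length, string) pairs for dumped System.String objects longer than limit."""
    hits = []
    current = None  # value of the most recent "String:" line
    for line in dump_text.splitlines():
        line = line.strip()
        if line.startswith("String:"):
            current = line[len("String:"):].strip()
        elif line.endswith("m_stringLength"):
            # Field row layout: MT Field Offset Type VT Attr Value Name
            length = int(line.split()[-2])
            if length > limit:
                hits.append((length, current))
            current = None
    return hits


# Tiny made-up sample in the same layout as the output above
sample = """\
String: short one
000007fef880c7d8 4000103 8 System.Int32 1 instance 9 m_stringLength
String: the long one
000007fef880c7d8 4000103 8 System.Int32 1 instance 257 m_stringLength
"""
long_ones = find_long_strings(sample)
```

With real output, you would read the saved file into a string and pass it to the function instead of the sample.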
Well, it turns out that we had a 3rd party Management Pack there which was using CustomField9 for … well, something, whatever … and thus we either remove this MP or we engage the vendor.
Problem solved!
By the way, there are some cool WinDbg extensions out there which may help you a lot with debugging in general, so that you don't need to use WinDbg scripting. They are an awesome help for .NET debugging.
A good example of such an extension would be SOSEx.
Have fun debugging!!!
There are quite a few situations where we end up having "corrupt" – actually orphaned or missing – information about the DW Registration.
I have created a tool which you can use to fix *most* such issues automatically. Using this tool you can delete all basic information about the DW Registration or, if you are sure about what is going on, just re-create it. The cool part about deleting the existing (possibly incomplete) information is that you can then simply re-register the DW from the SM Console.
But – let's first talk about how this actually works under the hood!
When we register the DW from the Console in Administration tab, there are several things which happen:
These are the tables of interest:
1. On the ServiceManager database, make sure we have the DwSdkResourceStore object for the Data Warehouse – this is the way SM (and the Console) "know" that they have a DW registered and where/how to connect to it – this is how the Data Warehouse tab appears in the SM Console:
select *
from MT_Microsoft$SystemCenter$ResourceAccessLayer$DwSdkResourceStore
2. On the DWStagingAndConfig database, make sure we have the CMDBSource object created for the SM DataSource – check the properties in the results and look at DataSourceName, SdkServer, Database, DatabaseServer. There should be at least 2 entries here – 1 for your DW (it's actually registered itself to itself as a DataSource – another story for another time) and 1 entry for your ServiceManager DataSource (Management Group):
select *
from MTV_Microsoft$Systemcenter$Datawarehouse$CMDBSource
3. On the DWStagingAndConfig database, make sure we have the Datasource entry in the Staging.Datasource table because it is an entry which is needed for the registration to actually work. There should be at least 2 entries here – 1 for your DW and 1 entry for your ServiceManager DataSource (Management Group):
select ds.DatasourceId, ds.TimeAdded, cmdb.DataSourceName_AC09B683_AE61_BDCA_6383_2007DB60859D
from Staging.Datasource as ds
join MTV_Microsoft$Systemcenter$Datawarehouse$CMDBSource as cmdb
on ds.DatasourceId = cmdb.DataSourceId_17109AB9_58CD_F741_8AE3_3A9F29C83709
4. On the DWStagingAndConfig database, make sure that we have the SdkResourceStore object created. This is how the DW SDK knows which SDK service to connect to for a given (managed) DataSource. There should be at least 3 entries here – 1 is an internal source (Ral.SdkResourceStore.Sdk), 1 for your DW and 1 entry for your ServiceManager DataSource (Management Group). Check the properties here and see if they are correct: DisplayName should be YOUR_SM_MG.Sdk, and Server should be the SM Workflow Management Server:
select *
from MTV_Microsoft$SystemCenter$ResourceAccessLayer$SdkResourceStore
5. On the DWStagingAndConfig database, make sure that the SqlResourceStore object is created. This is how the DW SDK knows which SQL Server to connect to in order to get the data for the DataSource from its database. There should be at least 2 entries here – 1 for your DW and 1 entry for your ServiceManager DataSource (Management Group). Again, check the properties: DataService, Name, Server:
select *
from MTV_Microsoft$Systemcenter$ResourceAccessLayer$SqlResourceStore
where DisplayName like '%.ExtractionSource'
6. On the DWStagingAndConfig database, make sure you have the relationships needed here of type StoreHasProperty for the SM DataSource. There should be at least 5 entries here. 1 is for the internal DataSource (Ral.SdkResourceStore.Sdk) and then 2 for the DW DataSource (YOUR_DW_MG.Sdk and YOUR_DW_MG.ExtractionSource) and 2 for your SM DataSource (YOUR_SM_MG.Sdk and YOUR_SM_MG.ExtractionSource) – so make sure these exist:
select r.RelationshipId, bmes.FullName as 'SourceEntity', bmet.FullName as 'TargetEntity'
from Relationship as r
    join BaseManagedEntity as bmes
        on r.SourceEntityId = bmes.BaseManagedEntityId
    join BaseManagedEntity as bmet
        on r.TargetEntityId = bmet.BaseManagedEntityId
where r.TargetEntityId in (
        select BaseManagedEntityId
        from MTV_Microsoft$SystemCenter$ResourceAccessLayer$StoreProperty
        where DisplayName in ('ExtractionSource', 'Sdk')
    ) and r.RelationshipTypeId = (
        select RelationshipTypeId
        from RelationshipType
        where RelationshipTypeName = 'Microsoft.SystemCenter.ResourceAccessLayer.StoreHasProperty'
    )
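To make the "at least 5 entries" check from step 6 less error-prone, here is a small Python sketch that lists which of the expected store names are missing. It is only an illustration: it assumes you have copied the store names (for example YOUR_SM_MG.Sdk) out of the query results by hand, and the function names and the sample Management Group names are made up.

```python
def expected_store_names(sm_mg, dw_mg):
    """The five store names that step 6 says should have a StoreHasProperty relationship."""
    return {
        "Ral.SdkResourceStore.Sdk",      # internal DataSource
        f"{dw_mg}.Sdk",                  # DW DataSource
        f"{dw_mg}.ExtractionSource",
        f"{sm_mg}.Sdk",                  # SM DataSource
        f"{sm_mg}.ExtractionSource",
    }


def missing_stores(found_names, sm_mg, dw_mg):
    """Return the expected store names that are not in found_names (exact match, an assumption)."""
    return sorted(expected_store_names(sm_mg, dw_mg) - set(found_names))


# Hypothetical result set where the SM ExtractionSource relationship is gone
found = ["Ral.SdkResourceStore.Sdk", "DW_MG.Sdk", "DW_MG.ExtractionSource", "SM_MG.Sdk"]
missing = missing_stores(found, sm_mg="SM_MG", dw_mg="DW_MG")
```

An empty list means all five expected entries were found.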
What the Tool does *not* do is to add or remove the SecureReference, MPSync Rule and Extract Rule for the new DataSource – so make sure that you *know* that the DW was registered and working before.
If you are not sure, the recommendation is to run the tool with "-a:rem" so that the basic registration information is deleted, and then just re-register the DW normally from the SM Console, thus creating everything needed, including the SecureReference, MPSync Rule and Extract Rule for the ServiceManager database.
Run the tool on a computer which has network access to both the SM Management Server and the DW Management Server. This of course requires Kerberos authentication to be working, and the user account with which you run the tool needs to be an SM Admin account.
This application needs the 4 mandatory parameters below, passed in the format "-PARAMETER:VALUE" (including the "-" and the ":").
-u: User Account which will be used for the DW, in DOMAIN\USER format.
-sm: NetBIOS name of the Service Manager Workflow Management Server.
-dw: NetBIOS name of the Service Manager Data Warehouse Management Server.
-a: "add" or "rem"
Use "add" as action (-a:add) if you want to try to re-create the core objects and relationships needed.
Use "rem" as action (-a:rem) if you want to try to delete the core objects and relationships needed and afterwards register normally from the SM Console.
Example: SCSMRegisterDW.exe -u:CONTOSO\SMAdmin -sm:SMServer -dw:DWServer -a:rem
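If you wrap the tool in a script, it can help to sanity-check the arguments before calling it. The following Python sketch only mirrors the rules described above (four mandatory "-PARAMETER:VALUE" arguments, DOMAIN\USER for -u, "add" or "rem" for -a). It is not how the tool itself parses its command line, and parse_tool_args is a made-up helper name.

```python
REQUIRED = ("u", "sm", "dw", "a")
ACTIONS = ("add", "rem")


def parse_tool_args(args):
    """Validate arguments in -PARAMETER:VALUE form and return them as a dict."""
    parsed = {}
    for arg in args:
        if not arg.startswith("-") or ":" not in arg:
            raise ValueError(f"expected -PARAMETER:VALUE, got {arg!r}")
        # Split on the first ":" only, so the value may contain anything after it
        name, _, value = arg[1:].partition(":")
        if name not in REQUIRED:
            raise ValueError(f"unknown parameter {name!r}")
        if not value:
            raise ValueError(f"missing value for -{name}")
        parsed[name] = value
    missing = [name for name in REQUIRED if name not in parsed]
    if missing:
        raise ValueError("missing parameter(s): " + ", ".join("-" + m for m in missing))
    if parsed["a"] not in ACTIONS:
        raise ValueError('-a must be "add" or "rem"')
    if "\\" not in parsed["u"]:
        raise ValueError("-u must be in DOMAIN\\USER format")
    return parsed


example = parse_tool_args([r"-u:CONTOSO\SMAdmin", "-sm:SMServer", "-dw:DWServer", "-a:rem"])
```

Any violation raises a ValueError with a short explanation, so a wrapper script can stop before the tool touches anything.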
Run this tool at your own risk!
Make a full Backup of the ServiceManager and DWStagingAndConfig database before running the tool!
The best suggestion is to open a case with Microsoft Support before attempting to use the tool!
Happy Data Warehouse-ing!
So, Update Rollup 4 for System Center 2012 R2 is out as you might know
http://support.microsoft.com/kb/2992012
I encourage you to install UR4 for Service Manager because next to other fixes, there's a very important fix included here for the Data Warehouse.
This is part of the fixes and improvements we are trying to bring to the Data Warehouse: http://support.microsoft.com/kb/2989601
Data Warehouse Transform jobs have a hardcoded 60-minute time-out. Therefore, Data Warehouse jobs cannot be disabled for very long because the volume of data to be transformed can quickly pile up. This can cause an issue if the volume exceeds the amount that can be processed by the transform modules within the time-out period.
The fix included in UR4 has 2 parts: an improvement to one of the stored procedures involved, which makes it run faster, plus a new registry setting that can be created to control the timeout for the command, as well as a change of the default timeout from 60 to 180 minutes.
Happy updating!
There are some great articles out there which talk about troubleshooting the Data Warehouse. In this article I will be going into explaining the internals of the Data Warehouse along with troubleshooting information.
It is important to first of all understand how the DW Jobs work. There are some Jobs that are only visible by looking into the database, others may be seen in PowerShell only and the most basic ones can be seen in the Console. From the perspective of the databases, the DW Jobs are actually called Processes. From now on, I will refer to the DW Jobs as Processes in this article.
The DW Processes are actually categorized like this:
Now what you need to know is that each Process will be executed via a Schedule which is actually a Rule (Workflow that runs on a schedule and starts its owning Process). When a Process is scheduled (for execution) it will actually get a new Batch created. Think of a Process as a class and of a Batch as an instance of that class that is actually doing something. It is the same for a ProcessModule, each will get a WorkItem when it is scheduled for execution. Because a Process can have from one to many ProcessModules, when a Batch is created, it also gets all the WorkItems associated with it (ProcessModules of the Process).
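To make the class/instance analogy concrete, here is a toy Python model. It is purely illustrative and says nothing about how the DW is really implemented: the class and attribute names are invented for this sketch, the module names are placeholders, and the initial "Not Started" status is the one described later in this article.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class WorkItem:
    """One ProcessModule scheduled for execution as part of a Batch."""
    work_item_id: int
    module_name: str
    status: str = "Not Started"


@dataclass
class Batch:
    """One scheduled execution (an "instance") of a Process."""
    batch_id: int
    process_name: str
    work_items: list = field(default_factory=list)
    status: str = "Not Started"


class Process:
    """The "class": a named Process with one to many ProcessModules."""
    _batch_ids = count(1)       # shared id generators, simplified for the sketch
    _work_item_ids = count(1)

    def __init__(self, name, module_names):
        self.name = name
        self.module_names = list(module_names)

    def create_batch(self):
        # A new Batch gets one WorkItem per ProcessModule of its Process
        batch = Batch(next(Process._batch_ids), self.name)
        for module_name in self.module_names:
            batch.work_items.append(WorkItem(next(Process._work_item_ids), module_name))
        return batch
```

Calling create_batch() twice on the same Process gives two separate Batches, each with its own set of WorkItems, which is exactly the class versus instance relationship described above.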
So, to recap, because we will be using these a lot in this article:
It is also important to know, because the whole processing/execution depends on it, that each Batch, as well as each WorkItem (of a Batch), will have a certain Status:
The Status that really matters is that of the WorkItems, because the Batch Status depends directly on the Statuses of its WorkItems. Here are some examples:
When a Process gets created the first time (on installation or DW registration), a new Batch will be created with some BatchId (the next available one, starting from 1 for the first Process). Then, for each ProcessModule of this Process, a new WorkItem with a certain WorkItemId will get created for this corresponding Batch. These will all initially have the Status of Not Started, and when the corresponding Schedule Rule reaches its schedule, it will send a "start request" for its Batch. This will set the Status of that Batch to Running. The actual way these run is that there is a Rule associated with each Process, and these Rules run every 30 seconds. As soon as such a Rule runs, if it sees that its corresponding Process has the (latest) Batch with a Status of Running, it will take the first/next WorkItem that has a Status of Not Started (depending on dependencies and level) and will try to execute that WorkItem. If it can execute and is successful, the WorkItem will get the Status of Completed once it finishes. If it fails, then it will get the Status of Failed. It is important to know here that it can and will also get a Status of Failed without there being an actual error, when it simply cannot start/run right now because a WorkItem it depends on is not finished yet, and thus the "error" is actually "waiting on workitems to finish".
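The polling behaviour described above can be pictured with a small Python simulation. This is a deliberately simplified sketch, not the real implementation: the dictionary layout, the rule_tick name and the execute callback are invented, the sketch simply skips WorkItems whose dependencies are not finished (instead of modelling the "Failed while waiting" case), and marking the Batch as Completed once every WorkItem is Completed is an assumption.

```python
NOT_STARTED, RUNNING, COMPLETED, FAILED = "Not Started", "Running", "Completed", "Failed"


def rule_tick(batch, execute):
    """Simulate one run of a Process Rule; return the module it tried, or None."""
    if batch["status"] != RUNNING:
        return None  # nothing happens until the Schedule has started the Batch
    items = batch["work_items"]
    done = {item["module"] for item in items if item["status"] == COMPLETED}
    for item in items:
        ready = set(item["depends_on"]) <= done
        if item["status"] == NOT_STARTED and ready:
            # execute() is a stand-in for the real module; True means success
            item["status"] = COMPLETED if execute(item["module"]) else FAILED
            if all(i["status"] == COMPLETED for i in items):
                batch["status"] = COMPLETED  # assumption of this sketch
            return item["module"]
    return None


# Hypothetical Batch with two WorkItems, where B depends on A
demo = {
    "status": RUNNING,
    "work_items": [
        {"module": "B", "depends_on": ["A"], "status": NOT_STARTED},
        {"module": "A", "depends_on": [], "status": NOT_STARTED},
    ],
}
order = [rule_tick(demo, lambda module: True) for _ in range(3)]
```

Each call stands for one 30-second run of the Rule: it picks at most one WorkItem, so a Batch with many WorkItems needs many runs to finish.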
Another very important thing to know is that there is also a certain synchronization method between the different Processes, so that they don't interfere with each other's work. This is needed because there are some actions which change the database schema(s): adding/removing columns from tables, indexes, primary/foreign keys, etc. We would not want the ETL (Extract, Transform, Load) Processes to run and copy/transform data while we are doing such changes. Because of this, we have these implementations:
I have talked about some "invisible" Jobs which can only be seen by querying the database. There are actually a lot of those, but the "real" way to talk about them is to go back to the Process Categories. There is a very important Process Category which needs to be mentioned here: the Deployment category. It is invisible because the MPSyncJob actually takes care of handing work over to this category. MPSyncJob will associate Management Packs from one data source to another (ex. from CMDB to the DW). Each Management Pack that got associated will get a new Job (Process) created for it, which falls under the Deployment Process Category. We can also refer to this as the "Deployment Job" if you will, and in fact this is how you would usually hear about it. These are responsible for *actually* deploying (or installing, if you will) the Management Packs which were synchronized over. Without this, nothing would work, because we would have no extended information in the Data Warehouse about what Classes, Relationships, etc. we have in the CMDB (and other managed data sources which we can register to the Data Warehouse).
If any of the deployment "jobs" fails or is not yet finished, then you can be more or less sure that nothing will work properly, and you will almost certainly get errors related to the ETL (Extract, Transform, Load) Processes, and maybe not only those, depending on where deployment is at that point.
While any of the Deployment Processes will follow the same rules as the others, that is get a Batch and one to many WorkItems, it also has another way of actually making the deployment. Each such Deployment Process (for each Management Pack) will have a new DeploySequence created (with a DeploySequenceId). Because each Management Pack has one to many "items" (Classes, Relationships, etc.) which need to be deployed, for each DeploySequence we will have one to many DeployItems created. This is important information for when you get into the actual troubleshooting part in the database.
In case you are not familiar with the basics of troubleshooting the DW Jobs from PowerShell, I suggest first starting with this article: http://technet.microsoft.com/en-us/library/hh542403.aspx
Also, if you are not familiar with what the ETL (Extract, Transform, Load) Jobs are and how these work, I suggest reading this article as well: http://blogs.technet.com/b/servicemanager/archive/2009/06/04/data-warehouse-anatomy-of-extract-transform-load-etl.aspx
Remember to always start by checking if there are any Management Packs whose Deployment Status shows as Failed in the SvcMgr Console, in the Data Warehouse tab under the Management Packs view. For any such Management Pack which is Failed, you should run the "Restart Deployment" task from the tasks pane of the Console. If you are lucky, it was just some timeout or deadlock and it will succeed this time. If not, then you can always see errors about this failure, as well as any other failures in the DW Jobs, by looking into the Operations Manager event log directly on the SvcMgr Data Warehouse Management Server (filter by sources: Data Warehouse and Deployment).
Additionally, you can try to force reset the DW Jobs and run them in a certain order by using the script from this article: http://blogs.technet.com/b/mihai/archive/2013/07/03/resetting-and-running-the-service-manager-data-warehouse-jobs-separately.aspx
The Data Warehouse database which we are interested in when troubleshooting the DW Jobs is the DWStagingAndConfig database. Here is a list of tables of interest when troubleshooting the DW Jobs and, at the end, an attached file with useful SQL queries:
1. The Infra.ProcessCategory table is where all the existing Process Categories are stored. It has a column called IsEnabled which needs to be 1 in order for any Process under this Process Category to be able to run – this is only modified (0 and 1) by the DWMaintenance and/or MPSyncJobs when they run – this is how they disable the other Processes.
2. The Infra.Process table is where all existing Processes are available, classified into Process Categories via the ProcessCategoryId column. These also have an IsEnabled column which should always be 1 in order for them to be able to execute. The only reason why any of these would have IsEnabled set to 0 is if someone explicitly disabled them by using the Disable-SCDWJob cmdlet, which should never be done. To disable a Process (Job), always disable only its Schedule by using Disable-SCDWJobSchedule. In the screenshot below, I have explicitly filtered for only the "common" Processes, but if you query the entire table, you will see all of them.
3. The Infra.ProcessModule table is where all existing ProcessModules are located and are classified on each Process via the ProcessId column. That is what you should use in a where clause of a sql query to see all ProcessModules that belong to a certain Process. Here is an example in the screenshot below for all ProcessModules of the DWMaintenance Process. These are not "ordered" in this result – if you want to figure out the "order" in which they would run, you need to check out the ModuleLevel column along with each dependency for each module which can be seen in the Infra.ModuleTriggerCondition table.
4. The Infra.Batch table is where all the Batches will be found (current/previous and next – check out Infra.BatchHistory for a history of these). This is where you will be able to see the BatchIds for the Processes and you can view them for a specific Process by using the ProcessId in a sql where clause. In the example screenshot below, we can see the Batches of the time the screenshot was taken for the Process with ProcessId = 1 (which in this case is the DWMaintenance Process).
5. The Infra.WorkItem table is where all the WorkItems of a Batch will be found. The most important part here is that this table is where we can see any errors for failed WorkItems of a Batch (that is an instance of a certain Process). In the screenshot below, an example of the WorkItems of the Batch with BatchId 3098 that is an instance of the DWMaintenance Process.
6. The DeploySequence table is where we have an entry for each DeploySequence (so each MP that is or was deployed). You should know that this process uses a staging table called DeploySequenceStaging, and if deployment finished and was successful, this staging table should be *empty*. Here is a screenshot (not all results) of what this looks like.
7. The DeployItems table is where we store an entry for each DeployItem of a DeploySequence (so each item of an MP that is or was deployed). You should know that this process uses a staging table called DeployItemStaging, and if deployment finished and was successful, this staging table should be *empty*. Here is a screenshot with the list of DeployItems belonging to the DeploySequence of the System.WorkItem.Incident.Library Management Pack.
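The usual drill-down through tables 4 and 5 above (Process, then its latest Batch, then the failed WorkItems of that Batch) can be tried out safely on a toy copy. The Python sketch below uses an in-memory SQLite database as a stand-in; it does not connect to DWStagingAndConfig. Only BatchId, ProcessId and WorkItemId come from this article: the Status and ErrorText columns, the sample rows and the function name are assumptions made for the illustration, so check the real column names in your own database.

```python
import sqlite3


def failed_work_items(conn, process_id):
    """Return (BatchId, WorkItemId, ErrorText) for failed WorkItems of the latest Batch of a Process."""
    sql = """
        SELECT w.BatchId, w.WorkItemId, w.ErrorText
        FROM WorkItem AS w
        WHERE w.Status = 'Failed'
          AND w.BatchId = (SELECT MAX(BatchId) FROM Batch WHERE ProcessId = ?)
        ORDER BY w.WorkItemId
    """
    return conn.execute(sql, (process_id,)).fetchall()


# Toy stand-in for the Infra.Batch and Infra.WorkItem tables (hypothetical columns and rows)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Batch (BatchId INTEGER PRIMARY KEY, ProcessId INTEGER);
    CREATE TABLE WorkItem (
        WorkItemId INTEGER PRIMARY KEY,
        BatchId INTEGER,
        Status TEXT,
        ErrorText TEXT
    );
    INSERT INTO Batch VALUES (3097, 1), (3098, 1), (500, 2);
    INSERT INTO WorkItem VALUES
        (1, 3097, 'Completed', NULL),
        (2, 3098, 'Completed', NULL),
        (3, 3098, 'Failed', 'sample error text');
""")
failures = failed_work_items(conn, process_id=1)
```

The same idea (filter Batches by ProcessId, then WorkItems by BatchId) carries over to the real tables once you substitute their actual column names.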
In addition to that, you can always use SQL Server Profiler to get detailed information about what queries are being executed and details about the different errors that might happen; and of course, having the queries also gives you the possibility of understanding the "why" behind them. The most useful column to filter on is the ApplicationName column of the trace. These are the various application names used by the DW Jobs processing modules:
A cool template I usually use for the SQL Profiler Trace which is in 99% of the cases enough, has these settings (events and columns):
NOTE: Attached, you will also find a file (DWJobs_sql_queries.sql) that contains (commented) useful sql queries which you can and should use when troubleshooting the DW Jobs.
Good hunting!
Update Rollup 5 is here: http://support.microsoft.com/kb/3009517
Beside other cool fixes, this UR includes also some cool changes for the Data Warehouse component. These will make it much easier to troubleshoot possible issues that may appear.
This is part of our effort to improve the Data Warehouse and the Service Manager product in general. I hope you will like it and await more good news in the future
INFOS:
Note The System Center 2012 R2 Service Manager Update Rollups are cumulative. Therefore, this update rollup contains new fixes for the following issues along with fixes that were included in System Center 2012 R2 Service Manager Update Rollup 4, Update Rollup 3, and Update Rollup 2.
Logging the batch start and completion events for all DW job categories in event log. The start and completion event will include the following:
So, here's a cool ebook on OpsMgr for you!
Check out the description – the contents are pretty interesting as we are explaining a lot of inside knowledge about deployment, optimizations, troubleshooting and internals of the product
Enjoy!
Hi everyone! It’s been a while since my last post. I’ve been pretty busy with migrating to the cloud, hehe
I do have some posts however, which I wanted to write and never got the chance to yet. So, let’s continue with the articles – this is one of them about troubleshooting LFX Connectors in Service Manager.
Enjoy!
First, try to concentrate only on specific stuff (if there is more than 1 issue, we need to take them one after the other):
You can also have a look at this article which explains a little bit about how LFX Connectors work and check out if BatchSize(es) for the Data that is missing is high enough in relation to how much data is getting imported:
The different types of data import template definitions for the SCCM 2012+ Connector are defined in the Microsoft.EnterpriseManagement.ServiceManager.Connector.Sms2011 MP and you can export this MP using this query to understand the definitions:
select MPName, convert(xml, MPXML)
from ManagementPack
where MPName = 'Microsoft.EnterpriseManagement.ServiceManager.Connector.Sms2011'
All of this is stored in the LFX.DataTable table under a specific DataName. Here we have the LFX Staging Table and View information, together with the Queries used to get the data from the direct source (here the SCCM site database) and afterwards, from the LFX Staging Tables, to import it into the SM (CMDB) Instance Space (tables) => BME/TME, MT_ClassName and Relationship tables.
Here is a Query to have a general overview of the information needed (for now, for all the DataTables of the SCCM 2012+ Connector):
select *
from LFX.DataTable
where DataName like '%CMv5_%'
Notice how some DataNames have a “Cached_” prefix in their name. This is because:
This is important because if you look at the QueryString field, you can see the SQL Query that will be executed to get that data.
Let’s search for the type of data we are interested in (LastInventoryDate) directly in the XML definition of the MP and we find it here:
<Object Path="$Context/Path[Relationship='LFX!System.LinkingFramework.ConnectorEmbedsTables' SeedRole='Source' TypeConstraint='LFX!System.LinkingFramework.DataTable']$">
<Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/DataName$">Cached_CMv5_LogicalComputers</Property>
<Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/WatermarkField$">E.Lfx_Timestamp</Property>
<Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/WatermarkType$">0</Property>
<Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/BatchIdField$">E.Lfx_RowId</Property>
<Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/BatchIdType$">0</Property>
<Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/BatchIdSize$">500</Property>
<Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/UseCache$">false</Property>
<Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/QueryString$">
SELECT E.Lfx_RowId,
E.Lfx_SourceID,
E.Lfx_Status,
E.DisplayName,
E.PrincipalName,
E.NetbiosComputerName,
E.NetbiosDomainName,
E.OffsetInMinuteFromGreenwichTime,
E.IsVirtualMachine,
E.LastInventoryDate,
E.ActiveDirectorySite
from [LFXSTG].v_Cached_CMv5_LogicalComputers E
</Property>
<Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/CollectionName$">Cached_CMv5_LogicalComputers</Property>
<Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/DependOnDataTable$">Cached_CMv5_PhysicalComputers</Property>
</Object>
We can figure out from here, that the LastInventoryDate value is coming from the v_Cached_CMv5_LogicalComputers View (notice the v_ at the beginning of the name).
What we can also figure out already from the definition above, is that this depends on another view called Cached_CMv5_PhysicalComputers (just add a “v_” as prefix if you want to query it – and don’t forget that it belongs to the LFXSTG schema).
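Searching the exported MP XML by hand works, but it can also be scripted. Here is a hedged Python sketch that lists the DataTable definitions whose QueryString mentions a given field, together with the table they depend on. The function name, the Objects wrapper element and the shortened sample are made up for the illustration; a real exported MP has more structure around these Object elements, and the sketch assumes the elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Common prefix of the Property paths seen in the definition above
PREFIX = "$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/"

# Shortened, hand-made sample in the same shape as the definition above
SAMPLE = f"""
<Objects>
  <Object Path="sample">
    <Property Path="{PREFIX}DataName$">Cached_CMv5_LogicalComputers</Property>
    <Property Path="{PREFIX}QueryString$">
      SELECT E.Lfx_RowId, E.LastInventoryDate
      from [LFXSTG].v_Cached_CMv5_LogicalComputers E
    </Property>
    <Property Path="{PREFIX}DependOnDataTable$">Cached_CMv5_PhysicalComputers</Property>
  </Object>
</Objects>
"""


def data_tables_mentioning(xml_text, field_name):
    """Return (DataName, DependOnDataTable) for each Object whose QueryString contains field_name."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for obj in root.iter("Object"):
        props = {}
        for prop in obj.findall("Property"):
            # The property name is the last path segment, without the trailing "$"
            name = prop.get("Path", "").rstrip("$").rsplit("/", 1)[-1]
            props[name] = (prop.text or "").strip()
        if field_name in props.get("QueryString", ""):
            hits.append((props.get("DataName"), props.get("DependOnDataTable")))
    return hits


found_tables = data_tables_mentioning(SAMPLE, "LastInventoryDate")
```

For the sample, this points at Cached_CMv5_LogicalComputers and its dependency on Cached_CMv5_PhysicalComputers, which is the same conclusion reached by reading the definition manually.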
So, let’s have a look from SQL Management Studio at the definition of this view v_Cached_CMv5_LogicalComputers:
CREATE VIEW [LFXSTG].[v_Cached_CMv5_LogicalComputers] AS
SELECT S.Lfx_RowId,
S.Lfx_SourceID,
S.Lfx_Timestamp,
CCX.Lfx_Status,
CCX.Name0 AS 'DisplayName',
COALESCE(CCX.Name0, S.Netbios_Name0)
+ '.' + COALESCE(CCX.Domain0, S.Resource_Domain_OR_Workgr0) AS 'PrincipalName',
S.Netbios_Name0 AS 'NetbiosComputerName',
S.Resource_Domain_OR_Workgr0 AS 'NetbiosDomainName',
CCX.CurrentTimeZone0 AS 'OffsetInMinuteFromGreenwichTime',
CCX.IsVirtualMachine,
W.LastHWScan AS 'LastInventoryDate',
S.AD_Site_Name0 AS 'ActiveDirectorySite'
FROM LFXSTG.CMv5_SYSTEM S
CROSS APPLY
LFXSTG.fn_CheckCMv5CachedComputers(S.Lfx_SourceID, S.ResourceID, S.SMS_Unique_Identifier0, S.Lfx_Status) AS CCX
LEFT JOIN LFXSTG.CMv5_WORKSTATION_STATUS W
ON S.ResourceID = W.ResourceID AND S.Lfx_SourceId = W.Lfx_SourceId
WHERE S.Netbios_Name0 IS NOT NULL
AND S.Resource_Domain_OR_Workgr0 IS NOT NULL
AND CCX.Lfx_Status != 'I'
So now we have figured out that we get the LastInventoryDate value from the LastHWScan field of the LFXSTG.CMv5_WORKSTATION_STATUS table.
This also tells us, that this is the Data Consumer part (notice the “Cached_” prefix in the DataName). So how are we getting this data in the table?
The table name is CMv5_WORKSTATION_STATUS and so we can look for the QueryString in the LFX.DataTable by its name:
select *
from LFX.DataTable
where DataTableName = 'CMv5_WORKSTATION_STATUS'
So here we are, this is the query we execute directly on the SCCM site database to get the data:
SELECT
S.ChangeAction as Lfx_Status,
S.ResourceID,
S.BatchingKey,
S.GroupKey,
S.TimeStamp,
S.LastHWScan,
S.SystemDefaultLCID,
S.TimezoneOffset,
S.LastReportVersion
FROM SCCM_Ext.vex_GS_WORKSTATION_STATUS S
INNER JOIN SCCM_Ext.vex_FullCollectionMembership CM
ON S.ResourceID = CM.ResourceID
INNER JOIN SCCM_Ext.vex_Collection C
ON C.CollectionID = CM.CollectionID
WHERE S.LastHWScan IS NOT NULL
AND C.ChangeAction = 'U' and CM.ChangeAction = 'U'
AND $COLLECTIONLIST
ORDER BY S.rowversion
We can see that we get the LastHWScan value from the SCCM_Ext.vex_GS_WORKSTATION_STATUS table/view in the SCCM site database.
While the affected SCCM Connector (the one responsible for the chosen Computer) runs until it finishes, collect the following data:
In the SQL Profiler Trace of the SCCM Site database, look for the query that we have identified as getting the data and, ideally, run it manually to see what data the SCCM Site database returns.
After this, look in the SQL Profiler Trace of the ServiceManager database for the other query we identified, the one that gets the data from the LFXSTG tables/views (as explained above).
Identify the ResourceId of the Computer you are looking for; you can then check whether it is being synchronized, and by which Connectors, using this query (on the ServiceManager database):
select
wss.Lfx_RowId,
wss.Lfx_Timestamp,
ds.DisplayName,
wss.Lfx_Status,
wss.ResourceID,
wss.BatchingKey,
wss.GroupKey,
wss.TimeStamp,
wss.LastHWScan,
wss.SystemDefaultLCID,
wss.LastReportVersion
from LFXSTG.CMv5_WORKSTATION_STATUS wss
join LFX.DataSource ds
on wss.Lfx_SourceId = ds.DataSourceId
where wss.ResourceId in (RES_ID_1, RES_ID_2, RES_ID_3, RES_ID_ETC)
Another thing worth knowing (as you might have noticed from the SQL Profiler queries and the Batch/WaterMark fields in LFX.DataTable) is that we import data in batches, and that we use a WaterMark field (defined in LFX.DataTable) to keep track of where we left off on the previous run, so that the next run only gets new data.
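The batching and WaterMark behaviour described above can be sketched roughly like this. It is only a minimal Python illustration of the idea, not the connector's actual implementation: the `fetch_batch` and `sync` functions, the sample rows and the batch size are all assumptions made up for the example.

```python
# Illustrative sketch only: function names, sample rows and batch size are
# assumptions for this example, not the connector's real code.
SOURCE_ROWS = [
    {"rowversion": 1, "ResourceID": 101},
    {"rowversion": 2, "ResourceID": 102},
    {"rowversion": 3, "ResourceID": 103},
    {"rowversion": 4, "ResourceID": 104},
    {"rowversion": 5, "ResourceID": 105},
]


def fetch_batch(rows, watermark, batch_size):
    """Return up to batch_size rows newer than the watermark, oldest first."""
    newer = sorted(
        (row for row in rows if row["rowversion"] > watermark),
        key=lambda row: row["rowversion"],
    )
    return newer[:batch_size]


def sync(rows, watermark, batch_size=2):
    """Import everything newer than the watermark, one batch at a time.

    Returns the imported rows and the new watermark to persist for the next run.
    """
    imported = []
    while True:
        batch = fetch_batch(rows, watermark, batch_size)
        if not batch:
            break
        imported.extend(batch)
        # Remember the highest value seen so the next batch/run starts after it.
        watermark = batch[-1]["rowversion"]
    return imported, watermark


first_run, saved_watermark = sync(SOURCE_ROWS, watermark=0)
second_run, _ = sync(SOURCE_ROWS, watermark=saved_watermark)
after_reset, _ = sync(SOURCE_ROWS, watermark=0)
```

In this sketch the second run imports nothing because no row is newer than the saved watermark, while starting again from 0 re-imports every row, which is the effect a WaterMark reset is meant to have.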
We are keeping track of the WaterMark for each DataName in the LFX.ClientWorkTable table:
select
cwt.ClientWorkTableId as [WorkTableId],
cwt.Watermark as [WaterMark],
cwt.LastSyncTime as [LastSyncTime],
ds.DataSourceName as [ConnectorName],
ds.DisplayName as [ConnectorDisplayName],
dt.DataName as [DataName],
dt.DataTableName as [DataTableName],
(
case
when dt.WatermarkType = -1 then 'None'
when dt.WatermarkType = 0 then 'DateTime'
when dt.WatermarkType = 1 then 'Timestamp'
when dt.WatermarkType = 2 then 'Number'
when dt.WatermarkType = 3 then 'Number'
end
) as [WaterMarkType]
from LFX.ClientWorkTable as cwt
inner join LFX.DataSource as ds
on ds.DataSourceId = cwt.DataSourceId
inner join LFX.DataTable as dt
on dt.DataTableId = cwt.DataTableId
For example, to reset the WaterMark for the CMv5_WORKSTATION_STATUS DataName, so that the SCCM Connector tries to resynchronize all the data of this type available in SCCM, we can run this query:
update cwt
set cwt.Watermark = 0x00000000
from LFX.ClientWorkTable as cwt
inner join LFX.DataSource as ds
on ds.DataSourceId = cwt.DataSourceId
inner join LFX.DataTable as dt
on dt.DataTableId = cwt.DataTableId
where
ds.DisplayName = 'REPLACE_WITH_CONNECTOR_DISPLAYNAME_STRING_HERE' and
dt.DataName = 'CMv5_WORKSTATION_STATUS'
Ever authored a Management Pack and, after it had been in production for quite some time, figured out that you need to make a breaking change, one that requires you to remove the MP first because it cannot be upgraded? This is the case, for example, if you want to delete a Property of a Class.
But wait … if we remove the MP, doesn’t that mean that all the data in the database (instances & relationships) related to this MP, will be deleted? … Yup, unfortunately it does …
So, here is a PowerShell script that can be executed on a SM Management Server, which gets the (internal) name of the MP you need to remove and which will export (in memory) all the data (instances & relationships) related to the MP. It will then stop and wait for your key-press to continue.
Once it is waiting, you can delete the MP and re-import the updated version of the MP that has the change you made.
At this point, you can press any key in the PowerShell window where the script is waiting and it will start to create all instances & relationships that existed prior to removing the MP.
NOTE: Please be aware that I have tested only SOME scenarios. It might be that this will either fail, or not be able to re-create everything properly in who-knows-what-corner-case-scenarios. Please try to use and test this in pre-production if possible, and ALWAYS take a FRESH Full Backup of the ServiceManager database before attempting this in production.
$mpName = "MPClassesHostingTest" # internal MP Name goes here
$message = "Perform the needed actions here (remove MP, re-import, etc.). After finishing the actions, press any key to continue."
$props = Get-ItemProperty "HKLM:\SOFTWARE\Microsoft\System Center\2010\Common\Setup"
$instdir = $props.InstallDirectory
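Add-Type -Path "$instdir`SDK Binaries\Microsoft.EnterpriseManagement.Core.dll"
# NOTE (assumption): $emg and $hostingRel are used further below but are not defined anywhere in this script.
# The two lines below are a best guess at the missing definitions; verify and adjust them for your environment.
$emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup "localhost"
$hostingRel = Get-SCRelationship -Name "System.Hosting"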
$dataMap = @{}
$mp = Get-SCManagementPack -Name $mpName
$classList = Get-SCClass -ManagementPack $mp | ? { $_.Singleton -eq $false -and $_.Abstract -eq $false }
foreach($class in $classList) {
$instanceList = Get-SCClassInstance -Class $class
foreach($instance in $instanceList) {
$object = @{
instance = $instance;
rels = @{
source = Get-SCRelationshipInstance -SourceInstance $instance | ? { $_.IsDeleted -eq $false -and $_.Name -ne "System.Hosting" };
target = Get-SCRelationshipInstance -TargetInstance $instance | ? { $_.IsDeleted -eq $false -and $_.Name -ne "System.Hosting" };
};
}
$dataMap.Add($instance.EnterpriseManagementObject.Id, $object)
}
}
if($psISE) {
Add-Type -AssemblyName System.Windows.Forms
[System.Windows.Forms.MessageBox]::Show("$message")
} else {
Write-Host "$message" -ForegroundColor Yellow
$x = $Host.UI.RawUI.ReadKey("NoEcho,IncludeKeyDown")
}
# Create all objects first so that we are sure we can use them for relationships afterwards
# These need to be all there created first in order (non-Hosted first & Hosted after)
foreach($key in $($dataMap.Keys)) {
$entry = $dataMap.Item($key)
$class = Get-SCClass -Name $entry.instance.EnterpriseManagementObject.GetClasses().Name
$props = @{}
if($class.Hosted -eq $true) {
# need to handle Hosted classes differently (cannot use New-SCClassInstance cmdlet for various corner-cases)
# handling with CreatableEnterpriseManagementObject instead (need to also create parent-Hosting class together with it)
$hostingClass = $class.GetParentClasses([Microsoft.EnterpriseManagement.Configuration.DerivedClassTraversalDepth]::Recursive, $hostingRel, [Microsoft.EnterpriseManagement.Common.TraversalDepth]::OneLevel).Item(0)
$hostingClassKey = $hostingClass.GetKeyProperties().Item(0)
$object = New-Object -TypeName Microsoft.EnterpriseManagement.Common.CreatableEnterpriseManagementObject($emg, $hostingClass)
$object.Item($hostingClass, $hostingClassKey.Name).Value = $entry.instance.$($hostingClassKey.Name)
$object.Commit()
$object = New-Object -TypeName Microsoft.EnterpriseManagement.Common.CreatableEnterpriseManagementObject($emg, $class)
$object.Item($hostingClass, $hostingClassKey.Name).Value = $entry.instance.$($hostingClassKey.Name)
foreach($property in $class.GetProperties([Microsoft.EnterpriseManagement.Configuration.BaseClassTraversalDepth]::Recursive)) {
if($entry.instance.$($property.Name) -ne $null) {
$object.Item($class, $property.Name).Value = $entry.instance.$($property.Name)
}
}
$object.Commit()
$entry.instance = $object
} elseif((Get-SCClassInstance -Id $entry.instance.EnterpriseManagementObject.Id) -eq $null) {
foreach($property in $class.GetProperties([Microsoft.EnterpriseManagement.Configuration.BaseClassTraversalDepth]::Recursive)) {
if($entry.instance.$($property.Name) -ne $null) {
$props.Add($property.Name, $entry.instance.$($property.Name))
}
}
$entry.instance = New-SCClassInstance -Class $class -Property $props -PassThru
}
$dataMap.Set_Item($key, $entry)
}
# Now that we know we have all objects created, we can safely create the Relationships
foreach($key in $dataMap.Keys) {
$entry = $dataMap.Item($key)
foreach($rel in $entry.rels.Item("source")) {
$relType = Get-SCRelationship -Id $rel.RelationshipId
$target = Get-SCClassInstance -Id $rel.TargetObject.Id
if($target -ne $null) {
New-SCRelationshipInstance -RelationshipClass $relType -Source $entry.instance -Target $target
}
}
foreach($rel in $entry.rels.Item("target")) {
$relType = Get-SCRelationship -Id $rel.RelationshipId
$source = Get-SCClassInstance -Id $rel.SourceObject.Id
if($source -ne $null) {
New-SCRelationshipInstance -RelationshipClass $relType -Source $source -Target $entry.instance
}
}
}
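The overall shape of the script above (snapshot everything in memory, re-create all objects first, then re-create the relationships and skip any whose other end no longer exists) can be sketched in a few lines. This is only a Python illustration of that two-phase ordering using plain dictionaries; the `restore` function and the data layout are assumptions for the example and have nothing to do with the Service Manager SDK.

```python
# Illustrative sketch only: plain dictionaries stand in for the Service
# Manager data; the layout and names are assumptions for this example.
def restore(snapshot, store=None):
    """Re-create objects first, then relationships between existing objects.

    snapshot: {object_id: {"props": {...},
                           "rels": [(rel_type, source_id, target_id), ...]}}
    store:    {"objects": {...}, "relationships": set()}
    """
    if store is None:
        store = {"objects": {}, "relationships": set()}

    # Phase 1: make sure every object exists before touching relationships.
    for object_id, entry in snapshot.items():
        store["objects"].setdefault(object_id, dict(entry["props"]))

    # Phase 2: only link objects that both exist; a relationship recorded
    # from both of its ends is stored once because a set is used.
    for entry in snapshot.values():
        for rel_type, source_id, target_id in entry["rels"]:
            if source_id in store["objects"] and target_id in store["objects"]:
                store["relationships"].add((rel_type, source_id, target_id))
    return store


snapshot = {
    "a": {"props": {"Name": "A"},
          "rels": [("relates", "a", "b"), ("relates", "a", "gone")]},
    "b": {"props": {"Name": "B"},
          "rels": [("relates", "a", "b")]},
}
restored = restore(snapshot)
```

Doing the objects in a first pass is what makes the second pass safe: by the time a relationship is created, both ends are known to either exist or be missing.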
The entire solution (source code) is available here: https://github.com/SubZer0MS/TransformWorkItemTasks
I have been asked to create a solution for Service Manager 2012 R2 that contains Console Tasks that allow users to transform an Incident into a different WorkItem type. In this example, we are going to have 4 Console Tasks, one for each of the other 4 WorkItem types besides Incident (Service Request, Change Request, Release Record, Problem). This way, we can assign permissions for only some of them, or only for the one needed by a specific role (for example a custom role derived from the Service Request Analyst role).
This is a pretty nifty example that shows how such a thing can be done. It can be used directly, or serve as a base solution that can be changed or extended as needed.
This solution will do the following (based on the Task being executed) – example for Service Request (“Transform to SR” Task):
As for the permissions needed, the Tasks either need to be run by users of the Advanced Operators role, or permissions can be configured granularly, in which case the following are needed:
The Management Pack Bundle that contains the solution and can be directly imported into Service Manager 2012 R2 is here: TransformWorkItemTasks.mpb (zip)
Let’s take a little tour on this with an example of one of the Tasks (let’s continue with the “Transform to SR” Task as I have been using it as example up to this point anyway).
All the needed references are already included in the solution folders, so no need to copy/add any DLLs or MPs that are referenced. It is however, a good thing to look at, in order to see what references are being used.
This is how a new Console Task should be declared (under <Presentation> node, under <ConsoleTasks>):
<ConsoleTask ID="TransformIncidentToServiceRequest" Accessibility="Public" Enabled="true" Target="Incident!System.WorkItem.Incident" RequireOutput="false">
<Assembly>Console!SdkDataAccessAssembly</Assembly>
<Handler>Microsoft.EnterpriseManagement.UI.SdkDataAccess.ConsoleTaskHandler</Handler>
<Parameters>
<Argument Name="Assembly">CustomTransformWorkItemTasks</Argument>
<Argument Name="Type">CustomTransformWorkItemTasks.TransformTaskHandler</Argument>
<Argument>Service</Argument>
</Parameters>
</ConsoleTask>
By setting the Target to System.WorkItem.Incident, we tell SM to display this Task in views where we are displaying Incidents (when selecting an Incident).
We are declaring that the Task Handler will be the default Microsoft.EnterpriseManagement.UI.SdkDataAccess.ConsoleTaskHandler, and we are passing 3 arguments to this Method:
In the DLL we are developing for this, we are defining our Handler like this:
namespace CustomTransformWorkItemTasks
{
public class TransformTaskHandler : ConsoleCommand
{
The built-in ExecuteCommand method of the ConsoleCommand class we inherit from is the one that gets called on initialization, so we have to override it in our class and handle there the arguments we pass from the MP (in this example, the single argument with the value Service):
public override void ExecuteCommand(IList<NavigationModelNodeBase> nodes, NavigationModelNodeTask task, ICollection<string> parameters)
{
In this method, we decide what we do based on what argument we have sent from the Task (in the MP) – in this case we only pass the argument Service:
if (parameters.Contains("Service"))
{
// do stuff here - the actual work you need to be done in your task when this argument is passed by the Task
// in this case, I am setting some variables that decide what I will create later in the actual processing based on the WorkItem type (check the entire source code on GitHub)
}
This is the part that does the actual magic. Notice the foreach block: here we go through all the Incidents that were selected when running the Task in the Console, by enumerating the nodes list passed as argument to the ExecuteCommand method:
try
{
ManagementPack workItemMp = emg.GetManagementPack(workItemMpName, Constants.mpKeyTocken, Constants.mpSMR2Version);
ManagementPack mpSettings = emg.GetManagementPack(workItemSettingMpName, Constants.mpKeyTocken, Constants.mpSMR2Version);
ManagementPack knowledgeLibraryMp = emg.GetManagementPack(ManagementPacks.knowledgeLibrary, Constants.mpKeyTocken, Constants.mpSMR2Version);
ManagementPackClass workItemClass = emg.EntityTypes.GetClass(workItemClassName, workItemMp);
ManagementPackClass workItemClassSetting = emg.EntityTypes.GetClass(workItemSettingClassName, mpSettings);
EnterpriseManagementObject generalSetting = emg.EntityObjects.GetObject<EnterpriseManagementObject>(workItemClassSetting.Id, ObjectQueryOptions.Default);
foreach (NavigationModelNodeBase node in nodes)
{
IList<Guid> bmeIdsList = new List<Guid>();
bmeIdsList.Add(new Guid(node[Constants.nodePropertyId].ToString()));
ObjectProjectionCriteria incidentObjectProjection = new ObjectProjectionCriteria(incidentProjection);
ObjectQueryOptions queryOptions = new ObjectQueryOptions(ObjectPropertyRetrievalBehavior.All);
queryOptions.ObjectRetrievalMode = ObjectRetrievalOptions.Buffered;
IObjectProjectionReader<EnterpriseManagementObject> incidentReader = emg.EntityObjects.GetObjectProjectionReader<EnterpriseManagementObject>(incidentObjectProjection, queryOptions);
incidentReader.PageSize = 1;
EnterpriseManagementObjectProjection incident = incidentReader.GetData(bmeIdsList).FirstOrDefault();
EnterpriseManagementObjectProjection workItem = new EnterpriseManagementObjectProjection(emg, workItemClass);
if (!string.IsNullOrEmpty(workItemTemplateName))
{
ManagementPackObjectTemplateCriteria templateCriteria = new ManagementPackObjectTemplateCriteria(string.Format("Name = '{0}'", workItemTemplateName));
ManagementPackObjectTemplate template = emg.Templates.GetObjectTemplates(templateCriteria).FirstOrDefault();
if(template != null)
{
workItem.ApplyTemplate(template);
}
}
workItem.Object[workItemClass, WorkItemProperties.Id].Value = generalSetting[workItemClassSetting, workItemSettingPrefixName] + Constants.workItemPrefixPattern;
workItem.Object[workItemClass, WorkItemProperties.Title].Value = string.Format("{0} ({1})", incident.Object[incidentClass, WorkItemProperties.Title].Value, incident.Object[incidentClass, WorkItemProperties.Id].Value);
workItem.Object[workItemClass, WorkItemProperties.Description].Value = incident.Object[incidentClass, WorkItemProperties.Description].Value;
ManagementPackRelationship workItemToWorkItemRelationshipClass = emg.EntityTypes.GetRelationshipClass(RelationshipTypes.workItemRelatesToWorkItem, wiLibraryMp);
workItem.Add(incident.Object, workItemToWorkItemRelationshipClass.Target);
CreatableEnterpriseManagementObject analystComment = new CreatableEnterpriseManagementObject(emg, analystCommentClass);
analystComment[analystCommentClass, AnalystCommentProperties.Id].Value = Guid.NewGuid().ToString();
analystComment[analystCommentClass, AnalystCommentProperties.Comment].Value = string.Format(Constants.incidentClosedComment, workItemClass.Name, workItem.Object.Id.ToString());
analystComment[analystCommentClass, AnalystCommentProperties.EnteredBy].Value = EnterpriseManagementGroup.CurrentUserName;
analystComment[analystCommentClass, AnalystCommentProperties.EnteredDate].Value = DateTime.Now.ToUniversalTime();
incident.Object[incidentClass, IncidentProperties.Status].Value = incidentClosedStatus.Id;
incident.Object[incidentClass, IncidentProperties.ClosedDate].Value = DateTime.Now.ToUniversalTime();
ManagementPackRelationship incidentHasAnalystCommentRelationshipClass = emg.EntityTypes.GetRelationshipClass(RelationshipTypes.workItemHasAnalystComment, wiLibraryMp);
incident.Add(analystComment, incidentHasAnalystCommentRelationshipClass.Target);
IList<ManagementPackRelationship> relationshipsToAddList = new List<ManagementPackRelationship>()
{
workItemToWorkItemRelationshipClass,
emg.EntityTypes.GetRelationshipClass(RelationshipTypes.workItemHasCommentLog, wiLibraryMp),
emg.EntityTypes.GetRelationshipClass(RelationshipTypes.createdByUser, wiLibraryMp),
emg.EntityTypes.GetRelationshipClass(RelationshipTypes.affectedUser, wiLibraryMp),
emg.EntityTypes.GetRelationshipClass(RelationshipTypes.assignedToUser, wiLibraryMp),
emg.EntityTypes.GetRelationshipClass(RelationshipTypes.workItemHasAttachment, wiLibraryMp),
emg.EntityTypes.GetRelationshipClass(RelationshipTypes.workItemAboutConfigItem, wiLibraryMp),
emg.EntityTypes.GetRelationshipClass(RelationshipTypes.workItemRelatesToConfigItem, wiLibraryMp),
emg.EntityTypes.GetRelationshipClass(RelationshipTypes.entityToArticle, knowledgeLibraryMp)
};
foreach (ManagementPackRelationship relationship in relationshipsToAddList)
{
if (incident[relationship.Target].Any())
{
foreach (IComposableProjection itemProjection in incident[relationship.Target])
{
workItem.Add(itemProjection.Object, relationship.Target);
itemProjection.Remove();
}
}
if(incident[relationship.Source].Any())
{
foreach (IComposableProjection itemProjection in incident[relationship.Source])
{
workItem.Add(itemProjection.Object, relationship.Source);
itemProjection.Remove();
}
}
}
incident.Overwrite();
try
{
workItem.Overwrite();
}
catch (Exception ex)
{
ManagementPackEnumerationCriteria incidentActiveEnumCriteria = new ManagementPackEnumerationCriteria(string.Format("Name = '{0}'", EnumTypes.incidentStatusActive));
ManagementPackEnumeration incidentActiveStatus = emg.EntityTypes.GetEnumerations(incidentActiveEnumCriteria).FirstOrDefault();
incident.Object[incidentClass, IncidentProperties.Status].Value = incidentActiveStatus.Id;
incident.Object[incidentClass, IncidentProperties.ClosedDate].Value = null;
incident.Overwrite();
throw; // rethrow as-is so the original stack trace is preserved
}
}
RequestViewRefresh();
}
catch(Exception ex)
{
MessageBox.Show(string.Format("Error: {0}: {1}\n\n{2}", ex.GetType().ToString(), ex.Message, ex.StackTrace));
}
Have fun coding!
Got a request the other day to have a look at a full memory dump of a SCCM server that was hanging due to physical memory pressure (has 32GB RAM installed).
As is usually the case in these situations, the issue was not caused by any SCCM component, but rather by something else.
It’s a good opportunity though to talk a little bit about the Windows Kernel Cache Manager. Everything will be done with Public Symbols as usual, and with heavy use of the MEX Debugger Extension (showing its power as well).
Note: Even though I am using Public Symbols, I am also using the WinDe dbg extension, which is available to some customers under NDA. Some MEX commands will not work without it because they rely on the WinDe dbg extension. One of these commands is !mex.mem. I will use it for simplicity’s sake, as the output is structured and easy to read. If these extensions are not available, you will need to use !kdexts.memusage or, even better, the !kdexts.vm command. I will use WinDe/MEX because the output is much nicer to look at :p But you can achieve the same with the built-in commands mentioned.
First of all, let’s start by checking the memory status of the server – we can do this by running !mex.mem
5: kd> !mex.mem
Page File: \??\C:\pagefile.sys
Current: 10794940 Kb Free Space: 6592416 Kb ( 10.29 GB)
Minimum: 10794940 Kb Maximum: 10794940 Kb ( 10.29 GB)
Physical Memory: 8385237 ( 33540948 Kb) ( 31.99 GB)
Available Pages: 27177 ( 108708 Kb) ( 106.16 MB)
ResAvail Pages: 8193973 ( 32775892 Kb) ( 31.26 GB)
Locked IO Pages: 0 ( 0 Kb) ( 0)
Free System PTEs: 33514422 ( 134057688 Kb) ( 127.85 GB)
Modified Pages: 4073135 ( 16292540 Kb) ( 15.54 GB)
Modified PF Pages: 3 ( 12 Kb) ( 12.00 KB)
Modified No Write Pages: 0 ( 0 Kb) ( 0)
NonPagedPool 0: 8326 ( 33304 Kb) ( 32.52 MB)
NonPagedPool 1: 6844 ( 27376 Kb) ( 26.73 MB)
NonPagedPool 2: 0 ( 0 Kb) ( 0)
NonPagedPool 3: 0 ( 0 Kb) ( 0)
NonPagedPool Usage: 36694 ( 146776 Kb) ( 143.34 MB) Current pool size: 981.94 MB
NonPagedPool Max: 6263793 ( 25055172 Kb) ( 23.89 GB)
PagedPool 0: 107157 ( 428628 Kb) ( 418.58 MB)
PagedPool 1: 12274 ( 49096 Kb) ( 47.95 MB)
PagedPool 2: 6081 ( 24324 Kb) ( 23.75 MB)
PagedPool 3: 0 ( 0 Kb) ( 0)
PagedPool 4: 38 ( 152 Kb) ( 152.00 KB)
PagedPool Usage: 125550 ( 502200 Kb) ( 490.43 MB) WorkingSet: 286.58 MB, PeakWorkingSet: 517.27 MB
PagedPool Maximum: 33554432 ( 134217728 Kb) ( 128.00 GB)
Processor Commit: 3751 ( 15004 Kb) ( 14.65 MB)
Session Commit: 12901 ( 51604 Kb) ( 50.39 MB)
Syspart SharedCommit 0
Shared Commit: 17255 ( 69020 Kb) ( 67.40 MB)
Special Pool: 0 ( 0 Kb) ( 0)
Kernel Stacks: 13455 ( 53820 Kb) ( 52.56 MB)
Pages For MDLs: 1859 ( 7436 Kb) ( 7.26 MB)
Pages For AWE: 0 ( 0 Kb) ( 0)
NonPagedPool Commit: 0 ( 0 Kb) ( 0)
PagedPool Commit: 125678 ( 502712 Kb) ( 490.93 MB)
Driver Commit: 4240 ( 16960 Kb) ( 16.56 MB)
Boot Commit: 0 ( 0 Kb) ( 0)
System PageTables: 0 ( 0 Kb) ( 0)
VAD/PageTable Bitmaps: 3882 ( 15528 Kb) ( 15.16 MB)
ProcessLockedFilePages: 0 ( 0 Kb) ( 0)
Pagefile Hash Pages: 0 ( 0 Kb) ( 0)
Sum System Commit: 183021 ( 732084 Kb) ( 714.93 MB)
Total Private: 5012200 ( 20048800 Kb) ( 19.12 GB)
Misc/Transient Commit: 170655 ( 682620 Kb) ( 666.62 MB)
Committed pages: 5365876 ( 21463504 Kb) ( 20.47 GB)
Commit limit: 11083507 ( 44334028 Kb) ( 42.28 GB)
*** Low Available Pages (Standby+Zeroed+Free). This can cause performance issues. ***
Virtual Memory Physical Memory File Cache Cache Writes
Notice these 2 interesting things from the output:
What is the MiModifiedPageWriter (Modified Page Writer) thread (running in System context) doing – is it blocked by anything?
To switch to the System process, we need to find its address – we can do this by using !mex.tl and passing it the name of the process we are interested in:
5: kd> !mex.tl system
PID Address Name
=========== ================ =========================================
0x4 0n4 fffffa8019a6b040 System
0x1b8 0n440 fffffa8028f02060 svchost.exe(LocalSystemNetworkRestricted)
=========== ================ =========================================
PID Address Name
Warning! Zombie process(es) detected (not displayed). Count: 2 [zombie report]
Ok, got it (fffffa8019a6b040) – let’s switch to it by running !mex.p and passing it this address:
5: kd> !mex.p fffffa8019a6b040
Name Address Ses PID User Name Create Time Up Time Mods Handle Act Thrd Z Thrd Parent
====== ================ ==== ======= =============== ========================== ============ ==== ====== ======== ====== ==============
System fffffa8019a6b040 none 4 (0n4) DOMAIN\COMPUTERNAME$ 06/26/2017 12:11:57.995 PM 2h:03:22.004 0 1176 204 0 Idle 0 (0n0)
Memory Details:
VM Peak Work Set Commit Size
======== ======== ======== ===========
23.18 MB 27.99 MB 18.49 MB 556 KB
CPU Details:
User Kernel Total
==== ========== ==========
0s 24m:33.696 24m:33.696
Show Threads: Unique Stacks !mex.listthreads (!lt) fffffa8019a6b040 !winde.lp fffffa8019a6b040 !process fffffa8019a6b040 7
Right! We are in the context of the System process, let’s find the Modified Page Writer thread – it should have this function call on the stack “nt!MiModifiedPageWriter“.
Being in the context of the System process we can use MEX again to search in all available threads for a thread stack that matches what we are looking for – we can use !mex.us and pass it the string we are looking for “nt!MiModifiedPageWriter“:
5: kd> !mex.us nt!MiModifiedPageWriter
1 thread [stats]: fffffa8019ab43c0
fffff80001eda6aa nt!KiSwapContext+0x7a
fffff80001edd142 nt!KiCommitThreadWait+0x1d2
fffff80001e9c03b nt!KeWaitForGate+0xfb
fffff80001e73c3a nt!MiModifiedPageWriter+0x5a
fffff80002170f12 nt!PspSystemThreadStartup+0x5a
fffff80001ec9de6 nt!KiStartSystemThread+0x16
Threads matching filter: 1 out of 204
Cool, found it! Now we want to see more information about what this thread is doing, its state, etc. We can do this with !mex.t, passing it the thread address (fffffa8019ab43c0):
5: kd> !mex.t fffffa8019ab43c0
Process Thread CID UserTime KernelTime ContextSwitches Wait Reason Time
System (fffffa8019a6b040) fffffa8019ab43c0 4.ac 0s 1s.154 3135 WrFreePage 3s.650
WaitBlockList:
Object Name Type Other Waiters
fffff80002092e00 nt!MmModifiedPageWriterGate+0x0 Gate 0
Priority:
Current Base UB FB IO Page
17 8 0 0 2 5
# Child-SP Return Call Site
0 fffff880029fdac0 fffff80001edd142 nt!KiSwapContext+0x7a
1 fffff880029fdc00 fffff80001e9c03b nt!KiCommitThreadWait+0x1d2
2 fffff880029fdc90 fffff80001e73c3a nt!KeWaitForGate+0xfb
3 fffff880029fdce0 fffff80002170f12 nt!MiModifiedPageWriter+0x5a
4 fffff880029fdd40 fffff80001ec9de6 nt!PspSystemThreadStartup+0x5a
5 fffff880029fdd80 0000000000000000 nt!KiStartSystemThread+0x16
It’s not blocked by anything, it is just waiting on nt!MmModifiedPageWriterGate, which is a Gate object (nt!_KGATE). In short, the thread waits for this object to be signaled before it starts doing work. We can have a look at the Gate object using dx and passing it the address (fffff80002092e00):
0: kd> dx -r1 ((nt!_KGATE *)0xfffff80002092e00)
((nt!_KGATE *)0xfffff80002092e00) : 0xfffff80002092e00 [Type: _KGATE *]
[+0x000] Header [Type: _DISPATCHER_HEADER]
0: kd> dx -r1 ((ntkrnlmp!_DISPATCHER_HEADER *)0xfffff80002092e00)
((ntkrnlmp!_DISPATCHER_HEADER *)0xfffff80002092e00) : 0xfffff80002092e00 [Type: _DISPATCHER_HEADER *]
[+0x000] Type : 0x7 [Type: unsigned char]
[+0x001] TimerControlFlags : 0x1 [Type: unsigned char]
[+0x001 ( 0: 0)] Absolute : 0x1 [Type: unsigned char]
[+0x001 ( 1: 1)] Coalescable : 0x0 [Type: unsigned char]
[+0x001 ( 2: 2)] KeepShifting : 0x0 [Type: unsigned char]
[+0x001 ( 7: 3)] EncodedTolerableDelay : 0x0 [Type: unsigned char]
[+0x001] Abandoned : 0x1 [Type: unsigned char]
[+0x001] Signalling : 0x1 [Type: unsigned char]
[+0x002] ThreadControlFlags : 0x6 [Type: unsigned char]
[+0x002 ( 0: 0)] CpuThrottled : 0x0 [Type: unsigned char]
[+0x002 ( 1: 1)] CycleProfiling : 0x1 [Type: unsigned char]
[+0x002 ( 2: 2)] CounterProfiling : 0x1 [Type: unsigned char]
[+0x002 ( 7: 3)] Reserved : 0x0 [Type: unsigned char]
[+0x002] Hand : 0x6 [Type: unsigned char]
[+0x002] Size : 0x6 [Type: unsigned char]
[+0x003] TimerMiscFlags : 0x0 [Type: unsigned char]
[+0x003 ( 5: 0)] Index : 0x0 [Type: unsigned char]
[+0x003 ( 6: 6)] Inserted : 0x0 [Type: unsigned char]
[+0x003 ( 7: 7)] Expired : 0x0 [Type: unsigned char]
[+0x003] DebugActive : 0x0 [Type: unsigned char]
[+0x003 ( 0: 0)] ActiveDR7 : 0x0 [Type: unsigned char]
[+0x003 ( 1: 1)] Instrumented : 0x0 [Type: unsigned char]
[+0x003 ( 5: 2)] Reserved2 : 0x0 [Type: unsigned char]
[+0x003 ( 6: 6)] UmsScheduled : 0x0 [Type: unsigned char]
[+0x003 ( 7: 7)] UmsPrimary : 0x0 [Type: unsigned char]
[+0x003] DpcActive : 0x0 [Type: unsigned char]
[+0x000] Lock : 393479 [Type: long]
[+0x004] SignalState : 0 [Type: long]
[+0x008] WaitListHead [Type: _LIST_ENTRY]
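Many fields in the dx output above share the same offset because they are different views of the same bytes, and the Lock value at +0x000 is simply the first four header bytes read as one 32-bit number. As a small sanity check, here is a Python sketch that splits the Lock value back into those bytes. The function name is made up for the example, and little-endian byte order is assumed (which holds for x64).

```python
import struct


def split_lock_value(lock_value):
    """Split the 32-bit Lock value into the four header bytes it overlays.

    Assumes little-endian byte order, so the byte at offset +0x000 is the
    least significant byte of the value.
    """
    raw = struct.pack("<l", lock_value)          # 4 bytes, little-endian
    offset0, offset1, offset2, offset3 = struct.unpack("<4B", raw)
    return {
        "+0x000": offset0,  # Type in the dx output
        "+0x001": offset1,  # TimerControlFlags / Abandoned / Signalling
        "+0x002": offset2,  # ThreadControlFlags / Hand / Size
        "+0x003": offset3,  # TimerMiscFlags / DebugActive / DpcActive
    }


header_bytes = split_lock_value(393479)  # Lock value from the dx output
```

For the value shown above (393479, which is 0x00060107) this gives 0x07, 0x01, 0x06 and 0x00, matching the Type, +0x001, +0x002 and +0x003 values that dx prints. The part that actually matters for the wait is SignalState, which is 0 here, so the gate is not signaled.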
Hmm, what else? Well, we also have another system thread that writes modified pages (aka dirty pages): nt!MiMappedPageWriter. This one writes the modified pages of memory-mapped files back to their files on disk (whereas the Modified Page Writer handles the pages that are backed by the page file). Same as before, we find it with !mex.us and then use !mex.t to get more detailed information:
5: kd> !mex.us nt!MiMappedPageWriter
1 thread [stats]: fffffa8019ab5040
fffff80001eda6aa nt!KiSwapContext+0x7a
fffff80001edd142 nt!KiCommitThreadWait+0x1d2
fffff80001ede056 nt!KeDelayExecutionThread+0x186
fffff80001f117bc nt!MiGatherMappedPages+0x8c
fffff80001f122f8 nt!MiMappedPageWriter+0x198
fffff80002170f12 nt!PspSystemThreadStartup+0x5a
fffff80001ec9de6 nt!KiStartSystemThread+0x16
Threads matching filter: 1 out of 204
5: kd> !mex.t fffffa8019ab5040
Process Thread CID UserTime KernelTime ContextSwitches Wait Reason Time
System (fffffa8019a6b040) fffffa8019ab5040 4.b0 0s 187ms 133350 DelayExecution 15ms
Priority:
Current Base UB FB IO Page
17 8 0 0 2 5
# Child-SP Return Call Site
0 fffff88002b08910 fffff80001edd142 nt!KiSwapContext+0x7a
1 fffff88002b08a50 fffff80001ede056 nt!KiCommitThreadWait+0x1d2
2 fffff88002b08ae0 fffff80001f117bc nt!KeDelayExecutionThread+0x186
3 fffff88002b08b50 fffff80001f122f8 nt!MiGatherMappedPages+0x8c
4 fffff88002b08c50 fffff80002170f12 nt!MiMappedPageWriter+0x198
5 fffff88002b08d40 fffff80001ec9de6 nt!PspSystemThreadStartup+0x5a
6 fffff88002b08d80 0000000000000000 nt!KiStartSystemThread+0x16
Also not blocked, it’s normally waiting in nt!KeDelayExecutionThread which means that after some time, it will run again and then wait again and so on (it basically runs on an interval).
But what files are being currently cached and how big are they in memory? We can have a look at this using !mex.mem -fc and we get this here:
5: kd> !mex.mem -fc
Name ControlArea FsContext Valid StandbyDirty Shared Locked Total
=================================================================================== ================ ================ ======== ============ ====== ====== ========
[...SNIPPED...]
$LogFile fffffa801fc6b990 fffffa801f656e10 860 KB 860 KB
CM_123.mdf fffffa802a3dd490 fffff8a003a85140 1.75 MB 1.75 MB
$Mft fffffa801ce9d4a0 fffffa801ce7dbb0 2.82 MB 2.82 MB
CM_123.mdf fffffa80204ffd30 fffff8a003423140 10.69 MB 15.54 GB 15.55 GB
Virtual Memory Physical Memory File Cache Cache Writes
Interesting … there is a file called CM_123.mdf that is using 15.54GB – that is half of our physical memory !!! Ha, there we have it!
Let’s have a look at more details (e.g. its path). This address (fffffa80204ffd30) points to a Control Area, which is a structure the Kernel creates for the mapped file – we can use !kdexts.ca with this address to get more information:
5: kd> !kdexts.ca fffffa80204ffd30
ControlArea @ fffffa80204ffd30
Segment fffff8a004a8b940 Flink 0000000000000000 Blink 0000000000000000
Section Ref 1 Pfn Ref 3e30f0 Mapped Views 19f
User Ref 0 WaitForDel 0 Flush Count 0
File Object fffffa802a4144f0 ModWriteCount 0 System Views 19f
WritableRefs 0
Flags (8080) File WasPurged
\SCCMSQLBackup\123Backup\SiteDBServer\CM_123.mdf
Segment @ fffff8a004a8b940
ControlArea fffffa80204ffd30 ExtendInfo 0000000000000000
Total Ptes 1745000
Segment Size 1745000000 Committed 0
Flags (c0000) ProtectionMask
Subsection 1 @ fffffa80204ffdb0
ControlArea fffffa80204ffd30 Starting Sector 0 Number Of Sectors 800000
Base Pte fffff8a014a00000 Ptes In Subsect 800000 Unused Ptes 0
Flags d Sector Offset 0 Protection 6
Accessed
Flink fffffa801fc069d0 Blink fffffa802a3dd560 MappedViews 0
Subsection 2 @ fffffa801fc0a010
ControlArea fffffa80204ffd30 Starting Sector 800000 Number Of Sectors 800000
Base Pte fffff8a01ca00000 Ptes In Subsect 800000 Unused Ptes 0
Flags d Sector Offset 0 Protection 6
Accessed
Flink 0000000000000000 Blink 0000000000000000 MappedViews 19f
Subsection 3 @ fffffa801fc54b30
ControlArea fffffa80204ffd30 Starting Sector 1000000 Number Of Sectors 745000
Base Pte 0000000000000000 Ptes In Subsect 745000 Unused Ptes 0
Flags c Sector Offset 0 Protection 6
Flink 0000000000000000 Blink 0000000000000000 MappedViews 0
It really seems to be something related to SCCM – but which process is holding this file?
At this point, I am guessing that we are already doing Cache Write Throttling – a mechanism that protects server performance by delaying writes when needed. It is triggered when nt!CcTotalDirtyPages reaches or exceeds nt!CcDirtyPageThreshold.
From the Windows Internals book:
“The file system and cache manager must determine whether a cached write request will affect system performance and then schedule any delayed writes. First the file system asks the cache manager whether a certain number of bytes can be written right now without hurting performance by using the CcCanIWrite function and blocking the write if necessary. For asynchronous I/O, the file system sets up a callback with the cache manager for automatically writing the bytes when writes are again permitted by calling CcDeferWrite. Otherwise, it just blocks and waits on CcCanIWrite to continue. Once it’s notified of an impending write operation, the cache manager determines how many dirty pages are in the cache and how much physical memory is available. If few physical pages are free, the cache manager momentarily blocks the file system thread that’s requesting to write data to the cache. The cache manager’s lazy writer flushes some of the dirty pages to disk and then allows the blocked file system thread to continue. This throttling prevents system performance from degrading because of a lack of memory when a file system or network server issues a large write operation”.
We have a very cool command we can use to check this which will also show us which (if any) files are currently in this state (from where we can get the file handle) – !kdexts.defwrites:
5: kd> !kdexts.defwrites
*** Cache Write Throttle Analysis ***
CcTotalDirtyPages: 4075797 (16303188 Kb)
CcDirtyPageThreshold: 4058054 (16232216 Kb)
AvailablePages: 27177 ( 108708 Kb)
ThrottleTop: 450 ( 1800 Kb)
ThrottleBottom: 80 ( 320 Kb)
ModifiedPages: 4073135 (16292540 Kb)
CcTotalDirtyPages >= CcDirtyPageThreshold, writes throttled
Check these thread(s): CcWriteBehind(LazyWriter)
Check critical workqueue for the lazy writer, !exqueue 16
Cc Deferred Write list: (CcDeferredWrites)
File: fffffa802a4144f0 Event: fffff880074906d0
File: fffffa8029c08070 Event: fffff88007968640
Yup, sure enough, we are currently doing throttling and here are the 2 file handles we have fffffa802a4144f0 & fffffa8029c08070.
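If you save the !kdexts.defwrites output to a text file, the same check can be scripted. The sketch below is just an illustration: the function names are made up, and it assumes the counters are printed as decimal page counts in the "Name: pages (Kb)" layout shown above.

```python
import re

# Counter lines copied from the !kdexts.defwrites output above.
SAMPLE = """\
CcTotalDirtyPages: 4075797 (16303188 Kb)
CcDirtyPageThreshold: 4058054 (16232216 Kb)
AvailablePages: 27177 ( 108708 Kb)
"""


def parse_counters(text: str) -> dict:
    """Return {counter name: page count} for lines like 'Name: 123 (456 Kb)'."""
    counters = {}
    for match in re.finditer(r"^\s*(\w+):\s+(\d+)\s+\(", text, re.MULTILINE):
        counters[match.group(1)] = int(match.group(2))
    return counters


def writes_throttled(counters: dict) -> bool:
    """Apply the same comparison the debugger extension reports."""
    return counters["CcTotalDirtyPages"] >= counters["CcDirtyPageThreshold"]


if __name__ == "__main__":
    parsed = parse_counters(SAMPLE)
    print("Writes throttled:", writes_throttled(parsed))
```

Multiplying the page count by 4 gives the Kb figure printed next to it, which is a handy way to confirm that the numbers are being read correctly.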
Nice, so now that we have the handles, we can use !kdexts.fileobj and give it the address to understand what files these are:
5: kd> !kdexts.fileobj fffffa802a4144f0
\SCCMSQLBackup\123Backup\SiteDBServer\CM_123.mdf
Device Object: 0xfffffa801f7d07c0 \Driver\volmgr
Vpb: 0xfffffa8029683ad0
Event signalled
Access: Read Write Delete
Flags: 0x43062
Synchronous IO
Sequential Only
Cache Supported
Modified
Size Changed
Handle Created
File Object is currently busy and has 0 waiters.
FsContext: 0xfffff8a003423140 FsContext2: 0xfffff8a00f5d3500
Private Cache Map: 0xfffffa802029b9c0
CurrentByteOffset: edb0f0000
Cache Data:
Section Object Pointers: fffffa8029479818
Shared Cache Map: fffffa802029b850 File Offset: ffffffffdb0f0000
Ha! Exactly the one we are looking for (using 15.54GB of RAM): \SCCMSQLBackup\123Backup\SiteDBServer\CM_123.mdf
No need to look at the other one now, it’s the one we are interested in. So ok, which SCCM process currently has a handle open on this file? We can use !mex.findhandle with the address of the file handle (fffffa802a4144f0) to get the process (you can use !kdexts.findhandle instead):
5: kd> !mex.findhandle fffffa802a4144f0
smssqlbkup.exe fffffa802935eb10
384
Ok, let’s have a look at what this SCCM process (smssqlbkup.exe) is doing by switching to it now that we have the address (fffffa802935eb10):
5: kd> !mex.p fffffa802935eb10
Name Address Ses PID User Name Create Time Up Time Mods Handle Act Thrd Z Thrd Parent
============== ================ === ============ =============== ========================== ============ ==== ====== ======== ====== ========================
smssqlbkup.exe fffffa802935eb10 0 614 (0n1556) DOMAIN\COMPUTERNAME$ 06/26/2017 12:13:00.723 PM 2h:02:19.276 54 221 8 0 services.exe 264 (0n612)
Command Line: "D:\SMS_FOLDER\bin\x64\smssqlbkup.exe"
Memory Details:
VM Peak Work Set Commit Size PP Quota NPP Quota
======== ========= ======== =========== ========= =========
86.53 MB 107.85 MB 66.95 MB 12.94 MB 137.89 KB 15.38 KB
CPU Details:
User Kernel Total
====== ========= =========
2s.403 2m:21.447 2m:23.850
Show Threads: Unique Stacks !mex.listthreads (!lt) fffffa802935eb10 !winde.lp fffffa802935eb10 !process fffffa802935eb10 7
Show LPC Port information for process
Let’s have a look at the list of threads and afterwards we should have a look at each thread – we can do this using !mex.lt now that we are already in the context of this process:
5: kd> !mex.lt
Process PID Thread Id Time Reason
============== === ================ ==== ======= ==============
smssqlbkup.exe 614 fffffa8029361b50 5c0 2s.371 UserRequest
smssqlbkup.exe 614 fffffa801f6a5b50 53c 43s.649 DelayExecution
smssqlbkup.exe 614 fffffa801f6a55c0 874 514ms Executive
smssqlbkup.exe 614 fffffa80293b2b50 c38 43s.649 UserRequest
smssqlbkup.exe 614 fffffa80295c0b50 7d4 43s.649 UserRequest
smssqlbkup.exe 614 fffffa801f929b50 11fc 43s.649 WrQueue
smssqlbkup.exe 614 fffffa8021324060 1af4 2s.386 WrQueue
smssqlbkup.exe 614 fffffa8029610060 10a0 2s.371 WrQueue
Thread Count: 8
Looking through each thread, we find this veeery interesting thread (fffffa801f6a55c0):
5: kd> !mex.t fffffa801f6a55c0
Process Thread CID UserTime KernelTime ContextSwitches Wait Reason Time COM-Initialized
smssqlbkup.exe (fffffa802935eb10) fffffa801f6a55c0 614.874 1s.841 2m:21.228 36298 Executive 514ms APTKIND_MULTITHREADED (MTA)
WaitBlockList:
Object Type Other Waiters
fffff880074906d0 NotificationEvent 0
# Child-SP Return Call Site
0 fffff88007490430 fffff80001edd142 nt!KiSwapContext+0x7a
1 fffff88007490570 fffff80001edf96f nt!KiCommitThreadWait+0x1d2
2 fffff88007490600 fffff80001ea2527 nt!KeWaitForSingleObject+0x19f
3 fffff880074906a0 fffff880016ebcc8 nt!CcCanIWrite+0xfffffffffffa11f7
4 fffff88007490770 fffff88001401102 Ntfs!NtfsCopyWriteA+0x68
5 fffff88007490970 fffff880014048ba fltmgr!FltpPerformFastIoCall+0xf2
6 fffff880074909d0 fffff8800142283e fltmgr!FltpPassThroughFastIo+0xda
7 fffff88007490a10 fffff800021f1bbe fltmgr!FltpFastIoWrite+0x1ce
8 fffff88007490ab0 fffff80001ed70d3 nt!NtWriteFile+0x5ad
9 fffff88007490bb0 0000000077a2bdba nt!KiSystemServiceCopyEnd+0x13
a 00000000014cce38 000007fefd73865c ntdll!NtWriteFile+0xa
b 00000000014cce40 00000000778d171a KERNELBASE!WriteFile+0xfe
c 00000000014cceb0 00000000778d1ea5 kernel32!BaseCopyStream+0x4b2
d 00000000014cd8a0 00000000778d1907 kernel32!BasepCopyFileExW+0x545
e 00000000014cdda0 00000000779556f2 kernel32!CopyFileExW+0x97
f 00000000014cde20 000000013ff6fb4a kernel32!CopyFileA+0x62
10 00000000014cde80 000000013ff81a5c smssqlbkup+0x9fb4a
11 00000000014cdf10 000000013ff7c430 smssqlbkup+0xb1a5c
12 00000000014ce280 000000013ff7bb5a smssqlbkup+0xac430
13 00000000014cf3b0 000000013ff7753f smssqlbkup+0xabb5a
14 00000000014cf400 000000013ff7987c smssqlbkup+0xa753f
15 00000000014cf6f0 000000013ff85643 smssqlbkup+0xa987c
16 00000000014cf7c0 0000000140155d55 smssqlbkup+0xb5643
17 00000000014cf9e0 0000000140155bec smssqlbkup+0x285d55
18 00000000014cfa10 00000000778d59cd smssqlbkup+0x285bec
19 00000000014cfa40 0000000077a0a561 kernel32!BaseThreadInitThunk+0xd
1a 00000000014cfa70 0000000000000000 ntdll!RtlUserThreadStart+0x1d
Well, well, well – what do we have on the stack here?
So, it seems that we either have a very slow disk or we have a problem with some disk related driver. Let’s check for any pending disk IRPs – we can do this by running !mex.ioq (you can use !kdexts.irpfind, but will need to parse the results manually):
5: kd> !mex.ioq
Collecting Data...This might take a moment.
Collecting Raid Data...Returned Failure
Collecting Disk Data...Added Object Disk with Filter DISK
Returned Success
================== Disk.sys Devices ==========================
Device Name DeviceObject DeviceExtension PrivateFdoData PendingIRPs
===================== ================ ================ ================ ===========
\Device\Harddisk0\DR0 fffffa801cd28060 fffffa801cd281b0 fffffa801cd29010 2
==================RaidPort Info: \Device\RaidPort2 ==========================
Device Name Miniport DeviceExtension BusType IoModel
=================================== ======== ================ =========== =========================
\Device\RaidPort2(fffffa801a668060) somestoragedriver fffffa801a6681b0 BusTypeRAID StorSynchronizeFullDuplex
RaidUnit(LUN) ClassPNPDevice DeviceState QueueState OutstandingCount QueueDepth PendingIrps
================ ================ ================== ========== ================ ========== ===========
fffffa801a6b2930 fffffa801cd28060 DeviceStateWorking Normal 2 254 2
Summary Notes:
IOQ found 80 devices, of which 2 have a total of 4 pending Irps.
Alright, so we have pending disk IRPs – let’s have a look at the IRP queue now that we have the DeviceExtension address (fffffa801cd281b0) by using !mex.ioq again, but this time with the “-e” parameter with the address (fffffa801cd281b0):
5: kd> !mex.ioq -e fffffa801cd281b0
============================================
Disk Information: \Device\Harddisk0\DR0
============================================
ClassPNP Devices(Disk.sys) Information:
============================================
Device Name DeviceObject DeviceExtension PrivateFdoData PendingIRPs
===================== ================ ================ ================ ===========
\Device\Harddisk0\DR0 fffffa801cd28060 fffffa801cd281b0 fffffa801cd29010 2
Transfer Packet Current Irp Thread Wait Thread #Retries TimeOut (s) DeviceName CurrentDrvr Original Irp FileName Srb SrbStatus
================ ================ =========== ============================== ======== =========== =============== =========== ================ ============= ================ ==================
fffffa801d11a310 fffffa801d11b010 15ms fffffa80213cc430(RAMMap64.exe) 0 0 HarddiskVolume2 somestoragedriver fffffa8029b27010 \pagefile.sys fffffa801d11a430 SRB_STATUS_PENDING
fffffa801d1fdb10 fffffa801d1fd910 0s fffffa8029573b50(sqlservr.exe) 0 0 HarddiskVolume2 somestoragedriver fffffa801f8ab980 \pagefile.sys fffffa801d1fdc30 SRB_STATUS_PENDING
Wow … stuck IRPs on writing to the page file (pagefile.sys) – remember that the memory mapped files are usually backed by the page file.
So, now it’s clear, and it does not matter that we do not see our file here – this is normal because at the time the dump was taken, we were waiting in nt!CcCanIWrite and it seems that we did not have any pending IRP for it.
The important thing is that we have a disk related issue here.
Let’s have a quick look at both IRPs in more detail by using !mex.mirp with the “-v” parameter (for verbose) and each IRP addresses (fffffa801d11b010 & fffffa801d1fd910) (you can use !kdexts.irp instead):
5: kd> !mex.mirp -v fffffa801d11b010
Irp Details: fffffa801d11b010 [ verbose | !ddt | !winde.io | !irp ]
Mdl : fffffa801fd8a7b0
System buffer :
Issuing Process :
Thread :
Frame Count : 2
IoStatus Status : c00000bb
IoStatus Info : 0000000000000000
Requester Mode :
Cancel : 0
Cancel IRQL : 0
Apc Environment : 0
User Iosb : 0000000000000000
User Event :
APC : 0000000000000000
Completion Key : 0000000000000000
Cancel Routine :
Original File Object: 0000000000000000
Original File Name :
Irp Stack Frame(s)
# Driver Major Minor Dispatch Routine Control Code Flg Ctrl Status Completion Invoker(s) Device QueueLocation File Context Completion Routine Args
=== ================ ======================= ===== ======================== ============ === ==== ======= ====================== ================ ============= ====== ====================== ============================ ===============================================================================
1 CREATE 0 0 0 None 0000000000000000 (null) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
->2 \Driver\somestoragedriver INTERNAL_DEVICE_CONTROL 0 storport!RaDriverScsiIrp 12 e1 Pending Cancel, Success, Error fffffa801a6b27e0 (null) fffffa801d11a310(CPnp) CLASSPNP!TransferPktComplete fffffa801d11a430(CPnp) 0000000000000000 0000000000000000 fffffa801a6b2930(Devi)
IO Status: 0xc00000bb (The request is not supported.)
5: kd> !mex.mirp -v fffffa801d1fd910
Irp Details: fffffa801d1fd910 [ verbose | !ddt | !winde.io | !irp ]
Mdl : fffffa801f5b9670
System buffer :
Issuing Process :
Thread :
Frame Count : 2
IoStatus Status : c00000bb
IoStatus Info : 0000000000000000
Requester Mode :
Cancel : 0
Cancel IRQL : 0
Apc Environment : 0
User Iosb : 0000000000000000
User Event :
APC : 0000000000000000
Completion Key : 0000000000000000
Cancel Routine :
Original File Object: 0000000000000000
Original File Name :
Irp Stack Frame(s)
# Driver Major Minor Dispatch Routine Control Code Flg Ctrl Status Completion Invoker(s) Device QueueLocation File Context Completion Routine Args
=== ================ ======================= ===== ======================== ============ === ==== ======= ====================== ================ ============= ====== ====================== ============================ ===============================================================================
1 CREATE 0 0 0 None 0000000000000000 (null) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
->2 \Driver\somestoragedriver INTERNAL_DEVICE_CONTROL 0 storport!RaDriverScsiIrp 12 e1 Pending Cancel, Success, Error fffffa801a6b27e0 (null) fffffa801d1fdb10(CPnp) CLASSPNP!TransferPktComplete fffffa801d1fdc30(CPnp) 0000000000000000 0000000000000000 fffffa801a6b2930(Devi)
IO Status: 0xc00000bb (The request is not supported.)
Ok, there we go, we are stuck with these IRPs inside of the disk vendor driver (called “somestoragedriver” here for obvious reasons). The reason these are stuck is that we get the IO Status 0xc00000bb from the IOCTL, which means STATUS_NOT_SUPPORTED (The request is not supported).
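For the curious, an NTSTATUS value like 0xc00000bb can be taken apart by hand. The sketch below follows the documented NTSTATUS bit layout (severity in the top two bits, then the customer bit, a 12-bit facility and a 16-bit code). The function name and the tiny lookup table are made up for illustration and only cover the value seen in this dump.

```python
def decode_ntstatus(value: int) -> dict:
    """Split a 32-bit NTSTATUS into its documented bit fields."""
    severities = {0: "success", 1: "informational", 2: "warning", 3: "error"}
    return {
        "severity": severities[(value >> 30) & 0x3],  # bits 31-30
        "customer": bool((value >> 29) & 0x1),        # bit 29
        "facility": (value >> 16) & 0xFFF,            # bits 27-16
        "code": value & 0xFFFF,                       # bits 15-0
    }


# Illustrative lookup containing only the status seen in this dump.
KNOWN_STATUSES = {0xC00000BB: "STATUS_NOT_SUPPORTED"}

if __name__ == "__main__":
    status = 0xC00000BB
    print(KNOWN_STATUSES.get(status, "unknown"), decode_ntstatus(status))
```

The top two bits being set is what marks this value as an error rather than a warning or an informational status.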
Last thing – let’s check some information about this driver (like how old it is) – we can do this by running lmtm (or any other lm* command) and give it the name of the driver (somestoragedriver):
5: kd> lmtm somestoragedriver
Browse full module list
start end module name
fffff880`01549000 fffff880`01559000 somestoragedriver Mon Aug 9 21:21:24 2010 (4C604724)
Wow … this is awesome … we are in July 2017 and this disk driver is from August 2010 … niiice …
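The hex value in parentheses at the end of the lmtm line (4C604724) is the timestamp from the driver's PE header, stored as seconds since 1 January 1970 UTC. A small sketch to decode it and estimate the driver's age follows. The function name is made up, and the reference date is an assumption taken from the process create time shown earlier in this dump. The debugger prints the timestamp in local time, so the hour can differ from the UTC value.

```python
from datetime import datetime, timezone


def link_timestamp_to_utc(hex_stamp: str) -> datetime:
    """Decode a PE header timestamp (hex seconds since the Unix epoch) as UTC."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


built = link_timestamp_to_utc("4C604724")

# Assumption: use the date of the smssqlbkup.exe create time as "now".
dump_taken = datetime(2017, 6, 26, tzinfo=timezone.utc)
age_years = (dump_taken - built).days / 365.25

if __name__ == "__main__":
    print(f"Driver built: {built:%Y-%m-%d %H:%M:%S} UTC")
    print(f"Approximate age at dump time: {age_years:.1f} years")
```

This confirms what the lmtm output already told us: the driver was close to seven years old when the dump was taken.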
So, what we need to do is:
Sure enough, after updating the “somestoragedriver.sys” driver, the issue was solved: the SCCM server no longer eats up memory until it gets into a hung state.
WOOT!
PS: We can also get to the file directly by running !mex.mem -p, but this command usually takes a very long time (sometimes a couple of hours, depending on how big the dump is). The same can be achieved with !kdexts.memusage, but its output has to be gone through manually, as it gives no summary for the Mapped Files.
The reason why it takes a lot of time, is that it has to scan the PFN (Page Frame Number) database and process all the data.
5: kd> !mex.mem -p
Loading PFN database... Building kernel map...this will take a while.
Top physical RAM users:
Type Name ControlArea/PID Valid Standby Dirty Shared Locked PageTables TOTAL
=========== ============== ================ ========= ======== ======== ======= ========= ========== =========
Mapped File CM_123.mdf FFFFFA80204FFD30 10.69 MB 15.54 GB 15.55 GB
Process sqlservr.exe 14866272 14.18 GB 79.96 MB 16 KB n/a n/a 35.34 MB 14.29 GB
Process ReportingServi FFFFFA80292A6B10 241.43 MB 2.62 MB 16 KB n/a n/a 2.28 MB 246.35 MB
System NonPaged Pool 184.33 MB n/a 184.33 MB 184.33 MB
Process RAMMap64.exe FFFFFA8020C87380 111.77 MB 4.94 MB n/a n/a 2.06 MB 118.77 MB
Process some_other_process FFFFFA8029411060 97.23 MB 1.47 MB n/a n/a 1.18 MB 99.88 MB
System Kernel Stacks 53.94 MB 44 KB n/a 992 KB 128 KB 54.11 MB
Process procexp64.exe FFFFFA80297EB060 36.13 MB n/a n/a 608 KB 36.72 MB
Process svchost.exe FFFFFA8028EF25D0 28.81 MB 1.97 MB n/a n/a 732 KB 31.49 MB
Process explorer.exe FFFFFA8029673A40 23.09 MB 196 KB n/a n/a 724 KB 23.99 MB
Process perfmon.exe FFFFFA802976A6C0 21.61 MB 160 KB n/a n/a 460 KB 22.21 MB
Mapped File sqlservr.exe FFFFFA8029838AC0 20.22 MB 20.22 MB
Process explorer.exe FFFFFA80296AAB10 19.06 MB 120 KB n/a n/a 664 KB 19.82 MB
Process lsass.exe FFFFFA801F106A70 18.47 MB 96 KB n/a n/a 352 KB 18.91 MB
Process WmiPrvSE.exe FFFFFA80294AF060 18.11 MB 4 KB n/a n/a 548 KB 18.64 MB
Process smsexec.exe FFFFFA801F5D0B10 17.64 MB 40 KB n/a n/a 724 KB 18.38 MB
Process procexp64.exe FFFFFA802A574B10 17.18 MB 40 KB n/a n/a 460 KB 17.66 MB
Mapped File shell32.dll FFFFFA801EF80680 13.5 MB 3.12 MB 16.62 MB
Process svchost.exe FFFFFA8028F0BB10 15.03 MB 28 KB n/a n/a 432 KB 15.48 MB
Process some_process FFFFFA8029018B10 14.19 MB 196 KB n/a n/a 432 KB 14.8 MB
=========== ============== ================ ========= ======== ======== ======= ========= ========== =========
Type Name ControlArea/PID Valid Standby Dirty Shared Locked PageTables TOTAL
Only the top 20 of 1612 rows are being shown. Show all
Virtual Memory Physical Memory File Cache Cache Writes
Here is what it would look like with !kdexts.memusage, with all the other stuff snipped out:
5: kd> !kdexts.memusage
[...SNIPPED...]
Usage Summary (in Kb):
Control Valid Standby Dirty Shared Locked PageTables name
ffffffffffffd 8384 0 0 0 0 0 AWE
fffffa8019a34010 440 0 0 184 0 0 mapped_file( winhttp.dll )
fffffa8019ac18d0 16 0 0 0 0 0 mapped_file( netutils.dll )
[...SNIPPED...]
fffffa80204ffd30 10944 0 16292096 0 0 0 mapped_file( CM_123.mdf )
[...SNIPPED...]
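Since !kdexts.memusage gives no summary for the mapped files, one way to avoid going through the output by hand is to save it to a text file and sort it with a small script. The sketch below is only an illustration: the names are made up, and it assumes the column order shown in the "Usage Summary" header above (Control, Valid, Standby, Dirty, Shared, Locked, PageTables, name), with values in Kb.

```python
import re

# Rows copied from the Usage Summary above.
SAMPLE = """\
fffffa8019a34010 440 0 0 184 0 0 mapped_file( winhttp.dll )
fffffa8019ac18d0 16 0 0 0 0 0 mapped_file( netutils.dll )
fffffa80204ffd30 10944 0 16292096 0 0 0 mapped_file( CM_123.mdf )
"""

ROW = re.compile(
    r"^(?P<control>[0-9a-f]+)\s+(?P<valid>\d+)\s+(?P<standby>\d+)\s+(?P<dirty>\d+)"
    r"\s+\d+\s+\d+\s+\d+\s+mapped_file\(\s*(?P<name>.+?)\s*\)\s*$",
    re.MULTILINE,
)


def top_mapped_files(text: str, limit: int = 5) -> list:
    """Return (total Kb, file name, control area) tuples, largest first."""
    rows = []
    for m in ROW.finditer(text):
        total_kb = int(m["valid"]) + int(m["standby"]) + int(m["dirty"])
        rows.append((total_kb, m["name"], m["control"]))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    for total_kb, name, control in top_mapped_files(SAMPLE):
        print(f"{name:<15} {total_kb / 2**20:8.2f} GiB  {control}")
```

With the sample rows, CM_123.mdf comes out on top, and the Control Area address next to it is the same one we passed to !kdexts.ca earlier.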
Don’t forget to read-up on your Windows Internals
So, wow, this was nice! … a super one-off scenario related to SCOM. This is very interesting though from a point of view of debugging and can theoretically happen in any application, so we are going to have a look at how to debug (and automate debugging) of the DLL Loader component (LDR) from ntdll.dll.
Let’s start with the actual issue first
SCOM Network Discovery component is not working and the error we can see being logged in the Operations Manager event log on the Management Server which is configured to run the Network Discovery rule, is Error event ID 157:
Creation of module with CLSID "{6620A03E-9E85-4BEA-947C-D508CDB540FE}" failed with error "%1 is not a valid Win32 application." in rule "discovery<SOME_NUMBER>.<SOME_GUID>" running for instance "<SOME_NAME>" with id:"{<SOME_GUID>}" in management group "<SOME_MG_NAME>".
Now the error “not a valid Win32 application” means that we are trying to load an EXE or DLL file and it has one of 2 issues:
But what file (most probably DLL) is this?! We have the CLSID in the error which is: 6620A03E-9E85-4BEA-947C-D508CDB540FE
So how do we figure out which DLL it is? By searching in the registry (on a SCOM Management Server) by that CLSID and somewhere under HKEY_CLASSES_ROOT\CLSID\, we should find this GUID – and after a search, we do and under: HKEY_CLASSES_ROOT\CLSID\{6620A03E-9E85-4bea-947C-D508CDB540FE}\InprocServer32, we find the value called (Default) which contains the full path to the DLL: C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\NetworkDiscoveryModules.dll
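The same registry lookup can also be scripted. The sketch below uses Python's standard winreg module, which only exists on Windows. The function names are made up for illustration, and the script reads whichever registry view matches the bitness of the Python interpreter running it.

```python
def inproc_server_key(clsid: str) -> str:
    """Build the HKEY_CLASSES_ROOT subkey that holds a COM class's DLL path."""
    return "CLSID\\{" + clsid.strip("{}").upper() + "}\\InprocServer32"


def lookup_inproc_dll(clsid: str) -> str:
    """Read the (Default) value of the InprocServer32 key (Windows only)."""
    import winreg  # standard library, available only on Windows

    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, inproc_server_key(clsid)) as key:
        value, _value_type = winreg.QueryValueEx(key, "")  # "" = (Default)
        return value


if __name__ == "__main__":
    print(inproc_server_key("6620A03E-9E85-4BEA-947C-D508CDB540FE"))
```

On a Management Server, calling lookup_inproc_dll with the CLSID from the event should return the same NetworkDiscoveryModules.dll path we found by searching manually.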
Pah! Super simple, right? It looks like the NetworkDiscoveryModules.dll is somehow corrupted and we should just replace it with a working one from a SCOM Management Server with the exact same SCOM version and Update Rollup installed – right?!
Well, we did that and it did not work … so whaaat is going on here?!
Well, this clearly is not a “SCOM” issue, but some file/OS related issue that can happen with any software – but, because I am a curious man …
First things first – let’s totally separate our problem from SCOM and manually try to reproduce it by just loading the DLL directly – we don’t need to complicate things by writing native code, we can just do this in PowerShell directly – like this:
Add-Type @"
using System;
using System.Runtime.InteropServices;
public static class NativeMethods {
[DllImport(@"$($env:ProgramFiles)\Microsoft System Center 2012 R2\Operations Manager\Server\NetworkDiscoveryModules.dll")]
public static extern void DllCanUnloadNow();
}
"@
[NativeMethods]::DllCanUnloadNow()
Extra info for the other curious people out there about this script:
So the error we are getting is actually the same (only more descriptive): An attempt was made to load a program with an incorrect format.
Ok cool! So now we can reproduce the issue totally outside of SCOM, by just trying to load that DLL and calling a function from it (because only when we make an actual call will the DLL be truly loaded for code execution).
So, now we can do a live debug by attaching to the PowerShell.exe process which we will use to execute that script: https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/
Everything I will show next in the debugger, will be using Public Symbols, so everyone can follow the steps (don’t need any internal private symbols): https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/microsoft-public-symbols
The Windows LDR works like this:
So we need to track down which dependent DLL has the actual corruption …
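Before going into the debugger, a cheap pre-check (not the approach followed in this post) is to look at the PE header of each candidate DLL and see whether it is readable and which architecture it targets. The sketch below relies on the documented PE layout: the offset of the PE header is stored at 0x3C, and the Machine field follows the "PE\0\0" signature. The function name is made up, and a file that passes this check can still be corrupted elsewhere.

```python
import struct

# Machine values from the PE format specification (only the two common ones).
MACHINES = {0x014C: "x86", 0x8664: "x64"}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE image's file header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    if len(data) < pe_offset + 6:
        raise ValueError("truncated file header")
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINES.get(machine, hex(machine))


def check_file(path: str) -> str:
    """Read the start of a file and report its architecture or the problem."""
    with open(path, "rb") as handle:
        try:
            return pe_machine(handle.read(4096))
        except ValueError as error:
            return f"invalid PE header ({error})"
```

Running check_file over the DLLs in the SCOM Server folder would show any file whose header is damaged or whose architecture does not match the 64-bit process loading it.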
Where do we break to check what is wrong? Well, we have these 2 nice function calls where we will break and check what is going on:
Let’s set the first breakpoint and go in WinDbg:
0:000> bp kernelbase!LoadLibraryExW
0:000> g
Breakpoint 0 hit
kernelbase!LoadLibraryExW:
00007ffe`36b48d10 48895c2410 mov qword ptr [rsp+10h],rbx ss:0000000c`301d4b38=0000000c301d91a8
So, sure enough, we will break on the first/next execution of kernelbase!LoadLibraryExW and we can check what DLL is currently being loaded.
Just as an “FYI” we can/will also have a look at the stack because we are curious people:
0:001> k
Child-SP RetAddr Call Site
0000000c`301d4b28 00007ffe`310ae4b4 kernelbase!LoadLibraryExW
0000000c`301d4b30 00007ffe`310ae421 clr!CLRLoadLibraryExWorker+0x54
0000000c`301d4b80 00007ffe`31098aff clr!CLRLoadLibraryEx+0x51
0000000c`301d4bd0 00007ffe`31098a28 clr!LoadedImageLayout::LoadedImageLayout+0xca
0000000c`301d4e90 00007ffe`313f6a84 clr!PEImageLayout::Load+0x3d
0000000c`301d4ed0 00007ffe`313f69de clr!PEImage::GetLayoutInternal+0x361930
0000000c`301d4f20 00007ffe`3107ee8f clr!PEImage::GetLayout+0x36195e
0000000c`301d4fa0 00007ffe`3107ed53 clr!RuntimeOpenImageInternal+0xcf
0000000c`301d5080 00007ffe`3107ec7b clr!GetAssemblyMDInternalImportEx+0xcd
0000000c`301d5100 00007ffe`310d3cc9 clr!CreateMetaDataImport+0x1b
0000000c`301d5140 00007ffe`310d3c1d clr!BindResult::Init+0x69
0000000c`301d51b0 00007ffe`3118867b clr!BindResult::CreateFromPath+0xc9
0000000c`301d5220 00007ffe`31188891 clr!NATIVE_BINDER_SPACE::VerifyNativeCandidate+0x55f
0000000c`301d58b0 00007ffe`31086b40 clr!NATIVE_BINDER_SPACE::CreateVerifiedNICandidate+0x10a
0000000c`301d5bc0 00007ffe`31085f39 clr!NATIVE_BINDER_SPACE::BindToNativeDependency+0x2ac
0000000c`301d5de0 00007ffe`31085803 clr!BindToNativeAssembly+0xed
0000000c`301d5f00 00007ffe`3108799f clr!BindToNativeImage+0x1bb
0000000c`301d6230 00007ffe`31082a6b clr!CAsmDownloadMgr::RecordInfoAndProbeNativeImage+0x106
0000000c`301d62a0 00007ffe`3108303b clr!CAsmDownloadMgr::PreDownloadCheck+0x518
0000000c`301d6ae0 00007ffe`31082e45 clr!CAssemblyDownload::PreDownload+0xb7
0000000c`301d6b80 00007ffe`3108251a clr!CAssemblyName::BindToObject+0x419
0000000c`301d6cd0 00007ffe`310822c2 clr!FusionBind::RemoteLoad+0x1aa
0000000c`301d6e10 00007ffe`31082047 clr!AssemblySpec::LoadAssembly+0x19a
0000000c`301d6f30 00007ffe`31081cf5 clr!AssemblySpec::FindAssemblyFile+0x113
0000000c`301d76d0 00007ffe`3108a4ae clr!AppDomain::BindAssemblySpec+0xef7
0000000c`301d8950 00007ffe`310ab18b clr!AssemblySpec::LoadDomainAssembly+0x1ec
0000000c`301d8f00 00007ffe`3118c9e8 clr!AssemblySpec::LoadAssembly+0x1b
0000000c`301d8f40 00007ffe`2f6ebde2 clr!AssemblyNative::Load+0x304
0000000c`301d92b0 00007ffe`304c4326 mscorlib_ni!System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(System.Reflection.AssemblyName, System.Security.Policy.Evidence, System.Reflection.RuntimeAssembly, System.Threading.StackCrawlMark ByRef, IntPtr, Boolean, Boolean, Boolean)+0xd2
0000000c`301d9360 00007ffe`31209f51 mscorlib_ni!System.Reflection.RuntimeAssembly.LoadWithPartialNameInternal(System.Reflection.AssemblyName, System.Security.Policy.Evidence, System.Threading.StackCrawlMark ByRef)+0xd66ad6
0000000c`301d93e0 00007ffe`31209e4f clr!ExceptionTracker::CallHandler+0xc5
0000000c`301d9480 00007ffe`312081c7 clr!ExceptionTracker::CallCatchHandler+0x7f
0000000c`301d9510 00007ffe`396c347d clr!ProcessCLRException+0x2e6
0000000c`301d95f0 00007ffe`39685405 ntdll!RtlpExecuteHandlerForUnwind+0xd
0000000c`301d9620 00007ffe`31208260 ntdll!RtlUnwindEx+0x385
0000000c`301d9d00 00007ffe`3120821c clr!ClrUnwindEx+0x40
0000000c`301da220 00007ffe`396c33fd clr!ProcessCLRException+0x2b2
0000000c`301da300 00007ffe`39684847 ntdll!RtlpExecuteHandlerForException+0xd
0000000c`301da330 00007ffe`39683a6d ntdll!RtlDispatchException+0x197
0000000c`301daa00 00007ffe`36b495fc ntdll!RtlRaiseException+0x18d
0000000c`301db1c0 00007ffe`31209864 kernelbase!RaiseException+0x68
0000000c`301db2a0 00007ffe`314bb7ef clr!RaiseTheExceptionInternalOnly+0x2fe
0000000c`301db3a0 00007ffe`3118cb37 clr!UnwindAndContinueRethrowHelperAfterCatch+0x80
0000000c`301db3f0 00007ffe`2f75d99e clr!AssemblyNative::Load+0x453
0000000c`301db760 00007ffe`2ff13666 mscorlib_ni!System.Reflection.RuntimeAssembly.LoadWithPartialNameInternal(System.Reflection.AssemblyName, System.Security.Policy.Evidence, System.Threading.StackCrawlMark ByRef)+0x14e
0000000c`301db890 00007ffe`1933970f mscorlib_ni!System.Reflection.Assembly.LoadWithPartialName(System.String)+0x46
0000000c`301db8e0 00007ffe`193394e1 system_management_automation_ni!System.Management.Automation.ExecutionContext.LoadAssembly(System.String, System.String, System.Exception ByRef)+0x15f
0000000c`301db9a0 00007ffe`193211f2 system_management_automation_ni!System.Management.Automation.ExecutionContext.AddAssembly(System.String, System.String, System.Exception ByRef)+0x21
0000000c`301db9f0 00007ffe`19316887 system_management_automation_ni!Microsoft.PowerShell.Commands.ModuleCmdletBase.LoadBinaryModule(System.Management.Automation.PSModuleInfo, Boolean, System.String, System.String, System.Reflection.Assembly, System.String, System.Management.Automation.SessionState, ImportModuleOptions, ManifestProcessingFlags, System.String, Boolean, Boolean, Boolean ByRef, System.String, Boolean)+0x452
0000000c`301dbbf0 00007ffe`1931b82b system_management_automation_ni!Microsoft.PowerShell.Commands.ModuleCmdletBase.LoadModuleNamedInManifest(System.Management.Automation.PSModuleInfo, Microsoft.PowerShell.Commands.ModuleSpecification, System.String, Boolean, System.String, System.Management.Automation.SessionState, ImportModuleOptions, ManifestProcessingFlags, Boolean, Boolean, System.Object, Boolean ByRef, System.String)+0x977
0000000c`301dbe20 00007ffe`19316ab5 system_management_automation_ni!Microsoft.PowerShell.Commands.ModuleCmdletBase.LoadModuleManifest(System.String, System.Management.Automation.ExternalScriptInfo, System.Collections.Hashtable, System.Collections.Hashtable, ManifestProcessingFlags, System.Version, System.Version, ImportModuleOptions ByRef, Boolean ByRef)+0x4d0b
0000000c`301dc9d0 00007ffe`1931fa1b system_management_automation_ni!Microsoft.PowerShell.Commands.ModuleCmdletBase.LoadModuleManifest(System.Management.Automation.ExternalScriptInfo, ManifestProcessingFlags, System.Version, System.Version, ImportModuleOptions ByRef)+0xd5
0000000c`301dca70 00007ffe`1932386c system_management_automation_ni!Microsoft.PowerShell.Commands.ModuleCmdletBase.LoadModule(System.Management.Automation.PSModuleInfo, System.String, System.String, System.String, System.Management.Automation.SessionState, System.Object, ImportModuleOptions ByRef, ManifestProcessingFlags, Boolean ByRef, Boolean ByRef)+0xc4b
0000000c`301dcda0 00007ffe`193241d8 system_management_automation_ni!Microsoft.PowerShell.Commands.ImportModuleCommand.ImportModule_LocallyViaName(ImportModuleOptions, System.String)+0x4ac
0000000c`301dcef0 00007ffe`193dbb0e system_management_automation_ni!Microsoft.PowerShell.Commands.ImportModuleCommand.ProcessRecord()+0x198
0000000c`301dcfd0 00007ffe`19382b55 system_management_automation_ni!System.Management.Automation.CommandProcessor.ProcessRecord()+0x29e
0000000c`301dd080 00007ffe`1937953e system_management_automation_ni!System.Management.Automation.CommandProcessorBase.DoExecute()+0xe5
0000000c`301dd0f0 00007ffe`193666d0 system_management_automation_ni!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(System.Object, System.Collections.Hashtable, Boolean)+0x17e
0000000c`301dd1e0 00007ffe`1936581f system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.InvokeHelper()+0x750
0000000c`301dd400 00007ffe`19365241 system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.InvokeThreadProc()+0x2af
0000000c`301dd550 00007ffe`193348d5 system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.StartPipelineExecution()+0x321
0000000c`301dd5d0 00007ffe`19363fe4 system_management_automation_ni!System.Management.Automation.Runspaces.PipelineBase.CoreInvoke(System.Collections.IEnumerable, Boolean)+0x305
0000000c`301dd650 00007ffe`193df48e system_management_automation_ni!System.Management.Automation.Runspaces.PipelineBase.Invoke(System.Collections.IEnumerable)+0x24
0000000c`301dd690 00007ffe`193deff0 system_management_automation_ni!System.Management.Automation.PowerShell+Worker.ConstructPipelineAndDoWork(System.Management.Automation.Runspaces.Runspace, Boolean)+0x2de
0000000c`301dd770 00007ffe`193d9fbe system_management_automation_ni!System.Management.Automation.PowerShell.CoreInvokeHelper[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.Management.Automation.PSDataCollection`1<System.__Canon>, System.Management.Automation.PSDataCollection`1<System.__Canon>, System.Management.Automation.PSInvocationSettings)+0x350
0000000c`301dd820 00007ffe`193d9aca system_management_automation_ni!System.Management.Automation.PowerShell.CoreInvoke[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.Management.Automation.PSDataCollection`1<System.__Canon>, System.Management.Automation.PSDataCollection`1<System.__Canon>, System.Management.Automation.PSInvocationSettings)+0x4de
0000000c`301dd9b0 00007ffe`193efca0 system_management_automation_ni!System.Management.Automation.PowerShell.CoreInvoke[[System.__Canon, mscorlib]](System.Collections.IEnumerable, System.Management.Automation.PSDataCollection`1<System.__Canon>, System.Management.Automation.PSInvocationSettings)+0x1ba
0000000c`301dda60 00007ffe`193ed6c2 system_management_automation_ni!System.Management.Automation.CommandDiscovery.AutoloadSpecifiedModule(System.String, System.Management.Automation.ExecutionContext, System.Management.Automation.SessionStateEntryVisibility, System.Exception ByRef)+0x2c0
0000000c`301ddb00 00007ffe`19376326 system_management_automation_ni!System.Management.Automation.CommandDiscovery.TryModuleAutoDiscovery(System.String, System.Management.Automation.ExecutionContext, System.String, System.Management.Automation.CommandOrigin, System.Management.Automation.SearchResolutionOptions, System.Management.Automation.CommandTypes, System.Exception ByRef)+0x312
0000000c`301ddc10 00007ffe`19375ed1 system_management_automation_ni!System.Management.Automation.CommandDiscovery.LookupCommandInfo(System.String, System.Management.Automation.CommandTypes, System.Management.Automation.SearchResolutionOptions, System.Management.Automation.CommandOrigin, System.Management.Automation.ExecutionContext)+0x3d6
0000000c`301ddcf0 00007ffe`193dd8ce system_management_automation_ni!System.Management.Automation.CommandDiscovery.LookupCommandProcessor(System.String, System.Management.Automation.CommandOrigin, System.Nullable`1<Boolean>)+0x31
0000000c`301ddd50 00007ffe`193dcfd1 system_management_automation_ni!System.Management.Automation.ExecutionContext.CreateCommand(System.String, Boolean)+0x6e
0000000c`301dddb0 00007ffe`19335255 system_management_automation_ni!System.Management.Automation.PipelineOps.AddCommand(System.Management.Automation.Internal.PipelineProcessor, System.Management.Automation.CommandParameterInternal[], System.Management.Automation.Language.CommandBaseAst, System.Management.Automation.CommandRedirection[], System.Management.Automation.ExecutionContext)+0x3f1
0000000c`301ddef0 00007ffe`1a0a036e system_management_automation_ni!System.Management.Automation.PipelineOps.InvokePipeline(System.Object, Boolean, System.Management.Automation.CommandParameterInternal[][], System.Management.Automation.Language.CommandBaseAst[], System.Management.Automation.CommandRedirection[][], System.Management.Automation.Language.FunctionContext)+0x135
0000000c`301ddfb0 00007ffe`1938591b system_management_automation_ni!System.Management.Automation.Interpreter.ActionCallInstruction`6[[System.__Canon, mscorlib],[System.Boolean, mscorlib],[System.__Canon, mscorlib],[System.__Canon, mscorlib],[System.__Canon, mscorlib],[System.__Canon, mscorlib]].Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x19e
0000000c`301de050 00007ffe`1938591b system_management_automation_ni!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x16b
0000000c`301de160 00007ffe`1938575b system_management_automation_ni!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x16b
0000000c`301de270 00007ffe`193752af system_management_automation_ni!System.Management.Automation.Interpreter.Interpreter.Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x7b
0000000c`301de2f0 00007ffe`19383925 system_management_automation_ni!System.Management.Automation.Interpreter.LightLambda.RunVoid1[[System.__Canon, mscorlib]](System.__Canon)+0x11f
0000000c`301de380 00007ffe`193832f8 system_management_automation_ni!System.Management.Automation.DlrScriptCommandProcessor.RunClause(System.Action`1<System.Management.Automation.Language.FunctionContext>, System.Object, System.Object)+0x405
0000000c`301de510 00007ffe`19382fce system_management_automation_ni!System.Management.Automation.DlrScriptCommandProcessor.Complete()+0x1a8
0000000c`301de5c0 00007ffe`19382d01 system_management_automation_ni!System.Management.Automation.CommandProcessorBase.DoComplete()+0x16e
0000000c`301de680 00007ffe`19379551 system_management_automation_ni!System.Management.Automation.Internal.PipelineProcessor.DoCompleteCore(System.Management.Automation.CommandProcessorBase)+0xb1
0000000c`301de720 00007ffe`193666d0 system_management_automation_ni!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(System.Object, System.Collections.Hashtable, Boolean)+0x191
0000000c`301de810 00007ffe`1936581f system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.InvokeHelper()+0x750
0000000c`301dea30 00007ffe`192e7001 system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.InvokeThreadProc()+0x2af
0000000c`301deb80 00007ffe`2f6d2d45 system_management_automation_ni!System.Management.Automation.Runspaces.PipelineThread.WorkerProc()+0x31
0000000c`301debb0 00007ffe`2f6d2ab9 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x285
0000000c`301ded10 00007ffe`2f6d2a97 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x9
0000000c`301ded40 00007ffe`2f6ea161 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+0x57
0000000c`301ded90 00007ffe`3103afb3 mscorlib_ni!System.Threading.ThreadHelper.ThreadStart()+0x51
0000000c`301dede0 00007ffe`3103ae9e clr!CallDescrWorkerInternal+0x83
0000000c`301dee20 00007ffe`3103b632 clr!CallDescrWorkerWithHandler+0x4a
0000000c`301dee60 00007ffe`31146bd9 clr!MethodDescCallSite::CallTargetWorker+0x251
0000000c`301df010 00007ffe`3103c8e1 clr!ThreadNative::KickOffThread_Worker+0x105
0000000c`301df270 00007ffe`3103c868 clr!ManagedThreadBase_DispatchInner+0x2d
0000000c`301df2b0 00007ffe`3103c7d9 clr!ManagedThreadBase_DispatchMiddle+0x6c
0000000c`301df3b0 00007ffe`3103c91b clr!ManagedThreadBase_DispatchOuter+0x75
0000000c`301df440 00007ffe`31146aba clr!ManagedThreadBase_FullTransitionWithAD+0x2f
0000000c`301df4a0 00007ffe`31153656 clr!ThreadNative::KickOffThread+0xd2
0000000c`301df570 00007ffe`38f113d2 clr!Thread::intermediateThreadProc+0x7d
0000000c`301df7b0 00007ffe`396454e4 kernel32!BaseThreadInitThunk+0x22
0000000c`301df7e0 00000000`00000000 ntdll!RtlUserThreadStart+0x34
Ok, cool, so now how do we figure out which is the DLL that is currently being loaded? Well, from MSDN, we know that the first parameter passed to the kernelbase!LoadLibraryExW function is the name of the DLL (lpFileName).
Hmm … first parameter … so where is that? Well, we know that with the x64 calling convention (fastcall), the first parameter is passed in the RCX register: https://msdn.microsoft.com/en-us/library/ms235286.aspx
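As a quick reference for the other parameters, here is a minimal sketch of where the Microsoft x64 calling convention puts integer and pointer arguments at the moment a function is entered (which is exactly where a breakpoint on the function name stops). The helper name argumentLocation is made up for illustration, and floating-point arguments (which travel in XMM registers) are deliberately left out:

```javascript
// Sketch only: returns a WinDbg expression for the N-th (1-based) integer or
// pointer argument at function entry. argumentLocation is a hypothetical name.
function argumentLocation(index) {
    const registers = ["rcx", "rdx", "r8", "r9"];
    if (!Number.isInteger(index) || index < 1) {
        throw new RangeError("index must be a positive integer");
    }
    if (index <= registers.length) {
        // The first four arguments are passed in registers.
        return "@" + registers[index - 1];
    }
    // At entry, [rsp] holds the return address, followed by 32 bytes of
    // shadow space, so the 5th argument sits at rsp+0x28, the 6th at rsp+0x30, ...
    const offset = 0x28 + (index - 5) * 8;
    return "poi(@rsp+0x" + offset.toString(16) + ")";
}
```

For LoadLibraryExW, lpFileName is the first parameter, so argumentLocation(1) gives @rcx.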
So good! Let’s have a look at that – as it should be a UNICODE string, we can look at it like this:
0:001> du @rcx
0000000c`2f4dd240 "C:\Windows\assembly\NativeImages"
0000000c`2f4dd280 "_v4.0.30319_64\Microsoft.P521220"
0000000c`2f4dd2c0 "ea#\ab32ec62ecc4b90efc07602a8353"
0000000c`2f4dd300 "8c65\Microsoft.PowerShell.Comman"
0000000c`2f4dd340 "ds.Utility.ni.dll"
Pff … well this is lame, not the correct DLL … so we continue to the next breakpoint with the “g” command, check again, and repeat until we finally reach the DLL we are interested in (in this case NetworkDiscoveryModules.dll):
0:001> g
Breakpoint 0 hit
kernelbase!LoadLibraryExW:
00007ffe`36b48d10 48895c2410 mov qword ptr [rsp+10h],rbx ss:0000000c`301dcbd8=0000000c301dddf8
0:001> du @rcx
0000000c`301dd078 "C:\Program Files\Microsoft Syste"
0000000c`301dd0b8 "m Center 2012 R2\Operations Mana"
0000000c`301dd0f8 "ger\Server\NetworkDiscoveryModul"
0000000c`301dd138 "es.dll"
Perfect! Now we can put breakpoints on the functions that get called next, so that we can look at each (dependency) DLL one by one and figure out which one is corrupt, and then go (continue execution):
0:001> bp ntdll!LdrpFindOrMapDll
0:001> bp ntdll!ZwCreateSection
0:001> g
Breakpoint 2 hit
ntdll!LdrpFindOrMapDll:
00007ffe`396a61c0 4053 push rbx
So, we break into the next call of ntdll!LdrpFindOrMapDll. Aaaaaand how exactly do we know which dependent DLL is being currently loaded? The function is not documented on MSDN … ok, so a little bit of internals – this function gets the DLL name as a pointer to a string in the format of the ntdll!_UNICODE_STRING structure – which is actually documented on MSDN and which we can see using Public Symbols.
First parameter? Ok, so again, still in RCX register – let’s dump it out using “dx” (Debugger Data Model functionality – which can be extended using JavaScript as well – but a little later on extending it):
0:001> dx ((ntdll!_UNICODE_STRING *)@rcx)
((ntdll!_UNICODE_STRING *)0x00000000c301dc948) : 0xc301dc948 : "C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\NetworkDiscoveryModules.dll" [Type: _UNICODE_STRING *]
[<Raw View>] [Type: _UNICODE_STRING]
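If you have never looked inside a _UNICODE_STRING, the following is a minimal sketch of what “dx” is decoding for us here. It assumes the 64-bit layout (a 2-byte Length, a 2-byte MaximumLength, 4 bytes of padding, then an 8-byte Buffer pointer). The names parseUnicodeString and readMemory are hypothetical; readMemory stands in for whatever gives you the bytes at an address:

```javascript
// Sketch only: decode a 64-bit _UNICODE_STRING header and the UTF-16LE text
// it points to. parseUnicodeString and readMemory are hypothetical names.
function parseUnicodeString(headerBytes, readMemory) {
    const view = new DataView(headerBytes.buffer, headerBytes.byteOffset, 16);
    const length = view.getUint16(0, true);           // Length: bytes in use, no terminator counted
    const maximumLength = view.getUint16(2, true);    // MaximumLength: bytes allocated for Buffer
    const bufferAddress = view.getBigUint64(8, true); // Buffer: pointer, after 4 bytes of padding
    const chars = readMemory(bufferAddress, length);  // expected to return a Uint8Array
    let text = "";
    for (let i = 0; i + 1 < chars.length; i += 2) {
        // Each UTF-16LE code unit is stored low byte first.
        text += String.fromCharCode(chars[i] | (chars[i + 1] << 8));
    }
    return { length, maximumLength, bufferAddress, text };
}
```

The important detail is that Length is a byte count and the string is not guaranteed to be null-terminated, which is why the structure is the safer thing to dump rather than pointing “du” at the Buffer directly.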
Ha, ok, now we’re talking, we are going through the verification/load of our DLL of interest directly in ntdll!LdrpFindOrMapDll. Cool – so let’s continue execution and we should break in ntdll!NtCreateSection next (ntdll!ZwCreateSection and ntdll!NtCreateSection are two names for the same address in user mode, which is why the debugger shows the latter). We can use it to verify the DLL consistency check status and see if there is any error (corruption in this case) – let’s also have a look at the stack after we break:
0:001> g
Breakpoint 1 hit
ntdll!NtCreateSection:
00007ffe`396c0b50 4c8bd1 mov r10,rcx
0:001> k
Child-SP RetAddr Call Site
0000000c`301dc538 00007ffe`396a6921 ntdll!NtCreateSection
0000000c`301dc540 00007ffe`3965b2b5 ntdll!LdrpFindOrMapDll+0x761
0000000c`301dc8b0 00007ffe`39698d69 ntdll!LdrpLoadDll+0x295
0000000c`301dcae0 00007ffe`36b48dda ntdll!LdrLoadDll+0x99
0000000c`301dcb60 00007ffe`310ae4b4 kernelbase!LoadLibraryExW+0xca
0000000c`301dcbd0 00007ffe`310ae421 clr!CLRLoadLibraryExWorker+0x54
0000000c`301dcc20 00007ffe`31197dfd clr!CLRLoadLibraryEx+0x51
0000000c`301dcc70 00007ffe`3140bf1e clr!LocalLoadLibraryHelper+0x31
0000000c`301dcca0 00007ffe`310af2a4 clr!NDirect::LoadLibraryModule+0x35c7de
0000000c`301dd4f0 00007ffe`310af50a clr!NDirect::NDirectLink+0x80
0000000c`301dd820 00007ffe`310af469 clr!NDirect::GetStubForILStub+0x4a
0000000c`301dd870 00007ffe`310af54b clr!GetStubForInteropMethod+0x65
0000000c`301dd8b0 00007ffe`310a9696 clr!MethodDesc::DoPrestub+0xca7
0000000c`301dda80 00007ffe`3103226a clr!PreStubWorker+0x3d6
0000000c`301ddd90 00007ffd`d1a40a1c clr!ThePreStub+0x5a
0000000c`301dde60 00007ffe`2cb64c1c DynamicClass.CallSite.Target(System.Runtime.CompilerServices.Closure, System.Runtime.CompilerServices.CallSite, System.Type)+0xfc
0000000c`301ddee0 00007ffe`1938b088 system_core_ni!System.Dynamic.UpdateDelegates.UpdateAndExecute1[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.Runtime.CompilerServices.CallSite, System.__Canon)+0x39c
0000000c`301ddff0 00007ffe`1938591b system_management_automation_ni!System.Management.Automation.Interpreter.DynamicInstruction`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x58
0000000c`301de050 00007ffe`1938591b system_management_automation_ni!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x16b
0000000c`301de160 00007ffe`1938575b system_management_automation_ni!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x16b
0000000c`301de270 00007ffe`193752af system_management_automation_ni!System.Management.Automation.Interpreter.Interpreter.Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x7b
0000000c`301de2f0 00007ffe`19383925 system_management_automation_ni!System.Management.Automation.Interpreter.LightLambda.RunVoid1[[System.__Canon, mscorlib]](System.__Canon)+0x11f
0000000c`301de380 00007ffe`193832f8 system_management_automation_ni!System.Management.Automation.DlrScriptCommandProcessor.RunClause(System.Action`1<System.Management.Automation.Language.FunctionContext>, System.Object, System.Object)+0x405
0000000c`301de510 00007ffe`19382fce system_management_automation_ni!System.Management.Automation.DlrScriptCommandProcessor.Complete()+0x1a8
0000000c`301de5c0 00007ffe`19382d01 system_management_automation_ni!System.Management.Automation.CommandProcessorBase.DoComplete()+0x16e
0000000c`301de680 00007ffe`19379551 system_management_automation_ni!System.Management.Automation.Internal.PipelineProcessor.DoCompleteCore(System.Management.Automation.CommandProcessorBase)+0xb1
0000000c`301de720 00007ffe`193666d0 system_management_automation_ni!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(System.Object, System.Collections.Hashtable, Boolean)+0x191
0000000c`301de810 00007ffe`1936581f system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.InvokeHelper()+0x750
0000000c`301dea30 00007ffe`192e7001 system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.InvokeThreadProc()+0x2af
0000000c`301deb80 00007ffe`2f6d2d45 system_management_automation_ni!System.Management.Automation.Runspaces.PipelineThread.WorkerProc()+0x31
0000000c`301debb0 00007ffe`2f6d2ab9 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x285
0000000c`301ded10 00007ffe`2f6d2a97 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x9
0000000c`301ded40 00007ffe`2f6ea161 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+0x57
0000000c`301ded90 00007ffe`3103afb3 mscorlib_ni!System.Threading.ThreadHelper.ThreadStart()+0x51
0000000c`301dede0 00007ffe`3103ae9e clr!CallDescrWorkerInternal+0x83
0000000c`301dee20 00007ffe`3103b632 clr!CallDescrWorkerWithHandler+0x4a
0000000c`301dee60 00007ffe`31146bd9 clr!MethodDescCallSite::CallTargetWorker+0x251
0000000c`301df010 00007ffe`3103c8e1 clr!ThreadNative::KickOffThread_Worker+0x105
0000000c`301df270 00007ffe`3103c868 clr!ManagedThreadBase_DispatchInner+0x2d
0000000c`301df2b0 00007ffe`3103c7d9 clr!ManagedThreadBase_DispatchMiddle+0x6c
0000000c`301df3b0 00007ffe`3103c91b clr!ManagedThreadBase_DispatchOuter+0x75
0000000c`301df440 00007ffe`31146aba clr!ManagedThreadBase_FullTransitionWithAD+0x2f
0000000c`301df4a0 00007ffe`31153656 clr!ThreadNative::KickOffThread+0xd2
0000000c`301df570 00007ffe`38f113d2 clr!Thread::intermediateThreadProc+0x7d
0000000c`301df7b0 00007ffe`396454e4 kernel32!BaseThreadInitThunk+0x22
0000000c`301df7e0 00000000`00000000 ntdll!RtlUserThreadStart+0x34
Hmm, ok good! But what now? How do we figure out what the ntdll!NtCreateSection function will return? Because it actually returns a status (which can be an error value). We know that by convention, compilers put the return value in the RAX register just before a function finishes executing (unless specifically coded otherwise in assembly).
Let’s go to the end of the execution of ntdll!NtCreateSection using “pt” and have a look at the value of the RAX register:
0:001> pt
ntdll!NtCreateSection+0xa:
00007ffe`396c0b5a c3 ret
0:001> r @rax
rax=0000000000000000
Ok, so RAX is 0 and 0 is good! 0 means ERROR_SUCCESS status (for an NTSTATUS return value like this one, the equivalent name is STATUS_SUCCESS). Don’t get confused by the name – it actually means that the operation completed successfully: https://msdn.microsoft.com/en-us/library/windows/desktop/ms681382(v=vs.85).aspx
So, we don’t throw an error and we validate this DLL (currently NetworkDiscoveryModules.dll) successfully. Ok, so next, we should go through each dependent DLL and try to validate/load each one of them. So, let’s step through each one of these and check the status (return value in RAX) of ntdll!NtCreateSection each time.
So after checking each one (ntdll!LdrpFindOrMapDll for the DLL name & ntdll!NtCreateSection for the return value) we finally reach the DLL that is throwing the error (binary file validation error):
0:001> g
Breakpoint 2 hit
ntdll!LdrpFindOrMapDll:
00007ffe`396a61c0 4053 push rbx
0:001> dx ((ntdll!_UNICODE_STRING *)@rcx)
((ntdll!_UNICODE_STRING *)@rcx) : 0xc301dc4b0 : "dmboot.dll" [Type: _UNICODE_STRING *]
[<Raw View>] [Type: _UNICODE_STRING]
0:001> g
Breakpoint 1 hit
ntdll!NtCreateSection:
00007ffe`396c0b50 4c8bd1 mov r10,rcx
0:001> pt
ntdll!NtCreateSection+0xa:
00007ffe`396c0b5a c3 ret
0:001> r @rax
rax=00000000c000012f
Ha, nice! So return status for ntdll!NtCreateSection for the dmboot.dll DLL is not 0 (ERROR_SUCCESS) anymore – it is: 00000000c000012f
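Before reaching for any tooling, the value itself already tells us something, because an NTSTATUS is a packed 32-bit value: the top two bits are the severity, then a customer bit, a reserved bit, a 12-bit facility and a 16-bit code. Here is a small sketch that splits it apart (decodeNtStatus is a name I made up, it is not a WinDbg API):

```javascript
// Sketch only: split a 32-bit NTSTATUS into its bit fields.
// decodeNtStatus is a hypothetical helper name.
function decodeNtStatus(value) {
    const status = value >>> 0; // keep only the low 32 bits, as unsigned
    const severityNames = ["success", "informational", "warning", "error"];
    const severity = status >>> 30;             // bits 31-30
    return {
        hex: "0x" + status.toString(16).padStart(8, "0"),
        severity: severityNames[severity],
        customer: (status & 0x20000000) !== 0,  // bit 29
        facility: (status >>> 16) & 0x0fff,     // bits 27-16
        code: status & 0xffff,                  // bits 15-0
        isError: severity === 3
    };
}
```

For 0xc000012f this gives severity “error”, facility 0 and code 0x12f, so even without a lookup table we know that this is a genuine failure and not a warning or an informational status.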
Guess what? We also have a command from a built-in extension (!error) for WinDbg which we can use to “resolve” the actual error name/description from the NTSTATUS (00000000c000012f) value:
0:001> !error 00000000c000012f
Error code: (NTSTATUS) 0xc000012f (3221225775) - The specified image file did not have the correct format, it did not have an initial MZ.
Well … what do you know? It looks like the dmboot.dll DLL is the one that is actually corrupt and it is being loaded because it is a dependency DLL of our NetworkDiscoveryModules.dll DLL.
We got the dmboot.dll DLL from a healthy SCOM Management Server with the same SCOM version and Update Rollup and replaced this corrupt one – aaand … there you go, issue solved!
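Since the error message complains specifically about the missing initial “MZ”, a quick sanity check of a suspect (or replacement) file is to look at its first two bytes. This is only a sketch: hasMzSignature is a hypothetical helper, and passing this check does not prove that the rest of the image is intact, it only rules out this particular failure:

```javascript
// Sketch only: does this byte array start with the "MZ" signature (0x4D 0x5A)
// that a PE image (EXE/DLL) is expected to begin with?
// hasMzSignature is a hypothetical helper name.
function hasMzSignature(bytes) {
    return bytes.length >= 2 && bytes[0] === 0x4d && bytes[1] === 0x5a;
}

// Hypothetical usage under Node.js (the path is a placeholder):
//   const fs = require("fs");
//   const head = fs.readFileSync("path\\to\\suspect.dll").subarray(0, 2);
//   console.log(hasMzSignature(head));
```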
How did this file (DLL) get corrupted in the first place?! Well now, that is the real question! Unfortunately, without any type of “file-change-auditing” in place, it cannot really be explained after it has already happened, and there are too many possible causes.
Yes – this gets much easier if Private Symbols are available. But, as this should (I hope) ideally help anyone trying to debug such a scenario, I have documented everything using Public Symbols.
But hey … why not automate this process by creating a cool JavaScript extension for WinDbg and just let the computer do the work for me next time this ever happens with … whatever … other software?
Here is the documentation about creating extensions in JavaScript for the Data Model: https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/javascript-debugger-scripting & also check out this Channel9 video from the Defrag Tools series about this, with cool info from Andrew Richards, Andy Luhrs and Bill Messmer: https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-170-Debugger-JavaScript-Scripting
This could easily have been just a series of functions in the script, but I complicated it a little bit because I wanted to show it as an extension to the process object – I might extend this in the future and create a more complex extension around the LDR component. I highly doubt that it will be public though :p because without private symbols, I have to rely on calling conventions (and read values directly from registers when possible – and it’s not always possible) as I cannot resolve any locals or parameters (by calling DX’s getModuleSymbol), which would over-complicate things.
"use strict";

// Extends the debugger's process object with an "Ldr" property.
class __DllLoadStatus
{
    get Ldr()
    {
        return new __DllLoadFailure(this);
    }
}

class __DllLoadFailure
{
    constructor(process)
    {
        this.__process = process;
    }

    // Sets up the two breakpoints that drive the automated analysis.
    Initialize()
    {
        var ctl = host.namespace.Debugger.Utility.Control;
        // Break once in ZwCreateSection, then run to its "ret" instruction
        // and remember that address (this is where RAX holds the status).
        ctl.ExecuteCommand("bp /1 ntdll!ZwCreateSection;g");
        ctl.ExecuteCommand("pt");
        var address = host.currentThread.Registers.User.rip;
        // On every DLL lookup: print the DLL name and continue.
        ctl.ExecuteCommand("bp ntdll!LdrpFindOrMapDll \"dx @$curprocess.Ldr.GetDllName();gc\"");
        // On every return from ZwCreateSection: print the status and continue.
        ctl.ExecuteCommand("bp " + address.toString(16) + " \"dx @$curprocess.Ldr.GetLoadStatus();gc\"");
        host.diagnostics.debugLog("Ok, we are all set! Run \"g\" to start automated debugging.\n");
    }

    // Reads the DLL name from the _UNICODE_STRING that RCX points to.
    GetDllName()
    {
        var address = host.currentThread.Registers.User.rcx;
        var pDllName = host.createPointerObject(address, "ntdll.dll", "_UNICODE_STRING *");
        var dllName = host.memory.readWideString(pDllName.dereference().Buffer).toString();
        host.diagnostics.debugLog(dllName, "\n");
    }

    // Resolves the return value in RAX to a readable description using !error.
    GetLoadStatus()
    {
        var ctl = host.namespace.Debugger.Utility.Control;
        var status = ctl.ExecuteCommand("!error @rax").First().toString();
        host.diagnostics.debugLog(status, "\n");
    }
}

// Entry point called by the JavaScript provider when the script is loaded.
function initializeScript()
{
    return [
        new host.namedModelParent(__DllLoadStatus, "Debugger.Models.Process")
    ];
}
Alright, so now we have this script saved somewhere (ex. C:\DbgScripts\LdrExtensionTest.js). So let’s load the JsProvider (!load jsprovider) and afterwards, using the full path of the script as parameter, load the script using the .scriptload command:
0:004> !load jsprovider
0:004> .scriptload C:\DbgScripts\LdrExtensionTest.js
JavaScript script successfully loaded from 'C:\DbgScripts\LdrExtensionTest.js'
So, let’s have a go at our script and see the output of the automated analysis by running “dx @$curprocess.Ldr.Initialize()” and then letting the process execute with the “g” command:
0:000> dx @$curprocess.Ldr.Initialize()
Ok, we are all set! Run "g" to start automated debugging.
@$curprocess.Ldr.Initialize()
0:001> g
C:\Windows\system32\rsaenh.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\NetworkDiscoveryModules.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
MSVCR100.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
KERNEL32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
USER32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
ole32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
OLEAUT32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
ADVAPI32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
sm-server.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
sm-discovery.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
sm-auto-discovery.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
MSVCP100.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
ntdll.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
RPCRT4.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
SHLWAPI.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
HealthServiceRuntime.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
KERNEL32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
MSVCP100.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
MSVCR100.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
KERNEL32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
ADVAPI32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
sm-clsapi.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
dmboot.dll
@$curprocess.Ldr.GetDllName()
Error code: (NTSTATUS) 0xc000012f (3221225775) - The specified image file did not have the correct format, it did not have an initial MZ.
@$curprocess.Ldr.GetLoadStatus()
Pfff … well, that was much easier
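On a long run, the one failing entry is easy to miss while scrolling, so if you save the debugger output to a text file you could filter it afterwards. The following is just a sketch that assumes the output has exactly the shape shown above (a DLL name line, the GetDllName() echo, an “Error code:” line, the GetLoadStatus() echo); findFailedLoads is a name I made up:

```javascript
// Sketch only: pull the failed loads out of the saved debugger output.
// Assumes the four-line pattern shown in the transcript above.
// findFailedLoads is a hypothetical helper name.
function findFailedLoads(logText) {
    const lines = logText
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => line.length > 0);
    const failures = [];
    let lastDll = null;
    for (let i = 0; i < lines.length; i++) {
        const line = lines[i];
        if (lines[i + 1] === "@$curprocess.Ldr.GetDllName()") {
            // The line right before the GetDllName() echo is the DLL name.
            lastDll = line;
        } else if (line.startsWith("Error code:") &&
                   !/^Error code: \(Win32\) 0 \(0\)/.test(line)) {
            // Anything other than the "0 (0)" success line counts as a failure.
            failures.push({ dll: lastDll, status: line });
        }
    }
    return failures;
}
```

Fed with the output above, this would leave a single entry, the one for dmboot.dll.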
Have fun developing WinDbg JS Extensions & Scripts!