Channel: Mihai Sarbulescu's System Center Blog

Improving Service Manager - we're not kidding!

Update Rollup 6 is out! :)

http://support.microsoft.com/kb/3039363

We are constantly improving SC Service Manager and we're not kidding about it ;)

The fixes and performance improvements in Update Rollup 6 are proof of this!

Great job guys!

Enjoy!


Global Service Monitor not working because the remote certificate is invalid according to the validation procedure

When we install and use Global Service Monitor (GSM) in System Center Operations Manager 2012, the Management Servers need to access the GSM site in Azure, which uses a certificate from Microsoft. Each Management Server needs to trust this certificate, and for this the Trusted Root Certification Authorities store of its computer certificate store must contain the list of trusted Microsoft Certificate Authorities.

 

So, let’s say that you just installed GSM and you get this Warning Event in the Operations Manager Event Log on the Management Server:

Global Service Monitor Modules: Failed to discover Global Service Monitor locations.
Failure step: ‘Couldn’t get the ACS endpoint from discovery service. SubscriptionId: ‘SOME_ID‘, OutsideInServiceBaseUri: ‘https://gsm-prod.systemcenter.microsoft.com/”
Message: ‘Could not establish trust relationship for the SSL/TLS secure channel with authority ‘gsm-prod.systemcenter.microsoft.com’.’
Details: ‘System.ServiceModel.Security.SecurityNegotiationException: Could not establish trust relationship for the SSL/TLS secure channel with authority ‘gsm-prod.systemcenter.microsoft.com’.
         —> System.Net.WebException: The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel.
         —> System.Security.Authentication.AuthenticationException: The remote certificate is invalid according to the validation procedure.

 

In this particular case, when we checked the Trusted Root Certification Authorities store on this Management Server, we noticed that some of the required root CA certificates were missing, for example one of the most important ones for GSM, the Baltimore CyberTrust Root certificate.

All these certificates should get imported on your computers through Windows Update, specifically KB931125. This update is refreshed often with new certificates, so it is a good idea to check for new certificates from time to time.

So here we installed KB931125 from the download link, after which the error was gone and GSM started working again: http://www.microsoft.com/en-us/download/details.aspx?id=6149

 

After waiting a couple of minutes, we could see data getting in from the GSM Web Tests that were created in OpsMgr 😀 yuhuuu!

 

 

Data Warehouse issues after upgrading OM 2012 SP1 to R2

After upgrading SC Operations Manager 2012 SP1 to R2, you might notice general problems related to the Data Warehouse component, including Reporting of course. This can happen when there is more than one AlertDetail_GUID table in the DW database at upgrade time and not all of them get the 2 new columns, TfsWorkItemId and TfsWorkItemOwner.

If you look in the Operations Manager event logs, you will notice this error event 31565 on one of the Management Servers:

Failed to deploy Data Warehouse component. The operation will be retried.
Exception 'DeploymentException': Failed to perform Data Warehouse component deployment operation: Install; Component: DataSet, Id: '0d698dff-9b7e-24d1-8a74-4657b86a59f8', Management Pack Version-dependent Id: '29a3dd22-8645-bae5-e255-9b56bf0b12a8'; Target: DataSet, Id: '23ee52b1-51fb-469b-ab18-e6b4be37ab35'. Batch ordinal: 3;
Exception: Sql execution failed. Error 207, Level 16, State 1, Procedure vAlertDetail, Line 18,
Message: Invalid column name 'TfsWorkItemId'.
One or more workflows were affected by this.
Workflow name: Microsoft.SystemCenter.DataWarehouse.Deployment.Component

 

To resolve the issue, execute this SQL query on the OperationsManagerDW database:

DECLARE
   @GuidString NVARCHAR(50),
   @Guid UNIQUEIDENTIFIER,
   @StandardDatasetTableMapRowId INT,
   @Statement NVARCHAR(MAX),
   @SchemaName SYSNAME,
   @TableNameSuffix SYSNAME,
   @BaseTableName SYSNAME,
   @FullTableName SYSNAME
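-- Look up the dataset ID of the Alert dataset
SELECT @GuidString = DatasetId
FROM StandardDataset WITH(NOLOCK)
WHERE SchemaName = 'Alert'
-- Start before the first table map row so the loop visits every row once
SET @StandardDatasetTableMapRowId = 0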
WHILE EXISTS (
   SELECT *
   FROM StandardDatasetTableMap AS TM
   WHERE (tm.StandardDatasetTableMapRowId > @StandardDatasetTableMapRowId)
   AND (tm.DatasetId = @GuidString)
) BEGIN
   SELECT TOP 1
      @StandardDatasetTableMapRowId = TM.StandardDatasetTableMapRowId,
      @SchemaName = SD.SchemaName,
      @TableNameSuffix = TM.TableNameSuffix,
      @BaseTableName = SDAS.BaseTableName
   FROM StandardDatasetTableMap AS TM
   JOIN StandardDataset AS SD
      ON TM.DatasetId = sd.DatasetId
   JOIN StandardDatasetAggregationStorage AS SDAS
      ON SDAS.DatasetId = TM.DatasetId AND SDAS.AggregationTypeId = TM.AggregationTypeId
   WHERE
      TM.StandardDatasetTableMapRowId > @StandardDatasetTableMapRowId AND
      TM.DatasetId = @GUIDString AND
      SDAS.TableTag = 'detail' AND
      SDAS.DependentTableInd = 1
   ORDER BY tm.StandardDatasetTableMapRowId
   SET @FullTableName = @BaseTableName + '_' + @TableNameSuffix
   IF NOT EXISTS (
      SELECT *
      FROM INFORMATION_SCHEMA.COLUMNS
      WHERE
         TABLE_NAME = @FullTableName AND
         TABLE_SCHEMA = @SchemaName AND
         COLUMN_NAME = N'TfsWorkItemId'
   ) BEGIN
      SET @Statement = 'ALTER TABLE ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@FullTableName) + ' ADD TfsWorkItemId NVARCHAR(256) NULL'
      EXECUTE (@Statement)
   END
   IF NOT EXISTS (
      SELECT *
      FROM INFORMATION_SCHEMA.COLUMNS
      WHERE
         TABLE_NAME = @FullTableName AND
         TABLE_SCHEMA = @SchemaName AND
         COLUMN_NAME = N'TfsWorkItemOwner'
   ) BEGIN
      SET @Statement = 'ALTER TABLE ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@FullTableName) + ' ADD TfsWorkItemOwner NVARCHAR(256) NULL'
      EXECUTE (@Statement)
   END
END
EXEC StandardDatasetBuildCoverView @GUIDString, 0

 

This should get you on your way 😀

How object GUIDs are created in Operations Manager and Service Manager

There are some questions on how the object GUIDs are calculated in Operations Manager and Service Manager when thinking about moving MPs to different Management Groups and/or side-by-side migration scenarios. Would the Groups created have the same GUIDs?

When I talk about the GUID of an object, I am talking about the BaseManagedEntityId value of the BaseManagedEntity table and the TypedManagedEntityId from the TypedManagedEntity table in the operational database.

For singleton classes like Groups, the InstanceId is the same as the class ManagedTypeId from the ManagedType table: we will always have *only* one object of that class, so we assume that the InstanceId is the same as the TypeId, and the string which is hashed is the same for both.

  • when I talk about the TypeId, I am referencing the ManagedTypeId from the ManagedType table
  • when I talk about the InstanceId, I am referencing the BaseManagedEntityId from the BaseManagedEntity table or the TypedManagedEntityId from the TypedManagedEntity table

When we want to get the InstanceId of a singleton class like a Group, we need to know how the TypeId was calculated – it is a hash (SHA1) of this string format:

       MPName=<MPName>,[KeyToken=<KeyToken>,]ObjectId=<ObjectId>

  • when I talk about the TypeId, this is directly the MPObjectId which is calculated based on the string ID of the element in the XML definition
  • so this is exactly how definitions (classes, relationships, rules, monitors, etc.) get their GUID from the string ID from the XML definition

Here is an example for the All Windows Computers Group:

Now for any non-singleton class like the HealthService or WindowsComputer classes, this is the string used to calculate the InstanceId:

       TypeId={<TypeGuid>}[,{<KeyPropertyId>}=<KeyValue>][,{<KeyPropertyId_2>}=<KeyValue_2> ]

Here is an example on how the HealthService GUID is being calculated for an Agent based on its FQDN:
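As a minimal sketch of the two string formats above, the Python below builds each string and hashes it with SHA1. The post only states that the GUID is a SHA1 hash of the string: the UTF-16LE encoding, the use of the first 16 bytes of the digest, the .NET-style byte order and the lowercase GUID formatting are all assumptions here, and the management pack name, key token, class IDs and key property ID are hypothetical placeholders. The printed values are therefore only illustrative and should not be expected to match a real management group.

```python
import hashlib
import uuid


def guid_from_string(text):
    """Hash a string with SHA1 and fold the digest into a GUID.

    Assumptions (not stated in the post): UTF-16LE encoding, first 16 bytes
    of the 20-byte digest, .NET Guid(byte[]) byte order.
    """
    digest = hashlib.sha1(text.encode("utf-16-le")).digest()
    return uuid.UUID(bytes_le=digest[:16])


def type_id_string(mp_name, object_id, key_token=None):
    """Build MPName=<MPName>,[KeyToken=<KeyToken>,]ObjectId=<ObjectId>."""
    parts = ["MPName=" + mp_name]
    if key_token:  # the KeyToken part is optional in the format
        parts.append("KeyToken=" + key_token)
    parts.append("ObjectId=" + object_id)
    return ",".join(parts)


def instance_id_string(type_guid, key_values=()):
    """Build TypeId={<TypeGuid>}[,{<KeyPropertyId>}=<KeyValue>]..."""
    text = "TypeId={%s}" % type_guid
    for prop_id, value in key_values:
        text += ",{%s}=%s" % (prop_id, value)
    return text


# Hypothetical singleton class (for example a group): InstanceId == TypeId.
group_id = guid_from_string(
    type_id_string("Contoso.Example.MP", "Contoso.Example.Group", "0123456789abcdef")
)

# Hypothetical non-singleton class with one key property (an FQDN here).
class_type_id = guid_from_string(
    type_id_string("Contoso.Example.MP", "Contoso.Example.Class", "0123456789abcdef")
)
key_property_id = uuid.UUID("11111111-2222-3333-4444-555555555555")
instance_id = guid_from_string(
    instance_id_string(class_type_id, [(key_property_id, "agent01.contoso.com")])
)

print(group_id)
print(instance_id)
```

Because the input string contains only the MP name, key token, element ID and key values, the same inputs always give the same GUID, regardless of which Management Group does the calculation.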

I hope this answers some questions about what happens in different scenarios like upgrading the OS on an Agent, migration, moving groups from a Management Group to another and so on 😀

Some Alerts are not appearing in Reports in Operations Manager and we see 31551 events in the event log

Here is an interesting issue I came across. I did not encounter this too often but still, it's very interesting and it was very fun to troubleshoot! 😀

In case you get the same error I suggest opening a case with Microsoft Support because it will be easier. Still, let me tell the story just for fun.

Some Alerts were not appearing in Reports, and when we checked, these had definitely not been written to the Data Warehouse database. They were missing from the AlertStage table, for example, which is where Alerts end up first when they are synched from the operational database to the data warehouse database. Ok then, so the first thing to check is the Operations Manager event log. On OM 2012, check it on all Management Servers, because any Management Server from the All Management Servers Resource Pool can get the role of Data Warehouse Synchronization Server. If you are still on OM 2007, then *only* the Root Management Server will have this role.

Interestingly enough, we were seeing error events 31551 in the Operations Manager event log on the Server which had the Data Warehouse Synchronization Server role:

  • please keep in mind that the error event ID (31551) is pretty general and it is not the event ID that matters, it is its description that tells us if we have the same issue or not

  Failed to store data in the Data Warehouse. The operation will be retried.
  Exception 'InvalidOperationException': The given value of type String from the data source cannot be converted to type nvarchar of the specified target column.

 

So what does this actually mean?!

Well, the Alert data is inserted into the Data Warehouse with SQL Bulk Insert when it is synchronized from the operational database, because of the amount of data that needs to be inserted. When Bulk Insert is used, the class doing the insert checks the values being inserted against the field types and limits of the database table. In this case it checks that all the values of an Alert (Item) will "fit" into the fields of the AlertStage table of the data warehouse database. If any of the values (like the Alert Title) is longer than the maximum length of its field in AlertStage, then the insert fails with that error message. The AlertStage Title field has NVARCHAR(255) as maximum length, so if we have an Alert with a title longer than 255 characters, we will encounter this issue, as we will with any other value which does not fit, like the custom fields of an Alert.
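The kind of check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the SqlBulkCopy implementation: the `find_oversized_fields` helper, the sample alerts and the `limits` dictionary are hypothetical, and the 255 character limit is simply the value quoted above.

```python
def find_oversized_fields(rows, limits):
    """Return (row_index, field, length, limit) for every value that is too long."""
    problems = []
    for index, row in enumerate(rows):
        for field, limit in limits.items():
            value = row.get(field)
            if value is not None and len(value) > limit:
                problems.append((index, field, len(value), limit))
    return problems


# Hypothetical column limits, using the 255 characters mentioned in the text.
limits = {"Title": 255, "CustomField1": 255}

# Hypothetical alerts; the second one carries an oversized custom field.
alerts = [
    {"Title": "Disk space low", "CustomField1": "ticket 1234"},
    {"Title": "Service stopped", "CustomField1": "x" * 300},
]

for index, field, length, limit in find_oversized_fields(alerts, limits):
    # prints: alert 1: CustomField1 has 300 characters, limit is 255
    print("alert %d: %s has %d characters, limit is %d" % (index, field, length, limit))
```

A single value that is too long is enough to be reported, which mirrors why one Alert with an oversized field is enough to produce the error event.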

 

Here is a screenshot as FYI for the field types and definitions of the AlertStage table from the data warehouse database:

 

So now the question is … how do we figure out *which* of the Alerts has a longer string *than* *which* property (field) ?! Well because we are doing a Bulk Insert here from the .NET SQL Class we cannot see this by tracing the SQL Server there … it never ends up trying to insert this into SQL because it fails locally in the code on the Management Server which is trying to do this action. Right … so how to figure this one out?!

Debugging my friends, the answer to everything, hehe! 😀

Ok so we need to look at … what process here? Well as you know (or will find out now) the Management Servers as well as the Agents have the "core" service called HealthService.exe. This process (service) however, does *NOT* do the heavy-lifting. It will spawn MonitoringHost.exe processes which will do the work for it after loading into them the modules (DLLs) which are needed for the specific tasks they are assigned to do.

So which MonitoringHost.exe process will it be which runs the Data Warehouse Synchronization workflow? Well this would have to be a MonitoringHost.exe process which is running under the Data Warehouse Writer Account would it not? 😉

  • in case you have the same account used for more roles or something you can use Process Explorer to check out the loaded DLLs in the MonitoringHost.exe processes and find out which has the Microsoft.EnterpriseManagement.HealthService.Modules.DataWarehouse.dll DLL loaded

Because of the fact that the process is *NOT* crashing when throwing this Exception – as it is handled – we cannot simply get a memory dump of that process. So we either attach the debugger (WinDbg) live to the environment *OR* we can use another nifty tool which is called Time Travel Tracing to get the "dynamic" memory dump of that process so that we can "step" through the process execution. We could also use a tool like DebugDiag which can be set to write a memory dump of a process on a custom .NET Exception like in this example.

Either way, whatever method we will use here, we end up in the "same" scenario where we need to set up a breakpoint of the Exception we are interested in (if not using DebugDiag which writes the memory dump on exactly that Exception). So I will continue with this further on with the example of attaching to the MonitoringHost.exe process in question directly in the live environment.

Let's start shall we? 😀 We are in the debugger now and let's make sure we have loaded the correct SOS .NET Debugger Extension version:    .cordll -ve -u -l

Now we can continue by creating the breakpoint to stop when hitting the exception we are interested in which in this case is System.InvalidOperationException:    !StopOnException -create System.InvalidOperationException 1

And then we enter g in the debugger so that the process continues execution until we hit the breakpoint (Exception). When it stops and the debugger breaks in, we use !PrintException to see the exception details:

0:008> !PrintException
Exception object: 0000000201f1dc38
Exception type:   System.InvalidOperationException
Message:          The given value of type String from the data source cannot be converted to type nvarchar of the specified target column.
InnerException:   System.InvalidOperationException, Use !PrintException 0000000201f1d6d8 to see more.
StackTrace (generated):
    SP               IP               Function
    00000000075FC7A0 000007FEF338D5A8 system_data_ni!System.Data.SqlClient.SqlBulkCopy.ConvertValue(System.Object, System.Data.SqlClient._SqlMetaData)+0x19ec38
    00000000075FED10 000007FEF31EDD39 system_data_ni!System.Data.SqlClient.SqlBulkCopy.WriteToServerInternal()+0x989
    00000000075FEE00 000007FEF31EE263 system_data_ni!System.Data.SqlClient.SqlBulkCopy.WriteRowSourceToServer(Int32)+0x463
    00000000075FEED0 000007FEF31EE945 system_data_ni!System.Data.SqlClient.SqlBulkCopy.WriteToServer(System.Data.IDataReader)+0x125
    00000000075FEF40 000007FF0018DA89 microsoft_enterprisemanagement_datawarehouse_dataaccess!Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.Commands.DataWarehouseSqlBulkInsertCommand.Execute()+0x99

StackTraceString: <none>
HResult: 80131509

As we can see we have a nested Exception here, so let's have a look at this as well:

0:008> !PrintException 0000000201f1d6d8
Exception object: 0000000201f1d6d8
Exception type:   System.InvalidOperationException
Message:          String or binary data would be truncated.
InnerException:   <none>
StackTrace (generated):
    SP               IP               Function
    00000000075FEBB0 000007FEF31EEF5A system_data_ni!System.Data.SqlClient.SqlBulkCopy.ConvertValue(System.Object, System.Data.SqlClient._SqlMetaData)+0x5ea

StackTraceString: <none>
HResult: 80131509

Ok, so this is interesting. Now how do we find out which table field is affected here? It is going to be described by a System.Data.SqlClient._SqlMetaData object, which should be referenced from the stack of the Thread we stopped on. Thus we can use !DumpStackObjects to see all the objects:

0:008> !DumpStackObjects
OS Thread Id: 0x3cb4 (8)
RSP/REG          Object           Name

[…SNIPPED…]

00000000075FA218 00000001c01beca0 System.Data.SqlClient.TdsParserStateObject
00000000075FA370 00000001c01beb90 System.Data.SqlClient.TdsParser
00000000075FA490 0000000201f1dc38 System.InvalidOperationException
00000000075FA8B0 0000000201f12240 System.Data.SqlClient._SqlMetaData
00000000075FA8C8 0000000201f1d400 System.Int32[]
00000000075FA8F8 0000000201f1d2c0 System.Object[]    (System.Object[])
00000000075FABD0 0000000201ee0948 System.Data.SqlClient.SqlBulkCopy
00000000075FABF0 0000000201ee0be8 System.Data.SqlClient.SqlBulkCopyColumnMappingCollection
00000000075FAE00 0000000201ee0948 System.Data.SqlClient.SqlBulkCopy
00000000075FAE20 0000000201ee0be8 System.Data.SqlClient.SqlBulkCopyColumnMappingCollection
00000000075FB128 0000000201f1dc38 System.InvalidOperationException
00000000075FB410 00000001bfe40488 System.String   
00000000075FB4D0 0000000201ee0948 System.Data.SqlClient.SqlBulkCopy
00000000075FB4F0 0000000201ee0be8 System.Data.SqlClient.SqlBulkCopyColumnMappingCollection
00000000075FB880 0000000201f1eaf0 System.Environment+ResourceHelper+GetResourceStringUserData
00000000075FBA70 0000000201f1eb68 System.Runtime.CompilerServices.RuntimeHelpers+CleanupCode
00000000075FBA78 0000000201f1eaf0 System.Environment+ResourceHelper+GetResourceStringUserData
00000000075FBAA0 0000000201f1eb68 System.Runtime.CompilerServices.RuntimeHelpers+CleanupCode
00000000075FBAB0 0000000201f1eb68 System.Runtime.CompilerServices.RuntimeHelpers+CleanupCode
00000000075FBB08 0000000201f1eb28 System.Runtime.CompilerServices.RuntimeHelpers+TryCode
00000000075FBB10 0000000201f1eaf0 System.Environment+ResourceHelper+GetResourceStringUserData

[…SNIPPED…]

Ok, from that big list, let's grab the address of the object we are interested in (System.Data.SqlClient._SqlMetaData). It may be listed more than once, but every entry should show the same object address: !DumpObj 0000000201f12240

0:008> !DumpObj 0000000201f12240
Name:        System.Data.SqlClient._SqlMetaData
MethodTable: 000007fef2d98b90
EEClass:     000007fef2c36e60
Size:        224(0xe0) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fef2d91c50  4001bc9       80         System.Int32  1 instance               12 type
000007fef880c158  4001bca       8c          System.Byte  1 instance              231 tdsType
000007fef880c158  4001bcb       8d          System.Byte  1 instance              255 precision
000007fef880c158  4001bcc       8e          System.Byte  1 instance              255 scale
000007fef880c7d8  4001bcd       84         System.Int32  1 instance              512 length
000007fef2d925a0  4001bce        8 …ient.SqlCollation  0 instance 0000000201f139c0 collation
000007fef880c7d8  4001bcf       88         System.Int32  1 instance             1251 codePage
0000000000000000  4001bd0       10                       0 instance 000000017fe6eb28 encoding
000007fef880d608  4001bd1       8f       System.Boolean  1 instance                1 isNullable
000007fef880d608  4001bd2       90       System.Boolean  1 instance                0 isMultiValued
000007fef88068f0  4001bd3       18        System.String  0 instance 0000000000000000 udtDatabaseName
000007fef88068f0  4001bd4       20        System.String  0 instance 0000000000000000 udtSchemaName
000007fef88068f0  4001bd5       28        System.String  0 instance 0000000000000000 udtTypeName
000007fef88068f0  4001bd6       30        System.String  0 instance 0000000000000000 udtAssemblyQualifiedName
000007fef8808278  4001bd7       38          System.Type  0 instance 0000000000000000 udtType
000007fef88068f0  4001bd8       40        System.String  0 instance 0000000000000000 xmlSchemaCollectionDatabase
000007fef88068f0  4001bd9       48        System.String  0 instance 0000000000000000 xmlSchemaCollectionOwningSchema
000007fef88068f0  4001bda       50        System.String  0 instance 0000000000000000 xmlSchemaCollectionName
000007fef2d928b8  4001bdb       58 …qlClient.MetaType  0 instance 000000017fe701c0 metaType
000007fef88068f0  4001bdc       60        System.String  0 instance 0000000000000000 structuredTypeDatabaseName
000007fef88068f0  4001bdd       68        System.String  0 instance 0000000000000000 structuredTypeSchemaName
000007fef88068f0  4001bde       70        System.String  0 instance 0000000000000000 structuredTypeName
0000000000000000  4001bdf       78                       0 instance 0000000000000000 structuredFields
000007fef88068f0  4001be0       98        System.String  0 instance 0000000201f139f0 column
000007fef88068f0  4001be1       a0        System.String  0 instance 0000000000000000 baseColumn
000007fef32c3cb0  4001be2       b0 …ultiPartTableName  1 instance 0000000201f122f0 multiPartTableName
000007fef880c7d8  4001be3       94         System.Int32  1 instance               25 ordinal
000007fef880c158  4001be4       91          System.Byte  1 instance                2 updatability
000007fef880c158  4001be5       a8          System.Byte  1 instance                0 tableNum
000007fef880d608  4001be6       a9       System.Boolean  1 instance                0 isDifferentName
000007fef880d608  4001be7       aa       System.Boolean  1 instance                0 isKey
000007fef880d608  4001be8       ab       System.Boolean  1 instance                0 isHidden
000007fef880d608  4001be9       ac       System.Boolean  1 instance                0 isExpression
000007fef880d608  4001bea       ad       System.Boolean  1 instance                0 isIdentity
000007fef880d608  4001beb       ae       System.Boolean  1 instance                0 isColumnSet
000007fef880c158  4001bec       af          System.Byte  1 instance                0 op
000007fef88142d0  4001bed       92        System.UInt16  1 instance                0 operand

Ok, so the column field of this object holds the name of the affected column. To read it, we need to dump the pointer stored at the address of the object + the offset where the column field is found, and we can see that it is the CustomField9 column, which we know has a max 255 character limitation:

0:008> !DumpObj poi(0000000201f12240+98)
Name:        System.String
MethodTable: 000007fef88068f0
EEClass:     000007fef838ed78
Size:        50(0x32) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      CustomField9
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fef880c7d8  4000103        8         System.Int32  1 instance               12 m_stringLength
000007fef880b318  4000104        c          System.Char  1 instance               43 m_firstChar
000007fef88068f0  4000105       10        System.String  0   shared           static Empty
                                 >> Domain:Value  00000000003e9470:00000001bfe40488 <<

So now we need to get the list of Alerts which the process is trying to insert, check CustomField9 of all of them and find the one which is longer than 255 characters. We need to dump the object which holds all the Alerts, which is System.Data.SqlClient.SqlBulkCopy. Just like before, we run !DumpStackObjects and look for the object address:

0:008> !DumpStackObjects
OS Thread Id: 0x3cb4 (8)
RSP/REG          Object           Name
00000000075F5908 0000000201f1dc38 System.InvalidOperationException
00000000075F59A8 0000000201f1dc38 System.InvalidOperationException
00000000075F5A30 0000000201ee0080 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseCommandExecutionContext
00000000075F5A38 0000000201f1dc38 System.InvalidOperationException
00000000075F5A48 0000000201ee0018 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseSingleCommand+ExecutionStep
00000000075F5A50 0000000201ee0098 System.Collections.Generic.Dictionary`2[[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnectionDescriptor, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess],[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnection, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess]]
00000000075F5C90 0000000201f1dc38 System.InvalidOperationException
00000000075F60B0 00000002000b13d8 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.Commands.DataWarehouseSqlBulkInsertCommand
00000000075F60C8 0000000201ee0018 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseSingleCommand+ExecutionStep
00000000075F60D0 0000000201ee0098 System.Collections.Generic.Dictionary`2[[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnectionDescriptor, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess],[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnection, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess]]
00000000075F63D0 0000000201ee0080 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseCommandExecutionContext
00000000075F63E8 0000000201ee0018 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseSingleCommand+ExecutionStep
00000000075F63F0 0000000201ee0098 System.Collections.Generic.Dictionary`2[[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnectionDescriptor, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess],[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnection, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess]]
00000000075F6600 0000000201ee0080 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseCommandExecutionContext
00000000075F6618 0000000201ee0018 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseSingleCommand+ExecutionStep
00000000075F6620 0000000201ee0098 System.Collections.Generic.Dictionary`2[[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnectionDescriptor, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess],[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnection, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess]]
00000000075F6928 0000000201f1dc38 System.InvalidOperationException
00000000075F6CD0 0000000201ee0080 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseCommandExecutionContext
00000000075F6CE8 0000000201ee0018 Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseSingleCommand+ExecutionStep
00000000075F6CF0 0000000201ee0098 System.Collections.Generic.Dictionary`2[[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnectionDescriptor, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess],[Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.DataWarehouseConnection, Microsoft.EnterpriseManagement.DataWarehouse.DataAccess]]
00000000075F7640 0000000201f12240 System.Data.SqlClient._SqlMetaData
00000000075F7968 0000000201f1dc38 System.InvalidOperationException
00000000075F7B10 0000000201ee0948 System.Data.SqlClient.SqlBulkCopy
00000000075F7B40 0000000201f1dc38 System.InvalidOperationException
00000000075F7B58 000000014007fa40 System.RuntimeType

[…SNIPPED…]

Ok now let's have a look at the list shall we – dumping the object first as before using !DumpObj and the object address:

0:008> !DumpObj 0000000201ee0948
<Note: this object has an invalid CLASS field>
Name:        System.Data.SqlClient.SqlBulkCopy
MethodTable: 000007fef32c3868
EEClass:     000007fef2c4eae0
Size:        176(0xb0) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fef880c7d8  40016e9       78         System.Int32  1 instance                0 _batchSize
000007fef880d608  40016ea       a0       System.Boolean  1 instance                0 _ownConnection
000007fef32a4370  40016eb       7c         System.Int32  1 instance                0 _copyOptions
000007fef880c7d8  40016ec       80         System.Int32  1 instance               30 _timeout
000007fef88068f0  40016ed        8        System.String  0 instance 000000013fe4ea28 _destinationTableName
000007fef880c7d8  40016ee       84         System.Int32  1 instance                0 _rowsCopied
000007fef880c7d8  40016ef       88         System.Int32  1 instance                0 _notifyAfter
000007fef880c7d8  40016f0       8c         System.Int32  1 instance                0 _rowsUntilNotification
000007fef880d608  40016f1       a1       System.Boolean  1 instance               32 _insideRowsCopiedEvent
000007fef8805a48  40016f2       10        System.Object  0 instance 00000002000b4c10 _rowSource
000007fef2d91038  40016f3       18 …ent.SqlDataReader  0 instance 0000000000000000 _SqlDataReaderRowSource
000007fef32c3a48  40016f4       20 …MappingCollection  0 instance 0000000201ee09f8 _columnMappings
000007fef32c3a48  40016f5       28 …MappingCollection  0 instance 0000000201ee0be8 _localColumnMappings
000007fef2d8fba8  40016f6       30 …ent.SqlConnection  0 instance 0000000201ee0610 _connection
000007fef32c3b70  40016f7       38 …nt.SqlTransaction  0 instance 0000000000000000 _internalTransaction
000007fef32c3b70  40016f8       40 …nt.SqlTransaction  0 instance 0000000000000000 _externalTransaction
000007fef32a4058  40016f9       90         System.Int32  1 instance                1 _rowSourceType
000007fef2d9b880  40016fa       48  System.Data.DataRow  0 instance 74042e00b98a6449 _currentRow
000007fef880c7d8  40016fb       94         System.Int32  1 instance      -1738182535 _currentRowLength
000007fef3290ab0  40016fc       98         System.Int32  1 instance       1765434373 _rowState
000007fef880f240  40016fd       50 …tions.IEnumerator  0 instance 693a640598657079 _rowEnumerator
000007fef2d973e8  40016fe       58 …lClient.TdsParser  0 instance 00000001c01beb90 _parser
000007fef2d97ac8  40016ff       60 …ParserStateObject  0 instance 0000000000000000 _stateObj
000007fef8810158  4001700       68 …ections.ArrayList  0 instance 0000000201f18ca8 _sortedColumnMappings
000007fef32c3630  4001701       70 …opiedEventHandler  0 instance 4c4d582f31303032 _rowsCopiedEventHandler
000007fef880c7d8  4001703       9c         System.Int32  1 instance           212855 _objectID
000007fef880c7d8  4001702      9d8         System.Int32  1   static           212855 _objectTypeCount

So the _rowSource is what we are interested in – let's dump this shall we? Again using !DumpObj with the object address + field offset:

0:008> !DumpObj poi(0000000201ee0948+10)
Name:        Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.Commands.BulkDataReader
MethodTable: 000007ff002c4338
EEClass:     000007ff002a4d90
Size:        64(0x40) bytes
File:        C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Microsoft.EnterpriseManagement.DataWarehouse.DataAccess.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fef880d608  40000e5       10       System.Boolean  1 instance                0 isClosed
000007ff002c3b70  40000e6        8 …DataReaderAdaptor  0 instance 00000002000b1480 dataAdaptor
000007fef8828658  40000e7       18          System.Guid  1 instance 00000002000b4c28 dataSetId
000007fef8828658  40000e8       28          System.Guid  1 instance 00000002000b4c38 managementGroupId

Ok, what now? Now we need to look at the dataAdaptor, like before 😉 But wait … we used poi() before. Well, we'll use it twice this time:

0:008> !DumpObj poi(poi(0000000201ee0948+10)+8)
Name:        Microsoft.EnterpriseManagement.HealthService.Modules.DataWarehouse.AlertDataReaderAdaptor
MethodTable: 000007ff002cfa98
EEClass:     000007ff00301318
Size:        48(0x30) bytes
File:        C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Microsoft.EnterpriseManagement.HealthService.Modules.DataWarehouse.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fef2d9a548  4000035        8 …em.Data.DataTable  0 instance 00000002000b1928 schemaTable
000007fef880adf8  4000036       10      System.Object[]  0 instance 00000002001d89e8 dataItems
000007fef880c7d8  4000037       18         System.Int32  1 instance                0 currentIndex
000007fef880c7d8  4000038       1c         System.Int32  1 instance                0 startIndex
000007fef880c7d8  4000039       20         System.Int32  1 instance             1000 itemsToProcessCount

We are interested in the dataItems object – but wait … it is an array, as we can see from the [] characters in System.Object[], so we can use another nifty command called !DumpArray here – again with poi(), this time nested three times:

0:008> !DumpArray poi(poi(poi(0000000201ee0948+10)+8)+10)
Name:        Microsoft.EnterpriseManagement.HealthService.DataItemBase[]
MethodTable: 000007fef880adf8
EEClass:     000007fef838fc68
Size:        8032(0x1f60) bytes
Array:       Rank 1, Number of elements 1000, Type CLASS
Element Methodtable: 000007ff000474b8
[0] 0000000140b851a0
[1] 0000000140b8a950
[2] 0000000140b8d420
[3] 0000000140b8feb8
[4] 0000000140b92950
[5] 0000000140b94eb0
[6] 0000000140b97428
[7] 0000000140b999a0
[8] 0000000140b9aba8
[9] 0000000140b9d6b8
[10] 0000000140ba0150
[11] 0000000140ba2be8
[12] 0000000140ba5648

[…SNIPPED…]
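
The nested poi() expressions used so far do nothing more than follow one pointer after another. Here is a minimal Python sketch of that chain, assuming a dictionary as a stand-in for process memory. The offsets and most addresses are copied from the dumps above, but the BulkDataReader address is not shown there, so the value used for it is purely hypothetical:

```python
# Toy model of the pointer chain that the nested poi() calls follow.
# Offsets and most addresses are copied from the dumps above; the dictionary
# that stands in for process memory is purely illustrative.

ROOT_OBJECT = 0x0000000201EE0948       # object whose _rowSource field sits at offset 0x10
BULK_DATA_READER = 0x00000002000B1440  # HYPOTHETICAL: this address is not shown in the dumps
DATA_ADAPTOR = 0x00000002000B1480      # value of the dataAdaptor field (offset 0x8)
DATA_ITEMS = 0x00000002001D89E8        # value of the dataItems field (offset 0x10)

memory = {
    ROOT_OBJECT + 0x10: BULK_DATA_READER,
    BULK_DATA_READER + 0x8: DATA_ADAPTOR,
    DATA_ADAPTOR + 0x10: DATA_ITEMS,
}


def poi(address):
    """Return the pointer-sized value stored at `address`, like WinDbg's poi()."""
    return memory[address]


# Same shape as: !DumpArray poi(poi(poi(0000000201ee0948+10)+8)+10)
array_address = poi(poi(poi(ROOT_OBJECT + 0x10) + 0x8) + 0x10)
print(hex(array_address))
```

Each poi() reads the field at "object address + offset" and hands the resulting address to the next one, which is why one extra level of nesting is needed for every object we step through.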

Ok interesting, so this is an Array which is holding items of type Microsoft.EnterpriseManagement.HealthService.DataItemBase but when we dump the array we only get the address of each such object, so we would need to dump each of them using !DumpObj and check them out – wow! … really? there are a looot of them … Ok, let's worry about that a little later and now just dump one of the items (first one as example) just to see what we need to check there for each one:

0:008> !DumpObj 0000000140b851a0
Name:        Microsoft.EnterpriseManagement.Mom.AlertSubscriptionModule.DataItemAlertSubscription
MethodTable: 000007ff002cdc70
EEClass:     000007ff002fe738
Size:        568(0x238) bytes
File:        C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Microsoft.Mom.AlertSubscriptionDataSourceModule.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fef88296c8  400001a       18      System.DateTime  1 instance 0000000140b851b8 dateCreated
000007fef8828658  400001b       20          System.Guid  1 instance 0000000140b851c0 sourceHealthServiceId
000007fef88068f0  400001c        8        System.String  0 instance 00000001400a54d8 itemXml
000007fef88068f0  400001d       10        System.String  0 instance 0000000140b862b8 dataTypeName
000007fef8828658  4000019        8          System.Guid  1   static 000000023fe46128 localHealthServiceId
000007ff002cd6f8  400009b      118         System.Int32  1 instance                0 subscriptionType
000007ff002cd850  400009c      11c         System.Int32  1 instance                0 subscriptionProperty
000007fef8828658  400009d      138          System.Guid  1 instance 0000000140b852d8 id
000007fef88068f0  400009e       30        System.String  0 instance 0000000140b864b8 name
000007fef88068f0  400009f       38        System.String  0 instance 0000000140b86520 description
000007fef8828658  40000a0      148          System.Guid  1 instance 0000000140b852e8 baseManagedEntityId
000007fef8828658  40000a1      158          System.Guid  1 instance 0000000140b852f8 problemId
000007fef880d608  40000a2      130       System.Boolean  1 instance                0 createdByMonitor
000007fef8828658  40000a3      168          System.Guid  1 instance 0000000140b85308 workflowId
000007fef8827af0  40000a4      120        System.UInt32  1 instance                0 resolutionState
000007ff002cd9a8  40000a5      124         System.Int32  1 instance                2 priority
000007ff002cdb00  40000a6      128         System.Int32  1 instance                1 severity
000007fef88068f0  40000a7       40        System.String  0 instance 0000000140b86778 category
000007fef88068f0  40000a8       48        System.String  0 instance 0000000000000000 owner
000007fef88068f0  40000a9       50        System.String  0 instance 0000000000000000 resolvedBy
000007fef88296c8  40000aa      178      System.DateTime  1 instance 0000000140b85318 timeRaised
000007fef88296c8  40000ab      180      System.DateTime  1 instance 0000000140b85320 timeAdded
000007fef88296c8  40000ac      188      System.DateTime  1 instance 0000000140b85328 lastModified
000007fef88296c8  40000ad      190      System.DateTime  1 instance 0000000140b85330 lastModifiedExceptRepeatCount
000007fef88068f0  40000ae       58        System.String  0 instance 0000000140b86f70 lastModifiedBy
000007fef88296c8  40000af      198      System.DateTime  1 instance 0000000140b85338 timeResolved
000007fef88296c8  40000b0      1a0      System.DateTime  1 instance 0000000140b85340 timeResolutionStateLastModified
000007fef88296c8  40000b1      1a8      System.DateTime  1 instance 0000000140b85348 timeResolutionStateLastModifiedInDB
000007fef88068f0  40000b2       60        System.String  0 instance 0000000140b86f98 customField1
000007fef88068f0  40000b3       68        System.String  0 instance 0000000140b86fe0 customField2
000007fef88068f0  40000b4       70        System.String  0 instance 0000000140b87020 customField3
000007fef88068f0  40000b5       78        System.String  0 instance 0000000140b87060 customField4
000007fef88068f0  40000b6       80        System.String  0 instance 0000000140b870a0 customField5
000007fef88068f0  40000b7       88        System.String  0 instance 0000000140b870f8 customField6
000007fef88068f0  40000b8       90        System.String  0 instance 0000000140b87128 customField7
000007fef88068f0  40000b9       98        System.String  0 instance 0000000140b87160 customField8
000007fef88068f0  40000ba       a0        System.String  0 instance 0000000140b871a0 customField9
000007fef88068f0  40000bb       a8        System.String  0 instance 0000000140b873c0 customField10
000007fef88068f0  40000bc       b0        System.String  0 instance 0000000000000000 ticketId
000007fef88068f0  40000bd       b8        System.String  0 instance 0000000140b87438 context
000007fef88296c8  40000be      1b0      System.DateTime  1 instance 0000000140b85350 lastModifiedByNonConnector
000007fef8828658  40000bf      1b8          System.Guid  1 instance 0000000140b85358 connectorId
000007fef880c7d8  40000c0      12c         System.Int32  1 instance                0 repeatCount
000007fef8828658  40000c1      1c8          System.Guid  1 instance 0000000140b85368 alertStringId
000007fef88068f0  40000c2       c0        System.String  0 instance 0000000140b88848 alertParams
000007fef88068f0  40000c3       c8        System.String  0 instance 0000000000000000 siteName
000007fef88068f0  40000c4       d0        System.String  0 instance 0000000140b88b30 baseManagedEntityFullName
000007fef88068f0  40000c5       d8        System.String  0 instance 0000000000000000 baseManagedEntityPath
000007fef88068f0  40000c6       e0        System.String  0 instance 0000000140b88bb8 baseManagedEntityName
000007fef88068f0  40000c7       e8        System.String  0 instance 0000000140b88bf8 baseManagedEntityDisplayName
000007fef88068f0  40000c8       f0        System.String  0 instance 0000000140b88c30 resolutionStateName
000007fef88068f0  40000c9       f8        System.String  0 instance 0000000000000000 timeZone
000007fef88068f0  40000ca      100        System.String  0 instance 0000000140b88c50 languageCode
000007fef88296c8  40000cb      1d8      System.DateTime  1 instance 0000000140b85378 timeRaisedLocal
000007fef88296c8  40000cc      1e0      System.DateTime  1 instance 0000000140b85380 timeAddedLocal
000007fef88296c8  40000cd      1e8      System.DateTime  1 instance 0000000140b85388 lastModifiedLocal
000007fef88296c8  40000ce      1f0      System.DateTime  1 instance 0000000140b85390 lastModifiedExceptRepeatCountLocal
000007fef88296c8  40000cf      1f8      System.DateTime  1 instance 0000000140b85398 timeResolvedLocal
000007fef88296c8  40000d0      200      System.DateTime  1 instance 0000000140b853a0 timeResolutionStateLastModifiedLocal
000007fef88296c8  40000d1      208      System.DateTime  1 instance 0000000140b853a8 timeResolutionStateLastModifiedInDBLocal
000007fef88296c8  40000d2      210      System.DateTime  1 instance 0000000140b853b0 lastModifiedByNonConnectorLocal
000007fef88292a0  40000d3      218      System.TimeSpan  1 instance 0000000140b853b8 queryExecutionTimeSpan
000007fef88296c8  40000d4      220      System.DateTime  1 instance 0000000140b853c0 dataItemCreateTime
000007fef88296c8  40000d5      228      System.DateTime  1 instance 0000000140b853c8 dataItemCreateTimeLocal
000007fef88068f0  40000d6      108        System.String  0 instance 0000000000000000 tfsWorkItemId
000007fef88068f0  40000d7      110        System.String  0 instance 0000000000000000 tfsWorkItemOwner

Alright, so we know that the customField9 field which we are interested in has offset a0 – so now we can dump it directly:

0:008> !DumpObj poi(0000000140b851a0+a0)
Name:        System.String
MethodTable: 000007fef88068f0
EEClass:     000007fef838ed78
Size:        540(0x21c) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      SOME_STRING_HERE_WON'T_TELL_YOU_THE_VALUE_HA!
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fef880c7d8  4000103        8         System.Int32  1 instance              257 m_stringLength
000007fef880b318  4000104        c          System.Char  1 instance               57 m_firstChar
000007fef88068f0  4000105       10        System.String  0   shared           static Empty
                                 >> Domain:Value  00000000003e9470:00000001bfe40488 <<

Cool, so now we know what we want and need to dump here – so what's next? Well, we have the nifty .foreach and .shell commands, which let us run the Windows CMD FIND command on the output, as we can see in these examples: WinDbg Scripting 😀

  • because .foreach feeds every token of the !DumpArray output into the loop (not just the object addresses, but also the header text and the array indexes), and because some Alert DataItems have an EMPTY customField9 field, we need to exclude "Invalid parameter" from the output

0:008> .shell -ci ".foreach (obj { !DumpArray poi(poi(poi(0000000201ee0948+10)+8)+10) }) { !DumpObj poi(${obj}+a0) }" FIND /V "Invalid parameter"
<Note: this object has an invalid CLASS field>
Invalid object
Integrated managed debugging does not support enumeration of symbols.
See http://dbg/managed.htm for more details.
<Note: this object has an invalid CLASS field>
Invalid object
Name:        System.String
MethodTable: 000007fef88068f0
EEClass:     000007fef838ed78
Size:        540(0x21c) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      SOME_STRING_HERE_WON'T_TELL_YOU_THE_VALUE_HA!
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fef880c7d8  4000103        8         System.Int32  1 instance              257 m_stringLength
000007fef880b318  4000104        c          System.Char  1 instance               57 m_firstChar
000007fef88068f0  4000105       10        System.String  0   shared           static Empty
                                 >> Domain:Value  00000000003e9470:00000001bfe40488 <<
Name:        System.String
MethodTable: 000007fef88068f0
EEClass:     000007fef838ed78
Size:        540(0x21c) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      SOME_STRING_HERE_WON'T_TELL_YOU_THE_VALUE_HA!
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fef880c7d8  4000103        8         System.Int32  1 instance              257 m_stringLength
000007fef880b318  4000104        c          System.Char  1 instance               57 m_firstChar
000007fef88068f0  4000105       10        System.String  0   shared           static Empty
                                 >> Domain:Value  00000000003e9470:00000001bfe40488 <<
.shell: Process exited

You do know the FIND command in CMD right? 😉

Now check the length of the strings in that output (the m_stringLength field), and when you find one which is longer than 255 characters, that's your hit!
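
With 1000 items in the array, checking the lengths by eye gets old quickly. Here is a minimal Python sketch that scans saved debugger output for strings longer than 255 characters. It assumes you have copied the output into a text variable or file, and that each dumped string has a "String:" line followed by an m_stringLength field line, as in the dumps above. The function name and the sample text are made up for illustration:

```python
def find_long_strings(sos_output, limit=255):
    """Return (length, text) for every dumped System.String longer than `limit`."""
    hits = []
    current_text = None
    for line in sos_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("String:"):
            # Remember the string value; its length follows in the Fields table.
            current_text = stripped[len("String:"):].strip()
        elif stripped.endswith("m_stringLength"):
            # The Value column is the second-to-last token on the field line.
            length = int(stripped.split()[-2])
            if length > limit:
                hits.append((length, current_text))
    return hits


# Made-up sample in the same layout as the !DumpObj output shown above.
SAMPLE_OUTPUT = """
Name:        System.String
String:      SHORT_VALUE
Fields:
000007fef880c7d8  4000103        8         System.Int32  1 instance               11 m_stringLength
Name:        System.String
String:      SOME_VERY_LONG_VALUE
Fields:
000007fef880c7d8  4000103        8         System.Int32  1 instance              257 m_stringLength
"""

for length, text in find_long_strings(SAMPLE_OUTPUT):
    print(length, text)
```

Lines such as "Invalid object" are simply ignored, because they match neither of the two patterns.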

Well it turns out that we had a 3rd Party Management Pack there which was using the CustomField9 for … well something whatever … and thus we either remove this MP or we engage the Vendor.

Problem solved! 😀

By the way – there are some cool WinDbg extensions out there which may help you a lot with debugging stuff in general so that you don't need to use WinDbg scripting. These are more than awesome help for .NET Debugging.

A good example of such an extension would be SOSEx.

Have fun debugging!!! 🙂

 

 

Fixing Service Manager Data Warehouse Registration Information

There are quite a few situations where we end up having "corrupt" – actually orphaned or missing – information about the DW Registration.

I have created a tool which you can use to fix most such issues automatically. Using this tool you can delete all basic information about the DW Registration or, if you are sure about what is going on, just re-create all basic information about it. The cool part about deleting the existing information, which may be incomplete, is that you can then simply re-register the DW from the SM Console.

But – let's first talk about how this actually works under the hood! 😉

When we register the DW from the Console in Administration tab, there are several things which happen:

  • A new object of class Microsoft.SystemCenter.ResourceAccessLayer.DwSdkResourceStore is created into the ServiceManager database – now, the SM Console knows where/how to connect to the DW SDK service and the Data Warehouse Tab appears in the Console
  • A new object of class Microsoft.SystemCenter.DataWarehouse.CmdbSource is created into the DWStagingAndConfig database
  • We create a new SecureReference for the Account we have chosen in the DW Registration Wizard in the Console
  • A new object of class Microsoft.SystemCenter.ResourceAccessLayer.SqlResourceStore is created into the DWStagingAndConfig database
  • A new relationship of class Microsoft.SystemCenter.ResourceAccessLayer.StoreHasProperty is created between the new SqlResourceStore object and the ExtractionSource object of class Microsoft.SystemCenter.ResourceAccessLayer.StoreProperty
  • A new object of class Microsoft.SystemCenter.ResourceAccessLayer.SdkResourceStore is created into the DWStagingAndConfig database
  • A new relationship of class Microsoft.SystemCenter.ResourceAccessLayer.StoreHasProperty is created between the new SdkResourceStore object and the Sdk object of class Microsoft.SystemCenter.ResourceAccessLayer.StoreProperty
  • If it is a "managed" DataSource (like SvcMgr or OpsMgr style) then we also create a new MPSync Rule and Extract Rule for this DataSource
  • Finally, we call the [Staging].[AddDatasource] stored procedure on the DWStagingAndConfig database, which adds an entry to the [Staging].[Datasource] table for this new DataSource with the appropriate type

These are the tables of interest:

1. On the ServiceManager database, make sure we have the DwSdkResourceStore object for the Data Warehouse – this is the way SM (and the Console) "know" that they have a DW registered and where/how to connect to it – this is how the Data Warehouse tab appears in the SM Console:

select *
from MT_Microsoft$SystemCenter$ResourceAccessLayer$DwSdkResourceStore

2. On the DWStagingAndConfig database, make sure we have the CMDBSource object created for the SM DataSource – check the properties in the results and look at DataSourceName, SdkServer, Database, DatabaseServer. There should be at least 2 entries here – 1 for your DW (it has actually registered itself as its own DataSource – another story for another time) and 1 entry for your ServiceManager DataSource (Management Group):

select *
from MTV_Microsoft$Systemcenter$Datawarehouse$CMDBSource

3. On the DWStagingAndConfig database, make sure we have the Datasource entry in the Staging.Datasource table because it is an entry which is needed for the registration to actually work. There should be at least 2 entries here – 1 for your DW and 1 entry for your ServiceManager DataSource (Management Group):

select ds.DatasourceId, ds.TimeAdded, cmdb.DataSourceName_AC09B683_AE61_BDCA_6383_2007DB60859D
from Staging.Datasource as ds
join MTV_Microsoft$Systemcenter$Datawarehouse$CMDBSource as cmdb
   on ds.DatasourceId = cmdb.DataSourceId_17109AB9_58CD_F741_8AE3_3A9F29C83709

4. On the DWStagingAndConfig database, make sure that we have the SdkResourceStore object created. This is how the DW SDK knows which SDK service to connect to for a given (managed) DataSource. There should be at least 3 entries here – 1 is an internal source (Ral.SdkResourceStore.Sdk), 1 for your DW and 1 entry for your ServiceManager DataSource (Management Group). Check the properties here and see if they are correct – DisplayName should be YOUR_SM_MG.Sdk, and also check that the Server is correct – it should be the SM Workflow Management Server:

select *
from MTV_Microsoft$SystemCenter$ResourceAccessLayer$SdkResourceStore

5. On the DWStagingAndConfig database, make sure that the SqlResourceStore object is created. This is how the DW SDK knows which SQL Server to connect to in order to get the data for the DataSource from its database. There should be at least 2 entries here – 1 for your DW and 1 entry for your ServiceManager DataSource (Management Group). Again, check the properties DataService, Name and Server:

select *
from MTV_Microsoft$Systemcenter$ResourceAccessLayer$SqlResourceStore
where DisplayName like '%.ExtractionSource'

6. On the DWStagingAndConfig database, make sure you have the relationships needed here of type StoreHasProperty for the SM DataSource. There should be at least 5 entries here. 1 is for the internal DataSource (Ral.SdkResourceStore.Sdk) and then 2 for the DW DataSource (YOUR_DW_MG.Sdk and YOUR_DW_MG.ExtractionSource) and 2 for your SM DataSource (YOUR_SM_MG.Sdk and YOUR_SM_MG.ExtractionSource) – so make sure these exist:

select r.RelationshipId, bmes.FullName as 'SourceEntity', bmet.FullName as 'TargetEntity'
from Relationship as r
join BaseManagedEntity as bmes
   on r.SourceEntityId = bmes.BaseManagedEntityId
join BaseManagedEntity as bmet
   on r.TargetEntityId = bmet.BaseManagedEntityId
where r.TargetEntityId in (
   select BaseManagedEntityId
   from MTV_Microsoft$SystemCenter$ResourceAccessLayer$StoreProperty
   where DisplayName in ('ExtractionSource', 'Sdk')
) and r.RelationshipTypeId = (
   select RelationshipTypeId
   from RelationshipType
   where RelationshipTypeName = 'Microsoft.SystemCenter.ResourceAccessLayer.StoreHasProperty'
)
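
The minimum number of entries expected from each of the queries above can be collected into a small checklist. Here is a minimal Python sketch, assuming you run the queries yourself and type in the row counts you got back. The dictionary keys and the function name are made up for illustration, and the script does not connect to any database:

```python
# Minimum row counts taken from steps 1 to 6 above.
EXPECTED_MINIMUM_ROWS = {
    "DwSdkResourceStore": 1,       # step 1, ServiceManager database
    "CMDBSource": 2,               # step 2, DW itself + SM DataSource
    "Staging.Datasource": 2,       # step 3, DW itself + SM DataSource
    "SdkResourceStore": 3,         # step 4, internal + DW + SM
    "SqlResourceStore": 2,         # step 5, DW + SM (ExtractionSource)
    "StoreHasProperty": 5,         # step 6, 1 internal + 2 DW + 2 SM
}


def find_incomplete_checks(row_counts):
    """Return the checks whose observed row count is below the expected minimum."""
    return [
        name
        for name, minimum in EXPECTED_MINIMUM_ROWS.items()
        if row_counts.get(name, 0) < minimum
    ]


# Example: the SM DataSource lost its Sdk/ExtractionSource relationships.
observed = {
    "DwSdkResourceStore": 1,
    "CMDBSource": 2,
    "Staging.Datasource": 2,
    "SdkResourceStore": 3,
    "SqlResourceStore": 2,
    "StoreHasProperty": 3,
}
print(find_incomplete_checks(observed))
```

A count at or above the minimum only tells you that the rows exist; you still need to check the property values (names, servers) as described in each step.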

What the Tool does *not* do is to add or remove the SecureReference, MPSync Rule and Extract Rule for the new DataSource – so make sure that you *know* that the DW was registered and working before.

Another idea and recommendation if you are not sure is to run the tool with "-a:rem" so that we delete the basic registration information and then just re-register it normally from the SM Console, thus creating everything needed including SecureReference, MPSync Rule and Extract Rule for the ServiceManager database.

To run the tool, you need to run it on a computer which has network access to both the SM Management Server and the DW Management Server. This includes of course Kerberos Authentication working and the User Account with which you are running the tool needs to be a SM Admin Account.

This application needs these 4 mandatory parameters below to be passed and the format is "-PARAMETER:VALUE" with the "-" and the ":".
     -u: User Account which will be used for the DW in DOMAIN\USER format.
     -sm: NetBIOS name of the Service Manager Workflow Management Server.
     -dw: NetBIOS name of the Service Manager Data Warehouse Management Server.
     -a: "add" or "rem"

Use "add" as action (-a:add) if you want to try to re-create the core objects and relationships needed.
Use "rem" as action (-a:rem) if you want to try to delete the core objects and relationships needed and afterwards register normally from the SM Console.

Example:    SCSMRegisterDW.exe -u:CONTOSO\SMAdmin -sm:SMServer -dw:DWServer -a:rem
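
If you wrap the tool in a script, it can help to validate the arguments before calling it. Here is a minimal Python sketch of the "-PARAMETER:VALUE" format described above. It is not the tool's own parsing code (which is not shown here); the function name and the exact checks are assumptions made for illustration:

```python
MANDATORY_PARAMETERS = ("u", "sm", "dw", "a")
ALLOWED_ACTIONS = ("add", "rem")


def parse_arguments(argv):
    """Parse arguments of the form -PARAMETER:VALUE into a dictionary."""
    values = {}
    for argument in argv:
        if not argument.startswith("-") or ":" not in argument:
            raise ValueError(f"expected -PARAMETER:VALUE, got {argument!r}")
        # Split on the first ":" only, so the value may itself contain ":".
        name, value = argument[1:].split(":", 1)
        values[name.lower()] = value

    missing = [name for name in MANDATORY_PARAMETERS if not values.get(name)]
    if missing:
        raise ValueError("missing mandatory parameter(s): " + ", ".join(missing))
    if values["a"] not in ALLOWED_ACTIONS:
        raise ValueError('-a must be "add" or "rem"')
    if "\\" not in values["u"]:
        raise ValueError("-u must be in DOMAIN\\USER format")
    return values


# Same arguments as in the example command line above.
print(parse_arguments(["-u:CONTOSO\\SMAdmin", "-sm:SMServer", "-dw:DWServer", "-a:rem"]))
```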

Run this tool at your own risk!

Make a full Backup of the ServiceManager and DWStagingAndConfig database before running the tool!

The best suggestion is to open a case with Microsoft Support before attempting to use the tool!

Happy Data Warehouse-ing! 😀

SCSMRegisterDW.zip

Update Rollup 4 for System Center 2012 R2 – important update for the Service Manager Data Warehouse

So, Update Rollup 4 for System Center 2012 R2 is out as you might know 😀

http://support.microsoft.com/kb/2992012

I encourage you to install UR4 for Service Manager because, among other fixes, it includes a very important fix for the Data Warehouse.

This is part of the fixes and improvements we are trying to bring to the Data Warehouse: http://support.microsoft.com/kb/2989601

Data Warehouse Transform jobs have a hardcoded 60-minute time-out. Therefore, Data Warehouse jobs cannot be disabled for very long because the volume of data to be transformed can quickly pile up. This can cause an issue if the volume exceeds the amount that can be processed by the transform modules within the time-out period.

This fix included in UR4 has 2 parts: an improvement to one of the stored procedures involved, which makes it run faster, plus a new registry setting that can be created to control the timeout for the command, as well as a change of the default from 60 to 180 minutes.

Happy updating! 😀

Service Manager Data Warehouse Troubleshooting

There are some great articles out there which talk about troubleshooting the Data Warehouse. In this article I will explain the internals of the Data Warehouse along with troubleshooting information.

It is important to first of all understand how the DW Jobs work. There are some Jobs that are only visible by looking into the database, others may be seen in PowerShell only and the most basic ones can be seen in the Console. From the perspective of the databases, the DW Jobs are actually called Processes. From now on, I will refer to the DW Jobs as Processes in this article.

The DW Processes are actually categorized like this:

  • Process Categories – this is the top-level way of categorizing Processes (Jobs)
  • Processes – these are the actual Processes that exist (ex. DWMaintenance, MPSyncJob, Extract, Transform, Load, etc.)
  • ProcessModules – each Process has several ProcessModules which it will actually execute in order to "finish" – there is also a certain order in which these will run depending on dependencies on one another and the level of the modules

Now what you need to know is that each Process will be executed via a Schedule which is actually a Rule (Workflow that runs on a schedule and starts its owning Process). When a Process is scheduled (for execution) it will actually get a new Batch created. Think of a Process as a class and of a Batch as an instance of that class that is actually doing something. It is the same for a ProcessModule, each will get a WorkItem when it is scheduled for execution. Because a Process can have from one to many ProcessModules, when a Batch is created, it also gets all the WorkItems associated with it (ProcessModules of the Process).

So, to recap, because we will be using these a lot in this article:

  • Batch = an instance of a Process (which in the Console or PowerShell is a Job)
  • WorkItem = an instance of a ProcessModule of a Process

It is also important to know, because the processing/execution depends on it, that each Batch, as well as each WorkItem (of a Batch), has a certain Status:

  • Success (1) – the Batch/WorkItem was successful and at that point the actual Status gets changed to Complete
  • Failed (2) – there was an error in the Batch/WorkItem and it will be retried on the next execution
  • Not Started (3) – the Batch/WorkItem was scheduled for execution and it will be executed on the next interval (schedule) – this is a good Status, it means there was no error and it will run the next time it needs to
  • Running (4) – the Batch/WorkItem is currently running
  • Stopped (5) – the Batch/WorkItem was stopped (by the user from the Console – Suspend or PowerShell Stop-SCDWJob)
  • Completed (6) – the Batch/WorkItem has successfully completed without any errors
  • Waiting (7) – the Batch/WorkItem is waiting for other Batches/WorkItems to finish first on which they are dependent

The Status which really matters is that of the WorkItems, because the Batch Status depends directly on the Statuses of its WorkItems – here are some examples:

  • A Batch can have the Status set to Running while all its WorkItems still have the Status of Not Started => this is not a bad thing, it just means that the certain Batch has received a "start request" from the schedule, but the (first) WorkItem(s) cannot run yet because they are waiting on another Batch/WorkItem (of a different Process) to finish first
  • If one of the WorkItems of a Batch will fail and have a Status of Failed, then the Batch itself will get the Status of Failed => all WorkItems need to complete successfully and thus have a Status of Completed for the Batch to also get the Status of Completed
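
The Status values and the two examples above can be written down as a small model. Here is a minimal Python sketch; the numeric values are the ones from the list above, while the function name and the rule "otherwise the Batch keeps its current Status" are assumptions made for illustration, not the actual Data Warehouse code:

```python
from enum import IntEnum


class Status(IntEnum):
    SUCCESS = 1
    FAILED = 2
    NOT_STARTED = 3
    RUNNING = 4
    STOPPED = 5
    COMPLETED = 6
    WAITING = 7


def derive_batch_status(current_batch_status, work_item_statuses):
    """Apply the two rules from the examples above to a Batch and its WorkItems."""
    # One Failed WorkItem is enough to make the whole Batch Failed.
    if any(status == Status.FAILED for status in work_item_statuses):
        return Status.FAILED
    # The Batch is Completed only when every WorkItem is Completed.
    if work_item_statuses and all(
        status == Status.COMPLETED for status in work_item_statuses
    ):
        return Status.COMPLETED
    # Assumption: in every other case the Batch keeps the Status it already has,
    # e.g. Running while all of its WorkItems are still Not Started.
    return current_batch_status


print(derive_batch_status(Status.RUNNING, [Status.NOT_STARTED, Status.NOT_STARTED]).name)
```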

When a Process gets created for the first time (on installation or DW registration), a new Batch is created with some BatchId (the next available one, starting from 1 for the first Process). Then, for each ProcessModule of this Process, a new WorkItem with a certain WorkItemId gets created for this Batch. These all initially have the Status of Not Started, and when the corresponding Schedule Rule reaches its schedule, it sends a "start request" for its Batch. This sets the Status of that Batch to Running. The way these actually run is that there is a Rule associated with each Process, and these Rules run every 30 seconds. As soon as such a Rule runs, if it sees that the (latest) Batch of its Process has a Status of Running, it takes the first/next WorkItem that has a Status of Not Started (depending on dependencies and level) and tries to execute it. If the WorkItem can execute and is successful, it gets the Status of Completed once it finishes. If it fails, it gets the Status of Failed. It is important to know that a WorkItem can and will also get a Status of Failed without there being an actual error: it simply cannot start/run right now because a WorkItem it depends on is not finished yet, and thus the "error" is actually "waiting on workitems to finish".
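
One pass of such a Rule can be sketched as follows. This is a simplified Python model of the behaviour described above, not the real implementation: the class, the function and the module names are made up, dependencies are only modelled between WorkItems of the same Batch, and the retry of Failed WorkItems on a later run is left out:

```python
from dataclasses import dataclass, field

NOT_STARTED, RUNNING, COMPLETED, FAILED = "Not Started", "Running", "Completed", "Failed"


@dataclass
class WorkItem:
    name: str
    level: int
    depends_on: list = field(default_factory=list)
    status: str = NOT_STARTED


def run_rule_once(batch_status, work_items, execute):
    """One pass of the per-Process Rule; returns the WorkItem it attempted, or None."""
    if batch_status != RUNNING:
        return None  # no "start request" yet, nothing to do
    by_name = {item.name: item for item in work_items}
    pending = sorted(
        (item for item in work_items if item.status == NOT_STARTED),
        key=lambda item: item.level,
    )
    if not pending:
        return None
    item = pending[0]
    if any(by_name[name].status != COMPLETED for name in item.depends_on):
        # Not a real error: this is the "waiting on workitems to finish" case.
        item.status = FAILED
        return item
    item.status = COMPLETED if execute(item) else FAILED
    return item


items = [WorkItem("ModuleA", level=0), WorkItem("ModuleB", level=1, depends_on=["ModuleA"])]
# In the real system the Rule fires every 30 seconds; here we just call it twice.
for _ in range(2):
    attempted = run_rule_once(RUNNING, items, execute=lambda item: True)
    print(attempted.name, attempted.status)
```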

Another very important thing to know is that there is also a certain synchronization method between the different Processes, so that they don't interfere with each other's work. This is needed because there are some actions which change the database schema(s): they add/remove table columns, indexes, primary/foreign keys, etc. We would not want the ETL (Extract, Transform, Load) Processes to run and copy/transform data while we are making such changes. Because of this, we have these implementations:

  • When the DWMaintenance and/or MPSyncJob Processes start running, the first thing they do is disable the ETL (Extract, Transform, Load) Processes => it is very important to note here that if any of the ETL Processes is running when it gets disabled, it will *still* show the Status of Running, but in reality it will not run because it is disabled.
  • Because DWMaintenance and MPSyncJob Processes should also not be running at the same time, these use a locking mechanism – whichever of them gets this lock first will be allowed to run until it finishes (Status of Completed) while the other one will remain waiting for it to finish, even if the Status will still show as Running and even if it will *not* be directly disabled => there is a table in the DWStagingAndConfig database used for holding the lock which you can look at – it is called LockDetails.
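
The two mechanisms above can be modelled in a few lines. Here is a minimal Python sketch, assuming a single in-memory lock object in place of the LockDetails table; the class, function and message texts are made up for illustration and say nothing about how the lock is actually stored:

```python
class MaintenanceLock:
    """Stand-in for the lock shared by DWMaintenance and MPSyncJob."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, process_name):
        # Whoever asks first gets the lock; the other one has to wait.
        if self.holder in (None, process_name):
            self.holder = process_name
            return True
        return False

    def release(self, process_name):
        if self.holder == process_name:
            self.holder = None


def effective_state(displayed_status, is_enabled):
    """What a Process is really doing, given its displayed Status."""
    if displayed_status == "Running" and not is_enabled:
        return "shows Running, but is disabled and not executing"
    return displayed_status


lock = MaintenanceLock()
print(lock.try_acquire("DWMaintenance"))   # first one in gets the lock
print(lock.try_acquire("MPSyncJob"))       # has to wait, even if it shows Running
print(effective_state("Running", is_enabled=False))
```

The point of the sketch is that the displayed Status alone does not tell you whether a Process is really doing work: you also need to know whether it is disabled or waiting for the lock.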

I have talked about some "invisible" Jobs which can only be seen by querying the database. There are actually a lot of those, but the "real" way to talk about them is to go back to the Process Categories. There is a very important Process Category here which needs to be mentioned, which is the Deployment category. This is invisible because the MPSyncJob actually takes care of handing work over to this category. MPSyncJob will associate Management Packs from one data source to another (ex. from CMDB to the DW). Each Management Pack that got associated will get a new Job (Process) created for it which falls under the Deployment Process Category. We can also refer to this as the "Deployment Job" if you will – and in fact, this is how you would usually hear about it. These are responsible for *actually* deploying (or installing if you will) the Management Packs which were synchronized over – without this, nothing would work because we would have no extended information in the Data Warehouse about what Classes, Relationships, etc. we have in the CMDB (and other managed data sources which we can register to the Data Warehouse).

If any of the deployment "jobs" fails or is not yet finished, then you can be more or less sure that nothing will work properly, and you will almost certainly get errors related to the ETL (Extract, Transform, Load) Processes and maybe not only those – depending on where deployment is at that point.

While any of the Deployment Processes will follow the same rules as the others, that is get a Batch and one to many WorkItems, it also has another way of actually making the deployment. Each such Deployment Process (for each Management Pack) will have a new DeploySequence created (with a DeploySequenceId). Because each Management Pack has one to many "items" (Classes, Relationships, etc.) which need to be deployed, for each DeploySequence we will have one to many DeployItems created. This is important information for when you get into the actual troubleshooting part in the database.

In case you are not familiar with the basics of troubleshooting the DW Jobs from PowerShell, I suggest first starting with this article: http://technet.microsoft.com/en-us/library/hh542403.aspx

Also, if you are not familiar with what the ETL (Extract, Transform, Load) Jobs are and how these work, I suggest reading this article as well: http://blogs.technet.com/b/servicemanager/archive/2009/06/04/data-warehouse-anatomy-of-extract-transform-load-etl.aspx

Remember to always start by checking if there are any Management Packs whose Deployment Status shows as Failed in the SvcMgr Console in the Data Warehouse tab under the Management Packs view. For any such Management Pack which is Failed, you should run the "Restart Deployment" task from the Console in the tasks pane. If you are lucky, it was just some timeout or deadlock and it will succeed this time. If not, then you can always see errors about this failure, as well as any other failures in the DW Jobs, by looking into the Operations Manager event log directly on the SvcMgr Data Warehouse Management Server (filter by sources: Data Warehouse and Deployment).

Additionally, you can try to force reset the DW Jobs and run them in a certain order by using the script from this article: http://blogs.technet.com/b/mihai/archive/2013/07/03/resetting-and-running-the-service-manager-data-warehouse-jobs-separately.aspx

The Data Warehouse database which we are interested in when troubleshooting the DW Jobs is the DWStagingAndConfig database. Here is a list of tables of interest when troubleshooting the DW Jobs; at the end there is also an attached file with useful sql queries:

1. The Infra.ProcessCategory table is where all the existing Process Categories are stored. It has a column called IsEnabled which needs to be 1 in order for any Process under this Process Category to be able to run. This is only modified (set to 0 or 1) by the DWMaintenance and/or MPSync Jobs when they run; this is how they disable the other Processes.

2. The Infra.Process table is where all existing Processes are available and classified by Process Category via the ProcessCategoryId column. These also have an IsEnabled column which should always be 1 in order for them to be able to execute. The only reason why any of these would have IsEnabled set to 0 is if someone explicitly disabled them by using the Disable-SCDWJob cmdlet, which should never be done. To disable a Process (Job), always disable only its Schedule by using Disable-SCDWJobSchedule. In the screenshot below, I have explicitly filtered for only the "common" Processes, but if you query the entire table, you will see all of them.

3. The Infra.ProcessModule table is where all existing ProcessModules are located and are classified on each Process via the ProcessId column. That is what you should use in a where clause of a sql query to see all ProcessModules that belong to a certain Process. Here is an example in the screenshot below for all ProcessModules of the DWMaintenance Process. These are not "ordered" in this result – if you want to figure out the "order" in which they would run, you need to check out the ModuleLevel column along with each dependency for each module which can be seen in the Infra.ModuleTriggerCondition table.

4. The Infra.Batch table is where all the Batches will be found (current/previous and next – check out Infra.BatchHistory for a history of these). This is where you will be able to see the BatchIds for the Processes and you can view them for a specific Process by using the ProcessId in a sql where clause. In the example screenshot below, we can see the Batches of the time the screenshot was taken for the Process with ProcessId = 1 (which in this case is the DWMaintenance Process).

5. The Infra.WorkItem table is where all the WorkItems of a Batch will be found. The most important part here is that this table is where we can see any errors for failed WorkItems of a Batch (that is an instance of a certain Process). In the screenshot below, an example of the WorkItems of the Batch with BatchId 3098 that is an instance of the DWMaintenance Process.

6. The DeploySequence table is where we have an entry for each DeploySequence (so each MP that is or was deployed). You should know that in this process, a staging table is used which is called DeploySequenceStaging, and if deployment finished and was successful, then this staging table should be *empty*. Here is a screenshot (not all results) of what this looks like.

7. The DeployItems table is where we store an entry for each DeployItem (so each "item", such as a Class or Relationship, of an MP that is or was deployed). You should know that in this process, a staging table is used which is called DeployItemStaging, and if deployment finished and was successful, then this staging table should be *empty*. Here is a screenshot with the list of DeployItems belonging to the DeploySequence of the System.WorkItem.Incident.Library Management Pack.
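As a rough sketch of how the tables above can be walked through in one go, the queries below use only the table and column names mentioned in this list. The ProcessId and BatchId values are the example values from the text and will differ in your environment, and the BatchId column on Infra.WorkItem as well as the missing schema prefix on the staging tables are assumptions on my side.

```sql
use DWStagingAndConfig;

-- Example values taken from the text above; replace with your own
declare @ProcessId int = 1;     -- DWMaintenance in the example environment
declare @BatchId   int = 3098;  -- a Batch of that Process in the example

-- 1 and 2: anything returned here cannot run until IsEnabled is back to 1
select * from Infra.ProcessCategory where IsEnabled = 0;
select * from Infra.Process         where IsEnabled = 0;

-- 3: modules of one Process, roughly in the order of their ModuleLevel
select * from Infra.ProcessModule where ProcessId = @ProcessId order by ModuleLevel;

-- 4: Batches of that Process
select * from Infra.Batch where ProcessId = @ProcessId;

-- 5: WorkItems of one Batch (assumes Infra.WorkItem has a BatchId column)
select * from Infra.WorkItem where BatchId = @BatchId;

-- 6 and 7: both counts should be 0 once deployment has finished successfully
select
       (select count(*) from DeploySequenceStaging) as DeploySequenceStagingRows,
       (select count(*) from DeployItemStaging)     as DeployItemStagingRows;
```

If either staging count stays above 0 for a long time, that is usually the point where looking at the Deployment errors in the event log becomes worthwhile.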

In addition to that, you can always use SQL Server Profiler to get detailed information about what queries are being executed, details about different errors that might happen and, of course, having the queries also gives you a chance of understanding the "why" behind them. The most useful column on which you can filter is the ApplicationName column of the trace. These are the various application names used by the DW Jobs processing modules:

  • DW Jobs scheduling, execution and processing: SC DAL–Orchestration and SC DAL–SCDW
  • DWMaintenance job: SC DAL–Maintenance
  • MPSync job: SC DAL–MP Sync
  • Extract jobs: SC DAL–SCDW Extract Module
  • Transform job: SC DAL–SCDW Transform Module
  • Load jobs: SC DAL–SCDW Load Module
  • Cube related jobs: Microsoft SQL Server Analysis Services (and general troubleshooting of SQL Analysis Services – Application event log and SQL Analysis Services tracing)
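Before starting a trace, a quick way to see which of these modules are currently connected is to look at the sessions on the SQL Server instance, since the ApplicationName in Profiler corresponds to program_name in sys.dm_exec_sessions. This is only a small sketch: it matches on the common "SC DAL" prefix rather than on the full names above, and it needs the VIEW SERVER STATE permission.

```sql
-- Sessions opened by Service Manager data access layer components
select
       s.session_id,
       s.program_name,
       s.login_name,
       s.host_name,
       s.status,
       s.last_request_start_time
from sys.dm_exec_sessions as s
where s.program_name like 'SC DAL%'
order by s.program_name, s.session_id;
```

The program_name values returned here are the exact strings you can paste into the ApplicationName filter of the trace.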

A cool template I usually use for the SQL Profiler Trace which is in 99% of the cases enough, has these settings (events and columns):

NOTE: Attached, you will also find a file (DWJobs_sql_queries.sql) that contains (commented) useful sql queries which you can and should use when troubleshooting the DW Jobs.

Good hunting! 🙂

DWJobs_sql_queries.sql


Improvements are here for the Data Warehouse and others in Update Rollup 5

Update Rollup 5 is here: http://support.microsoft.com/kb/3009517

Beside other cool fixes, this UR includes also some cool changes for the Data Warehouse component. These will make it much easier to troubleshoot possible issues that may appear.

This is part of our effort to improve the Data Warehouse and the Service Manager product in general. I hope you will like it and await more good news in the future 😀

INFOS:

 

Note The System Center 2012 R2 Service Manager Update Rollups are cumulative. Therefore, this update rollup contains new fixes for the following issues along with fixes that were included in System Center 2012 R2 Service Manager Update Rollup 4, Update Rollup 3, and Update Rollup 2.

  • Change request still stays "in progress" when last activity is skipped and all previous activities are completed.
  • ActualStartDate and ActualEndDate field values for Change Request and Release Request do not set.
  • Multiple System Center Service Manager Connectors to System Center Configuration Manager execute and complete successfully. However, the start, finish, and status property values are not updated in the Service Manager Console under the Connectors view, and event 3334 is logged.
  • Trying to delete a Service Request template that is being used by a Request Offering causes a cryptic and non-user-friendly error message.
  • The "Group By" functionality in a view does not work correctly when the Group By column has empty fields.
  • The console crashes when the user tries to open an already opened attachment from a Work Item form.
  • Opening the Views for Groups takes a long time if there are a large number of explicit members in the groups.
  • The OM CI connector fails if you try to import a distributed application that contains an instance of the Hyper-V Virtual Network Adapter class.
  • Monitoring Host process crashes if there exists a notification after the user who created it was deleted or moved to a hidden organizational unit (OU) in Active Directory.
  • An exception is raised if a user tries to open a sorted view that is created by using TypeProjection on a custom class that is neither Abstract nor a first Concrete class.
  • An exception is thrown when a user tries to open the Service Components tab on the Service Maps form when there are a large number of services in CMDB.
  • Enhanced in-event logging for DW jobs.

    Logging the batch start and completion events for all DW job categories in event log. The start and completion event will include the following:

    • Process category
    • Process name
    • Batch ID
    • Batch start or completion time
  • Enhanced in event logging for DW cube processing.
    • Time taken by Cube’s batch ID to complete will be logged in seconds.
    • Information about whether the cube is processed under Analysis Services that are running in Microsoft SQL Server Standard Edition or in SQL Server Enterprise Edition.
    • Event logs will be added during processing of each Dimension and Measure Group for each partition (if applicable) for both enterprise and standard editions of SQL Server. 
  • Added the new Windows PowerShell cmdlet Get-SCDWInfraLocations on the Service Manager management server to retrieve the following location information about its data warehouse infrastructure:
    • Service Manager database
    • Service Manager data warehouse database
    • Service Manager Analysis server database
    • Data Warehouse Reporting Server database and Report Server URL

Free ebook: Microsoft System Center Operations Manager Field Experience

Improving Service Manager – we're not kidding!

Update Rollup 6 is out! 🙂

http://support.microsoft.com/kb/3039363

We are constantly improving SC Service Manager and we're not kidding about it 😉

The fixes and performance improvement in Update Rollup 6 are a proof of this!

Great job guys!

Enjoy!

Troubleshooting LFX Connectors in Service Manager – the SCCM Connector where LastInventoryDate is not getting updated

Hi everyone! 😀 It’s been a while since my last post. I’ve been pretty busy with migrating to the cloud, hehe

I do have some posts however, which I wanted to write and never got the chance to yet. So, let’s continue with the articles – this is one of them about troubleshooting LFX Connectors in Service Manager.

Enjoy!

 

First, try to concentrate only on specific stuff (if there is more than 1 issue, we need to take them one after the other):

  • Choose a single Computer that has the issue for which data should get imported / updated from SCCM through the SCCM Connector
  • Choose a specific type of data (field) – here we may start with LastInventoryDate not getting updated (focusing on the Computer we selected)

You can also have a look at this article which explains a little bit about how LFX Connectors work and check out if BatchSize(es) for the Data that is missing is high enough in relation to how much data is getting imported:

The different types of data import template definitions for the SCCM 2012+ Connector are defined in the Microsoft.EnterpriseManagement.ServiceManager.Connector.Sms2011 MP and you can export this MP using this query to understand the definitions:

select MPName, convert(xml, MPXML)
from ManagementPack
where MPName = 'Microsoft.EnterpriseManagement.ServiceManager.Connector.Sms2011'

All of this is stored in the LFX.DataTable table under a specific DataName. Here we have the LFX Staging Table and View information together with the Queries used to get the data from the direct source (here, the SCCM site database) and afterwards, from the LFX Staging Tables, import it into the SM (CMDB) Instance Space (tables) => BME/TME, MT_ClassName and Relationship tables.

Here is a Query to have a general overview of the information needed (for now, for all the DataTables of the SCCM 2012+ Connector):

select *
from LFX.DataTable
where DataName like '%CMv5_%'

Notice how some DataNames have a “Cached_” prefix in their name. This is because:

  • The ones without the “Cached_” prefix are the ones which are being used by the Data Provider (gets the data directly from the custom source – in this case, the SCCM site database)
  • The ones that have “Cached_” prefix are the ones used by the Data Consumer that will get the already imported data (from the LFX Staging Tables) and import it into the instance space (only then will it be visible in the Console)
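To make the split described above visible at a glance, here is a small sketch that labels each DataName as belonging to the Data Provider or the Data Consumer based only on the “Cached_” prefix. It uses just the DataName, DataTableName and QueryString columns mentioned in this article; the UsedBy column is a label made up for this example.

```sql
select
       dt.DataName,
       dt.DataTableName,
       case
              when dt.DataName like 'Cached[_]%' then 'Data Consumer'
              else 'Data Provider'
       end as [UsedBy],
       dt.QueryString
from LFX.DataTable as dt
where dt.DataName like '%CMv5[_]%'   -- [_] matches a literal underscore
order by [UsedBy], dt.DataName;
```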

 

This is important because if you look at the QueryString field, you can see the SQL Query that will be executed to get that data.

 

Let’s search for the type of data we are interested in (LastInventoryDate) directly in the XML definition of the MP and we find it here:

<Object Path="$Context/Path[Relationship='LFX!System.LinkingFramework.ConnectorEmbedsTables' SeedRole='Source' TypeConstraint='LFX!System.LinkingFramework.DataTable']$">
       <Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/DataName$">Cached_CMv5_LogicalComputers</Property>
       <Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/WatermarkField$">E.Lfx_Timestamp</Property>
       <Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/WatermarkType$">0</Property>
       <Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/BatchIdField$">E.Lfx_RowId</Property>
       <Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/BatchIdType$">0</Property>
       <Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/BatchIdSize$">500</Property>
       <Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/UseCache$">false</Property>
       <Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/QueryString$">
                                         SELECT E.Lfx_RowId,
                                                       E.Lfx_SourceID,
                                                       E.Lfx_Status,
                                                       E.DisplayName,
                                                       E.PrincipalName,
                                                       E.NetbiosComputerName,
                                                       E.NetbiosDomainName,
                                                       E.OffsetInMinuteFromGreenwichTime,
                                                       E.IsVirtualMachine,
                                                       E.LastInventoryDate,
                                                       E.ActiveDirectorySite
                                         from [LFXSTG].v_Cached_CMv5_LogicalComputers E
                                  </Property>
       <Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/CollectionName$">Cached_CMv5_LogicalComputers</Property>
       <Property Path="$Context/Property[Type='LFX!System.LinkingFramework.DataTable']/DependOnDataTable$">Cached_CMv5_PhysicalComputers</Property>
</Object>

We can figure out from here, that the LastInventoryDate value is coming from the v_Cached_CMv5_LogicalComputers View (notice the v_ at the beginning of the name).

What we can also figure out already from the definition above, is that this depends on another view called Cached_CMv5_PhysicalComputers (just add a “v_” as prefix if you want to query it – and don’t forget that it belongs to the LFXSTG schema).
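As a first check, the Data Consumer view can be queried directly for the Computer you selected at the beginning. This is a minimal sketch that only uses columns listed in the QueryString above; the computer name is a placeholder you need to replace.

```sql
declare @ComputerName nvarchar(255) = N'REPLACE_WITH_NETBIOS_COMPUTER_NAME'; -- placeholder

select
       E.Lfx_RowId,
       E.Lfx_SourceID,
       E.Lfx_Status,
       E.PrincipalName,
       E.LastInventoryDate
from [LFXSTG].v_Cached_CMv5_LogicalComputers E
where E.NetbiosComputerName = @ComputerName;
```

If no row comes back, or LastInventoryDate is empty or old here, then the problem is already in the staging data and not in the import into the instance space.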

 

So, let’s have a look from SQL Management Studio at the definition of this view v_Cached_CMv5_LogicalComputers:

CREATE VIEW [LFXSTG].[v_Cached_CMv5_LogicalComputers] AS
SELECT S.Lfx_RowId,
        S.Lfx_SourceID,
        S.Lfx_Timestamp,
        CCX.Lfx_Status,
        CCX.Name0 AS 'DisplayName',
        COALESCE(CCX.Name0, S.Netbios_Name0)
              + '.' + COALESCE(CCX.Domain0, S.Resource_Domain_OR_Workgr0) AS 'PrincipalName',
        S.Netbios_Name0 AS 'NetbiosComputerName',
        S.Resource_Domain_OR_Workgr0 AS 'NetbiosDomainName',
        CCX.CurrentTimeZone0 AS 'OffsetInMinuteFromGreenwichTime',
        CCX.IsVirtualMachine,
        W.LastHWScan AS 'LastInventoryDate',
        S.AD_Site_Name0 AS 'ActiveDirectorySite'
FROM LFXSTG.CMv5_SYSTEM S
CROSS APPLY
LFXSTG.fn_CheckCMv5CachedComputers(S.Lfx_SourceID, S.ResourceID, S.SMS_Unique_Identifier0, S.Lfx_Status) AS CCX
    LEFT JOIN LFXSTG.CMv5_WORKSTATION_STATUS W
        ON S.ResourceID = W.ResourceID AND S.Lfx_SourceId = W.Lfx_SourceId
WHERE S.Netbios_Name0 IS NOT NULL
    AND S.Resource_Domain_OR_Workgr0 IS NOT NULL
    AND CCX.Lfx_Status != 'I'

So now we figure out that we get the LastInventoryDate value from LastHWScan field of the LFXSTG.CMv5_WORKSTATION_STATUS table.
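Since the view combines several staging tables and filters rows out, it can help to look at the two underlying tables for the chosen Computer without those filters. The sketch below reuses the join from the view definition above; the computer name is a placeholder, and the column aliases are only there for readability.

```sql
declare @ComputerName nvarchar(255) = N'REPLACE_WITH_NETBIOS_COMPUTER_NAME'; -- placeholder

select
       S.ResourceID,
       S.Lfx_SourceId,
       S.Lfx_Status                 as [SystemLfxStatus],
       S.Netbios_Name0,
       S.Resource_Domain_OR_Workgr0,   -- the view drops the row if this is NULL
       W.LastHWScan,                   -- NULL here means no matching row in CMv5_WORKSTATION_STATUS
       W.Lfx_Timestamp              as [WorkstationStatusTimestamp]
from LFXSTG.CMv5_SYSTEM S
left join LFXSTG.CMv5_WORKSTATION_STATUS W
       on S.ResourceID = W.ResourceID and S.Lfx_SourceId = W.Lfx_SourceId
where S.Netbios_Name0 = @ComputerName;
```

The ResourceID returned here is also the value you will need for the queries further down.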

 

This also tells us that this is the Data Consumer part (notice the “Cached_” prefix in the DataName). So how does the data get into this table?

The table name is CMv5_WORKSTATION_STATUS and so we can look for the QueryString in the LFX.DataTable by its name:

select *
from LFX.DataTable
where DataTableName = 'CMv5_WORKSTATION_STATUS'

So here we are, this is the Query we are executing directly on the SCCM site database to get the data from:


SELECT
       S.ChangeAction as Lfx_Status,
       S.ResourceID,
       S.BatchingKey,
       S.GroupKey,
       S.TimeStamp,
       S.LastHWScan,
       S.SystemDefaultLCID,
       S.TimezoneOffset,
       S.LastReportVersion
FROM SCCM_Ext.vex_GS_WORKSTATION_STATUS S
INNER JOIN SCCM_Ext.vex_FullCollectionMembership CM
ON S.ResourceID = CM.ResourceID
INNER JOIN SCCM_Ext.vex_Collection C
ON C.CollectionID = CM.CollectionID
WHERE S.LastHWScan IS NOT NULL
AND C.ChangeAction = 'U' and CM.ChangeAction = 'U'
AND $COLLECTIONLIST
ORDER BY S.rowversion

We can see that we get the LastHWScan value from the SCCM_Ext.vex_GS_WORKSTATION_STATUS table/view from the SCCM site database.

  • Notice $COLLECTIONLIST here – this is replaced with the SCCM Site Collection filters that you have configured in your SCCM Connector.
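If you want to run this query by hand on the SCCM site database, $COLLECTIONLIST has to be replaced with something, because the Connector only fills it in at run time. In the sketch below it is swapped for 1 = 1, which is purely an assumption for manual testing (it ignores your Collection filters), and the result is narrowed down to a single ResourceID instead. The ResourceID value is a placeholder.

```sql
declare @ResourceId int = 0; -- placeholder: ResourceID of the Computer you are checking

SELECT
       S.ChangeAction as Lfx_Status,
       S.ResourceID,
       S.TimeStamp,
       S.LastHWScan,
       C.CollectionID
FROM SCCM_Ext.vex_GS_WORKSTATION_STATUS S
INNER JOIN SCCM_Ext.vex_FullCollectionMembership CM
ON S.ResourceID = CM.ResourceID
INNER JOIN SCCM_Ext.vex_Collection C
ON C.CollectionID = CM.CollectionID
WHERE S.LastHWScan IS NOT NULL
AND C.ChangeAction = 'U' and CM.ChangeAction = 'U'
AND 1 = 1                       -- stands in for $COLLECTIONLIST (manual testing only)
AND S.ResourceID = @ResourceId
ORDER BY S.rowversion
```

One row per Collection the Computer is a member of is expected; compare the CollectionID values with the Collections selected in your Connector to see whether the real filter would have included it.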

 

While the affected SCCM Connector (the one responsible for the chosen Computer) runs until it finishes, collect the following data:

  • SQL Profiler trace on the SCCM Site database
  • SQL Profiler trace on the ServiceManager database
  • SM ETW Trace on LFX Connectors: cmd -l VER -a CON

 

In the SQL Profiler Trace of the SCCM Site database, look for the query that we have identified that gets the data – and ideally, run it manually to see what data you get as result from the SCCM Site database.

After this, look in the SQL Profiler Trace of the ServiceManager database and look at the other query that we identified that gets the data from the LFXSTG tables/views (as explained/identified above).

 

Identify the ResourceId of the Computer you are looking for; you can then check whether it is being synchronized, and by which Connectors, using this query (on the ServiceManager database):

select
       wss.Lfx_RowId,
       wss.Lfx_Timestamp,
       ds.DisplayName,
       wss.Lfx_Status,
       wss.ResourceID,
       wss.BatchingKey,
       wss.GroupKey,
       wss.TimeStamp,
       wss.LastHWScan,
       wss.SystemDefaultLCID,
       wss.LastReportVersion
from LFXSTG.CMv5_WORKSTATION_STATUS wss
join LFX.DataSource ds
       on wss.Lfx_SourceId = ds.DataSourceId
where wss.ResourceId in (RES_ID_1, RES_ID_2, RES_ID_3, RES_ID_ETC)

Now another thing we may want to know about (as you might have noticed from the SQL Profiler queries and the Batch/WaterMark fields in the LFX.DataTable) is that we import data in batches and that we also use a WaterMark field (defined in the LFX.DataTable) to keep track of where we left off on the previous run, so that we only get new data on the next runs.

 

We are keeping track of the WaterMark for each DataName in the LFX.ClientWorkTable table:

select
       cwt.ClientWorkTableId as [WorkTableId],
       cwt.Watermark as [WaterMark],
       cwt.LastSyncTime as [LastSyncTime],
       ds.DataSourceName as [ConnectorName],
       ds.DisplayName as [ConnectorDisplayName],
       dt.DataName as [DataName],
       dt.DataTableName as [DataTableName],
       (
              case
                     when dt.WatermarkType = -1 then 'None'
                     when dt.WatermarkType = 0 then 'DateTime'
                     when dt.WatermarkType = 1 then 'Timestamp'
                     when dt.WatermarkType = 2 then 'Number'
                     when dt.WatermarkType = 3 then 'Number'
              end
       ) as [WaterMarkType]
from LFX.ClientWorkTable as cwt
inner join LFX.DataSource as ds
       on ds.DataSourceId = cwt.DataSourceId
inner join LFX.DataTable as dt
       on dt.DataTableId = cwt.DataTableId

To reset the WaterMark for the CMv5_WORKSTATION_STATUS DataName for example, so that the SCCM Connector will try to resynchronize all the data of this type available in SCCM again, we can run this query:

  • Replace “REPLACE_WITH_CONNECTOR_DSIPLAYNAME_STRING_HERE” with the DisplayName of the (SCCM) Connector which you are interested in (as seen from the results of the previous query).
  • Pay attention to the WaterMarkType and, based on that, understand what type of value we need to set in order for the WaterMark to be set to “0” and try to re-sync everything related to this type of data (in this example CMv5_WORKSTATION_STATUS).

update cwt
set cwt.Watermark = 0x00000000
from LFX.ClientWorkTable as cwt
inner join LFX.DataSource as ds
       on ds.DataSourceId = cwt.DataSourceId
inner join LFX.DataTable as dt
       on dt.DataTableId = cwt.DataTableId
where
       ds.DisplayName = 'REPLACE_WITH_CONNECTOR_DSIPLAYNAME_STRING_HERE' and
       dt.DataName = 'CMv5_WORKSTATION_STATUS'
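Because this update changes Connector state directly in the database, it can be worth trying it inside a transaction first and looking at the row before and after. The following is only a sketch of that approach using the same tables and filter as above; it rolls back by default, so nothing changes until you deliberately switch the last statement to a commit.

```sql
declare @ConnectorDisplayName nvarchar(256) = N'REPLACE_WITH_CONNECTOR_DSIPLAYNAME_STRING_HERE';

begin transaction;

-- Watermark before the change
select cwt.ClientWorkTableId, cwt.Watermark, cwt.LastSyncTime
from LFX.ClientWorkTable as cwt
inner join LFX.DataSource as ds on ds.DataSourceId = cwt.DataSourceId
inner join LFX.DataTable as dt on dt.DataTableId = cwt.DataTableId
where ds.DisplayName = @ConnectorDisplayName
  and dt.DataName = 'CMv5_WORKSTATION_STATUS';

-- Same reset as in the query above
update cwt
set cwt.Watermark = 0x00000000
from LFX.ClientWorkTable as cwt
inner join LFX.DataSource as ds on ds.DataSourceId = cwt.DataSourceId
inner join LFX.DataTable as dt on dt.DataTableId = cwt.DataTableId
where ds.DisplayName = @ConnectorDisplayName
  and dt.DataName = 'CMv5_WORKSTATION_STATUS';

-- Watermark after the change; exactly one row is expected to be affected
select cwt.ClientWorkTableId, cwt.Watermark, cwt.LastSyncTime
from LFX.ClientWorkTable as cwt
inner join LFX.DataSource as ds on ds.DataSourceId = cwt.DataSourceId
inner join LFX.DataTable as dt on dt.DataTableId = cwt.DataTableId
where ds.DisplayName = @ConnectorDisplayName
  and dt.DataName = 'CMv5_WORKSTATION_STATUS';

rollback transaction;  -- replace with "commit transaction;" once the result looks right
```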

Export and Import data of a MP in Service Manager when the MP needs to be removed as it cannot be upgraded (breaking change)

Ever authored a Management Pack and, after it has been in production for quite some time, figured out that you need to make a breaking change, one that will require you to remove the MP first as it cannot be upgraded? Such a scenario would come up if, for some reason, you want to delete a Property of a Class.

But wait … if we remove the MP, doesn’t that mean that all the data in the database (instances & relationships) related to this MP, will be deleted? … Yup, unfortunately it does …

So, here is a PowerShell script that can be executed on a SM Management Server, which gets the (internal) name of the MP you need to remove and which will export (in memory) all the data (instances & relationships) related to the MP. It will then stop and wait for your key-press to continue.

Once it is waiting, you can delete the MP and re-import the updated version of the MP that has the change you made.

At this point, you can press any key in the PowerShell window where the script is waiting and it will start to create all instances & relationships that existed prior to removing the MP.

 

NOTE: Please be aware that I have tested only SOME scenarios. It might be that this will either fail, or not be able to re-create everything properly in who-knows-what-corner-case-scenarios. Please try to use and test this in pre-production if possible, and ALWAYS take a FRESH Full Backup of the ServiceManager database before attempting this in production.

 

$mpName = "MPClassesHostingTest" # internal MP Name goes here
$message = "Perform the needed actions here (remove MP, re-import, etc.). After finishing the actions, press any key to continue."
$props = Get-ItemProperty "HKLM:\SOFTWARE\Microsoft\System Center\2010\Common\Setup"
$instdir = $props.InstallDirectory
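Add-Type -Path "$instdir`SDK Binaries\Microsoft.EnterpriseManagement.Core.dll"
# ASSUMPTION: $emg and $hostingRel are used further down but are not defined anywhere in this listing.
# The two lines below are a guess at the intended definitions (connection to the local Management Server
# and the System.Hosting relationship type); adjust them if your environment differs.
$emg = New-Object Microsoft.EnterpriseManagement.EnterpriseManagementGroup "localhost"
$hostingRel = Get-SCRelationship -Name "System.Hosting"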
$dataMap = @{}
$mp = Get-SCManagementPack -Name $mpName
$classList = Get-SCClass -ManagementPack $mp | ? { $_.Singleton -eq $false -and $_.Abstract -eq $false }
foreach($class in $classList) {
    $instanceList = Get-SCClassInstance -Class $class
    foreach($instance in $instanceList) {
        $object = @{
            instance = $instance;
            rels = @{
                source = Get-SCRelationshipInstance -SourceInstance $instance | ? { $_.IsDeleted -eq $false -and $_.Name -ne "System.Hosting" };
                target = Get-SCRelationshipInstance -TargetInstance $instance | ? { $_.IsDeleted -eq $false -and $_.Name -ne "System.Hosting" };
            };
        }
        $dataMap.Add($instance.EnterpriseManagementObject.Id, $object)
    }
}
if($psISE) {
    Add-Type -AssemblyName System.Windows.Forms
    [System.Windows.Forms.MessageBox]::Show("$message")
} else {
    Write-Host "$message" -ForegroundColor Yellow
    $x = $Host.UI.RawUI.ReadKey("NoEcho,IncludeKeyDown")
}
# Create all objects first so that we are sure we can use them for relationships afterwards
# These need to be all there created first in order (non-Hosted first & Hosted after)
foreach($key in $($dataMap.Keys)) {
    $entry = $dataMap.Item($key)
    $class = Get-SCClass -Name $entry.instance.EnterpriseManagementObject.GetClasses().Name
    $props = @{}
    if($class.Hosted -eq $true) {
        # need to handle Hosted classes differently (cannot use New-SCClassInstance cmdlet for various corner-cases)
        # handling with CreatableEnterpriseManagementObject instead (need to also create parent-Hosting class together with it)
        $hostingClass = $class.GetParentClasses([Microsoft.EnterpriseManagement.Configuration.DerivedClassTraversalDepth]::Recursive, $hostingRel, [Microsoft.EnterpriseManagement.Common.TraversalDepth]::OneLevel).Item(0)
        $hostingClassKey = $hostingClass.GetKeyProperties().Item(0)
        $object = New-Object -TypeName Microsoft.EnterpriseManagement.Common.CreatableEnterpriseManagementObject($emg, $hostingClass)
        $object.Item($hostingClass, $hostingClassKey.Name).Value = $entry.instance.$($hostingClassKey.Name)
        $object.Commit()
        $object = New-Object -TypeName Microsoft.EnterpriseManagement.Common.CreatableEnterpriseManagementObject($emg, $class)
        $object.Item($hostingClass, $hostingClassKey.Name).Value = $entry.instance.$($hostingClassKey.Name)
        foreach($property in $class.GetProperties([Microsoft.EnterpriseManagement.Configuration.BaseClassTraversalDepth]::Recursive)) {
            if($entry.instance.$($property.Name) -ne $null) {
                $object.Item($class, $property.Name).Value  = $entry.instance.$($property.Name)
            }
        }
        $object.Commit()
        $entry.instance = $object
    } elseif((Get-SCClassInstance -Id $entry.instance.EnterpriseManagementObject.Id) -eq $null) {
        foreach($property in $class.GetProperties([Microsoft.EnterpriseManagement.Configuration.BaseClassTraversalDepth]::Recursive)) {
            if($entry.instance.$($property.Name) -ne $null) {
                $props.Add($property.Name, $entry.instance.$($property.Name))
            }
        }
        $entry.instance = New-SCClassInstance -Class $class -Property $props -PassThru
    }
    $dataMap.Set_Item($key, $entry)
}
# Now that we know we have all objects created, we can safely create the Relationships
foreach($key in $dataMap.Keys) {
    $entry = $dataMap.Item($key)
    foreach($rel in $entry.rels.Item("source")) {
        $relType = Get-SCRelationship -Id $rel.RelationshipId
        $target = Get-SCClassInstance -Id $rel.TargetObject.Id
        if($target -ne $null) {
            New-SCRelationshipInstance -RelationshipClass $relType -Source $entry.instance -Target $target
        }
    }
    foreach($rel in $entry.rels.Item("target")) {
        $relType = Get-SCRelationship -Id $rel.RelationshipId
        $source = Get-SCClassInstance -Id $rel.SourceObject.Id
        if($source -ne $null) {
            New-SCRelationshipInstance -RelationshipClass $relType -Source $source -Target $entry.instance
        }
    }
}

Transforming Incidents to other WorkItem types via Console Tasks in Service Manager (SCSM Dev custom Solutions)

The entire solution (source code) is available here: https://github.com/SubZer0MS/TransformWorkItemTasks

I have been asked to create a solution for Service Manager 2012 R2 that contains Console Tasks that allow users to transform an Incident into a different WorkItem type. In this example, we are going to have 4 Console Tasks, one for each of the other 4 WorkItem types besides Incident (Service Request, Change Request, Release Record, Problem). This way, we can assign permissions only to some of them, or only to the one needed for a specific role (ex. a custom role derived from the Service Request Analyst role).

This is a pretty nifty example that shows how such a thing can be done and that can be directly used, or stand as a base solution that can be changed or extended as needed 😀

This solution will do the following (based on the Task being executed) – example for Service Request (“Transform to SR” Task):

  1. Create a new Service Request using the Default Service Request Template (uses the default/standard Template for each WorkItem type)
  2. Add the Title & Description of the Incident to the Service Request
  3. Add the common WorkItem relationships to the Service Request & delete these relationships from the Incident (less data in DB, better performance)
  4. Add the Incident to the Service Request as related WorkItem (WorkItemRelatesToWorkItem)
  5. Create a new AnalystCommentLog and add it to the Incident as a related AnalystCommentLog containing a comment that the Incident was closed by this transform Task and also containing the new Service Request ID
  6. Close the Incident (set the Status to Closed)

As for the permissions needed, the Task either needs to be run by users of the Advanced Operators role, or this can be configured granularly, in which case the following permissions are needed:

  • close Incidents and add comment on closing (setting Status to Closed, creating AnalystCommentLog and adding it to the Incident as related AnalystCommentLog)
  • creating and editing the WorkItem of the specific Task being executed (ex. if the “Transform to SR” is being called for example, then Create/Edit permissions are needed for Service Requests)

 

The Management Pack Bundle that contains the solution and can be directly imported into Service Manager 2012 R2 is here: TransformWorkItemTasks.mpb (zip)

The entire solution (source code) is available here: https://github.com/SubZer0MS/TransformWorkItemTasks

 

Let’s take a little tour on this with an example of one of the Tasks (let’s continue with the “Transform to SR” Task as I have been using it as example up to this point anyway).

All the needed references are already included in the solution folders, so no need to copy/add any DLLs or MPs that are referenced. It is however, a good thing to look at, in order to see what references are being used.

This is how a new Console Task should be declared (under <Presentation> node, under <ConsoleTasks>):

<ConsoleTask ID="TransformIncidentToServiceRequest" Accessibility="Public" Enabled="true" Target="Incident!System.WorkItem.Incident" RequireOutput="false">
  <Assembly>Console!SdkDataAccessAssembly</Assembly>
  <Handler>Microsoft.EnterpriseManagement.UI.SdkDataAccess.ConsoleTaskHandler</Handler>
  <Parameters>
    <Argument Name="Assembly">CustomTransformWorkItemTasks</Argument>
    <Argument Name="Type">CustomTransformWorkItemTasks.TransformTaskHandler</Argument>
    <Argument>Service</Argument>
  </Parameters>
</ConsoleTask>

By setting the Target to System.WorkItem.Incident, we tell SM to display this Task in views where we are displaying Incidents (when selecting an Incident).

We are declaring that the Task Handler will be the default Microsoft.EnterpriseManagement.UI.SdkDataAccess.ConsoleTaskHandler, and we are passing 3 arguments to this Method:

  1. Assembly (needed named argument that allows the Handler to know what DLL it needs to use for this): CustomTransformWorkItemTasks
  2. Type (needed named argument that tells the Handler which is the class from our DLL that is handling this): CustomTransformWorkItemTasks.TransformTaskHandler
  3. A 3rd unnamed argument, which is actually the 1st argument that we are passing to our method (CustomTransformWorkItemTasks.TransformTaskHandler) which is a simple string for which we are manually checking if it is set in our method: Service

In the DLL we are developing for this, we are defining our Handler like this:

  • note that the (main) namespace is called CustomTransformWorkItemTasks like our DLL will be called as well
  • also notice that we define a class in this namespace called TransformTaskHandler that extends the SM built-in Microsoft.EnterpriseManagement.UI.SdkDataAccess.ConsoleCommand class

namespace CustomTransformWorkItemTasks
{
    public class TransformTaskHandler : ConsoleCommand
    {

The built-in method ExecuteCommand of the ConsoleCommand class from which we are inheriting, which will get executed on initialization, is the one that gets called, so we have to define (override) it in our class and handle there the arguments we pass from the MP (in this example, the single argument with the value: Service):

public override void ExecuteCommand(IList nodes, NavigationModelNodeTask task, ICollection parameters)
{

In this method, we decide what we do based on what argument we have sent from the Task (in the MP) – in this case we only pass the argument Service:

if (parameters.Contains("Service"))
{
    // do stuff here - the actual work you need to be done in your task when this argument is passed by the Task
    // in this case, I am setting some variables that decide what I will create later in the actual processing based on the WorkItem type (check the entire source code on GitHub)
}

This is the part that does the actual magic. Notice the foreach block: here we go through all the Incidents that were selected when the Task was run in the Console, by enumerating the nodes list passed as an argument to the ExecuteCommand method:

  • notice that we are using EnterpriseManagementObjectProjection with the constructor that takes a Class Type, in order to be able to create Class & Relationship objects together
  • we add the Relationships we create to the class object by adding them to the EnterpriseManagementObjectProjection using the Add(…) Method
  • at this point the Class & Relationship objects of the projection only exist in memory; in order to “really” save/create them in the CMDB, we need to call the Overwrite() Method on the projection
  • when we create the AnalystCommentLog, we are using CreatableEnterpriseManagementObject because it’s an object that does not exist yet and we are just creating it; we don’t need a projection (EnterpriseManagementObjectProjection) because we will not create & add any Relationship to it (we are just adding the new object itself as a related object to the new Service Request)
  • all the MP, Class, etc. names are defined in a separate CS file called Constants, where different classes are defined – check the actual source code on GitHub to understand those
try
{
    ManagementPack workItemMp = emg.GetManagementPack(workItemMpName, Constants.mpKeyTocken, Constants.mpSMR2Version);
    ManagementPack mpSettings = emg.GetManagementPack(workItemSettingMpName, Constants.mpKeyTocken, Constants.mpSMR2Version);
    ManagementPack knowledgeLibraryMp = emg.GetManagementPack(ManagementPacks.knowledgeLibrary, Constants.mpKeyTocken, Constants.mpSMR2Version);

    ManagementPackClass workItemClass = emg.EntityTypes.GetClass(workItemClassName, workItemMp);
    ManagementPackClass workItemClassSetting = emg.EntityTypes.GetClass(workItemSettingClassName, mpSettings);
    EnterpriseManagementObject generalSetting = emg.EntityObjects.GetObject(workItemClassSetting.Id, ObjectQueryOptions.Default);

    foreach (NavigationModelNodeBase node in nodes)
    {
        IList<Guid> bmeIdsList = new List<Guid>();
        bmeIdsList.Add(new Guid(node[Constants.nodePropertyId].ToString()));

        ObjectProjectionCriteria incidentObjectProjection = new ObjectProjectionCriteria(incidentProjection);
        ObjectQueryOptions queryOptions = new ObjectQueryOptions(ObjectPropertyRetrievalBehavior.All);
        queryOptions.ObjectRetrievalMode = ObjectRetrievalOptions.Buffered;
        IObjectProjectionReader<EnterpriseManagementObject> incidentReader = emg.EntityObjects.GetObjectProjectionReader<EnterpriseManagementObject>(incidentObjectProjection, queryOptions);
        incidentReader.PageSize = 1;

        EnterpriseManagementObjectProjection incident = incidentReader.GetData(bmeIdsList).FirstOrDefault();

        EnterpriseManagementObjectProjection workItem = new EnterpriseManagementObjectProjection(emg, workItemClass);

        if (!string.IsNullOrEmpty(workItemTemplateName))
        {
            ManagementPackObjectTemplateCriteria templateCriteria = new ManagementPackObjectTemplateCriteria(string.Format("Name = '{0}'", workItemTemplateName));
            ManagementPackObjectTemplate template = emg.Templates.GetObjectTemplates(templateCriteria).FirstOrDefault();

            if(template != null)
            {
                workItem.ApplyTemplate(template);
            }
        }

        workItem.Object[workItemClass, WorkItemProperties.Id].Value = generalSetting[workItemClassSetting, workItemSettingPrefixName] + Constants.workItemPrefixPattern;
        workItem.Object[workItemClass, WorkItemProperties.Title].Value = string.Format("{0} ({1})", incident.Object[incidentClass, WorkItemProperties.Title].Value, incident.Object[incidentClass, WorkItemProperties.Id].Value);
        workItem.Object[workItemClass, WorkItemProperties.Description].Value = incident.Object[incidentClass, WorkItemProperties.Description].Value;

        ManagementPackRelationship workItemToWorkItemRelationshipClass = emg.EntityTypes.GetRelationshipClass(RelationshipTypes.workItemRelatesToWorkItem, wiLibraryMp);
        workItem.Add(incident.Object, workItemToWorkItemRelationshipClass.Target);

        CreatableEnterpriseManagementObject analystComment = new CreatableEnterpriseManagementObject(emg, analystCommentClass);
        analystComment[analystCommentClass, AnalystCommentProperties.Id].Value = Guid.NewGuid().ToString();
        analystComment[analystCommentClass, AnalystCommentProperties.Comment].Value = string.Format(Constants.incidentClosedComment, workItemClass.Name, workItem.Object.Id.ToString());
        analystComment[analystCommentClass, AnalystCommentProperties.EnteredBy].Value = EnterpriseManagementGroup.CurrentUserName;
        analystComment[analystCommentClass, AnalystCommentProperties.EnteredDate].Value = DateTime.Now.ToUniversalTime();

        incident.Object[incidentClass, IncidentProperties.Status].Value = incidentClosedStatus.Id;
        incident.Object[incidentClass, IncidentProperties.ClosedDate].Value = DateTime.Now.ToUniversalTime();

        ManagementPackRelationship incidentHasAnalystCommentRelationshipClass = emg.EntityTypes.GetRelationshipClass(RelationshipTypes.workItemHasAnalystComment, wiLibraryMp);
        incident.Add(analystComment, incidentHasAnalystCommentRelationshipClass.Target);

        IList<ManagementPackRelationship> relationshipsToAddList = new List<ManagementPackRelationship>()
        {
            workItemToWorkItemRelationshipClass,
            emg.EntityTypes.GetRelationshipClass(RelationshipTypes.workItemHasCommentLog, wiLibraryMp),
            emg.EntityTypes.GetRelationshipClass(RelationshipTypes.createdByUser, wiLibraryMp),
            emg.EntityTypes.GetRelationshipClass(RelationshipTypes.affectedUser, wiLibraryMp),
            emg.EntityTypes.GetRelationshipClass(RelationshipTypes.assignedToUser, wiLibraryMp),
            emg.EntityTypes.GetRelationshipClass(RelationshipTypes.workItemHasAttachment, wiLibraryMp),
            emg.EntityTypes.GetRelationshipClass(RelationshipTypes.workItemAboutConfigItem, wiLibraryMp),
            emg.EntityTypes.GetRelationshipClass(RelationshipTypes.workItemRelatesToConfigItem, wiLibraryMp),
            emg.EntityTypes.GetRelationshipClass(RelationshipTypes.entityToArticle, knowledgeLibraryMp)
        };

        foreach (ManagementPackRelationship relationship in relationshipsToAddList)
        {
            if (incident[relationship.Target].Any())
            {
                foreach (IComposableProjection itemProjection in incident[relationship.Target])
                {
                    workItem.Add(itemProjection.Object, relationship.Target);
                    itemProjection.Remove();
                }
            }

            if(incident[relationship.Source].Any())
            {
                foreach (IComposableProjection itemProjection in incident[relationship.Source])
                {
                    workItem.Add(itemProjection.Object, relationship.Source);
                    itemProjection.Remove();
                }
            }
        }

        incident.Overwrite();

        try
        {
            workItem.Overwrite();
        }
        catch (Exception ex)
        {
            ManagementPackEnumerationCriteria incidentActiveEnumCriteria = new ManagementPackEnumerationCriteria(string.Format("Name = '{0}'", EnumTypes.incidentStatusActive));
            ManagementPackEnumeration incidentActiveStatus = emg.EntityTypes.GetEnumerations(incidentActiveEnumCriteria).FirstOrDefault();

            incident.Object[incidentClass, IncidentProperties.Status].Value = incidentActiveStatus.Id;
            incident.Object[incidentClass, IncidentProperties.ClosedDate].Value = null;

            incident.Overwrite();

            // rethrow the original exception without resetting its stack trace
            throw;
        }


    }

    RequestViewRefresh();
}
catch(Exception ex)
{
    MessageBox.Show(string.Format("Error: {0}: {1}\n\n{2}", ex.GetType().ToString(), ex.Message, ex.StackTrace));
}

 

Have fun coding! 😀

SCCM Server hangs due to memory pressure (Kernel Cache Manager)


Got a request the other day to have a look at a full memory dump of an SCCM server that was hanging due to physical memory pressure (it has 32 GB of RAM installed).

As is usually the case in these situations, the issue was not caused by any SCCM component, but rather by something else 😉

It’s a good opportunity though to talk a little bit about the Windows Kernel Cache Manager. Everything will be done with Public Symbols as usual, and with heavy use of the MEX Debugger Extension (showing its power as well 😀 ).

Note: Even though I am using Public Symbols, I am also using the WinDe dbg extension, which is available to some customers under NDA. Some MEX commands will not work without it because they rely on the WinDe dbg extension. One of these commands is !mex.mem, which I will use for simplicity’s sake because its output is structured and easy to read. If these are not available, you will need to use !kdexts.memusage or, even better, the !kdexts.vm command. You can achieve the same with the built-in commands mentioned; the WinDe/MEX output is just much nicer to look at :p

 

First of all, let’s start by checking the memory status of the server. We can do this by running !mex.mem:

5: kd> !mex.mem
Page File: \??\C:\pagefile.sys
Current: 10794940 Kb Free Space: 6592416 Kb ( 10.29 GB)
Minimum: 10794940 Kb Maximum: 10794940 Kb ( 10.29 GB)

Physical Memory: 8385237 ( 33540948 Kb) ( 31.99 GB)
Available Pages: 27177 ( 108708 Kb) ( 106.16 MB)
ResAvail Pages: 8193973 ( 32775892 Kb) ( 31.26 GB)
Locked IO Pages: 0 ( 0 Kb) ( 0)
Free System PTEs: 33514422 ( 134057688 Kb) ( 127.85 GB)
Modified Pages: 4073135 ( 16292540 Kb) ( 15.54 GB)
Modified PF Pages: 3 ( 12 Kb) ( 12.00 KB)
Modified No Write Pages: 0 ( 0 Kb) ( 0)
NonPagedPool 0: 8326 ( 33304 Kb) ( 32.52 MB)
NonPagedPool 1: 6844 ( 27376 Kb) ( 26.73 MB)
NonPagedPool 2: 0 ( 0 Kb) ( 0)
NonPagedPool 3: 0 ( 0 Kb) ( 0)
NonPagedPool Usage: 36694 ( 146776 Kb) ( 143.34 MB) Current pool size: 981.94 MB
NonPagedPool Max: 6263793 ( 25055172 Kb) ( 23.89 GB)
PagedPool 0: 107157 ( 428628 Kb) ( 418.58 MB)
PagedPool 1: 12274 ( 49096 Kb) ( 47.95 MB)
PagedPool 2: 6081 ( 24324 Kb) ( 23.75 MB)
PagedPool 3: 0 ( 0 Kb) ( 0)
PagedPool 4: 38 ( 152 Kb) ( 152.00 KB)
PagedPool Usage: 125550 ( 502200 Kb) ( 490.43 MB) WorkingSet: 286.58 MB, PeakWorkingSet: 517.27 MB
PagedPool Maximum: 33554432 ( 134217728 Kb) ( 128.00 GB)
Processor Commit: 3751 ( 15004 Kb) ( 14.65 MB)
Session Commit: 12901 ( 51604 Kb) ( 50.39 MB)
Syspart SharedCommit 0
Shared Commit: 17255 ( 69020 Kb) ( 67.40 MB)
Special Pool: 0 ( 0 Kb) ( 0)
Kernel Stacks: 13455 ( 53820 Kb) ( 52.56 MB)
Pages For MDLs: 1859 ( 7436 Kb) ( 7.26 MB)
Pages For AWE: 0 ( 0 Kb) ( 0)
NonPagedPool Commit: 0 ( 0 Kb) ( 0)
PagedPool Commit: 125678 ( 502712 Kb) ( 490.93 MB)
Driver Commit: 4240 ( 16960 Kb) ( 16.56 MB)
Boot Commit: 0 ( 0 Kb) ( 0)
System PageTables: 0 ( 0 Kb) ( 0)
VAD/PageTable Bitmaps: 3882 ( 15528 Kb) ( 15.16 MB)
ProcessLockedFilePages: 0 ( 0 Kb) ( 0)
Pagefile Hash Pages: 0 ( 0 Kb) ( 0)
Sum System Commit: 183021 ( 732084 Kb) ( 714.93 MB)
Total Private: 5012200 ( 20048800 Kb) ( 19.12 GB)
Misc/Transient Commit: 170655 ( 682620 Kb) ( 666.62 MB)
Committed pages: 5365876 ( 21463504 Kb) ( 20.47 GB)
Commit limit: 11083507 ( 44334028 Kb) ( 42.28 GB)

*** Low Available Pages (Standby+Zeroed+Free). This can cause performance issues. ***

Virtual Memory Physical Memory File Cache Cache Writes

Notice these 2 interesting things from the output:

  • Available Pages: 27177 ( 108708 Kb) ( 106.16 MB) => so we only have 106.16 MB of available physical memory (pages) …
  • Modified Pages: 4073135 ( 16292540 Kb) ( 15.54 GB) => out of 32 GB of total physical memory, we have 15.54 GB (about half of the memory) in Modified Pages
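The Kb, MB and GB figures in the !mex.mem output follow directly from the page counts. Here is a quick sketch of that arithmetic in Python, assuming the usual 4 KB page size on x64 (the helper names are made up for this illustration):

```python
PAGE_SIZE_KB = 4  # assumption: standard 4 KB pages on x64


def pages_to_kb(pages):
    """Convert a page count to kilobytes."""
    return pages * PAGE_SIZE_KB


def kb_to_gb(kb):
    """Convert kilobytes to gigabytes (1 GB = 1024 * 1024 KB)."""
    return kb / (1024 * 1024)


# Page counts copied from the !mex.mem output above
physical_pages = 8385237
available_pages = 27177
modified_pages = 4073135

available_kb = pages_to_kb(available_pages)
modified_kb = pages_to_kb(modified_pages)
modified_share = modified_pages / physical_pages

print(f"Available: {available_kb} Kb ({available_kb / 1024:.2f} MB)")
print(f"Modified:  {modified_kb} Kb ({kb_to_gb(modified_kb):.2f} GB)")
print(f"Modified share of physical memory: {modified_share:.1%}")
```

With these numbers, a bit less than half of the physical pages are sitting on the modified list, while only a tiny fraction is available.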

What is the MiModifiedPageWriter (Modified Page Writer) thread (running in System context) doing – is it blocked by anything?

To switch to the System process, we need to find its address. We can do this by using !mex.tl and passing it the name of the process we are interested in:

5: kd> !mex.tl system
PID Address Name
=========== ================ =========================================
0x4 0n4 fffffa8019a6b040 System
0x1b8 0n440 fffffa8028f02060 svchost.exe(LocalSystemNetworkRestricted)
=========== ================ =========================================
PID Address Name

Warning! Zombie process(es) detected (not displayed). Count: 2 [zombie report]

Ok, got it (fffffa8019a6b040) – let’s switch to it by running !mex.p and passing it this address:

5: kd> !mex.p fffffa8019a6b040
Name Address Ses PID User Name Create Time Up Time Mods Handle Act Thrd Z Thrd Parent
====== ================ ==== ======= =============== ========================== ============ ==== ====== ======== ====== ==============
System fffffa8019a6b040 none 4 (0n4) DOMAIN\COMPUTERNAME$ 06/26/2017 12:11:57.995 PM 2h:03:22.004 0 1176 204 0 Idle 0 (0n0)

Memory Details:

VM Peak Work Set Commit Size
======== ======== ======== ===========
23.18 MB 27.99 MB 18.49 MB 556 KB

CPU Details:

User Kernel Total
==== ========== ==========
0s 24m:33.696 24m:33.696

Show Threads: Unique Stacks !mex.listthreads (!lt) fffffa8019a6b040 !winde.lp fffffa8019a6b040 !process fffffa8019a6b040 7

Right! We are in the context of the System process, let’s find the Modified Page Writer thread – it should have this function call on the stack “nt!MiModifiedPageWriter“.

Being in the context of the System process we can use MEX again to search in all available threads for a thread stack that matches what we are looking for – we can use !mex.us and pass it the string we are looking for “nt!MiModifiedPageWriter“:

5: kd> !mex.us nt!MiModifiedPageWriter
1 thread [stats]: fffffa8019ab43c0
fffff80001eda6aa nt!KiSwapContext+0x7a
fffff80001edd142 nt!KiCommitThreadWait+0x1d2
fffff80001e9c03b nt!KeWaitForGate+0xfb
fffff80001e73c3a nt!MiModifiedPageWriter+0x5a
fffff80002170f12 nt!PspSystemThreadStartup+0x5a
fffff80001ec9de6 nt!KiStartSystemThread+0x16

Threads matching filter: 1 out of 204

Cool, found it! Now we want to see more information about what this thread is doing, its state, etc. We can do this with !mex.t, passing it the thread address (fffffa8019ab43c0):

5: kd> !mex.t fffffa8019ab43c0
Process Thread CID UserTime KernelTime ContextSwitches Wait Reason Time
System (fffffa8019a6b040) fffffa8019ab43c0 4.ac 0s 1s.154 3135 WrFreePage 3s.650

WaitBlockList:
Object Name Type Other Waiters
fffff80002092e00 nt!MmModifiedPageWriterGate+0x0 Gate 0

Priority:
Current Base UB FB IO Page
17 8 0 0 2 5

# Child-SP Return Call Site
0 fffff880029fdac0 fffff80001edd142 nt!KiSwapContext+0x7a
1 fffff880029fdc00 fffff80001e9c03b nt!KiCommitThreadWait+0x1d2
2 fffff880029fdc90 fffff80001e73c3a nt!KeWaitForGate+0xfb
3 fffff880029fdce0 fffff80002170f12 nt!MiModifiedPageWriter+0x5a
4 fffff880029fdd40 fffff80001ec9de6 nt!PspSystemThreadStartup+0x5a
5 fffff880029fdd80 0000000000000000 nt!KiStartSystemThread+0x16

It’s not blocked by anything, it is just waiting on nt!MmModifiedPageWriterGate, which is a Gate object (nt!_KGATE). In short, the thread waits for this gate to be signaled before it starts doing work. We can have a look at the Gate object using dx and passing it the address (fffff80002092e00):

0: kd> dx -r1 ((nt!_KGATE *)0xfffff80002092e00)
((nt!_KGATE *)0xfffff80002092e00)                 : 0xfffff80002092e00 [Type: _KGATE *]
    [+0x000] Header           [Type: _DISPATCHER_HEADER]

0: kd> dx -r1 ((ntkrnlmp!_DISPATCHER_HEADER *)0xfffff80002092e00)
((ntkrnlmp!_DISPATCHER_HEADER *)0xfffff80002092e00)                 : 0xfffff80002092e00 [Type: _DISPATCHER_HEADER *]
    [+0x000] Type             : 0x7 [Type: unsigned char]
    [+0x001] TimerControlFlags : 0x1 [Type: unsigned char]
    [+0x001 ( 0: 0)] Absolute         : 0x1 [Type: unsigned char]
    [+0x001 ( 1: 1)] Coalescable      : 0x0 [Type: unsigned char]
    [+0x001 ( 2: 2)] KeepShifting     : 0x0 [Type: unsigned char]
    [+0x001 ( 7: 3)] EncodedTolerableDelay : 0x0 [Type: unsigned char]
    [+0x001] Abandoned        : 0x1 [Type: unsigned char]
    [+0x001] Signalling       : 0x1 [Type: unsigned char]
    [+0x002] ThreadControlFlags : 0x6 [Type: unsigned char]
    [+0x002 ( 0: 0)] CpuThrottled     : 0x0 [Type: unsigned char]
    [+0x002 ( 1: 1)] CycleProfiling   : 0x1 [Type: unsigned char]
    [+0x002 ( 2: 2)] CounterProfiling : 0x1 [Type: unsigned char]
    [+0x002 ( 7: 3)] Reserved         : 0x0 [Type: unsigned char]
    [+0x002] Hand             : 0x6 [Type: unsigned char]
    [+0x002] Size             : 0x6 [Type: unsigned char]
    [+0x003] TimerMiscFlags   : 0x0 [Type: unsigned char]
    [+0x003 ( 5: 0)] Index            : 0x0 [Type: unsigned char]
    [+0x003 ( 6: 6)] Inserted         : 0x0 [Type: unsigned char]
    [+0x003 ( 7: 7)] Expired          : 0x0 [Type: unsigned char]
    [+0x003] DebugActive      : 0x0 [Type: unsigned char]
    [+0x003 ( 0: 0)] ActiveDR7        : 0x0 [Type: unsigned char]
    [+0x003 ( 1: 1)] Instrumented     : 0x0 [Type: unsigned char]
    [+0x003 ( 5: 2)] Reserved2        : 0x0 [Type: unsigned char]
    [+0x003 ( 6: 6)] UmsScheduled     : 0x0 [Type: unsigned char]
    [+0x003 ( 7: 7)] UmsPrimary       : 0x0 [Type: unsigned char]
    [+0x003] DpcActive        : 0x0 [Type: unsigned char]
    [+0x000] Lock             : 393479 [Type: long]
    [+0x004] SignalState      : 0 [Type: long]
    [+0x008] WaitListHead     [Type: _LIST_ENTRY]
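Several fields in the dx output share the same offset because _DISPATCHER_HEADER is built from unions, so Lock, Type and Size all describe the same first four bytes. The small sketch below unpacks the Lock value to show this, assuming a little-endian layout (as on x64); the variable names are only for illustration:

```python
import struct

lock_value = 393479  # Lock field from the dx output above (a 32-bit long)

# Lay the long out the way it sits in memory and read it back as four bytes.
raw = struct.pack("<i", lock_value)
type_byte, flags_byte, size_byte, byte3 = struct.unpack("<4B", raw)

print(f"Lock            = {lock_value:#010x}")
print(f"+0x000 (Type)   = {type_byte:#x}")
print(f"+0x001 (flags)  = {flags_byte:#x}")
print(f"+0x002 (Size)   = {size_byte:#x}")
print(f"+0x003          = {byte3:#x}")
```

The four bytes match the values dx shows at offsets +0x000 to +0x003. For the wait itself, the field to look at is SignalState: a value of 0 means the gate is not signaled.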

Hmm, what else? Well, we also have another system thread that writes modified pages (aka dirty pages): nt!MiMappedPageWriter. This one is used to write memory mapped files that have been modified (these are usually backed by the page file). Same as before, we find it with !mex.us and then use !mex.t to get more detailed information:

5: kd> !mex.us nt!MiMappedPageWriter
1 thread [stats]: fffffa8019ab5040
fffff80001eda6aa nt!KiSwapContext+0x7a
fffff80001edd142 nt!KiCommitThreadWait+0x1d2
fffff80001ede056 nt!KeDelayExecutionThread+0x186
fffff80001f117bc nt!MiGatherMappedPages+0x8c
fffff80001f122f8 nt!MiMappedPageWriter+0x198
fffff80002170f12 nt!PspSystemThreadStartup+0x5a
fffff80001ec9de6 nt!KiStartSystemThread+0x16

Threads matching filter: 1 out of 204
5: kd> !mex.t fffffa8019ab5040
Process Thread CID UserTime KernelTime ContextSwitches Wait Reason Time
System (fffffa8019a6b040) fffffa8019ab5040 4.b0 0s 187ms 133350 DelayExecution 15ms

Priority:
Current Base UB FB IO Page
17 8 0 0 2 5

# Child-SP Return Call Site
0 fffff88002b08910 fffff80001edd142 nt!KiSwapContext+0x7a
1 fffff88002b08a50 fffff80001ede056 nt!KiCommitThreadWait+0x1d2
2 fffff88002b08ae0 fffff80001f117bc nt!KeDelayExecutionThread+0x186
3 fffff88002b08b50 fffff80001f122f8 nt!MiGatherMappedPages+0x8c
4 fffff88002b08c50 fffff80002170f12 nt!MiMappedPageWriter+0x198
5 fffff88002b08d40 fffff80001ec9de6 nt!PspSystemThreadStartup+0x5a
6 fffff88002b08d80 0000000000000000 nt!KiStartSystemThread+0x16

Also not blocked: it is waiting, as it normally does, in nt!KeDelayExecutionThread, which means that after some time it will run again, then wait again, and so on (it basically runs on an interval).

But what files are being currently cached and how big are they in memory? We can have a look at this using !mex.mem -fc and we get this here:

5: kd> !mex.mem -fc
Name ControlArea FsContext Valid StandbyDirty Shared Locked Total
=================================================================================== ================ ================ ======== ============ ====== ====== ========
[...SNIPPED...]
$LogFile fffffa801fc6b990 fffffa801f656e10 860 KB 860 KB
CM_123.mdf fffffa802a3dd490 fffff8a003a85140 1.75 MB 1.75 MB
$Mft fffffa801ce9d4a0 fffffa801ce7dbb0 2.82 MB 2.82 MB
CM_123.mdf fffffa80204ffd30 fffff8a003423140 10.69 MB 15.54 GB 15.55 GB

Virtual Memory Physical Memory File Cache Cache Writes

Interesting … there is a file called CM_123.mdf that is using 15.54GB – that is half of our physical memory !!! Ha, there we have it!

Let’s have a look at more details (e.g. its path, etc.). This address (fffffa80204ffd30) points to a Control Area, which is a structure the Kernel creates for the file. We can use !kdexts.ca with this address to get more information:

5: kd> !kdexts.ca fffffa80204ffd30
ControlArea @ fffffa80204ffd30
Segment fffff8a004a8b940 Flink 0000000000000000 Blink 0000000000000000
Section Ref 1 Pfn Ref 3e30f0 Mapped Views 19f
User Ref 0 WaitForDel 0 Flush Count 0
File Object fffffa802a4144f0 ModWriteCount 0 System Views 19f
WritableRefs 0
Flags (8080) File WasPurged

\SCCMSQLBackup\123Backup\SiteDBServer\CM_123.mdf

Segment @ fffff8a004a8b940
ControlArea fffffa80204ffd30 ExtendInfo 0000000000000000
Total Ptes 1745000
Segment Size 1745000000 Committed 0
Flags (c0000) ProtectionMask

Subsection 1 @ fffffa80204ffdb0
ControlArea fffffa80204ffd30 Starting Sector 0 Number Of Sectors 800000
Base Pte fffff8a014a00000 Ptes In Subsect 800000 Unused Ptes 0
Flags d Sector Offset 0 Protection 6
Accessed
Flink fffffa801fc069d0 Blink fffffa802a3dd560 MappedViews 0

Subsection 2 @ fffffa801fc0a010
ControlArea fffffa80204ffd30 Starting Sector 800000 Number Of Sectors 800000
Base Pte fffff8a01ca00000 Ptes In Subsect 800000 Unused Ptes 0
Flags d Sector Offset 0 Protection 6
Accessed
Flink 0000000000000000 Blink 0000000000000000 MappedViews 19f

Subsection 3 @ fffffa801fc54b30
ControlArea fffffa80204ffd30 Starting Sector 1000000 Number Of Sectors 745000
Base Pte 0000000000000000 Ptes In Subsect 745000 Unused Ptes 0
Flags c Sector Offset 0 Protection 6
Flink 0000000000000000 Blink 0000000000000000 MappedViews 0
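The numbers in the !kdexts.ca output can be cross-checked against each other and against the !mex.mem -fc figure. This is a rough sketch, assuming the values are printed in hexadecimal and that a page is 4 KB (the names are illustrative only):

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages
GIB = 1024 ** 3

# Values copied from the !kdexts.ca output above, read as hexadecimal
total_ptes = 0x1745000
segment_size = 0x1745000000
pfn_ref = 0x3E30F0
subsection_ptes = [0x800000, 0x800000, 0x745000]

# The three subsections together should cover the whole segment.
assert sum(subsection_ptes) == total_ptes
# One PTE maps one page, so PTE count * page size gives the segment size.
assert total_ptes * PAGE_SIZE == segment_size

print(f"Mapped file size: {segment_size / GIB:.2f} GB")
print(f"Pages referenced: {pfn_ref * PAGE_SIZE / GIB:.2f} GB")
```

Under these assumptions the file is roughly 93 GB in size, and the Pfn Ref count works out to about 15.55 GB, which lines up with the Total column shown for CM_123.mdf.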

It really seems to be something related to SCCM – but which process is holding this file?

At this point, my guess is that we are already doing Cache Write Throttling, a mechanism that protects server performance by delaying writes to the cache when needed. It gets triggered when nt!CcTotalDirtyPages reaches or exceeds nt!CcDirtyPageThreshold.

From the Windows Internals book:

“The file system and cache manager must determine whether a cached write request will affect system performance and then schedule any delayed writes. First the file system asks the cache manager whether a certain number of bytes can be written right now without hurting performance by using the CcCanIWrite function and blocking the write if necessary. For asynchronous I/O, the file system sets up a callback with the cache manager for automatically writing the bytes when writes are again permitted by calling CcDeferWrite. Otherwise, it just blocks and waits on CcCanIWrite to continue. Once it’s notified of an impending write operation, the cache manager determines how many dirty pages are in the cache and how much physical memory is available. If few physical pages are free, the cache manager momentarily blocks the file system thread that’s requesting to write data to the cache. The cache manager’s lazy writer flushes some of the dirty pages to disk and then allows the blocked file system thread to continue. This throttling prevents system performance from degrading because of a lack of memory when a file system or network server issues a large write operation.”

We have a very cool command we can use to check this, which will also show us which files (if any) are currently in this state, together with the address of their file objects: !kdexts.defwrites

5: kd> !kdexts.defwrites
*** Cache Write Throttle Analysis ***

CcTotalDirtyPages: 4075797 (16303188 Kb)
CcDirtyPageThreshold: 4058054 (16232216 Kb)
AvailablePages: 27177 ( 108708 Kb)
ThrottleTop: 450 ( 1800 Kb)
ThrottleBottom: 80 ( 320 Kb)
ModifiedPages: 4073135 (16292540 Kb)

CcTotalDirtyPages >= CcDirtyPageThreshold, writes throttled

Check these thread(s): CcWriteBehind(LazyWriter)
Check critical workqueue for the lazy writer, !exqueue 16
Cc Deferred Write list: (CcDeferredWrites)
File: fffffa802a4144f0 Event: fffff880074906d0
File: fffffa8029c08070 Event: fffff88007968640

Yup, sure enough, writes are currently being throttled, and these are the 2 file objects on the deferred write list: fffffa802a4144f0 and fffffa8029c08070.
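The throttling verdict reported by !kdexts.defwrites boils down to a simple comparison. Here is a minimal sketch of that check with the numbers from the output, assuming 4 KB pages; it only mirrors the condition printed by the extension, not the Cache Manager's full logic, and the function name is made up:

```python
PAGE_SIZE_KB = 4  # assumption: 4 KB pages

# Values copied from the !kdexts.defwrites and !mex.mem outputs
cc_total_dirty_pages = 4075797
cc_dirty_page_threshold = 4058054
physical_pages = 8385237


def writes_throttled(dirty_pages, threshold):
    """Mirror the condition printed by the extension: dirty >= threshold."""
    return dirty_pages >= threshold


over_by = cc_total_dirty_pages - cc_dirty_page_threshold
threshold_share = cc_dirty_page_threshold / physical_pages

print(f"Throttled: {writes_throttled(cc_total_dirty_pages, cc_dirty_page_threshold)}")
print(f"Over the threshold by {over_by} pages ({over_by * PAGE_SIZE_KB} Kb)")
print(f"Threshold as share of physical memory: {threshold_share:.1%}")
```

In this dump the threshold sits at just under half of the physical memory, and the dirty page count is slightly above it.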

Nice, so now that we have the file object addresses, we can use !kdexts.fileobj and give it an address to understand what files these are:

5: kd> !kdexts.fileobj fffffa802a4144f0

\SCCMSQLBackup\123Backup\SiteDBServer\CM_123.mdf

Device Object: 0xfffffa801f7d07c0 \Driver\volmgr
Vpb: 0xfffffa8029683ad0
Event signalled
Access: Read Write Delete

Flags: 0x43062
Synchronous IO
Sequential Only
Cache Supported
Modified
Size Changed
Handle Created

File Object is currently busy and has 0 waiters.

FsContext: 0xfffff8a003423140 FsContext2: 0xfffff8a00f5d3500
Private Cache Map: 0xfffffa802029b9c0
CurrentByteOffset: edb0f0000
Cache Data:
Section Object Pointers: fffffa8029479818
Shared Cache Map: fffffa802029b850 File Offset: ffffffffdb0f0000

Ha! Exactly the one we are looking for (using 15.54GB of RAM): \SCCMSQLBackup\123Backup\SiteDBServer\CM_123.mdf
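As a side note, the CurrentByteOffset of this file object gives a rough idea of how far the copy had progressed when the dump was taken. This is only a sketch under a few assumptions: both values are read as hexadecimal, the Segment Size reported by !kdexts.ca reflects the size of the file, and the file position of this sequential, synchronous write marks the copy progress:

```python
GIB = 1024 ** 3

# CurrentByteOffset from !kdexts.fileobj, read as hexadecimal (assumption)
current_byte_offset = 0xEDB0F0000
# Segment Size from !kdexts.ca, read as hexadecimal (assumption)
segment_size = 0x1745000000

progress = current_byte_offset / segment_size

print(f"Written so far: {current_byte_offset / GIB:.2f} GB")
print(f"Total size:     {segment_size / GIB:.2f} GB")
print(f"Progress:       {progress:.0%}")
```

If those assumptions hold, the backup copy was somewhere around two thirds of the way through the file.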

No need to look at the other one now, this is the one we are interested in. So ok, which SCCM process currently has a handle open to this file? We can use !mex.findhandle with the address of the file object (fffffa802a4144f0) to get the process (you can use !kdexts.findhandle instead):

5: kd> !mex.findhandle fffffa802a4144f0
smssqlbkup.exe fffffa802935eb10
384

Ok, let’s have a look at what this SCCM process (smssqlbkup.exe) is doing by switching to it now that we have the address (fffffa802935eb10):

5: kd> !mex.p fffffa802935eb10
Name Address Ses PID User Name Create Time Up Time Mods Handle Act Thrd Z Thrd Parent
============== ================ === ============ =============== ========================== ============ ==== ====== ======== ====== ========================
smssqlbkup.exe fffffa802935eb10 0 614 (0n1556) DOMAIN\COMPUTERNAME$ 06/26/2017 12:13:00.723 PM 2h:02:19.276 54 221 8 0 services.exe 264 (0n612)

Command Line: "D:\SMS_FOLDER\bin\x64\smssqlbkup.exe"

Memory Details:

VM Peak Work Set Commit Size PP Quota NPP Quota
======== ========= ======== =========== ========= =========
86.53 MB 107.85 MB 66.95 MB 12.94 MB 137.89 KB 15.38 KB

CPU Details:

User Kernel Total
====== ========= =========
2s.403 2m:21.447 2m:23.850

Show Threads: Unique Stacks !mex.listthreads (!lt) fffffa802935eb10 !winde.lp fffffa802935eb10 !process fffffa802935eb10 7

Show LPC Port information for process

Let’s have a look at the list of threads and afterwards we should have a look at each thread – we can do this using !mex.lt now that we are already in the context of this process:

5: kd> !mex.lt
Process PID Thread Id Time Reason
============== === ================ ==== ======= ==============
smssqlbkup.exe 614 fffffa8029361b50 5c0 2s.371 UserRequest
smssqlbkup.exe 614 fffffa801f6a5b50 53c 43s.649 DelayExecution
smssqlbkup.exe 614 fffffa801f6a55c0 874 514ms Executive
smssqlbkup.exe 614 fffffa80293b2b50 c38 43s.649 UserRequest
smssqlbkup.exe 614 fffffa80295c0b50 7d4 43s.649 UserRequest
smssqlbkup.exe 614 fffffa801f929b50 11fc 43s.649 WrQueue
smssqlbkup.exe 614 fffffa8021324060 1af4 2s.386 WrQueue
smssqlbkup.exe 614 fffffa8029610060 10a0 2s.371 WrQueue

Thread Count: 8

Looking through each thread, we find this veeery interesting thread (fffffa801f6a55c0):

5: kd> !mex.t fffffa801f6a55c0
Process Thread CID UserTime KernelTime ContextSwitches Wait Reason Time COM-Initialized
smssqlbkup.exe (fffffa802935eb10) fffffa801f6a55c0 614.874 1s.841 2m:21.228 36298 Executive 514ms APTKIND_MULTITHREADED (MTA)

WaitBlockList:
Object Type Other Waiters
fffff880074906d0 NotificationEvent 0

# Child-SP Return Call Site
0 fffff88007490430 fffff80001edd142 nt!KiSwapContext+0x7a
1 fffff88007490570 fffff80001edf96f nt!KiCommitThreadWait+0x1d2
2 fffff88007490600 fffff80001ea2527 nt!KeWaitForSingleObject+0x19f
3 fffff880074906a0 fffff880016ebcc8 nt!CcCanIWrite+0xfffffffffffa11f7
4 fffff88007490770 fffff88001401102 Ntfs!NtfsCopyWriteA+0x68
5 fffff88007490970 fffff880014048ba fltmgr!FltpPerformFastIoCall+0xf2
6 fffff880074909d0 fffff8800142283e fltmgr!FltpPassThroughFastIo+0xda
7 fffff88007490a10 fffff800021f1bbe fltmgr!FltpFastIoWrite+0x1ce
8 fffff88007490ab0 fffff80001ed70d3 nt!NtWriteFile+0x5ad
9 fffff88007490bb0 0000000077a2bdba nt!KiSystemServiceCopyEnd+0x13
a 00000000014cce38 000007fefd73865c ntdll!NtWriteFile+0xa
b 00000000014cce40 00000000778d171a KERNELBASE!WriteFile+0xfe
c 00000000014cceb0 00000000778d1ea5 kernel32!BaseCopyStream+0x4b2
d 00000000014cd8a0 00000000778d1907 kernel32!BasepCopyFileExW+0x545
e 00000000014cdda0 00000000779556f2 kernel32!CopyFileExW+0x97
f 00000000014cde20 000000013ff6fb4a kernel32!CopyFileA+0x62
10 00000000014cde80 000000013ff81a5c smssqlbkup+0x9fb4a
11 00000000014cdf10 000000013ff7c430 smssqlbkup+0xb1a5c
12 00000000014ce280 000000013ff7bb5a smssqlbkup+0xac430
13 00000000014cf3b0 000000013ff7753f smssqlbkup+0xabb5a
14 00000000014cf400 000000013ff7987c smssqlbkup+0xa753f
15 00000000014cf6f0 000000013ff85643 smssqlbkup+0xa987c
16 00000000014cf7c0 0000000140155d55 smssqlbkup+0xb5643
17 00000000014cf9e0 0000000140155bec smssqlbkup+0x285d55
18 00000000014cfa10 00000000778d59cd smssqlbkup+0x285bec
19 00000000014cfa40 0000000077a0a561 kernel32!BaseThreadInitThunk+0xd
1a 00000000014cfa70 0000000000000000 ntdll!RtlUserThreadStart+0x1d

Well, well, well – what do we have on the stack here?

  • first of all we are clearly trying to write to a file (nt!NtWriteFile after the usermode call to CopyFile because the smssqlbkup.exe process is trying to copy the file)
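  • second of all we are waiting in nt!CcCanIWrite (remember the explanation of throttling from above); the NotificationEvent this thread waits on (fffff880074906d0) is the same Event that !kdexts.defwrites listed next to our file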

So, it seems that we either have a very slow disk or we have a problem with some disk related driver. Let’s check for any pending disk IRPs – we can do this by running !mex.ioq (you can use !kdexts.irpfind, but will need to parse the results manually):

5: kd> !mex.ioq
Collecting Data...This might take a moment.
Collecting Raid Data...Returned Failure
Collecting Disk Data...Added Object Disk with Filter DISK
Returned Success

================== Disk.sys Devices ==========================

Device Name DeviceObject DeviceExtension PrivateFdoData PendingIRPs
===================== ================ ================ ================ ===========
\Device\Harddisk0\DR0 fffffa801cd28060 fffffa801cd281b0 fffffa801cd29010 2
==================RaidPort Info: \Device\RaidPort2 ==========================
Device Name Miniport DeviceExtension BusType IoModel
=================================== ======== ================ =========== =========================
\Device\RaidPort2(fffffa801a668060) somestoragedriver fffffa801a6681b0 BusTypeRAID StorSynchronizeFullDuplex
RaidUnit(LUN) ClassPNPDevice DeviceState QueueState OutstandingCount QueueDepth PendingIrps
================ ================ ================== ========== ================ ========== ===========
fffffa801a6b2930 fffffa801cd28060 DeviceStateWorking Normal 2 254 2

Summary Notes:
IOQ found 80 devices, of which 2 have a total of 4 pending Irps.

Alright, so we have pending disk IRPs – let’s have a look at the IRP queue now that we have the DeviceExtension address (fffffa801cd281b0) by using !mex.ioq again, but this time with the “-e” parameter with the address (fffffa801cd281b0):

5: kd> !mex.ioq -e fffffa801cd281b0

============================================
Disk Information: \Device\Harddisk0\DR0
============================================
ClassPNP Devices(Disk.sys) Information:
============================================
Device Name DeviceObject DeviceExtension PrivateFdoData PendingIRPs
===================== ================ ================ ================ ===========
\Device\Harddisk0\DR0 fffffa801cd28060 fffffa801cd281b0 fffffa801cd29010 2

Transfer Packet Current Irp Thread Wait Thread #Retries TimeOut (s) DeviceName CurrentDrvr Original Irp FileName Srb SrbStatus
================ ================ =========== ============================== ======== =========== =============== =========== ================ ============= ================ ==================
fffffa801d11a310 fffffa801d11b010 15ms fffffa80213cc430(RAMMap64.exe) 0 0 HarddiskVolume2 somestoragedriver fffffa8029b27010 \pagefile.sys fffffa801d11a430 SRB_STATUS_PENDING
fffffa801d1fdb10 fffffa801d1fd910 0s fffffa8029573b50(sqlservr.exe) 0 0 HarddiskVolume2 somestoragedriver fffffa801f8ab980 \pagefile.sys fffffa801d1fdc30 SRB_STATUS_PENDING

Wow … stuck IRPs on writing to the page file (pagefile.sys) – remember that the memory mapped files are usually backed by the page file.

So, now it’s clear, and it does not matter that we do not see our file here – this is normal because at the time the dump was taken we were waiting in nt!CcCanIWrite, and it seems that we did not have any pending IRP for it at that moment.

The important thing is that we have a disk related issue here.

Let’s have a quick look at both IRPs in more detail by using !mex.mirp with the “-v” parameter (for verbose) and each IRP address (fffffa801d11b010 and fffffa801d1fd910); you can use !kdexts.irp instead:

5: kd> !mex.mirp -v fffffa801d11b010

Irp Details: fffffa801d11b010 [ verbose | !ddt | !winde.io | !irp ]

Mdl : fffffa801fd8a7b0
System buffer :
Issuing Process :
Thread :
Frame Count : 2
IoStatus Status : c00000bb
IoStatus Info : 0000000000000000
Requester Mode :
Cancel : 0
Cancel IRQL : 0
Apc Environment : 0
User Iosb : 0000000000000000
User Event :
APC : 0000000000000000
Completion Key : 0000000000000000
Cancel Routine :
Original File Object: 0000000000000000
Original File Name :

Irp Stack Frame(s)

# Driver Major Minor Dispatch Routine Control Code Flg Ctrl Status Completion Invoker(s) Device QueueLocation File Context Completion Routine Args
=== ================ ======================= ===== ======================== ============ === ==== ======= ====================== ================ ============= ====== ====================== ============================ ===============================================================================
1 CREATE 0 0 0 None 0000000000000000 (null) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
->2 \Driver\somestoragedriver INTERNAL_DEVICE_CONTROL 0 storport!RaDriverScsiIrp 12 e1 Pending Cancel, Success, Error fffffa801a6b27e0 (null) fffffa801d11a310(CPnp) CLASSPNP!TransferPktComplete fffffa801d11a430(CPnp) 0000000000000000 0000000000000000 fffffa801a6b2930(Devi)

IO Status: 0xc00000bb (The request is not supported.)

 

5: kd> !mex.mirp -v fffffa801d1fd910

Irp Details: fffffa801d1fd910 [ verbose | !ddt | !winde.io | !irp ]

Mdl : fffffa801f5b9670
System buffer :
Issuing Process :
Thread :
Frame Count : 2
IoStatus Status : c00000bb
IoStatus Info : 0000000000000000
Requester Mode :
Cancel : 0
Cancel IRQL : 0
Apc Environment : 0
User Iosb : 0000000000000000
User Event :
APC : 0000000000000000
Completion Key : 0000000000000000
Cancel Routine :
Original File Object: 0000000000000000
Original File Name :

Irp Stack Frame(s)

# Driver Major Minor Dispatch Routine Control Code Flg Ctrl Status Completion Invoker(s) Device QueueLocation File Context Completion Routine Args
=== ================ ======================= ===== ======================== ============ === ==== ======= ====================== ================ ============= ====== ====================== ============================ ===============================================================================
1 CREATE 0 0 0 None 0000000000000000 (null) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
->2 \Driver\somestoragedriver INTERNAL_DEVICE_CONTROL 0 storport!RaDriverScsiIrp 12 e1 Pending Cancel, Success, Error fffffa801a6b27e0 (null) fffffa801d1fdb10(CPnp) CLASSPNP!TransferPktComplete fffffa801d1fdc30(CPnp) 0000000000000000 0000000000000000 fffffa801a6b2930(Devi)

IO Status: 0xc00000bb (The request is not supported.)

Ok, there we go, we are stuck with these IRPs inside of the disk vendor driver (called “somestoragedriver” here for obvious reasons). The reason these are stuck is that we get the IO Status 0xc00000bb from the IOCTL, which means STATUS_NOT_SUPPORTED (The request is not supported).
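As a side note, a status value like this one can be taken apart by hand. Here is a minimal sketch in Python, assuming the standard NTSTATUS bit layout (severity in the top two bits, a customer bit, a 12-bit facility and a 16-bit code). The function name decode_ntstatus is only illustrative and is not part of any debugger extension:

```python
def decode_ntstatus(status: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields."""
    severity_names = {0: "SUCCESS", 1: "INFORMATIONAL", 2: "WARNING", 3: "ERROR"}
    severity = (status >> 30) & 0x3     # bits 31-30
    customer = (status >> 29) & 0x1     # bit 29: set for non-Microsoft codes
    facility = (status >> 16) & 0xFFF   # bits 27-16
    code = status & 0xFFFF              # bits 15-0
    return {
        "severity": severity_names[severity],
        "customer": customer,
        "facility": facility,
        "code": code,
    }

# The IoStatus value reported by !mex.mirp for both IRPs.
fields = decode_ntstatus(0xC00000BB)
print(fields)
```

For 0xc00000bb this gives an error severity, the default facility (0) and code 0xbb, which is how the value maps to STATUS_NOT_SUPPORTED.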

Last thing – let’s check some information about this driver (like how old it is) – we can do this by running lmtm (or any other lm* command) and give it the name of the driver (somestoragedriver):

5: kd> lmtm somestoragedriver
Browse full module list
start end module name
fffff880`01549000 fffff880`01559000 somestoragedriver Mon Aug 9 21:21:24 2010 (4C604724)

Wow … this is awesome … we are in July 2017 and this disk driver is from August 2010 … niiice …
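The value in parentheses at the end of the lm line (4C604724) is the image timestamp in hexadecimal, counted in seconds since 1 January 1970 UTC. A small sketch of converting it by hand follows; the helper name and the reference date (1 July 2017, chosen only because the text says "July 2017") are assumptions for illustration:

```python
from datetime import datetime, timezone

def link_timestamp_to_utc(hex_stamp: str) -> datetime:
    """Convert the hex timestamp shown by lm into a UTC datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)

built = link_timestamp_to_utc("4C604724")

# Assumed reference point for "July 2017"; the exact day is not in the original text.
reference = datetime(2017, 7, 1, tzinfo=timezone.utc)
age_years = (reference - built).days / 365.25

print(built.isoformat(), f"about {age_years:.1f} years old")
```

The debugger prints the time in the local time zone of the debugging machine, so the hour it shows can differ from the UTC value computed here.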

 

So, what we need to do is:

  • Update the “somestoragedriver.sys” driver to the latest existing version
  • After this, monitor disk performance using Performance Monitor and check if the disk is performing well

Sure enough, after updating the “somestoragedriver.sys” driver, the issue was solved and the SCCM server no longer ate up memory until it got into a hung state.

 

WOOT! 😀

 

PS: We can also get to the file directly by running !mex.mem -p, but this command usually takes a very long time (a couple of hours sometimes, depending on how big the dump is). This can also be achieved with !kdexts.memusage, but the output will need to be looked at manually because there is no summary for the Mapped Files.

The reason it takes so long is that it has to scan the PFN (Page Frame Number) database and process all the data.

5: kd> !mex.mem -p
Loading PFN database... Building kernel map...this will take a while.

Top physical RAM users:

Type Name ControlArea/PID Valid Standby Dirty Shared Locked PageTables TOTAL
=========== ============== ================ ========= ======== ======== ======= ========= ========== =========
Mapped File CM_123.mdf FFFFFA80204FFD30 10.69 MB 15.54 GB 15.55 GB
Process sqlservr.exe 14866272 14.18 GB 79.96 MB 16 KB n/a n/a 35.34 MB 14.29 GB
Process ReportingServi FFFFFA80292A6B10 241.43 MB 2.62 MB 16 KB n/a n/a 2.28 MB 246.35 MB
System NonPaged Pool 184.33 MB n/a 184.33 MB 184.33 MB
Process RAMMap64.exe FFFFFA8020C87380 111.77 MB 4.94 MB n/a n/a 2.06 MB 118.77 MB
Process some_other_process FFFFFA8029411060 97.23 MB 1.47 MB n/a n/a 1.18 MB 99.88 MB
System Kernel Stacks 53.94 MB 44 KB n/a 992 KB 128 KB 54.11 MB
Process procexp64.exe FFFFFA80297EB060 36.13 MB n/a n/a 608 KB 36.72 MB
Process svchost.exe FFFFFA8028EF25D0 28.81 MB 1.97 MB n/a n/a 732 KB 31.49 MB
Process explorer.exe FFFFFA8029673A40 23.09 MB 196 KB n/a n/a 724 KB 23.99 MB
Process perfmon.exe FFFFFA802976A6C0 21.61 MB 160 KB n/a n/a 460 KB 22.21 MB
Mapped File sqlservr.exe FFFFFA8029838AC0 20.22 MB 20.22 MB
Process explorer.exe FFFFFA80296AAB10 19.06 MB 120 KB n/a n/a 664 KB 19.82 MB
Process lsass.exe FFFFFA801F106A70 18.47 MB 96 KB n/a n/a 352 KB 18.91 MB
Process WmiPrvSE.exe FFFFFA80294AF060 18.11 MB 4 KB n/a n/a 548 KB 18.64 MB
Process smsexec.exe FFFFFA801F5D0B10 17.64 MB 40 KB n/a n/a 724 KB 18.38 MB
Process procexp64.exe FFFFFA802A574B10 17.18 MB 40 KB n/a n/a 460 KB 17.66 MB
Mapped File shell32.dll FFFFFA801EF80680 13.5 MB 3.12 MB 16.62 MB
Process svchost.exe FFFFFA8028F0BB10 15.03 MB 28 KB n/a n/a 432 KB 15.48 MB
Process some_process FFFFFA8029018B10 14.19 MB 196 KB n/a n/a 432 KB 14.8 MB
=========== ============== ================ ========= ======== ======== ======= ========= ========== =========
Type Name ControlArea/PID Valid Standby Dirty Shared Locked PageTables TOTAL

Only the top 20 of 1612 rows are being shown. Show all

Virtual Memory Physical Memory File Cache Cache Writes

 

Here is what it would look like with !kdexts.memusage, with all other stuff snipped out:

5: kd> !kdexts.memusage
[...SNIPPED...]
Usage Summary (in Kb):
Control Valid Standby Dirty Shared Locked PageTables name
ffffffffffffd 8384 0 0 0 0 0 AWE
fffffa8019a34010 440 0 0 184 0 0 mapped_file( winhttp.dll )
fffffa8019ac18d0 16 0 0 0 0 0 mapped_file( netutils.dll )
[...SNIPPED...]
fffffa80204ffd30 10944 0 16292096 0 0 0 mapped_file( CM_123.mdf )
[...SNIPPED...]
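The two outputs can be cross-checked with a bit of arithmetic, since !kdexts.memusage reports in KB while !mex.mem -p prints MB and GB. This is a rough sketch using the numbers from the CM_123.mdf row; it reads the columns according to the memusage header line (Valid, Standby, Dirty), which is an assumption about how the flattened row lines up:

```python
KB_PER_MB = 1024
KB_PER_GB = 1024 * 1024

# Values copied from the mapped_file( CM_123.mdf ) row of !kdexts.memusage.
valid_kb = 10944
dirty_kb = 16292096

valid_mb = valid_kb / KB_PER_MB
dirty_gb = dirty_kb / KB_PER_GB
total_gb = (valid_kb + dirty_kb) / KB_PER_GB

print(f"valid ~ {valid_mb:.2f} MB, dirty ~ {dirty_gb:.2f} GB, total ~ {total_gb:.2f} GB")
```

Within rounding, these land on the 10.69 MB, 15.54 GB and 15.55 GB values that !mex.mem -p shows for the same control area (FFFFFA80204FFD30).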

 

Don’t forget to read-up on your Windows Internals 😉

 


Network Discovery not working in SCOM due to DLL corruption (debugging DLL loader component – LDR)


So, wow, this was nice! … a super one-off scenario related to SCOM. This is very interesting though from a point of view of debugging and can theoretically happen in any application, so we are going to have a look at how to debug (and automate debugging) of the DLL Loader component (LDR) from ntdll.dll.

Let’s start with the actual issue first 😀

The SCOM Network Discovery component is not working, and the error logged in the Operations Manager event log on the Management Server that is configured to run the Network Discovery rule is Error event ID 157:

Creation of module with CLSID "{6620A03E-9E85-4BEA-947C-D508CDB540FE}" failed with error "%1 is not a valid Win32 application." in rule "discovery<SOME_NUMBER>.<SOME_GUID>" running for instance "<SOME_NAME>" with id:"{<SOME_GUID>}" in management group "<SOME_MG_NAME>".

Now the error “not a valid Win32 application” means that we are trying to load an EXE or DLL file and it has one of 2 issues:

  1. we either are trying to load a 32bit binary file from a 64bit process (or the other way around – very uncommon … but not impossible)
  2. or there is something directly wrong with the binary file contents (some corruption at the file structure level / header): https://en.wikipedia.org/wiki/Portable_Executable
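Both possibilities from the list above can be checked roughly by reading the file header yourself. Here is a minimal sketch in Python, based on the documented PE layout (the "MZ" signature, the PE header offset stored at 0x3C, the "PE\0\0" signature and the Machine field right after it). The function names are made up for this example, and the check only covers the first few header fields, so a file can pass it and still be damaged elsewhere:

```python
import struct

# Machine field values from the PE/COFF specification.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}

def describe_pe(data: bytes) -> str:
    """Report the bitness of a PE image, or why its header looks corrupt."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return "corrupt: missing MZ signature"
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if pe_offset + 6 > len(data) or data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return "corrupt: missing PE signature"
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown machine type 0x{machine:04x}")

def fake_pe(machine: int) -> bytes:
    """Build a header-only buffer for demonstration (not a loadable image)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE header directly after the DOS header
    return bytes(dos) + b"PE\0\0" + struct.pack("<H", machine)

print(describe_pe(fake_pe(0x8664)))
print(describe_pe(b"not a binary"))
```

On a real system you would pass in the first few kilobytes of the file, for example describe_pe(open(path, "rb").read(4096)), and compare the result with the bitness of the process that is trying to load it.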

But what file (most probably DLL) is this?! We have the CLSID in the error which is: 6620A03E-9E85-4BEA-947C-D508CDB540FE

So how do we figure out which DLL it is? By searching in the registry (on a SCOM Management Server) by that CLSID and somewhere under HKEY_CLASSES_ROOT\CLSID\, we should find this GUID – and after a search, we do and under: HKEY_CLASSES_ROOT\CLSID\{6620A03E-9E85-4bea-947C-D508CDB540FE}\InprocServer32, we find the value called (Default) which contains the full path to the DLL: C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\NetworkDiscoveryModules.dll
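The same lookup can be scripted instead of clicking through the registry editor. This is a sketch using Python's standard winreg module, which only exists on Windows; the two helper names are assumptions for illustration, and the lookup simply returns None when the key is missing or when it is not running on Windows:

```python
try:
    import winreg  # standard library, Windows only
except ImportError:
    winreg = None

def inproc_server_key(clsid: str) -> str:
    """Return the subkey under HKEY_CLASSES_ROOT that holds the DLL path."""
    return "CLSID\\{" + clsid.strip("{}") + "}\\InprocServer32"

def find_inproc_dll(clsid: str):
    """Read the (Default) value of the InprocServer32 key, or None if unavailable."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, inproc_server_key(clsid)) as key:
            value, _value_type = winreg.QueryValueEx(key, "")  # "" = (Default)
            return value
    except OSError:
        return None

clsid = "6620A03E-9E85-4BEA-947C-D508CDB540FE"
print(inproc_server_key(clsid))
print(find_inproc_dll(clsid))
```

On a SCOM Management Server the second line should print the full path to NetworkDiscoveryModules.dll; anywhere else it prints None.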

 

Pah! Super simple, right? It looks like the NetworkDiscoveryModules.dll is somehow corrupted and we should just replace it with a working one from a SCOM Management Server with the exact same SCOM version and Update Rollup installed – right?!

Well, we did that and it did not work … so whaaat is going on here?!

 

Well, this clearly is not a “SCOM” issue, but some file/OS related issue that can happen with any software – but, because I am a curious man …

 

First things first – let’s totally separate our problem from SCOM and manually try to reproduce it by just loading the DLL directly – we don’t need to complicate things by writing native code, we can do this in PowerShell directly – like this:

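# Compile a small C# P/Invoke wrapper on the fly; the here-string is double-quoted,
# so $($env:ProgramFiles) is expanded by PowerShell before the C# code is compiled.
Add-Type @"
using System;
using System.Runtime.InteropServices;
public static class NativeMethods {
   // Importing one export is enough; the DLL is only loaded on the first call.
   [DllImport(@"$($env:ProgramFiles)\Microsoft System Center 2012 R2\Operations Manager\Server\NetworkDiscoveryModules.dll")]
   public static extern void DllCanUnloadNow();
}
"@
# This call makes the loader map the DLL (and its dependencies) into the PowerShell process.
[NativeMethods]::DllCanUnloadNow()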

Extra info for the other curious people out there about this script:

  • it will create a C# file, compiled in a temporary directory, that does the actual native (C/C++) DLL loading/import
  • after this, we will call a standard function, DllCanUnloadNow(), which COM in-process server DLLs normally export (I verified that we have it in NetworkDiscoveryModules.dll because I can look at the code :p, but you can usually rely on COM DLLs like this one having it)
  • this function, DllCanUnloadNow(), will not return anything from this PowerShell script on a computer where there is no issue with the DLL; however, the call will fail with an error if there is something wrong with it, just like the one we got when running the script on the affected server:

So the error we are getting is actually the same (only more descriptive): An attempt was made to load a program with an incorrect format.

 

Ok cool! So now we can reproduce the issue totally outside of SCOM, by just trying to load that DLL and calling a function from it (because only when we make an actual call will the DLL be truly loaded for code execution).

So, now we can do a live debug by attaching to the PowerShell.exe process which we will use to execute that script: https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/

Everything I will show next in the debugger, will be using Public Symbols, so everyone can follow the steps (don’t need any internal private symbols): https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/microsoft-public-symbols

The Windows LDR works like this:

  1. we want to load DLL whatever.dll
  2. we go verify if the binary file has the correct bitness and if it’s not corrupt in any way
  3. if all goes well, we check the import section and for each DLL which it is dependent on, we do the exact same thing one by one – steps 1 to 3 for each of them
  4. if any of the DLLs in the import section cannot be correctly loaded (not correct bitness, corrupt in some way, etc.) we will fail and throw the error describing why the DLL cannot be loaded – however, we throw the error for the initial DLL we were trying to load, we don’t specify if there actually was a failure in the loading of one of the dependent DLLs
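The four steps above can be sketched as a toy model. It is only meant to show why the error names the top-level DLL and not the broken dependency; it is not how ntdll actually implements the loader, and dep_a.dll and dep_b.dll are hypothetical names:

```python
def try_load(name, imports, broken, loaded=None):
    """Return None on success, or the name of the first DLL that fails validation."""
    if loaded is None:
        loaded = set()
    if name in loaded:          # already mapped, nothing to do
        return None
    if name in broken:          # step 2: bitness/corruption check fails
        return name
    loaded.add(name)
    for dep in imports.get(name, []):   # step 3: repeat for every import
        culprit = try_load(dep, imports, broken, loaded)
        if culprit is not None:
            return culprit
    return None

def load_library(name, imports, broken):
    """Step 4: fail for the DLL the caller asked for, hiding the real culprit."""
    if try_load(name, imports, broken) is not None:
        raise OSError(f"{name} is not a valid Win32 application.")

# Hypothetical import graph: the DLL we ask for is fine, one dependency is not.
imports = {
    "NetworkDiscoveryModules.dll": ["dep_a.dll", "dep_b.dll"],
    "dep_a.dll": ["dep_b.dll"],
}
broken = {"dep_b.dll"}

try:
    load_library("NetworkDiscoveryModules.dll", imports, broken)
except OSError as error:
    print(error)
```

The caller only ever sees the name it passed in, which is why we have to step through the loader in the debugger to find out which dependency is the broken one.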

So we need to track down which dependent DLL has the actual corruption …

Where do we break to check what is wrong? Well, we have these 3 nice function calls where we will break and check what is going on:

  1. kernelbase!LoadLibraryExW which Windows uses to load DLLs (“Ex” stands for “Extended”, meaning it accepts extra options as parameters, and “W” means the UNICODE (wide) string version is being used) => we start here so that we only set the other breakpoints once we reach the point where the DLL we are interested in troubleshooting is being loaded (in this case NetworkDiscoveryModules.dll)
  2. ntdll!LdrpFindOrMapDll (not documented on MSDN, should not directly be used) which will get called by the DLL loader component (LDR) to try to find the DLL in different places, verify binary file structures/header, load all the DLL that it is dependent on (verifying consistency of each as well) and in the end load the actual DLL we call in order to be able to execute from it => this will be important because this is where we can get the DLL file name that we are currently trying to load
  3. ntdll!ZwCreateSection which goes into Kernel and does the actual file read from disk and checks the required information (is it execution code, does it have a valid “MZ” start, is the header ok?, does it have the correct bitness?, etc.) => this is the actual function call we are interested in the end, because this is the one that will return the actual error in case that we have an issue loading the binary file to execute from it

Let’s set the first breakpoint and go in WinDbg:

0:000> bp kernelbase!LoadLibraryExW

0:000> g
Breakpoint 0 hit
kernelbase!LoadLibraryExW:
00007ffe`36b48d10 48895c2410 mov qword ptr [rsp+10h],rbx ss:0000000c`301d4b38=0000000c301d91a8

So, sure enough, we will break on the first/next execution of kernelbase!LoadLibraryExW and we can check what DLL is currently being loaded.

Just as an “FYI” we can/will also have a look at the stack because we are curious people:

0:001> k
Child-SP RetAddr Call Site
0000000c`301d4b28 00007ffe`310ae4b4 kernelbase!LoadLibraryExW
0000000c`301d4b30 00007ffe`310ae421 clr!CLRLoadLibraryExWorker+0x54
0000000c`301d4b80 00007ffe`31098aff clr!CLRLoadLibraryEx+0x51
0000000c`301d4bd0 00007ffe`31098a28 clr!LoadedImageLayout::LoadedImageLayout+0xca
0000000c`301d4e90 00007ffe`313f6a84 clr!PEImageLayout::Load+0x3d
0000000c`301d4ed0 00007ffe`313f69de clr!PEImage::GetLayoutInternal+0x361930
0000000c`301d4f20 00007ffe`3107ee8f clr!PEImage::GetLayout+0x36195e
0000000c`301d4fa0 00007ffe`3107ed53 clr!RuntimeOpenImageInternal+0xcf
0000000c`301d5080 00007ffe`3107ec7b clr!GetAssemblyMDInternalImportEx+0xcd
0000000c`301d5100 00007ffe`310d3cc9 clr!CreateMetaDataImport+0x1b
0000000c`301d5140 00007ffe`310d3c1d clr!BindResult::Init+0x69
0000000c`301d51b0 00007ffe`3118867b clr!BindResult::CreateFromPath+0xc9
0000000c`301d5220 00007ffe`31188891 clr!NATIVE_BINDER_SPACE::VerifyNativeCandidate+0x55f
0000000c`301d58b0 00007ffe`31086b40 clr!NATIVE_BINDER_SPACE::CreateVerifiedNICandidate+0x10a
0000000c`301d5bc0 00007ffe`31085f39 clr!NATIVE_BINDER_SPACE::BindToNativeDependency+0x2ac
0000000c`301d5de0 00007ffe`31085803 clr!BindToNativeAssembly+0xed
0000000c`301d5f00 00007ffe`3108799f clr!BindToNativeImage+0x1bb
0000000c`301d6230 00007ffe`31082a6b clr!CAsmDownloadMgr::RecordInfoAndProbeNativeImage+0x106
0000000c`301d62a0 00007ffe`3108303b clr!CAsmDownloadMgr::PreDownloadCheck+0x518
0000000c`301d6ae0 00007ffe`31082e45 clr!CAssemblyDownload::PreDownload+0xb7
0000000c`301d6b80 00007ffe`3108251a clr!CAssemblyName::BindToObject+0x419
0000000c`301d6cd0 00007ffe`310822c2 clr!FusionBind::RemoteLoad+0x1aa
0000000c`301d6e10 00007ffe`31082047 clr!AssemblySpec::LoadAssembly+0x19a
0000000c`301d6f30 00007ffe`31081cf5 clr!AssemblySpec::FindAssemblyFile+0x113
0000000c`301d76d0 00007ffe`3108a4ae clr!AppDomain::BindAssemblySpec+0xef7
0000000c`301d8950 00007ffe`310ab18b clr!AssemblySpec::LoadDomainAssembly+0x1ec
0000000c`301d8f00 00007ffe`3118c9e8 clr!AssemblySpec::LoadAssembly+0x1b
0000000c`301d8f40 00007ffe`2f6ebde2 clr!AssemblyNative::Load+0x304
0000000c`301d92b0 00007ffe`304c4326 mscorlib_ni!System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(System.Reflection.AssemblyName, System.Security.Policy.Evidence, System.Reflection.RuntimeAssembly, System.Threading.StackCrawlMark ByRef, IntPtr, Boolean, Boolean, Boolean)+0xd2
0000000c`301d9360 00007ffe`31209f51 mscorlib_ni!System.Reflection.RuntimeAssembly.LoadWithPartialNameInternal(System.Reflection.AssemblyName, System.Security.Policy.Evidence, System.Threading.StackCrawlMark ByRef)+0xd66ad6
0000000c`301d93e0 00007ffe`31209e4f clr!ExceptionTracker::CallHandler+0xc5
0000000c`301d9480 00007ffe`312081c7 clr!ExceptionTracker::CallCatchHandler+0x7f
0000000c`301d9510 00007ffe`396c347d clr!ProcessCLRException+0x2e6
0000000c`301d95f0 00007ffe`39685405 ntdll!RtlpExecuteHandlerForUnwind+0xd
0000000c`301d9620 00007ffe`31208260 ntdll!RtlUnwindEx+0x385
0000000c`301d9d00 00007ffe`3120821c clr!ClrUnwindEx+0x40
0000000c`301da220 00007ffe`396c33fd clr!ProcessCLRException+0x2b2
0000000c`301da300 00007ffe`39684847 ntdll!RtlpExecuteHandlerForException+0xd
0000000c`301da330 00007ffe`39683a6d ntdll!RtlDispatchException+0x197
0000000c`301daa00 00007ffe`36b495fc ntdll!RtlRaiseException+0x18d
0000000c`301db1c0 00007ffe`31209864 kernelbase!RaiseException+0x68
0000000c`301db2a0 00007ffe`314bb7ef clr!RaiseTheExceptionInternalOnly+0x2fe
0000000c`301db3a0 00007ffe`3118cb37 clr!UnwindAndContinueRethrowHelperAfterCatch+0x80
0000000c`301db3f0 00007ffe`2f75d99e clr!AssemblyNative::Load+0x453
0000000c`301db760 00007ffe`2ff13666 mscorlib_ni!System.Reflection.RuntimeAssembly.LoadWithPartialNameInternal(System.Reflection.AssemblyName, System.Security.Policy.Evidence, System.Threading.StackCrawlMark ByRef)+0x14e
0000000c`301db890 00007ffe`1933970f mscorlib_ni!System.Reflection.Assembly.LoadWithPartialName(System.String)+0x46
0000000c`301db8e0 00007ffe`193394e1 system_management_automation_ni!System.Management.Automation.ExecutionContext.LoadAssembly(System.String, System.String, System.Exception ByRef)+0x15f
0000000c`301db9a0 00007ffe`193211f2 system_management_automation_ni!System.Management.Automation.ExecutionContext.AddAssembly(System.String, System.String, System.Exception ByRef)+0x21
0000000c`301db9f0 00007ffe`19316887 system_management_automation_ni!Microsoft.PowerShell.Commands.ModuleCmdletBase.LoadBinaryModule(System.Management.Automation.PSModuleInfo, Boolean, System.String, System.String, System.Reflection.Assembly, System.String, System.Management.Automation.SessionState, ImportModuleOptions, ManifestProcessingFlags, System.String, Boolean, Boolean, Boolean ByRef, System.String, Boolean)+0x452
0000000c`301dbbf0 00007ffe`1931b82b system_management_automation_ni!Microsoft.PowerShell.Commands.ModuleCmdletBase.LoadModuleNamedInManifest(System.Management.Automation.PSModuleInfo, Microsoft.PowerShell.Commands.ModuleSpecification, System.String, Boolean, System.String, System.Management.Automation.SessionState, ImportModuleOptions, ManifestProcessingFlags, Boolean, Boolean, System.Object, Boolean ByRef, System.String)+0x977
0000000c`301dbe20 00007ffe`19316ab5 system_management_automation_ni!Microsoft.PowerShell.Commands.ModuleCmdletBase.LoadModuleManifest(System.String, System.Management.Automation.ExternalScriptInfo, System.Collections.Hashtable, System.Collections.Hashtable, ManifestProcessingFlags, System.Version, System.Version, ImportModuleOptions ByRef, Boolean ByRef)+0x4d0b
0000000c`301dc9d0 00007ffe`1931fa1b system_management_automation_ni!Microsoft.PowerShell.Commands.ModuleCmdletBase.LoadModuleManifest(System.Management.Automation.ExternalScriptInfo, ManifestProcessingFlags, System.Version, System.Version, ImportModuleOptions ByRef)+0xd5
0000000c`301dca70 00007ffe`1932386c system_management_automation_ni!Microsoft.PowerShell.Commands.ModuleCmdletBase.LoadModule(System.Management.Automation.PSModuleInfo, System.String, System.String, System.String, System.Management.Automation.SessionState, System.Object, ImportModuleOptions ByRef, ManifestProcessingFlags, Boolean ByRef, Boolean ByRef)+0xc4b
0000000c`301dcda0 00007ffe`193241d8 system_management_automation_ni!Microsoft.PowerShell.Commands.ImportModuleCommand.ImportModule_LocallyViaName(ImportModuleOptions, System.String)+0x4ac
0000000c`301dcef0 00007ffe`193dbb0e system_management_automation_ni!Microsoft.PowerShell.Commands.ImportModuleCommand.ProcessRecord()+0x198
0000000c`301dcfd0 00007ffe`19382b55 system_management_automation_ni!System.Management.Automation.CommandProcessor.ProcessRecord()+0x29e
0000000c`301dd080 00007ffe`1937953e system_management_automation_ni!System.Management.Automation.CommandProcessorBase.DoExecute()+0xe5
0000000c`301dd0f0 00007ffe`193666d0 system_management_automation_ni!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(System.Object, System.Collections.Hashtable, Boolean)+0x17e
0000000c`301dd1e0 00007ffe`1936581f system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.InvokeHelper()+0x750
0000000c`301dd400 00007ffe`19365241 system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.InvokeThreadProc()+0x2af
0000000c`301dd550 00007ffe`193348d5 system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.StartPipelineExecution()+0x321
0000000c`301dd5d0 00007ffe`19363fe4 system_management_automation_ni!System.Management.Automation.Runspaces.PipelineBase.CoreInvoke(System.Collections.IEnumerable, Boolean)+0x305
0000000c`301dd650 00007ffe`193df48e system_management_automation_ni!System.Management.Automation.Runspaces.PipelineBase.Invoke(System.Collections.IEnumerable)+0x24
0000000c`301dd690 00007ffe`193deff0 system_management_automation_ni!System.Management.Automation.PowerShell+Worker.ConstructPipelineAndDoWork(System.Management.Automation.Runspaces.Runspace, Boolean)+0x2de
0000000c`301dd770 00007ffe`193d9fbe system_management_automation_ni!System.Management.Automation.PowerShell.CoreInvokeHelper[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.Management.Automation.PSDataCollection`1<System.__Canon>, System.Management.Automation.PSDataCollection`1<System.__Canon>, System.Management.Automation.PSInvocationSettings)+0x350
0000000c`301dd820 00007ffe`193d9aca system_management_automation_ni!System.Management.Automation.PowerShell.CoreInvoke[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.Management.Automation.PSDataCollection`1<System.__Canon>, System.Management.Automation.PSDataCollection`1<System.__Canon>, System.Management.Automation.PSInvocationSettings)+0x4de
0000000c`301dd9b0 00007ffe`193efca0 system_management_automation_ni!System.Management.Automation.PowerShell.CoreInvoke[[System.__Canon, mscorlib]](System.Collections.IEnumerable, System.Management.Automation.PSDataCollection`1<System.__Canon>, System.Management.Automation.PSInvocationSettings)+0x1ba
0000000c`301dda60 00007ffe`193ed6c2 system_management_automation_ni!System.Management.Automation.CommandDiscovery.AutoloadSpecifiedModule(System.String, System.Management.Automation.ExecutionContext, System.Management.Automation.SessionStateEntryVisibility, System.Exception ByRef)+0x2c0
0000000c`301ddb00 00007ffe`19376326 system_management_automation_ni!System.Management.Automation.CommandDiscovery.TryModuleAutoDiscovery(System.String, System.Management.Automation.ExecutionContext, System.String, System.Management.Automation.CommandOrigin, System.Management.Automation.SearchResolutionOptions, System.Management.Automation.CommandTypes, System.Exception ByRef)+0x312
0000000c`301ddc10 00007ffe`19375ed1 system_management_automation_ni!System.Management.Automation.CommandDiscovery.LookupCommandInfo(System.String, System.Management.Automation.CommandTypes, System.Management.Automation.SearchResolutionOptions, System.Management.Automation.CommandOrigin, System.Management.Automation.ExecutionContext)+0x3d6
0000000c`301ddcf0 00007ffe`193dd8ce system_management_automation_ni!System.Management.Automation.CommandDiscovery.LookupCommandProcessor(System.String, System.Management.Automation.CommandOrigin, System.Nullable`1<Boolean>)+0x31
0000000c`301ddd50 00007ffe`193dcfd1 system_management_automation_ni!System.Management.Automation.ExecutionContext.CreateCommand(System.String, Boolean)+0x6e
0000000c`301dddb0 00007ffe`19335255 system_management_automation_ni!System.Management.Automation.PipelineOps.AddCommand(System.Management.Automation.Internal.PipelineProcessor, System.Management.Automation.CommandParameterInternal[], System.Management.Automation.Language.CommandBaseAst, System.Management.Automation.CommandRedirection[], System.Management.Automation.ExecutionContext)+0x3f1
0000000c`301ddef0 00007ffe`1a0a036e system_management_automation_ni!System.Management.Automation.PipelineOps.InvokePipeline(System.Object, Boolean, System.Management.Automation.CommandParameterInternal[][], System.Management.Automation.Language.CommandBaseAst[], System.Management.Automation.CommandRedirection[][], System.Management.Automation.Language.FunctionContext)+0x135
0000000c`301ddfb0 00007ffe`1938591b system_management_automation_ni!System.Management.Automation.Interpreter.ActionCallInstruction`6[[System.__Canon, mscorlib],[System.Boolean, mscorlib],[System.__Canon, mscorlib],[System.__Canon, mscorlib],[System.__Canon, mscorlib],[System.__Canon, mscorlib]].Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x19e
0000000c`301de050 00007ffe`1938591b system_management_automation_ni!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x16b
0000000c`301de160 00007ffe`1938575b system_management_automation_ni!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x16b
0000000c`301de270 00007ffe`193752af system_management_automation_ni!System.Management.Automation.Interpreter.Interpreter.Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x7b
0000000c`301de2f0 00007ffe`19383925 system_management_automation_ni!System.Management.Automation.Interpreter.LightLambda.RunVoid1[[System.__Canon, mscorlib]](System.__Canon)+0x11f
0000000c`301de380 00007ffe`193832f8 system_management_automation_ni!System.Management.Automation.DlrScriptCommandProcessor.RunClause(System.Action`1<System.Management.Automation.Language.FunctionContext>, System.Object, System.Object)+0x405
0000000c`301de510 00007ffe`19382fce system_management_automation_ni!System.Management.Automation.DlrScriptCommandProcessor.Complete()+0x1a8
0000000c`301de5c0 00007ffe`19382d01 system_management_automation_ni!System.Management.Automation.CommandProcessorBase.DoComplete()+0x16e
0000000c`301de680 00007ffe`19379551 system_management_automation_ni!System.Management.Automation.Internal.PipelineProcessor.DoCompleteCore(System.Management.Automation.CommandProcessorBase)+0xb1
0000000c`301de720 00007ffe`193666d0 system_management_automation_ni!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(System.Object, System.Collections.Hashtable, Boolean)+0x191
0000000c`301de810 00007ffe`1936581f system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.InvokeHelper()+0x750
0000000c`301dea30 00007ffe`192e7001 system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.InvokeThreadProc()+0x2af
0000000c`301deb80 00007ffe`2f6d2d45 system_management_automation_ni!System.Management.Automation.Runspaces.PipelineThread.WorkerProc()+0x31
0000000c`301debb0 00007ffe`2f6d2ab9 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x285
0000000c`301ded10 00007ffe`2f6d2a97 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x9
0000000c`301ded40 00007ffe`2f6ea161 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+0x57
0000000c`301ded90 00007ffe`3103afb3 mscorlib_ni!System.Threading.ThreadHelper.ThreadStart()+0x51
0000000c`301dede0 00007ffe`3103ae9e clr!CallDescrWorkerInternal+0x83
0000000c`301dee20 00007ffe`3103b632 clr!CallDescrWorkerWithHandler+0x4a
0000000c`301dee60 00007ffe`31146bd9 clr!MethodDescCallSite::CallTargetWorker+0x251
0000000c`301df010 00007ffe`3103c8e1 clr!ThreadNative::KickOffThread_Worker+0x105
0000000c`301df270 00007ffe`3103c868 clr!ManagedThreadBase_DispatchInner+0x2d
0000000c`301df2b0 00007ffe`3103c7d9 clr!ManagedThreadBase_DispatchMiddle+0x6c
0000000c`301df3b0 00007ffe`3103c91b clr!ManagedThreadBase_DispatchOuter+0x75
0000000c`301df440 00007ffe`31146aba clr!ManagedThreadBase_FullTransitionWithAD+0x2f
0000000c`301df4a0 00007ffe`31153656 clr!ThreadNative::KickOffThread+0xd2
0000000c`301df570 00007ffe`38f113d2 clr!Thread::intermediateThreadProc+0x7d
0000000c`301df7b0 00007ffe`396454e4 kernel32!BaseThreadInitThunk+0x22
0000000c`301df7e0 00000000`00000000 ntdll!RtlUserThreadStart+0x34

 

Ok, cool, so now how do we figure out which is the DLL that is currently being loaded? Well, from MSDN, we know that the first parameter passed to the kernelbase!LoadLibraryExW function is the name of the DLL (lpFileName).

Hmm … first parameter … so where is that? Well, we know that in the x64 calling convention (fastcall), the first integer or pointer parameter is passed in the RCX register: https://msdn.microsoft.com/en-us/library/ms235286.aspx

So good! Let’s have a look at that – as it should be a UNICODE string, we can look at it like this:

0:001> du @rcx
0000000c`2f4dd240 "C:\Windows\assembly\NativeImages"
0000000c`2f4dd280 "_v4.0.30319_64\Microsoft.P521220"
0000000c`2f4dd2c0 "ea#\ab32ec62ecc4b90efc07602a8353"
0000000c`2f4dd300 "8c65\Microsoft.PowerShell.Comman"
0000000c`2f4dd340 "ds.Utility.ni.dll"

Pff … well this is lame, not the correct DLL … so we go again further until the next breakpoint with the “g” command and verify again and go again and verify and so on, until we finally reach the DLL we are interested in (in this case NetworkDiscoveryModules.dll):

0:001> g
Breakpoint 0 hit
kernelbase!LoadLibraryExW:
00007ffe`36b48d10 48895c2410 mov qword ptr [rsp+10h],rbx ss:0000000c`301dcbd8=0000000c301dddf8
0:001> du @rcx
0000000c`301dd078 "C:\Program Files\Microsoft Syste"
0000000c`301dd0b8 "m Center 2012 R2\Operations Mana"
0000000c`301dd0f8 "ger\Server\NetworkDiscoveryModul"
0000000c`301dd138 "es.dll"
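Since “du” wraps long strings over several lines, it can be handy to glue the pieces back together when saving such output for later. A small sketch, assuming the output has been copied into a string (joinDuOutput is a hypothetical helper, not a debugger command):

```javascript
"use strict";

// Joins the quoted fragments of WinDbg "du" output into one string.
// Lines that do not look like: <address> "<text>" are ignored.
function joinDuOutput(text) {
   return text
      .split(/\r?\n/)
      .map((line) => {
         const match = /^\s*[0-9a-f`]+\s+"(.*)"\s*$/i.exec(line);
         return match ? match[1] : "";
      })
      .join("");
}

const sample = [
   '0000000c`301dd078 "C:\\Program Files\\Microsoft Syste"',
   '0000000c`301dd0b8 "m Center 2012 R2\\Operations Mana"',
   '0000000c`301dd0f8 "ger\\Server\\NetworkDiscoveryModul"',
   '0000000c`301dd138 "es.dll"',
].join("\n");

console.log(joinDuOutput(sample));
```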

Perfect! Now we can put breakpoints on the functions that get called next, so that we can look at each (dependency) DLL in turn and figure out which one is corrupt, and then go (continue execution):

0:001> bp ntdll!LdrpFindOrMapDll
0:001> bp ntdll!ZwCreateSection
0:001> g
Breakpoint 2 hit
ntdll!LdrpFindOrMapDll:
00007ffe`396a61c0 4053            push    rbx

So, we break into the next call of ntdll!LdrpFindOrMapDll. Aaaaaand how exactly do we know which dependent DLL is being currently loaded? The function is not documented on MSDN … ok, so a little bit of internals – this function gets the DLL name as a pointer to a string in the format of the ntdll!_UNICODE_STRING structure – which is actually documented on MSDN and which we can see using Public Symbols.
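To make the structure less abstract, here is a minimal sketch that decodes the 16 bytes of an x64 _UNICODE_STRING by hand and then reads the text using its Length field. The sample bytes and the Buffer address are made up for illustration, and the function names are hypothetical:

```javascript
"use strict";

// x64 layout of _UNICODE_STRING (16 bytes):
//   +0x00 USHORT Length         size of the text in bytes, terminator not counted
//   +0x02 USHORT MaximumLength  size of the allocated buffer in bytes
//   +0x08 PWSTR  Buffer         pointer to UTF-16LE text (after 4 bytes of padding)
function parseUnicodeString(bytes) {
   const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
   return {
      length: view.getUint16(0, true),
      maximumLength: view.getUint16(2, true),
      buffer: view.getBigUint64(8, true),
   };
}

// Decodes exactly lengthInBytes bytes, because the text is not guaranteed
// to be null-terminated.
function decodeText(bytes, lengthInBytes) {
   return new TextDecoder("utf-16le").decode(bytes.subarray(0, lengthInBytes));
}

// Made-up sample: a header describing "dmboot.dll" at a made-up address.
const sampleName = "dmboot.dll";
const header = new Uint8Array(16);
const headerView = new DataView(header.buffer);
headerView.setUint16(0, sampleName.length * 2, true);
headerView.setUint16(2, sampleName.length * 2 + 2, true);
headerView.setBigUint64(8, 0x1000n, true);

// The text itself, followed by characters that do not belong to the string.
const textBytes = Buffer.from(sampleName + "??", "utf16le");

const parsed = parseUnicodeString(header);
console.log(parsed.length, parsed.maximumLength); // 20 22
console.log(decodeText(textBytes, parsed.length)); // dmboot.dll
```

The point of the sketch is that Length counts bytes, not characters, which matters whenever the buffer is read manually instead of through “dx”.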

First parameter? Ok, so again, still in RCX register – let’s dump it out using “dx” (Debugger Data Model functionality – which can be extended using JavaScript as well – but a little later on extending it):

0:001> dx ((ntdll!_UNICODE_STRING *)@rcx)
((ntdll!_UNICODE_STRING *)0x00000000c301dc948) : 0xc301dc948 : "C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\NetworkDiscoveryModules.dll" [Type: _UNICODE_STRING *]
[<Raw View>] [Type: _UNICODE_STRING]

Ha, ok, now we’re talking: we are going through the verification/load of our DLL of interest directly in ntdll!LdrpFindOrMapDll. Cool – so let’s continue execution and we should break in ntdll!NtCreateSection next (in user mode this is the same address as ntdll!ZwCreateSection, where we set the breakpoint), which we can use to verify the DLL consistency check status and see if there is any error (corruption in this case) – let’s also have a look at the stack after we break:

0:001> g
Breakpoint 1 hit
ntdll!NtCreateSection:
00007ffe`396c0b50 4c8bd1 mov r10,rcx

0:001> k
Child-SP RetAddr Call Site
0000000c`301dc538 00007ffe`396a6921 ntdll!NtCreateSection
0000000c`301dc540 00007ffe`3965b2b5 ntdll!LdrpFindOrMapDll+0x761
0000000c`301dc8b0 00007ffe`39698d69 ntdll!LdrpLoadDll+0x295
0000000c`301dcae0 00007ffe`36b48dda ntdll!LdrLoadDll+0x99
0000000c`301dcb60 00007ffe`310ae4b4 kernelbase!LoadLibraryExW+0xca
0000000c`301dcbd0 00007ffe`310ae421 clr!CLRLoadLibraryExWorker+0x54
0000000c`301dcc20 00007ffe`31197dfd clr!CLRLoadLibraryEx+0x51
0000000c`301dcc70 00007ffe`3140bf1e clr!LocalLoadLibraryHelper+0x31
0000000c`301dcca0 00007ffe`310af2a4 clr!NDirect::LoadLibraryModule+0x35c7de
0000000c`301dd4f0 00007ffe`310af50a clr!NDirect::NDirectLink+0x80
0000000c`301dd820 00007ffe`310af469 clr!NDirect::GetStubForILStub+0x4a
0000000c`301dd870 00007ffe`310af54b clr!GetStubForInteropMethod+0x65
0000000c`301dd8b0 00007ffe`310a9696 clr!MethodDesc::DoPrestub+0xca7
0000000c`301dda80 00007ffe`3103226a clr!PreStubWorker+0x3d6
0000000c`301ddd90 00007ffd`d1a40a1c clr!ThePreStub+0x5a
0000000c`301dde60 00007ffe`2cb64c1c DynamicClass.CallSite.Target(System.Runtime.CompilerServices.Closure, System.Runtime.CompilerServices.CallSite, System.Type)+0xfc
0000000c`301ddee0 00007ffe`1938b088 system_core_ni!System.Dynamic.UpdateDelegates.UpdateAndExecute1[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.Runtime.CompilerServices.CallSite, System.__Canon)+0x39c
0000000c`301ddff0 00007ffe`1938591b system_management_automation_ni!System.Management.Automation.Interpreter.DynamicInstruction`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x58
0000000c`301de050 00007ffe`1938591b system_management_automation_ni!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x16b
0000000c`301de160 00007ffe`1938575b system_management_automation_ni!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x16b
0000000c`301de270 00007ffe`193752af system_management_automation_ni!System.Management.Automation.Interpreter.Interpreter.Run(System.Management.Automation.Interpreter.InterpretedFrame)+0x7b
0000000c`301de2f0 00007ffe`19383925 system_management_automation_ni!System.Management.Automation.Interpreter.LightLambda.RunVoid1[[System.__Canon, mscorlib]](System.__Canon)+0x11f
0000000c`301de380 00007ffe`193832f8 system_management_automation_ni!System.Management.Automation.DlrScriptCommandProcessor.RunClause(System.Action`1<System.Management.Automation.Language.FunctionContext>, System.Object, System.Object)+0x405
0000000c`301de510 00007ffe`19382fce system_management_automation_ni!System.Management.Automation.DlrScriptCommandProcessor.Complete()+0x1a8
0000000c`301de5c0 00007ffe`19382d01 system_management_automation_ni!System.Management.Automation.CommandProcessorBase.DoComplete()+0x16e
0000000c`301de680 00007ffe`19379551 system_management_automation_ni!System.Management.Automation.Internal.PipelineProcessor.DoCompleteCore(System.Management.Automation.CommandProcessorBase)+0xb1
0000000c`301de720 00007ffe`193666d0 system_management_automation_ni!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(System.Object, System.Collections.Hashtable, Boolean)+0x191
0000000c`301de810 00007ffe`1936581f system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.InvokeHelper()+0x750
0000000c`301dea30 00007ffe`192e7001 system_management_automation_ni!System.Management.Automation.Runspaces.LocalPipeline.InvokeThreadProc()+0x2af
0000000c`301deb80 00007ffe`2f6d2d45 system_management_automation_ni!System.Management.Automation.Runspaces.PipelineThread.WorkerProc()+0x31
0000000c`301debb0 00007ffe`2f6d2ab9 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x285
0000000c`301ded10 00007ffe`2f6d2a97 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x9
0000000c`301ded40 00007ffe`2f6ea161 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+0x57
0000000c`301ded90 00007ffe`3103afb3 mscorlib_ni!System.Threading.ThreadHelper.ThreadStart()+0x51
0000000c`301dede0 00007ffe`3103ae9e clr!CallDescrWorkerInternal+0x83
0000000c`301dee20 00007ffe`3103b632 clr!CallDescrWorkerWithHandler+0x4a
0000000c`301dee60 00007ffe`31146bd9 clr!MethodDescCallSite::CallTargetWorker+0x251
0000000c`301df010 00007ffe`3103c8e1 clr!ThreadNative::KickOffThread_Worker+0x105
0000000c`301df270 00007ffe`3103c868 clr!ManagedThreadBase_DispatchInner+0x2d
0000000c`301df2b0 00007ffe`3103c7d9 clr!ManagedThreadBase_DispatchMiddle+0x6c
0000000c`301df3b0 00007ffe`3103c91b clr!ManagedThreadBase_DispatchOuter+0x75
0000000c`301df440 00007ffe`31146aba clr!ManagedThreadBase_FullTransitionWithAD+0x2f
0000000c`301df4a0 00007ffe`31153656 clr!ThreadNative::KickOffThread+0xd2
0000000c`301df570 00007ffe`38f113d2 clr!Thread::intermediateThreadProc+0x7d
0000000c`301df7b0 00007ffe`396454e4 kernel32!BaseThreadInitThunk+0x22
0000000c`301df7e0 00000000`00000000 ntdll!RtlUserThreadStart+0x34

Hmm, ok good! But what now? How do we figure out what the ntdll!NtCreateSection function will return? Because it will actually return the status (which can be an error value). We know that by convention, compilers use the RAX register to hold the return value when a function finishes executing (unless specifically coded otherwise in assembly).

Let’s go at the end of the function execution of ntdll!NtCreateSection using “pt” and have a look at the value of the RAX register:

0:001> pt
ntdll!NtCreateSection+0xa:
00007ffe`396c0b5a c3 ret

0:001> r @rax
rax=0000000000000000

Ok, so RAX is 0 and 0 is good! 0 means success: STATUS_SUCCESS as an NTSTATUS, which is the same value as the Win32 ERROR_SUCCESS. Don’t get confused by the name – it actually means that the operation completed successfully: https://msdn.microsoft.com/en-us/library/windows/desktop/ms681382(v=vs.85).aspx

So, we don’t throw an error and we validate this DLL (currently NetworkDiscoveryModules.dll) successfully. Ok, so next, we should go through each dependent DLL and try to validate/load each one of them. So, let’s step through each one of these and check the status (return value in RAX) of ntdll!NtCreateSection each time.

So after checking each one (ntdll!LdrpFindOrMapDll for the DLL name & ntdll!NtCreateSection for the return value) we finally reach the DLL that is throwing the error (binary file validation error):

0:001> g
Breakpoint 2 hit
ntdll!LdrpFindOrMapDll:
00007ffe`396a61c0 4053 push rbx

0:001> dx ((ntdll!_UNICODE_STRING *)@rcx)
((ntdll!_UNICODE_STRING *)@rcx) : 0xc301dc4b0 : "dmboot.dll" [Type: _UNICODE_STRING *]
[<Raw View>] [Type: _UNICODE_STRING]

0:001> g
Breakpoint 1 hit
ntdll!NtCreateSection:
00007ffe`396c0b50 4c8bd1 mov r10,rcx

0:001> pt
ntdll!NtCreateSection+0xa:
00007ffe`396c0b5a c3 ret

0:001> r @rax
rax=00000000c000012f

Ha, nice! So the return status of ntdll!NtCreateSection for the dmboot.dll DLL is not 0 (success) anymore – it is: 00000000c000012f

Guess what? We also have a command from a built-in extension (!error) for WinDbg which we can use to “resolve” the actual error name/description from the NTSTATUS (00000000c000012f) value:

0:001> !error 00000000c000012f
Error code: (NTSTATUS) 0xc000012f (3221225775) - The specified image file did not have the correct format, it did not have an initial MZ.
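The number itself already says a lot, because an NTSTATUS packs a severity, a facility and a code into 32 bits. A minimal sketch of that layout (decodeNtStatus is a hypothetical helper name; the field positions follow the documented NTSTATUS format):

```javascript
"use strict";

const SEVERITY_NAMES = ["success", "informational", "warning", "error"];

// Splits an NTSTATUS into its documented bit fields:
// bits 31-30 severity, bit 29 customer flag, bits 27-16 facility, bits 15-0 code.
function decodeNtStatus(value) {
   const status = value >>> 0; // keep the low 32 bits, unsigned
   return {
      hex: "0x" + status.toString(16).padStart(8, "0"),
      severity: SEVERITY_NAMES[status >>> 30],
      customer: ((status >>> 29) & 1) === 1,
      facility: (status >>> 16) & 0xfff,
      code: status & 0xffff,
      success: (status | 0) >= 0, // same test as the NT_SUCCESS macro
   };
}

console.log(decodeNtStatus(0xc000012f));
console.log(decodeNtStatus(0).success); // true
```

For 0xc000012f the two top bits are both set, so this is an error severity, which matches what !error reports.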

 

Well … what do you know? It looks like the dmboot.dll DLL is the one that is actually corrupt and it is being loaded because it is a dependency DLL of our NetworkDiscoveryModules.dll DLL.

We got the dmboot.dll DLL from a healthy SCOM Management Server with the same SCOM version and Update Rollup and replaced this corrupt one – aaand … there you go, issue solved! 😀
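Since the error complains about a missing initial MZ, a quick sanity check on a suspect file (or on the replacement copy) is to look at its first two bytes. A minimal Node.js sketch, with hasMzSignature as a hypothetical helper; passing this check only means the DOS signature is present, not that the rest of the image is intact:

```javascript
"use strict";

const fs = require("fs");

// Returns true when the file starts with the two-byte DOS signature "MZ"
// (0x4D 0x5A), which every PE image (EXE/DLL) is expected to begin with.
function hasMzSignature(filePath) {
   const fd = fs.openSync(filePath, "r");
   try {
      const signature = Buffer.alloc(2);
      const bytesRead = fs.readSync(fd, signature, 0, 2, 0);
      return bytesRead === 2 && signature[0] === 0x4d && signature[1] === 0x5a;
   } finally {
      fs.closeSync(fd);
   }
}

// Check every path given on the command line, if any.
for (const filePath of process.argv.slice(2)) {
   console.log(filePath, hasMzSignature(filePath) ? "starts with MZ" : "no MZ signature");
}
```

Running it with the path of each dependency DLL as arguments gives a rough first pass without attaching a debugger.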

 

 

How did this file (DLL) get corrupt in the first place?! Well now, that is the real question! Unfortunately, without any type of “file-change-auditing” in place, it cannot really be explained after the fact, because there are too many possible causes:

  • some filter driver (like anti-virus software or anything else like “my-super-cool-awesome-disk-speed-up.sys” and so on …)
  • computer power-failure in the middle of a specific type of disk IO (direct access) – very unlikely though …
  • network (possibly filter) driver while copying a file from some other computer (or some network device in between that somehow ends up modifying bits)
  • super cool awesome storage controller that directly manipulates (RAM) memory through the mother board (not through OS Kernel)
  • some totally other random weird issue like a CPU bit-flip while the file was being written to disk ?! …

 

Yes – this gets much easier if Private Symbols are available 😉 But, as this should (I hope) ideally help anyone trying to debug such a scenario, I have documented everything using Public Symbols 😀

But hey … why not automate this process by creating a cool JavaScript extension for WinDbg and just let the computer do the work for me next time this ever happens with … whatever … other software? 😀

Here is the documentation about creating extensions in JavaScript for the Data Model: https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/javascript-debugger-scripting & also check out this Channel9 video from the Defrag Tools series about this, with cool info from Andrew Richards, Andy Luhrs and Bill Messmer: https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-170-Debugger-JavaScript-Scripting

This could have easily been just a series of functions in the script, but I complicated it a little bit because I wanted to show it as an extension to the process object – I might extend this in the future and create a more complex extension around the LDR component. I highly doubt that it will be public though :p because without private symbols, I have to rely on calling conventions (and read values directly from registers when possible – and it’s not always possible) as I cannot resolve any locals or parameters (by calling DX’s getModuleSymbol), which would over-complicate things.

"use strict";

class __DllLoadStatus
{
   get Ldr()
   {
      return new __DllLoadFailure(this);
   }
}

class __DllLoadFailure
{
   constructor(process)
   {
      this.__process = process;
   }

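   Initialize()
   {
      var ctl = host.namespace.Debugger.Utility.Control;

      // Run to the first call of ntdll!ZwCreateSection (one-shot breakpoint),
      // then step to its "ret" so that RIP points at the place where RAX is final.
      ctl.ExecuteCommand("bp /1 ntdll!ZwCreateSection;g");
      ctl.ExecuteCommand("pt");

      // Log the DLL name on every ntdll!LdrpFindOrMapDll call and the status on
      // every return from ntdll!ZwCreateSection, then continue ("gc").
      var address = host.currentThread.Registers.User.rip;
      ctl.ExecuteCommand("bp ntdll!LdrpFindOrMapDll \"dx @$curprocess.Ldr.GetDllName();gc\"");
      ctl.ExecuteCommand("bp " + address.toString(16) + " \"dx @$curprocess.Ldr.GetLoadStatus();gc\"");

      host.diagnostics.debugLog("Ok, we are all set! Run \"g\" to start automated debugging.\n");
   }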

   GetDllName()
   {
      // RCX carries the first parameter: a pointer to the _UNICODE_STRING with the DLL name.
      var address = host.currentThread.Registers.User.rcx;

      var pDllName = host.createPointerObject(address, "ntdll.dll", "_UNICODE_STRING *");
      var dllName = host.memory.readWideString(pDllName.dereference().Buffer).toString();

      host.diagnostics.debugLog(dllName, "\n");
   }

   GetLoadStatus()
   {
      var ctl = host.namespace.Debugger.Utility.Control;
      var status = ctl.ExecuteCommand("!error @rax").First().toString();

      host.diagnostics.debugLog(status, "\n");
   }
}

function initializeScript()
{
   return [
      new host.namedModelParent(__DllLoadStatus, "Debugger.Models.Process")
   ];
}

Alright, so now we have this script saved somewhere (e.g. C:\DbgScripts\LdrExtensionTest.js). So let’s load the JsProvider (!load jsprovider) and afterwards load the script using the .scriptload command, with the full path of the script as parameter:

0:004> !load jsprovider
0:004> .scriptload C:\DbgScripts\LdrExtensionTest.js
JavaScript script successfully loaded from 'C:\DbgScripts\LdrExtensionTest.js'

So, let’s have a go at our script and see the output of the automated analysis by running “dx @$curprocess.Ldr.Initialize()” and then letting the process execute with the “g” command:

0:000> dx @$curprocess.Ldr.Initialize()
Ok, we are all set! Run "g" to start automated debugging.
@$curprocess.Ldr.Initialize()
 
0:001> g
C:\Windows\system32\rsaenh.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\NetworkDiscoveryModules.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
MSVCR100.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
KERNEL32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
USER32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
ole32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
OLEAUT32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
ADVAPI32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
sm-server.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
sm-discovery.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
sm-auto-discovery.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
MSVCP100.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
ntdll.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
RPCRT4.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
SHLWAPI.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
HealthServiceRuntime.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
KERNEL32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
MSVCP100.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
MSVCR100.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
KERNEL32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
ADVAPI32.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
sm-clsapi.dll
@$curprocess.Ldr.GetDllName()
Error code: (Win32) 0 (0) - The operation completed successfully.
@$curprocess.Ldr.GetLoadStatus()
dmboot.dll
@$curprocess.Ldr.GetDllName()
Error code: (NTSTATUS) 0xc000012f (3221225775) - The specified image file did not have the correct format, it did not have an initial MZ.
@$curprocess.Ldr.GetLoadStatus()
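With a long dependency list, scrolling for the one bad status gets tedious. A minimal sketch that pairs each DLL name with the status line after it and keeps only the failures, assuming the debugger output was saved as text in exactly the format above (parseLoadLog and findFailures are hypothetical names):

```javascript
"use strict";

// Pairs every DLL name with the "Error code:" line that follows it.
function parseLoadLog(text) {
   const results = [];
   let pendingName = null;
   for (const rawLine of text.split(/\r?\n/)) {
      const line = rawLine.trim();
      // Skip blanks, the echoed dx expressions and debugger prompts ("0:001> g").
      if (line === "" || line.startsWith("@$curprocess") || /^\d+:\d+>/.test(line)) {
         continue;
      }
      if (line.startsWith("Error code:")) {
         if (pendingName !== null) {
            results.push({ dll: pendingName, status: line });
            pendingName = null;
         }
      } else {
         pendingName = line;
      }
   }
   return results;
}

// Keeps the entries whose status is anything other than code 0.
function findFailures(results) {
   return results.filter((entry) => !/^Error code: \(\w+\) 0 \(0\)/.test(entry.status));
}

const sampleLog = [
   "0:001> g",
   "sm-clsapi.dll",
   "@$curprocess.Ldr.GetDllName()",
   "Error code: (Win32) 0 (0) - The operation completed successfully.",
   "@$curprocess.Ldr.GetLoadStatus()",
   "dmboot.dll",
   "@$curprocess.Ldr.GetDllName()",
   "Error code: (NTSTATUS) 0xc000012f (3221225775) - The specified image file did not have the correct format, it did not have an initial MZ.",
   "@$curprocess.Ldr.GetLoadStatus()",
].join("\n");

for (const failure of findFailures(parseLoadLog(sampleLog))) {
   console.log(failure.dll + ": " + failure.status);
}
```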

 

Pfff … well, that was much easier 😀

 

Have fun developing WinDbg JS Extensions & Scripts! 🙂

 

 

 
