Wednesday, June 19, 2019

Commvault DASH copy nightmare, and how to improve it.

So I'd like to write about my experience with Commvault DASH (aka Aux) copy.

DASH copy is used when you want to ship local backups to a remote site for off site backup / DR backup.

I have two MediaAgents (MAs); a MediaAgent is essentially a disk storage server. One is at our HQ and the other is at our DR site, and the two MAs are connected by a 500 Mbps pipe.

This is about my experience with a 4TB SQL database.

Commvault uses a database called the dedup database, aka DDB, to keep track of changes. It makes a full backup once and after that backs up only the changes (deltas). The DDB records a so-called "signature" for the data, and using the signatures it backs up only what has changed. The DDB is kept on both the local and the DR MA. It reduces backup time and storage space. Great!
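
To make the signature idea concrete, here is a minimal sketch in Python. It is not how Commvault implements the DDB; the fixed block size, the SHA-256 hash, and the names `signatures`, `backup` and `ddb` are all my own assumptions, picked just to show how "only back up what changed" falls out of keeping a set of signatures.

```python
import hashlib

# Tiny block size so the example is readable; this value is an assumption,
# real dedup systems use much larger blocks.
BLOCK_SIZE = 4


def signatures(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and yield (signature, block) pairs."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield hashlib.sha256(block).hexdigest(), block


def backup(data: bytes, ddb: set) -> list:
    """Return only the blocks whose signature is not in the DDB yet, and record them."""
    new_blocks = []
    for sig, block in signatures(data):
        if sig not in ddb:
            ddb.add(sig)
            new_blocks.append(block)
    return new_blocks


ddb = set()  # hypothetical stand-in for the dedup database
first = backup(b"AAAABBBBCCCC", ddb)   # first full backup: every block is new
second = backup(b"AAAABBBBDDDD", ddb)  # next backup: only the changed block is new
print(len(first), len(second))
```

The second run only returns the one block that differs, which is the behavior I was expecting from the DASH copy as well.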

For DASH copy, there are two options: disk optimized and network optimized. Disk optimized is the default. There is also a sub-option called "source side cache".

According to their tech document, disk optimized DASH copy ships only the deltas/changes; it uses signatures to determine what it already has. Network optimized re-opens the local backup, creates signatures on the fly, and ships the deltas to the DR MA (MediaAgent).

My experience is that disk optimized DASH copy ships the entire full backup to the DR MA every time it runs, and the DR MA calculates the signatures and saves only the difference. When I ran it, it took more than 24 hours, so the daily backups piled up and it never caught up. I called their tech support, and they said to stay put because the DR MA was generating signatures and that takes time; once the signatures were generated it would be faster. It finally shipped the first copy and I was expecting to see improvements. NOT! There is no DDB involved. Yep, it shipped 4TB over the 500 Mbps pipe, over and over again. It would never finish; I had more than 15 backups stacked up! 4TB x 15.
I called them and they insisted that disk optimized DASH copy ships only deltas. I showed them what I found on the network utilization report, and they said to change it to network optimized.
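
A quick back-of-the-envelope check shows why shipping the full backup every day cannot work on this link. This is only a sketch: it assumes decimal terabytes, that the pipe is 500 megabits per second, and a perfectly saturated link with no protocol overhead. The helper name `transfer_hours` is made up for the example.

```python
def transfer_hours(size_tb: float, link_mbps: float) -> float:
    """Best-case hours to move size_tb terabytes over a link_mbps link."""
    bits = size_tb * 1e12 * 8             # decimal TB -> bits
    seconds = bits / (link_mbps * 1e6)    # megabits per second -> bits per second
    return seconds / 3600


hours = transfer_hours(4, 500)
print(round(hours, 1))  # 17.8
```

So even in the ideal case one full copy eats most of a day, and with real-world throughput it is easy to go past 24 hours, at which point the daily jobs can never catch up.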

With network optimized, it actually read the whole backup, and that came at a cost: the local MA's disk queue length went above 50 (my local MA has 18 disks). But it sent deltas only, wow! Still, it took a while to read the 4TB of data, 11 to 13 hours. It also had to reseed, which ended up eating licenses.

I called again and they said to enable the local cache with disk optimized; it creates a local signature database on the "client computer" and makes the process faster. Hmm... what about the DDB?
It turned out the local cache database is created not on the SQL database server but on the local MA. It reseeded again, and ate up licenses of course. But after it created the local signature database, the DASH copy now takes 4 to 5 hours to finish.

Conclusion:

1. "disk optimized + local cache" is the best option for the DASH copy. Disk optimized alone ships the full backup to the DR MA and recalculates the signatures; there is no DDB involved.
2. For DASH copy, "client computer" = the local MA, if you enable the local cache.
3. The network optimized option does not optimize network traffic; I don't even know why the option exists.
4. When you create a DASH copy, "disk optimized" without the local cache is the default option. Enable the local cache to optimize DASH copy performance and network utilization. Yes, keep in mind that when you do that it might reseed!


SCCM UpdatesDeployment.log's assignment ID is the deployment ID, hunting for job error 0x8024000f

One day I found this in the log:

Job error (0x8024000f) received for assignment ({bdd02889-257b-431c-98b3-965a16ee51d7}) action

I was wondering what the heck the assignment ID was; after googling I found that the assignment ID is actually the deployment ID. BAH!

To find out what that is, there are two ways to do it.

1. From the SCCM console -> Monitoring -> Deployments -> search for the ID.
2. Open SCCM PowerShell and run Get-CMDeployment -DeploymentID followed by the ID.
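
If you have many of these lines to go through, pulling the ID out by hand gets old fast. Below is a small Python sketch that extracts the GUID from a line like the one above; the regular expression and the variable names are my own, and it assumes the log line always has the "assignment ({...})" shape shown in this post.

```python
import re

# Matches the GUID inside "assignment ({...})"; the pattern is an assumption
# based on the single log line quoted in this post.
ASSIGNMENT = re.compile(r"assignment \(\{([0-9a-fA-F-]{36})\}\)")

line = ("Job error (0x8024000f) received for assignment "
        "({bdd02889-257b-431c-98b3-965a16ee51d7}) action")

match = ASSIGNMENT.search(line)
deployment_id = match.group(1) if match else None
print(deployment_id)
```

The printed value is what you would then search for in the console or pass to Get-CMDeployment.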



Besides, a fantastic blog post about 0x8024000f can be found here:
https://blogs.technet.microsoft.com/ken_brumfield/2014/08/24/whoa-wuau-what-the-heck-is-with-the-circular-references-0x8024000f/

Also, a reference for the error codes: https://social.technet.microsoft.com/wiki/contents/articles/15260.windows-update-agent-error-codes.aspx



Wednesday, March 13, 2019

Jan. 2019 Exchange Security Update KB4471389 issue

It seems some admins broke their Exchange server after installing KB4471389.
The symptom is that all Exchange-related services die after the update, and the admins had to reinstall Exchange Server in recovery mode.

M$ says that can happen when an admin runs the update in normal (non-elevated) mode.

Yes, the next Exchange CU will have that patch as well. So run all Exchange updates in admin mode, all the time!!

Tuesday, March 05, 2019

WSUS server cleanup

WSUS cleanup sometimes takes a really long time and fails.

I'd like to explain how you can automate the process so that there is never much to clean up and each cleanup runs faster.

You can run the GUI version of the WSUS server cleanup or run a PowerShell script.

On the WSUS server, open PowerShell in admin mode and run:

Get-WsusServer | Invoke-WsusServerCleanup -CleanupObsoleteComputers -CleanupObsoleteUpdates -CleanupUnneededContentFiles -CompressUpdates -DeclineExpiredUpdates -DeclineSupersededUpdates 


Yes, you can save it as a script and run it from Task Scheduler! Run it daily or weekly.

While the cleanup job runs, the "Wsus Service" service is stopped, and it is started again automatically when the cleanup job ends. Do not start it manually; that will extend the cleanup time.

If the cleanup job runs too long and fails, your best option is to rebuild WSUS.

After you rebuild WSUS, if you are using SCUP and get errors like "xxxxx not found on WSUS SMS_WSUS_CONFIGURATION_MANAGER " in WSyncMgr.log while WSUS tries to sync, see this blog

Wednesday, February 27, 2019

SCCM ADR failed with error 404

I was getting an ADR error. The log shows a content ID, but it is not the unique content ID, so I was trying to find out what kind of update it was.

Downloading content with ID 17423088 in the package SMS_RULE_ENGINE 2/27/2019 4:03:50 PM 14756 (0x39A4)
Failed to download the update from internet. Error = 404 SMS_RULE_ENGINE 2/27/2019 4:04:23 PM 14756 (0x39A4)
Failed to download ContentID 17423088 for UpdateID 17459792. Error code = 404 SMS_RULE_ENGINE 2/27/2019 4:04:23 PM 14756 (0x39A4)
Downloading contents (count = 1) for UpdateID 17459793 SMS_RULE_ENGINE 2/27/2019 4:04:23 PM 14756 (0x39A4)
List of update content(s) which match the content rule criteria = {17423089} SMS_RULE_ENGINE 2/27/2019 4:04:23 PM 14756 (0x39A4)
Downloading content with ID 17423089 in the package SMS_RULE_ENGINE 2/27/2019 4:04:23 PM 14756 (0x39A4)
Failed to download the update from internet. Error = 404 SMS_RULE_ENGINE 2/27/2019 4:04:54 PM 14756 (0x39A4)
Failed to download ContentID 17423089 for UpdateID 17459793. Error code = 404 SMS_RULE_ENGINE 2/27/2019 4:04:54 PM 14756 (0x39A4)
Downloading contents (count = 1) for UpdateID 17459794 SMS_RULE_ENGINE 2/27/2019 4:04:54 PM 14756 (0x39A4)
List of update content(s) which match the content rule criteria = {17423090} SMS_RULE_ENGINE 2/27/2019 4:04:54 PM 14756 (0x39A4)
Downloading content with ID 17423090 in the package SMS_RULE_ENGINE 2/27/2019 4:04:54 PM 14756 (0x39A4)
Failed to download the update from internet. Error = 404 SMS_RULE_ENGINE 2/27/2019 4:05:26 PM 14756 (0x39A4)
Failed to download ContentID 17423090 for UpdateID 17459794. Error code = 404 SMS_RULE_ENGINE 2/27/2019 4:05:26 PM 14756 (0x39A4)

I ran this against the SCCM DB, and it showed the content unique ID:

select * from vsms_content where Content_ID = '17423089'

I copied and pasted it into the SCCM console and looked it up under "All Software Updates".

I was able to identify what it was: an old Adobe Flash update from SCUP. I expired the update in SCUP and published it, and the problem got resolved.
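
When the log is long, it helps to list every failing ContentID/UpdateID pair first and then look each one up. Here is a minimal Python sketch that does that for lines shaped like the ones above; the regular expression and the function name `failed_downloads` are my own, and the sample text is just a few lines copied from the log in this post.

```python
import re

# Pattern is an assumption based on the "Failed to download ContentID ..." lines above.
FAILED = re.compile(
    r"Failed to download ContentID (\d+) for UpdateID (\d+)\. Error code = (\d+)"
)


def failed_downloads(log_text: str):
    """Return (content_id, update_id, error_code) for every failed download line."""
    return [(int(c), int(u), int(e)) for c, u, e in FAILED.findall(log_text)]


sample = """\
Downloading content with ID 17423088 in the package SMS_RULE_ENGINE 2/27/2019 4:03:50 PM 14756 (0x39A4)
Failed to download the update from internet. Error = 404 SMS_RULE_ENGINE 2/27/2019 4:04:23 PM 14756 (0x39A4)
Failed to download ContentID 17423088 for UpdateID 17459792. Error code = 404 SMS_RULE_ENGINE 2/27/2019 4:04:23 PM 14756 (0x39A4)
Failed to download ContentID 17423089 for UpdateID 17459793. Error code = 404 SMS_RULE_ENGINE 2/27/2019 4:04:54 PM 14756 (0x39A4)
"""

failures = failed_downloads(sample)
for content_id, update_id, error in failures:
    print(content_id, update_id, error)
```

Each content ID it prints can then go into the SQL query above.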

Wednesday, January 16, 2019

Problem applying Sharepoint 2013 cumulative update

So I was applying the Dec 2018 SharePoint 2013 cumulative update. Once the binaries were installed, I ran the SharePoint configuration wizard and got this error.


Curiously, there is no "Additional exception information", so I opened the log file and found not so helpful information:
----------------------------------------------------------------
5  INF  Resource retrieved id PostSetupConfigurationFailedEventLog is Configuration of SharePoint Products failed.  Configuration must be performed in order for this product to operate properly.  To diagnose the problem, review the extended error information located at {0}, fix the problem, and run this configuration wizard again.


 5  INF  Received a TaskDriverEventHandler: TaskDriverEventArgs.EventCriticalityType error, TaskDriverEventArgs.EventType stop, message Configuration of SharePoint Products failed.  Configuration must be performed before you use SharePoint Products.  For further details, see the diagnostic log located at C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\LOGS\PSCDiagnostics_1_14_2019_10_13_6_911_869397427.log and the application event log.
------------------------------------------------------------------

Update!!!

The problem was related to a broken timer service instance.
Running a script from the web site below fixed the issue, and I don't see the error any more.

https://social.technet.microsoft.com/Forums/windows/en-US/1c8bda47-5be9-4412-a531-1706bf0df6e6/wsp-not-getting-deployed-in-all-servers-in-our-sharepoint-2013-farm?forum=sharepointadmin

I am keeping the article below just for historical purposes; you can ignore it.

------------------------------------------------------------------

According to this blog (https://www.mssqltips.com/sqlservertip/5516/how-to-apply-microsoft-sharepoint-2013-cumulative-updates-and-handling-issues/), PostSetupConfigurationFailedEventLog means a database upgrade issue.

So all the binary updates were done, and the wizard failed when it tried to upgrade the content database.
The blog said to dismount the content DB, run the config wizard, mount the content DB and run the DB upgrade, but the SP config wizard failed even when the content DB was dismounted.

So my theory was: all SP servers got their binaries updated and it is the DB that failed, so why not upgrade the DB after the SP wizard failed?

So, long story short, to fix the problem I did the following:

- Run Get-SPProduct -Local on all SharePoint servers. Just in case.

- Reboot the CA server to make all services start again, especially all SharePoint services and the Net.Pipe xxxx and Net.Tcp xxxx services on all SP servers.

- Run "Upgrade-SPContentDatabase -Name -WebApplication /"  
For example: Upgrade-SPContentDatabase -name WSS_Content -WebApplication https://sharepoint.domain.com

If you check the status of the DB from CA, CA will show that the DB is being upgraded.
Once it was done, I got this message. If the upgrader seems stuck after you type Y and press Enter, press Enter again.



- Run "psconfig.exe -cmd helpcollections -installall -cmd secureresources -cmd services -install -cmd installfeatures -cmd applicationcontent -install -cmd upgrade -inplace b2b -force -wait" as the SP config wizard suggested.

- Run the SP config wizard GUI on all servers to confirm there are no errors.

- Reboot all SharePoint servers

Hopefully this helps someone who faces the same error as I did.

Ref:
Why SharePoint 2013 Cumulative Update takes 5 hours to install?
https://blogs.msdn.microsoft.com/russmax/2013/04/01/why-sharepoint-2013-cumulative-update-takes-5-hours-to-install/#commentmessage

How to install update packages on a SharePoint farm where search component and high availability search topologies are enabled
https://blogs.technet.microsoft.com/tothesharepoint/2013/03/13/how-to-install-update-packages-on-a-sharepoint-farm-where-search-component-and-high-availability-search-topologies-are-enabled/

How to apply Microsoft SharePoint 2013 Cumulative Updates and Handling Issues
https://www.mssqltips.com/sqlservertip/5516/how-to-apply-microsoft-sharepoint-2013-cumulative-updates-and-handling-issues/