We had recently built a SharePoint 2013 farm that used Okta as its identity provider, and we started receiving reports that users were being prompted to re-authenticate. This happened while both browsing and editing pages, but it was most painful for content contributors: being prompted to re-authenticate in the middle of editing a page meant losing all their changes, which understandably upset them.
Initially Okta got the blame, and why not? It was a new authentication provider for us, and the Okta logon screen was what people saw when they were challenged to authenticate again, right before losing their changes. We had previously used ADFS, and since Okta was similar we focused on the token session lifetime as laid out by Rob Garrett in his SharePoint Authentication and Session Management post. It turned out that the Okta token lifetime was 8 hours, which gave us about 7 hours and 50 minutes before SharePoint would time the session out, so that wasn’t the issue.
After combing through the logs we noticed several errors that were similar to the following:
Failed to get token from distributed cache for ’05.t|okta|<enduser>@<mycompany.com>’
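To get a feel for how often this error occurs, the matching ULS log lines can be counted per day. The following is only a rough sketch: the function name Get-TokenCacheFailureCount is made up for illustration, and it assumes the usual tab-delimited ULS layout in which each line starts with a timestamp such as MM/dd/yyyy HH:mm:ss. The log folder shown is the assumed default for SharePoint 2013 and may differ on your farm.

```powershell
# Hypothetical helper: counts "Failed to get token" lines per day.
# Assumes each ULS line begins with a timestamp followed by a tab.
function Get-TokenCacheFailureCount {
    param([string[]]$Line)

    $Line |
        Where-Object { $_ -like '*Failed to get token from distributed cache*' } |
        # First tab-delimited field is the timestamp; keep only its date part.
        ForEach-Object { ($_ -split "`t")[0].Trim().Split(' ')[0] } |
        Group-Object |
        Sort-Object Name |
        Select-Object @{ Name = 'Date'; Expression = { $_.Name } }, Count
}

# Assumed default ULS log folder; adjust for your environment.
$logDir = 'C:\Program Files\Common Files\microsoft shared\Web Server Extensions\15\LOGS'
if (Test-Path -LiteralPath $logDir) {
    $lines = Get-ChildItem -LiteralPath $logDir -Filter '*.log' | Get-Content
    Get-TokenCacheFailureCount -Line $lines | Format-Table -AutoSize
}
```

This only reads the logs on the server where it runs, so it would need to be run on each server in the farm to get a farm-wide picture.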
That shifted our focus to the distributed cache. We came across a great article from Jason Warren titled “SharePoint 2013 distributed cache bug” and implemented its suggestions. Afterwards the number of token cache failures dropped dramatically, but we were still seeing hundreds of failures per day across the farm. At that point we opened a support ticket with Microsoft, and support advised us to apply the following settings to all the cache containers using the procedure below:
- Stop the “Distributed Cache” service from the “Services on Server” page in Central Administration (CA) on all servers that are running the service.
- Execute the following PowerShell:
Add-PSSnapin Microsoft.SharePoint.PowerShell

#DistributedLogonTokenCache
$DLTC = Get-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache
$DLTC.MaxConnectionsToServer = 1
$DLTC.requestTimeout = "3000"
$DLTC.channelOpenTimeOut = "3000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache -DistributedCacheClientSettings $DLTC

#DistributedViewStateCache
$DVSC = Get-SPDistributedCacheClientSetting -ContainerType DistributedViewStateCache
$DVSC.MaxConnectionsToServer = 1
$DVSC.requestTimeout = "3000"
$DVSC.channelOpenTimeOut = "3000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedViewStateCache -DistributedCacheClientSettings $DVSC

#DistributedAccessCache
$DAC = Get-SPDistributedCacheClientSetting -ContainerType DistributedAccessCache
$DAC.MaxConnectionsToServer = 1
$DAC.requestTimeout = "3000"
$DAC.channelOpenTimeOut = "3000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedAccessCache -DistributedCacheClientSettings $DAC

#DistributedActivityFeedCache
$DAF = Get-SPDistributedCacheClientSetting -ContainerType DistributedActivityFeedCache
$DAF.MaxConnectionsToServer = 1
$DAF.requestTimeout = "3000"
$DAF.channelOpenTimeOut = "3000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedActivityFeedCache -DistributedCacheClientSettings $DAF

#DistributedActivityFeedLMTCache
$DAFC = Get-SPDistributedCacheClientSetting -ContainerType DistributedActivityFeedLMTCache
$DAFC.MaxConnectionsToServer = 1
$DAFC.requestTimeout = "3000"
$DAFC.channelOpenTimeOut = "3000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedActivityFeedLMTCache -DistributedCacheClientSettings $DAFC

#DistributedBouncerCache
$DBC = Get-SPDistributedCacheClientSetting -ContainerType DistributedBouncerCache
$DBC.MaxConnectionsToServer = 1
$DBC.requestTimeout = "3000"
$DBC.channelOpenTimeOut = "3000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedBouncerCache -DistributedCacheClientSettings $DBC

#DistributedDefaultCache
$DDC = Get-SPDistributedCacheClientSetting -ContainerType DistributedDefaultCache
$DDC.MaxConnectionsToServer = 1
$DDC.requestTimeout = "3000"
$DDC.channelOpenTimeOut = "3000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedDefaultCache -DistributedCacheClientSettings $DDC

#DistributedSearchCache
$DSC = Get-SPDistributedCacheClientSetting -ContainerType DistributedSearchCache
$DSC.MaxConnectionsToServer = 1
$DSC.requestTimeout = "3000"
$DSC.channelOpenTimeOut = "3000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedSearchCache -DistributedCacheClientSettings $DSC

#DistributedSecurityTrimmingCache
$DTC = Get-SPDistributedCacheClientSetting -ContainerType DistributedSecurityTrimmingCache
$DTC.MaxConnectionsToServer = 1
$DTC.requestTimeout = "3000"
$DTC.channelOpenTimeOut = "3000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedSecurityTrimmingCache -DistributedCacheClientSettings $DTC

#DistributedServerToAppServerAccessTokenCache
$DSTAC = Get-SPDistributedCacheClientSetting -ContainerType DistributedServerToAppServerAccessTokenCache
$DSTAC.MaxConnectionsToServer = 1
$DSTAC.requestTimeout = "3000"
$DSTAC.channelOpenTimeOut = "3000"
Set-SPDistributedCacheClientSetting -ContainerType DistributedServerToAppServerAccessTokenCache -DistributedCacheClientSettings $DSTAC
- Start the “Distributed Cache” service from the “Services on Server” page in Central Administration (CA) on all servers that are running the service.
- Cumulative Update package 3 for AppFabric added a feature that takes advantage of the non-blocking background garbage collection in .NET 4.5. To enable it, modify DistributedCacheService.exe.config (default location: %ProgramFiles%\AppFabric 1.1 for Windows Server) so that its appSettings section contains the following entry, and check afterwards that the setting is in place:
<add key="backgroundGC" value="true"/>
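Since the last step asks you to check that background garbage collection is enabled, a small script can do the check on each cache host. This is a minimal sketch rather than anything from Microsoft: Test-BackgroundGCEnabled is a made-up helper name, and the config path is the assumed default install location, so adjust it if AppFabric lives elsewhere on your servers.

```powershell
# Hypothetical helper: returns $true when the config XML contains
# an <add key="backgroundGC" value="true"/> entry anywhere in the document.
function Test-BackgroundGCEnabled {
    param([string]$ConfigXml)

    $doc  = [xml]$ConfigXml
    $node = $doc.SelectSingleNode("//add[@key='backgroundGC']")
    return ($null -ne $node -and $node.GetAttribute('value') -eq 'true')
}

# Assumed default install path; change this to match your environment.
$configPath = 'C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe.config'
if (Test-Path -LiteralPath $configPath) {
    $xml = Get-Content -LiteralPath $configPath -Raw
    if (Test-BackgroundGCEnabled -ConfigXml $xml) {
        Write-Output 'backgroundGC is enabled.'
    } else {
        Write-Output 'backgroundGC is missing or not set to true.'
    }
}
```

Note that a copy-and-paste of the setting with typographic (curly) quotes will produce an invalid config file, so it is worth running a check like this after editing.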
After applying these settings, the token cache failures disappeared. I had thought that increasing the timeout settings on the DistributedLogonTokenCache container alone would solve the issue, but we must have been affected by some of the other container settings as well. For now the problem seems to be solved.
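Because every container ended up with the same three values, the PowerShell from the procedure could also be written as a loop. The version below is a sketch under that assumption, not the script Microsoft support gave us; it uses the same ten container names and the same cmdlets, then reads the settings back so you can confirm them. As in the original steps, the Distributed Cache service should be stopped before running it and started again afterwards.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# The ten cache containers covered by the original script.
$containerTypes = @(
    'DistributedLogonTokenCache',
    'DistributedViewStateCache',
    'DistributedAccessCache',
    'DistributedActivityFeedCache',
    'DistributedActivityFeedLMTCache',
    'DistributedBouncerCache',
    'DistributedDefaultCache',
    'DistributedSearchCache',
    'DistributedSecurityTrimmingCache',
    'DistributedServerToAppServerAccessTokenCache'
)

# Apply the same client settings to each container.
foreach ($type in $containerTypes) {
    $settings = Get-SPDistributedCacheClientSetting -ContainerType $type
    $settings.MaxConnectionsToServer = 1
    $settings.RequestTimeout         = 3000
    $settings.ChannelOpenTimeOut     = 3000
    Set-SPDistributedCacheClientSetting -ContainerType $type -DistributedCacheClientSettings $settings
}

# Read the values back to confirm what is now configured.
$containerTypes | ForEach-Object {
    $s = Get-SPDistributedCacheClientSetting -ContainerType $_
    New-Object PSObject -Property @{
        Container              = $_
        MaxConnectionsToServer = $s.MaxConnectionsToServer
        RequestTimeout         = $s.RequestTimeout
        ChannelOpenTimeOut     = $s.ChannelOpenTimeOut
    }
} | Format-Table Container, MaxConnectionsToServer, RequestTimeout, ChannelOpenTimeOut -AutoSize
```

Keeping the container names in one list also makes it easier to experiment with a subset if you want to find out which containers actually matter in your farm.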