David Rios

Users stopped throwing errors

Blog Post created by David Rios Partner on May 3, 2019

First, some background:

We are using the auto-CSV upload option for population maintenance, and the import log always shows errors of the form "Failed to set manager on user with uid …" because the manager listed in the CSV does not exist in Bridge. This is expected: we exclude some users from Bridge based on various attributes in our SIS, and some of those excluded users will inevitably be managers. However, I can't simply assume that every error is of this kind and safe to ignore, so I've been tracking them every day to confirm they are not real problems and to spot any patterns that might be informative. It was because I was tracking these 'errors' that I noticed something odd…
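Checking the log by hand every day works, but the same check is easy to script. Below is a minimal Python sketch, assuming each day's import log can be saved as plain text and that each error line carries the uid as the next token after `Failed to set manager on user with uid` (the exact Bridge log format is an assumption here, not something documented in this post). It collects the uids that errored on each day and reports which ones started or stopped erroring between two days; a uid that stops erroring while its CSV row is unchanged is the kind of change worth investigating.

```python
import re

# Assumed log format: each manager error contains
# "Failed to set manager on user with uid <uid>", with the uid as the next
# whitespace-delimited token. Adjust the pattern if the real log differs.
MANAGER_ERROR = re.compile(r"Failed to set manager on user with uid\s+(\S+)")


def manager_error_uids(log_text: str) -> set[str]:
    """Return the uids that logged a manager error in one day's import log."""
    # Strip trailing punctuation in case the uid ends a sentence.
    return {m.group(1).rstrip(".,;:") for m in MANAGER_ERROR.finditer(log_text)}


def diff_days(previous_log: str, current_log: str) -> tuple[set[str], set[str]]:
    """Return (started, stopped): uids that began or stopped erroring."""
    before = manager_error_uids(previous_log)
    after = manager_error_uids(current_log)
    return after - before, before - after


if __name__ == "__main__":
    yesterday = (
        "Failed to set manager on user with uid u1\n"
        "Failed to set manager on user with uid u2\n"
    )
    today = "Failed to set manager on user with uid u2\n"
    started, stopped = diff_days(yesterday, today)
    print("started:", sorted(started), "stopped:", sorted(stopped))
    # prints: started: [] stopped: ['u1']
```

Comparing sets rather than raw error counts matters here: a day with the same number of errors can still hide one user dropping out and another appearing.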

On 11/23/2018, I noticed a difference in the reported errors: there were none that day! I knew that couldn't be right, so I went back to the previous day's log and checked every user who had thrown an error then. All of them had managers who did not exist in Bridge, so the manager attribute could not be updated. Before 11/23, each of them still had an old manager listed in Bridge, but after that day they had no manager listed (a 'null' value), even though their CSV data had not changed. So two things happened: the manager went from an old value to 'null', and the users stopped throwing errors even though the manager listed in the CSV still did not exist in Bridge.

I continued tracking the affected users, and over time most of them were resolved in one of two ways: they were removed from the Bridge population, or the manager listed in the CSV changed, which allowed them to be updated properly. This brought the number of affected users down from 12 to 3. I never found an explanation for this behavior.

The twist:

It happened again! 

We changed the logic that pulls user data from our SIS, which substantially increased our population. When I checked the import log the day after the change, it looked normal at first glance, with about 50% more manager-related errors than before. However, the last line of the log said 'Failed to complete import', and at least one user's manager in Bridge did not match the manager in the CSV. Additionally, 10 users had a 'null' manager in Bridge and a manager in the CSV who did not exist in Bridge. These users did throw the expected errors in the log, so at first I saw no connection to the 11/23 situation.

We contacted support about the failure message in the log and the manager mismatch, and were told that the import had timed out, so user data was not fully updated. The next day, the import ran to completion and the mismatched data was corrected. However, from then on the 10 users mentioned above no longer threw any errors in the log, even though their CSV data had not changed: they still had a 'null' manager in Bridge, and the managers listed in the CSV still did not exist in Bridge. This appears to be the same situation as on 11/23, which makes me think the original issue was also caused by a timeout that went unnoticed.
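Users in this state are hard to find by reading the log, because the symptom is an error that is missing. A cross-check between the CSV, the users in Bridge, and the log can surface them. The following is a minimal Python sketch under several assumptions: the CSV has hypothetical columns `uid` and `manager_uid`, you have exported the set of uids that currently exist in Bridge by some separate means, and error lines follow the assumed format `Failed to set manager on user with uid <uid>`. It flags users whose CSV manager is missing from Bridge but who logged no error, and separately checks whether the last line of the log reports an incomplete import.

```python
import csv
import io
import re

# Assumed log format (see the lead-in); adjust the pattern if the real log differs.
MANAGER_ERROR = re.compile(r"Failed to set manager on user with uid\s+(\S+)")


def silent_users(csv_text: str, bridge_uids: set[str], log_text: str) -> set[str]:
    """Return uids that should have logged a manager error but did not.

    csv_text:    the uploaded population CSV; the "uid" and "manager_uid"
                 column names are hypothetical placeholders.
    bridge_uids: uids that currently exist in Bridge (exported separately).
    log_text:    the import log for the same run.
    """
    logged = {m.group(1).rstrip(".,;:") for m in MANAGER_ERROR.finditer(log_text)}
    suspects = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        uid = row["uid"].strip()
        manager = row["manager_uid"].strip()
        # A Bridge user whose CSV manager is missing from Bridge is expected
        # to produce a "Failed to set manager" line; flag it if none appeared.
        expects_error = uid in bridge_uids and bool(manager) and manager not in bridge_uids
        if expects_error and uid not in logged:
            suspects.add(uid)
    return suspects


def import_completed(log_text: str) -> bool:
    """Return False if the last non-empty log line reports a failed import."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    return not lines or "Failed to complete import" not in lines[-1]


if __name__ == "__main__":
    population = "uid,manager_uid\nu1,m9\nu2,m9\nu3,u1\n"
    in_bridge = {"u1", "u2", "u3"}
    log = "Failed to set manager on user with uid u1\n"
    print(sorted(silent_users(population, in_bridge, log)))  # prints: ['u2']
    print(import_completed(log))  # prints: True
```

In the example, `u2` is flagged because its manager `m9` is not in Bridge yet no error was logged for it, which matches the pattern described above for the 10 users after the timed-out import.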

The implication seems to be that a timeout during an import can leave some users stuck in an odd state where they should be throwing errors but don't. This seems unlikely to cause any serious problems, but the behavior is worth noting, since it means the import log can quietly stop reporting a manager that does not exist in Bridge.
