We have a custom-written CTI solution using Genesys T-Server, GVP, and an Avaya PBX and AES. In the past, whenever an agent logged into the CTI application, the application told the PBX to log the agent into the phone in AUX-Work. It still sends that request, but the phone doesn't light up, so the agent is logged into the CTI app but not into the phone.
This apparently broke on 9/5 (the day before our U.S. Labor Day holiday). I came to work on Tuesday and agents were complaining that CTI wasn't logging them into the phone automatically. In the first couple of days I heard from maybe a dozen agents who had this problem. (We have about 200 agents.) Then I started hearing from more, including at least one agent whom CTI had successfully logged into her phone on 9/7. So the problem is spreading. Now, two and a half weeks later, I know of 32 agents who have it. This is not critical, because agents simply log into their phones manually and then get their screen pops anyway, but it damages their faith in the software. Besides, I consider this broken, and I want to get it fixed. Logging into the phone manually is a workaround. Workarounds bug me.
At least the problem is consistent: once an agent gets it, it happens on every login and never goes away.
All the other I.T. people I've contacted swear up and down they made no changes that weekend: no PBX changes, no firewall changes, no patches to the agent PCs. The only thing I've been able to find is log messages that say the phone status is OUT_OF_SERVICE. I don't recall seeing that in the past, but I haven't looked in those logs for a LONG time, because CTI has been stable. Also, we're only monitoring those extensions from three servers, and we're allowed to monitor them from four, so we aren't over that limit.
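One way to tell whether those OUT_OF_SERVICE messages started that weekend or have been there all along would be to pull the first occurrence per extension out of the old logs. Here is a minimal sketch in Python. The line layout (a leading timestamp and an `ext=` field) and the sample lines are made up for illustration, not the real T-Server log format, so both regexes would need adjusting to match the actual logs.

```python
import re

# ASSUMPTION: each log line starts with a timestamp token and carries the
# extension as "ext=<digits>". This is a hypothetical layout, not the real
# T-Server format; change both patterns to fit the actual log lines.
TIMESTAMP = re.compile(r"^(\S+)")
EXTENSION = re.compile(r"\bext=(\d+)")


def first_out_of_service(lines):
    """Map each extension to the timestamp of its first OUT_OF_SERVICE line."""
    first_seen = {}
    for line in lines:
        if "OUT_OF_SERVICE" not in line:
            continue
        ts = TIMESTAMP.search(line)
        ext = EXTENSION.search(line)
        # Keep only the earliest hit per extension (lines are assumed to be
        # in chronological order, as they would be within one log file).
        if ts and ext and ext.group(1) not in first_seen:
            first_seen[ext.group(1)] = ts.group(1)
    return first_seen


if __name__ == "__main__":
    # Made-up sample lines, only to show the shape of the input.
    sample = [
        "2000-01-01T08:00:01 ext=4101 status=IN_SERVICE",
        "2000-01-02T23:15:42 ext=4101 status=OUT_OF_SERVICE",
        "2000-01-03T07:58:10 ext=4102 status=OUT_OF_SERVICE",
        "2000-01-03T09:12:33 ext=4101 status=OUT_OF_SERVICE",
    ]
    for ext, ts in sorted(first_out_of_service(sample).items()):
        print(ext, ts)
```

If the first hit for each affected extension lines up with the date that agent started having trouble, the OUT_OF_SERVICE status is probably related; if the messages go back months, it is probably a red herring.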
Now I'm down to considering stuff that's unlikely: license expirations (checked FlexLM and it looked okay), database corruption in the CME (checked, and it looked okay), whatever. At this point I've pretty much ruled out every root cause I can think of. There is one thing that did happen: the T-Server that monitors agent phones switched over to the backup machine, so the second T-Server is now primary. (They're in a hot standby arrangement.) But that happened weeks before the agents reported this problem.
I'm baffled.