Genesys CTI User Forum => Genesys CTI Technical Discussion => Topic started by: smile on December 13, 2012, 01:29:02 PM
-
Hello all,
How is your RM behaving? I'm about to open my third SR this week... It has crashed at different sites, under different circumstances. Several times it crashed with core dumps, and yesterday Tier 3 said they had found the root cause. I hope they really did...
I'm surprised that such an important component has so many serious bugs! Is anyone else having the same trouble with it? What version are you using? Mine is 8.1.502.82 on Linux.
-
We are running RM 8.1.502.86 on Windows and everything works fine; we have not registered any problems. RM provides all the services: voicexml, cpd, media, annc, recordingclient. We preferred, and recommend, running RM with TCP as the transport protocol. So I am not aware of any bug in our production environment. GVP serves more than 30k calls a day.
We registered a few issues during deployment and the first days of use, but RM has never crashed. Those issues were related to RM's lookup of the configured LRG.
What problems have you registered?
-
Hi there,
My suggestion: check the cache settings. It's very sensitive to the relationship between cache size and prompt file size.
WBR Timur
-
Do you mean RM or MCP? RM doesn't process prompts and doesn't work with a cache...
-
Hi Kubig,
Your version looks better, if only because of this fix:
Resource Manager now no longer terminates unexpectedly while closing a SIP Transaction. Previously, this situation sometimes caused an assert in the stack. (ER# 300007729)
and this fix from the earlier 8.1.502.85:
Now Resource Manager does not terminate due the following race condition: one thread detects a socket read failure and closes the socket, while another thread sends a message on the same socket at the same time. (ER# 305054141)
and this...:
Resource Manager (RM) now correctly handles an invalid action that it receives in messages from another RM node in High Availability. Previously, RM could terminate in this situation. (ER# 304433067)
??? Do I need to upgrade?
The "funny" thing is that we had to upgrade from a previous version (I don't remember which one) because of this bug:
Resource Manager now no longer terminates unexpectedly while closing a SIP Transaction. Previously, this situation sometimes caused an assert in the stack. (ER# 300007729)
The real joke is that the SIP Server version previously installed alongside that RM had another bug: when the msml service became unavailable and someone put a call on hold, SIP Server terminated unexpectedly. I think you can guess why I know so much about these details...
Well, that is a thing of the past. Today I have the following complaints:
1. The active RM sends new call information to the backup and gets a response before it can update its own transaction record. Engineering believes this is a rare situation, but during the last ~6 months I hit it 3 or 4 times. I think you will see a new release note about it. The bad thing is that RM crashes with a core dump in this case.
2. Sometimes (I don't know why, but the last time it occurred immediately after item 1) the nodes in the RM cluster go out of sync. The interesting thing is that both nodes see each other and send/receive HB messages (I suppose), but there are no inter-node updates with the current status. The nodes may work this way for several days (in my observation, 4 days). After that, see item 3.
3. Sometimes (the last time it occurred 4 days after item 2 ;) ) the primary node receives an INVITE and... that's all. No other messages are sent or received. BUT the backup node "thinks" that the primary is in service! Moreover, because of item 2 (or something else), if you kill the primary node, the secondary node doesn't become primary =( Epic fail. During the last 2 days I saw it twice at different sites.
I know it looks strange, and it works fine in the lab, but in production it's crazy %)
Looking at release notes with fixes for things that happen "in very rare cases" and where "RM can become unstable"... I believe it is not a well-prepared product.
Sorry for the emotions. If I could throw RM away, I would. In 7.6 it was a non-mandatory component... good times.
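To make items 2 and 3 concrete: a node that only checks heartbeats will keep reporting its peer as "in service" even when status updates have stopped. Below is a minimal sketch of a check that tracks both signals separately. It is not how RM works internally; the class, the field names and the timeout values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PeerView:
    """What one node knows about its HA peer (all fields are hypothetical)."""
    last_heartbeat: float   # time the last heartbeat arrived, in seconds
    last_state_sync: float  # time the last status update arrived, in seconds


def peer_status(view: PeerView, now: float,
                heartbeat_timeout: float = 5.0,
                sync_timeout: float = 60.0) -> str:
    """Classify the peer using heartbeats AND status updates.

    The timeout defaults are illustrative assumptions, not RM settings.
    """
    if now - view.last_heartbeat > heartbeat_timeout:
        return "down"
    if now - view.last_state_sync > sync_timeout:
        # Heartbeats still arrive, but no status updates: the situation
        # described in item 2, which a heartbeat-only check cannot see.
        return "alive-but-stale"
    return "in-service"
```

With a check like this, the "alive-but-stale" state could at least raise an alarm days before the failover in item 3 is ever needed.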
-
We are using an older version of RM than 8.1.5x, on a TDA platform (8.1.4x), and we have not registered the same issues as you; however, we have not yet deployed RM as an HA pair. That is in progress, so wish me luck :-) I believe the HA "mode" for RM can bring other problems and issues.
As I wrote, we have registered some issues where RM returns a 4xx SIP message and says "Cannot allocate LRG for service [i]servicetype[/i]". However, I think that once we deploy RM in HA I will not be as optimistic as I am now.
-
Oh, I'm not alone :D We already have another SR about "Cannot allocate LRG for service servicetype". Genesys said that we probably don't have a recording LRG configured %) LOL, the investigation is in progress...
So have you solved it?
-
Our LRG allocation issue was caused by bad configuration. When we changed the transport protocol from UDP to TCP, we forgot to update all the required sections. RM had problems monitoring the LRGs, and in the logs I could see that the status of every LRG was "offline". Since cleaning up the configuration we have not registered the same issue again. But unfortunately I don't think this was the last issue...
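For anyone chasing the same error, a quick way to see which service types are affected is to count the "Cannot allocate LRG for service" messages in the RM log. The sketch below assumes only that the service type follows the message text on the same line; the rest of the log line layout is unknown to me, and the function name is made up for this example.

```python
import re
from collections import Counter

# Assumption: the service type is the first token after the message text.
PATTERN = re.compile(r"Cannot allocate LRG for service\s+(\S+)")


def count_lrg_failures(lines):
    """Count allocation failures per service type in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            # Drop trailing punctuation or quotes around the service name.
            counts[match.group(1).strip('".,')] += 1
    return counts
```

Called as `count_lrg_failures(open(path))`, where `path` points at your own log file, it returns a `Counter` keyed by service type.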
-
[quote author=smile link=topic=7552.msg32463#msg32463 date=1355433287]
do you mean rm or mcp? rm doesn't process prompts and doesn't work with cache...
[/quote]
Yep, you're right. I read one thread... and answered in another =( I need more sleep and less work. But I still have the problem with MCP in my own project =(
Sorry.
WBR Timur
-
[quote author=smile link=topic=7552.msg32464#msg32464 date=1355435118]
3. Sometimes (the last time it occurred 4 days after item 2 ;) ) the primary node receives an INVITE and... that's all. No other messages are sent or received. BUT the backup node "thinks" that the primary is in service! Moreover, because of item 2 (or something else), if you kill the primary node, the secondary node doesn't become primary =( Epic fail. During the last 2 days I saw it twice at different sites.
[/quote]
We have a similar issue on Windows 2008 x64 with RM version 8.1.502.80. The response from Genesys T3 was: "looks like it's a SIP Server problem, so we are opening a new SR against SIP Server and will continue investigating". As for my next question (why does service come back after a complete GVP restart?), I'm still waiting for an answer =(
WBR Timur
-
One piece of good news today: the new RM 8.1.602.01 has a fix for one of our issues:
The Resource Manager no longer terminates when it synchronizes data with its HA counterpart and the response comes back before the sending RM can update its transaction record. (ER# 314483785)
Work on the other issues is still in progress...
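For readers wondering what this kind of race looks like, here is a minimal sketch of the general pattern: a sender that registers its transaction record before the message leaves, so a response can never arrive ahead of the record. This is only an illustration of the ordering problem the release note describes, not Genesys code; `SyncSender` and its methods are hypothetical names.

```python
import threading


class SyncSender:
    """Hypothetical sender that syncs call data to an HA peer."""

    def __init__(self, send):
        self._send = send              # callable(txn_id, payload), assumed
        self._lock = threading.Lock()
        self._pending = {}             # txn_id -> payload awaiting a response

    def send_update(self, txn_id, payload):
        # Record the transaction BEFORE sending. If the send came first,
        # a fast peer could answer while the record does not exist yet.
        with self._lock:
            self._pending[txn_id] = payload
        self._send(txn_id, payload)

    def on_response(self, txn_id):
        """Return True if the response matched a known transaction."""
        with self._lock:
            if txn_id in self._pending:
                del self._pending[txn_id]
                return True
            # Unknown transaction: report it instead of asserting or crashing.
            return False
```

The second half of the idea is in `on_response`: an unexpected response is treated as a normal, reportable event rather than something that terminates the process.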