| << Back to previous view |
[QFJ-645] Deadlock between message sending and reset due to message reception on Session and senderMsgSeqNumLock Created: 27/Oct/11 Updated: 10/Oct/12 Resolved: 31/Jul/12 |
|
| Status: | Resolved |
| Project: | QuickFIX/J |
| Component/s: | Engine |
| Affects Version/s: | 1.5.0, 1.5.1 |
| Fix Version/s: | 1.5.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | marc sperisen | Assignee: | Christoph John |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | deadlock | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
Deadlock arises between the following threads:
Thread1: "GatewayStateTimer" daemon prio=3 tid=0x04d65388 nid=0x2bef waiting for monitor entry [0x790ff000..0x790ff9f0] at quickfix.SessionState.isLogonSent(SessionState.java:162) - waiting to lock <0x9d12c020> (a quickfix.Session) at quickfix.Session.sentLogon(Session.java:690) at quickfix.Session.isLoggedOn(Session.java:731) at quickfix.Session.sendRaw(Session.java:2151) at quickfix.Session.send(Session.java:2213) at quickfix.Session.sendToTarget(Session.java:590) at com.greatBank.application.fix.Engine.sendMessage(Engine.java:54) at com.greatBank.application.fix.Engine.pingFIX(Engine.java:191) at com.greatBank.application.fix.Engine.access$0(Engine.java:186) at com.greatBank.application.fix.Engine$PingTimerTask.run(Engine.java:176) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) -> Thread1 is sending a message: it first acquires the lock senderMsgSeqNumLock and then tries to acquire a lock on the session instance Thread2: "QF/J Session dispatcher: FIX.4.2:SENDER1->TARGET1" daemon prio=3 tid=0x014e5ac8 nid=0x2bfc waiting on condition [0x794ff000..0x794ff970] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:716) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:746) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1076) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:184) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:256) at quickfix.SessionState.lockSenderMsgSeqNum(SessionState.java:322) at quickfix.Session.sendRaw(Session.java:2117) at quickfix.Session.generateLogout(Session.java:1293) at quickfix.Session.generateLogout(Session.java:1284) at quickfix.Session.reset(Session.java:756) - locked <0x9d12c020> (a quickfix.Session) at quickfix.Session.next(Session.java:836) at quickfix.mina.ThreadPerSessionEventHandlingStrategy$MessageDispatchingThread.run(ThreadPerSessionEventHandlingStrategy.java:119) -> Thread2: while processing an incoming message, the engine notices that it is in a new session time interval (StartTime/EndTime) and therefore a MsqSeqNum reset is started. The reset method acquires a lock on the session and, when generating a Logout, tries to acquire the lock senderMsgSeqNumLock, and then deadlocks with Thread1 |
| Comments |
| Comment by Peter Clarke [ 12/Jan/12 11:33 PM ] |
|
Hi My solution to this was to remove the synchronized keyword from the Session.reset function. I've also attached a unit test we have to check for the issue. Pete |
| Comment by John [ 17/Jan/12 12:25 PM ] |
|
had this one too, but I haven't changed the library so I synchronized on the "Session" (=> so locks are taken in the same order). |
| Comment by John [ 02/Feb/12 1:08 PM ] |
|
Erratum on the preceding comment. Peter's was solution was the best one from the beggining and my workarround is not that good ;o) I just put the code I used for the "reset" method : quickfix.Session public void reset() throws IOException { if (!resetting.compareAndSet(false, true)) { return; } try { if (hasResponder()) { if (application != null && application instanceof ApplicationExtended) { ((ApplicationExtended) application).onBeforeSessionReset(sessionID); } generateLogout(); disconnect("Session reset", false); } resetState(); } finally { resetting.set(false); } } |
| Comment by Christoph John [ 31/Jul/12 9:38 AM ] |
|
Peter and John, |