Uploaded image for project: 'QuickFIX/J'
  1. QuickFIX/J
  2. QFJ-645

Deadlock between message sending and reset due to message reception on Session and senderMsgSeqNumLock

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5.0, 1.5.1
    • Fix Version/s: 1.5.3
    • Component/s: Engine
    • Labels:

      Description

      Deadlock arises between the following threads:

      Thread1:
      "GatewayStateTimer" daemon prio=3 tid=0x04d65388 nid=0x2bef waiting for monitor entry [0x790ff000..0x790ff9f0]
      at quickfix.SessionState.isLogonSent(SessionState.java:162)

      • waiting to lock <0x9d12c020> (a quickfix.Session)
        at quickfix.Session.sentLogon(Session.java:690)
        at quickfix.Session.isLoggedOn(Session.java:731)
        at quickfix.Session.sendRaw(Session.java:2151)
        at quickfix.Session.send(Session.java:2213)
        at quickfix.Session.sendToTarget(Session.java:590)
        at com.greatBank.application.fix.Engine.sendMessage(Engine.java:54)
        at com.greatBank.application.fix.Engine.pingFIX(Engine.java:191)
        at com.greatBank.application.fix.Engine.access$0(Engine.java:186)
        at com.greatBank.application.fix.Engine$PingTimerTask.run(Engine.java:176)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)

      -> Thread1 is sending a message: it first acquires the lock senderMsgSeqNumLock and then tries to acquire a lock on the session instance

      Thread2:
      "QF/J Session dispatcher: FIX.4.2:SENDER1->TARGET1" daemon prio=3 tid=0x014e5ac8 nid=0x2bfc waiting on condition [0x794ff000..0x794ff970]
      at sun.misc.Unsafe.park(Native Method)
      at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:716)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:746)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1076)
      at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:184)
      at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:256)
      at quickfix.SessionState.lockSenderMsgSeqNum(SessionState.java:322)
      at quickfix.Session.sendRaw(Session.java:2117)
      at quickfix.Session.generateLogout(Session.java:1293)
      at quickfix.Session.generateLogout(Session.java:1284)
      at quickfix.Session.reset(Session.java:756)

      • locked <0x9d12c020> (a quickfix.Session)
        at quickfix.Session.next(Session.java:836)
        at quickfix.mina.ThreadPerSessionEventHandlingStrategy$MessageDispatchingThread.run(ThreadPerSessionEventHandlingStrategy.java:119)

      -> Thread2: while processing an incoming message, the engine notices that it is in a new session time interval (StartTime/EndTime) and therefore a MsqSeqNum reset is started.
      The reset method acquires a lock on the session and, when generating a Logout, tries to acquire the lock senderMsgSeqNumLock, and then deadlocks with Thread1

        Attachments

          Issue Links

            Activity

            msperisen marc sperisen created issue -
            Hide
            peteclarkez Peter Clarke added a comment -

            Hi
            I've also seen this issue with our application.

            My solution to this was to remove the synchronized keyword from the Session.reset function.

            I've also attached a unit test we have to check for the issue.

            Pete

            Show
            peteclarkez Peter Clarke added a comment - Hi I've also seen this issue with our application. My solution to this was to remove the synchronized keyword from the Session.reset function. I've also attached a unit test we have to check for the issue. Pete
            peteclarkez Peter Clarke made changes -
            Field Original Value New Value
            Attachment SessionResetTest.java [ 10372 ]
            Hide
            pantolomin John added a comment -

            had this one too, but I haven't changed the library so I synchronized on the "Session" (=> so locks are taken in the same order).

            Show
            pantolomin John added a comment - had this one too, but I haven't changed the library so I synchronized on the "Session" (=> so locks are taken in the same order).
            Hide
            pantolomin John added a comment -

            Erratum on the preceding comment.
            Synchronizing "send" on the session created a huge performance hit when having heavy load.

            Peter's was solution was the best one from the beggining and my workarround is not that good ;o)

            I just put the code I used for the "reset" method :

            quickfix.Session
            public void reset() throws IOException {
              if (!resetting.compareAndSet(false, true)) {
                return;
              }
              try {
                if (hasResponder()) {
                  if (application != null && application instanceof ApplicationExtended) {
                    ((ApplicationExtended) application).onBeforeSessionReset(sessionID);
                  }
                  generateLogout();
                  disconnect("Session reset", false);
                }
                resetState();
              }
              finally {
                resetting.set(false);
              }
            }
            
            Show
            pantolomin John added a comment - Erratum on the preceding comment. Synchronizing "send" on the session created a huge performance hit when having heavy load. Peter's was solution was the best one from the beggining and my workarround is not that good ;o) I just put the code I used for the "reset" method : quickfix.Session public void reset() throws IOException { if (!resetting.compareAndSet( false , true )) { return ; } try { if (hasResponder()) { if (application != null && application instanceof ApplicationExtended) { ((ApplicationExtended) application).onBeforeSessionReset(sessionID); } generateLogout(); disconnect( "Session reset" , false ); } resetState(); } finally { resetting.set( false ); } }
            chrjohn Christoph John made changes -
            Assignee Christoph John [ chrjohn ]
            chrjohn Christoph John made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            chrjohn Christoph John made changes -
            Status In Progress [ 3 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            Hide
            chrjohn Christoph John added a comment -

            Peter and John,
            thanks for your contributions. I have changed the reset() method accordingly and have added the UnitTest (I changed it slightly to work with a pausable Executor and made it detect the deadlock programmatically).

            Show
            chrjohn Christoph John added a comment - Peter and John, thanks for your contributions. I have changed the reset() method accordingly and have added the UnitTest (I changed it slightly to work with a pausable Executor and made it detect the deadlock programmatically).
            chrjohn Christoph John made changes -
            Fix Version/s 1.5.3 [ 10165 ]
            chrjohn Christoph John made changes -
            Link This issue is duplicated by QFJ-700 [ QFJ-700 ]
            chrjohn Christoph John made changes -
            Link This issue is duplicated by QFJ-571 [ QFJ-571 ]

              People

              • Assignee:
                chrjohn Christoph John
                Reporter:
                msperisen marc sperisen
              • Votes:
                2 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: