[QFJ-875] SendingTime accuracy problem Created: 16/Feb/16  Updated: 23/Aug/16  Resolved: 23/Aug/16

Status: Closed
Project: QuickFIX/J
Component/s: Engine
Affects Version/s: 1.6.0
Fix Version/s: None

Type: Other Priority: Major
Reporter: Cedric Zeng Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: QuickfixJ
Environment:

Use Eclipse to run QuickFIX/J


Attachments: Sample Code to QFJ.xlsx, Sample Log to QFJ.xlsx

 Description   

Hi,

I have been running QuickFIX/J for 8 months and still cannot solve this issue.
I run only the initiator, and our clock is synchronized with the third-party FIX server.

The problem is that our machine sometimes receives messages with a time lag. Once the lag exceeds 30 seconds, our side reports a "SendingTime accuracy problem" to the broker and requests a logout.

After I modified the code, the frequency dropped a lot, but it still happens about once per day.

Do you have any idea why this problem keeps happening?
1. Is the internet connection unstable?
We save the prices from the broker to our server.


Here is the sample log

8=FIX.4.3|9=69|35=0|49=BrokerQuote|56=Client123|34=670|57=FX|52=20160215-07:55:34.743|369=5|10=074|
8=FIX.4.3|9=230|35=S|49=BrokerQuote|56=Client123|34=683|57=FX|52=20160215-07:55:35.369|369=5|131=3072|117=Client123-USDSEK-2016-2-15:7.55.35:72-2000000|537=1|55=USD/SEK|460=4|132=8.43613|133=8.43913|134=2000000|135=2000000|60=20160215-07:55:35.369|64=20160217|10=051|
8=FIX.4.3|9=61|35=0|34=6|49=Client123|50=FX|52=20160215-07:55:56.108|56=BrokerQuote|10=190|
8=FIX.4.3|9=229|35=S|49=BrokerQuote|56=Client123|34=684|57=FX|52=20160215-07:55:35.470|369=5|131=3072|117=Client123-GBPUSD-2016-2-15:7.55.35:80-2000000|537=1|55=GBP/USD|460=4|132=1.4515|133=1.45166|134=2000000|135=2000000|60=20160215-07:55:35.470|64=20160217|10=219|
8=FIX.4.3|9=229|35=S|49=BrokerQuote|56=Client123|34=685|57=FX|52=20160215-07:55:35.470|369=5|131=3072|117=Client123-USDRUB-2016-2-15:7.55.35:80-2000000|537=1|55=USD/RUB|460=4|132=77.8389|133=77.934|134=2000000|135=2000000|60=20160215-07:55:35.470|64=20160216|10=028|
8=FIX.4.3|9=69|35=0|49=BrokerQuote|56=Client123|34=686|57=FX|52=20160215-07:55:35.619|369=5|10=084|
8=FIX.4.3|9=229|35=S|49=BrokerQuote|56=Client123|34=687|57=FX|52=20160215-07:55:35.696|369=5|131=3072|117=Client123-USDPLN-2016-2-15:7.55.35:98-2000000|537=1|55=USD/PLN|460=4|132=3.9205|133=3.92347|134=2000000|135=2000000|60=20160215-07:55:35.696|64=20160217|10=037|
8=FIX.4.3|9=113|35=3|34=7|49=Client123|50=FX|52=20160215-07:56:07.607|56=BrokerQuote|45=684|58=SendingTime accuracy problem|372=S|373=10|10=033|
8=FIX.4.3|9=61|35=5|34=8|49=Client123|50=FX|52=20160215-07:56:07.607|56=BrokerQuote|10=198|
8=FIX.4.3|9=113|35=3|34=9|49=Client123|50=FX|52=20160215-07:56:07.607|56=BrokerQuote|45=685|58=SendingTime accuracy problem|372=S|373=10|10=036|
8=FIX.4.3|9=62|35=5|34=10|49=Client123|50=FX|52=20160215-07:56:07.607|56=BrokerQuote|10=240|
8=FIX.4.3|9=114|35=3|34=11|49=Client123|50=FX|52=20160215-07:56:07.607|56=BrokerQuote|45=686|58=SendingTime accuracy problem|372=0|373=10|10=044|
8=FIX.4.3|9=62|35=5|34=12|49=Client123|50=FX|52=20160215-07:56:07.607|56=BrokerQuote|10=242|
8=FIX.4.3|9=114|35=3|34=13|49=Client123|50=FX|52=20160215-07:56:07.607|56=BrokerQuote|45=687|58=SendingTime accuracy problem|372=S|373=10|10=082|
8=FIX.4.3|9=62|35=5|34=14|49=Client123|50=FX|52=20160215-07:56:07.607|56=BrokerQuote|10=244|
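
As a rough check of the lag in the sample log above, here is a minimal Java sketch that compares the SendingTime (tag 52) of a rejected quote with the SendingTime of the Reject sent in response, which is taken here as an approximation of when the quote was actually processed. The class and method names are hypothetical, and the 30-second threshold is simply the value mentioned in the description; in QuickFIX/J the threshold is controlled by the MaxLatency session setting when CheckLatency is enabled.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of QuickFIX/J.
final class SendingTimeLag {
    // Layout of the tag 52 values in the log, e.g. 20160215-07:55:35.470
    private static final DateTimeFormatter FIX_TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HH:mm:ss.SSS");

    // Milliseconds between a message's SendingTime and the moment it was observed.
    static long lagMillis(String sendingTime, String observedAt) {
        LocalDateTime sent = LocalDateTime.parse(sendingTime, FIX_TIMESTAMP);
        LocalDateTime seen = LocalDateTime.parse(observedAt, FIX_TIMESTAMP);
        return Duration.between(sent, seen).toMillis();
    }

    // True when the lag, in either direction, stays within the allowed latency.
    static boolean withinLatency(long lagMillis, long maxLatencySeconds) {
        return Math.abs(lagMillis) <= maxLatencySeconds * 1000L;
    }
}
```

For message 684 in the log, lagMillis("20160215-07:55:35.470", "20160215-07:56:07.607") gives 32137 ms, so withinLatency(32137, 30) is false. The same holds for messages 685 to 687, which is consistent with all four being rejected.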



 Comments   
Comment by Guido Medina [ 16/Feb/16 ]

A 30-second offset is unlikely to come from the time source feeding your servers, but just in case, here is a long shot:

I have seen time offsets on networks where the time source is unreliable or not set at all. Make sure the time source on your network is synchronized with internet time servers, and ask the other side to do the same if you can, since their setup might be out of your hands.

If you are using Linux, even better, because you can point your NTP servers at the closest pool.

I would recommend using time sources from http://www.pool.ntp.org/en/
On the right side of that page you can see the different pools, first by continent and then by country.

Hope this helps,

Guido.

Comment by Cedric Zeng [ 17/Feb/16 ]

Hi Guido,

Thanks for your advice.
I don't think it is a time sync problem. We have double-checked the time with our broker.
Our engine runs well most of the time. It just sometimes gets jammed and can't receive messages for a while. If more than 30 seconds pass before it starts receiving messages again, it gets logged out.

I want to investigate why the system gets jammed, but I don't know where to start.

Thanks.
Cedric

Comment by Guido Medina [ 17/Feb/16 ]

Again, another long shot, as I'm trying to help with easy fixes/suggestions in the little time I have available. A few questions:

  • What type of session do you open? Weekly, daily, or forever?
  • A stalled connection might be related to Mina-core. If you go to the QuickFixJ GitHub you will see my countless efforts to keep the dependencies updated, especially Mina-core. You probably have 2.0.10, and between that version and the latest, 2.0.13, some weird race conditions have been fixed, especially related to SSL (not excluding non-SSL).

Go and check the Mina-core page, which has the list of fixed issues: https://mina.apache.org/mina-project/index.html and see if any of these fixes your problem. Regardless, I would recommend that you review your dependencies and update them from time to time.

Hope this helps,

Guido.

Comment by Guido Medina [ 17/Feb/16 ]

Another question I forgot: are you seeing long GC pauses on either side? That would lead us to a different topic, which is your JVM configuration, hence my question: what JVM are you using, and what are your JVM options?

Note: I have seen crazy GC pauses of more than 30 seconds, but then you would have to be using a big heap, like 16g+ or so.
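
One way to look for long collections from inside the application is the standard java.lang.management API. The sketch below is only an illustration (the GcSnapshot class is hypothetical): it sums the collection counts and accumulated collection times that the JVM reports, so that two snapshots taken around a stall can be compared. Note that the reported time is accumulated collection time, which for concurrent collectors is not necessarily the same as stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: a point-in-time total over all garbage collectors.
final class GcSnapshot {
    final long collections;  // total number of collections so far
    final long totalMillis;  // accumulated collection time so far

    private GcSnapshot(long collections, long totalMillis) {
        this.collections = collections;
        this.totalMillis = totalMillis;
    }

    static GcSnapshot take() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        return new GcSnapshot(count, millis);
    }

    // Collection time accumulated since an earlier snapshot.
    long millisSince(GcSnapshot earlier) {
        return totalMillis - earlier.totalMillis;
    }
}
```

Logging GcSnapshot.take() periodically, or whenever a "SendingTime accuracy problem" reject is sent, would show whether a jump of tens of seconds in collection time lines up with the stall.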

Comment by Christoph John [ 18/Feb/16 ]

Cedric,
my first assumption would also be a GC pause, but you can easily check that with tools like "jstat -gcutil". Another possibility is that you do long-running processing in your fromApp() method, where you process the incoming message.
Chris.

Comment by Guido Medina [ 18/Feb/16 ]

Hi Cedric,

I have to agree with Christoph there. I would check the following, in this order:

  • Make sure you are not doing processing on the QuickFixJ thread. If you have a single-threaded thread factory, you need to forward the FIX message to a queue and then process it on another thread.
  • Check your JVM version and parameters. I strongly recommend moving to at least Java 7u80 or even Java 8u74. The reason? GC improvements, especially if you want a near-pauseless JVM. See my thread on the Akka forums about low-latency micro-services; my conclusions on JVM options are near the end, but you are welcome to read the whole thread (12 posts): https://groups.google.com/d/msg/akka-user/9s4Yl7aEz3E/zfxmdc0cGQAJ
  • Make sure your project dependencies are up to date. That's easy to do if you are using Maven, with the following command, which tells you which dependencies can be updated. Not all of them are worth updating, because some will point to an alpha or beta version, but for most it helps:
    • For Windows: mvn -U versions:display-dependency-updates
    • For Linux, with only the important bits: mvn -U versions:display-dependency-updates | grep -e '>' -e Building
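
The first point in the list above (forwarding the message to another thread) can be sketched roughly as follows. This is a plain-Java illustration that does not use the QuickFIX/J API: onMessage stands in for the body of fromApp(), the message is represented as a String, and the MessageHandOff class is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical hand-off: the receiving thread only enqueues, a worker does the slow part.
final class MessageHandOff {
    // A single worker thread keeps messages in arrival order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<String> slowProcessor;

    MessageHandOff(Consumer<String> slowProcessor) {
        this.slowProcessor = slowProcessor;
    }

    // Stand-in for the body of fromApp(): enqueue the message and return immediately.
    void onMessage(String rawMessage) {
        worker.execute(() -> slowProcessor.accept(rawMessage));
    }

    // Stops accepting new work and waits for queued messages to finish.
    boolean shutdownAndWait(long timeout, TimeUnit unit) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The executor's queue here is unbounded, so if processing is consistently slower than the incoming quote rate the backlog will grow; a bounded queue or dropping stale quotes may be a better fit for price data.
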
Comment by Cedric Zeng [ 19/Feb/16 ]

Hi Guido,

For your questions,

  • My session is a weekly session.
  • I'm using Java 8u66
  • Here is my JVM configuration
    -vm
    C:\Program Files\Java\jre1.8.0_66\bin\server\jvm.dll
    eclipse.home.location=file:/D:/eclipse/
    eclipse.launcher=D:\eclipse\eclipse.exe
    eclipse.launcher.name=Eclipse
    eclipse.p2.data.area=@config.dir/../p2
    eclipse.p2.profile=epp.package.java
    eclipse.product=org.eclipse.epp.package.java.product
    eclipse.startTime=1453791502283
    eclipse.stateSaveDelayInterval=30000
    eclipse.vm=C:\Program Files\Java\jre1.8.0_66\bin\server\jvm.dll
    eclipse.vmargs=-Dosgi.requiredJavaVersion=1.6
    -Xms40m
    -Xmx512m
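
The listing above appears to be the Eclipse IDE's own launcher configuration, which may differ from the options of the JVM that actually runs the FIX application. A small sketch like the following, called at application start-up, would show what that JVM is really using; the JvmInfo class is hypothetical and uses only standard JDK calls.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper that reports the settings of the JVM it runs in.
final class JvmInfo {
    // Arguments passed to this JVM (for example -Xmx...), excluding main() arguments.
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + " maxHeapMiB=" + maxHeapMiB()
                + " args=" + inputArguments();
    }
}
```

Printing JvmInfo.describe() once from the initiator's main method would confirm the Java version and heap limit that apply to the engine itself.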

Regarding Christoph's suggestion:
I did have long-running processing in my fromApp() before. After I changed that, the frequency dropped a lot. "TradeAppInitiator" is the class that processes the incoming message. I moved the other processing to the "TradeAppInitiatorApp" class. Does this address the concern?

As an update:
My machine has run well for 3 days without any problem. The frequency has dropped a lot compared to before. I'm going to check whether it is a GC pause.

Thanks, Guido and Christoph, for your suggestions. You are very kind and helpful.

Regards,
Cedric

Comment by Christoph John [ 23/Aug/16 ]

If there are any more problems, please write to the mailing list. Thanks
