GNS Disconnects

Hello - so please download and install:

http://lunatico.es/GNS/beta/GNSLog.apk

It is a new app named “GNS Log”… you’ll have to configure it again, but this way there’s no mess with your “real” app and settings. Everything else is the same, but the log will be generated.

After exiting the monitor window, both the configuration (gns.data) and the log (gns.log) files will be present in the /Download folder of your phone. If you send these to me, I expect to find something interesting there. I’m quite looking forward to seeing it.

Please note the files will be overwritten in each session, so once the program has failed, please send them straight away.

Thank you very much,

jaime

When I tap on monitor it comes up with
‘ERROR: Developer must include WRITE_EXTERNAL_STORAGE to the build settings! Ensure that your app is using the “android.permission.WRITE_EXTERNAL_STORAGE” permission’

Oops, that shouldn’t happen. I’ll have to check on other phones; on mine it works fine.
Sorry, I’ll check.

Please download again:

Thanks

I can’t get the logging version to fail while the original version continues to fail. I ran a test of 50 frames and the original version failed on the 9th frame while the logging version completed all frames. To me this suggests a timing problem where the logging changes the timing enough to avoid the problem. What protocol are you using to communicate between the programs? TCP, UDP, or something else?

The protocol is UDP … how strange, the only change is the logging messages, which could add a little bit of delay, but so small…

The log is verbose but not that verbose, just a few messages / second at most.

Is your network latency normal, I mean, in the hundreds of ms at most?

What concerns me most is the lack of a “network error” in your normal program. What’s the polling period? Can you try lowering it to 3 seconds or so?

Thanks!

UDP is a connectionless protocol which does not guarantee delivery. I think a message is being lost and it is not detected. I think you either need to switch to TCP, which does guarantee delivery, or add some receipt acknowledgement messages to detect lost messages.

I tried a 3 second poll and it fails.

Network is a local WIFI so latency should be low but that should not matter.

My bad, I should have said TCP. I use UDP in many other things, but here it is TCP.

The problem, however, is not whether a packet is lost (a write can fail in TCP…) but that somehow that event is not detected (which of course it should be!).

If the communication with the host is lost, the screen should reflect that: the comms arrows turn orange / red and are then replaced by square dots, meaning the app is trying to re-establish the communication.
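For illustration only, here is a minimal sketch of a single TCP poll that treats every failure mode (refused connection, timeout, empty reply) as a detected comms error. It is not the GNS code: the function name `poll_once`, the `STATUS` request text and the 2 second timeout are all assumptions made up for this example.

```python
import socket


def poll_once(host, port, timeout_s=2.0, request=b"STATUS\n"):
    """Send one request over TCP and return the reply bytes.

    Returns None on any comms failure so the caller can turn the
    arrows red instead of silently showing stale data.
    The request text is hypothetical, not the real GNS protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.settimeout(timeout_s)  # also bound the wait for the reply
            sock.sendall(request)
            reply = sock.recv(1024)
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None
    # An empty read means the peer closed the connection without answering.
    return reply or None
```

The point of the sketch is that the caller only ever sees two outcomes, a reply or `None`, so a failure cannot go unnoticed by accident.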

What especially puzzles me is the fact that you cannot reproduce the failure in the version generating logs, and that I cannot reproduce it at all.

I’ll check the code once more and keep my fingers crossed that you can get the log version to fail.

Thanks.

What does the “no communication with remote” message in the PC log tell you?

… in the smartphone.

You previously reported everything in the smartphone was “green”, so the comms failure was not detected. Having it detected in the PC is good, but does not help in that regard.

Hello again,

I’ve been running a lot of tests. So far, what I see is that the time taken to drop the communications and start again may be high, depending on the polling period. This is easy to improve, to give more chances of reconnecting.

But my main concern is to be sure the communications failures are being detected. I can neither find a problem with that nor reproduce it. As soon as the communication fails (“as soon as” meaning in this case, at most, the user-configured polling period + 2 seconds), the green arrows at the top should turn red/orange.
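To make that deadline concrete, here is a small sketch of the rule as described (polling period + 2 seconds). The function name and the plain "green"/"red" labels are illustrative assumptions, not the app’s real code.

```python
def arrow_state(seconds_since_last_reply, polling_period_s, grace_s=2.0):
    """Return the expected arrow colour for a given silence duration.

    The deadline is the user-configured polling period plus a fixed
    grace of 2 seconds, as stated in the post above.
    """
    deadline_s = polling_period_s + grace_s
    return "green" if seconds_since_last_reply <= deadline_s else "red"
```

With a 3 second poll, for example, the arrows should be red once more than 5 seconds have passed without a valid reply.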

If they don’t, there’s a problem I’m not aware of.

Another possibility is that they are indeed turning red/orange but that’s not obvious enough, and maybe the title (“Communications”, above) should change color too.

I need to confirm the existence or not of that problem before improving the re-connection speed.

Thanks!

I ran another test today and paid particular attention to the green communication arrows on the phone. Just before a time out, the PC was on frame 21 and the phone showed frame 18. Both green arrows indicated that polling was still active on the phone. That is, they were blinking alternately.

Thank you very much.

So sad this is not happening with the log version. I’ll keep trying all possible means of reproducing it here; once I manage it (or we get the log), the solution will be very quick.

Best regards.

Here is the latest test setup that is failing for me. SGP - 50 dark events of 10 sec using the ASCOM V2 camera simulator. GNS polling 3 sec. When I tried this combination, after about 28 events the phone lost synchronization with the PC.

Thank you truly.

I’ve been making some changes to speed things up. Basically, everything comms related is now faster, and more checks have been added.

I’m betting that what’s happening is a comms error going undetected. Let’s see.

So there are both a new log version, available in the same place as the previous one: http://lunatico.es/GNS/beta/GNSLog.apk

… and a new 1.5 version (exactly the same, but with the log disabled, as logging seemed to affect the behaviour somehow), in public beta in the Play Store:

https://play.google.com/apps/testing/com.lunatico.GNS

One note about synchronisation, though: it may be ok to be slightly out of sync. I mean, the phone will contact the PC every “polling period” seconds. This polling period of course is user configurable.

So, things going well, the phone can show an older event for at most the duration of one polling period.

Things going south, the error should be reflected ASAP.

In any case what cannot happen is any extended period of time with the comms arrows in happy green but the communications actually down. That’s what I hope to have solved here.

Best regards

I tried the beta with a 12 second poll and completed 50 frames. I tried it with a 3 second poll and it failed at about frame 36. It appears the status on the phone is not being updated until the next poll, which leaves it behind most of the time. I don’t understand what the poll is supposed to be doing vs the time out. I would like to see the PC side push notifications to the phone for a quicker update rather than depending on the phone to poll.

Bottom line, it still fails. I will try the logging version and see if I can get it to fail.

I tried the logging version and was able to get it to fail. Here are the PC and phone files

Hope this helps.

This helps a lot, thanks!

From what I see, everything went fine until, during “frame 11”, there was a communication drop. The PC detects a socket timeout, and the smartphone receives something empty. In this case the smartphone correctly reflected the comms problem, right?

Then the phone started to try to connect again, not managing to do so.

Why the communications drop is beyond my reach, I’m afraid. Most likely it is something temporary with the phone or the wifi or whatever; this is very much what happens to me when, while testing, I disable the wifi in the phone, or unplug the router while the session is running.

I feel somewhat better now that the comms failure is detected; that was no doubt an error in the previous version.

Assuming the comms problem is real (seems so, but I can be proved wrong, always), everything looks fine to me now.

Maybe the time spent trying to reconnect should be longer, or there should be more tries?

Some considerations about timeouts etc in the next message, this is long already.

Hello again, the promised considerations:

  • Using push from the PC is out of the question, for technical reasons: standard “push” is achieved using Google’s (or Apple’s) intermediary servers, which is too much “middleware”. Given that the phone has to start the connection, polling seems to me the best way. It is also simpler, and the PC can support several users in a very simple way.

  • Polling period is how often the phone “asks” the PC about the status, timeouts etc.

  • The watchdog is a safety measure, to protect even from programming errors; whatever happens, if there are no valid communications for that watchdog period, the alarm is fired.
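
As a rough illustration of the watchdog idea in the last point (not the actual GNS implementation; the class name, method names and injectable clock are assumptions for this sketch):

```python
import time


class Watchdog:
    """Fires when no valid communication has been seen for limit_s seconds."""

    def __init__(self, limit_s, clock=time.monotonic):
        self.limit_s = limit_s
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.last_valid = clock()

    def feed(self):
        """Call this after every valid exchange with the PC."""
        self.last_valid = self.clock()

    def expired(self):
        """True once the silence has lasted at least limit_s seconds."""
        return self.clock() - self.last_valid >= self.limit_s
```

Because the watchdog only depends on the time of the last valid exchange, it still fires even if the polling logic itself is stuck or buggy.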

For smooth GNS operation, the timeouts have to be generous enough. We do not need “to the second” accuracy, this is supposed to work with remote observatories with poor internet.

For a healthy session, the timeout counter should never reach low values, below 30 or so.

Imagine what happens in this case:

  • a new event in the PC for 120 seconds
  • the phone gets it, polling every 10 seconds.
    (a 1 or 2 seconds desync is ok, I do not attempt to achieve better than that)
  • the timeout goes down, both places, say down to 7 seconds.
  • at this moment, the PC gets another event, whatever, for 300 secs.
  • the phone has just 7 seconds to receive the new event
    … as the polling period is 10 seconds, there is a 30% chance it will miss it, and an alarm is issued.
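
The 30% figure follows if we assume the next poll lands at a uniformly random moment within the polling period: with 7 seconds left and a 10 second period, the poll arrives too late in 3 of every 10 seconds. A small sketch of that arithmetic (the function name is made up for this example, and the uniform-phase assumption is a simplification):

```python
def miss_probability(remaining_s, polling_period_s):
    """Chance that the next poll arrives after the timeout runs out.

    Assumes the time until the next poll is uniformly distributed
    between 0 and polling_period_s.
    """
    remaining_s = max(remaining_s, 0.0)
    if remaining_s >= polling_period_s:
        return 0.0  # the next poll always arrives in time
    return (polling_period_s - remaining_s) / polling_period_s
```

For the case above, `miss_probability(7, 10)` gives 0.3. With a 3 second poll and the same 7 seconds left it gives 0.0, which is why generous timeouts relative to the polling period avoid false alarms.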

The goal is to never leave a problem undetected; this may imply false alarms sometimes, hopefully very seldom.

If the communication is lost, and cannot be re-established right away, alarm. If the timeout is reached, alarm. If the watchdog limit is reached, alarm.
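
Those three rules can be written down as one check. This is only a sketch of the logic as stated, with hypothetical parameter names, not the code running in the app:

```python
def alarm_reasons(comms_lost, reconnect_failed, timeout_remaining_s,
                  seconds_since_valid_comms, watchdog_limit_s):
    """Return the list of reasons an alarm should fire (empty if none)."""
    reasons = []
    # Rule 1: the link dropped and could not be re-established right away.
    if comms_lost and reconnect_failed:
        reasons.append("communication lost")
    # Rule 2: the event timeout counted down to zero.
    if timeout_remaining_s <= 0:
        reasons.append("timeout reached")
    # Rule 3: no valid communications for the whole watchdog period.
    if seconds_since_valid_comms >= watchdog_limit_s:
        reasons.append("watchdog limit reached")
    return reasons
```

Any non-empty result means an alarm; the rules are independent, so more than one reason can apply at once.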

So if there are communication problems (as there seem to be) then maybe GNS is not the best solution, or maybe we can think of something to change to make it so.

I’m not sure that the phone reflected the comms problem; it just timed out. I did not see any other indication of an error. Both the phone and the PC were located within 6 feet of the router, so I assume there was a strong signal between the two. Whatever the issue is, I can reproduce it most of the time, so it is not some random infrequent occurrence.