Autofocus bug causing focuser runaway


#1

Edit to add that this is 3.0.0.4.

I have been having all kinds of problems recently with autofocus. The problem starts around 20:53:03 in the log below. SGP is running autofocus when it inexplicably commands the focuser to move to position -0-. This causes the focuser to runaway uncontrollably and the only way to stop the movement from my starting point at around position 25000 to zero is to shut down SGP and power down the focuser. Here is a relevant snippet.

[01/04/18 20:53:05.953][DEBUG] [Focuser Backlash Thread] Moving focuser to original requested position (25206)…
[01/04/18 20:53:05.953][DEBUG] [Focuser Backlash Thread] Focuser backlash compensation needed. Steps: 50 Direction:IN
[01/04/18 20:53:05.953][DEBUG] [Focuser Backlash Thread] Focuser moving to 25206
[01/04/18 20:53:05.965][DEBUG] [Focuser Backlash Thread] Attempting to move focuser to 0, but it is greater than max value 0

Some potentially relevant information. Temperature compensation is enabled in the focuser. This is critical for me since my scope will move more than the CFZ during a long exposure when the temperature is changing rapidly. Since SGP does not fully honor the ASCOM standard for temperature compensation by allowing me to enable it or disable it from the GUI, I have to close the focuser and then open the ASCOM settings to toggle TempComp and then reconnect as needed. I don’t know if this is relevant or not. I don’t recall ever recall ever having this runaway problem when I wasn’t using TempComp so maybe it is. Also, this doesn’t happen all the time, but when it does when a sequence is running unattended, it is catastrophic and unrecoverable and the session is lost.

I’ll be glad to provide whatever additional information is needed since this is a complete show-stopper for me.

Thanks.

Log file:

Tim


My Focuser
#2

Just to add another data point - last night, the sequence ran perfectly, autofocusing several times with no runaway. At no time during the sequence did I have TempComp enabled (I had to manually nudge focus with the IN/OUT buttons during exposures). So this definitely appears to be a bug related to TempComp.

Tim


#3

Well, another sequence ruined. I REALLY need some help with this. This time TempComp was not enable for the entire sequence so that’s clearly a red herring. In this log (link below), you can see that at 1:36:52:823 SGP again inexplicably tells the focuser to move from its last position at 25217 to zero.

[01/07/18 01:36:52.823][DEBUG] [Focuser Backlash Thread] Attempting to move focuser to 0, but it is greater than max value 0

Then you can see 3 minute later at 1:39:37:624 SGP logs that the focuser position is stale:

[01/07/18 01:39:37.624][DEBUG] [Focuser Move Thread] MoveFocuserAbsBlocking: Focuser position is stale for 10 seconds, retrying…

The focuser position is stale because it takes a good bit of time for my focuser to move 25000 steps. The next thing SGP does is tell the focuser to move to the next correct position for the AF routine. By this time, my focuser has tried to move to position zero, ground against the focuser drawtube stops and slipped the clutch on the Moonlite fortunately causing no damage. When the focuser tries to go back (after moving 25000 steps in each direction) it is completely lost and the sequence bombs.

I’m begging for some help here. I have adored SGP for several years, but now it is utterly useless to me.

HELP!!!

Tim

Log file:


#4

What focuser driver are you using? You mention that it is a MoonLite focuser but the driver isn’t the MoonLite one. Are you are using different hardware or your own driver? In any case it may be worth checking if the spurious zero value is coming from the driver.


#5

I’m using Robert Brown’s Arduino ASCOM FocuserPro2:

A lot of people are using this with no trouble. It appears from the SGP log that the zero comes from a position SGP is sending to the focuser [“Attempting to move focuser to 0…”], not a position the focuser is reporting. So it seems like a SGP issue in calculating the next focuser move, but I have no way to know for sure. Unfortunately, I did not have trace on in the focuser so I have no logs for it. I’ll fix that for the next session.

Tim


#6

My question was where is SGP getting the idea that the focus position should be 0. One possibility is that it’s getting it from the focuser driver. The driver log file could help to give some clarity on this. Speculation won’t help, what is needed is evidence and log files can help.


#7

Actually we’re checking for the MaxStep property of the focuser and 1 of 2 things is happening:

  • The focuser is disconnected and we’re returning a default of 0
  • The focuser is returning 0 in some cases.

It seems that the focuser is not disconnected so I’m guessing it’s the second option. Trace logging would certainly help here. I’ve added some more logging to SGP and some logic ignore MaxStep values of 0. However that could also be problematic in certain cases…but probably no worse than racking the focuser to 0.

Thanks,
Jared


#8

Assuming that a driver is a normal DLL that can be read easily without consequences isn’t a good idea.

it’s implemented as a normal DLL to make the interfacing process easy but it isn’t. It’s purpose is to provide an interface to hardware and this is it’s primary function. Any call could involve communication with the hardware.

Normally reading a property is fast but there’s no guarantee this is the case with a driver. It could involve a context switch into a different process and a call to hardware. Something that is assumed to take a short time, microseconds at most, could easily take vastly longer, 10s of milliseconds. This could upset the application in many ways and at the least make it run much slower.
Increasing the load on the driver also increases the chance that errors are triggered, especially with a multi threaded application such as SGP.

It would be better to be much kinder to the drivers:

  • Ask for a constant property once, then use a local copy.
  • Don’t check for anything more frequently than you have to.
  • Pass all calls to a driver through a single thread.
  • Always call a driver in a try/catch loop.
  • Don’t keep checking the connected property, trap any error instead. You have to do this anyway.

Without this sort of approach SGP’s reliability becomes hostage to the least reliable driver, something over which you have no control.


#9

Thanks Jared. I’ll try again with trace on for the focuser controller. It may not help, but I now recall seeing the same behavior with the Rigel nStep controller as well.

Tim


#10

OK, I’ve done some more sleuthing. I found a recent focuser log file that logs the behavior. Unfortunately, I don’t have the corresponding SGP log since only the last week of logs are kept. Here is where things go pear-shaped. Right after this snippet, the focuser starts its descent to position zero:

21:02:08.011 Position Get 25263
21:02:08.019 CS: GetPosition Recvd: P25263
21:02:08.020 CS: 25263
21:02:08.020 CS: :01#
21:02:08.020 Position Get 25263
21:02:08.028 CS: GetMaxStep Recvd 50000
21:02:08.028 CS: 0000
21:02:08.028 MaxStep Get 0000
21:02:08.028 CS: IsMoving cmd= :01#
21:02:08.040 CS: IsMoving Recvd: I00
21:02:08.040 CS: 00
21:02:08.040 IsMoving 00
21:02:08.040 IsMoving Returning 0 - not moving
21:02:08.040 Move 0
21:02:08.040 Move Position = 0
21:02:08.040 Move focuserPosition = 25263
21:02:08.040 Move Maxsteps = 0
21:02:08.040 Move focuserPosition = 0
21:02:08.040 Move 0

If I’m reading the log correctly, it looks like this line is the culprit:

21:02:08.028 CS: 0000

It seems to my unschooled eye that a command string (I think that’s what “CS” is) of “0000” is being sent. What I can’t seem to understand is where the “Move 0” comes from. Some help decoding this would be greatly appreciated.

Here’s the log.

Tim


#11

There seems to be three messages associated with getting the MaxStep value:

The first one seems to give the impression that the correct value - 50000 - was returned, presumably from the hardware, but this was truncated to 0000 before being passed to SGP. This may be because the driver expects that the value has a character prefixed in the way that reading the position seems to return P25263 which is truncated to 25263.

Looking at the full log GetMaxStep should return M50000 so this looks likely.


#12

As the author of the ASCOM driver (which fully passes CONFORM and all the tests), I have followed this with interest and have a few comments on this. Yes the driver did return an invalid value but the reason for this occurs a little earlier (I will get to that in a moment though).

19:56:45.629 CS: GetMaxStep Recvd M50000
SGP still moving but again requests MaxSteps
and then requests MaxSteps again 100ms later
19:56:45.764 CS: GetMaxStep Recvd M50000
and then again
19:56:45.899 CS: GetMaxStep Recvd M50000
and keeps requesting focuser position and Maxsteps again and again from
19:56:45.999 Position Get calling
19:56:46.014 CS: Get MaxStep cmd= :08#
to
19:56:46.141 CS: Get MaxStep cmd= :08#

That just seems to be overkill - it should be caching MaxSteps and it should not need to get the position after the move has finished at < 100ms intervals because the move is finished. On what assumption is maxSteps changed? Why is it necessary to call it more than once?

Then SGP repeatedly calls for focuser position every second (in one instance only 10ms apart). I cannot see any justification for this.

19:56:47.550 Position Get calling
19:56:47.566 Position Get calling

20:08:18.497 CS: GetPosition Recvd: P24976
We see the focuser position change - this was due to the temperature compensation within the focuser being applied (not from SGP)
and again at
20:16:19.704 CS: GetPosition Recvd: P24969

Now what is also evident that when the focuser position changes, SGP then requests maxSteps again
20:08:18.538 CS: GetMaxStep Recvd M50000
20:16:19.717 CS: GetMaxStep Recvd M50000
20:16:19.843 CS: GetMaxStep Recvd M50000
Again, every time this happens SGP again requests maxSteps - I would argue (with Chris) that SGP should be using a cached copy. Here two maxStep get calls are issued by SGP 100ms apart. There is NO NEED to call maxsteps THREE times in the space of 1.3 seconds. There should be ONLY ONE call needed.

So far the focuser is operating fine (apart from maxSteps being repeatedly requested and sometimes 100ms apart, and that the focuser position is being requested every 1 second, somes more frequently)

Again the focuser is controlling focus position through temperature compensation (the position changes again - about 5s-8s apart as temperature drops)
20:21:47.022 CS: GetPosition Recvd: P24962
after which SGP again calls for MaxStep (so SGP is making an assumption that whenever the focuser position changes maxsteps may also change)
20:21:47.062 CS: GetMaxStep Recvd M50000

SGP requests the state of TempComp which is True (correct)
20:26:03.499 CS: TempCompEnabled Recvd 11
SGP turns off temperature compensation
20:26:03.502 CS: Set TempComp ON/OFF c :230#
SGP gets position
20:26:03.532 CS: GetPosition Recvd: P24962
SGP gets maxsteps
20:26:03.564 CS: GetMaxStep Recvd M50000
SGP checks ismoving which is false (correct)
20:26:03.581 CS: IsMoving Recvd: I00
SGP sets new position
20:26:03.581 CS: SET NEW POSITION cmd= :0525287#

Now something initiated this in SGP - perhaps an autofocus routine?
I would argue that because the focuser is in temperature compensation mode, the focuser is controlling focus, not SGP, and SGP should not be trying to move the focuser position. I believe SGP shouldve popped up a dialog box warning the user that temperature compensation was enabled and gave the user the choice to leave it alone or let SGP take over

SGP then goes into a series of GetPosition and GetMaxSteps calls. There is NO need to call maxSteps all the time like this. Again repeated calls to GetMaxSteps are often only 100ms apart. What is the justification for more than one call? And repeated again and again over such a small time frame? Why doesnt SGP just issue one call and wait?

20:27:42.800 Position Get 24959
it seems the auto-focus is completed

SGP requests tempcomp state
20:28:13.801 TempComp Get Sending to controller
SGP turns on temperature compensation
20:28:13.813 CS: Set TempComp ON/OFF c :231#
Focuser applies temperature compensation internally
20:34:54.907 CS: GetPosition Recvd: P24952
SGP again requests maxSteps
20:34:54.948 CS: GetMaxStep Recvd M50000
Focuser applies temperature compensation internally
20:45:42.981 CS: GetPosition Recvd: P24945
SGP again requests maxSteps
20:45:43.029 CS: GetMaxStep Recvd M50000
Focuser applies temperature compensation internally
20:51:27.258 Position Get 24938
SGP again requests maxSteps
20:51:27.299 MaxStep Get 50000

Another auto-focus by SGP
21:02:05.862 TempComp Get Sending to controller
21:02:05.873 CS: TempCompEnabled Recvd 11
21:02:05.877 CS: Set TempComp ON/OFF c :230#

Now here is where it goes wrong
21:02:07.987 CS: Get Position cmd= :00#
21:02:07.987 CS: :00#
21:02:07.988 Position Get calling
21:02:07.999 CS: GetPosition Recvd: P25263
21:02:07.999 CS: :00#
21:02:07.999 CS: 25263
21:02:07.999 Position Get 25263
21:02:07.999 CS: Get Position cmd= :00#
21:02:07.999 CS: :08#
21:02:08.000 CS: Get Position cmd= :00#
21:02:08.005 CS: Get MaxStep cmd= :08#
21:02:08.011 CS: GetPosition Recvd: P25263
21:02:08.011 CS: 25263
21:02:08.011 Position Get 25263
21:02:08.019 CS: GetPosition Recvd: P25263
21:02:08.020 CS: 25263
21:02:08.020 CS: :01#
21:02:08.020 Position Get 25263
21:02:08.028 CS: GetMaxStep Recvd 50000
21:02:08.028 CS: 0000
21:02:08.028 MaxStep Get 0000
21:02:08.028 CS: IsMoving cmd= :01#
21:02:08.040 CS: IsMoving Recvd: I00
21:02:08.040 CS: 00
21:02:08.040 IsMoving 00
21:02:08.040 IsMoving Returning 0 - not moving
21:02:08.040 Move 0
21:02:08.040 Move Position = 0
21:02:08.040 Move focuserPosition = 25263
21:02:08.040 Move Maxsteps = 0
21:02:08.040 Move focuserPosition = 0
21:02:08.040 Move 0

A breakdown
21:02:07.987 CS: Get Position cmd= :00#
21:02:07.999 CS: Get Position cmd= :00#
21:02:08.000 CS: Get Position cmd= :00#
21:02:08.005 CS: Get MaxStep cmd= :08#
and then
21:02:08.028 CS: GetMaxStep Recvd 50000
21:02:08.028 CS: 0000
21:02:08.028 MaxStep Get 0000

Because the driver returned maxSteps 0 then SGP sets focuser position to 0.
What caused this?
21:02:07.987 CS: Get Position cmd= :00#
21:02:07.999 CS: Get Position cmd= :00#
21:02:08.000 CS: Get Position cmd= :00#
21:02:08.005 CS: Get MaxStep cmd= :08#
12ms between first Getposition command and second
1ms between second GetPosition command and third
WHY did SGP issue 3 get position commands so quickly one after the other when only 1 would suffice? SGP did not even bother to wait for a reply before issuing a second and a third call. This caused the focuser driver to overload and return the wrong value as commands were coming in too quickly. It was the third command that caused the driver to fail.

Now this did not happen after the previous auto-focus routine at 19:57 because there is more time delays between all the getPosition calls that follow at the end of the auto-focus routine. The driver handles repeated calls with 5ms delays between them without issue.

SGP should only need to issue 1 getPosition command and 1 getmaxcall and wait for the response.

Again lets look at this sequence
20:27:41.957 CS: Get Position cmd= :00#
20:27:42.094 CS: Get Position cmd= :00#
20:27:42.137 CS: Get Position cmd= :00#
20:27:42.149 CS: Get Position cmd= :00#
again very little time between each call - and again and again
20:27:42.225 CS: Get Position cmd= :00#
20:27:42.239 CS: Get Position cmd= :00#
20:27:42.377 CS: Get Position cmd= :00#
20:27:42.511 CS: Get Position cmd= :00#
20:27:42.651 CS: Get Position cmd= :00#
20:27:42.786 CS: Get Position cmd= :00#
That is 10 calls is less than 1 second

20:27:43.525 CS: Get Position cmd= :00#
20:27:43.613 CS: Get Position cmd= :00#

20:27:44.537 CS: Get Position cmd= :00#
20:27:44.553 CS: Get Position cmd= :00#

There are times that SGP issues two or more getPosition commands within very small delays between calls.

my conclusions about what is happening in relation to SGP

  1. Immediately after completing an auto-focus, SGP issues multiple getPosition calls over very short intervals and does not wait for a reply before issuing the next one. There is no need for 10 getPosition calls within the space of 1 second as the log shows
  2. SGP should wait for a response before issuing another call of the same type (getPosition)
  3. At this time in the sequence (after auto-focus is finished) SGP is generating too may getPosition calls with little or no delay between them. SGP should only require a single call to getPosition not 10. The log shows the time between calls varies from 1ms all the way to 10ms+
  4. The 1ms repeat getPosition call at 21:02:08.000 from SGP causes the issue
  5. On getting maxSteps at 0, SGP then tries to move the focuser to position 0. Logic might dictate that perhaps a drastic change from a previous value of 50000 to 0 was maybe an error, so SGP mightve been better to flag that and attempt another maxSteps call before assuming 0 was the right value?
  6. After calling getPosition, SGP keeps requesting maxSteps from the focuser. This is unnecessary and SGP should be using the cached version
  7. If SGP feels that it is necessary to issue so many getPosition calls in such a short space of time, it should WAIT for response before issuing the next getPosition call. SGP should also ensure that calls for getPosition are just not fired off in quick succession.
  8. SGP does not need to poll getposition every second. Every 2 or 3s (like focusmax) woud place less strain on any driver and still achieve the same results. This is a focuser, not a telescope which would be changing position all the time.
  9. SGP should not initiate an auto-focus when temperature compensation is enabled in the focuser. It should give the user the choice to override

I think that because SGP is multi-threaded, there maybe multiple threads that are generating getPosition calls, and that these end up all occuring around the same time with little or no gaps between them. It would be better having this handled by a single thread. It appears that this ONLY happens when an auto-focus routine is completed.

This is based on these facts - that the two previous auto-focus routines worked without issue. Immediately following the auto-focus there is a burst of getPosition calls - the first and second instances show the times between these calls to be ok that the driver handles fine, but in the third sequence at 21:02:08.000 its 1ms or less and the driver cannot handle that. The variation in times for the getPosition calls between the three sequences indicates this is based in SGP, not the driver.

Thus SGP is determining the frequency and timing of these getPosition calls. And when completing an auto-focus, SGP is generating too many (possibly from multiple threads) and with little or no time lapse between them.

Chris - can you please give a view on my logic - I am happy to be corrected. I just don’t see the need for any client to repeat getposition and getmaxstep calls to any driver repeatedly again and again over such a short time (1ms apart is a good idea?) or 10 calls in under a second for the same value?

I am not trying to place blame on SGP - Yes the driver should not have returned an invalid value. I do accept that. But I would like an explanation as to why SGP is generating so many repeated calls to the driver when only one would suffice?

Regards
Robert


#13

There are no defined “Static” values in ASCOM to my knowledge. At least not from the perspective of the client. Nothing prevents or is specified in the contract that says the driver cannot/will not/must not change these values and I don’t believe it should be up to the client to handle caching of these things. If the driver knows it should not actually ask the hardware for this value then it should be the driver’s concern to cache this value.

Correct, the driver should actually throw an exception if this occurs. SGP should also not attempt to move the focuser when Temp Compensation is running. Generally we disable it, focus, re-enable it.

We’re respecting the MaxSteps location that the driver returned us. Essentially we want to go to a value that is beyond the MaxSteps but we can’t, so we go to MaxSteps…which just happened to be 0.

I don’t disagree that this is likely unneeded, but at the same time the lack of thread safety should not be pushed onto the client. The driver needs to protect itself if it is not thread safe:

http://www.ascom-standards.org/Help/Developer/html/caf6b21d-755a-4f1c-891f-ce971a9a2f79.htm

Again this is the same threading issue as above. I don’t disagree that we probably should do this but I also feel that this is pushing the driver implementation details back to the client.

If the driver needs to cache it, then it should be a concern of the driver. From that point it should not matter how fast it gets called as you’re just returning a value.

This is one thing I really like about INDI over ASCOM…no polling, it’s glorious.

Agreed. I’ll double check this logic.

Because it’s multi threaded and multiple things are asking for the position. Generally when the focuser is not moving we slow the allowed polling interval, but give a better user experience we increase polling when the focuser is moving and it may happen that some of these polls overlap with other things going on (like Auto focus). And really the general principles line it out Drivers are system components that are responsible for hiding all device dependencies from clients via a standard interface. http://ascom-standards.org/Developer/Principles.htm

Thread safety and caching are all easily implemented in the driver, and what is better is that this will fix any of these issues for ALL applications that use ASCOM. Here is a simple driver I put together for a focuser a while back: https://github.com/jwellman80/ASCOMFocuser/blob/master/JFocus/Driver.cs

You’ll notice 2 simple constructs that address threading and excess calls:

  • Mutex lock around read/write - This keeps read/writes atomic to the actual hardware and also prevents clients from beating your hardware up or from serial traffic generating overlapping i/o
  • Centralized DoUpdate which queries the hardware and updates any of the changed values for fetching later. This throttles all calls to the hardware and essentially is the cache invalidation/updater.

Thank you,
Jared


#14

Jared:

While you’re looking into this, would it be possible to look into adding the ability for the user to toggle TempComp? That would help me immensely. As it stands now, I have to close the focuser, open the ASCOM properties, toggle TempComp and then reconnect the focuser. Also it seems like some times when toggle TempComp off, SGP still thinks it’s on.

Thanks Robert and Jared for looking into this.


#15

I think there is value in using your own advice to implement these things in SGP. Perhaps all applications, if followed the same advice, could benefit from that as well.


#16

The only issue with that is then we are crippling hardware that doesn’t have these restrictions. Also MANY drivers allow you to open up a settings dialog while the hardware is connected and change properties. We have no way of knowing that these properties would change if we cached them on connect. Many cameras do this to allow you to change Gain/Binning real time. Almost every telescope has a dialog box that is always on screen and many of these allow you to change settings real time that SGP relies on. Even the ASCOM Focuser Simulator allows you to change configuration while the device is connected:

This means that you can change the “static” values at any time and the client application is none the wiser if those values are cached in the client. Thus it is the Driver’s responsibility to cache those calls if they need to be cached.

Thanks,
Jared


#17

You totally miss the point Jared. You continue to harp on and on about the driver responsibility as though that gives you the free license to write bad code and expect the drivers to put up with all that nonsense that your application generates. SGP could have been written in such as way as to interface better with a serial based driver, using your own principles, in a thread safe and behaved manner. Instead that appears not to have happened as evidenced and pointed out in the logs.

Regards
Robert


#18

Hi,

Hello, I changed my ASCOM Focuser and with the new one the same problem.
With autofocus, but also with manual focus, it happens that the focuser runs uncontrolled into the -0 position.
Is there a solution to the problem?

Regards
Carsten


www.mainsequencesoftware.com