Filter Change Hang

In a previous post about plate solve, I reported an issue where SGP attempted to change the filter to clear but it never completed nor timed out which then caused some plate solve recovery issues.

This report seems to be similar but occurred during a sequence while changing to the clear filter. The UI symptoms where that the filter reported “Moving…” but after a couple of minutes it had not happened nor had an error been reported. I then tried to abort the sequence but apparently SGP could not complete the abort because of the filter. After disconnecting and unplugging the filter and then reconnecting. It was not possible to change the filter manually in SGP but when the sequence was restarted it was able to set the filter. The relevant part of the log appears to be this:

[6/4/2016 10:42:42 PM] [DEBUG] [Sequence Thread] Setting filter: Clear…
[6/4/2016 10:42:42 PM] [DEBUG] [Sequence Thread] Moving filter wheel, isMoving, check 1…
[6/4/2016 10:42:42 PM] [DEBUG] [Sequence Thread] Moving filter wheel, isMoving, check 1 is complete…
[6/4/2016 10:43:58 PM] [DEBUG] [Main Thread] PopulateDataModel: Transferring view to the data model…
[6/4/2016 10:43:58 PM] [DEBUG] [MF Update Thread] Performing serialize…
[6/4/2016 10:45:03 PM] [DEBUG] [Main Thread] User requested sequence abort…

This shows the start of the filter change up to the abort request. While this was happening the light was lit on the filter controller which suggest it might have been stuck, however, I was expecting SGP to return a timeout error or something similar. Interestingly, the remaining filter changes in the sequence had no problems. There may be an issue with the filter but SGP needs better error handling/reporting. I have access to the filter communication protocol so I will do some testing and/or write my own ASCOM driver. Any insights you can gather from the log could help me with the testing.

The full log file is here: Dropbox - File Deleted

Some more data on this issue. I wrote a test program that used the ASCOM driver to do the following loop:

Set Filter position (rotating through RGBL)
Wait 10 seconds
Verify filter position
Wait 5 seconds

After 350 cycles of this sequence there was no failure. This test seems to preclude a mechanical issue and points to a ASCOM/SGP interaction problem. I suspect two things:

  1. SGP is polling the ASCOM driver for position and it fails (perhaps driver does not respond)
  2. SGP fails to note the problem and timeout.
    If these suspicions are true, #1 indicates a driver problem or an SGP polling issue. #2 indicates an SGP problem.

If you can tell me more about SGP’s interaction with the ASCOM filter driver, I can try to reproduce the problem with my test program. This failure happens most often on the B to L filter change which is the longest rotation of the filter. My filters are LRBGHa in that order. So in a RGBL rotation, the BL transition is more than one filter rotation which suggests a timing issue.

I extended the above test and replaced the 10 second delay with a 100ms loop to poll the filter for completion. I ran a test of over 1000 filter changes with no failure. I have been unsuccessful in demonstrating a problem with the filter or the ASCOM driver which makes me suspect a problem in SGP.

One thing that might help is to enable the ASCOM DriverAccess logging. This should show what commands SGP is sending and the responses it is getting when this fails.

Another thing to try is the ASCOM Conform application. This will check that the filterwheel driver is behaving correctly.

Chris

Thanks for the suggestion, however, the error is quite random in its appearance so I’m not sure I’ll capture it. A filter driver is so simple, I think my test program already proves it is a conforming driver.

Have you actually ran it through conform? I’d give that a shot and see if it produces any errors.

SGP doesn’t do anything fancy with filters. We ask it to move to a position and then poll it to see if it’s moving or if it’s landed where we’ve asked. We have some code in there to retry as some filter wheels like to ignore commands for some unknown reason.

As Chris mentioned you can easily sniff what SGP is doing via ASCOM logging. Which will give you a more accurate picture than I can describe.

Thanks,
Jared

I ran the conformance test and it passed so that is not an issue. I’ve already tried heavily polling the driver as I said but I wonder about the retry code you mention. Is it possible that the retry is causing a problem? If you could tell me the details about how the retry works, I will see if I can cause a failure with a test program. What is it SGP detects from the driver that triggers a retry? Perhaps the trigger is not seeing -1 or seeing the old position? Is there a time delay before that check is made?

In the 1000 cycle test I ran, I was checking every 100ms for either the final position or -1. I have not seen any indication that the driver is returning a bad status at least after 100ms.

Let’s not just focus only on the driver/SGP interaction. Why is SGP not timing out? Is that a clue as to what is happening?

Using ASCOM logging is a great idea but since the issue seems to be timing dependent, I can’t set up a reliable failure to check it.

What filter wheel driver is this?

I’d enable the DriverAccess logging and carry on as normal until it fails. The logs will be big but you know the time so you can search on that. I’d also enable any logging in the filter wheel driver.

The first thing to do to solve a problem is to find out what is really happening.

I know you are trying to be helpful but reproducing the error is maddening. It, of course, happens when I am actually imaging and then only some times. When I am imaging I don’t want to take the time to deal with this, so I am trying to narrow down the situation with targeted tests. The driver does not appear to have its own logging. I’m not aware of any ASCOM system wide driver logging (other than profile and serial port). If there is such logging could you point me to it?

As I stated at the beginning there does appear to be strong evidence of SGP problems in that it does not time out on the filter change. Meanwhile, if Jared can give me some clues as to what SGP does during retry, I will carry on trying to reproduce the problem in a controlled environment (not under the night skies).

Sorry, “Filter” and “Focuser” are too close. We don’t actually retry to move the filterwheel. Here’s the process for switching between filters:

  • Check current position. If new == old return
  • set integer position (fw.Position = X)
  • start “fast” polling while waiting for move to complete (1 second polls)
  • when currentposition == requested position go to slow polling (10 seconds)
  • Set “Move Complete” to true
  • Done

Hope that helps,
Jared

Thanks, pretty simple. Why the slow poll? What happens during fast poll if the position is not -1 and not the new position - raise an error?

Any thoughts on why SGP does not timeout?

After, many cycles I finally got another failure using SGP with just my camera and the filter in a big sequence of flats.
Considering how simple the ASCOM filter API is, I’m wondering if what is happening is that SGP polls for position and the driver does not return. Jared, could you look at the code and see how it is structured? A typical way to do timeouts would be a count down polling loop. If the driver never returned that might explain why SGP is not timing out.

Please let me know what you find. The filter has a serial interface which acknowledges a filter change. If the ASCOM driver is not structured carefully, I can see where a lost acknowledgement could cause the ASCOM driver not to return. That would also explain the interment nature of the error.

Here is the log file for the most recent failure. Look for the last filter change in the log at 8:07:09

No real reason to be hitting the hardware that “hard” when it should be dormant. Not that 1 second polling is hard…

We wait until it is the new position.

Do you have any handling for overlapping IO? For instance do you have a mutex around the serial connection so that it is locked and protected from multi-threaded access and prevents multiple reads/writes happening concurrently? This would explain the odd behavior you’re seeing with something that is difficult to duplicate.

Thanks,
Jared

From a look at the log the filter wheel driver is throwing an exception saying that it can’t reach the correct position:

[6/9/2016 7:25:20 PM] [DEBUG] [Sequence Thread] Setting filter: Blue…
[6/9/2016 7:25:20 PM] [DEBUG] [Sequence Thread] Moving filter wheel, isMoving, check 1…
[6/9/2016 7:25:20 PM] [DEBUG] [Sequence Thread] Moving filter wheel, isMoving, check 1 is complete…
[6/9/2016 7:26:20 PM] [DEBUG] [Sequence Thread] ASCOM Filter Wheel: Error in SetFilterWheelPosition. : Invalid Position Data Returned (System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. —> System.Runtime.InteropServices.COMException: Invalid Position Data Returned

This looks like a driver problem, once in a blue moon setting a filter position fails. The driver is correctly throwing an exception and SGP is catching it.
There’s nothing much more that SGP can do. It doesn’t know where the filter is so closing down seems pretty reasonable. You probably don’t want data collected through an unknown filter.

It may be some issue between the various QHY components but that’s not something that Jared can solve. It needs solving by the driver developer - QHY I guess.

Chris

@Chris You are misreading what happened. The exception was caused by my unplugging the filter after it hung in SGP. That is the only way I could get SGP going again. If you look at the second log I provided you will see there is no exception.

@Jared I did not write the driver so I don’t know the answer to your question. However, I am going to write one to see if I can get past this issue. Nevertheless, there seems to be an issue in SGP that it is not timing out.

You can try running the driver through POTH to see if that enforces some better behavior.

Yes, we could probably handle this better.

Thanks,
Jared

Thanks. Rather than a sync issue, I suspect a missed acknowledgement so I will roll my own and be careful about both issues.

Some new issues on this problem. I was testing my new ASCOM driver with SGP and I unplugged the filter as SGP was changing filters. SGP crashed hard with an unhandled I/O exception. Here is the log:

This is a second issue I have seen several times where SGP dies apparently while displaying an image. The sequence is 4 flat events each with a different filter 50 times on a rotating basis. Here is the log file:

Upon further tests, this appears to be related to having duplicate files names. If I run the test with an empty directory, I don’t see the problem. But if I run the test with files present from a previous run, it fails.

I take that back. I just had the same failure with an empty directory.