Center object hang due to camera timeout

ManuelJ · December 7, 2015, 7:40am

After a center object, the camera driver threw this exception:

[06/12/2015 4:19:37] [DEBUG] [Camera Thread] ASCOM Camera : CheckDotNetExceptions ASCOM.Atik.Camera StartExposure System.ApplicationException: StartExposure - Camera not idle (See Inner Exception for details) (System.ApplicationException: StartExposure - Camera not idle)
at ASCOM.DriverAccess.MemberFactory.CheckDotNetExceptions(String memberName, Exception e) in c:\ASCOM Build\Export\ASCOM.DriverAccess\MemberFactory.cs:line 630
at ASCOM.DriverAccess.MemberFactory.MethodTargetInvocationExceptionHandler(String memberName, Exception e) in c:\ASCOM Build\Export\ASCOM.DriverAccess\MemberFactory.cs:line 678
at ASCOM.DriverAccess.MemberFactory.CallMember(Int32 memberCode, String memberName, Type[] parameterTypes, Object[] parms) in c:\ASCOM Build\Export\ASCOM.DriverAccess\MemberFactory.cs:line 487
at ASCOM.DriverAccess.Camera.StartExposure(Double Duration, Boolean Light) in c:\ASCOM Build\Export\ASCOM.DriverAccess\Camera.cs:line 594
at ew.ay(cu A_0, he& A_1)

The problem is that after the exception was thrown, the center object dialog stayed open for hours and the sequence was neither retried nor aborted. So the mount kept tracking on and on, causing a dangerous situation.

sg_logfile_20151205181247.zip (140.7 KB)

Ken · December 7, 2015, 10:38pm

Ya… this issue has been around for some time for Atik cameras. I am honestly not sure what to do about it.

How SGPro reacts to it is something we can control, though. I will try to figure out a way to replicate the issue and make sure appropriate actions are taken.

ManuelJ · December 7, 2015, 11:34pm

Hi Ken,

Maybe if an exception like this arises, instead of waiting for the camera to complete the capture (something that never happens), just propagate the exception and make it fail, so that recovery mode can go ahead.

(I’m not sure if this makes sense)
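A minimal sketch of that idea, in Python for brevity. The `camera` object, its `start_exposure` method and its `image_ready` attribute are hypothetical stand-ins, not SGPro's or ASCOM's real interface, and the grace period is an arbitrary assumption:

```python
import time


class CaptureFailed(Exception):
    """Raised when an exposure cannot start or never completes."""


def capture(camera, duration_s, grace_s=30.0, poll_s=0.5,
            sleep=time.sleep, clock=time.monotonic):
    """Start an exposure and wait for the image, failing fast on errors.

    `camera` is a hypothetical object with start_exposure(duration) and an
    image_ready attribute.
    """
    try:
        camera.start_exposure(duration_s)
    except Exception as exc:
        # Propagate instead of waiting for an image that will never arrive.
        raise CaptureFailed(f"exposure did not start: {exc}") from exc

    # Even after a successful start, never wait forever.
    deadline = clock() + duration_s + grace_s
    while not camera.image_ready:
        if clock() > deadline:
            raise CaptureFailed("timed out waiting for the image")
        sleep(poll_s)
    return True
```

The caller (here, the centering routine) would catch `CaptureFailed` and hand control to recovery instead of leaving the dialog open.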

Regards,
Manuel.

Chris · December 8, 2015, 9:24am

It would really help if people said what equipment they have.

There’s nothing that can be done because no one ever provides any useful information. Whenever I hear about this sort of thing I ask for driver log files and I NEVER get any.

All I can see is that the camera driver reports that it can’t start because the camera is in the wrong state. It reads the state from the camera.

Driver logs would help. Also, what has to be done to recover so the camera works again?

Chris

ManuelJ · December 8, 2015, 9:53am

Hi Chris,

Do you need something else from me? Is there any driver log file?

This error is not exactly uncommon on my side.

Regards,
Manuel.

Chris · December 8, 2015, 11:17am

[quote=“ManuelJ, post:5, topic:2678”]
Do you need something else from me? Is there any driver log file?
[/quote]
Yes, driver log files.
Check the Trace On checkbox in the Atik driver setup dialog window.
Run until you get the error.
Zip up the Atik driver log and the SGP log and post them.

The Atik log is in My Documents\ASCOM\Logs, named something like ASCOM.Atik.nnnnnn.nnnnnnn.txt

It would also help to know what you do to get things working after this problem.

Chris

ManuelJ · December 8, 2015, 11:34am

All right Chris, I’ll do as you say. I’ve just updated to the latest ASCOM and Atik drivers. I’m also considering removing the USB hub. One thing at a time, to see what the source of the problem is.

I’m a software engineer, and I fully understand your point!

Regards,
Manuel.

Chris · December 9, 2015, 10:51am

In case it helps, I was running last night with two Atik cameras connected through a USB extension and a 13-port hub. This is actually four 4-port hubs, and at least one of the cameras is connecting through two hubs.

No dropouts of the cameras or any of the other USB stuff at all over several hours.

Windows 10 64-bit, on a fairly powerful laptop. I can surf the net and do emails at the same time as imaging and guiding.

I’m using my latest drivers of course; they may be more up to date than what is released.

Chris

ManuelJ · December 10, 2015, 7:28am

Hi Chris,

My computer has an Atom D525 CPU, so it’s quite slow by today’s standards. So maybe that’s a clue.

Weather is horrible, so it will take some time to extract the logs.

Regards,
Manuel.

ManuelJ · December 29, 2015, 8:48pm

Hi,

Event happened again @ 29/12/2015 21:28:24. Same error, and the center object did not exit. Please see attached logs.

Regards,
Manuel.

ASCOM.Atik.1947.329280.txt.zip (80.2 KB)
sg_logfile_20151229194516.zip (67.4 KB)

ManuelJ · January 13, 2016, 8:37am

Hi,

Any ideas?

Regards,
Manuel.

Chris · January 13, 2016, 9:15am

Looking at the camera log, what I see is that a 60 second exposure is started at 21:27:49:

21:27:49.394 StartExposure Duration 60, Light True
21:27:49.394 CameraState get cameraIdle
21:27:49.395 StartExposure: bin 2,2: subframe 0,0:2748,2198: high priority True, amplifier switched-True, StartExposure, started

Then, 35 seconds later, another exposure is started:

21:28:24.462 StartExposure Duration 2, Light True
21:28:24.463 CameraState get cameraExposing
21:28:24.463 StartExposure camera state not idle

This fails because the previous exposure is still in progress, but then shortly afterwards there is an AbortExposure call:

21:28:24.530 CanAbortExposure always true
21:28:24.533 AbortExposure

It looks as if the AbortExposure and StartExposure calls are in the wrong order.
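This kind of check can be automated. The sketch below, in Python, scans trace lines in the format quoted above and flags any StartExposure issued while an earlier exposure should still be running. The function names are made up for illustration, and only the three message types that appear in the excerpt are handled:

```python
from datetime import datetime, timedelta

# Lines copied from the driver log excerpt quoted above.
LOG = """\
21:27:49.394 StartExposure Duration 60, Light True
21:27:49.394 CameraState get cameraIdle
21:28:24.462 StartExposure Duration 2, Light True
21:28:24.463 CameraState get cameraExposing
21:28:24.463 StartExposure camera state not idle
21:28:24.533 AbortExposure
"""


def parse(log_text):
    """Yield (timestamp, message) pairs from 'HH:MM:SS.fff message' lines."""
    for line in log_text.splitlines():
        if not line.strip():
            continue
        stamp, _, message = line.partition(" ")
        yield datetime.strptime(stamp, "%H:%M:%S.%f"), message


def find_overlaps(log_text):
    """Return (time, seconds since the previous start) for every
    StartExposure issued before the previous exposure could have ended."""
    overlaps = []
    busy_until = None
    last_start = None
    for when, message in parse(log_text):
        if message.startswith("StartExposure Duration"):
            duration = float(message.split()[2].rstrip(","))
            if busy_until is not None and when < busy_until:
                overlaps.append((when.time(), (when - last_start).total_seconds()))
                continue  # the driver rejected this start, state is unchanged
            last_start = when
            busy_until = when + timedelta(seconds=duration)
        elif message == "AbortExposure":
            busy_until = None  # an abort returns the camera to idle
    return overlaps


for when, gap in find_overlaps(LOG):
    print(f"{when}: StartExposure {gap:.1f} s into a running exposure")
```

Run against the excerpt, it reports the single rejected start at 21:28:24, which arrived before the abort that should have preceded it.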

There are also telescope communication problems shortly before this.

Chris

Ken · January 14, 2016, 4:41am

Thanks Chris.

@ManuelJ Chris is 100% right… I just can’t figure out why. What happened is bad timing and a condition we did not fully consider (as is evidenced by this thread). What happens is that your sequence falls apart… clouds or, in this case, what looks like a complete failure of the mount… this also probably caused PHD2 to freak out and stop moving the mount, which in turn led to a “star lost” event from PHD2. This event aborts the exposure, and the sequence goes into recovery (if you have it enabled) or aborts (if not).

From the SGPro logs:

[29/12/2015 21:27:49] [DEBUG] [Camera Thread] SGM_CAMERA_CAPTURE message received…
[29/12/2015 21:27:49] [DEBUG] [Sequence Thread] ASCOM Camera: Exposure aborted…

From the Camera logs:

21:27:49.344 CanAbortExposure always true
21:27:49.348 AbortExposure
21:27:49.394 StartExposure Duration 60, Light True
21:27:49.395 StartExposure: bin 2,2: subframe 0,0:2748,2198: high priority True, amplifier switched-True, StartExposure, started

Like @Chris pointed out, this is called in the wrong order, but the abort on top is called because of a lost star event. What I can’t figure out is why the camera received them that way… The SGPro logs show it in the correct order. But… instead of pondering over this for long periods of time (regardless of this log, it can still clearly happen), I think it will be OK to place an abort event in front of any new capture start. It will not be called if the camera reports it is idle, but this will help sync camera status when it can be modified from multiple threads. Not the most elegant fix, but you probably won’t see this issue again.
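A rough sketch of that guard, in Python. This is not SGPro's code: the camera object and its `state`, `can_abort`, `start_exposure` and `abort_exposure` members are hypothetical, and the lock is an extra assumption of the sketch, added so that the state check and the start cannot be interleaved with an abort from another thread:

```python
import threading


class FakeCamera:
    """Stand-in that mimics a driver rejecting StartExposure while busy."""

    can_abort = True

    def __init__(self):
        self.state = "idle"
        self.calls = []

    def start_exposure(self, duration_s):
        if self.state != "idle":
            raise RuntimeError("StartExposure - Camera not idle")
        self.state = "exposing"
        self.calls.append("start")

    def abort_exposure(self):
        self.state = "idle"
        self.calls.append("abort")


class GuardedCamera:
    """Only ever starts an exposure from the idle state."""

    def __init__(self, camera):
        self._camera = camera
        self._lock = threading.Lock()  # serialises start and abort

    def abort(self):
        with self._lock:
            self._camera.abort_exposure()

    def start(self, duration_s):
        with self._lock:
            # Abort is skipped entirely when the camera is already idle.
            if self._camera.state != "idle":
                if not self._camera.can_abort:
                    raise RuntimeError("camera busy and abort is not supported")
                self._camera.abort_exposure()
            self._camera.start_exposure(duration_s)


camera = FakeCamera()
guarded = GuardedCamera(camera)
guarded.start(60)    # camera idle: no abort issued
guarded.start(2)     # camera still exposing: abort first, then start
print(camera.calls)  # ['start', 'abort', 'start']
```

In the normal case the extra cost is one state read before each start.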

You will need to use the 2.5 beta to see this. It will be in 2.5.0.2.

Chris · January 14, 2016, 11:08am

I’m uneasy about this; it’s what I call a band-aid fix, adding something to hide an underlying problem.

While the camera drivers should handle this, it’s putting an additional load on them, adding what could be an edge case. People don’t expect AbortExposure to be called frequently.

These thread synchronisation issues are really difficult. In this case I guess that the various calls are marshalled onto a single thread in the driver - or the interface between SGP and the driver.

Would it help if the calls to a device were to be marshalled onto a single thread in SGP? That way you would have more control and ability to diagnose problems.
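As an illustration only, here is one way to marshal every call for a device onto a single thread, sketched in Python with a one-worker executor. `SerializedDevice` and the method names passed to it are hypothetical; nothing here reflects how SGP is actually built:

```python
from concurrent.futures import ThreadPoolExecutor


class SerializedDevice:
    """Funnels every call to one device through a single worker thread.

    With max_workers=1 the executor runs submitted calls one at a time,
    in the order they were submitted, whichever thread submitted them.
    """

    def __init__(self, device):
        self._device = device
        self._executor = ThreadPoolExecutor(max_workers=1)

    def call(self, method_name, *args):
        """Queue device.method_name(*args); returns a Future for the result."""
        return self._executor.submit(getattr(self._device, method_name), *args)

    def close(self):
        self._executor.shutdown(wait=True)
```

Because there is one queue per device, the order in which the application issued AbortExposure and StartExposure is the order the driver sees, and that queue is a single place to log from. The trade-off is that an urgent abort waits behind whatever call is already running.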

Chris

Ken · January 14, 2016, 11:03pm

Maybe… but it’s a good band-aid I think.

There really is no extra load on the camera so only in very infrequent conditions would this ever happen. 99.9% of the time abort will be called no more than it is currently. If the camera is in the expected state (which it obviously is most of the time), we will not call abort. If it’s not, we will abort the exposure before the call to start a new one (or fail if abort is not supported). There is no extra load here and abort is not called frequently… just checked to see if it needs to be called more frequently. There is likely a better way to architect this, but right now we are saying that a camera’s ability to start a new exposure is predicated on the fact that only one exposure can be in process at any given time. If something modifies the expected state, it is truly from an unexpected (kind of) event.

This is the current architecture of SGPro… there is a camera thread, a focuser thread, etc. They all subscribe to message queues for pub/sub. These threads act in a synchronous manner and events are handled one at a time when the device is ready to support it. That said, sometimes we are forced to circumvent the device threads because we need to execute an action immediately (like abort or stop or whatever). These events are issued outside of the device threads and this is where we run into marshaling issues.

One other point is that we are looking at the possibility of actively moving any device activity out of persistent threads (entirely… meaning no direct calls to devices from the sequence thread, no direct calls from the device threads). All device calls will be discrete and compartmentalized to protect other areas of the sequence. I’m not exactly sure how it will work yet, but we’re looking at methods that would allow devices to behave as badly as they want… enter infinite loops, never return control, etc and still maintain control of the overall sequence.
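One possible shape for such a compartmentalized call, sketched in Python under loud assumptions: `call_with_timeout` is an invented helper, and a throwaway thread is only the simplest form of isolation (a separate process would be needed to actually kill a misbehaving driver):

```python
import threading


class DeviceCallTimeout(Exception):
    """Raised when a device call does not return in time."""


def call_with_timeout(func, *args, timeout_s=10.0):
    """Run func(*args) on a throwaway daemon thread; give up after timeout_s."""
    outcome = {}

    def runner():
        try:
            outcome["value"] = func(*args)
        except Exception as exc:
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # The call is abandoned, not killed: Python cannot force-stop a thread.
        name = getattr(func, "__name__", repr(func))
        raise DeviceCallTimeout(f"{name} did not return within {timeout_s} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

The sequence keeps control either way: it gets a result, the device's own exception, or a `DeviceCallTimeout` it can treat as a failure and recover from.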

Anyhow, with respect to this particular change, it’s not really much change at all…

ManuelJ · March 17, 2016, 7:08am

Hi,

This has happened again several times. One of them ended up in a pier hit. Once it happens, SGP won’t exit the center screen.

Around 1:14:39, the same thing always happens: the guider goes bad, and the recovery starts with a center.

sg_logfile_20160315224759.zip (101.2 KB)

Regards,
Manuel.

Ken · March 18, 2016, 12:19am

OK, Thanks for the logs… we’ll take a look.

You should never depend on only one application to protect your gear… often the ASCOM driver will have its own protection built in.

Again, I am not sure if we can do much to fix the root cause of the issue (camera timeout), but my hope is that we can find the hang and at least force a graceful sequence failure.

Ken · March 18, 2016, 4:11am

@ManuelJ

I do believe I have the “hang” part of this issue resolved.

ManuelJ · March 18, 2016, 4:36pm

Many thanks Ken. I’m developing my own piece of software to prevent the mount from tracking past the meridian. But this bug was occurring more often than I would like, and was ruining my imaging sessions.
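The core check of such a watchdog can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `mount` object and its `tracking`, `sidereal_time` and `right_ascension` attributes are hypothetical, the 0.25 hour limit is arbitrary, and pier side is ignored, so the check only makes sense for a mount that has not flipped:

```python
def hour_angle_hours(lst_hours, ra_hours):
    """Hour angle in hours, wrapped to the range [-12, 12).

    Negative values are east of the meridian, positive values are past it.
    """
    return (lst_hours - ra_hours + 12.0) % 24.0 - 12.0


def should_stop_tracking(lst_hours, ra_hours, limit_hours=0.25):
    """True once the target is more than limit_hours past the meridian."""
    return hour_angle_hours(lst_hours, ra_hours) > limit_hours


def enforce_limit(mount, limit_hours=0.25):
    """Turn tracking off on a hypothetical mount object if past the limit.

    Returns True when tracking was switched off by this call.
    """
    if mount.tracking and should_stop_tracking(
            mount.sidereal_time, mount.right_ascension, limit_hours):
        mount.tracking = False
        return True
    return False
```

Calling `enforce_limit` on a timer from a separate process means the safety check keeps running even if the imaging application hangs.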

Regards,
Manuel.

Andy · March 18, 2016, 4:59pm

Here’s a simple app that will do that for you: Mount Watcher

Andy