Sgp keeps crashing

OK. FWIW… I am not in a position to make a full release with an installer right now, but I can provide an exe file for you. If you are inclined to continue the battle against these 2 sinister issues, we will be forever grateful:

  1. Whole app lock / hang during end of sequence options (usually after park is called)
  2. Whole app lock / hang during the code section that writes FITS files to disk.

To use (if you are interested), simply download the new exe file and replace the current EXE file in your Program Files (x86) directory. The issue where it would clobber older sequences is no longer present in this version so you can go back to using whatever method creates these issues for you (most quickly).

I spent the last 2 hours trying to reproduce these problems with 2.5.0.5. Results:

  1. Using my sequence for creating Flats with SBIG which in all previous versions has always failed quickly ----- NO FAILURES of either 1. or 2. — Awesome!
  2. Still hangs with “Parking Mount” message, but mount does not move.

Great work Ken!!
Now if perhaps you are not sure of what you changed in the code to fix this, feel free to send me several versions with different combinations of your changes for me to test.

Hmm… I didn’t change any logic, I just added more tracing to the logs. The only time I have ever seen trace statements fix a bug is when it adds some necessary delay between two commands…

Can I take a look at the logs where you have the park issue? I’m looking at this now. Usually folks are seeing the park happen, but then the UI becomes responsive afterward so this is new behavior.

The timing issue is actually a fairly good guess. My Flat sequence has always quickly triggered the crash, but only rarely has the Dark sequence done so. And others have reported that adding a 2 second delay fixed it for them (it didn’t for me).
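
To make the "trace statements add a necessary delay" idea concrete, here is a minimal sketch of how an extra pause between two commands can hide a timing bug. Everything in it is hypothetical: `FakeDevice`, the `settle_time` value and the command names are invented for illustration and are not taken from SGPro or from any real driver.

```python
class FakeDevice:
    """Hypothetical device that drops commands sent too soon after the last one."""

    def __init__(self, settle_time):
        self.settle_time = settle_time  # seconds the device stays busy after a command
        self.busy_until = 0.0
        self.accepted = []

    def send(self, command, now):
        if now < self.busy_until:
            return False  # command dropped: device is still busy
        self.accepted.append(command)
        self.busy_until = now + self.settle_time
        return True


def run_pair(delay_between):
    """Send two commands, with an optional delay between them (simulated clock)."""
    device = FakeDevice(settle_time=0.5)
    now = 0.0
    first = device.send("stop_tracking", now)
    now += delay_between  # e.g. time spent writing a trace line to the log
    second = device.send("park", now)
    return first, second


print(run_pair(0.0))  # (True, False): second command arrives too early
print(run_pair(0.6))  # (True, True): the extra delay hides the problem
```

The point of the sketch is only that the second call succeeds or fails depending on elapsed time, which is why added logging (or a configured delay) can appear to fix a bug without touching its cause.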

Here is last night’s log. Ran fine until 3:50 when the filter not moving problem occurred.
I have these set
[1/27/2016 6:23:06 PM] [DEBUG] [Sequence Thread] bShutdownPhdWhenDone: True
[1/27/2016 6:23:06 PM] [DEBUG] [Sequence Thread] bParkTelescopeWhenDone: True
but the log shows nothing happening to shut down. No parking, no stopping guider.
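
For anyone wanting to check their own log for the same thing, here is a rough sketch that parses lines in the format quoted above and lists the ones mentioning given keywords. The line format is taken from the two log lines in this post; the keywords `park` and `phd` are only guesses at what a shutdown-related message might contain, not known SGPro log text.

```python
import re
from datetime import datetime

# Matches lines shaped like:
# [1/27/2016 6:23:06 PM] [DEBUG] [Sequence Thread] message text
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[^\]]+)\] \[(?P<thread>[^\]]+)\] (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, thread, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%m/%d/%Y %I:%M:%S %p")
    return ts, m.group("level"), m.group("thread"), m.group("msg")


def find_messages(lines, keywords):
    """Return parsed entries whose message contains any keyword (case-insensitive)."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip continuation lines, stack traces, blank lines
        msg = entry[3].lower()
        if any(k.lower() in msg for k in keywords):
            hits.append(entry)
    return hits


sample = [
    "[1/27/2016 6:23:06 PM] [DEBUG] [Sequence Thread] bShutdownPhdWhenDone: True",
    "[1/27/2016 6:23:06 PM] [DEBUG] [Sequence Thread] bParkTelescopeWhenDone: True",
]

for ts, level, thread, msg in find_messages(sample, ["park", "phd"]):
    print(ts.isoformat(), thread, msg)
```

To scan a real file, pass `open(path, encoding="utf-8", errors="replace")` as `lines`. If the only hits after the failure time are the settings lines themselves, that matches what is described here: the options were set but no end of sequence action was logged.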

This sequence stopped because of a failure, as opposed to:

This is what happens every time the sequence finishes all its targets:
– Status bar says Parking Telescope
– Mount stops tracking but then stays at that position (Gemini 2, Titan mount).
– UI locks up, job using lots of cpu ~6% on quad core 3.2ghz

This happens every night whenever a sequence completes. Will send you tonight’s log.

Cool thanks. If you can get it to happen every time, I am very curious for you to produce it with 2.5.0.6.

SGPro 2.5.0.7 beta is released. I am still unable to produce any hang situation so I am still reliant on folks running SGPro into this failure and sharing the logs. Thx for all the help.

Hi

Here is log from latest crash

Harrysg_logfile_20160130155352.txt (69.9 KB)

Here are 3 hangs from last night’s imaging run using 2.5.0.6.

1) Running the 1st target fine. I was making changes to the 2nd target to cause the sequence to complete early for our hang testing. I double clicked on the count column (great new feature, always worked perfectly before this) and all the following rows in every column got changed to the one I was on. Just prior I had made several of these changes just fine. I had started with LRGB events and added 4 more LRGB events. It was in the process of setting these correctly that my double click on count caused the change to all columns.
Not sure if there is any connection here, but shortly thereafter the program got stuck downloading the current image. The UI was still completely responsive, it just would not complete the image download. So I aborted the run.

2) Started up sequence again. Ran to end. Hung running end of sequence ops.

3) Started up sequence again. Wanted to run an autofocus so I clicked Pause, allow image to finish, no Ending ops. After successfully downloading that image the program hung. Pause or Terminate had been working fine for me in this and, I think, the prior release. Although the image was fully downloaded, the hang occurred before the event stats got updated.

Here are the PHD2 logs for this period:

I should mention that to start the evening I installed a trial version of PHD2 that Andy sent me to try to improve the dim guide star problem, something about a pedestal adjustment. I am suspicious of that version in that it was very sluggish in posting new guide commands. I had it set variously from 1 to 4 seconds, but it often only updated every 10 to 30 seconds. I have never seen this behavior before. I frequently got its message that the image had not completed in 17 seconds, and that it was going to disconnect all the equipment (which I don’t think it ever did).
More details for Andy on this in PHD2 tracking on Faint Star thread.

Alright… some good news on this front. I have isolated (and fixed?) the issue where the UI could hang at the end of a sequence. This condition was only related to sequences that had, at some point, entered recovery mode. The fix, is, of course, not to enter recovery any longer… make sure your stuff works the first time… OR since I am just being an ass right now, I have also added code that should circumvent this issue nicely. So… will the UI hang any longer? I doubt it. Did I possibly push a bug to another area of the code (dealing with multiple recovery instances in a single sequence)? I don’t think so. It’s possible though so I’m investigating.

Also… this is the issue that masqueraded as one that looked like the sequence hung on mount park (this is the one that most folks encounter… like @Andy, @pscammp and @swag72 reported). There is another issue where problems with the underlying file system and the disk can cause a hang. This fix does not address that issue (Harry’s issue above). That one will require more hunting.

Thanks Ken for all your hard work in tracking this down - It’s much appreciated

Sorry to have to report that 2.5.0.12 now has the crashing problem when I take lights. Here are 5 logs in a row. It took me 6 runs to get all of my 40 lights for 6 filters finished.

The problem seems to have these elements:
— happens very quickly ( 1 to 3 images) with very short (<1 sec) exposures
— adding a 2 sec delay for the short exposures helps a lot but still fails after 20+ exposures
— longer exposures help a lot too, but failures still happen
— these logs are using my SBIG 8300M at 2x2. Prior runs with older releases failed same way with Canon 6D and Nikon D600.
— this seemed to have been fixed with 2.5.0.10. I have not run 11.

Ken, you are welcome to log on to my obs computer and operate my cameras to reproduce this. Should be a lot easier for you to track down with my rig available, since you can do the lights any time of the day, and the failure happens reliably and very quickly.
The attached logs are for 2x2 lights with my SBIG 8300M. I will be happy to retry with 10 and/or the DSLR cameras if that would help you. Problem does not seem to be related to which camera is being used.

jmacon,
Same here with my QHY8L…It always happens to me when downloading the third LIGHT exposure.

Same for me with 2.5.0.11, but 2.5.0.10 is fine so I’m back on that for the moment.

No Log to offer yet but will try to produce one soon.

Question…I have 2.5.0.10 installed as I don’t get this problem, is it possible to also have 2.5.0.12 installed at the same time so long as I don’t run them both at the same time ???

Regards
Paul

No need to apologize for this… I am actually glad. For a while now there have always been 2 lingering crashes out there. 2.5.0.10 fixed only one of them and I have been looking for a way to track down the other… I hope this is it.

I appreciate the offer, but I really need a development environment to track issues effectively.

Certainly true. But additionally I can easily provide rapid turnaround if you would like to send me any special versions with extra debug logging.

OK. Well, the good part is that your logs are remarkably consistent. That’s the first step to a good trace. They have led me to a particular area of code that is possibly problematic and I have refactored a part of it that I have not liked for a very long time… maybe it will make a difference, maybe not… either way there will be more logging in this area (from your description it seems you can reproduce it pretty easily?).

I replicated your sequence, hooked up to the SBIG simulator and took hundreds of 1 second flat images without issue… I’m working blind so any feedback is much appreciated.

Yes, reproducing this is easy. Also, this problem was fixed in 10, and now fails again in 12, and in 11 according to pscammp.

OK @jmacon . Some minor stuff has changed in this build, but it’s nothing likely to fix any reported issue. It does, however, contain some more interesting trace logging. This is not an installer. Please replace the exe in your install directory with this:

Thx for your help.

Hi Ken,
Here are some logs for this latest version 13.

The first log shows a failure saving the 3rd image.
In the 2nd log it ran ok for about 10 images and I paused the run to change the delay from 2 to 0. On resumption it failed immediately.
The next couple of runs failed quickly.
I did not touch the keyboard during these runs, so no threads besides the one doing the image download had any work to do. Also, there is no connection to mount, focuser, plate solver, weather monitor, or dome.
I have included the sgf I have been using. It might make a difference in your simulation runs.
thanks for all your hard work in tracking this down.
Jerry

@Ken, I don’t know why I didn’t think of this earlier. You have a Canon 6D and I have two of them. This fails with all 3 of my camera brands: SBIG, Canon, Nikon.
You should be able to reproduce this with your hardware.
Here are my just produced logs:

The first 2 are the SBIG, but 1 ends differently than the others have.
The next 2 are using my Canon 6D.
I have included my sgf for the Canon run.
My hardware:

  1. Windows 8.1 Pro
  2. 8 month old i5-4660 3.2 ghz quad Intel pc. 8gbyte ram. Dedicated to running SGP and associated programs. In this case, nothing much else running since I am taking flats. No filter, mount, plate solver, safety monitor, guider, dome.
  3. Canon 6D connected to pc through USB 3.0 7-port powered hub, thoroughly tested over several months against 4 or 5 other hubs I have. The SBIG connects through ethernet, the most reliable possible camera connection.

If you can’t reproduce this problem on your hardware, I would conclude that maybe it has something to do with Win 8.1 vs Win 10 (which I think you are running under).
One other possibility might be some bad interaction with other software running on the pc, such as anti-virus software.
This idea is not supported by the fact that I could never get this failure to occur with 2.5.0.10, but it does happen easily with both earlier and later versions.

@jmacon

Right now, the Canon 6D is in Jared’s possession while he addresses some mirror lock and live view issues. We can certainly coordinate this, just not immediately.

Are you sure about this? The logs seem to indicate saving is going fine, but there is a failure to display the resultant bitmap preview.

This is both interesting and disheartening as it seems like there are 2 separate issues:

  • An issue with cross-thread rendering (SBIG)
  • An issue where RAW data extraction fails in a bad way (hanging) (Canon)

Not sure about the RAW data one. That said… the one where image fails to render the bitmap preview (SBIG) is less an issue of reproduction at this point and more of a discovery process as to why. You could be right… there may be subtle ways in which the .NET runtimes function by OS. Not sure… Anyhow, I have 2 ideas. The first one is here in 2.5.0.14 (please continue with the SBIG so we can look at just one issue at a time):

Two final notes:

  • Your SBIG drivers seem a bit out of date. The current release is 4.9 build 1 (this is just FYI… as it does not seem to be a contributor here)
  • This seems like a pretty major issue and I am trying to fathom why more people don’t see it (it is certainly present beyond your reports… but we only have 3 or 4 ppl right now). It leaves me wondering if the issue is code or environment related.
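
On the cross-thread rendering point above, the usual way to avoid that class of problem is to have the download thread hand the preview over to the UI thread rather than draw it itself. The sketch below shows the general pattern in Python with a queue. It is only an analogy: SGPro is a .NET application and none of these names (`render_preview`, `download_worker`, `ui_loop`) come from its code.

```python
import queue
import threading

ui_queue = queue.Queue()  # jobs waiting to run on the UI thread
rendered = []             # (thread name, image label), appended by the UI thread only


def render_preview(image):
    # Stand-in for a UI call that must only ever run on the UI thread.
    rendered.append((threading.current_thread().name, image))


def download_worker(count):
    for i in range(count):
        image = f"frame-{i}"
        # Hand the render request over instead of calling render_preview here.
        ui_queue.put(lambda img=image: render_preview(img))
    ui_queue.put(None)  # sentinel: no more work


def ui_loop():
    # Runs on the thread that owns the UI; executes jobs in arrival order.
    while True:
        job = ui_queue.get()
        if job is None:
            break
        job()


worker = threading.Thread(target=download_worker, args=(3,), name="download")
worker.start()
ui_loop()  # the calling thread plays the role of the UI thread
worker.join()

print([image for _, image in rendered])  # ['frame-0', 'frame-1', 'frame-2']
```

Every `render_preview` call ends up on the thread that runs `ui_loop`, never on the download thread, which is the property a cross-thread rendering bug violates.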