Troubleshooting error (PHD? SGP? APCC?)

Bhwolf · August 2, 2015, 4:59pm

Hi all,

Hoping for some pointers. I’m using SGP, PHD2, and APCC w/ AP1100 mount. PHD uses a QHY5L-II guide cam.

At several points throughout the night, I get alerted via GNS that there’s a problem. When I log in, the mount is parked and PHD isn’t guiding (naturally), but I do see frames still being taken. What I can’t seem to piece together is the root cause. I suspect either PHD encountered some kind of problem or APCC did a safety park for some reason.

In looking at the log(s), these two times stand out:

[8/2/2015 12:50:57 AM] - frame 7 starts
[8/2/2015 12:55:29 AM] [DEBUG] [PHD2 Listener Thread] Failed to establish client connection to PHD2 using port 4400: No connection could be made because the target machine actively refused it 127.0.0.1:4400

What I can’t figure out is: did PHD error, or did APCC error and safety park, causing PHD to error? The APCC log isn’t human readable so it’s hard to tell what APCC thought was going on. I’m not sure why the mount parked.
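Working out which failed first comes down to ordering events across the two logs. Below is a minimal Python sketch. It assumes SGP and APCC run on the same PC, so their timestamps share one clock. It parses the SGP timestamp format shown above and the APCC debug-log format that appears later in this thread, then merges the entries into one sorted timeline. The regular expressions and the helper names `parse_line` and `merge_logs` are hypothetical, written only against the sample lines in this thread.

```python
import re
from datetime import datetime

# SGP lines in this thread look like: [8/2/2015 12:55:29 AM] message
SGP_RE = re.compile(r"^\[(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\]\s*(.*)$")
# APCC debug lines look like: 0429923 2015-08-02 00:53:28.705: Debug, ...
APCC_RE = re.compile(r"^\d+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): (.*)$")


def parse_line(line):
    """Return (timestamp, source, message), or None if the format is not recognized."""
    m = SGP_RE.match(line)
    if m:
        # 12-hour clock with AM/PM; strptime maps 12 AM to hour 0
        ts = datetime.strptime(m.group(1), "%m/%d/%Y %I:%M:%S %p")
        return ts, "SGP", m.group(2)
    m = APCC_RE.match(line)
    if m:
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        return ts, "APCC", m.group(2)
    return None


def merge_logs(*logs):
    """Merge any number of logs (iterables of lines) into one time-ordered list."""
    events = []
    for log in logs:
        for line in log:
            event = parse_line(line.strip())
            if event is not None:
                events.append(event)
    # Sort on the timestamp only; ties keep their original order
    events.sort(key=lambda event: event[0])
    return events


sgp_log = [
    "[8/2/2015 12:50:57 AM] - frame 7 starts",
    "[8/2/2015 12:55:29 AM] [DEBUG] [PHD2 Listener Thread] Failed to establish "
    "client connection to PHD2 using port 4400",
]
apcc_log = [
    "0429924 2015-08-02 00:53:28.719: Debug, Serial Thread, RX = '0#'",
    "0429925 2015-08-02 00:57:28.742: Debug, Serial Thread, TX = ':pS#'",
]
for ts, source, message in merge_logs(sgp_log, apcc_log):
    print(ts.time(), source, message)
```

Merging these sample lines places the last APCC traffic before the gap (00:53:28) ahead of the PHD2 connection failure (00:55:29), with the next APCC command (00:57:28) after it. That ordering is only a starting point. It does not by itself show which event caused the other.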

This occurred a few more times later in the night. PHD/SGP logs here.

https://onedrive.live.com/?id=2193A58C8118D0E4!107556&cid=2193A58C8118D0E4&group=0&parId=2193A58C8118D0E4!107536&authkey=!AIpfIhG9xnEuSBc&action=locate

Any tips greatly appreciated. I’m going to fwd the APCC log to AP to see if they can decipher anything. Thanks!

Brian

Ken · August 2, 2015, 5:20pm

If SGPro is still capturing images, something external to it has invoked the park (as SGPro seems blissfully unaware of it).

rgralak · August 2, 2015, 6:00pm

APCC can park if the meridian or horizon limits are enabled and reached. The mount will also park if the serial port connection is lost. To be as robust as possible, ASCOM clients should probably poll the ASCOM driver to see whether the mount is tracking and/or parked before attempting operations, including taking images.
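Polling the driver before each operation could look something like the sketch below. It runs against a hypothetical stand-in object. That object exposes three properties named after the ASCOM ITelescope interface: `AtPark`, `Tracking`, and `Slewing`. On Windows, a real client would query the COM driver object instead (for example, one created with pywin32's `win32com.client.Dispatch`), which is not shown here. `ready_to_expose` and `FakeTelescope` are illustrative names and are not part of SGPro or ASCOM.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


class TelescopeState(Protocol):
    """The subset of ASCOM ITelescope properties this check reads."""
    AtPark: bool
    Tracking: bool
    Slewing: bool


def ready_to_expose(scope: TelescopeState) -> Tuple[bool, str]:
    """Poll the mount state and report whether starting an exposure makes sense."""
    if scope.AtPark:
        return False, "mount is parked"
    if scope.Slewing:
        return False, "mount is slewing"
    if not scope.Tracking:
        return False, "mount is not tracking"
    return True, "ok"


@dataclass
class FakeTelescope:
    """Hypothetical stand-in for a connected driver, for testing only."""
    AtPark: bool = False
    Tracking: bool = True
    Slewing: bool = False


# Simulate the state after a park: parked and not tracking
print(ready_to_expose(FakeTelescope(AtPark=True, Tracking=False)))
```

`AtPark` is checked before `Tracking` so that a parked mount is reported as parked, not merely as "not tracking". That gives a more useful log message when diagnosing a night like this one.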

-Ray

Bhwolf · August 2, 2015, 8:14pm

OK, thanks for the info. I rebooted, fired up the machine, and started APCC only. One problem I noticed: APCC resets the watchdog countdown roughly every 10 seconds. I had mine set to 6 minutes, so it would count down to 5:50, reset to 6:00, and so on. But as I sat there and watched, CPU usage would occasionally spike. The countdown would eventually get down to 5:00 before refreshing, then go lower, then refresh again, and so on. The machine became more and more unresponsive.
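The watchdog behavior described here can be modeled in a few lines. This is a sketch under stated assumptions, not AP's implementation. It assumes the timer starts fresh at time 0, is reset each time APCC successfully talks to the mount, and triggers a park once a single silence reaches the full timeout. `first_expiry` is a hypothetical helper.

```python
def first_expiry(kick_times, timeout_s, end_s):
    """Return when a watchdog would fire, or None if it survives until end_s.

    kick_times: times (seconds) at which the timer was reset (assumed model).
    timeout_s: watchdog setting, e.g. 6 minutes = 360 s.
    end_s: end of the observation window.
    """
    last = 0.0  # assume the timer starts fresh at t = 0
    for t in sorted(kick_times):
        if t - last >= timeout_s:
            # The timer ran out before this reset arrived
            return last + timeout_s
        last = t
    if end_s - last >= timeout_s:
        return last + timeout_s
    return None


# Resets every 10 s keep a 6-minute watchdog alive for the whole hour...
print(first_expiry(range(10, 3601, 10), 360, 3600))  # None
# ...but one silence from t = 60 s to t = 500 s trips it at 60 + 360 = 420 s
print(first_expiry([10, 20, 30, 40, 50, 60, 500], 360, 3600))  # 420
```

Under this model, a countdown that falls from 6:00 to 5:00 before refreshing (about 60 s between resets) is a warning sign but not yet a park. The park happens only when a single gap between resets reaches the full 6 minutes.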

I’m not saying APCC is necessarily causing the problem; it could be a USB/serial issue, a background task, resource contention, etc. But the park must have been triggered by the AP watchdog, as no other limits were set. I also see large gaps in the APCC log, like the one below. Normally there is a lot of communication every second; here there are long outages.

Ray, any chance you or AP could post a quick primer on the commands in the log (or point me to one, if it’s out there)? I submitted a ticket to see if AP can provide insight into what the log is saying, but I’d be happy to dig through and translate it myself if I knew what I was looking at.

Thanks much,
Brian

0429923 2015-08-02 00:53:28.705: Debug, Serial Thread, TX = ':Rd#'
0429924 2015-08-02 00:53:28.719: Debug, Serial Thread, RX = '0#'
0429925 2015-08-02 00:57:28.742: Debug, Serial Thread, TX = ':pS#'
0429926 2015-08-02 00:57:28.760: Debug, Serial Thread, RX = 'East#'
0429927 2015-08-02 00:57:28.761: Debug, Serial Thread, TX = ':GH#'
0429928 2015-08-02 00:57:28.789: Debug, Serial Thread, RX = '00:05:58.0#'
0429929 2015-08-02 01:01:28.796: Debug, Serial Thread, TX = '#:GR#'
0429930 2015-08-02 01:01:28.823: Debug, Serial Thread, RX = '20:15:44.8#'
0429931 2015-08-02 01:02:28.804: Debug, Serial Thread, TX = '#:GD#'
0429932 2015-08-02 01:02:28.830: Debug, Serial Thread, RX = '+3839:03#'
0429933 2015-08-02 01:02:28.830: Debug, Serial Thread, TX = ':GOS#'
0429934 2015-08-02 01:02:28.860: Debug, Serial Thread, RX = 'P99000220P000#'
0429935 2015-08-02 01:08:28.900: Debug, Serial Thread, TX = ':GM#'
0429936 2015-08-02 01:08:28.927: Debug, Serial Thread, RX = '02:24:00.0#'
0429937 2015-08-02 01:08:28.927: Debug, Serial Thread, TX = ':HRG#'
0429938 2015-08-02 01:08:28.954: Debug, Serial Thread, RX = '+23340:34#'
0429939 2015-08-02 01:13:44.861: Debug, Serial Thread, TX = ':HDG#'
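Spotting these silences by eye gets tedious over a whole night's log. Below is a minimal Python sketch that scans APCC debug lines in the format above. It reports every gap between consecutive entries that is at least a given threshold long. The regular expression is written only against the sample lines in this thread, and `find_gaps` is a hypothetical helper, not an APCC tool.

```python
import re
from datetime import datetime, timedelta

# Matches the line number and timestamp prefix, e.g. "0429923 2015-08-02 00:53:28.705: "
LINE_RE = re.compile(r"^\d+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): ")


def find_gaps(lines, threshold=timedelta(seconds=30)):
    """Return (start, end, duration) for each silence of at least `threshold`."""
    gaps = []
    prev = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not timestamped log entries
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if prev is not None and ts - prev >= threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps


# Usage, with a hypothetical file name:
# with open("apcc_debug.log", encoding="utf-8", errors="replace") as f:
#     for start, end, duration in find_gaps(f, timedelta(minutes=5)):
#         print(start.time(), "->", end.time(), duration)
```

With a 30-second threshold, this reports five silences in the excerpt above. With the threshold at 6 minutes, only one remains: 01:02:28.860 to 01:08:28.900, just over 6:00 and right at the watchdog setting mentioned earlier.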

rgralak · August 2, 2015, 9:01pm

Brian,

It looks like something is probably locking up the computer for a long time. The commands to the mount are not the issue: right before the gap there was a TX and an RX, so nothing was outstanding to the mount during the 5-minute gap. Everything in APCC is multi-threaded, so even if there were a user interface lockup, background communication with the mount would have continued. Something was stopping it.

Were the exposures 5 minutes long by any chance? If so, maybe some sort of USB lockup happened.

-Ray

Bhwolf · August 2, 2015, 9:40pm

Thanks Ray –

These were 20-minute exposures – I had the guide cam, focuser, and mount sharing the same powered USB hub, while the main cam was stand-alone on its own USB connection.

This happened some time ago, and at the time George/AP just said there was a drop in communication for some reason but that it was an isolated incident. This is now 2 nights with several occurrences, so something seems to be wrong with my machine/hardware somewhere.

mads0100 · August 3, 2015, 2:14am

Bhwolf,

I run my main imaging camera on a separate USB cord due to similar issues in the past. It fixed my ‘quirks’ when I did it, and I have done it ever since.

Chris

Bhwolf · August 6, 2015, 11:33pm

Just to follow up on this… Not really SGP, but…

I’m waiting to hear back from AP, but I’m not sure there is much to be learned from the logs. I was imaging for 4 nights. The first two I used APCC and experienced these strange lockups several times. By lockups, I mean gaps where communication seems to disappear and the mount hits the safety-park watchdog timeout. PHD also suffered a blackout around these times.

The second two nights I didn’t use APCC, and the system ran fine all night long. Now, I’m not saying APCC is causing it, but there is a correlation. I haven’t used APCC in a while, but a few months ago I had some similar issues, and I’m only now seeing the correlation with the nights when I had unexplained timeouts.

So, I’ll have to come up with some system debugging to figure out what’s going on. Just wanted to post an update…

rgralak · August 7, 2015, 2:04am

I doubt it is APCC, as that would have shown up in the (literally) multi-year beta testing. However, APCC sends a lot more traffic across the serial port, so it will stress your USB/serial converter and hub to a greater extent. I suggest you try moving your USB/serial converter off that hub and onto a dedicated port.

-Ray

Bhwolf · August 7, 2015, 2:27am

That’s great to know! It could very well be something along these lines. It’s going to be raining significantly for a few days, but perhaps I can rig up an indoor test where I have the mount tracking w/ APCC and PHD taking pictures/etc to try to stress the port a bit.

rgralak · August 7, 2015, 3:07am

Or… I strongly suggest you try what I said: move your USB/serial converter to a separate port and see if the lockups continue. BTW, the best way to connect to the mount is to plug the USB/serial converter directly into a USB port on your PC. Then hang an M-F DB9 serial cable off the USB/serial converter so you are not daisy-chaining USB cables. Serial cables can be much longer than USB cables.

Also, if possible don’t use the USB port directly above or below the other one you are using. USB ports are usually in pairs so use a port away from the one you use for camera downloads and autoguiding subframes (presumably connected to your hub).

-Ray