Locked out, can't reboot remotely.


bobbyc
12-15-2002, 12:31 PM
Hi, I can no longer SSH into one of our towers, which had been working fine for about a month. It just hangs forever after I enter the password. Is there any way I can remotely reboot it with the starutil firmware updater, or should I head on up to the tower?
Thanks,
Bob C

georgew
12-15-2002, 02:13 PM
My experience with hung Unix machines is that you can sometimes get in another way; however, rebooting is frequently the first thing to stop working. Rebooting at minimum requires the ability to spawn a process, which is one of the things a hung Unix machine typically can't do.

If you have the ping watchdog configured, you could try blocking the pings or taking the target down... of course if you can't spawn a process, this may not help any...

You should try everything, as you never know for sure exactly what is hung. However I would suggest that remote power control is something you should plan for.

I've thought about using a public band for such telemetry tasks. Things like Citizens Band receivers have really good range, and using DTMF, you can come up with an out-of-band remote power control system with fairly unlimited expansion capability. Of course you can be snooped, so you must either address the security of your codes or keep it a lightly used secret. Around here, CB frequencies are virtually unused. But there are plenty of other signalling methods on other bands that could be used as well.

You could also take a dead-man approach: cycle power every 5 minutes unless a signal is received through the router. Say, look for transitions on an LED that you can reliably stop from blinking if you stop sending the signal. Then if there is a crash, or if you purposely stop the "ping", you can effect a reboot. The built-in watchdog is nice, but it won't replace an external system's ability to work during a hung system event.


George
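
The dead-man approach George describes can be sketched in a few lines. This is only an illustration of the timing logic, not anything StarOS ships: the class name, the injectable clock, and the idea of polling from an external controller are all assumptions, and the 300-second default simply mirrors the "every 5 minutes" figure above.

```python
import time


class DeadManTimer:
    """Hypothetical dead-man timer: asks for a power cycle when no
    heartbeat has been seen for timeout_s seconds."""

    def __init__(self, timeout_s=300.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock              # injectable so the logic can be tested
        self.last_reset = clock()

    def heartbeat(self):
        """Call whenever the 'still alive' signal (e.g. an LED transition) is seen."""
        self.last_reset = self.clock()

    def poll(self):
        """Return True when power should be cycled, then restart the countdown."""
        if self.clock() - self.last_reset >= self.timeout_s:
            self.last_reset = self.clock()
            return True
        return False
```

An external controller would call heartbeat() on each detected signal and poll() on a regular tick, switching its relay whenever poll() returns True. Because the timer lives outside the router, it keeps working even when the router cannot spawn a process.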

bobbyc
12-15-2002, 03:49 PM
Thanks for the tips. This particular tower seems to still be working; I just can't SSH into it. We have 2 other towers with problems: a backhaul is out on one and 2 sectors are out on another, because water got into the amps. A huge storm blew in yesterday. I just climbed one of the towers to replace the pigtail from amp to sector, but it was so windy I couldn't heat shrink the connectors. Turns out it is the amp though. Last night it had no signal; the association list was -102/-102 for all customers. Over the morning I've seen people's signals rise steadily as things have dried out. But a new storm is coming in.
I guess electrical tape just doesn't work as well as heat shrink.
Bob C

Trvln
12-15-2002, 08:29 PM
I've had SSH crash on me on Linux boxes before. Everything else worked fine bar SSH.

Maybe a cron job to restart SSH once a week; sounds kinda Windowsy though.

georgew
12-15-2002, 10:48 PM
Over the years, I have had dozens of examples of less than 100% stable daemons... We have to add startup code we call "wrappers", some kind of start-and-keep-running script. But if you were using the ping watchdog, temporarily stopping the ping could induce a reboot.

Yeah, water has a way of getting into anything eventually. Lots of wrapping can work either way... I use a rubber tape wrapped to apply tightening pressure on the nut, and it seems to work. I overwrap it in typical UV PVC tape. PVC tape alone seems to let water in quickly.

Funny thing, my "temporary" no-tape connections seem to stay dry in driving rain in vehicle applications. I use shrink-wrap that has hot-glue lining it, and push it as close to the collar as possible for a good waterproof connection to the coax, and the rubber seal inside the N connectors seems to work great as long as water can't get in through the coax interface. I've also used silicone with good results... you just can't get it on the contacts. Silicone will keep the nuts tight, and you can coat seams and potential leaks around the amp and antenna. Use the silicone used to make aquariums.

George
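
A start-and-keep-running wrapper of the kind George mentions might look roughly like the sketch below. It is not his script: the function name, the restart limit, and the delay are assumptions for illustration, and it only works for a daemon that stays in the foreground rather than detaching.

```python
import subprocess
import time


def keep_running(cmd, max_restarts=None, delay_s=5.0, sleep=time.sleep):
    """Run cmd and restart it every time it exits.

    max_restarts=None loops forever; otherwise the function returns the
    number of restarts performed once the limit is reached.
    """
    restarts = 0
    while True:
        subprocess.call(cmd)            # blocks until the process exits
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        restarts += 1
        sleep(delay_s)                  # brief pause to avoid a tight crash loop
```

With OpenSSH, for example, this could be called as keep_running(["/usr/sbin/sshd", "-D"]), since -D keeps sshd from detaching; the path is an assumption, and whether StarOS uses OpenSSH is not stated in this thread. Note that a wrapper like this only helps when the daemon actually exits. It does nothing for a daemon that is still running but stuck.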

lonnie
12-16-2002, 12:11 AM
The two issues are tough ones. It would seem you will have to do a reboot of the unit. Something is killing the SSH process. It is being started nicely but does not get through the last phase.

The method we use for sealing the connector is to use double-walled heatshrink when we put the end on. It also adds more physical strain relief. The final touch is to use bathtub caulking (silicone sealant) on the top surfaces. We try to make it a complete and smooth finish, and we have very little trouble with water. I would like to say we have none, but in over 3 years and 150 installs we have had fewer than 5 cables with water problems.

The other thing we do is to use LMR-400DB. It has the "grease" inside to repel water. Everything you can do to keep it out is essential.

tony
12-16-2002, 12:32 AM
The SSH problem is an interesting one. Are you sure it is not DNS related? (i.e. remove all DNS entries from your StarOS box, which should help if you do not have a DNS entry for your workstation.)

For StarOS v1.06.7 - v1.10.3 users: to reboot your system remotely without the need for SSH, simply use our starutil utility with the syntax below. (Of course, change the IP and password as needed.)

starutil 192.168.1.1 1234 -reboot

Thanks!
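
On Tony's DNS question: some SSH servers look up the connecting client's address in DNS (OpenSSH's sshd can, for instance), and a lookup that never gets an answer can stall a login. Whether StarOS behaves this way is an assumption. A rough way to see how long a reverse lookup takes is sketched below; the function name and the injectable resolver are illustrative, and the result only reflects the DNS servers of the machine the script runs on, not necessarily those configured on the StarOS box.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, seconds taken) for a reverse DNS lookup."""
    start = time.monotonic()
    try:
        name = resolver(ip)[0]
    except OSError:                     # covers socket.herror and socket.gaierror
        name = None
    return name, time.monotonic() - start
```

Calling time_reverse_lookup() with the workstation's IP address and seeing a result of None after many seconds would be consistent with the slow-login symptom, though it does not prove DNS is the cause.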

bobbyc
12-16-2002, 12:44 AM
Sweet, thanks Tony. I was thinking there'd be something like this. I updated the firmware a few days ago so this should work. I removed the DNS entries as well to fix the slow log in problem a few days ago, but this just happened last night.
Thanks, Bob C

billf
12-16-2002, 09:25 AM
I was looking on the site and could not find it

bobbyc
12-16-2002, 10:45 AM
Well we got it rebooted and I still can't SSH into it. It is working fine though, I can ping the customers behind it.
I am starting to suspect my hardware. We've had another tower acting weird, and the common things between them are the motherboard (ASUS TUSL2-C), AOpen network cards, old hard drives, cheap VGA cards, and Lucent PCMCIA-PCI adapters.
The other tower acting weird was having a problem where on startup it would not always find all 4 PCMCIA cards, and it would lock up a few days later. I tried swapping network cards, and the PCMCIA problem still existed. I moved the VGA PCI card to a different slot, and it wouldn't even boot up. So I removed the VGA card altogether yesterday, and it booted up OK. So I'll watch that for the next few days.
I'm going to leave the tower with the SSH problem alone until the next release, and then I will head up there to update the BIOS and play around with the hardware there as well.
Bob C
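
When a box still routes traffic but SSH logins hang, it can help to check whether the SSH port is answering at all. An SSH server normally sends an identification line beginning with "SSH-" as soon as a client connects, so reading that line separates "nothing is listening" from "the daemon accepts connections but stalls later". The sketch below is a simplified check with hypothetical function names; it reads only the first line the server sends.

```python
import socket


def parse_banner(data):
    """Return the SSH identification line from raw bytes, or None."""
    line = data.split(b"\n", 1)[0].strip()
    if line.startswith(b"SSH-"):
        return line.decode("ascii", "replace")
    return None


def ssh_banner(host, port=22, timeout=5.0):
    """Connect to host:port and return the server's SSH banner, or None
    if the connection fails, times out, or no SSH banner arrives."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(256)
    except OSError:                     # refused, unreachable, or timed out
        return None
    return parse_banner(data)
```

If ssh_banner("192.168.1.1") returns a banner, the daemon is at least accepting connections, which points toward the authentication or session-start phase rather than a dead process. The address here is just the example IP used earlier in the thread.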

tony
12-16-2002, 02:05 PM
billf wrote:
> I was looking on the site and could not find it

The utility is at the top of our downloads page. (v1.3)

Thanks!

tony
12-16-2002, 02:12 PM
bobbyc wrote:
> Well we got it rebooted and I still can't SSH into it. It is working fine though, I can ping the customers behind it.
> I am starting to suspect my hardware. We've had another tower acting weird, and the common things between them are the motherboard (ASUS TUSL2-C), AOpen network cards, old hard drives, cheap VGA cards, and Lucent PCMCIA-PCI adapters.
> The other tower acting weird was having a problem where on startup it would not always find all 4 PCMCIA cards, and it would lock up a few days later. I tried swapping network cards, and the PCMCIA problem still existed. I moved the VGA PCI card to a different slot, and it wouldn't even boot up. So I removed the VGA card altogether yesterday, and it booted up OK. So I'll watch that for the next few days.
> I'm going to leave the tower with the SSH problem alone until the next release, and then I will head up there to update the BIOS and play around with the hardware there as well.
> Bob C

I would suspect a possible drive problem. Make sure you download your configuration in case the drive or other hardware fails.

There will be a new release, 1.10.4 RC1, in about an hour (or less).

Thanks!

bobbyc
12-16-2002, 02:58 PM
Cool, can't wait. I should get some flash cards for hard drives, huh? ;)
Bob C

tony
12-16-2002, 03:01 PM
It was just released a few minutes ago. Enjoy.

dkii
12-16-2002, 04:11 PM
I was just about to post a nice long message about weatherproofing, till I realized it was OT for this thread. Go to http://forums.star-os.com/viewtopic.php?p=1020#1020 to read it.

bobbyc
12-16-2002, 11:13 PM
One last question: is there a way to remotely activate the newly loaded .bin firmware using starutil, or can that only be done by SSH?
Thanks, Bob C

tony
12-16-2002, 11:15 PM
bobbyc wrote:
> One last question: is there a way to remotely activate the newly loaded .bin firmware using starutil, or can that only be done by SSH?
> Thanks, Bob C

Since there are questions involved in this process, it has to be done via SSH.