Beta 2.10.1b5-4746 is ready


lonnie
02-12-2006, 09:02 PM
BETA downloads are now available.

Areas of focus while testing this release are:
*) new Quagga 0.99.3

Please post your results in this thread. Given the test results from the areas listed, either a new beta will be released, or this version will be renamed to 2.10.1 and made public on our website downloads page.

Router WRAP Edition
http://www.star-os.com/downloads/oem-vnc/strrw-2.10.1b5-4746.bin


Router Desktop Edition
http://www.star-os.com/downloads/oem-vnc/strr-2.10.1b5-4746.bin


Server Desktop Edition
http://www.star-os.com/downloads/oem-vnc/strs-2.10.1b5-4746.bin


Router Routerboard (x86) Edition
http://www.star-os.com/downloads/oem-vnc/strrb-2.10.1b5-4746.bin


Router Soekris Edition
http://www.star-os.com/downloads/oem-vnc/strrs-2.10.1b5-4746.bin
(http://www.star-os.com/downloads/oem-vnc/strrs-2.10.1b2-4714.iso)

billr
02-13-2006, 01:22 AM
What does the Quagga watchdog do, please?

Excuse my ignorance!

oscarBravo
02-13-2006, 07:25 AM
Excellent!!

Expect reports back in a couple of days.

tony
02-13-2006, 07:56 AM
It is something that keeps an eye on the routing daemons and ensures they do not quit without notice.

billr
02-15-2006, 01:18 AM
Thanks !

(PS: Quagga - isn't that one of those recently discovered, long-extinct dinosaur things?!)

butchkemper
02-15-2006, 06:03 AM
No, it is an extinct relative of the Zebra.

Butch

lonnie
02-15-2006, 06:45 PM
I am going to assume that Quagga 0.99.3 fixed the problems that oscarBravo was having. Another problem is closed.

oscarBravo
02-16-2006, 08:59 AM
Whoah, not so fast there Lonnie. It takes time to test beta software in a production environment.

I'm not 100% sure that the OSPF problems are gone, but there's a much bigger problem with this release: I've had to downgrade the units I upgraded.

Our backbone is built on 10.x addresses, with VDS tunnels used to route public IPs to the customers. The AP router has a tunnel endpoint bridged to an AP radio, and the CPE does NAT from the public address to 192.168.1.0/24.

When I upgraded the AP routers, the clients stopped passing traffic. They re-associated, but I couldn't ping them, and they couldn't get to the Internet. I added a public IP to the AP radio to check whether it was a VDS problem, but I still couldn't ping the clients. When I downgraded, traffic started flowing again without problems.

I don't know if this is related to some of the changes in the previous beta build. I'm going to re-read that thread now.
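To make the addressing plan oscarBravo describes a little more concrete, here is a minimal Python sketch that classifies an address by the role it plays in that bridge-tunnel design. Only the 10.x backbone and the 192.168.1.0/24 customer LAN come from the post; treating all of 10.0.0.0/8 as backbone and everything else as a tunnelled public address is an assumption, and the sample addresses (including the documentation address 203.0.113.10 standing in for a customer's public IP) are hypothetical.

```python
import ipaddress

# Assumption: the post only says "10.x", so all of 10.0.0.0/8 is treated as backbone.
BACKBONE = ipaddress.ip_network("10.0.0.0/8")
# Customer LAN behind the CPE's NAT, as described in the post.
CUSTOMER_LAN = ipaddress.ip_network("192.168.1.0/24")


def address_role(addr: str) -> str:
    """Classify an IPv4 address by its role in the bridge-tunnel design."""
    ip = ipaddress.ip_address(addr)
    if ip in BACKBONE:
        # Router-to-router links and VDS tunnel endpoints live here.
        return "backbone"
    if ip in CUSTOMER_LAN:
        # Only visible behind the CPE, which NATs it to the customer's public IP.
        return "customer-lan"
    # Assumption: anything else is a customer public IP that reaches the AP
    # over a VDS tunnel and is bridged onto the AP radio.
    return "public-via-tunnel"


if __name__ == "__main__":
    # Hypothetical sample addresses, one per role.
    for a in ("10.1.2.1", "192.168.1.20", "203.0.113.10"):
        print(a, address_role(a))
```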

lonnie
02-16-2006, 09:56 AM
The only change in this beta build was the switch to the new Quagga.

oscarBravo
02-16-2006, 10:28 AM
Didn't it include any of the changes made in b2? The symptoms correspond to those in a post (http://forums.staros.com/showpost.php?p=28347&postcount=36) by Skaught.

We have a number of routers with the bridge-tunnel configuration. Those I upgraded stopped working, and started working again when I downgraded. Those I didn't upgrade just kept working.

lonnie
02-16-2006, 10:56 AM
Of course the latest beta includes b2. The problem Skaught had was related to Tranzeo not liking Atheros. It is well documented and much discussed.

Scott did two things. He upgraded to the beta AND he changed a 2511 card to a CM9 card for the AP. Did you do those two things?

Ultanium
02-16-2006, 11:23 AM
Not trying to hijack here, but are you sure this isn't the ongoing ARP issue we all have? When the clients associate, do you see 0.0 for their IP in the wireless association list? If you keep saving and applying changes, something eventually starts working (bridging?), the IP address shows up, and then you can ping the client.

Tom

PS: This is ONLY with Atheros cards.

Skaught
02-16-2006, 12:04 PM
My only workable solution was to abandon Atheros on my existing sectors. In fact, I am now building a whole new network based on 5 GHz. My 2.4 GHz clients will remain, but they will have the option to upgrade if they pay the difference.

In the meantime I am using Rotennas and CB3s, as they are Prism, and Tranzeo has had shortages of the 200. The 200 is just a CB3 anyway; the circuit boards are even the same revision.

oscarBravo
02-16-2006, 12:57 PM
lonnie wrote:
Of course the latest beta includes b2. The problem Skaught had was related to Tranzeo not liking Atheros. It is well documented and much discussed.

Scott did two things. He upgraded to the beta AND he changed a 2511 card to a CM9 card for the AP. Did you do those two things?

These APs already had CM9s in them.

The CPEs in question are Tranzeo CPQs. They don't have a problem with Atheros APs in 2.10.0 - what changed in the beta to cause this problem?

lonnie
02-16-2006, 01:34 PM
We added an option for long/short preamble. It is documented in the beta release announcement.

aldo
02-17-2006, 04:33 AM
lonnie wrote:
I am going to assume that Quagga 0.99.3 fixed the problems that oscarBravo was having. Another problem is closed.

This does not remedy the issue of Quagga not working correctly on some embedded flash media.

I cannot say whether this fixes the problem on a clean flash; only time will tell.
I would like to see a bit more waiting time on this, as Quagga is a bit of a beast.

I feel a bit perplexed by this discussion about an untested release. It seems that we have all paid for a product that supports x, y and z, yet the developers seem to think there is no obligation to ensure that these work effectively. (Sorry for the hard line; please understand this is not a personal attack but an organisational/development critique of the product.)
Where is the logic in this? We have deployed StarOS for testing and paid for our licences. At present I am not satisfied that it is capable of doing the job. Additionally, after reading the forums I feel even less satisfied that it will ever be able to do the job, and I am surprised that so many people use it.

I can personally attest to two issues discovered during a month of testing:

(1) The Quagga issue does not seem to be directly related to Quagga itself, as we have BSD boxes that run 0.99.3, as well as previous versions, with no trouble.
(2) There is a stability issue with dual wireless cards, and there seems to be no willingness to address that either.

Personally, we will wait for a short time before reviewing the status of StarOS again, to see whether it can deliver the product it advertises itself as. For the moment we will leave our tests in place and see how they perform over a longer period. In the interim, our development team will go back to our FreeBSD access points and continue working on them in our production environment.

Disappointingly, as a lot of people have pointed out, this seems redundant, as the StarOS team seems to want the support of a community project but will not allow the community to contribute to the codebase.

regards

alan

bradg
02-17-2006, 09:29 AM
OK, it's either a really odd coincidence, or there may actually be some issue with Atheros on the latest beta.

Four times since Tuesday night (when I pushed the beta out to about 8 units on one wireless segment), I've had issues with two PtP links going wonky: one 9.4 miles, the other 16 miles. The 16-mile link has given trouble once, the 9-mile link three times now, but both exhibit the same symptoms. Neither has had any trouble in the past.

The symptoms are that the station end shows either no association at all or very low quality, while the AP shows that it's basically dumb and happy, with the station associated at the usual SNR and activity resetting the idle counter every few seconds. In the logs on the station with low quality or no association, you see "kernel: wpci0: NOTICE - SoftBmiss, no beacons from AP detected." every few seconds. The trouble begins after I see several (usually consecutive) "wpci0: reset the hardware to load new rfgain values" messages, and in the latest case, when association dropped completely at the station, "kernel: wpci0: assoc failed, code = 0x4". There is no indication of any badness at the AP that I can see.

Both links have good quality: the 9-mile link runs 30/30 or better, the 16-mile link 26/26 or better. Both links use Pac Wireless 29dB solid dishes, CM9s, and identically configured WRAP boards. Power on the WRAPs isn't an issue (it never has been for me): I use industrial-temperature-range DC-DC converters for the 48V to 15V supplies, and the boards are not rebooting.

On the other network segments that haven't been upgraded to the latest beta, I've had no issues. They use similar to identical hardware for the most part, and have been exposed to the same weather and temperature. The weather has been good, but a bit cold the last 24h (much warmer Tuesday and Wednesday).

Each time it's happened, all that's required to "fix" it is an activate changes - not a reboot or power cycle. As soon as that's done, the station reassociates, full quality returns, and everything is good again - for a variable period of time from all day, to maybe a few minutes.

As I've been writing this, I've been fighting with the problematic link, and finally downgraded to 2.10.0 at the AP end to see if that helps any.

As a result, unfortunately I can't say if OSPF is any better or not since all affected links won't stay stable to test right now.


Brad
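The log pattern Brad describes (rfgain resets followed by repeated SoftBmiss notices, sometimes ending in an association failure) can be picked out of a saved station log with a short scan. A minimal Python sketch: the message strings are copied from the post, while the "SoftBmiss after an rfgain reset" ordering rule and the threshold of three notices are assumptions, not anything StarOS defines.

```python
import re

# Message fragments copied from the station logs quoted in the post.
RFGAIN_RESET = "reset the hardware to load new rfgain values"
SOFT_BMISS = "SoftBmiss, no beacons from AP detected"
ASSOC_FAILED = re.compile(r"assoc failed, code = (0x[0-9A-Fa-f]+)")


def scan_station_log(lines, min_bmiss=3):
    """Summarise the beacon-loss pattern in a station's log lines.

    Returns (pattern_seen, assoc_fail_codes). pattern_seen is True when at
    least `min_bmiss` SoftBmiss notices follow an rfgain reset; the default
    threshold of 3 is an arbitrary assumption.
    """
    seen_reset = False
    bmiss_after_reset = 0
    fail_codes = []
    for line in lines:
        if RFGAIN_RESET in line:
            seen_reset = True
        elif SOFT_BMISS in line and seen_reset:
            bmiss_after_reset += 1
        # Association failures are collected regardless of ordering.
        match = ASSOC_FAILED.search(line)
        if match:
            fail_codes.append(match.group(1))
    return bmiss_after_reset >= min_bmiss, fail_codes
```

Matching on substrings rather than full lines keeps the scan tolerant of syslog timestamps and host prefixes, which the post does not show.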

hopp
02-23-2006, 01:40 PM
Bradg, what version did you upgrade from?

We have this problem on 2 strong atheros ptp links with build 4693. A quick activate changes fixed it for us as well. I think there must be some sort of driver anomaly (multi-threaded starvation? race-condition?) which causes our links to fail.

Marlow
03-03-2006, 05:10 AM
No changes on the OSPF front. The boxes still randomly stop exchanging routes. A reboot might bring that back, or might not. A box might stop exchanging routes entirely until it is reflashed and set up from scratch.

/Martin List-Petersen

bminish
03-07-2006, 12:03 PM
I have never ever had a box with the OSPF bug require a reflash.
I don't know how you are configuring things, but the bug that we are seeing is that OSPF will on occasion lose one or more wireless interfaces; applying changes on its NEIGHBOUR(s) will generally get things going again (but may in turn cause problems on that neighbouring node).

This beta with Quagga 0.99.3 re-converges much more quickly than 0.98.5 did, but the underlying problems persist.

.brendan

bminish
03-07-2006, 12:08 PM
There is a user interface related BGP bug with this beta.

Basically, the default configuration on a fresh install runs OK, but if you select factory reset, it returns BGP to a state in which it will not run. Without BGP running, you cannot configure BGP, since it's impossible to telnet into its interface if it's not running.
You have to reboot the box before you can get back into the BGP management interface.
.brendan
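Since the symptom is that the BGP daemon's telnet (vty) interface refuses connections, a plain TCP connect test is a quick way to confirm the daemon's state before and after a factory reset. A minimal Python sketch, assuming StarOS keeps Quagga's default bgpd vty port (2605); that port is not confirmed for StarOS, and any router address you pass in is your own.

```python
import socket

# Quagga's default vty port for bgpd; assumed, not confirmed, for StarOS.
BGPD_VTY_PORT = 2605


def vty_reachable(host: str, port: int = BGPD_VTY_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the daemon's vty port succeeds."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If the check returns False right after a factory reset and True again after a reboot, that would match the behaviour described above.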

bradg
03-07-2006, 12:41 PM
I'm finally going to chime in to confirm that I see the same symptoms as Brendan does.

I've been quiet, doing a *lot* of work on various configurations and scenarios, and in every case so far, it's exactly as reported - a router will go "into the weeds" and stop exchanging routes to a neighbor, and an activate changes will usually bring it back, but about 50% of the time will then cause the next neighbor deeper in the network to give trouble.

I've gone so far as to statically route a good chunk of the wireless network (where it made sense to, anyway), and run OSPF in a small area immediately around the core (area 0), and it still acts up - it seems to take longer to show up, though. But, it doesn't seem to have a pattern that I can tell, which is frustrating to pin down.

On a separate note (which partially hijacks the thread, my apologies), I've also had to turn up two MikroTik APs, because I absolutely had to have proper DHCP relaying at these points, and virtual access point functionality for transition and backwards-compatibility reasons. I found that they do not want to talk to Star-OS units encrypted with WPA (AES or TKIP), with CM9s on each end. I'm currently unsure whether this is a configuration or a software issue. They *DO* talk unencrypted or WEP-encrypted just fine, which leads me toward it being a WPA-only issue, and I do not know whether it also affects earlier releases (the latest v2 stable) or just the beta.


Brad

robert
03-22-2006, 12:21 PM
The 0.0 IP address problem persists. Amazingly, I've seen it with MT as well as StarOS, so personally I believe it's a problem with Atheros, not with the systems, unless they share drivers (rumors are that they do). We've moved to Prism 2511 for the AP in most cases, away from Atheros. This has been an issue with Tranzeo (Prism), EZBridge (Prism), Tranzeo (CPQ/Atheros), StarOS (Atheros), and StarOS (Orinoco/Prism) clients. It gets worse the more clients you add to the AP, and associations seem to max out at around 25 clients. Occasionally, when it's in that mode, we do see clients get IP addresses, and they have extremely long ping times (3000-4000 ms). I sure wish they could figure out what is going on there, as this has completely shot down Atheros in our eyes...

Marlow
03-27-2006, 02:56 PM
bminish wrote:
I have never ever had a box with the OSPF bug require a reflash.
I don't know how you are configuring things, but the bug that we are seeing is that OSPF will on occasion lose one or more wireless interfaces; applying changes on its NEIGHBOUR(s) will generally get things going again (but may in turn cause problems on that neighbouring node).

This beta with Quagga 0.99.3 re-converges much more quickly than 0.98.5 did, but the underlying problems persist.

As a matter of fact, your problem is that you run non-broadcast, which has serious problems on Star-OS, as you describe, especially in point-to-multipoint. Neighbor negotiation is sometimes odd there.

I use broadcast on the interfaces, which works fine as long as I stick with Atheros and Prism 2.5 based cards, not Hermes and Ruby. However, due to a power cut or the like, boxes may suddenly stop working with broadcast, and this is not recoverable (i.e. it persists across reboots and power cycles) unless you reflash the box. I've got as far as being able to reflash the boxes via the firmware upgrade feature on the LAN, as long as I don't keep the configuration and instead reconfigure things from scratch. Beyond that, this currently seems to be the only way to fix these boxes. It looks like corruption of Star-OS's configuration storage is the cause.

If broadcast dies, changing the configuration to non-broadcast always works, but results in the problems you have.

Also, a tcpdump on both sides clearly shows that the broadcasts are transmitted from box A and received on box B, and box B answers, but box A never sees the packets that box B sends, so this is clearly a problem in the layer underneath the TCP/IP stack.

/Marlow
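The tcpdump comparison described above (A's packets arrive at B, but B's replies never arrive at A) amounts to diffing the two captures per direction. A minimal Python sketch, assuming each capture has already been reduced to simple records; the `OspfPkt` fields, the use of an IP ID as the match key, and the 10.0.0.x addresses are all hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class OspfPkt(NamedTuple):
    """One OSPF packet read off a capture (fields are illustrative)."""
    src: str    # source IP of the sending router
    kind: str   # e.g. "hello" or "ls-ack"
    ident: int  # hypothetical match key, e.g. the IP ID field shown by tcpdump


def lost_in_transit(sender_capture: Iterable[OspfPkt],
                    receiver_capture: Iterable[OspfPkt],
                    sender_ip: str) -> Set[OspfPkt]:
    """Packets the sender put on the wire that never appear at the receiver."""
    sent = {p for p in sender_capture if p.src == sender_ip}
    received = {p for p in receiver_capture if p.src == sender_ip}
    return sent - received


if __name__ == "__main__":
    A, B = "10.0.0.1", "10.0.0.2"
    # Capture on A: only A's own outgoing packets; B's replies never show up.
    cap_a = [OspfPkt(A, "hello", 1), OspfPkt(A, "hello", 2)]
    # Capture on B: A's packets arrived, and B answered.
    cap_b = cap_a + [OspfPkt(B, "hello", 7), OspfPkt(B, "ls-ack", 8)]
    print("lost A->B:", lost_in_transit(cap_a, cap_b, A))
    print("lost B->A:", lost_in_transit(cap_b, cap_a, B))
```

An empty set in one direction and a full set in the other is the one-way loss pattern reported in the post.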

oscarBravo
03-27-2006, 03:23 PM
Marlow wrote:
Matter of fact, your problem is that you run non-broadcast, which has serious problems with Star-OS as you describe, especially in point-to-multipoint. Neighbor negotiation is odd there sometimes. ... Also a tcpdump on both sides clearly shows, that the broadcasts are transferred from box A, received on box B, box B answers, but box A never sees the packets that box B send, so clearly a problem in the layer underneath the tcp stack.

I'm confused. When running in non-broadcast mode, there are no broadcast packets; all hellos are unicast.
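The distinction oscarBravo draws follows from how OSPF (RFC 2328) chooses Hello destinations per interface network type: on a broadcast-type interface Hellos go to the AllSPFRouters multicast group (224.0.0.5), not to a broadcast address, while on a non-broadcast (NBMA) interface they are unicast to manually configured neighbors. A minimal Python sketch of that rule; it is simplified, ignoring the DR-eligibility rules that narrow the neighbor list on NBMA and the other network types, and the neighbor addresses are hypothetical.

```python
from typing import List, Sequence

# RFC 2328 AllSPFRouters multicast group.
ALL_SPF_ROUTERS = "224.0.0.5"


def hello_destinations(network_type: str, configured_neighbors: Sequence[str]) -> List[str]:
    """Where Hello packets go on an interface (simplified RFC 2328 behaviour)."""
    if network_type == "broadcast":
        # One multicast Hello reaches every OSPF router on the segment;
        # neighbors are discovered, so configured_neighbors is not needed.
        return [ALL_SPF_ROUTERS]
    if network_type == "non-broadcast":
        # NBMA: no multicast; one unicast Hello per manually configured neighbor.
        return list(configured_neighbors)
    raise ValueError(f"network type not covered by this sketch: {network_type}")
```

For example, `hello_destinations("non-broadcast", ["10.0.0.2", "10.0.0.3"])` yields the two unicast targets, which is why a capture in that mode shows no multicast or broadcast Hellos at all.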

Marlow
03-31-2006, 07:54 PM
Well, broadcast might have been the wrong term :). I meant HELLO and ACK. Put it down to being quite tired by the time I got around to writing posts.

/Marlow