WAR board software
lonnie
11-14-2005, 11:52 AM
We have discovered some major flaws in the VxWorks stack that are affecting bridging.
Routing is stable but RIP has some issues, so we are going to do a major rewrite of the TCP stack and bridging modules. This will run parallel to the v3 release which will share the same stack code.
This is going to take 4 to 6 weeks to resolve.
oscarBravo
11-14-2005, 03:14 PM
Routing is stable but RIP has some issues...
Just curious: are you using Quagga for RIP, or reimplementing the protocol from scratch?
Obviously I'm wondering about OSPF in StarVx.
tbutcher
11-14-2005, 03:44 PM
How major are the problems? Are you recommending that we hold off on any new installs until this problem is fixed?
bobbyc
11-14-2005, 04:23 PM
Routing is fine, if that's what you're doing.
Bob C
lonnie
11-14-2005, 07:09 PM
VxWorks has its own implementation of RIP and OSPF. It is not Quagga or Zebra or any other package.
oscarBravo
11-15-2005, 05:05 AM
VxWorks has its own implementation of RIP and OSPF. It is not Quagga or Zebra or any other package.
Interesting. Does it implement a similar telnet-style interface with IOS-style configuration?
ianek
11-15-2005, 08:26 AM
Hello,
Please, how serious is the RIP issue on the WAR? I want to switch from WRAP to WAR boards, but I need RIP to work for 60 routes.
Thanks
Jan
lonnie
11-15-2005, 09:16 AM
RIP is giving us intermittent stops and requires a restart to get it going. We have been unable to isolate the cause. It is an ongoing search.
RIP does not use an IOS type shell, but rather our simplified GUI to pass it parameters.
Hello,
Please, how serious is the RIP issue on the WAR? I want to switch from WRAP to WAR boards, but I need RIP to work for 60 routes.
Thanks
Jan
lonnie
11-15-2005, 09:30 AM
I would not put them in remote locations or real critical spots until we get a fix for the TCP stack.
How major are the problems? Are you recommending that we hold off on any new installs until this problem is fixed?
ianek
11-15-2005, 03:25 PM
I hope you get it working fast, because I ordered some to speed up our backbone and static routing is really unusable for me.
Jan
lonnie
11-15-2005, 05:20 PM
It is a priority and we will have it as soon as possible.
I hope you get it working fast, because I ordered some to speed up our backbone and static routing is really unusable for me.
Jan
I mistakenly assumed bridging would be the more stable configuration so I put in a couple of links for a new backhaul. The devices will just stop talking to each other periodically. Unfortunately one of them is at 8000 feet and a good long snowmobile ride away. I can static route if necessary, but I really need to know this will stay up. How sure are you of the routed solution? The last couple of days left me with a really bad first impression of the WAR platform. If switching them over to routed will fix it I am fine, if not then I have little choice but to pull them out.
pti-andy
11-15-2005, 09:11 PM
I have experienced similar issues with bridging. I tested the VX platform in the lab for about a week with no issues. When I installed it on our tower for our main backhaul it ran for about 6 hours and then started sending broadcasts out to every switch port that had a link. MRTG showed about 200K to 1M of broadcast traffic to ports that had no other traffic on them. It caused the CPU utilization on our Cisco to increase tenfold. The link would go up and down until the radio locked up. I did quite a bit of driving back and forth to power cycle the WAR to bring it back up. I tried everything to stop the broadcast traffic and finally ended up setting the gateway address to 0.0.0.0 instead of the previous valid gateway. Keep in mind this is a bridge, so the gateway address should only be used for management purposes. This stopped the broadcast traffic and the link will stay up as long as I don't go into the management interface. If I run a bandwidth test with the interface between the two radios it will bring down the link and cause both radios to become unreachable from the Ethernet side. After the link has been up for a few hours the Ethernet interface on both radios will become unreachable all on its own, but the link does stay up. The only thing that will bring it back is a power cycle.
None of this happens in the lab! It seems to have issues dealing with lots of IP traffic or multiple bridged MACs, not sure which. This is about the only difference between my lab tests and our production environment. Right now the link has been up for over 12 hours passing traffic but I still can't reach the radios themselves. I'd do anything to get this fixed, as I can't afford to hire my tower guy again to switch them out. I was planning on replacing more than 20 other bridges with the WAR boards due to the channel cloaking benefits but obviously can't due to this problem.
BTW, this may be the wrong place to ask but will V3 staros have channel cloaking? If so this would make my life easier since I already have dozens of WRAP boards.
-Andy
PTI Wireless
These are the exact same symptoms I was seeing. I was logging in to leave a similar analysis when I found this thread and got the answer. This does not seem like a difficult problem to create. I was seeing it on all of the links I put up. My only conclusion is that the real world testing was limited.
lonnie
11-16-2005, 01:48 AM
Real world testing with bridging was limited, and I apologize for that. We have a fully routed network and thus bridging was done between a few key points. We saw no issues. Guys with large networks are having trouble with bridging and we are rewriting the bridge code.
In general you would think that bridging is simple, but in reality there is more code in the bridging module than the routing module. There is less actual processing so throughput is higher, but more code is executed.
We are also considering a radical move: having v3 ported to the WAR boards as a way to get a full system quicker. We also know the bridging code and routing is stable under Linux. Either way, whether we rewrite VxWorks bridging or convert our VxWorks driver to Linux, we are looking at 6 weeks to get a stable release.
I am leaning to the v3 port since that would bring us back to a single code base on multiple platforms.
nelson05
11-16-2005, 09:54 AM
I like the way you are leaning! I think a return to a single code base would be a very good thing and would definitely be willing to give up some of the performance you were able to eke out with VxWorks to see this happen. It would simplify a number of the choices a user faces when selecting equipment and software for a new or replacement network by making part of the equation more consistent. While there will always be hardware platforms that enable functionality or performance not available elsewhere (WAR), currently, there is a tradeoff on the software side where a number of valuable features are split between the different flavors of Star. I know that the WAR was originally released as a backhaul solution, but it appears you see it evolving into an Access Point with the addition of DHCP and a CPE with the release of a single port model. Moving to a single code base would accelerate this evolution and also allow many of the features currently only available on the WAR platform, to make their way to the WRAP and PC. I'm also pretty sure you'd appreciate not having to write any more royalty checks to Wind River.
One last comment which is semi-related..... Would such a switch mean that we have to give up the cool feature you recently introduced on the WAR where the signal strength in an antenna alignment is represented using the onboard LED? I didn't see anyone else comment on how cool this is!
pti-andy
11-16-2005, 10:39 AM
I'm all for that! I would love to be able to use my current WRAP boards for our basic client links. I also really like the features of the new StarVX. It is the performance of the radio that is the key. If cloaking (5 MHz and 10 MHz channels) will be supported in the new V3 then it will be much easier to convert my current network to this superior platform. The issue of WRAP vs WAR makes no difference to me since I'd only use the WAR on backhauls and high speed links. If all of the current StarVX features were available on the WRAP V3 platform I could convert our entire network over. Also, transparent bridging is a must for me if I were to move all of our radios to Star. We started this topology years ago and can't change it now.
Any news yet if cloaking will be available for WRAP?
Thanks,
Andy
PTI Wireless
ianek
11-16-2005, 12:40 PM
Hello, so can you tell me whether RIP has no problems in a routed network? I use routing only, NO bridging.
Thanks
Jan
I really like the way this thread is going. I think we actually had a similar conversation in Las Vegas. Unless you are doing huge volume, splitting the code base is a bad choice because you are also splitting your development time. Plus with VxWorks you are giving up the use of a lot of open source code that works fine for what it is designed for. I think your best bet is to write the best radio drivers you can and leverage the BSD code to the max. With BSD you can change the kernel with no licensing issues and still use the open source apps where appropriate. The performance on the VX platform is solid, but writing everything yourself means that having a full feature set is a long way off.
I would also like to make a request for an HTTPS interface instead of the Windows app. Don't like it, probably never will.
stephenpatrick
11-16-2005, 04:59 PM
Jeff,
IMHO your points are right on, 2 code sets are of course a resource drain - but again from a Star-OS and some user perspectives things differ
- IXP42x is a much "better" (faster) platform than x86 GEODE
- Fast x86 platforms that can tower-mount are very expensive and run hot
- professional products do not use x86 for a number of reasons
For Star-OS to evolve, it must break away from x86 platforms eventually.
Look at WLAN. No consumer device uses x86. Most are ARM.
WIMAX. No x86 in sight, again, most CPUs are cores chosen for optimal performance/heat/cost.
To put in perspective, Valemount are offering a "public beta" of a potentially very valuable solution. Don't consider it "production" until some degree of stability is demonstrated to be there. In the meantime, hanging "live customers" on such equipment is risk-taking only you can sanction.
Don't let me sound sour: it's a great platform, and I can almost "sense the sweat" the developers have put in so far. Amazing results, with 97Mbps wireless bridged throughput seen here in the lab.
Just needs debugging and "real world" features added.
Keep up the good work -
And perhaps a "green flag" on the code set when we've reached "customer ready" status.
Regards
nelson05
11-16-2005, 05:13 PM
No argument against the WAR platform here... just the observation that development would be easier, with features coming to all platforms sooner, if there was one OS. As Lonnie noted, moving back to Linux, "would bring us back to a single code base on multiple platforms."
palmczak
11-16-2005, 11:42 PM
V3 on the new hardware, now you are talking! Hell V2 on that board will blow the doors off most anything.
" Either way, rewrite VxWorks bridging or convert our VxWorks driver to Linux we are looking at 6 weeks to get a stable release."
Hmmm..... 6 weeks, that puts us pretty close to the 1 year ANNIVERSARY of the V3 release date.......;-)
lonnie
11-17-2005, 07:48 AM
Suffice to say we are not happy about this either.
V3 on the new hardware, now you are talking! Hell V2 on that board will blow the doors off most anything.
" Either way, rewrite VxWorks bridging or convert our VxWorks driver to Linux we are looking at 6 weeks to get a stable release."
Hmmm..... 6 weeks, that puts us pretty close to the 1 year ANNIVERSARY of the V3 release date.......;-)
therealboss
11-17-2005, 10:23 AM
Will the WAR's run fine if only routed and not bridged? We have 20 in the post and if there is a problem with Bridging and a problem with RIP, then I'm in the sh*t. It was me that told the boss we had to get the WAR's to solve problems we have with WRAP's on our backhaul.
Can you clear this up for me: if I set the WAR as a routed system (running RIP) without any bridging, will I have problems or should I be OK?
Jeff,
IMHO your points are right on, 2 code sets are of course a resource drain - but again from a Star-OS and some user perspectives things differ
- IXP42x is a much "better" (faster) platform than x86 GEODE
- Fast x86 platforms that can tower-mount are very expensive and run hot
- professional products do not use x86 for a number of reasons
To put in perspective, Valemount are offering a "public beta" of a potentially very valuable solution. Don't consider it "production" until some degree of stability is demonstrated to be there. In the meantime, hanging "live customers" on such equipment is risk-taking only you can sanction.
Regards
I'm trying to figure out how anything I said could be interpreted as a request to stay with x86. You must not be a developer; BSD and Linux both use a unified code base that is then targeted for whatever platform you want. There is always some conditional code that is specific to a given platform, but the vast majority of the code is common.
As far as public beta goes they bought their vxWorks license a year ago. I asked Lonnie many times about the status of the code base and waited months before deploying any. You have to take the plunge at some point. The types of bugs being reported seemed to indicate that the basic features had become stable which is all I need for a backhaul. Based on several years of experience with Star-OS (and for that matter VxWorks) I am actually surprised that something this major slipped through. Their quality control is generally quite good.
go.fast
11-17-2005, 11:17 AM
I have to admit that I knew going in that the WAR boards' initial release would be beta for the first few months.
I bet we all knew this.
For me, I've just used simple Point to Point links and have had great success.
I have links up that are just as stable as anything out there.
And I've got a couple that have rebooted on their own from time to time.
Only a couple times have I had to power cycle a radio to get it to come back.
So, I'm confident in the platform and realistic about their use.
I'm sure the boys from Valemount will get the WARs smoothed out in time. It's still early in the game.
George
stephenpatrick
11-17-2005, 12:09 PM
I'm trying to figure out how anything I said could be interpreted as a request to stay with x86. You must not be a developer; BSD and Linux both use a unified code base that is then targeted for whatever platform you want. There is always some conditional code that is specific to a given platform, but the vast majority of the code is common.
Hmmm - I think we are talking at cross-purposes (and yes, I have spent the last 10 years as a developer/vendor in the wireless industry :-). IMHO the improvement Valemount have made is through VxWorks coupled with the CPU platform. We use another OS on x86 and MIPS4k based on Linux and even with 2GHz CPUs we don't get the throughput that has been achieved here (97Mbps in the lab!)
Well sure I **do** understand the issues of code set vs platform: we don't write our own radio OS, but we have developed and maintain 10's of megabytes of code supporting our long-shipping products (laser/infrared) where such platform issues have bitten in the past.
The problems really hit when a new OS, compiler, or stack is used and bugs appear such as we have here.
I think Valemount are doing a great job - the WAR has real promise. IMHO the current release should be titled "beta" then we would expect bugs and be cautious when deploying. The software norm of alpha-beta-RC-production should be used - site visits are expensive - and lost customers even more so.
Regards
pti-andy
11-17-2005, 12:10 PM
After reading all of these posts I'm still left with the same question. Will the new V3 of StarOS (for WRAP) support the features of the StarVX platform, such as channel cloaking etc? If so then it will bring a much wider market for both products since the thousands of existing WRAP systems will not have to be replaced right away. Not many WISPs out there can swap out their entire infrastructure at once. Being able to add some WAR's here and there when the performance is needed will allow a transition of product and thus promote the use of the new platform. Which OS is being used on what board is an interesting topic but I'm more concerned about the overall performance and compatibility of the system with existing systems. I certainly hope we are not considering moving StarOS over to the WAR instead of continued WRAP support. If so, this would be shooting yourself in the foot.
Andy
PTI Wireless
lonnie
11-17-2005, 01:32 PM
I still cannot answer the question about cloaking. We are obligated, by contract, to not deploy that feature on the x86. I am discussing having that dropped.
After reading all of these posts I'm still left with the same question. Will the new V3 of StarOS (for WRAP) support the features of the StarVX platform, such as channel cloaking etc? If so then it will bring a much wider market for both products since the thousands of existing WRAP systems will not have to be replaced right away. Not many WISPs out there can swap out their entire infrastructure at once. Being able to add some WAR's here and there when the performance is needed will allow a transition of product and thus promote the use of the new platform. Which OS is being used on what board is an interesting topic but I'm more concerned about the overall performance and compatibility of the system with existing systems. I certainly hope we are not considering moving StarOS over to the WAR instead of continued WRAP support. If so, this would be shooting yourself in the foot.
Andy
PTI Wireless
palmczak
11-17-2005, 03:31 PM
We use another OS on x86 and MIPS4k based on Linux and even with 2GHz CPUs we don't get the throughput that has been achieved here (97Mbps in the lab!)
I also use that other OS, and it should be left for wired routing. The wireless implementation (standards-based 802.11x) is inferior to StarOS, especially the latest release. We have recently removed them all from service and replaced them with StarOS. With no other changes (antennas and cables and cards remain the same) ALL the links that had trouble under that OS now work.
It would be interesting to see just how much of the performance was attributable to VxWorks as opposed to Linux. My guess is some, but most of the magic was in the Atheros driver, and my understanding is that if/when ported it will give the Linux platform much of the performance.
Joe
lonnie
11-17-2005, 10:32 PM
The magic was in the driver. We spent more than a year developing and tweaking, and VxWorks was awesome because it forced us to look at things differently. We thought we were doing things that we could not do in Linux, but now that we see the weaknesses in VxWorks we have discussed the whole OS issue and we will be able to duplicate a lot of the "really cool and neat" things we did in VxWorks. Call it a year at the HKU (Hard Knocks University) and we have graduated with honours.
In a few weeks we will have a driver that will be the result of a LOT of tweaking. Sure it will be beta at first, but everything must start somewhere. This is really the seventh rewrite we have done, covering three operating systems and 6 years of doing what we do.
Sure, I wish this had not happened, but at the same time we will be stronger and the product will be better. Every other rewrite was better than the one it replaced, and this will be no different. In the grand scheme of things --> what's another couple of months? I am not trying to be flippant, just recognizing that it is better to have it right than to have it sooner.
I still cannot answer the question about cloaking. We are obligated, by contract, to not deploy that feature on the x86. I am discussing having that dropped.
Mikrotik now has cloaking on an X86 platform. I cannot imagine how you can be prevented from doing it when others have done it or are doing it.
But, if you're going to not support the channel width variables, it is going to become almost imperative to move to another OS. I don't think you realize the immense importance this has in the future to your customers.
Skaught
11-17-2005, 11:39 PM
I am 100% for V3 on the WAR. I also really really wish I could keep SSH but that is not a deal breaker for me.
ianek
11-18-2005, 12:45 AM
It seems no one can answer whether RIP will run fine in routed-only networks?
Jan
palmczak
11-18-2005, 08:18 AM
While cloaking is an amazing feature, not having it on x86 is not the end of the world.
If the feature can only be implemented on non-x86 hardware, then multi-card WAR APs are installed that can talk to all existing WRAPs, while the additional cards run cloaking and clients that need that feature are converted to a non-x86 CPE. While it is less than ideal, it is a viable upgrade path. The big deal (for us) is excellent Atheros to Prism compatibility, and the Prism driver needs to be available in the WAR.
I still cannot answer the question about cloaking. We are obligated, by contract, to not deploy that feature on the x86. I am discussing having that dropped.
I think I understand the implications of Lonnie's contract...
lonnie
11-18-2005, 08:18 AM
It runs fine and it does not run fine. We have systems that continually drop routes and need RIP restarted yet we have systems that have NEVER done bad things. It has been driving us crazy and as a result we have taken the step of this move to Linux.
A RIP watchdog is not good enough because RIP is still working, it just loses some routes. My advice is to wait.
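To illustrate the point above: a watchdog that only checks whether the RIP process is alive cannot see this failure, because the process keeps running while individual routes disappear. A minimal sketch of a route-level check follows, assuming the operator keeps a list of the prefixes that should be present and can obtain the currently installed prefixes by some means. The function names and the way the installed list is gathered are hypothetical; nothing here is part of StarOS, StarVX, or VxWorks.

```python
import ipaddress


def missing_routes(expected, installed):
    """Return the expected prefixes that are absent from the installed table."""
    want = {ipaddress.ip_network(p) for p in expected}
    have = {ipaddress.ip_network(p) for p in installed}
    # Sort for stable, readable output.
    return sorted(want - have, key=lambda n: (int(n.network_address), n.prefixlen))


def check(expected, installed, daemon_alive):
    """Decide whether a restart is needed (hypothetical policy)."""
    if not daemon_alive:
        # This is the only case a process-level watchdog would catch.
        return "restart: daemon not running"
    lost = missing_routes(expected, installed)
    if lost:
        # The daemon is alive but routes were dropped, the case described above.
        return "restart: lost " + ", ".join(str(n) for n in lost)
    return "ok"


if __name__ == "__main__":
    expected = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
    installed = ["10.0.1.0/24", "10.0.3.0/24"]
    print(check(expected, installed, daemon_alive=True))
```

Such a check only detects the symptom. It does not explain why routes are lost, which is the part that remains unresolved in this thread.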
It seems no one can answer whether RIP will run fine in routed-only networks?
Jan
lonnie
11-18-2005, 08:27 AM
Agreements have nothing to do with "what the other guy has". We are working on the guys who paid us to do it and they are the ONLY ones who can release us from the contract.
The fact that we now have to convert to Linux, and that they only have rights to the VxWorks code, which we will no longer support, gives us a good bargaining position. But as you can imagine they have no great desire to simply give us that item for free. It is going to cost us, and that is what we are working on now.
I realize this is a big thing. That is why we developed it in the first place.
Mikrotik now has cloaking on an X86 platform. I cannot imagine how you can be prevented from doing it when others have done it or are doing it.
But, if you're going to not support the channel width variables, it is going to become almost imperative to move to another OS. I don't think you realize the immense importance this has in the future to your customers.
lonnie
11-18-2005, 08:40 AM
The Prism cards are only being used because of their higher power. Now that high power Atheros cards are available it is not an issue. The Atheros cards have better performance, and if connecting to another Atheros they are the BEST you can get.
The contract is simple. We were paid a lot of money to develop cloaking for the WAR boards under VxWorks. One of the terms was that we could not bring the feature to x86, since the customer wanted an exclusive feature for the new boards.
It was maybe not the best thing, but it sure kick started our capital base and has allowed us to go big in the WAR boards.
If we are allowed to add the feature for x86, at the end of the day people will still have better service from the WAR board. It is the next generation of processor and does a better job.
While cloaking is an amazing feature, not having it on x86 is not the end of the world.
If the feature can only be implemented on non-x86 hardware, then multi-card WAR APs are installed that can talk to all existing WRAPs, while the additional cards run cloaking and clients that need that feature are converted to a non-x86 CPE. While it is less than ideal, it is a viable upgrade path. The big deal (for us) is excellent Atheros to Prism compatibility, and the Prism driver needs to be available in the WAR.
I think I understand the implications of Lonnie's contract...
lonnie
11-18-2005, 08:43 AM
In order to speed the new release we are going to try and preserve the SSH interface, since it is already mostly done and ready.
I am 100% for V3 on the WAR. I also really really wish I could keep SSH but that is not a deal breaker for me.
Lonnie
How much memory is available on the WAR platform? I know one of the reasons you liked VxWorks was the compact kernel. Will existing WAR boards be able to run the full system? I'm also hoping you can figure out a recovery mode since the expanded platform will have additional points of failure (shell access in particular). Do you program the flash using the JTAG interface? If so I'm guessing that everyone who buys quantity would be willing to spend some money on a programmer just to have for emergencies. I was also curious if any of the headers have SPI or PIO connections. With shell access I would like to look at external power control to supplement the watchdog.
lonnie
11-18-2005, 12:57 PM
VxWorks is tiny, but we can get Linux stripped to be "Small Enough"TM to fit in the flash size we have. In the v2 Linux we used PHP and Apache, whereas our scripting will now be a TCL-like script language and GoAhead. There are other components that use a lot of RAM which we can safely replace to get a smaller footprint. RAM at 32 MB will be more than adequate for the dual, and the quads have 64 MB.
We will do a VxWorks upgrade image that will upload the Linux image and write to flash and save the settings. This will be an in place update and change of OS.
We have figured out a recovery mechanism, but at this time I will not go into details about it.
The headers that you might see on early boards will not be there on later production releases. At the end of the day this is an AP or Client with some options to make life easier for the ISP.
JTAG will not be necessary and likely will not be enabled on final production platforms. It is there while we are in development, but at some point it becomes a needless expense and another point of failure.
Lonnie
How much memory is available on the WAR platform? I know one of the reasons you liked VxWorks was the compact kernel. Will existing WAR boards be able to run the full system? I'm also hoping you can figure out a recovery mode since the expanded platform will have additional points of failure (shell access in particular). Do you program the flash using the JTAG interface? If so I'm guessing that everyone who buys quantity would be willing to spend some money on a programmer just to have for emergencies. I was also curious if any of the headers have SPI or PIO connections. With shell access I would like to look at external power control to supplement the watchdog.
Are those options going to include temp and voltage sensors as well as an external watchdog reboot? If not, how about at least leaving a serial port that cards could connect to? With shell access and a serial port there are lots of things that could be made to work.
What is the flash size of the WAR boards?
lonnie
11-18-2005, 02:51 PM
The voltage and temperature sensors on the Gateworks design were dropped for the final production version. The serial port was also dropped.
The flash size will be "Big Enough"TM for what we are doing. I will not commit to any numbers since we do not plan on having more flash and thus cost than we really need.
Are those options going to include temp and voltage sensors as well as an external watchdog reboot? If not, how about at least leaving a serial port that cards could connect to? With shell access and a serial port there are lots of things that could be made to work.
What is the flash size of the WAR boards?
phendry
11-18-2005, 04:03 PM
Just when I was getting used to a basic high speed solution for backhaul and customer point-to-point links. Will we see the same speeds through the WARs running BSD as we do with VX? After all, the core of a network should be designed to switch those packets as quickly as possible and nothing else.
I'm not sure I support a move back to BSD if it adversely affects the WAR boards' performance. I guess I would have to downgrade any new boards from V3 back to the current VX release ;)
lonnie
11-18-2005, 09:52 PM
Please don't panic and just let us do our job before you make any decisions. At this time you have zero information to base any decisions on so don't make any decisions at this time. OK? Wait and see what we can come up with. After all, we are not known for under performance, and this is really the first real messy release we have had.
Cut some slack, OK, Eh?
I'm assuming that with the crunch to get this out the door, a better SNMP agent is not even on the list. Does that mean that starutil will remain the only way to get configuration and status out?
lonnie
11-20-2005, 06:46 PM
Yes, that would be a safe assumption.
I'm assuming that with the crunch to get this out the door that a better snmp agent is not even on the list. Does that mean that starutil will remain the only way to get configuration and status out?
schatnet
11-27-2005, 09:33 PM
I started using RIP on my WAR board and have experienced the problems that you are talking about.
Is there any way that you can add more static routes as a temporary solution until you get RIP working correctly?
I have tried to trim my network down to 5 routers but I need at least 7
--
Or if I had some static routes and some RIP routes will the static routes stop working when the rip routes stop?
kbldawg
11-29-2005, 09:41 PM
I started using RIP on my WAR board and have experienced the problems that you are talking about.
Is there any way that you can add more static routes as a temporary solution until you get RIP working correctly?
I have tried to trim my network down to 5 routers but I need at least 7
--
Or if I had some static routes and some RIP routes will the static routes stop working when the rip routes stop?
I agree, more static routes would be great!