Riddle me this OLSR man
David L. Vrablic
10-05-2008, 05:04 PM
OK OLSRman riddle me this:
I have an X4000 dual-path repeater running 1.3.23b in full-duplex mode.
That is, the first half of the box is used as the client side of a link from another site,
designated Link 1, using W1 and W2.
This half works great.
--------
The second half designated as L2 uses W3 and W4.
This goes off to another site down the line.
They are on different segments using the same subnet mask.
I had the OLSR path set up to "wfd0" "wfd1" "wfd2" "wfd3" "eth0"
I could not get a connection across the two halves of the box.
We changed the setting to :
Interface "eth0" "eth1" "wfd0" "wfd1" "wfd2" "wfd3" "wpci0" "wpci1" "wpci2" "wpci3"
and now it works fine.
(The eth0 port is just used to access the box from a VPN connection.)
What is going on here that I haven't learned yet?
You should check in the system report to see what IPs you've got on what interfaces.
If you have two full-duplex links on one system, the wfd0 and wfd1 interfaces should be the ones with the IP addresses. Don't know why that wouldn't be the case...
You don't have to worry too much about limiting which interfaces you bind olsrd to if the system is just a backbone box and isn't a client-bearing AP.
David L. Vrablic
10-05-2008, 07:07 PM
You should check in the system report to see what IPs you've got on what interfaces.
If you have two full-duplex links on one system, the wfd0 and wfd1 interfaces should be the ones with the IP addresses. Don't know why that wouldn't be the case...
You don't have to worry too much about limiting which interfaces you bind olsrd to if the system is just a backbone box and isn't a client-bearing AP.
Now wait a min Wilber!
Does that mean the W1 and W2 pair are designated with a wfd0 and there really isn't any wfd3 and wfd4 needed to support 4 radios and two links in one box?
------------
This is a backbone only box.
I wonder why it started working after I added the wpci 0,1,2,3 etc. entries.
DrLove73
10-06-2008, 12:50 AM
What do you mean by W1 and W3?
Togs wfd0 is in your case wpci0+wpci1, and wfd1 is wpci2+wpci3.
There is no wfd2 and wfd3 on your system.
EDIT: I wrote, by mistake, wpci1 instead of wpci0, etc... since we are talking in linux terms, not StarOS GUI terms.
David L. Vrablic
10-06-2008, 07:00 AM
What do you mean by W1 and W3?
Togs wfd0 is in your case wpci1+wpci2, and wfd1 is wpci3+wpci4.
There is no wfd2 and wfd3 on your system.
W1 and W2 designations were just my internal shop shorthand for wpci1 and wpci2. (Which we all know are really wpci0 and wpci1 ;))
We started referring to the radios as W1,W2,W3,W4 in our shop. It becomes a habit and not a Star community recognized designator.
Sorry for the confusion Dr.
The original question was:
Does the wfdo designator announce both the Wpci0 and Wpci1 radios when used in duplex mode?
If so, it follows suit that wfd1 would be used for the wpci2 and wpci3 duplex link in the same X4000 box.
------------
Conclusion : if the above is true, it would not be necessary to enter wfd2 or wfd3 for the dplx repeater link to pass traffic.
----------
Second question was: Why did I have to enter wpci0, wpci1, wpci2 and wpci3 to get traffic to pass between the two halves of the dplx repeater link?
DrLove73
10-06-2008, 08:23 AM
I fixed my wpciX numbering mistake.
Are you sure that wpci0, wpci1... do not have any IP's? IP's on them instead of on wfd0 and wfd1 would explain this situation.
David L. Vrablic
10-06-2008, 08:38 AM
I only put an IP on wpci0 and wpci2 for this box.
wpci1 and wpci3 say "Empty".
I have been told that it doesn't matter which radio interface the IP is assigned to as long as there is only one assigned per pair.
That is why I have started this thread to get to the missing information I can't seem to find anywhere else.
I have duplex systems up and working but there must be only one right way to set them up and I have a feeling there are assignments that only clutter things up and maybe even confuse the systems.
I want to get it right.
The original question was:
Does the wfdo designator announce both the Wpci0 and Wpci1 radios when used in duplex mode?
If so, it follows suit that wfd1 would be used for the wpci2 and wpci3 duplex link in the same X4000 box.
Correct, wpci0/wpci1 become wfd0. wpci2/wpci3 become wfd1. This is assuming you've used full-duplex group numbers 1 and 2 and not some other numbers.
If you used full-duplex group number 9, you'd have a wfd8 instead.
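As a rough sketch of that numbering (an illustration only, not StarOS code; the function name `wfd_device` is made up for this example), the wfdX index appears to be simply the full-duplex group number minus one:

```python
def wfd_device(fd_group: int) -> str:
    """Return the wfdX interface name expected for a full-duplex group number.

    Assumption taken from this thread: group 1 -> wfd0, group 2 -> wfd1,
    group 9 -> wfd8, i.e. the device index is the group number minus one.
    """
    if fd_group < 1:
        raise ValueError("full-duplex group numbers start at 1")
    return f"wfd{fd_group - 1}"


# The three cases mentioned in the thread.
for group in (1, 2, 9):
    print(group, "->", wfd_device(group))
```

Whether StarOS accepts group numbers below 1 is not stated here, so the range check is only a guess at sensible behaviour.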
I have been told that it doesn't matter which radio interface the IP is assigned to as long as there is only one assigned per pair.
Correct, and if you look in the system report you should be seeing wfd0 and wfd1 with your IPs assigned to them. Please take a look in your system report and let me know what it shows for what IPs are on what interfaces.
Second question was: Why did I have to enter wpci0, wpci1, wpci2 and wpci3 to get traffic to pass between the two halves of the dplx repeater link?
Don't know, that's why I need to know what your system report is showing for what IPs were assigned to which interfaces. Theoretically your olsrd config should just have wfd0 and wfd1 in its Interfaces statement. Again, assuming that you used full-duplex group numbers 1 and 2.
David L. Vrablic
10-06-2008, 05:29 PM
GROUPS! Nobody said anything about assigning groups.
I have both halves set to 1.
Tog look for a PM
and thanks a bunch.
Hey cool! That makes post 1000 over a 7 year period.
And dad said I talk too much!
Right, that would be your problem. You need wpci1/wpci2 in FD group #1 and wpci3/wpci4 in FD group #2.
Then your olsrd Interface {} statement can just have wfd0 and wfd1 and you should be set.
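To make the layout concrete, here is a small sketch that checks a radio-to-group assignment and prints the matching Interface line. It is an illustration under assumptions: the helper name `olsrd_interface_line` is invented, the radios use the Linux names (wpci0 to wpci3) rather than the GUI numbering, and the "exactly two radios per group" rule is my reading of this thread rather than documented StarOS behaviour.

```python
from collections import defaultdict


def olsrd_interface_line(radio_groups: dict) -> str:
    """Build an olsrd Interface line from a {radio name: FD group number} map.

    Assumptions (from this thread, not from StarOS documentation):
    each full-duplex group holds exactly two radios, and group N
    shows up as interface wfd(N-1).
    """
    groups = defaultdict(list)
    for radio, group in radio_groups.items():
        groups[group].append(radio)

    for group, radios in sorted(groups.items()):
        if len(radios) != 2:
            raise ValueError(
                f"FD group {group} has {len(radios)} radios {sorted(radios)}; expected 2"
            )

    names = " ".join(f'"wfd{group - 1}"' for group in sorted(groups))
    return f"Interface {names} {{ }}"


# Intended layout: one group per radio pair.
print(olsrd_interface_line({"wpci0": 1, "wpci1": 1, "wpci2": 2, "wpci3": 2}))

# The layout that caused trouble: both halves set to group 1.
try:
    olsrd_interface_line({"wpci0": 1, "wpci1": 1, "wpci2": 1, "wpci3": 1})
except ValueError as err:
    print("problem:", err)
```

The first call prints `Interface "wfd0" "wfd1" { }`; the second reports that group 1 contains four radios.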
David L. Vrablic
10-06-2008, 06:57 PM
OK now I know that I need to change the Group setting.
What do I set the next pair on the next hop to?
And the one after that?
Is each link pair a separate group?
If I was going to link all the way to China would I just go to group "skeytie eight" or whatever?
What's the limit?
Where can I read up on this part? How did you learn about the settings?
Google hasn't been much help.
I feel like Johnny 5 " I need INPUT" thanks for feeding me.
I don't think it's really documented anywhere quite that specifically yet, I guess I just know what's what because I'm a unix guy and I know how this stuff works under the hood instinctively as soon as it's shown to me.
The FD group number merely chooses a wfdX device on your local system to put that wireless interface into, it means nothing to the remote end of the FDX link.
In other words, if you're doing a chain of full-duplex repeaters, you can keep doing wpci1/wpci2 in FD group #1 and wpci3/wpci4 in FD group #2 for all of them. The only purpose of the FD group number is to group two of your interfaces together.
David L. Vrablic
10-06-2008, 08:19 PM
I don't think it's really documented anywhere quite that specifically yet, I guess I just know what's what because I'm a unix guy and I know how this stuff works under the hood instinctively as soon as it's shown to me.
The FD group number merely chooses a wfdX device on your local system to put that wireless interface into, it means nothing to the remote end of the FDX link.
In other words, if you're doing a chain of full-duplex repeaters, you can keep doing wpci1/wpci2 in FD group #1 and wpci3/wpci4 in FD group #2 for all of them. The only purpose of the FD group number is to group two of your interfaces together.
By Jove! I think I've got it.
A little hall monitor that tells the computer which radios are where?
Or something like that.
Now I will venture forth (with my fifth) and ring the city of Gotham with delightfully dastardly duplex.
You are indeed a knowledgeable lad.:)
David L. Vrablic
10-09-2008, 06:09 AM
We changed the group number to 2 on the second half of the link X4000 in question.
(The wpci3 and wpci4 pair)
It stopped passing traffic and we had to set it back to 1 to get it to work again.
Checked the system report and there is no reference to wfd anything listed.
Now I am scratching my head.
If not full duplex what mode is it operating in the way it is now?
-----------
Question:
Was there any problem with this function in 1.13.1?
I haven't upgraded that box yet. It was working fine and wasn't giving me grief.
Maybe now is the time to see if it makes a difference.
----------
I have a couple of units that haven't been deployed yet.
Maybe I need to set them up in the shop and see if we can make things happen there.
You should do a bit of bench testing and use 1.3.23b, I agree. Saves me the trouble, I don't have any setup on the bench at the moment.
David L. Vrablic
10-09-2008, 07:10 AM
OK step one .
Confirmation, both ends of the link are running .23 version.
Off to the bench.
It's not completely outside the realm of possibility that you're the only person who has put more than one full-duplex link on the same system and the back-end scripting to set it all up doesn't work as expected...
I'm not saying it's very likely, but it's time for bench testing.
David L. Vrablic
10-09-2008, 08:29 PM
8 hrs and now I am more confused than ever.
Set up 3 radios in the shop.
1. The first is a single FD pair W1 and W2, Group 1
IP on the W1 radio.
I put an IP on the ether port on my desktop network.
Set W1 radio to Role FD-RX and AP mode with Essid of UNIT-1-A
Set W2 radio to Role FD-TX and station mode with Essid of UNIT-2-A
Olsr HNA announced my desktop IP.
OLSR Listed ports as "Wfdo" "eth0"
==========
2. Set up the second radio First Dplx pair W1 and W2, Group 1
IP on W1 radio
Set W1 radio to role FD-RX AP mode with Essid UNIT-2-A
Set W2 radio to Role FD-TX and station mode with Essid of UNIT-1-A
OLSR Listed port as "Wfd0"
======
Tried to ping across the units without success.
There are no IP assignments generated by the OLSR and no Wfd0 connections.
I have units in the field that are passing traffic but they have to have the wpci designations listed to pass traffic.
================
It seems so awkward that the role of transmit is in station mode and the Rec is in AP mode.
My working systems are in separate boxes that should just use the wfdo designator for the first two radios in the pair for each end.
=========
One question is do they need to be flipped on the far end of the link.
If you assign them out of order they won't let you flip them over but you can delete the role and reverse them.
Do they need to be reversed for the wfd0 to start working or doesn't it matter.? Does the duplex engine just figure things out for itself.
Any information will be more than I am sure of at the moment.
As soon as I get back to the shop and do some serious documentation I will be begging for some more help.
I really don't know where to go for info except here.
One step at a time, start by closely examining one system to confirm every step of the way that it's doing what we expect it to be doing. First create one full-duplex group on one of the systems, activate and check the system report to see what interfaces are there and what has the IP you assigned.
Assuming that works and you saw a wfd0 interface, create a second full-duplex group out of the second pair of radios, activate and then look at the system report again. This time one would expect to see wfd0 and wfd1 in there.
You can just set the first system's W1 and W2 to AP mode and the second system's W1 and W2 to station mode. You can keep with the normal convention that the system "closest" to you is in AP mode and the far end more towards the edge of your network uses station mode.
DrLove73
10-10-2008, 01:31 AM
David, I hope you will not get offended, but I have seen you write the letter "o" instead of the number "0" in wfd0 several times. Can you make sure you are not making the same mistake in the OLSR config?
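A quick way to catch that kind of slip is to check each name against the expected pattern. This is only a sketch: the helper name `suspicious_interfaces` is made up, and the assumption that valid names are a lowercase prefix (wfd, wpci or eth, the ones seen in this thread) followed by digits is mine.

```python
import re

# Assumed naming pattern: known lowercase prefix followed by one or more digits.
VALID_NAME = re.compile(r"^(wfd|wpci|eth)\d+$")


def suspicious_interfaces(names):
    """Return the names that do not look like <prefix><number>."""
    return [name for name in names if not VALID_NAME.match(name)]


# "wfdo" ends in the letter o, and "Wfdo" also starts with a capital W.
print(suspicious_interfaces(["wfd0", "wfdo", "Wfdo", "eth0"]))
```

With the list above it prints `['wfdo', 'Wfdo']`.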
David L. Vrablic
10-10-2008, 12:09 PM
David, I hope you will not get offended, but I have seen you write the letter "o" instead of the number "0" in wfd0 several times. Can you make sure you are not making the same mistake in the OLSR config?
Well now, I guess I did do that.
How drain bammaged of me.
I'll bet you think I am a proficient typist or something ;)
That is why I like the slashed zeros.
Good idea, I will double check the configs, but I do have others looking over my shoulder. They don't trust me either.
I do make mistakes.
David L. Vrablic
10-11-2008, 08:22 AM
OK I thunk it all over and here is what I will set up.
I started it on Friday but I want to make sure my process is correct.
http://picasaweb.google.com/dvrablic/Engineering#5255901168574755986
Thank you "Uncle Tog."
I noted the disconnected connections so someone who might be using this as a guide would not get strange results.
I really hope this will help someone get over the configuration hump.
Final results will be posted this week as time permits.
That's ok for testing purposes, but keep in mind they'll all just reach each other via the ethernet switch since they're all assigned IPs on the same subnet connected to the same switch... they'll hear each other and use that as their preferred route.
David L. Vrablic
10-11-2008, 04:15 PM
Got it!
I forgot to add that normally they are disconnected from the switch.
Those IPs and the connection are just there to let me access a unit if I need to.
-----------
I swear this is how the unit in the field is set up.
I'll see what it does on Tuesday when I get back to the shop.
Thanks for the look see , Enjoy your weekend
DrLove73
10-12-2008, 10:48 AM
Got it!
I forgot to add that normally they are disconnected from the switch.
Those IPs and the connection are just there to let me access a unit if I need to.
David, you can have it both ways. Just add another IP (or more) in the advanced tab of your Windows NIC, and use one subnet entered there for one unit connected to the LAN.
Main IP for the first unit 192.168.1.200, one additional IP for the second unit 192.168.2.200, another additional IP for the third unit 192.168.3.200.....
That way you can access all of them at the same time, and they can not see each other.
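If you want to double-check that the addresses you picked really sit in separate subnets, something like the following works. It is a sketch only: the /24 masks are an assumption (no mask was given for the example addresses), and the helper name `overlapping_pairs` is made up.

```python
import ipaddress
from itertools import combinations

# Example addresses from this post; the /24 masks are assumed.
pc_addresses = ["192.168.1.200/24", "192.168.2.200/24", "192.168.3.200/24"]


def overlapping_pairs(addresses):
    """Return pairs of addresses whose subnets overlap (empty list if none do)."""
    interfaces = [ipaddress.ip_interface(addr) for addr in addresses]
    return [
        (str(a), str(b))
        for a, b in combinations(interfaces, 2)
        if a.network.overlaps(b.network)
    ]


print(overlapping_pairs(pc_addresses))
```

An empty list means every address is in its own subnet, which is the situation described above.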
David L. Vrablic
10-12-2008, 12:15 PM
Theoretically you are right and I expect you have been able to get the added IP to work reliably without having to clear the arp cache each time you try to connect to another subnet.
But
"You are a better man than I Gunga Din"
I have several 192.162.2 and 3 and 4 plus 172 and 10 space addresses I use for network access and setups in there now.
Also the company server is on 192.168 space and handing out DHCP IP's.
It causes all kinds of problems and the server has to be rebooted when I plug it in.
I just save myself a beating and keep them unplugged unless I really need them.
DrLove73
10-12-2008, 04:28 PM
Actual subnets are of course just for demonstration. You can use any subnet you like. Just do not add any gateways for those additional IPs unless you have a specific need.
Subnets that I keep for special settings are 192.168.100.0/24 for Ovislink APs, 192.168.1.0/24 for StarOS and others, and 192.168.0.0/24 for some APs and the internal LAN for Ovislink 5460AP and 5470AP units in WISP mode (NAT on a cheap wireless AP/client). There are some others (actually the whole 192.168.0.x - 192.168.100.x range is off limits on the network), but these are the main ones.
I currently have 6 IPs on my office PC, all in different subnets, and I use them as I need them and never delete them. This is especially useful when I change the LAN IP in WISP mode from 192.168.100.0/24 to 192.168.0.0/24. I do not have to change the PC IP; I just wait 40 seconds and access the AP on the new IP. I also just add another IP/subnet to my Linux notebook with just one command when I need it, and later I clear them if they bother me.
And there is no need to clear arp cache when you change subnets, only if you are switching/replacing actual units that have the SAME IP.
Stratolinks
10-16-2008, 07:23 AM
Just had an interesting experience yesterday on our network with OLSR.
I have another router assembled and getting ready to deploy at the shop. I started the configuration by downloading a config from another router on our network and uploading it to this unit. I then proceeded to change a number of the particulars such as the Ethernet IP address, wireless settings, etc. After assigning an IP to the Ethernet to connect to my Office network, I plugged it in to my office network. A few minutes later I received an SMS notification that one of our main backhaul sites was down. I stopped what I was working on and went to the site since it was unreachable from either side.
When I got to the site and plugged in the laptop it appeared to be running, but I could not ping the other end of either connection. I replaced the radios, then replaced the entire cpu card with factory setting on it. I put the basic wireless settings in and the new CPU still wouldn't connect. I then radioed someone else to coordinate changes to the other ends of the links. They were able to ping the IP of my end of the link. When I deactivated the radio, they were still successfully pinging the IP! Then I had them do a traceroute from the neighboring router to the IP (which is in the same /30 as the local interface on that router). The traceroute went all the way back around the network to my shop and ended on the machine in my shop.
As it turns out, I had not restarted OLSR on the machine in the shop after making the config changes, and it started announcing the presence of the two /30 networks. The two adjacent routers then ignored their local interfaces and sent the traffic destined for these addresses back around the network.
After a total of 3 hours of downtime, with driving there, carrying items up the 125 foot silo twice, swapping parts out etc, it is a mistake I am not likely to repeat.
Shouldn't a local IP address that is on a local interface of the router take priority over an OLSR learned route? The local addresses were in the HNA statements of these routers and yet the HNA announcement from a machine 9 hops away took over the local address and the local HNA announcement.
Any opinions on how this should have played out?
Any opinions on how this should have played out?
Not to belittle this experience, but your test/bench system should not have been able to communicate with your live network like that.
I could run a similar risk and cause the exact same problem here even from home, except, olsrd on my home CPE is bound only to my wireless interface. It would never be able to communicate with anything that was talking OLSR on the ethernet side.
I don't know how your network is laid out at the office, but don't bind olsr to any interfaces you don't need to. You should probably stick your office windows workstations and bench test stuff behind a NAT router for easy security and convenience.
Please see item #3 under the configuration template section. That has been there for a while :)
http://staros.tog.net/wiki/OLSR
Stratolinks
10-16-2008, 11:11 AM
Actually it is a single Ethernet port that is there to allow for live on the network testing, and is clearly labeled and placed different than the rest. All the regular connections across my bench only let me access the local LAN (behind NAT) with no routing protocols available.
Yes I should have been more sure of my config on the new equipment being tested before I plugged it in. Part of the testing is to see that it announces the new subnets and makes them available across the network.
My only concern is why the learned OLSR route from 10 hops away on the network took over a locally available subnet that is also being announced locally via OLSR.
I use the link quality multiplier at the end of the wireless backbone to prevent the traffic from using the backup HSA connection unless a link is down forcing the data to go that way. Eventually the loop will get closed with all wireless connections so the network can have its own wireless backup loop.
Not to belittle this experience, but your test/bench system should not have been able to communicate with your live network like that.
I could run a similar risk and cause the exact same problem here even from home, except, olsrd on my home CPE is bound only to my wireless interface. It would never be able to communicate with anything that was talking OLSR on the ethernet side.
I don't know how your network is laid out at the office, but don't bind olsr to any interfaces you don't need to. You should probably stick your office windows workstations and bench test stuff behind a NAT router for easy security and convenience.
Please see item #3 under the configuration template section. That has been there for a while :)
http://staros.tog.net/wiki/OLSR
Actually it is a single Ethernet port that is there to allow for live on the network testing, and is clearly labeled and placed different than the rest. All the regular connections across my bench only let me access the local LAN (behind NAT) with no routing protocols available.
Sorry, I wasn't reading thoroughly enough the first time around. Fair enough, sticking the system on your live network was a completely intentional move... it was your specific intention to test OLSR. I don't really bother connecting my bench systems to anything special. If I upload a saved config and activate, I change the IP on my laptop and login to the IP it's supposed to have on the ethernet from the config and do a once-over to make sure everything is as I expect.
My only concern is why the learned OLSR route from 10 hops away on the network took over a locally available subnet that is also being announced locally via OLSR.
I use the link quality multiplier at the end of the wireless backbone to prevent the traffic from using the backup HSA connection unless a link is down forcing the data to go that way. Eventually the loop will get closed with all wireless connections so the network can have its own wireless backup loop.
To clarify and educate: The /30 announced with an Hna statement would not override the local interface's IP as you tried to reach it while directly plugged in via ethernet. The single-IP announcement could, though. Keep in mind whenever you start up OLSR, whether you like it or not, each IP on each interface you've specified in the Interface { } statement will be announced as a single IP route to the rest of the network.
That means if your wpci3 has 192.168.254.1/24 on it and you're announcing 192.168.254.0/24 via Hna, the rest of the network is going to get two routes in their tables:
192.168.254.1
192.168.254.0/24
Yes, that single-IP route for 192.168.254.1 is quite powerful and can even override your ability to reach 192.168.254.1 on your own system if you assign 192.168.254.1 to another OLSR-speaking system participating in the same network.
I would not expect your ability to reach a local IP directly-connected via ethernet would normally become overridden by a single-IP entry it got from 10 hops away, but I have not tested this duplicate IP situation personally to see what actually happens.
Depending upon how you did the LinkQualityMult, it is possible that the LinkQualityMult being changed on that system had the effect of causing that system to prefer the single-IP entry it had learned from far away. The /30 route it received from far away isn't specific enough to keep you from being able to talk to it locally over the ethernet, but the single-IP entry that was injected into its routing table is just as specific as the single-IP "link local" type routing table entry that is put in whenever you assign an IP to your local interface.
Because I haven't tested it personally, for all I know because of the way the Linux routing table worked, the single-IP route it received from far away could have simply overwritten and obliterated the "link local" single-IP route that's normally placed in there when the IP is assigned to a local interface regardless of any LinkQualityMult trickery. Or both entries could still be in the routing table and Linux went ahead and preferred the learned single-IP route over the link-local single-IP route.
Anyway, I think that constitutes 90% of an answer and is hopefully informative enough for you to get an idea of why it happened.
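For anyone who wants to see the "more specific route wins" part in isolation, here is a minimal sketch of longest-prefix matching using the example addresses above. It is not how olsrd or the Linux kernel is implemented; the helper name `best_route` is made up, and it deliberately does not model what happens when two routes of the same length compete, which is the part I have not tested.

```python
import ipaddress


def best_route(destination, routes):
    """Pick the matching route with the longest prefix, or None if nothing matches."""
    dest = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(route) for route in routes]
    matches = [net for net in networks if dest in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


# The two entries the rest of the network would learn in the example above:
# the Hna-announced /24 and the single-IP (/32) route for the interface address.
routes = ["192.168.254.0/24", "192.168.254.1/32"]

print(best_route("192.168.254.1", routes))  # 192.168.254.1/32
print(best_route("192.168.254.7", routes))  # 192.168.254.0/24
```

The single-IP entry beats the /24 for that one address, while every other address in the subnet still follows the /24.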
Stratolinks
10-16-2008, 07:22 PM
Sorry, I wasn't reading thoroughly enough the first time around. Fair enough, sticking the system on your live network was a completely intentional move... it was your specific intention to test OLSR.
Actually after rereading my original post I didn't make that very clear at all, but it was my intention to test the completed unit for at least several days continuous before deployment.
To clarify and educate: The /30 announced with an Hna statement would not override the local interface's IP as you tried to reach it while directly plugged in via ethernet. The single-IP announcement could, though. Keep in mind whenever you start up OLSR, whether you like it or not, each IP on each interface you've specified in the Interface { } statement will be announced as a single IP route to the rest of the network.
That means if your wpci3 has 192.168.254.1/24 on it and you're announcing 192.168.254.0/24 via Hna, the rest of the network is going to get two routes in their tables:
192.168.254.1
192.168.254.0/24
Yes, that single-IP route for 192.168.254.1 is quite powerful and can even override your ability to reach 192.168.254.1 on your own system if you assign 192.168.254.1 to another OLSR-speaking system participating in the same network.
Perhaps it would be better if I only allow one of the routers to announce the local PTP /30 subnet. Currently both ends of the link announce it. If only one end was announcing that subnet and a link went dead the IP of the other end would simply become unreachable, unless you log in to the box that that address is physically on. Hmmm, that may actually be better.
I would not expect your ability to reach a local IP directly-connected via ethernet would normally become overridden by a single-IP entry it got from 10 hops away, but I have not tested this duplicate IP situation personally to see what actually happens.
I intentionally placed some duplicate addresses on the network at one point to see what it would do. Each time, the traceroute showed it went to the closest one, which was exactly what I would have expected it to do.
Depending upon how you did the LinkQualityMult, it is possible that the LinkQualityMult being changed on that system had the effect of causing that system to prefer the single-IP entry it had learned from far away. The /30 route it received from far away isn't specific enough to keep you from being able to talk to it locally over the ethernet, but the single-IP entry that was injected into its routing table is just as specific as the single-IP "link local" type routing table entry that is put in whenever you assign an IP to your local interface.
The LinkQualityMult is used on a wired link 4 hops away from my shop at the tail end of the network. This is a link to a backup route to the head end of the network through an HSA DSL line that ties in to an ethernet port back at the head end of the network. This maintains an internet connection (although a slow one) for the rest of the network if any other backbone link goes down. This is the alternate route that the one router was taking to get to my shop after not talking to its closest neighbour anymore.
Because I haven't tested it personally, for all I know because of the way the Linux routing table worked, the single-IP route it received from far away could have simply overwritten and obliterated the "link local" single-IP route that's normally placed in there when the IP is assigned to a local interface regardless of any LinkQualityMult trickery. Or both entries could still be in the routing table and Linux went ahead and preferred the learned single-IP route over the link-local single-IP route.
One would think a local link should always take precedence over a route learned by any dynamic routing protocol. There is no doubt a reason it was done this way, we just don't know the particulars as to why.
Anyway, I think that constitutes 90% of an answer and is hopefully informative enough for you to get an idea of why it happened.
Overall a good explanation. And a good lesson learned to make sure of your basic settings before plugging it in to a live network. After the amount of hair I lost (and I really don't have much to spare), a mistake I am not likely to repeat any time soon.
Perhaps it would be better if I only allow one of the routers to announce the local PTP /30 subnet. Currently both ends of the link announce it. If only one end was announcing that subnet and a link went dead the IP of the other end would simply become unreachable, unless you log in to the box that that address is physically on. Hmmm, that may actually be better.
Actually it won't matter much because of those single-IP routes. If you had a situation where both systems remained up and functional but a backhaul radio link went down, you'd still be able to reach both IPs (assuming "far end" has backup link) due to them both having those single-IP routes out there on the mesh.
I intentionally placed some duplicate addresses on the network at one point to see what it would do. Each time, the traceroute showed it went to the closest one, which was exactly what I would have expected it to do.
Then perhaps it is possible yours was overridden in this case due to the LinkQualityMult stuff.