Saturday, January 17, 2009

IE Vol 1 Multicast RPF Failure

This is probably my single biggest weakness in multicast: troubleshooting RPF failure issues. I actually got through this lab pretty quickly, but here is the sequence of steps to find and fix an RPF failure.

1. Do a sh ip mroute on each router, starting at the source and working toward the receivers, until you find one where the S,G entry has no valid incoming interface (Incoming interface: Null).

(10.1.37.7, 225.1.1.1), 00:01:25/00:01:42, flags:
Incoming interface: Null, RPF nbr 150.1.15.1


2. Turn ip mroute-cache off on each PIM interface (no ip mroute-cache) so multicast packets are process switched and show up in the debug. Then turn on debug ip mpacket. This message spells out the issue:

*Jan 18 01:59:52.590: IP(0): s=10.1.37.7 (Serial2/0.2) d=225.1.1.1 id=0, ttl=250, prot=17, len=48(44), not RPF interface

3. sh ip mroute count is another way to verify, although it's not the most intuitive output ever created.

2 groups, 0.50 average sources per group
Forwarding Counts: Pkt Count/Pkts(neg(-) = Drops) per second/Avg Pkt Size/Kilobits per second
Other counts: Total/RPF failed/Other drops(OIF-null, rate-limit etc)

Group: 225.1.1.1, Source count: 1, Packets forwarded: 0, Packets received: 21
Source: 10.1.37.7/32, Forwarding: 0/0/0/0, Other: 21/21/0
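Since the counters in that Source line are easy to misread, here is a rough Python sketch that splits them into named fields using the legend printed above them. The function name and field names are my own invention for illustration, not anything IOS provides, and the regex has only been matched against the single line shown here.

```python
import re

# Field order follows the legend printed by "sh ip mroute count":
#   Forwarding: Pkt Count / Pkts per second / Avg Pkt Size / Kilobits per second
#   Other:      Total / RPF failed / Other drops
SOURCE_LINE = re.compile(
    r"Source:\s+(?P<source>[^,\s]+),\s+"
    r"Forwarding:\s+(?P<pkt_count>\d+)/(?P<pps>-?\d+)/(?P<avg_size>\d+)/(?P<kbps>\d+),\s+"
    r"Other:\s+(?P<total>\d+)/(?P<rpf_failed>\d+)/(?P<other_drops>\d+)"
)


def parse_source_line(line):
    """Return the counters from one 'Source:' line as a dict (hypothetical helper)."""
    match = SOURCE_LINE.search(line)
    if match is None:
        raise ValueError("not a recognizable Source: line")
    fields = match.groupdict()
    source = fields.pop("source")
    counters = {name: int(value) for name, value in fields.items()}
    counters["source"] = source
    return counters


line = "Source: 10.1.37.7/32, Forwarding: 0/0/0/0, Other: 21/21/0"
counters = parse_source_line(line)
print(f"{counters['rpf_failed']} of {counters['total']} packets failed RPF")
```

The point is simply that the middle number after Other: is the RPF failure count, and here it equals the total received.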

So from all this output, it's evident that traffic from source 10.1.37.7 is arriving on S2/0.2, which is not the RPF interface, and all 21 packets received were dropped as RPF failures. So what is the RPF interface? sh ip route 10.1.37.7 gives the answer:
* 150.1.15.1, from 150.1.3.3, 00:14:03 ago, via Ethernet1/0
Route metric is 3, traffic share count is 1

There it is. The router is expecting to see flows from 10.1.37.7 coming in on E1/0, but instead they arrive on port S2/0.2.

Now, there are two ways to fix this. One way would be to adjust IGP metrics so that the path the multicast traffic actually takes (via S2/0.2) is preferred in the unicast routing table, thus resolving the RPF issue. The other way is more of a workaround, in which a static mroute is added to make a different interface pass the RPF check. I chose the second option.

R5(config)#ip mroute 10.1.37.7 255.255.255.255 s2/0.2

And as soon as that's added:

*Jan 18 02:07:27.570: IP(0): s=10.1.37.7 (Serial2/0.2) d=225.1.1.1 (Ethernet0/0) id=0, ttl=250, prot=17, len=44(44), mforward

And now, everything looks wonderful:

R7#ping 225.1.1.1

Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 225.1.1.1, timeout is 2 seconds:

Reply to request 0 from 10.1.68.8, 64 ms
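To make the logic concrete, here is a minimal Python sketch of the RPF check on R5 before and after the static mroute. It is a simplified model, not how IOS is implemented: the function names are made up, the 10.1.37.0/24 unicast prefix is an assumption (the real prefix isn't shown in the output above), and the model simply lets any matching static mroute win over the unicast table.

```python
import ipaddress


def longest_match(address, table):
    """Return the interface of the most specific prefix covering address, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, interface in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, interface)
    return best[1] if best else None


def rpf_interface(source, unicast_routes, static_mroutes=()):
    """Pick the interface the router expects traffic from source to arrive on."""
    # Simplification: a static mroute covering the source overrides the unicast
    # route. This is not a statement about how IOS orders mroute entries.
    override = longest_match(source, static_mroutes)
    if override is not None:
        return override
    return longest_match(source, unicast_routes)


def rpf_check(source, arrival_interface, unicast_routes, static_mroutes=()):
    """True if a packet from source arriving on arrival_interface passes RPF."""
    return rpf_interface(source, unicast_routes, static_mroutes) == arrival_interface


# Hypothetical prefix: the unicast route to the source points out Ethernet1/0.
unicast = [("10.1.37.0/24", "Ethernet1/0")]

# Before the fix: traffic arrives on Serial2/0.2, so the check fails.
before = rpf_check("10.1.37.7", "Serial2/0.2", unicast)

# After "ip mroute 10.1.37.7 255.255.255.255 s2/0.2": the mroute overrides.
mroutes = [("10.1.37.7/32", "Serial2/0.2")]
after = rpf_check("10.1.37.7", "Serial2/0.2", unicast, mroutes)

print(before, after)  # False True
```

Note that the static mroute doesn't change where unicast traffic goes. It only changes which interface the multicast RPF check accepts.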

Incidentally, for the lazy folks out there, you can just specify all IPs in the mroute:
R5(config)#ip mroute 0.0.0.0 0.0.0.0 s2/0.2

Or even add a default mroute for every single interface on the router to be sure no RPF checks are failing. The downside of an approach like this is that RPF is a loop prevention mechanism. If you go overboard in overriding RPF, you may end up DoS'ing your network with a multicast forwarding loop. That makes this a bad approach in a production network.
