FCNS inconsistency issues update

As some of you know, I have been having issues in my lab where the FCNS databases of the switches in a given VSAN become inconsistent. I am working through the IEmentor labs, and the two VSANs I have had issues with are VSAN 20 and VSAN 40.

I have been troubleshooting extensively to find out why. The issue does not always show itself right away, but it always appears in time. Sometimes it can be cleared by suspending and then unsuspending the VSAN.

After looking at the configurations in detail, one thing I noticed is that both VSANs were using fcroutes. fcroutes are not something you’re likely to use often in production. In fact, if you are not careful with them they can cause a lot of issues. I think they may be related to my issue.

First, I removed the fcroutes from VSAN 40 but left them in VSAN 20. Sure enough, in short order VSAN 40 was working fine, while VSAN 20 still had FCNS inconsistencies.

The way I check for inconsistencies is to first run “sh fcdomain domain-list vsan 40”, for example, to see what the fcdomain manager sees as the fabric. Then I look at what the FCNS manager sees as the fabric with “show fcns internal info v 40”. When you are having this issue, the FCNS manager does not see all of the same domains as the fcdomain manager. It is also obvious because the output of “show fcns data v 40” does not match across the switches. This could happen for other reasons, of course, such as an isolated link, which can be checked with “show int trunk v 40”. In the situation I am describing, though, everything else looks fine, yet the FCNS databases are not consistent.
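The comparison described above can be sketched in a few lines of Python. This is only an illustration, not a tool I actually ran: the function names are made up, and the parsing assumes the output looks like the captures in this post (SAN-OS 3.3), which may differ on other releases.

```python
import re

# Sample text copied from the MDS1 captures for VSAN 40 in this post.
MDS1_FCDOMAIN = """\
0xef(239) 20:28:00:0d:ec:0e:b2:41 [Local] [Principal]
0x01(1) 20:28:00:0d:ec:19:c9:01
0xee(238) 20:28:00:0d:ec:10:05:41
0x50(80) 50:00:53:07:ff:f0:00:50 [Virtual (SDV)]
"""
MDS1_FCNS = """\
Local Domain: 0xef(239)
Remote Domains: 0x50(80) 0xee(238)
Info for 0x50(80)
"""

def fcdomain_domains(output):
    """Domain IDs from 'show fcdomain domain-list' rows (domain followed by a WWN)."""
    pattern = r"^\s*0x([0-9a-fA-F]+)\(\d+\)\s+(?:[0-9a-fA-F]{2}:){7}"
    return {int(m.group(1), 16) for m in re.finditer(pattern, output, re.M)}

def fcns_domains(output):
    """Local plus remote domains from 'show fcns internal info'."""
    domains = set()
    for line in output.splitlines():
        # Assumption: each list sits on a single line, as in the captures here.
        if line.strip().startswith(("Local Domain:", "Remote Domains:")):
            domains.update(int(h, 16) for h in re.findall(r"0x([0-9a-fA-F]+)\(\d+\)", line))
    return domains

def missing_from_fcns(fcdomain_out, fcns_out):
    """Domains the fcdomain manager knows about but the FCNS manager does not."""
    return fcdomain_domains(fcdomain_out) - fcns_domains(fcns_out)

missing = missing_from_fcns(MDS1_FCDOMAIN, MDS1_FCNS)
print([hex(d) for d in sorted(missing)])  # ['0x1']
```

On the MDS1 capture this reports domain 0x01 (MDS2) as missing from the FCNS view, which is the inconsistency I was chasing.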

Here is the topology of VSAN 40:

[Figure: VSAN 40 topology]

Here is a message I wrote to Cisco TAC, which best explains some of the issues at hand along with the troubleshooting steps and analysis. This issue has actually been troubleshot extensively, but this email is where I finally get onto something that may be the root cause:

So, I got my FCNS databases all consistent with you on the phone the other day, and then I let it sit overnight. I woke up today and VSAN 40, which is the VSAN we left the fcroute in, has an inconsistent FCNS database. Here is the relevant information. Attached to this email is a VSAN topology diagram for VSAN 40.

MDS1
MDS1# show run | inc fcroute
fcroute 0x10000 0xff0000 interface fc1/14 domain 1 metric 10 vsan 40

MDS1# show fcdomain domain-list v 40

Number of domains: 4
Domain ID WWN
--------- -----------------------
0xef(239) 20:28:00:0d:ec:0e:b2:41 [Local] [Principal]
0x01(1) 20:28:00:0d:ec:19:c9:01
0xee(238) 20:28:00:0d:ec:10:05:41
0x50(80) 50:00:53:07:ff:f0:00:50 [Virtual (SDV)]

MDS1# show fcns internal info v 40

Info for vsan 40
=================
Interop mode: 0
R_A_TOV: 10000; D_S_TOV: 5000
Local Domain: 0xef(239)
Remote Domains: 0x50(80) 0xee(238)
Info for 0x50(80)
updating_db = 0
refreshing_db = 0
num_ports = 0

Info for 0xee(238)
updating_db = 0
refreshing_db = 0
num_ports = 0

Indexed objects details:
port_id index::
size:128 incr_factor:128 slots_free:126
portwwn index::
size:128 incr_factor:128 slots_free:126
nodewwn index::
size:128 incr_factor:128 slots_free:126
ip addr index::
size:64 incr_factor:64 slots_free:64

total entries in database = 54

MDS2
MDS2# show run | inc fcroute
fcroute 0xef0000 0xff0000 interface fc1/15 domain 237 metric 10 remote vsan 40

MDS2# show fcdomain domain-list v 40

Number of domains: 4
Domain ID WWN
--------- -----------------------
0xef(239) 20:28:00:0d:ec:0e:b2:41 [Principal]
0x01(1) 20:28:00:0d:ec:19:c9:01 [Local]
0xee(238) 20:28:00:0d:ec:10:05:41
0x50(80) 50:00:53:07:ff:f0:00:50 [Virtual (SDV)]
MDS2# show fcns internal info v 40

Info for vsan 40
=================
Interop mode: 0
R_A_TOV: 10000; D_S_TOV: 5000
Local Domain: 0x1(1)
Remote Domains: 0x50(80) 0xee(238)
Info for 0x50(80)
updating_db = 0
refreshing_db = 0
num_ports = 0

Info for 0xee(238)
updating_db = 0
refreshing_db = 0
num_ports = 0

Indexed objects details:
port_id index::
size:128 incr_factor:128 slots_free:127
portwwn index::
size:128 incr_factor:128 slots_free:127
nodewwn index::
size:128 incr_factor:128 slots_free:127
ip addr index::
size:64 incr_factor:64 slots_free:64

total entries in database = 52

MDS3
MDS3# show fcdomain domain-list vsan 40

Number of domains: 4
Domain ID WWN
--------- -----------------------
0xef(239) 20:28:00:0d:ec:0e:b2:41 [Principal]
0x01(1) 20:28:00:0d:ec:19:c9:01
0xee(238) 20:28:00:0d:ec:10:05:41 [Local]
0x50(80) 50:00:53:07:ff:f0:00:50 [Virtual (SDV)]
MDS3# show fcns internal info v 40

Info for vsan 40
=================
Interop mode: 0
R_A_TOV: 10000; D_S_TOV: 5000
Local Domain: 0xee(238)
Remote Domains: 0x1(1) 0x50(80) 0xef(239)
Info for 0x1(1)
updating_db = 0
refreshing_db = 0
num_ports = 0

Info for 0x50(80)
updating_db = 0
refreshing_db = 0
num_ports = 0

Info for 0xef(239)
updating_db = 0
refreshing_db = 0
num_ports = 0

Indexed objects details:
port_id index::
size:128 incr_factor:128 slots_free:126
portwwn index::
size:128 incr_factor:128 slots_free:126
nodewwn index::
size:128 incr_factor:128 slots_free:126
ip addr index::
size:64 incr_factor:64 slots_free:64

total entries in database = 46

Some more helpful information:

MDS1# show fcroute unicast vsan 40

D:direct R:remote P:permanent V:volatile A:active N:non-active
# Next
Protocol VSAN FC ID/Mask RCtl/Mask Flags Hops Cost
-------- ---- -------- -------- ---- ---- ----- ---- ----
static 40 0x010000 0xff0000 0x00 0x00 D P A 1 10
fspf 40 0x010000 0xff0000 0x00 0x00 D P N 1 500
fspf 40 0x500000 0xff0000 0x00 0x00 D P A 1 1
fspf 40 0xee0000 0xff0000 0x00 0x00 D P A 1 500
local 40 0xef0000 0xffffff 0x00 0x00 D P A 1 1

MDS1# show fcroute unicast 0x010000 0xff0000 v 40

D:direct R:remote P:permanent V:volatile A:active N:non-active
# Next
Protocol VSAN FC ID/Mask RCtl/Mask Flags Hops Cost
-------- ---- -------- -------- ---- ---- ----- ---- ----
static 40 0x010000 0xff0000 0x00 0x00 D P A 1 10
fc1/14 Domain 0x01(1)
fspf 40 0x010000 0xff0000 0x00 0x00 D P N 1 500
fc1/7 Domain 0x01(1)

MDS2# show fcroute unicast vsan 40

D:direct R:remote P:permanent V:volatile A:active N:non-active
# Next
Protocol VSAN FC ID/Mask RCtl/Mask Flags Hops Cost
-------- ---- -------- -------- ---- ---- ----- ---- ----
fspf 40 0x500000 0xff0000 0x00 0x00 D P A 1 1
fspf 40 0xee0000 0xff0000 0x00 0x00 D P A 1 500
static 40 0xef0000 0xff0000 0x00 0x00 R P A 1 10
fspf 40 0xef0000 0xff0000 0x00 0x00 D P N 1 500

MDS2# show fcroute unicast 0xef0000 0xff0000 v 40

D:direct R:remote P:permanent V:volatile A:active N:non-active
# Next
Protocol VSAN FC ID/Mask RCtl/Mask Flags Hops Cost
-------- ---- -------- -------- ---- ---- ----- ---- ----
static 40 0xef0000 0xff0000 0x00 0x00 R P A 1 10
fc1/15 Domain 0xed(237)
fspf 40 0xef0000 0xff0000 0x00 0x00 D P N 1 500
fc1/7 Domain 0xef(239)

MDS3# show fcroute unicast v 40

D:direct R:remote P:permanent V:volatile A:active N:non-active
# Next
Protocol VSAN FC ID/Mask RCtl/Mask Flags Hops Cost
-------- ---- -------- -------- ---- ---- ----- ---- ----
fspf 40 0x010000 0xff0000 0x00 0x00 D P A 1 500
fspf 40 0x500000 0xff0000 0x00 0x00 D P A 1 1
fspf 40 0xef0000 0xff0000 0x00 0x00 D P A 1 500

I believe I do see an issue with this VSAN and the static routing. On MDS1, I am routing domain 1 (0x01) over fc1/14, which is the link to MDS3. Yet domain 1 is MDS2. Because the “remote” keyword is not specified in the fcroute, this could well cause a problem.

MDS2 tries to use an fcroute to reach domain 239, which is MDS1. The fcroute points over fc1/15 and has the “remote” keyword, which is correct since fc1/15 goes to MDS3, which is an intermediate hop.

Even more telling is that MDS3’s fcdomain and FCNS databases are consistent. The inconsistencies exist on MDS1 and MDS2, which is where the fcroutes are in use.

I was going to add the “remote” keyword to the fcroute on MDS1 as a test, as its absence clearly looks like a mistake on my part. I believe that should be a valid configuration and should not cause issues. I can see why the current configuration is not valid: the fcroute on MDS1 does not use the “remote” keyword, yet it routes through an intermediate switch.

I have not analyzed VSAN 20’s fcroutes in this way yet (as you know, for now they are removed for testing). However, I may go back and check them just as deeply to see if there is a configuration issue going on there as well.

I always understood that FCNS traffic uses 0xFFFCxx, where xx is the domain the switch is trying to reach. I believe this special format is unique to FCNS. Other services, I thought, use a more standard well-known fabric address of 0xFFFFFx, where x identifies the well-known service.

In any case, I hope I am on the right track and close to solving this mystery. I just wanted to give you an update and see what you thought.

So far, it looks like this may be the issue. It doesn’t explain the similar issue I see with VSAN 20, but I will need to look at that more extensively to see if something similar is going on. The point is, be very careful with fcroutes. The “remote” keyword can be a bit confusing to understand.
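To make the “remote” reasoning concrete, here is a rough Python sketch of the sanity check I did by eye. It parses the two fcroute lines from the email above and flags a route whose destination domain is not the switch directly attached on the outgoing interface while “remote” is missing. The regex only covers the syntax seen in this post, the function names are made up, and the interface-to-neighbor maps are assumptions taken from my topology (fc1/14 on MDS1 and fc1/15 on MDS2 both lead to MDS3, domain 238). The rule itself is my reading of the keyword, not something confirmed by Cisco.

```python
import re

FCROUTE_RE = re.compile(
    r"fcroute\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+interface\s+(\S+)"
    r"\s+domain\s+(\d+)(?:\s+metric\s+(\d+))?(\s+remote)?\s+vsan\s+(\d+)"
)

def parse_fcroute(line):
    """Parse one fcroute config line in the form shown in this post."""
    m = FCROUTE_RE.search(line)
    if m is None:
        raise ValueError("unrecognized fcroute line: %r" % line)
    fcid, mask, interface, next_hop, metric, remote, vsan = m.groups()
    return {
        "dest_domain": (int(fcid, 16) >> 16) & 0xFF,  # domain is the top byte of the FCID
        "interface": interface,
        "next_hop_domain": int(next_hop),
        "remote": remote is not None,
        "vsan": int(vsan),
    }

def needs_remote(route, neighbor_domain_by_interface):
    """True if the destination is not the directly attached neighbor and 'remote' is absent."""
    neighbor = neighbor_domain_by_interface[route["interface"]]
    return route["dest_domain"] != neighbor and not route["remote"]

# Config lines from the email; neighbor maps are assumptions from the lab topology.
mds1 = parse_fcroute("fcroute 0x10000 0xff0000 interface fc1/14 domain 1 metric 10 vsan 40")
mds2 = parse_fcroute("fcroute 0xef0000 0xff0000 interface fc1/15 domain 237 metric 10 remote vsan 40")

print(needs_remote(mds1, {"fc1/14": 238}))  # True
print(needs_remote(mds2, {"fc1/15": 238}))  # False
```

The MDS1 route is flagged because domain 1 is reached through MDS3 without “remote”, while the MDS2 route passes because it carries the keyword.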

Also, it would appear that FCNS traffic uses the fcroutes. Cisco uses a special address format for FCNS manager traffic in order to work with VSANs. The FCNS manager of, say, domain EF has an FCID of 0xFFFCEF. If you had a static route for 0xEF0000 0xFF0000, it would be followed for traffic to reach this special address. At least that’s what my observation is. So although the domain is in the third byte, the switch still knows to follow regular FSPF and static routing in the same way it would for a normal FCID.
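A small Python sketch shows why this is not obvious from the route table alone. A plain prefix/mask comparison would never match 0xFFFCEF against 0xEF0000 0xFF0000, so the switch must be pulling the domain out of the last byte before the lookup. The function names here are mine, and `route_domain_for` models only the behaviour I observed, not documented internals.

```python
def domain_controller_fcid(domain_id):
    """Domain controller address 0xFFFCxx for a given domain ID."""
    if not 1 <= domain_id <= 239:
        raise ValueError("domain ID must be in the range 1-239")
    return 0xFFFC00 | domain_id

def plain_mask_match(fcid, prefix, mask):
    """Ordinary prefix/mask comparison, as for a normal FCID."""
    return (fcid & mask) == (prefix & mask)

def route_domain_for(fcid):
    """Domain whose route is followed for this address, per the behaviour observed in this post."""
    if (fcid & 0xFFFF00) == 0xFFFC00:
        return fcid & 0xFF          # domain controller: domain is the last byte
    return (fcid >> 16) & 0xFF      # normal FCID: domain is the first byte

dc = domain_controller_fcid(0xEF)
print(hex(dc))                                   # 0xfffcef
print(plain_mask_match(dc, 0xEF0000, 0xFF0000))  # False
print(hex(route_domain_for(dc)))                 # 0xef
```

So a static route for 0xEF0000 0xFF0000 ends up carrying the traffic for 0xFFFCEF even though the two do not match bit for bit.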


4 Responses to FCNS inconsistency issues update

  1. Junior Engineer says:

    If we don’t set a static fcroute for the FCNS between different VSANs, what will happen? Can the FCoE traffic still be transmitted? I’m really confused about this.
    Thanks!

    • brian says:

      I don’t mean to confuse anyone with some of the errors I have run into. Keep in mind I was using SAN-OS 3.3, so your mileage may vary in newer NX-OS. The thing is, fcroute is a bit of a bastardization of sorts. It’s not really something that I find a practical use for, and if you find yourself using it, you probably need to rethink your topology or use other knobs. That said, it’s definitely fair game for the CCIE lab, so you must know it. However, I find that weird things happen with funky fcroutes. Basic routes probably work OK, but the point of my posts was that I saw very weird FCNS issues until I removed the fcroutes. TAC research on this matter was inconclusive, but they did mention that they have seen weird stuff with fcroutes. Keep in mind I didn’t just have a basic config. I am using configs with all kinds of bells and whistles, and then you introduce fcroutes and it can muck things up. I am interested in knowing if others have seen such issues...

  2. Junior Engineer says:

    Another question: I have one Nexus 5000 working as an FCF (it connects to a Spirent tester). I created one VSAN on it, and there are two vfc interfaces under the VSAN. The detailed configuration is below:

    feature fcoe

    interface Ethernet1/1
    priority-flow-control mode on
    no lldp transmit
    no cdp enable
    switchport mode trunk

    interface Ethernet1/2
    priority-flow-control mode on
    no lldp transmit
    no cdp enable
    switchport mode trunk

    interface vfc401
    bind interface Ethernet1/1
    fcoe fcf-priority 100
    switchport mode F
    no shutdown

    interface vfc402
    bind interface Ethernet1/2
    fcoe fcf-priority 100
    switchport mode F
    no shutdown

    vlan 401
    fcoe vsan 401

    vsan database
    vsan 401 name ForTesting

    vsan database
    vsan 401 interface vfc401
    vsan 401 interface vfc402

    =================
    I also configured the corresponding settings on the Spirent tester. After starting the protocol, I checked the fcroute on the Nexus 5000 and got the results below:

    nexus5010# show fcroute unicast

    D:direct R:remote P:permanent V:volatile A:active N:non-active
    # Next
    Protocol VSAN FC ID/Mask RCtl/Mask Flags Hops Cost
    -------- ---- -------- -------- ---- ---- ----- ---- ----
    local 401 0x340002 0xffffff 0x00 0x00 D P A 1 1
    local 401 0x340003 0xffffff 0x00 0x00 D P A 1 1

    nexus5010# show fcoe database

    -------------------------------------------------------------------------------
    INTERFACE FCID PORT NAME MAC ADDRESS
    -------------------------------------------------------------------------------
    vfc401 0x340002 20:00:10:94:17:00:00:01 00:10:94:17:00:01
    vfc402 0x340003 20:00:10:94:18:00:00:02 00:10:94:18:00:02

    ==============================================
    I also checked the ENode on the Spirent tester, and it shows “PLOGI Accepted”. Does that mean FCoE is working fine, and that the Nexus 5000 is ready to transmit FCoE traffic?

    I created FCoE traffic on the tester, but it is not forwarded. It’s really strange.

    Do you have any good idea for this issue?

    Thanks a lot!

    • brian says:

      So let me get this straight: you are trying to send traffic from the node on vfc401 to the node on vfc402? The nodes log in, but you are not able to pass traffic on the N5k. What version of NX-OS are you running? Also, just what kind of “traffic” are you trying to pass? Do you have any dump of the traffic? Is one node configured as a target and one as an initiator? Is there anything meaningful being advertised by the target for the host to work with, such as FC4 / SCSI? If it’s an initiator/target combo, and the target has FC4 resources, then the initiator should be able to PLOGI/PRLI and discover the targets (if zoned, or if default zoning is enabled).
