Archive for August, 2010

The Cisco Port Analyzer Adapter (PAA) is a great tool and very helpful for troubleshooting.  It comes in two types, the PAA and the PAA-2.  Both have a Gigabit Ethernet port onboard that can be used in 100 Mbps or 1000 Mbps mode.  You really want to always use the 1000 Mbps mode.  Fibre Channel today runs at 1, 2, 4 or 8 Gbps, so taking a SPAN of that much traffic and trying to put it on a wire at 100 Mbps isn’t very practical.  1000 Mbps is workable because in many cases you can do port captures at less busy times or in a lab to see what is going on.
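To put rough numbers on this, here is a minimal sketch comparing nominal Fibre Channel throughput with what the capture port can carry. The per-direction figures (about 100 MB/s for each 1 Gbps of FC speed) are the usual nominal values, and the function name is purely illustrative. It ignores the PAA's encapsulation overhead and any truncation, so treat the results as a rough upper bound for an untruncated, one-direction SPAN.

```python
# Nominal one-direction Fibre Channel throughput in MB/s, keyed by link speed in Gbps.
# These are the commonly quoted nominal figures, used here as an assumption.
FC_NOMINAL_MBYTES_PER_SEC = {1: 100, 2: 200, 4: 400, 8: 800}

def max_capturable_utilization(fc_gbps, ethernet_mbit=1000):
    """Fraction of a one-direction FC link that fits on the Ethernet capture port."""
    ethernet_mbytes_per_sec = ethernet_mbit / 8  # 1000 Mbps -> 125 MB/s
    return min(1.0, ethernet_mbytes_per_sec / FC_NOMINAL_MBYTES_PER_SEC[fc_gbps])

for speed in (1, 2, 4, 8):
    for port in (100, 1000):
        frac = max_capturable_utilization(speed, port)
        print(f"{speed} Gbps FC on {port} Mbps Ethernet: up to {frac:.1%} utilization")
```

Under these assumptions a 100 Mbps port cannot keep up with even a lightly loaded FC link, which is why the 1000 Mbps mode is the only sensible choice.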

The PAA has a few different modes, all explained in the Cisco documentation (the Port Analyzer Configuration Note).  The default mode is DTM, which means Deep Truncate Mode.  This allows for the most frames per second because it truncates the payload.  A FC frame carries up to 2112 bytes of payload, plus headers, while a typical Ethernet MTU is about 1500 bytes.  Truncating the frame puts it into a format that fits nicely on an Ethernet wire and can be captured in Wireshark, which Cisco also distributes as the Cisco Protocol Analyzer.  The disadvantage is that you have now truncated the frame and lost data.  We are not just talking about FC payload data; when we say “data” we mean potentially parts of the headers within the FC frame.  This means you lose information that may be essential in troubleshooting.

The standard PAA has no way of accounting for the size of this truncated data, so your bandwidth/traffic information is wrong when it’s brought into Wireshark.  The PAA-2 inserts a special field called “Original Packet Length” under the Boardwalk header, which keeps the original packet length from before truncation intact for reporting purposes.  NTOP, also called Cisco Traffic Analyzer, is used for traffic analysis, and is almost useless for good traffic studies without either a PAA-2 or a capture of the full amount of traffic.
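The effect on traffic statistics is easy to see with a small calculation. The sketch below is illustrative only: the frame mix is made up, the 64-byte figure is the DTM truncation length given in the Cisco documentation, and the function names are hypothetical rather than anything Wireshark or NTOP expose.

```python
DTM_KEEP_BYTES = 64  # bytes of each FC frame kept in Deep Truncate Mode

def captured_bytes(original_lengths, keep=DTM_KEEP_BYTES):
    """Bytes a capture tool counts when every frame is cut down to `keep` bytes."""
    return sum(min(length, keep) for length in original_lengths)

def actual_bytes(original_lengths):
    """Bytes that really crossed the FC link (what an original-length field preserves)."""
    return sum(original_lengths)

# Hypothetical traffic mix: 90 full-size data frames and 10 small frames.
frames = [2148] * 90 + [60] * 10

seen = captured_bytes(frames)
real = actual_bytes(frames)
print(f"capture reports {seen} bytes, link carried {real} bytes "
      f"({real / seen:.1f}x underestimate)")
```

With mostly full-size frames the truncated capture understates the traffic by more than an order of magnitude, which is the gap the PAA-2 length field closes.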

DTM will work with 100, 1000 or 2000 MB, but obviously as you go up in frame size and speed you may drop packets.  I personally like to run NTM, which is No Truncate Mode; as the name says, this mode does not truncate.  You can’t run this mode unless your NIC supports Jumbo Frames.  You can usually set Jumbo Frames under the NIC Properties like so:

You should set this to match your frame size.  I use the standard 2112-byte Fibre Channel payload size, so I went with a 3KB MTU.  So if you use NTM you should typically use a 1000 Mbps Ethernet speed with Jumbo Frames.

If you are set to the default DTM mode, you will Deep Truncate.  As you can read in the Cisco docs, this means that only 64 bytes of the FC frame are kept, which is likely to eat into your headers.  Examine the capture below, where you can see that the Auth_Negotiate packet ends after the Protocol Parameters Length field:

There are actually quite a few more fields, but these are truncated so you do not see all of the FC-SP fields and parameters.  This can hurt your troubleshooting.  In most cases you don’t care much about the actual payload data of the FC frame; what you want is the headers.  DTM will likely truncate some of those headers, which is bad.  So any mode that keeps more data than DTM is a good idea in my opinion.  I just go to NTM because that way I know it’s going to have everything, and I am generally not spanning 4 Gbps links anyway; most of what I do is in a controlled lab.  If you can’t do NTM, then do ETM; if you can’t do ETM, do STM; and only if you have to, do DTM.  If you’re using FC frames larger than 1496 bytes, make sure you use Jumbo Frames on your Ethernet NIC.  Look at the result when we use NTM instead of DTM:

Notice how after the Protocol Parameters Length we have many more fields that can be helpful in troubleshooting.  For example, we can see what hash options were offered in the Auth_Negotiate and what Diffie-Hellman groups were supported.

I generally recommend using the latest version of Wireshark and libpcap, which you can get from www.wireshark.org.  There is no benefit to getting the program from the Cisco site, and the version there is likely to be outdated.  You will also almost certainly want to enable encapsulation eisl on the SD port you configure to capture traffic.  This inserts VSAN information into the Boardwalk header so that you can see VSAN information in Wireshark and NTOP.

Here is a diagram of the lab created from Cisco Fabric Manager. Fabric Manager allows you to export the Fabric Layout (among other things) as Visio, Visio with labels, or a JPEG image. The Visio export actually comes out very well. My only gripe is that the images are OLE embedded, so programs such as OmniGraffle on the Mac replace the images of the switches and JBODs with grey rectangles.

It looks rather impressive all laid out. This is what you need to pass the CCIE lab! I actually have one additional switch that has my SSM in it which is not pictured as I have it powered off right now.

The Cisco MDS 9000 supports a feature known as IPS Network Simulator.  This feature is available on MDS switches equipped with the IPS-4 (DS-X9304) and IPS-8 (DS-X9308) network modules, MSM module (DS-X9304-18K9), as well as the Multiservice Modular Switch 9222i.  What this feature does is allow you to simulate the conditions of a network, controlling attributes such as delay, bandwidth and packet reordering.  This is very helpful when you are trying to test things like replication in a lab scenario before you go into production.  You can think of it as a sort of network shim that sits between two of the Gigabit Ethernet ports.  One port is the input to the network simulator and the other port is the output.  Each of the various parameters of the network can be controlled for each direction.  In order to use the IPS Network Simulator features you must have the SAN_EXTN_OVER_IP or SAN_EXTN_OVER_IP_IPS4 licenses installed.

The simulator works off the Gigabit Ethernet ports, so it is not available for testing pure Fibre Channel.  However, it is perfect for testing FCIP, which is a common SAN extension technology that can be used over very long distances, typically with asynchronous replication.  The simulator can handle Gigabit traffic at full line rate.  You are also able to test additional FCIP options such as compression and view the effect they would have over the network with the given parameters.

In addition to simulating a network, the MDS also has the ability to simulate I/O.  You can use the SAN Extension Tuner application to generate SCSI reads and writes as well as other useful commands.

Here is a diagram of the topology we will be using:

Configuration of Gigabit Ethernet ports

First we will configure the Gigabit Ethernet ports on both MDS1 and MDS2:

MDS1# conf t
Enter configuration commands, one per line. End with CNTL/Z.
MDS1(config)# fcip enable
MDS1(config)# int GigabitEthernet2/2
MDS1(config-if)# ip address 192.168.10.1 255.255.255.0
MDS1(config-if)# no shut
MDS1(config-if)# exit
MDS1(config)# exit

MDS2# conf t
Enter configuration commands, one per line. End with CNTL/Z.
MDS2(config)# fcip enable
MDS2(config)# int GigabitEthernet2/2
MDS2(config-if)# ip address 192.168.10.2 255.255.255.0
MDS2(config-if)# no shut
MDS2(config-if)# exit
MDS2(config)# exit

Configuration of FCIP interfaces

Next, we will do a basic configuration of FCIP by creating a tunnel between MDS1 and MDS2. This tunnel will not come up as we have not yet established connectivity between ports GigabitEthernet2/2 on MDS1 and GigabitEthernet2/2 on MDS2. Once we configure the IPS Network Simulator the ports will be connected. We will use VSAN100 for our testing.

MDS1# conf t
Enter configuration commands, one per line. End with CNTL/Z.
MDS1(config)# fcip profile 1
MDS1(config-profile)# ip address 192.168.10.1
MDS1(config-profile)# exit
MDS1(config)# int fcip1
MDS1(config-if)# use-profile 1
MDS1(config-if)# peer-info ipaddr 192.168.10.2
MDS1(config-if)# switchport trunk allowed vsan 100
MDS1(config-if)# exit
MDS1(config)# exit

MDS2# conf t
Enter configuration commands, one per line. End with CNTL/Z.
MDS2(config)# fcip profile 1
MDS2(config-profile)# ip address 192.168.10.2
MDS2(config-profile)# exit
MDS2(config)# int fcip1
MDS2(config-if)# use-profile 1
MDS2(config-if)# peer-info ipaddr 192.168.10.1
MDS2(config-if)# switchport trunk allowed vsan 100
MDS2(config-if)# exit
MDS2(config)# exit

Configuration of IPS Network Simulator

The Gigabit Ethernet and FCIP configurations are mostly complete. We will add some additional parameters later, but enough has been configured for basic connectivity. Now we must configure the actual IPS Network Simulator. A few key points about the IPS Network Simulator first:

  • You must enable san-ext-tuner
  • Parameters are configured on ingress and can be set on either of the simulator’s two ports, one for each direction
  • IPS netsim configuration is not saved to NVRAM

    MDS1(config)# san-ext-tuner enable
    MDS1# ips netsim enable interface g2/3 g2/4
    MDS1# ips netsim delay-ms 100 ingress g2/3
    MDS1# ips netsim delay-ms 100 ingress g2/4
    MDS1# ips netsim max-bandwidth-mbps 10 ingress g2/3
    MDS1# ips netsim max-bandwidth-mbps 10 ingress g2/4
    MDS1# ips netsim qsize 250 ingress g2/3
    MDS1# ips netsim qsize 250 ingress g2/4
    MDS1# ips netsim drop nth 200 burst 1 ingress g2/3

At this point the FCIP tunnels should be up and operational.

Configuration of SAN Extension Tuner

The SAN Extension Tuner (SET) allows us to generate traffic between two virtual endpoints. A few key points about SET configuration:

  • You must enable san-ext-tuner (we have already done this for IPS Network Simulator configuration)
  • You must enable iSCSI
  • You must create an arbitrary nWWN and pWWN for the endpoints to use for each side of the SET
  • You must zone the two virtual endpoints (easiest thing to do is just use zone default permit)
  • SET configuration is not saved to NVRAM

    MDS1(config)# san-ext-tuner enable
    MDS1(config)# iscsi enable
    MDS1(config)# iscsi enable module 2
    MDS1(config)# zone default-zone permit vsan 100
    MDS1(config)# int iscsi 2/2
    MDS1(config-if)# no shut
    MDS1# san-ext-tuner
    MDS1(san-ext)# nwwn 10:00:00:00:00:00:00:00
    MDS1(san-ext)# nport pwwn 11:11:11:11:11:11:11:11 vsan 100 interface g2/2
    MDS1(san-ext-nport)# exit
    MDS1(san-ext)# exit

    MDS2(config)# san-ext-tuner enable
    MDS2(config)# iscsi enable
    MDS2(config)# iscsi enable module 2
    MDS2(config)# zone default-zone permit vsan 100
    MDS2(config)# int iscsi 2/2
    MDS2(config-if)# no shut
    MDS2# san-ext-tuner
    MDS2(san-ext)# nwwn 20:00:00:00:00:00:00:00
    MDS2(san-ext)# nport pwwn 22:22:22:22:22:22:22:22 vsan 100 interface g2/2
    MDS2(san-ext-nport)# exit
    MDS2(san-ext)# exit

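    MDS1# show san-ext-tuner nports
    ----------------------------------------------------------------------------
    Interface           NODE NAME                PORT NAME                VSAN
    ----------------------------------------------------------------------------
    GigabitEthernet2/2  10:00:00:00:00:00:00:00  11:11:11:11:11:11:11:11  100

    MDS2# show san-ext-tuner nports
    ----------------------------------------------------------------------------
    Interface           NODE NAME                PORT NAME                VSAN
    ----------------------------------------------------------------------------
    GigabitEthernet2/2  20:00:00:00:00:00:00:00  22:22:22:22:22:22:22:22  100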

At this point the SET is not really doing anything, but it’s set up and in place for us to start generating some tests. You can test your connectivity by using fcping:

    MDS1# fcping pwwn 22:22:22:22:22:22:22:22 vsan 100
    28 bytes from 22:22:22:22:22:22:22:22 time = 699973 usec
    28 bytes from 22:22:22:22:22:22:22:22 time = 335159 usec
    28 bytes from 22:22:22:22:22:22:22:22 time = 670161 usec
    28 bytes from 22:22:22:22:22:22:22:22 time = 201197 usec
    28 bytes from 22:22:22:22:22:22:22:22 time = 249109 usec

    5 frames sent, 5 frames received, 0 timeouts
    Round-trip min/avg/max = 201197/431119/699973 usec

Notice the ridiculously high ping times we get. This is because with the IPS Network Simulator we have created 200 ms of RTT (100 ms in each direction), introduced packet drops, manipulated the qsize and reduced the bandwidth to 1% (10 Mbps) of the actual link speed.
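As a quick sanity check, the fcping samples can be compared against the delay configured in the simulator. This is a small sketch, not an MDS tool: the regular expression and function name are my own, and the only inputs are the fcping lines shown earlier and the two 100 ms delay-ms settings.

```python
import re

# fcping samples copied from the MDS1 output shown earlier.
FCPING_OUTPUT = """\
28 bytes from 22:22:22:22:22:22:22:22 time = 699973 usec
28 bytes from 22:22:22:22:22:22:22:22 time = 335159 usec
28 bytes from 22:22:22:22:22:22:22:22 time = 670161 usec
28 bytes from 22:22:22:22:22:22:22:22 time = 201197 usec
28 bytes from 22:22:22:22:22:22:22:22 time = 249109 usec
"""

def rtt_stats(text):
    """Return (min, avg, max) round-trip time in microseconds."""
    times = [int(t) for t in re.findall(r"time = (\d+) usec", text)]
    return min(times), sum(times) // len(times), max(times)

rtt_min, rtt_avg, rtt_max = rtt_stats(FCPING_OUTPUT)

# 100 ms of netsim delay on each of the two ingress ports, in microseconds.
configured_rtt = 2 * 100 * 1000

print(f"min/avg/max = {rtt_min}/{rtt_avg}/{rtt_max} usec")
print(f"delay beyond the configured RTT: "
      f"{rtt_min - configured_rtt} to {rtt_max - configured_rtt} usec")
```

Every sample sits above the 200 ms floor set by the two delay-ms values. The remainder is presumably queueing behind the 10 Mbps limit and retransmissions after drops, although the fcping output alone cannot tell those apart.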

Let’s generate some traffic:

    MDS1# san-ext-tuner
    MDS1(san-ext)# nport pwwn 11:11:11:11:11:11:11:11 vsan 100 int g2/2
    MDS1(san-ext-nport)# read command-id 1 target 22:22:22:22:22:22:22:22 transfer-size 256000 outstanding-ios 2 continuous

    MDS2# san-ext-tuner
    MDS2(san-ext)# nport pwwn 22:22:22:22:22:22:22:22 vsan 100 int g2/2
    MDS2(san-ext-nport)# read command-id 1 target 11:11:11:11:11:11:11:11 transfer-size 256000 outstanding-ios 2 continuous

We can view statistics from both the IPS Network Simulator and SET:

    MDS1# show san-ext-tuner interface g2/2 nport pwwn 11:11:11:11:11:11:11:11 vsan 100 counters
    Statistics for nport
    Node name 10:00:00:00:00:00:00:00 Port name 11:11:11:11:11:11:11:11
    I/Os per sec : 5
    Reads : 100%
    Writes : 0%
    Egress throughput : 0.75 MBs/sec (Max - 2.50 MBs/sec)
    Ingress throughput : 0.69 MBs/sec (Max - 2.50 MBs/sec)
    Average response time : Read - 279561 us, Write - 0 us
    Minimum response time : Read - 404 us, Write - 0 us
    Maximum response time : Read - 3112761 us, Write - 0 us
    Errors : 10

    MDS1# show ips stats netsim ingress g2/3
    Network Simulator Configuration for Ingress on GigabitEthernet2/3
    Delay : 100000 microseconds
    Rate : 10000 kbps
    Max_q : 250000 bytes
    Max_qdelay : 150000000 clocks
    Drop nth pkt : 200

    Network Simulator Statistics for Ingress on GigabitEthernet2/3
    Dropped (tot) = 35875
    Dropped (netsim) = 3592
    Reordered (netsim) = 0
    Max Qlen(pkt) = 542
    Qlen (pkt) = 83
    Max Qlen (byte) = 5602
    Qlen (byte) = 0
    Mintxdel(poll) = 39600
    Mintxdel(ethtx) = 39600
    empty = 233
    txdel = 603235
    late = 111330
    Average speed = 1586 Kbps

Now we will enable compression and see the effect it has on our traffic:

    MDS1# conf t
    Enter configuration commands, one per line. End with CNTL/Z.
    MDS1(config)# int fcip1
    MDS1(config-if)# ip-compression mode3
    MDS1(config-if)# exit
    MDS1(config)# exit

    MDS2# conf t
    Enter configuration commands, one per line. End with CNTL/Z.
    MDS2(config)# int fcip1
    MDS2(config-if)# ip-compression mode3
    MDS2(config-if)# exit
    MDS2(config)# exit

    MDS1# show san-ext-tuner interface g2/2 nport pwwn 11:11:11:11:11:11:11:11 vsan 100 counters
    Statistics for nport
    Node name 10:00:00:00:00:00:00:00 Port name 11:11:11:11:11:11:11:11
    I/Os per sec : 16
    Reads : 100%
    Writes : 0%
    Egress throughput : 2.00 MBs/sec (Max - 2.50 MBs/sec)
    Ingress throughput : 2.00 MBs/sec (Max - 2.50 MBs/sec)
    Average response time : Read - 134971 us, Write - 0 us
    Minimum response time : Read - 404 us, Write - 0 us
    Maximum response time : Read - 3112761 us, Write - 0 us
    Errors : 14

    Network Simulator Statistics for Ingress on GigabitEthernet2/3
    Dropped (tot) = 36375
    Dropped (netsim) = 500
    Reordered (netsim) = 0
    Max Qlen(pkt) = 542
    Qlen (pkt) = 0
    Max Qlen (byte) = 2236
    Qlen (byte) = 0
    Mintxdel(poll) = 36000
    Mintxdel(ethtx) = 36000
    empty = 523
    txdel = 70351
    late = 28586
    Average speed = 914 Kbps

You can see that adding compression has greatly improved throughput over the link. The netsim statistics show the actual link speed, that is, the compressed traffic on the wire. The SET statistics show the data being moved through the tunnel before it is compressed.

FCIP Interface TCP Parameters

When configuring FCIP parameters, you want to base them on the actual network performance. SET can help you find these values. In our example it’s 10 Mbps of bandwidth and a 200 ms round-trip time:

    MDS1(config)# fcip profile 1
    MDS1(config-profile)# tcp max-bandwidth-mbps 10 min-available-bandwidth-mbps 10 round-trip-time-ms 200
    MDS2(config)# fcip profile 1
    MDS2(config-profile)# tcp max-bandwidth-mbps 10 min-available-bandwidth-mbps 10 round-trip-time-ms 200
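Bandwidth and round-trip time matter together because their product is the amount of data that has to be in flight to keep the link full. Here is a minimal sketch of that bandwidth-delay product calculation. The function name is illustrative, and nothing here claims to reproduce how the MDS derives its TCP window internally.

```python
def bandwidth_delay_product_bytes(bandwidth_mbps, rtt_ms):
    """Bytes that must be in flight to fill a link of the given speed and RTT."""
    # Mbps * 1000 gives bits per millisecond; multiply by the RTT in milliseconds.
    bits_in_flight = bandwidth_mbps * 1000 * rtt_ms
    return bits_in_flight // 8

# Values from the tcp command above: 10 Mbps and a 200 ms round trip.
bdp = bandwidth_delay_product_bytes(10, 200)
print(f"bandwidth-delay product: {bdp} bytes")
```

For the lab values this comes to 250000 bytes, which is the same figure the simulator reports as Max_q for the qsize 250 setting.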

I have heard a lot of FUD flying around about Thin Provisioning recently.  People are blogging and putting stuff out there as if Thin Provisioning is a very dangerous concept that is too risky to deploy.  Basically all Thin Provisioning boils down to is oversubscription.  Oversubscription seems to be a dirty word in some people’s vocabulary.  There is certainly nothing wrong with oversubscription if done properly.  Just about every technology out there has some form of oversubscription going on.

Ethernet, Telcos, ISPs, Bank Vaults, Electricity, Highways

All of the above are oversubscribed.  If everyone were to consume resources simultaneously there would not be enough.  Ethernet switches, for example, are oversubscribed.  Even though the backplane may be non-blocking, the uplinks are typically oversubscribed.  Fibre Channel storage is oversubscribed.  If we weren’t oversubscribing we would be doing DAS, which is the opposite direction; instead we use SANs, which by their very nature are oversubscribed.  Obviously there are limits to this oversubscription, so we deal with generally acceptable fan-in/fan-out ratios and, most importantly, we look at what resources we need for our applications.  There are a host of technologies that help us deal with oversubscription, for example in FC we have FCC, QoS, etc.  The telephone system and ISPs are heavily oversubscribed, in the sense that if everyone tried to make a phone call or download at maximum speed simultaneously, the system would not be able to support it.  But these systems are based on statistical multiplexing, which means nothing more than that someone has done the numbers to figure out what acceptable performance needs to exist during the busiest time.  And the telephone system has safeguards in place for special situations: for example, some customers have priority and preemption, so no matter what they will be able to place a call and someone else will get dropped.

What makes Thin Provisioning different?

Nothing really.  Thin Provisioning is the ability to hand out more storage to systems than we realistically have as true usable capacity.  Once again, you don’t just configure these things blindly; you base the amounts you’re giving on real data and low-risk probabilities.  The big fear is that some admin is going to get a call at 3am from his monitoring system or NOC saying that the disk is full on the company’s high-priority server.  This would be very bad indeed.  The reality is that even with static storage this scenario can happen if things go terribly wrong.  The key is to make sure servers have enough drive space so they are properly sized.  You should also have enough space for some decent growth and an amount of space for contingency if something goes wrong.  But you can look at all this space as a whole, aggregated, and then calculate how much space you would need in the worst-case scenario.  Then you should carefully monitor the situation so that you can be comfortable that you have sized everything properly.  In the end you have gained a much more efficient system, and the money saved can go into other parts of the system, like high-performance disk cache or SSDs.
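The aggregate sizing described here can be sketched in a few lines. Everything below is hypothetical: the volume sizes, the 25% growth allowance and the contingency reserve are made-up numbers, and the function names are only for illustration.

```python
def oversubscription_ratio(provisioned_gb, physical_gb):
    """Total capacity promised to hosts divided by real usable capacity."""
    return sum(provisioned_gb) / physical_gb

def pool_headroom_gb(used_gb, growth_rate, contingency_gb, physical_gb):
    """Physical space left after projected growth and a contingency reserve."""
    projected = sum(used_gb) * (1 + growth_rate) + contingency_gb
    return physical_gb - projected

# Hypothetical pool: four thin volumes carved from 2000 GB of usable capacity.
provisioned = [500, 500, 1000, 2000]   # what each host believes it has
used = [120, 200, 300, 500]            # what each host has actually written

ratio = oversubscription_ratio(provisioned, 2000)
headroom = pool_headroom_gb(used, growth_rate=0.25, contingency_gb=200, physical_gb=2000)

print(f"oversubscription ratio: {ratio:.1f}:1")
print(f"headroom after growth and contingency: {headroom:.0f} GB")
```

A positive headroom figure means the pool as a whole still covers projected growth plus the reserve. Monitoring then comes down to re-running the same calculation as the used figures change.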

So Thin Provision everything?

Definitely not!  There are trade-offs with Thin Provisioning.  Every application needs to be looked at separately.  Some applications are not supported on thin volumes, and some applications are too unpredictable for thin volumes.  Essentially, when you make volumes thin you are compressing the footprint of the I/Os in the SAN.  This has performance impacts, but can be dealt with to some degree by rebalancing the drives in the SAN.  So there are parts of a SAN where it may absolutely make sense to use Thin Provisioning and parts where it may make sense not to use it.  Also, in some cases, to meet RPOs/RTOs you may need to change the RAID level of the underlying storage to a more expensive design, for example going from RAID-5 to RAID-6.  That is a cost that needs to be considered.

Conclusion

Thin Provisioning should be looked at as a tool in the toolbox.  It is not a black-or-white choice you make about how you set up your entire storage environment, but rather a choice made per application.  Just as with many other storage technologies out there, consideration must be given to performance, scalability, efficiency, and the benefits you stand to gain from implementation.  Obviously disk manufacturers don’t directly benefit from Thin Provisioning, so there is bound to be some misinformation out there.  We are basically oversubscribing a lot of things in the datacenter and storage should be looked at just the same: there are situations where you would want to do it and situations where you would not.  I believe, however, that in almost any environment there are places for Thin Provisioning to be used, and I welcome the technology and its future improvements.

The datacenter and all things in it have really heated up these last 18-24 months.  Compute companies are going into the storage business, networking companies are going into the compute business, and everyone wants to have as much of the datacenter pie as they can.  The good side to all of this is a lot of technology and innovation, and prices being driven down for customers as competitors beat each other up over an endless stream of features.

But who has the best products?  Well, it depends on who you ask.  Ask the competition and they will be quick to point out everything you do wrong and they do right.  Ask the manufacturers themselves and they will tell you they have vastly superior products.  Everyone has something to say.  But at the end of the day, who really cares who has the absolute cheapest disk, the best de-dupe, the lowest power consumption, the greenest solution?  Is that what you are shopping for?  Storage arrays don’t just do one job, they do many.  If you focus on any one task, you are not going to find a solution that beats everyone hands down.  Vendor A has better de-dupe, Vendor B has better snapshot technology, but Vendor C has the lowest cost per TB.  Which do you choose?

Too many times, customers get led down a path booby-trapped with pitfalls and FUD.  What customers need to concentrate on is why they are really looking to invest in a new solution.  What is wrong with what they have?  What would they like to see improved?  Customers today are trying to get a lot of mileage out of a storage array.  Not only is space important, but so are features like Thin Provisioning, Data Protection, High Availability, Storage Virtualization, NAS, FCoE, De-Dupe, Storage Encryption, etc.  If you make your purchase about any one single item, you are likely to get derailed from whatever the true “best” solution may be for you, and instead make decisions on data that may or may not change much for you at the end of the day.

This is nothing unique to storage; I see this day in and day out with everything from blade servers to Ethernet switching.  As an engineer I am always truly looking to solve a customer’s problem in the best way I can, and I am looking holistically at everything from their ability to administer the technology, fund it and take advantage of its features to migration and support costs.  These are important things to consider: the cost of a storage migration can easily eat up any savings made by selecting a less expensive array!

Anyway, I digress.  I think I had way too much marketing hype today from just about every manufacturer out there.  This is a rapidly changing industry; today’s leader in de-dupe or data protection could be in the backseat within 12-18 months, if not sooner.  Don’t make decisions on a solution based on one or a few pieces of data that may or may not matter.  Pick solutions that solve your problems.