Oct 07, 2010 | Post by: aaron 3 Comments

CCNP SWITCH 642-813 :: High-Availability Overview

           

High availability is an organizational objective that enables resilience by increasing network availability and includes the following components:          

  • Redundancy
  • Technology
  • People (e.g. skills, training)
  • Processes (e.g. change control)
  • Tools (e.g. network management, documentation)

Review of Failover Times          

  • EIGRP and OSPF can both achieve sub-second convergence time
  • RSTP converges in about 1 second
  • EtherChannel can fail over in approximately 1 second (when a single link in the bundle fails, it redirects traffic to the other links)
  • Default HSRP timers are 3 seconds for hellos and 10 seconds for hold time, but best practice is to change hellos to 1 second and hold time to 3 seconds so convergence takes less than 3 seconds
  • The Windows XP TCP/IP stack will hold a session open for about 9 seconds

Optimal Redundancy          

Redundancy is not only a question of added cost vs. uptime and resiliency, but also a question of complexity.  Every piece of hardware and software deployed in the name of redundancy adds administrative overhead and complexity, which is hard to quantify.

Cisco recommends:          

  • Redundant switches at the core and distribution layers with fully redundant links
  • Access switches should have redundant links to redundant distribution switches
  • Avoiding single points of failure as much as possible
    • This can be achieved at the access layer with help from SSO (for layer 2) and potentially NSF (for layer 3)

           

Redundant Supervisor Engines

Providing redundant switch supervisor engines adds another level of high-availability for critical distribution and core layer devices.  Redundant switch supervisor engine options are only available on Cisco Catalyst 4500 and 6500 families of switches.          

The three redundancy options are:          

  • RPR (Route Processor Redundancy) and RPR+
  • SSO (Stateful Switchover)
  • NSF (Non-Stop Forwarding)

RPR was the first form of supervisor engine redundancy and is no longer the preferred option.  The primary reason is the time required to fail over to the backup supervisor engine.

RPR – 2 to 4 minutes on 6500 (<60 seconds on 4500)
RPR+ – takes between 30-60 seconds          

RPR also does not synchronize routing information with the redundant supervisor engine, so all dynamic routing state information is lost upon failover.  Also, upon failover the FIB tables are cleared, so all dynamic routing protocols must reconverge.  Only static routes will remain intact, as they are manually configured.

           

Stateful Switchover (SSO)

SSO is designed to minimize disruption while transitioning layer 2 services during a supervisor failover.    Even a clock synchronization failure between supervisors is enough to cause a failover with SSO.       

The redundant supervisor starts up in a fully initialized state and syncs with the startup and running configuration of the active supervisor engine.  All subsequent changes are then also updated, allowing for seamless continuation of all supported layer two protocols.          

SSO recognizes the link status of every port, so links that were active before the switchover remain active.  Neighboring devices do not see the link go down and spanning-tree remains unaffected.          

On the 6500s, the switchover takes between 0 and 3 seconds.  On the 4500 series switches it takes less than a second.  Layer 3 information must be relearned, however, which includes rebuilding ARP tables and layer 3 CEF adjacency tables.


Configuring SSO

Switch# configure terminal
Switch(config)# redundancy
Switch(config-red)# mode sso          

Verifying SSO
Switch# show redundancy states

  

NSF

Non-stop Forwarding, or NSF, is another redundancy feature designed to accompany SSO.  Unlike SSO, which allows seamless layer 2 transitions during a failover, NSF is designed to optimize layer 3 reconvergence after a failover.  When both are used, zero or near zero packets are lost during the transition.  NSF helps avoid route flapping problems by using the FIB table for failover.

NSF works by continuing to forward CEF flows while layer 3 routing protocols reconverge behind the scenes.  The standby supervisor maintains a copy of the CEF entries and in the event of a failover, it uses those entries to prevent a loss of traffic.  After the routing has reconverged and a new RIB is built, the old CEF entries are removed.          

Changes have been made to many of the modern routing protocols (EIGRP, OSPF, IS-IS, BGP) so that upon switchover, an NSF-enabled router sends special packets that trigger routing updates from the NSF-aware neighbors without resetting the peer relationship, which prevents route flapping and unnecessary route changes.

In summary, NSF improves network availability and stability.          

            

Configuring NSF

The configuration is different for EIGRP, IS-IS, and OSPF than for BGP.  See the examples below:          

Switch# conf t
Switch(config)# router ospf 100
Switch(config-router)# nsf

Switch# conf t
Switch(config)# router bgp 10
Switch(config-router)# bgp graceful-restart

 

HSRP

Several first hop redundancy protocols exist, including IRDP, HSRP, VRRP, and GLBP.  HSRP is another high-availability tool like Spanning Tree and dynamic routing protocols.

Default gateways are essential for devices to communicate with devices outside their local network.  If the gateway is unavailable for any reason, external conversations cease.  In an effort to mitigate that situation, first hop redundancy protocols have been developed to provide pairs of gateways, often one active and the other in standby, to allow an always-up default gateway.          

HSRP (RFC 2281) is a redundancy protocol developed by Cisco to solve this problem.  HSRP provides a virtual MAC and IP address that represent a set (2 or more) of physical routers.  The virtual IP is used as the default gateway address for the segment.  The active router answers ARP requests for the virtual IP with the virtual MAC address.


The active router sends hellos (multicast 224.0.0.2, UDP port 1985) to the standby router(s) to let them know it is still up.  If a standby router stops receiving hellos from the active router, it assumes the role of active and takes over forwarding packets for the network, all transparent to the end systems.

 


HSRP Groups
The virtual MAC used is always 0000.0c07.acxx where xx is the HSRP group ID in hexadecimal.
    The 0000.0c07.ac portion is the well-known HSRP virtual MAC identifier.  For example, if you see a frame with the MAC address 0000.0c07.ac0b, the HSRP group number would be 11, because the hex value 0b after the 0000.0c07.ac prefix is 11 in base 10.
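
The mapping between group number and virtual MAC can be sketched in a few lines of Python. This is only an illustration of the 0000.0c07.acxx format described above; the function names are made up for this example and are not part of any Cisco tool.

```python
HSRP_V1_PREFIX = "0000.0c07.ac"  # fixed portion of the virtual MAC


def hsrp_v1_virtual_mac(group: int) -> str:
    """Build the virtual MAC for a group number (0-255)."""
    if not 0 <= group <= 255:
        raise ValueError("group number must be between 0 and 255")
    # The group ID becomes the last two hex digits of the MAC.
    return f"{HSRP_V1_PREFIX}{group:02x}"


def hsrp_v1_group(mac: str) -> int:
    """Recover the group number from a virtual MAC in dotted notation."""
    mac = mac.lower()
    if len(mac) != 14 or not mac.startswith(HSRP_V1_PREFIX):
        raise ValueError("not an HSRP virtual MAC in 0000.0c07.acxx form")
    # Convert the trailing two hex digits back to base 10.
    return int(mac[-2:], 16)


print(hsrp_v1_virtual_mac(11))          # 0000.0c07.ac0b
print(hsrp_v1_group("0000.0c07.ac0b"))  # 11
```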

Note:  There can be only a single active and a single standby router in an HSRP group.  Any additional routers stay in the listen state and wait for the active or standby router to go down before contending for that position.  The active router processes packets sent to the virtual router.
The active router in the HSRP group is determined by an election process.  The router with the highest HSRP priority configured wins, and if no specific priority has been set, the router with the highest IP address is elected as the active router.

Note:  A new election will only occur if the active router is removed; the same is true for the standby router.  This default behavior can be changed with the preempt command.


HSRP States

Initial
State from which routers begin HSRP process.         

Standby
A candidate to become the next active router.         

Learn
The router is still waiting to hear from the active router.         

Active
The router is currently forwarding packets.         

Listen
Listens for hello messages from the active and standby routers.         

Speak
Participates in the election for the active or standby router.  This is also the state an active router enters immediately after it has been preempted by a higher priority router.         

** Hellos are sent in the active, standby, and speak states.         

            

HSRP Configuration

When configuring both spanning tree and HSRP on a segment, it is best practice to make the root bridge and HSRP active router the same device.  HSRP can only be configured on a layer 3 interface, including SVIs, routed interfaces, and L3 EtherChannels.


Switch(config-if)# standby group-number ip ip-address

The group number is only required if you plan on implementing more than one HSRP group on the router.  If none is specified, group number 0 will be used.
A priority value can be set to force a router to become the active router in the group.  The default is 100, and it can be manually set between 0 and 255.  Higher wins.  If the priority is the same, the router with the highest IP address will become active for that standby group.  Load sharing is often implemented with HSRP by configuring multiple groups and assigning different VLANs to each.

To set the HSRP priority value for a router:
Switch(config-if)# standby group-number priority priority-value

The no standby priority command will assign the router a priority of 100 (default).
Remember that if two routers are booted at about the same time and the one with the lower priority comes up first, it will become the active router in the group even though its priority is lower.  That is because it will not see any other routers when it begins the election process and will transition straight to active.  Once the other router comes up, it will not automatically become active.  To change this, use the preempt command on the router you want to remain active.

Switch(config-if)# standby group-number preempt
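
The election and preempt behavior described above can be modeled roughly in Python. This is a simplified sketch, not how IOS implements it: the Router class, the function names, and the example addresses are all hypothetical, and timers and HSRP states are ignored.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List, Optional


@dataclass
class Router:
    name: str
    priority: int = 100    # HSRP default priority
    ip: str = "0.0.0.0"    # interface IP, used as the tie-breaker
    preempt: bool = False  # True if "standby ... preempt" is configured


def elect_simultaneous(routers: List[Router]) -> Router:
    """All routers contend at once: highest priority wins, then highest IP."""
    return max(routers, key=lambda r: (r.priority, IPv4Address(r.ip)))


def join(active: Optional[Router], newcomer: Router) -> Router:
    """Return the active router after `newcomer` comes up."""
    if active is None:
        return newcomer  # no other router heard: straight to active
    if newcomer.preempt and newcomer.priority > active.priority:
        return newcomer  # preempt lets the higher priority take over
    return active        # otherwise the current active router stays


low = Router("SW1", priority=90, ip="10.0.0.2")
high = Router("SW2", priority=110, ip="10.0.0.3")
active = join(None, low)          # SW1 boots first and becomes active
print(join(active, high).name)    # SW1 (SW2 has no preempt)
high.preempt = True
print(join(active, high).name)    # SW2
```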

To test, use the command show standby brief.          

            

HSRP Authentication

Authentication is optional and is enabled with the following command:

Switch(config-if)# standby group-number authentication password

The default password is cisco if none is specified.  The password string must be the same on all members of the standby group.


HSRP Timers

HSRP uses two important timers between the active/standby routers.  The hello timer sets how often routers send hellos to exchange HSRP information, while the hold down timer determines how long a router waits before declaring another router in the group down.  The default hello time is 3 seconds and the default hold down time is 10 seconds.  That means there could be up to a 10 second delay before the standby router begins forwarding traffic if the active goes down.  To tune the timers (in seconds):

Switch(config-if)# standby group-number timers hellotime holdtime

Example:
Switch(config-if)# standby 10 timers 1 3
          

Note:  If you notice the HSRP states changing frequently, you may have a physical layer problem or a spanning-tree loop.  If you see the output “Standby router is unknown expired”, you likely have an HSRP misconfiguration or a physical layer issue.


HSRP Versions

HSRP comes in two versions, 1 and 2.  The most significant difference is that v1 only supports group numbers 0 to 255, while v2 supports group numbers 0 to 4095, which makes it possible to match group numbers to VLAN IDs.


Tracking

Tracking a critical uplink interface can force a re-election by decrementing the active router’s priority value by a set amount (default 10).          

Switch(config-if)# standby group-number track interface value-to-decrement

Example:
Switch(config-if)# standby 10 track fa 1/0/1 100
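
The effect of tracking on the election can be worked through with a short Python sketch. The function names and the priority values (150 and 100) are assumptions chosen for illustration; the sketch also assumes the peer has preempt configured, since without it the peer would not take over.

```python
def effective_priority(configured: int, tracked_up: bool, decrement: int = 10) -> int:
    """Priority the router uses while the tracked interface is up or down."""
    if tracked_up:
        return configured
    # Assumption for this sketch: the priority is not allowed to go below 0.
    return max(configured - decrement, 0)


def peer_takes_over(active_priority: int, peer_priority: int, peer_preempt: bool) -> bool:
    """The peer becomes active only if it preempts and has the higher priority."""
    return peer_preempt and peer_priority > active_priority


# Active router: priority 150, tracking an uplink with a decrement of 100.
# Peer router: default priority 100 with preempt configured.
up = effective_priority(150, tracked_up=True, decrement=100)
down = effective_priority(150, tracked_up=False, decrement=100)
print(up, peer_takes_over(up, 100, True))      # 150 False
print(down, peer_takes_over(down, 100, True))  # 50 True
```

With the default decrement of 10, the same router would only drop to 140 and would stay active, which is why the decrement has to be large enough to fall below the peer's priority.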
          

 
 

VRRP

VRRP is an open standard redundancy protocol that is similar to Cisco’s HSRP.  One difference is that the group’s IP address can either be a purely virtual one (as is the case with HSRP) or the actual IP address of one of the routers.

   

The VRRP “master” forwards the traffic and is chosen because it owns the real IP address or has the highest priority (default is again 100).  The “backup” router takes over if the master fails.  Priority values are between 1 and 255.  If the master is shut down gracefully, it advertises a priority of 0, forcing an election among the backup routers without waiting for the hold down timer to expire.

  

Note:  Multiple VRRP groups are allowed (like HSRP).         

 


VRRP Configuration

Switch(config-if)# vrrp group-number ip virtual-ip-address
Switch(config-if)# vrrp group-number priority priority-value
         

            

VRRP Timers

  • Advertisements, or hellos – default 1 second
  • Master down interval = 3 times the advertisement time + skew (essentially the same as HSRP’s hold down timer)
  • Skew time = (256-priority)/256.  Used to ensure the highest priority backup router becomes master.
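
As a worked example of the two formulas above, here is a small Python sketch. The function names are made up for illustration, and times are in seconds.

```python
def skew_time(priority: int) -> float:
    """Skew time = (256 - priority) / 256."""
    return (256 - priority) / 256


def master_down_interval(priority: int, advertisement: float = 1.0) -> float:
    """Master down interval = 3 x advertisement interval + skew time."""
    return 3 * advertisement + skew_time(priority)


# A higher priority gives a smaller skew, so that backup times out first
# and becomes master ahead of lower priority backups.
for priority in (100, 200):
    print(priority, skew_time(priority), master_down_interval(priority))
# 100 0.609375 3.609375
# 200 0.21875 3.21875
```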

Note:  Make changes on the master because changes in timers are then propagated to the backups automatically.          

Switch(config-if)# vrrp group-number timers advertise time-in-seconds

Note:  VRRP cannot track interface changes, but can track IP SLA object groups.          

            

GLBP

One of the major limitations of both HSRP and VRRP is that a single router handles traffic for the whole group, leaving the others inactive until the master router fails.  GLBP, or Gateway Load Balancing Protocol, solves this dilemma by load balancing traffic over up to four gateways, maximizing bandwidth.  One virtual IP is used, but each participating router has its own virtual MAC address, and these virtual MACs are used to respond to ARP requests.

Note:  GLBP is only supported on Cisco’s 4500, 6500, and Nexus lines.

There are three load sharing options:          

  • Weighted load balancing – based on preconfigured weights assigned to gateways
  • Host-dependent load balancing – each host uses a specific gateway
  • Round-robin load balancing – each MAC is used to respond in turn (default)

The routers running GLBP elect a single Active Virtual Gateway (AVG), which manages the load balancing and responds to ARPs.  The highest priority router wins; in a tie, the highest IP address wins.  Group members send hello multicasts every 3 seconds (multicast address 224.0.0.102).  If a router goes down, another will answer for its requests.

The job of the AVG is to assign virtual MAC addresses to each of the other GLBP routers and to assign each network host to one of the GLBP routers.  The routers that receive the MAC address assignment are the Active Virtual Forwarders, or AVFs.
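
The default round-robin behavior of the AVG can be sketched in Python as follows. This is a toy model only: the Avg class, the host names, and the MAC strings are placeholders invented for the example, not real GLBP virtual MAC values.

```python
from itertools import cycle
from typing import Dict, List


class Avg:
    """Toy Active Virtual Gateway that hands out AVF virtual MACs in turn."""

    def __init__(self, avf_macs: List[str]) -> None:
        self._next_mac = cycle(avf_macs)       # endless round-robin iterator
        self.assignments: Dict[str, str] = {}  # host -> virtual MAC it was given

    def answer_arp(self, host: str) -> str:
        """Reply to an ARP request for the virtual IP with the next virtual MAC."""
        mac = next(self._next_mac)
        self.assignments[host] = mac
        return mac


# Placeholder virtual MACs for three AVFs.
avg = Avg(["vmac-1", "vmac-2", "vmac-3"])
for host in ("host-a", "host-b", "host-c", "host-d"):
    print(host, avg.answer_arp(host))
# host-a vmac-1
# host-b vmac-2
# host-c vmac-3
# host-d vmac-1
```

Every host uses the same virtual IP as its default gateway, but different hosts end up sending frames to different forwarders, which is how the load gets spread.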


GLBP Configuration

Switch(config-if)# glbp group-number ip virtual-ip-address
Switch(config-if)# glbp group-number priority priority-value
         

Remember that the default gateway IP address that is configured on the end hosts should be set to the virtual IP address.       


IRDP

Some newer hosts use the ICMP Router Discovery Protocol (RFC 1256) to find a new router when a route becomes unavailable.  A host running IRDP listens for hello multicast messages from its configured router and uses an alternate router when that router is no longer available.  It is not necessary to understand the technical details of how IRDP works, but be aware that it is a valid first hop redundancy protocol.

      

