Tuesday, May 27, 2008

My first touch- Optical Networking

Introduction
Optical networking is some flashing light moving back and forth across a glass rod. This flashing light carries the information traveling between networking components, such as optical switches and routers.

Optical networking is enabled because of two improvements in the field of shining a flashlight in one end of a glass rod:
-> The glass used in fiber optics is specially designed so that it is low in impurities. Further, within the fiber itself, the light can bounce around and propagate to the far end.
-> Lasers produce a very powerful, focused light that can travel much farther than regular light.

Optical networking offers enhancements over conventional networking because it improves three important aspects of network performance: speed, capacity, and distance.

How Optical Networking Works
The transmission of light in optical fiber is most commonly explained using the principle of total internal reflection (TIR). This means that 100 percent of the light that strikes a surface is reflected. By way of comparison, a mirror reflects about 90 percent of the light that strikes it, so you can see that TIR is a high standard to meet.

When light is emitted, be it from a powerful laser or from candlelight, the radiated light can bounce, assuming it strikes the right material. Light can be manipulated in two basically different ways:
-> Reflection means that the light bounces back.
-> Refraction means that the light’s angle is altered as it passes through a different medium (like a glass of water, a prism, or a fiber). The angle is determined by the angle of incidence. The angle of incidence is the angle at which light strikes the interface between an optically denser material and an optically thinner one.

For TIR to occur, the following conditions must be present:
-> Beams of light must pass from a dense material to a less dense material.
-> The incident angle (measured from the normal to the surface) must be greater than the critical angle. The critical angle is the angle of incidence at which light stops being refracted and is instead totally reflected.

The core and cladding are constructed out of optically denser and optically thinner types of highly pure silica glass. These glasses are mixed with additives called dopants (like erbium), which adjust their refractive indices. The difference between the refractive indices of the two different kinds of glass causes most of the emitted light to bounce off the cladding and stay within the core, traveling to the endpoint. The critical angle requirement is met by controlling the angle at which light is beamed into the core.
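A small sketch of the critical angle calculation, using Snell's law (critical angle = arcsin(n_cladding / n_core)). Angles here are measured from the normal to the core/cladding boundary, so total internal reflection happens for incidence angles larger than the critical angle. The index values 1.48 and 1.46 are only illustrative assumptions, not figures from any particular fiber.

```python
import math


def critical_angle_deg(n_core, n_cladding):
    """Critical angle in degrees, measured from the normal to the interface."""
    if n_cladding >= n_core:
        # TIR only exists when light goes from a denser to a less dense medium.
        raise ValueError("core index must be higher than cladding index")
    return math.degrees(math.asin(n_cladding / n_core))


def is_totally_reflected(incidence_deg, n_core, n_cladding):
    """True when the ray hits the boundary beyond the critical angle."""
    return incidence_deg > critical_angle_deg(n_core, n_cladding)


# Assumed, illustrative refractive indices for core and cladding.
print(critical_angle_deg(1.48, 1.46))
print(is_totally_reflected(85.0, 1.48, 1.46))
```

With indices this close together the critical angle is large, which is why the light has to be launched almost parallel to the fiber axis.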

Optical Fibers
For optical networking to happen, it must use a fantastically pure kind of glass. In this case, silica glass is the blend of choice. Even though it is exceptionally pure, there is still a little loss of light as the light travels through. The loss, however, is much less pronounced than with ordinary glass.

Once you have a core of pure silica, an extra layer of glass (called cladding) is wrapped around the core. The cladding has a lower refractive index than the core. The difference in refractive indices keeps the light inside the core and prevents it from escaping through the sides of the fiber.

An optical signal can generate many different light waves, which can travel through the fiber simultaneously. This is the method used in multimode fibers (which provide a medium over which a number of concurrent transmissions can be sent). Unfortunately, this can also cause problems because the waves arrive at the end of the fiber out of sync. Most optical networks use single-mode fiber, which has a rather small fiber core (about 9 micrometers; a micrometer is a millionth of a meter), thus ensuring that only a single light wave traverses the fiber and alleviating receiving problems.

Fiber optic cables contain extremely thin strands (10 micrometers across) of silica glass, which are then encased in a thicker layer of silica glass cladding (about 125 micrometers across). Once a protective wrap is applied, the fiber optic cable is a quarter of a millimeter in diameter. The glass used within the fiber is highly pure, thus ensuring that the photons keep moving. Also, these strands of glass can be hundreds or thousands of miles long.

Design
Optical fiber is composed of three parts:
-> The core, which carries the light
-> Cladding, which traps the light in the core, causing total internal reflection
-> The Buffer, which is the insulating wrap protecting the fiber


1. First, the light source (in this case a laser) converts the network’s electrical signal into pulses of light.
2. The light is injected into the core of the fiber.
3. The photons bounce off of the border between the core and the cladding. Because the core and the cladding have different refractive indices, the photons are bounced back into the core.
4. The photons continue through the length of the fiber.
5. Ultimately, they exit the fiber and are converted back into electrical signals by the light detector.
6. Because the fiber doesn’t exist in a harm-free environment, the core and cladding are encased in a protective wrap called the buffer. The buffer makes the fiber more durable and easy to handle.


Types of Fiber
Fiber comes in two basic types, single-mode and multimode.
-> Single-mode fiber has a core size of only about six times the wavelength of the light it carries. In turn, this causes all the light to travel across a single path (in optical networking parlance, a path is referred to as a mode). Single-mode fiber is useful because modal dispersion disappears and the bandwidth of the fiber is at least 100 times greater than that of graded index fiber.
-> Multimode fiber allows light to travel across several different paths through the core of the fiber, which enter and leave the fiber at various angles.

Multimode fiber can further be broken down into two different types of fiber. They differ based on the refractive index profile of the core.
-> Step index fiber has a core made from a single type of glass. Light within the fiber travels in straight lines and reflects off the cladding. Step index fiber has a numerical aperture that is determined by the differences in the indices of refraction of the core and cladding. Because each mode of light travels a different path, a pulse of light is dispersed while traveling through the fiber, thereby restricting the bandwidth of step index fiber.
-> Graded index fiber has a core that is composed of many different layers of glass. The layers differ in density, thereby transmitting light along a parabolic path. Light travels faster in glass with a lower index of refraction, so it speeds up as it approaches the outside of the core. Conversely, light traveling closest to the center of the core travels at the slowest pace. Because of these different layers of glass, the bandwidth capacity of the fiber is 100 times greater than that of step index fiber.
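The numerical aperture mentioned for step index fiber can be worked out from the two refractive indices with the standard formula NA = sqrt(n_core^2 - n_cladding^2). A quick sketch, again with assumed index values (1.48 and 1.46) and assuming the light enters the fiber from air:

```python
import math


def numerical_aperture(n_core, n_cladding):
    """Numerical aperture of a step index fiber."""
    return math.sqrt(n_core ** 2 - n_cladding ** 2)


def acceptance_half_angle_deg(n_core, n_cladding):
    """Largest launch angle (from the fiber axis, in air) that is still guided."""
    na = min(1.0, numerical_aperture(n_core, n_cladding))
    return math.degrees(math.asin(na))


# Assumed, illustrative indices.
print(numerical_aperture(1.48, 1.46))
print(acceptance_half_angle_deg(1.48, 1.46))
```

A larger difference between the indices gives a larger numerical aperture, so the fiber accepts light from a wider cone, which is what makes coupling to an LED source easier.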

Size Matters
Fiber can also be categorized by its size. The most popular fiber in multimode today is 62.5/125.

The first multimode fiber in wide use was 50/125. Telephone companies needing more bandwidth for long-distance uses were the first to adopt this type of fiber. Because it has a small core and a low numerical aperture, it was difficult to connect to LED sources. Because of this difficulty, a move was made to 100/140 fiber. It worked well, but because the core was so large, it was expensive to manufacture. Further, it had a unique cladding that required connector manufacturers to develop connectors specifically for it. The next popular size of fiber was 85/125. It provided connectivity to LED sources but used the same connectors as other fibers. Finally, 62.5/125 became the de facto standard when IBM turned to it for its fiber optic hardware. After that, the other types of fiber dropped away, leaving 62.5/125 as the industry standard.

Monday, May 26, 2008

My FAQs 3 - BGP

port ?
TCP 179

Goals of bgp?
->First, it needed to support the meshlike connectivity of ISP networks.
->It also required extensive policy controls to enforce the administrative policies of each ISP.
->It needed the ability to reliably transmit routing information between BGP peers.
->Finally, the protocol required the ability to scale route advertisements beyond a few thousand routes.

Loop prevention?
Provides loop prevention through an attribute called the AS Path, which is a collection of AS numbers through which a particular route has passed. The use of this attribute also leads to a common description of BGP as a path-vector protocol. In a BGP network, each router knows the networks a route has traversed (the path) and the direction to send a packet for the route (the vector).
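A minimal sketch of how the AS Path gives loop prevention: a router drops any route whose AS Path already contains its own AS number, and adds its own AS number to the path when advertising to an EBGP peer. The function names and the AS numbers (taken from the private range) are made up for illustration.

```python
def accept_route(as_path, local_as):
    """Reject a route that has already passed through our own AS."""
    return local_as not in as_path


def advertise_to_ebgp_peer(as_path, local_as):
    """Prepend the local AS number before sending the route to another AS."""
    return [local_as] + list(as_path)


# Route originated in AS 65003 and passed through AS 65002.
path = [65002, 65003]
print(accept_route(path, 65001))            # our AS is not in the path
print(advertise_to_ebgp_peer(path, 65001))  # path as seen by the next AS
print(accept_route(path, 65002))            # AS 65002 would see a loop
```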

EBGP?
When two BGP routers are in different AS networks, the session between them is considered an external BGP (EBGP) connection. By default, an EBGP connection is formed between directly connected peers. This requirement is enforced by setting the time-to-live (TTL) of the IP packet to 1, thereby not permitting an intermediate router to forward the BGP packet. Once the EBGP session is established, the two peers can begin to exchange routing knowledge with each other. All active BGP routes learned from other EBGP sessions are advertised. In addition, all active BGP routes learned from internal BGP peers are advertised.

IBGP?
The connection of two BGP routers within the same AS is called an internal BGP (IBGP) connection. Unlike the EBGP variety, there is no requirement for physical connectivity between IBGP peers. The TTL of the BGP packets is set to 64 to allow for connectivity across an AS. Once the IBGP session is established, routes are exchanged between the peers. By default, only active BGP routes learned from EBGP peers are advertised across an IBGP session.

BGP FSM?
Idle - Idle is the initial neighbor state, in which the local router rejects all incoming session requests. After the BGP process starts, a TCP session is initiated to the remote peer. The local router transitions to the Connect state and begins to listen for a connection initiated by the remote router.
Connect - In the Connect state, the local router is waiting for the TCP session to be completed. If it is successful, the local router sends an Open message to the peer and transitions to the OpenSent state. Should the TCP connection attempt fail, the local router resets the ConnectRetry timer and transitions to the Active state. If the ConnectRetry timer reaches 0 while the local router is in the Connect state, the timer is reset and another connection attempt is made. The local router remains in the Connect state.
Active - In the Active state, the local router is trying to initiate a TCP session with its peer. If the session establishes successfully, an Open message is sent and the local router transitions to the OpenSent state. If the TCP session fails to establish, the local router initiates another session, sets the ConnectRetry timer to 0, and transitions back to the Connect state. An attempt by the remote router to connect from an unexpected IP address causes the local router to refuse the connection. The local router remains in the Active state and resets the ConnectRetry timer.
OpenSent - The OpenSent state is reached upon a successful TCP establishment. The local router sends a BGP Open message and waits for an Open message from the remote peer. When a valid Open message is received, the local router begins to send Keepalive messages to the remote router. The BGP peers negotiate the session parameters and the local router transitions to the OpenConfirm state. Should a TCP disconnect be received while in this state, the local router terminates the BGP session, resets the ConnectRetry timer, and transitions back to the Active state.
OpenConfirm - When the local router receives a valid Open message from the remote peer, the OpenConfirm state is reached. The local router sends Keepalive messages to the peer and waits for a Keepalive message in return.
Established - The Established state is achieved when a Keepalive message is received while in the OpenConfirm state. This is the final state of a peer relationship and designates a fully operational connection. Two BGP peers can exchange routing information only when the Established state is reached. All other BGP peering states designate a nonfunctional session.
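The state descriptions above can be written down as a small transition table. This is only a simplified sketch: the event names are made up for illustration, timers and error handling are left out, and any event not listed simply leaves the state unchanged.

```python
# (current state, event) -> next state, following the descriptions above.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",
    ("Connect", "tcp_failed"): "Active",
    ("Connect", "connect_retry_expired"): "Connect",
    ("Active", "tcp_established"): "OpenSent",
    ("Active", "tcp_failed"): "Connect",
    ("OpenSent", "open_received"): "OpenConfirm",
    ("OpenSent", "tcp_disconnect"): "Active",
    ("OpenConfirm", "keepalive_received"): "Established",
}


def next_state(state, event):
    """Return the next FSM state; unknown events keep the current state."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="Idle"):
    """Feed a sequence of events through the FSM and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state


print(run(["start", "tcp_established", "open_received", "keepalive_received"]))
```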

Message types?
The Open Message - The Open message is the first packet BGP sends to a peer after the TCP connection has been established. It allows the two peers to negotiate the parameters of the peer session. These parameters include the BGP version, the hold time for the session, authentication data, refresh capabilities, and support for multiple Network Layer Reachability Information (NLRI).
The Update Message - Routing information is sent and withdrawn in BGP using the Update message. If needed, each message contains information previously advertised by the local router that is no longer valid. The same message may also contain new information advertised to the remote peer. Each Update contains a single set of BGP attributes and all routes using those attributes. This format reduces the total number of packets routers send between BGP peers when exchanging routing knowledge.
The Notification Message - When a BGP peer detects an error within the session, it sends a Notification message to the remote router and immediately closes both the BGP and TCP sessions.
The Keepalive Message - A BGP Keepalive message contains only the 19-octet message header and no other data. These messages are exchanged at one-third the negotiated hold-time value for the session, if necessary. The advertisement of an Update message within the keepalive period resets the timer to 0. In short, a Keepalive is sent only in the absence of other messages for a particular session. Should the local router not receive a Keepalive or Update message within the hold-time period, a Notification message of Hold Time Expired is generated and the session is torn down.
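To make the 19-octet Keepalive concrete, here is a rough sketch that builds one. The BGP header is a 16-octet marker (all ones), a 2-octet length, and a 1-octet type, and Keepalive is message type 4. The helper names are my own.

```python
import struct

BGP_HEADER_LEN = 19   # marker (16) + length (2) + type (1)
MSG_KEEPALIVE = 4     # BGP message type code for Keepalive


def build_keepalive():
    """Return the raw bytes of a Keepalive: just the header, no body."""
    marker = b"\xff" * 16
    return marker + struct.pack("!HB", BGP_HEADER_LEN, MSG_KEEPALIVE)


def keepalive_interval(hold_time_seconds):
    """Keepalives go out at one-third of the negotiated hold time."""
    return hold_time_seconds / 3


msg = build_keepalive()
print(len(msg), msg.hex())
print(keepalive_interval(90))
```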

Routing Information bases?
Each BGP router establishes memory locations in which to store routing knowledge. These are collectively known as a Routing Information Base (RIB). A BGP peer maintains three categories of RIBs: the Adjacency-RIB-In, the Local-RIB, and the Adjacency-RIB-Out.
Adjacency-RIB-In - An Adjacency-RIB-In table is created on the local router for each established BGP peer. All routes received from the peer are placed in the appropriate memory table.
Local-RIB - The best path to each destination is stored in the Local-RIB table. These are the routes that the local router uses to forward user data traffic.
Adjacency-RIB-Out - Each established BGP peer also gets its own Adjacency-RIB-Out table for outbound route advertisements. Only routes currently located in the Local-RIB are eligible to be placed in this outbound database. In other words, a BGP router advertises only routes that it is currently using to forward data traffic.

Route selection process?
1. The Next Hop attribute value for each route must be reachable in the local routing table; otherwise, the local router discards the route.
2. The router selects the route with the highest Local Preference attribute value.
3. The router selects the route with the shortest AS Path length.
4. The router selects the route with the smallest Origin attribute value.
5. The router selects the route with the smallest Multiple Exit Discriminator attribute value. This step is executed, by default, only for routes from the same neighboring AS.
6. The router selects routes learned from an EBGP peer over routes learned from an IBGP peer. If the remaining routes are all EBGP-learned routes, the router skips to step 9.
7. The router selects the route with the smallest IGP metric to the advertised BGP Next Hop.
8. If Route Reflection is used for IBGP peering, the router selects the route with the shortest Cluster-List length.
9. The router selects the route from the peer with the smallest numerical Router ID.
10. The router selects the route from the peer with the smallest numerical Peer Address.
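The ten steps above behave like a sort with a multi-part key, so they can be sketched as a tuple comparison. This is a simplification under a few assumptions: the Route fields are hypothetical names, MED is compared between all routes (not only those from the same neighboring AS), and the "skip to step 9" rule is modeled by zeroing the IGP metric and Cluster-List fields for EBGP routes.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass
class Route:
    next_hop_reachable: bool
    local_pref: int
    as_path_len: int
    origin: int            # 0 = IGP, 1 = EGP, 2 = incomplete
    med: int
    is_ebgp: bool
    igp_metric: int
    cluster_list_len: int
    router_id: str
    peer_address: str


def selection_key(r):
    """Smaller key wins; fields are ordered like steps 2 to 10."""
    return (
        -r.local_pref,                          # step 2: highest wins
        r.as_path_len,                          # step 3
        r.origin,                               # step 4
        r.med,                                  # step 5 (simplified)
        0 if r.is_ebgp else 1,                  # step 6: EBGP over IBGP
        0 if r.is_ebgp else r.igp_metric,       # step 7 (skipped for EBGP)
        0 if r.is_ebgp else r.cluster_list_len, # step 8 (skipped for EBGP)
        int(ip_address(r.router_id)),           # step 9: numeric compare
        int(ip_address(r.peer_address)),        # step 10
    )


def best_route(routes):
    """Step 1 drops unreachable next hops, then the smallest key is chosen."""
    candidates = [r for r in routes if r.next_hop_reachable]
    return min(candidates, key=selection_key) if candidates else None
```

Converting the Router ID and Peer Address to integers matters, since comparing them as strings would rank "192.0.2.10" ahead of "192.0.2.9".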

BGP attributes?
Optional Bit (Bit 0) - An attribute is either well known (a value of 0) or optional (a value of 1).
Transitive Bit (Bit 1) - Optional attributes can be either nontransitive (a value of 0) or transitive (a value of 1). Well-known attributes are always transitive.
Partial Bit (Bit 2) - Only optional transitive attributes use this bit. A 0 means each BGP router along the path recognized this attribute. A 1 means that at least one BGP router along the path did not recognize the attribute.
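A quick sketch of decoding these flag bits from the attribute flags octet. Bit 0 is the high-order bit of the octet, so the masks are 0x80, 0x40, and 0x20. The function names are my own.

```python
def decode_attr_flags(flags):
    """Split the attribute flags octet into its three leading bits."""
    return {
        "optional": bool(flags & 0x80),    # bit 0
        "transitive": bool(flags & 0x40),  # bit 1
        "partial": bool(flags & 0x20),     # bit 2
    }


def attr_category(flags):
    """Describe the attribute class implied by the flag bits."""
    bits = decode_attr_flags(flags)
    if not bits["optional"]:
        return "well-known"
    if bits["transitive"]:
        return "optional transitive"
    return "optional nontransitive"


print(attr_category(0x40))  # e.g. AS Path
print(attr_category(0xC0))  # e.g. Community
print(attr_category(0x80))  # e.g. MED
```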

Next Hop - Next Hop, attribute type code 3, is a well-known mandatory attribute.
Local Preference - Local Preference, attribute type code 5, is a well-known discretionary attribute.
AS Path - AS Path, attribute type code 2, is a well-known mandatory attribute.
Origin - Origin, attribute type code 1, is a well-known mandatory attribute.
Multiple Exit Discriminator - Multiple Exit Discriminator (MED), attribute type code 4, is an optional nontransitive attribute.
Community - Community, attribute type code 8, is an optional transitive attribute. It is used to administratively group routes for a common policy action.

My FAQs 2 - OSPF

runs over?
Directly over IP (IP protocol number 89)

packet types?
Hello
Database description
Link state request
Link state update
Link state ack

hello?
Used for discovery of OSPF neighbors. Hellos are addressed to multicast 224.0.0.5 on broadcast and point-to-point interfaces. Hellos on other interfaces are unicast. Default hello interval - 10 sec

DD?
The DD packet, type code 2, summarizes the local database by sending LSA headers to the remote router. The remote router analyzes these headers to determine whether it lacks any information within its own copy of the link-state database.

virtual links?
Virtual links are used for:

Linking an area that does not have a direct connection to the backbone.

Linking the backbone in case of a partitioned backbone.

Area without direct connection to the backbone
The backbone always needs to be the center of all other areas. In the rare cases where it is impossible to have an area physically connected to the backbone, a virtual link is used. This virtual link provides that area with a logical path to the backbone area. The virtual link is established between two ABRs that share one common area, with one of the ABRs connected to the backbone area.

Partitioned Backbone
OSPF allows for linking a partitioned backbone using a virtual link. The virtual link should be configured between two separate ABRs that touch the backbone area from each side and have a common area in between.

LSA types?
1 Router LSA
2 Network LSA
3 Network summary LSA
4 ASBR summary LSA
5 AS external LSA
6 Group membership LSA
7 NSSA external LSA

OSPF neighbors stuck in ex-start state?
1. MTU mismatch b/w neighbors
2. Can't pass large packets in L2

Adjacency states?

Down - Down is the starting state for all OSPF routers. A start event, such as configuring the protocol, transitions the router to the Init state. The local router may list a neighbor in this state when no hello packets have been received within the specified router dead interval for that interface.

Init - The Init state is reached when an OSPF router receives a hello packet but the local router ID is not listed in the received Neighbor field. This means that bidirectional communication has not been established between the peers.

Attempt - The Attempt state is valid only for Non-Broadcast Multi-Access (NBMA) networks. It means that a hello packet has not been received from the neighbor and the local router is going to send a unicast hello packet to that neighbor within the specified hello interval period.

2-Way - The 2-Way state indicates that the local router has received a hello packet with its own router ID in the Neighbor field. Thus, bidirectional communication has been established and the peers are now OSPF neighbors. On Point-to-Point and Point-to-Multipoint interfaces, the state will be changed to Full. On Broadcast interfaces, only the DR/BDR will advance to the Full state with their neighbors; all the remaining neighbors will remain in the 2-Way state.

ExStart - In the ExStart state, the local router and its neighbor establish which router is in charge of the database synchronization process. The higher router ID of the two neighbors controls which router becomes the master.

Exchange - In the Exchange state, the local router and its neighbor exchange DD packets that describe their local databases.

Loading - Should the local router require complete LSA information from its neighbor, it transitions to the Loading state and begins to send link-state request packets.

Full - The Full state represents a fully functional OSPF adjacency, with the local router having received a complete link-state database from its peer. Both neighboring routers in this state add the adjacency to their local database and advertise the relationship in a link-state update packet.

Router LSA?
For advertising the networks connected to the local router. This includes all links connected to the router, the metrics of those interfaces, and the OSPF capabilities of the router. It has area scope.

Need for DR and BDR?
Broadcast segments in a network, such as an Ethernet link, pose a special problem to link-state protocols and their peer-to-peer nature. Multiple routers on the same physical segment share the resources of that link and produce a lot of redundant information.
The ramifications of this process are twofold. First, each router reports the same set of information, the Ethernet link, to the rest of the OSPF network. Second, and perhaps more damaging, every router floods LSAs to each of its adjacent neighbors using the 224.0.0.5 multicast address.

DR?
Each broadcast segment in an OSPF network elects a designated router to act as the main point of contact for the network segment. Each router on the link must become adjacent with the DR, which handles all LSAs for the network. Each router sends the DR information using a new multicast destination address of 224.0.0.6, AllDRouters. The designated router generates a network LSA, type code 2, to represent the broadcast segment to the rest of the network.

DR election?
Based on priority and router ID. The router with the highest priority becomes the DR (a priority value of 0 makes a router ineligible for election). If there is a tie, the higher router ID is selected. The wait time for electing the first designated router on the segment arises from an OSPF timer called the WaitTimer. This is to guarantee the exchange of hellos b/w OSPF routers.

When a higher priority router comes onto a network, it will not immediately become the DR/BDR. It has to wait till the next election to become BDR first.
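The basic comparison used in the election (highest priority first, then highest router ID, with priority 0 excluded) can be sketched as below. It is deliberately simplified: it picks a DR for a segment with no existing DR, and ignores the BDR and the non-preemption behaviour described above. The function name and the addresses are made up.

```python
from ipaddress import ip_address


def elect_dr(routers):
    """routers is a list of (router_id, priority) tuples; returns the DR's ID."""
    eligible = [
        (priority, int(ip_address(router_id)), router_id)
        for router_id, priority in routers
        if priority > 0  # priority 0 means "never become DR"
    ]
    if not eligible:
        return None
    # Highest priority wins; the numerically higher router ID breaks ties.
    return max(eligible)[2]


segment = [("192.0.2.1", 1), ("192.0.2.9", 1), ("192.0.2.10", 1), ("192.0.2.200", 0)]
print(elect_dr(segment))
```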

Backbone area?
Area 0.0.0.0 connects all areas and redistributes all non-backbone routing info b/w the areas. All other areas must be connected to the backbone area.

OSPF router types?
Internal router - A router that maintains all operational interfaces within a single area is known as an internal router. An internal router may belong to any OSPF area.
Backbone router - A router that has at least one interface in area 0 is known as a backbone router.
Area border router - The area border router (ABR) connects one or more OSPF areas to the backbone. This means that at least one interface is within area 0 while another interface is in another area. The ABR plays a very important role in an OSPF network.
Autonomous System boundary router - An Autonomous System boundary router (ASBR) injects external routing knowledge into an OSPF network.

N/w summary LSA?
Routing knowledge crosses an area boundary in an OSPF network by using a network summary LSA, type code 3. By default, each Type 3 LSA matches a single router LSA or network LSA on a one-for-one basis. The network summary LSA also has an area-flooding scope.


AS external LSA?
Both the router and network summary LSAs are effective at propagating internal OSPF routing knowledge throughout the network. They are not capable, however, of carrying external routing information. The AS external LSA, type code 5, was defined for this explicit purpose. External routes in an OSPF network can come in multiple forms, such as redistributed static routes or routes from a network (internal or external) that is not currently running OSPF.

ASBR summary LSA?
While the Type 5 LSA provides the network information necessary to reach the external networks, the OSPF routers may not automatically begin using that data. The address of the ASBR must be known in the link-state database via a router LSA. For each ASBR reachable by a router LSA, the ABR creates an ASBR summary LSA, type code 4, and injects it into the appropriate area. This LSA provides reachability information to the ASBR itself. The ASBR summary LSA has area scope and is generated by an ABR.

Stub areas?
An OSPF stub area provides for a smaller link-state database by restricting the presence of AS external LSAs within the area. Since a single Type 5 LSA is generated for each external route, the potential number of LSAs in an OSPF network can be quite sizeable. (The disadvantage is the possibility of forwarding potentially unroutable packets.)

The responsibility for enforcing an OSPF stub area rests with the ABR. Under normal circumstances, the ABR re-floods the Type 5 LSAs into the area. When configured as a stub area, however, the ABR simply does not flood the AS external LSAs into the area. To provide the required IP reachability, the ABR should instead generate a summary LSA for the default route and inject that into the stub area.

Totally stubby area?
An expansion of the concept of a stub area. The ABR in a totally stubby area stops creating and flooding Type 3 LSAs for routes from the backbone and from other areas. Only a default Type 3 LSA is generated, to provide reachability to all routes outside the area. The basic operation of the stub area does not change in this situation: Type 4 and Type 5 LSAs are still not present on the routers in the area.

Not so stubby area?
Suppose that your OSPF network requires connectivity to a partner that is using RIP within its network, but the routers in the area that connects to the partner have been suffering from database issues that caused the area to be configured as a stub area. This exact set of circumstances led to the development of the not-so-stubby area (NSSA).

A not-so-stubby area is an OSPF stub area that allows some external routes to be present in the database. This is accomplished with a new NSSA external LSA, type code 7. The Type 7 LSA carries external routing information from the ASBR within the NSSA. It has an area flooding scope, so only routers in the NSSA receive the Type 7 LSA. The external routing information within the LSA is converted by the ABR into an AS external LSA at the area boundary. The ABR floods the Type 5 LSA into the OSPF domain, and no other routers in the network are aware of the NSSA configuration.

Wednesday, May 7, 2008

My FAQs 1 - TCP/IP, RIP

How are the protocols classified?

Protocols are classified into two groups based on how they compute routes. Distance vector protocols like RIP use the Bellman-Ford algorithm, typically with hop count as the metric. Link state protocols like OSPF use complete knowledge of the network to determine the path to a destination.

Another classification is based on whether they run within the same AS or between different ASes. Protocols like OSPF and RIP, which run within the same AS, are called Interior Gateway Protocols (IGP), and the others are Exterior Gateway Protocols (EGP).

What is Jumbo packet?
Packets whose size exceeds the standard MTU of the medium (1500 bytes for Ethernet). A packet that is larger than the MTU of an outgoing link is fragmented if the Don't Fragment bit in the IP header is not set and discarded otherwise.
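The fragment-or-discard decision can be sketched roughly like this. It assumes IPv4 with a plain 20-byte header and only computes the fragment layout (offset in 8-byte units, data size, More Fragments flag) rather than building real packets. The function name is my own.

```python
def fragment(payload_len, mtu, header_len=20, dont_fragment=False):
    """Return a list of (offset_in_8_byte_units, data_len, more_fragments).

    Returns None when the packet does not fit and the DF bit is set,
    meaning the packet would be discarded.
    """
    if payload_len + header_len <= mtu:
        return [(0, payload_len, False)]
    if dont_fragment:
        return None
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments


# A 4000-byte packet (20-byte header + 3980 bytes of data) over a 1500-byte MTU.
print(fragment(3980, 1500))
print(fragment(3980, 1500, dont_fragment=True))
```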

What is ARP?

ARP is used for obtaining the hardware address for a given IP address. ARP requests are sent as broadcast to ff:ff:ff:ff:ff:ff and ARP replies are sent as unicast.

Different types of ARP?

RARP - Reverse Address Resolution Protocol - used by diskless machines to obtain their own IP address for a given MAC address
GARP - Gratuitous ARP - both request and reply - Source and destination IP are set to the address of the machine issuing the request. Destination MAC is broadcast. Usually no reply is received.
-> Used to detect IP conflicts
-> Used to update other machines' ARP tables
-> Every time an interface goes up, the driver for that interface sends out a gratuitous ARP to preload the ARP tables of other machines.
Proxy ARP - A router responds with its own MAC address on behalf of the host to which the ARP request is sent. This usually occurs when two hosts are connected across a router.

Collision domain and broadcast domain?
Collision domain - Network segment where packets can collide while being sent over the shared medium
Broadcast domain - Logical division of a network where all nodes can reach each other through an L2 broadcast

CSMA/CD?
When two hosts on a shared Ethernet segment send at the same time, a collision occurs. When the signals collide, both are rendered unusable. A standard had to be created that has the hosts follow rules relating to when they can send data and when they cannot. This standard is Carrier Sense Multiple Access with Collision Detection, referred to as CSMA/CD.

If two computers on the same segment send data at the same time, a collision occurs.

To avoid this, CSMA/CD forces computers to “listen” to the Ethernet before sending in order to make sure that no other host on the wire is sending. When the Ethernet segment is not busy, the device that wants to send data can do so. The sender will then continue to listen, to make sure that sending the data didn’t cause a collision.

If a collision is heard, both of the senders will send a jam signal over the Ethernet. This jam signal indicates to all other devices on the Ethernet segment that there has been a collision, and they should not send data onto the wire. (A second indication of a collision is the noise created by the collision itself.)

After sending the jam signal, each of the senders will wait a random amount of time before beginning the entire process over. The random time helps to ensure that the two devices don't transmit simultaneously again.
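The "random amount of time" is normally picked with truncated binary exponential backoff, which is the scheme classic Ethernet uses: after the n-th collision in a row the sender waits a random number of slot times between 0 and 2^min(n, 10) - 1, and gives up after 16 attempts. A rough sketch, assuming the 10 Mb/s slot time of 51.2 microseconds:

```python
import random

SLOT_TIME_US = 51.2   # 512 bit times at 10 Mb/s
MAX_ATTEMPTS = 16     # the frame is dropped after this many collisions


def backoff_slots(collisions, rng=random):
    """Number of slot times to wait after the given collision count."""
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("too many collisions, frame discarded")
    k = min(collisions, 10)          # the range stops growing after 10
    return rng.randint(0, 2 ** k - 1)


def backoff_delay_us(collisions, rng=random):
    """Backoff delay in microseconds."""
    return backoff_slots(collisions, rng) * SLOT_TIME_US


for n in (1, 2, 3):
    print(n, backoff_delay_us(n))
```

Because the range doubles with each collision, two senders that keep colliding become less and less likely to pick the same wait time.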

Packet firewall filter and proxy based firewall filter?

Packet firewall filter- L3 firewall filter
Proxy based firewall filter - application layer filter

RIP - Routing Information Protocol

Protocol & port?
UDP port 520

Infinity metric/count to infinity ?

To prevent routing loops, the RIP protocol depends on a function known as “counting to infinity.” A maximum metric (the infinity metric) is defined within the protocol, and all routes with a larger metric are deemed unusable. For RIP, the maximum hop count is defined as 15. If a router receives a RIP update with a metric value over 15 (that is, 16 or greater) after it is incremented, the router must throw the update away and the destination is considered unreachable.
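A minimal sketch of the infinity metric at work when a Response entry is processed. The table layout (prefix mapped to a metric and neighbor) and the function name are assumptions for illustration, and timers are left out entirely.

```python
RIP_INFINITY = 16  # a metric of 16 means "unreachable"


def process_update(table, prefix, received_metric, neighbor):
    """Apply one received route entry; return True if the table now uses it."""
    metric = min(received_metric + 1, RIP_INFINITY)  # add one hop, cap at 16
    current = table.get(prefix)
    if metric >= RIP_INFINITY:
        # Unreachable: drop the route if this neighbor was our next hop.
        if current is not None and current[1] == neighbor:
            del table[prefix]
        return False
    if current is None or metric < current[0] or current[1] == neighbor:
        table[prefix] = (metric, neighbor)
        return True
    return False


table = {}
print(process_update(table, "192.0.2.0/24", 3, "r2"), table)
print(process_update(table, "198.51.100.0/24", 15, "r2"), table)
```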

Messages?

Request and response
The purpose of a Request message is to ask for all or some part of the local router's current routing table. A Request message that has a single entry in it, with a metric of 16 and an address family identifier field that contains all zeros, translates into "Send me your entire routing table." A Request message may also contain one or more specific route entries. In this case, the local router consults its routing table for each of the destinations listed.

A RIP router receives Response messages for one of three different reasons:
-> In response to a Request message generated by the local router
-> A regular (unsolicited) Response message sent by a neighbor
-> A triggered update Response message sent by a neighbor


Split horizon?
When the Update timer expires and a Response message is generated, split horizon prevents the local router from including any routes learned from a neighbor on the interface from which the message is being sent out.

SH with Poisoned Reverse?
Instead of never advertising a route back to the neighbor it was learned from, the router advertises it with an infinity metric.
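Both behaviours can be sketched in one function that builds the routes to advertise out of a given interface. The table layout and interface names are made up for illustration.

```python
RIP_INFINITY = 16


def build_response(routes, out_interface, poisoned_reverse=False):
    """routes maps prefix -> (metric, interface the route was learned on)."""
    adverts = {}
    for prefix, (metric, learned_on) in routes.items():
        if learned_on == out_interface:
            if poisoned_reverse:
                # Advertise it back, but as unreachable.
                adverts[prefix] = RIP_INFINITY
            # Plain split horizon: leave the route out entirely.
            continue
        adverts[prefix] = metric
    return adverts


routes = {
    "192.0.2.0/24": (2, "if0"),
    "198.51.100.0/24": (1, "if1"),
}
print(build_response(routes, "if0"))
print(build_response(routes, "if0", poisoned_reverse=True))
```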


Timers?
RIP uses a number of timers in its operation, among them the Update timer, the Hold-Down timer, the Timeout timer, and the Garbage Collection timer. A RIP router uses an Update timer to advertise its complete routing table (less split horizon) to all its neighbors. The JUNOS software uses 30 seconds as the default Update timer.

The Hold-Down timer prevents the propagation of bad routing information throughout the network. It is used when the local router receives an update from a peer that contains a route with a higher metric (other than infinity) than the one in the current routing table.

The Timeout timer is used to ensure that the copy of the route is valid and usable. When the route is first installed in the table, this timer is initialized to 120 seconds, its maximum value. The timer value is updated when a Response message from a neighbor is processed and the route is maintained in the table. In this case, the timer is reset to 120.

The Garbage Collection timer runs to a maximum non-configurable value of 180 seconds, at which time the route will be removed from the routing table.

Limitations ?

Scalability
RIP does not scale well for large networking environments. One issue is the maximum hop count used (discussed next). Another issue is the use of the 255.255.255.255 broadcast address for Response message updates in RIP version 1. On broadcast networks, this is quite disruptive to other IP (non-RIP) hosts.
Small hop count limit
Sixteen hops is the defined infinity metric that denotes an unreachable or unusable subnet. This value limits the size or “diameter” of the networks that can be built using RIP.
Slow convergence
Although triggered updates can help advertise new information into RIP, the timers can have the opposite effect. When a route needs to be removed from the protocol, the timer values for the Hold-Down, Timeout, and Garbage Collection timers can mean that a topology change at one end of the network may not be known at the other end of the network for several minutes.
Suboptimal routing
Since RIP routers utilize only the hop count as the metric, some suboptimal routing may occur. This occurs because hop count does not allow for dissimilar bandwidths, fewer delays, or less congestion on other alternate paths to a destination. When these alternate paths are available, RIP will always pick the one with the smallest hop count regardless of the interface speeds of the other path.
Nonhierarchical design
As the size of the RIP routing domain grows larger and approaches the maximum diameter of 15 routers, there is no mechanism to divide the domain into smaller, more manageable subdomains.

RIPv2 over v1?

VLSM support: By default, all RIPv2 Response updates include the subnet mask. This allows v2 routers to support variable-length subnet mask (VLSM) routing and provides for a classless network routing environment.

Multicast announcements: RIPv2 sends all Request and Response messages to a multicast address (224.0.0.9) instead of the 255.255.255.255 broadcast address. This provides for better scalability since only RIP-speaking routers (or hosts) need to process the packets.

Authentication: RIPv2 supports authentication by means of a password. This allows a RIP router to accept Response messages only from a “trusted” source. Although RFC 2453 specifies the use of a plain-text password only, the JUNOS software also supports the use of MD5 hashes, as defined in RFC 2082.

Route tag: RIPv2 supports a 16-bit field called a route tag. This field was originally included to indicate whether the route was derived internally or externally to the RIP network. This field can also be used for other purposes, including administrative routing policy control.

Next hop address: RIPv2 allows the sending router to advertise the immediate next hop address for a route entry. Similar to an ICMP redirect message, this field is helpful in a broadcast environment to avoid an extra forwarding hop when the advertising RIP router is not the immediate next hop for the route.


Configurations?

user@Cabernet# show
rip {
    group neighbor-routers {
        neighbor fe-0/0/0.0;
        neighbor fe-0/0/1.0;
    }
}

user@Cabernet> show rip neighbor
                         Source          Destination     Send   Receive   In
Neighbor         State   Address         Address         Mode   Mode      Met
--------         -----   -------         -----------     ----   -------   ---
fe-0/0/0.0       Up      172.16.1.2      224.0.0.9       mcast  both      1
fe-0/0/1.0       Up      172.16.2.1      224.0.0.9       mcast  both      1

user@Riesling> show route protocol rip
inet.0: 27 destinations, 27 routes (27 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

172.16.2.0/24 *[RIP/100] 00:07:25, metric 2
> to 172.16.1.2 via fe-0/0/0.0
192.168.8.1/32 *[RIP/100] 00:07:25, metric 2
> to 172.16.1.2 via fe-0/0/0.0

Wednesday, January 16, 2008

BFD Notes

Bidirectional forwarding detection


Introduction:-

BFD is a protocol intended to detect faults in the bidirectional path between two forwarding engines, including interfaces, data link(s), and to the extent possible the forwarding engines themselves, with potentially very low latency. It operates independently of media, data protocols, and routing protocols.

An increasingly important feature of networking equipment is the rapid detection of communication failures between adjacent systems, in order to more quickly establish alternative paths.

The time to detect failures (“detection times”) available in the existing protocols is no better than a second, which is far too long for some applications and represents a great deal of lost data at gigabit rates. Furthermore, routing protocol Hellos are of no help when those routing protocols are not in use, and the semantics of detection are subtly different--they detect a failure in the path between the two routing protocol engines.

The goal of BFD is to provide low-overhead, short-duration detection of failures in the path between adjacent forwarding engines, including the interfaces, data link(s), and to the extent possible the forwarding engines themselves.

An additional goal is to provide a single mechanism that can be used for liveness detection over any media, at any protocol layer, with a wide range of detection times and overhead, to avoid a proliferation of different methods.

It is intended to be implemented in some component of the forwarding engine of a system, in cases where the forwarding and control engines are separated. This not only binds the protocol more to the forwarding plane, but decouples the protocol from the fate of the routing protocol engine, making it useful in concert with various "graceful restart" mechanisms for those protocols. BFD may also be implemented in the control engine, though doing so may preclude the detection of some kinds of failures.

BFD operates on top of any data protocol being forwarded between two systems. It is always run in a unicast, point-to-point mode. BFD packets are carried as the payload of whatever encapsulating protocol is appropriate for the medium and network. BFD may be running at multiple layers in a system.





Protocol Overview:-

BFD is a simple hello protocol that in many respects is similar to the detection components of well-known routing protocols. A pair of systems transmits BFD packets periodically over each path between the two systems, and if a system stops receiving BFD packets for long enough, some component in that particular bidirectional path to the neighboring system is assumed to have failed.

A path is only declared to be operational when two-way communication has been established between systems, though this does not preclude the use of unidirectional links.

A separate BFD session is created for each communications path and data protocol in use between two systems.

Operating Modes

BFD has two operating modes which may be selected, as well as an additional function that can be used in combination with the two modes.

The primary mode is known as Asynchronous mode. In this mode, the systems periodically send BFD Control packets to one another, and if a number of those packets in a row are not received by the other system, the session is declared to be down.

The second mode is known as Demand mode. In this mode, it is assumed that a system has an independent way of verifying that it has connectivity to the other system. Once a BFD session is established, such a system may ask the other system to stop sending BFD Control packets, except when the system feels the need to verify connectivity explicitly, in which case a short sequence of BFD Control packets is exchanged, and then the far system quiesces. Demand mode may operate independently in each direction, or simultaneously.

An adjunct to both modes is the Echo function. When the Echo function is active, a stream of BFD Echo packets is transmitted in such a way as to have the other system loop them back through its forwarding path. If a number of packets of the echoed data stream are not received, the session is declared to be down. The Echo function may be used with either Asynchronous or Demand modes. Since the Echo function is handling the task of detection, the rate of periodic transmission of Control packets may be reduced (in the case of Asynchronous mode) or eliminated completely (in the case of Demand mode.)

Pure asynchronous mode is advantageous in that it requires half as many packets to achieve a particular detection time as does the Echo function. It is also used when the Echo function cannot be supported for some reason.

The Echo function has the advantage of truly testing only the forwarding path on the remote system. This may reduce round-trip jitter and thus allow more aggressive detection times, as well as potentially detecting some classes of failure that might not otherwise be detected.

The Echo function may be enabled individually in each direction. It is enabled in a particular direction only when the system that loops the Echo packets back signals that it will allow it, and when the system that sends the Echo packets decides it wishes to.

Demand mode is useful in situations where the overhead of a periodic protocol might prove onerous, such as a system with a very large number of BFD sessions. It is also useful when the Echo function is being used symmetrically. Demand mode has the disadvantage that detection times are essentially driven by the heuristics of the system implementation and are not known to the BFD protocol. Demand mode may not be used when the path round trip time is greater than the desired detection time.


BFD Control Packet Format:-

Generic BFD Control Packet Format

BFD Control packets are sent in an encapsulation appropriate to the environment.

The BFD Control packet has a Mandatory Section and an optional Authentication Section. The format of the Authentication Section, if present, is dependent on the type of authentication in use.

The Mandatory Section of a BFD Control packet has the following format:










An optional Authentication Section may be present:







Version (Vers)

Denotes the version number of the protocol. Here we define protocol version 1.

Diagnostic (Diag)

A diagnostic code specifying the local system's reason for the last session state change to states Down or AdminDown.
Values are:

0 -- No Diagnostic
1 -- Control Detection Time Expired
2 -- Echo Function Failed
3 -- Neighbor Signaled Session Down
4 -- Forwarding Plane Reset
5 -- Path Down
6 -- Concatenated Path Down
7 -- Administratively Down
8 -- Reverse Concatenated Path Down
9-31 -- Reserved for future use

This field allows remote systems to determine the reason that the previous session failed.

State (Sta)

The current BFD session state as seen by the transmitting system.
Values are:

0 -- AdminDown
1 -- Down
2 -- Init
3 -- Up


Poll (P)

If set, the transmitting system is requesting verification of connectivity, or of a parameter change, and is expecting a packet with the Final (F) bit in reply. If clear, the transmitting system is not requesting verification.

Final (F)

If set, the transmitting system is responding to a received BFD Control packet that had the Poll (P) bit set. If clear, the transmitting system is not responding to a Poll.

Control Plane Independent (C)

If set, the transmitting system's BFD implementation does not share fate with its control plane (in other words, BFD is implemented in the forwarding plane and can continue to function through disruptions in the control plane.) If clear, the transmitting system's BFD implementation shares fate with its control plane.

The use of this bit is application dependent.

Authentication Present (A)

If set, the Authentication Section is present and the session is to be authenticated.

Demand (D)

If set, Demand mode is active in the transmitting system (the system wishes to operate in Demand mode, knows that the session is up in both directions, and is directing the remote system to cease the periodic transmission of BFD Control packets.) If clear, Demand mode is not active in the transmitting system.

Multipoint (M)

This bit is reserved for future point-to-multipoint extensions to BFD. It must be zero on both transmit and receipt.

Detect Mult

Detection time multiplier. The negotiated transmit interval, multiplied by this value, provides the detection time for the transmitting system in Asynchronous mode.

Length

Length of the BFD Control packet, in bytes.






My Discriminator

A unique, nonzero discriminator value generated by the transmitting system, used to demultiplex multiple BFD sessions between the same pair of systems.


Your Discriminator

The discriminator received from the corresponding remote system. This field reflects back the received value of My Discriminator, or is zero if that value is unknown.

Desired Min TX Interval

This is the minimum interval, in microseconds, that the local system would like to use when transmitting BFD Control packets. The value zero is reserved.

Required Min RX Interval

This is the minimum interval, in microseconds, between received BFD Control packets that this system is capable of supporting. If this value is zero, the transmitting system does not want the remote system to send any periodic BFD Control packets.

Required Min Echo RX Interval

This is the minimum interval, in microseconds, between received BFD Echo packets that this system is capable of supporting. If this value is zero, the transmitting system does not support the receipt of BFD Echo packets.

Auth Type

The authentication type in use, if the Authentication Present (A) bit is set.

0 - Reserved
1 - Simple Password
2 - Keyed MD5
3 - Meticulous Keyed MD5
4 - Keyed SHA1
5 - Meticulous Keyed SHA1
6-255 - Reserved for future use


Auth Len

The length, in bytes, of the authentication section, including the Auth Type and Auth Len fields.
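To make the field list above more concrete, here is a minimal sketch that packs and unpacks the Mandatory Section with Python's `struct` module. The bit widths (3-bit Vers, 5-bit Diag, 2-bit Sta, six 1-bit flags, then 8-bit Detect Mult and Length, followed by five 32-bit words) are taken from RFC 5880, since the packet diagram is missing from these notes. The function names are hypothetical and the Authentication Section is not handled.

```python
import struct

BFD_FORMAT = "!BBBBIIIII"  # 4 single bytes followed by five 32-bit words


def pack_bfd_control(state, detect_mult, my_disc, your_disc,
                     desired_min_tx, required_min_rx,
                     required_min_echo_rx=0, diag=0, poll=False,
                     final=False, cpi=False, auth=False, demand=False,
                     version=1):
    """Pack the 24-byte Mandatory Section (no Authentication Section)."""
    byte0 = ((version & 0x7) << 5) | (diag & 0x1F)
    flags = ((int(poll) << 5) | (int(final) << 4) | (int(cpi) << 3)
             | (int(auth) << 2) | (int(demand) << 1))
    byte1 = ((state & 0x3) << 6) | flags  # Multipoint (M) bit stays zero
    length = struct.calcsize(BFD_FORMAT)  # 24 bytes
    return struct.pack(BFD_FORMAT, byte0, byte1, detect_mult, length,
                       my_disc, your_disc, desired_min_tx,
                       required_min_rx, required_min_echo_rx)


def unpack_bfd_control(data):
    """Decode the Mandatory Section into a dictionary of fields."""
    (byte0, byte1, detect_mult, length, my_disc, your_disc,
     tx, rx, echo_rx) = struct.unpack(BFD_FORMAT, data[:24])
    return {
        "version": byte0 >> 5, "diag": byte0 & 0x1F,
        "state": byte1 >> 6,
        "poll": bool(byte1 & 0x20), "final": bool(byte1 & 0x10),
        "cpi": bool(byte1 & 0x08), "auth": bool(byte1 & 0x04),
        "demand": bool(byte1 & 0x02), "multipoint": bool(byte1 & 0x01),
        "detect_mult": detect_mult, "length": length,
        "my_disc": my_disc, "your_disc": your_disc,
        "desired_min_tx": tx, "required_min_rx": rx,
        "required_min_echo_rx": echo_rx,
    }
```

The interval arguments are in microseconds, matching the field descriptions above.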



Elements of Procedure:-

A system may take either an Active role or a Passive role in session initialization. A system taking the Active role MUST send BFD Control packets for a particular session, regardless of whether it has received any BFD packets for that session. A system taking the Passive role MUST NOT begin sending BFD packets for a particular session until it has received a BFD packet for that session, and thus has learned the remote system's discriminator value. At least one system MUST take the Active role (possibly both.) The role that a system takes is specific to the application of BFD, and is outside the scope of this specification.

A session begins with the periodic, slow transmission of BFD Control packets. When bidirectional communication is achieved, the BFD session comes Up.

Once the BFD session is Up, a system can choose to start the Echo function if it desires to and the other system signals that it will allow it. The rate of transmission of Control packets is typically kept low when the Echo function is active.

If the Echo function is not active, the transmission rate of Control packets may be increased to a level necessary to achieve the detection time requirements for the session.

Once the session is up, a system may signal that it has entered Demand mode, and the transmission of BFD Control packets by the remote system ceases. Other means of implying connectivity are used to keep the session alive. If either system wishes to verify bidirectional connectivity, it can initiate a short exchange of BFD Control packets to do so.

If Demand mode is not active, and no Control packets are received in the calculated detection time, the session is declared Down. This is signaled to the remote end via the State(Sta) field in outgoing packets.

If sufficient Echo packets are lost, the session is declared down in the same manner.

If Demand mode is active and no appropriate Control packets are received in response to a Poll Sequence, the session is declared down in the same manner.

If the session goes down, the transmission of Echo packets (if any) ceases, and the transmission of Control packets goes back to the slow rate.

Once a session has been declared down, it cannot come back up until the remote end first signals that it is down (by leaving the Up state), thus implementing a three-way handshake.

A session may be kept administratively down by entering the AdminDown state and sending an explanatory diagnostic code in the Diagnostic field.


BFD State Machine:-

The BFD state machine is quite straightforward. There are three states through which a session normally proceeds, two for establishing a session (Init and Up) and one for tearing down a session (Down.) This allows a three-way handshake for both session establishment and session teardown (assuring that both systems are aware of all session state changes.) A fourth state (AdminDown) exists so that a session can be administratively put down indefinitely.

Each system communicates its session state in the State (Sta) field in the BFD Control packet, and that received state in combination with the local session state drives the state machine.

Down state means that the session is down (or has just been created.) A session remains in Down state until the remote system indicates that it agrees that the session is down by sending a BFD Control packet with the State field set to anything other than Up. If that packet signals Down state, the session advances to Init state; if that packet signals Init state, the session advances to Up state. Semantically, Down state indicates that the forwarding path is unavailable, and that appropriate actions should be taken by the applications monitoring the state of the BFD session. A system MAY hold a session in Down state indefinitely (by simply refusing to advance the session state.) This may be done for operational or administrative reasons, among others.

Init state means that the remote system is communicating, and the local system desires to bring the session up, but the remote system does not yet realize it. A session will remain in Init state until either a BFD Control Packet is received that is signaling Init or Up state (in which case the session advances to Up state) or until the detection time expires, meaning that communication with the remote system has been lost (in which case the session advances to Down state.)

Up state means that the BFD session has successfully been established, and implies that connectivity between the systems is working. The session will remain in the Up state until either connectivity fails, or the session is taken down administratively. If either the remote system signals Down state, or the detection time expires, the session advances to Down state.

AdminDown state means that the session is being held administratively down. This causes the remote system to enter Down state, and remain there until the local system exits AdminDown state. AdminDown state has no semantic implications for the availability of the forwarding path.

The following diagram provides an overview of the state machine. Transitions involving AdminDown state are deleted for clarity. The notation on each arc represents the state of the remote system (as received in the State field in the BFD Control packet) or indicates the expiration of the Detection Timer.
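The transitions described in the paragraphs above can also be written down as a small lookup table. This is a sketch under assumptions: the event names are invented for the example, an event is either the state received from the remote system or the expiration of the detection timer, and a received AdminDown is treated as forcing the local session to Down as described above.

```python
DOWN, INIT, UP, ADMIN_DOWN = "Down", "Init", "Up", "AdminDown"
TIMER_EXPIRED = "DetectionTimerExpired"

# For each local state: event -> new state. Events that are not listed
# leave the session in its current state.
TRANSITIONS = {
    DOWN: {DOWN: INIT, INIT: UP},
    INIT: {INIT: UP, UP: UP, TIMER_EXPIRED: DOWN, ADMIN_DOWN: DOWN},
    UP: {DOWN: DOWN, TIMER_EXPIRED: DOWN, ADMIN_DOWN: DOWN},
}


def next_state(local_state, event):
    """Return the session state after processing one event."""
    if local_state == ADMIN_DOWN:
        # Only local administrative action leaves AdminDown.
        return ADMIN_DOWN
    return TRANSITIONS[local_state].get(event, local_state)
```

Walking through the three-way handshake: a system in Down that hears Down moves to Init, its peer in Down that hears Init moves to Up, and the first system then hears Up and also moves to Up.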



Tuesday, January 15, 2008

BGP4 Case Studies/Tutorial Section 4


--------------------------------------------------------------------------------


CIDR and Aggregate Addresses


One of the main enhancements of BGP4 over BGP3 is CIDR (Classless Interdomain Routing). CIDR or supernetting is a new way of looking at IP addresses. There is no notion of classes anymore (class A, B or C). For example, network 192.213.0.0 which used to be an illegal class C network is now a legal supernet represented by 192.213.0.0/16 where the 16 is the number of bits in the subnet mask counting from the far left of the IP address. This is similar to 192.213.0.0 255.255.0.0.
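The relationship between the /16 notation and the dotted mask can be checked with Python's standard `ipaddress` module. The helper names below are made up for this example.

```python
import ipaddress


def mask_for(prefix: str) -> str:
    """Return the dotted-decimal mask for a prefix such as 192.213.0.0/16."""
    return str(ipaddress.ip_network(prefix).netmask)


def prefix_for(address: str, mask: str) -> str:
    """Return the prefix-length notation for an address and dotted mask."""
    return str(ipaddress.ip_network(f"{address}/{mask}"))


print(mask_for("192.213.0.0/16"))
print(prefix_for("192.213.0.0", "255.255.0.0"))
```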

Aggregates are used to minimize the size of routing tables. Aggregation is the process of combining the characteristics of several different routes in such a way that a single route can be advertised. In the example below, RTB is generating network 160.10.0.0. We will configure RTC to propagate a supernet of that route 160.0.0.0 to RTA.

RTB#
router bgp 200
neighbor 3.3.3.1 remote-as 300
network 160.10.0.0

RTC#
router bgp 300
neighbor 3.3.3.3 remote-as 200
neighbor 2.2.2.2 remote-as 100
network 170.10.0.0
aggregate-address 160.0.0.0 255.0.0.0

RTC will propagate the aggregate address 160.0.0.0 to RTA.



--------------------------------------------------------------------------------


Aggregate Commands
There is a wide range of aggregate commands. It is important to understand how each one works in order to have the desired aggregation behavior.

The first command is the one used in the previous example:

aggregate-address address mask

This will advertise the prefix route and all of the more specific routes. The command aggregate-address 160.0.0.0 255.0.0.0 will propagate an additional network 160.0.0.0 but will not prevent 160.10.0.0 from also being propagated to RTA. The outcome is that both networks 160.0.0.0 and 160.10.0.0 are propagated to RTA. This is what we mean by advertising the prefix and the more specific route.

Please note that you can not aggregate an address if you do not have a more specific route of that address in the BGP routing table.

For example, RTB can not generate an aggregate for 160.0.0.0 if it does not have a more specific entry of 160.0.0.0 in its BGP table. The more specific route could have been injected into the BGP table via incoming updates from other ASs, from redistributing an IGP or static into BGP or via the network command (network 160.10.0.0).

In case we would like RTC to propagate network 160.0.0.0 only and NOT the more specific route then we would have to use the following:

aggregate-address address mask summary-only

This will advertise the prefix only; all the more specific routes are suppressed.

The command aggregate-address 160.0.0.0 255.0.0.0 summary-only will propagate network 160.0.0.0 and will suppress the more specific route 160.10.0.0.
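The behavior of the plain and summary-only forms can be modeled roughly as follows. This is a simplified sketch, not how a router implements it: the BGP table is just a list of prefixes, the function name is hypothetical, and it ignores how each route entered the table.

```python
import ipaddress


def advertised_prefixes(bgp_table, aggregate, summary_only=False):
    """Return the prefixes advertised once an aggregate is configured."""
    agg = ipaddress.ip_network(aggregate)
    routes = [ipaddress.ip_network(p) for p in bgp_table]
    more_specifics = [r for r in routes if r != agg and r.subnet_of(agg)]

    # No more specific route in the table: the aggregate is not generated.
    if not more_specifics:
        return [str(r) for r in routes]

    result = [agg] + [r for r in routes if r != agg]
    if summary_only:
        # Keep the aggregate and unrelated routes, drop the more specifics.
        result = [r for r in result if r not in more_specifics]
    return [str(r) for r in result]
```

With a table holding 160.10.0.0/16 and 170.10.0.0/16, the aggregate 160.0.0.0/8 is added in both cases, and 160.10.0.0/16 disappears only when `summary_only` is set.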

Please note that if we are aggregating a network that is injected into our BGP via the network statement (ex: network 160.10.0.0 on RTB) then the network entry is always injected into BGP updates even though we are using the "aggregate summary-only" command. The upcoming CIDR example discusses this situation.

aggregate-address address mask as-set

This advertises the prefix and the more specific routes but it includes as-set information in the path information of the routing updates.

ex: aggregate-address 129.0.0.0 255.0.0.0 as-set

This will be discussed in an example by itself in the following sections.

In case we would like to suppress more specific routes when doing the aggregation we can define a route map and apply it to the aggregates. This will allow us to be selective about which more specific routes to suppress.

aggregate-address address mask suppress-map map-name

This advertises the prefix and the more specific routes, but it suppresses advertisement according to a route map: the more specific routes that the route map permits are suppressed, and all others are still advertised. In the previous diagram, if we would like to aggregate 160.0.0.0, suppress the more specific route 160.20.0.0 and allow 160.10.0.0 to be propagated, we can use the following route map:

route-map CHECK permit 10
match ip address 1

access-list 1 permit 160.20.0.0 0.0.255.255
access-list 1 deny 0.0.0.0 255.255.255.255

Then we apply the route-map to the aggregate statement.

RTC#
router bgp 300
neighbor 3.3.3.3 remote-as 200
neighbor 2.2.2.2 remote-as 100
network 170.10.0.0
aggregate-address 160.0.0.0 255.0.0.0 suppress-map CHECK

Another variation is the:

aggregate-address address mask attribute-map map-name

This allows us to set the attributes (metric, etc.) when aggregates are sent out. The following route map when applied to the aggregate attribute-map command will set the origin of the aggregates to IGP.

route-map SETORIGIN
set origin igp

aggregate-address 160.0.0.0 255.0.0.0 attribute-map SETORIGIN



--------------------------------------------------------------------------------


CIDR example 1


Request: Allow RTB to advertise the prefix 160.0.0.0 and suppress all the more specific routes. The problem here is that network 160.10.0.0 is local to AS200 i.e. AS200 is the originator of 160.10.0.0. You cannot have RTB generate a prefix for 160.0.0.0 without generating an entry for 160.10.0.0 even if you use the "aggregate summary-only" command because RTB is the originator of 160.10.0.0.

Solution 1:

The first solution is to use a static route and redistribute it into BGP. The outcome is that RTB will advertise the aggregate with an origin of incomplete (?).

RTB#
router bgp 200
neighbor 3.3.3.1 remote-as 300
redistribute static (This will generate an update for 160.0.0.0 with the origin path as *incomplete*)

ip route 160.0.0.0 255.0.0.0 null0

Solution 2:

In addition to the static route, we add an entry with the network command. This has the same effect, except that the origin of the update will be set to IGP.

RTB#
router bgp 200
network 160.0.0.0 mask 255.0.0.0 (this will mark the update with origin IGP)
neighbor 3.3.3.1 remote-as 300
redistribute static

ip route 160.0.0.0 255.0.0.0 null0



--------------------------------------------------------------------------------


CIDR example 2 (as-set)
AS-SETs are used in aggregation to reduce the size of the path information by listing each AS number only once, regardless of how many times it appears in the multiple paths that were aggregated. The as-set aggregate command is used in situations where aggregation causes loss of information about the path attribute. In the following example RTC is getting updates about 160.20.0.0 from RTA and updates about 160.10.0.0 from RTB. Suppose RTC wants to aggregate network 160.0.0.0/8 and send it to RTD. RTD would not know what the origin of that route is. By adding the aggregate as-set statement we force RTC to generate path information in the form of a set {}. All the path information is included in that set irrespective of which path came first.



RTB#
router bgp 200
network 160.10.0.0
neighbor 3.3.3.1 remote-as 300

RTA#
router bgp 100
network 160.20.0.0
neighbor 2.2.2.1 remote-as 300

Case 1:

RTC does not have an as-set statement. RTC will send an update 160.0.0.0/8 to RTD with path information (300) as if the route has originated from AS300.

RTC#
router bgp 300
neighbor 3.3.3.3 remote-as 200
neighbor 2.2.2.2 remote-as 100
neighbor 4.4.4.4 remote-as 400
aggregate-address 160.0.0.0 255.0.0.0 summary-only
(this causes RTC to send RTD updates about 160.0.0.0/8 with no indication that 160.0.0.0 is actually coming from two different autonomous systems; this may create loops if RTD has an entry back into AS100)

Case 2:

RTC#
router bgp 300
neighbor 3.3.3.3 remote-as 200
neighbor 2.2.2.2 remote-as 100
neighbor 4.4.4.4 remote-as 400
aggregate-address 160.0.0.0 255.0.0.0 summary-only
aggregate-address 160.0.0.0 255.0.0.0 as-set
(this causes RTC to send RTD updates about 160.0.0.0/8 with an indication that 160.0.0.0 belongs to a set {100 200})
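The two cases can be compared with a small sketch of how the path information on the aggregate might be assembled. The representation (a list for the ordered part of the path and a Python set for the AS-SET) and the function name are assumptions made for illustration only.

```python
def aggregate_as_path(local_as, contributing_paths, as_set=False):
    """Return (sequence, as_set_members) for an aggregate sent to an EBGP peer.

    contributing_paths holds the AS paths of the more specific routes.
    Without as_set, only the aggregating AS appears in the path. With
    as_set, every AS from the contributing paths is listed once, in no
    particular order.
    """
    sequence = [local_as]
    members = set()
    if as_set:
        for path in contributing_paths:
            members.update(path)
    return sequence, members


# Case 1: RTC aggregates without as-set.
print(aggregate_as_path(300, [[100], [200]]))
# Case 2: RTC aggregates with as-set.
print(aggregate_as_path(300, [[100], [200]], as_set=True))
```

Because the members are kept in a set, an AS number that appears in several contributing paths is still recorded only once.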



--------------------------------------------------------------------------------


The next two subjects, "confederation" and "route reflectors" are designed for ISPs who would like to further control the explosion of IBGP peering inside their autonomous systems.

BGP Confederation
BGP confederation is implemented in order to reduce the IBGP mesh inside an AS. The trick is to divide an AS into multiple ASs and assign the whole group to a single confederation. Each AS by itself will have IBGP fully meshed and has connections to other AS's inside the confederation. Even though these ASs will have EBGP peers to ASs within the confederation, they exchange routing as if they were using IBGP; next hop, metric and local preference information are preserved. To the outside world, the confederation (the group of ASs) will look like a single AS.

To configure a BGP confederation use the following:

bgp confederation identifier autonomous-system

The confederation identifier will be the AS number of the confederation group. The group of ASs will look to the outside world as one AS with the AS number being the confederation identifier.

Peering within the confederation between multiple ASs is done via the following command:

bgp confederation peers autonomous-system [autonomous-system.]

The following is an example of confederation:

Example:



Let us assume that you have an autonomous system 500 consisting of nine BGP speakers (other non BGP speakers exist also, but we are only interested in the BGP speakers that have EBGP connections to other ASs). If you want to make a full IBGP mesh inside AS500 then you would need nine peer connections for each router, 8 IBGP peers and one EBGP peer to external ASs.
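The peering arithmetic behind the full mesh is easy to check. A minimal sketch (the function name is made up for this example):

```python
def ibgp_full_mesh(speakers: int):
    """Return (IBGP peers per router, total IBGP sessions) for a full mesh."""
    per_router = speakers - 1
    total_sessions = speakers * (speakers - 1) // 2
    return per_router, total_sessions


peers, sessions = ibgp_full_mesh(9)
print(peers, sessions)
```

For nine speakers this gives 8 IBGP peers per router and 36 IBGP sessions in total, which is the growth that confederations are meant to contain.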

By using confederation we can divide AS500 into multiple ASs: AS50, AS60 and AS70. We give the AS a confederation identifier of 500. The outside world will see only one AS500. For each AS50, AS60 and AS70 we define a full mesh of IBGP peers and we define the list of confederation peers using the bgp confederation peers command.

I will show a sample configuration of routers RTC, RTD and RTA. Note that RTA has no knowledge of ASs 50, 60 or 70. RTA has only knowledge of AS500.

RTC#
router bgp 50
bgp confederation identifier 500
bgp confederation peers 60 70
neighbor 128.213.10.1 remote-as 50 (IBGP connection within AS50)
neighbor 128.213.20.1 remote-as 50 (IBGP connection within AS50)
neighbor 129.210.11.1 remote-as 60 (BGP connection with confederation peer 60)
neighbor 135.212.14.1 remote-as 70 (BGP connection with confederation peer 70)
neighbor 5.5.5.5 remote-as 100 (EBGP connection to external AS100)


RTD#
router bgp 60
bgp confederation identifier 500
bgp confederation peers 50 70
neighbor 129.210.30.2 remote-as 60 (IBGP connection within AS60)
neighbor 128.213.30.1 remote-as 50 (BGP connection with confederation peer 50)
neighbor 135.212.14.1 remote-as 70 (BGP connection with confederation peer 70)
neighbor 6.6.6.6 remote-as 600 (EBGP connection to external AS600)

RTA#
router bgp 100
neighbor 5.5.5.4 remote-as 500 (EBGP connection to confederation 500)



--------------------------------------------------------------------------------


Route Reflectors
Another solution for the explosion of IBGP peering within an autonomous system is Route Reflectors (RR). As demonstrated in the "Internal BGP" section, a BGP speaker will not advertise a route learned via another IBGP speaker to a third IBGP speaker. By relaxing this restriction a bit and by providing additional control, we can allow a router to advertise (reflect) IBGP learned routes to other IBGP speakers. This will reduce the number of IBGP peers within an AS.

Example:



In normal cases, a full IBGP mesh should be maintained between RTA, RTB and RTC within AS100. By utilizing the route reflector concept, RTC could be elected as a RR and have a partial IBGP peering with RTA and RTB. Peering between RTA and RTB is not needed because RTC will be a route reflector for the updates coming from RTA and RTB.

neighbor ip-address route-reflector-client

The router with the above command would be the RR and the neighbors pointed at would be the clients of that RR. In our example, RTC would be configured with the "neighbor route-reflector-client" command pointing at RTA and RTB's IP addresses. The combination of the RR and its clients is called a cluster. RTA, RTB and RTC above would form a cluster with a single RR within AS100.

Other IBGP peers of the RR that are not clients are called non-clients.

Example:



An autonomous system can have more than one route reflector; a RR treats other RRs just like any other IBGP speaker. Other RRs could belong to the same cluster (client group) or to other clusters. In a simple configuration, the AS could be divided into multiple clusters, with each RR configured with the other RRs as non-client peers in a fully meshed topology. Clients should not peer with IBGP speakers outside their cluster.

Consider the above diagram. RTA, RTB and RTC form a single cluster with RTC being the RR. According to RTC, RTA and RTB are clients and anything else is a non-client. Remember that clients of an RR are pointed at using the "neighbor route-reflector-client" command. In the same way, RTD is the RR for its clients RTE and RTF; RTG is a RR in a third cluster. Note that RTD, RTC and RTG are fully meshed but routers within a cluster are not. When a route is received by a RR, it will do the following depending on the peer type:

1- Route from a non-client peer: reflect to all the clients within the cluster.
2- Route from a client peer: reflect to all the non-client peers and also to the client peers.
3- Route from an EBGP peer: send the update to all client and non-client peers.
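
The three rules above can be written down as a small lookup. This is a minimal Python sketch, not router code: the PeerType names and the reflect_targets function are invented for the illustration, and only the IBGP side of the advertisement is modeled.

```python
from enum import Enum


class PeerType(Enum):
    CLIENT = "client"
    NON_CLIENT = "non-client"
    EBGP = "ebgp"


def reflect_targets(source: PeerType) -> set:
    """Return the IBGP peer types a RR passes a route on to, given where it came from."""
    if source is PeerType.NON_CLIENT:
        # Rule 1: a route from a non-client goes to the clients only.
        return {PeerType.CLIENT}
    # Rules 2 and 3: routes from a client or an EBGP peer go to clients and non-clients.
    return {PeerType.CLIENT, PeerType.NON_CLIENT}
```

Note that a route learned from one non-client is never passed to another non-client, which is why the RRs themselves still need to be fully meshed.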

The following is the relevant BGP configuration of routers RTC, RTD and RTB:

RTC#

router bgp 100
neighbor 2.2.2.2 remote-as 100
neighbor 2.2.2.2 route-reflector-client
neighbor 1.1.1.1 remote-as 100
neighbor 1.1.1.1 route-reflector-client
neighbor 7.7.7.7 remote-as 100
neighbor 4.4.4.4 remote-as 100
neighbor 8.8.8.8 remote-as 200

RTB#

router bgp 100
neighbor 3.3.3.3 remote-as 100
neighbor 12.12.12.12 remote-as 300

RTD#

router bgp 100
neighbor 6.6.6.6 remote-as 100
neighbor 6.6.6.6 route-reflector-client
neighbor 5.5.5.5 remote-as 100
neighbor 5.5.5.5 route-reflector-client
neighbor 7.7.7.7 remote-as 100
neighbor 3.3.3.3 remote-as 100


As the IBGP learned routes are reflected, it is possible to have the routing information loop. The Route-Reflector scheme has a few methods to avoid this:

1- Originator-id: this is an optional, non-transitive BGP attribute that is four bytes long and is created by a RR. This attribute carries the router-id (RID) of the originator of the route in the local AS. If, due to poor configuration, the routing information comes back to the originator, it will be ignored.

2- Cluster-list: this will be discussed in the next section.


--------------------------------------------------------------------------------


Multiple RRs within a cluster


Usually, a cluster of clients will have a single RR. In this case, the cluster will be identified by the router-id of the RR. In order to increase redundancy and avoid single points of failure, a cluster might have more than one RR. All RRs in the same cluster need to be configured with a 4 byte cluster-id so that a RR can recognize updates from RRs in the same cluster.

A cluster-list is a sequence of cluster-ids that the route has passed. When a RR reflects a route from its clients to non-clients outside of the cluster, it will append the local cluster-id to the cluster-list. If this update has an empty cluster-list the RR will create one. Using this attribute, a RR can identify if the routing information is looped back to the same cluster due to poor configuration. If the local cluster-id is found in the cluster-list, the advertisement will be ignored.
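
The two loop checks can be sketched as follows. This is a simplified Python model under stated assumptions: the Update class, accept_update and reflect are hypothetical names, router-ids and cluster-ids are kept as plain strings, and the cluster-id is appended as described in the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Update:
    prefix: str
    originator_id: Optional[str] = None          # router-id of the route's originator
    cluster_list: List[str] = field(default_factory=list)


def accept_update(update: Update, local_router_id: str, local_cluster_id: str) -> bool:
    """Apply the originator-id and cluster-list checks to a received update."""
    if update.originator_id == local_router_id:
        return False  # our own route came back to us
    if local_cluster_id in update.cluster_list:
        return False  # the route has already passed through this cluster
    return True


def reflect(update: Update, source_router_id: str, local_cluster_id: str) -> Update:
    """Return the update as a RR would reflect it."""
    return Update(
        prefix=update.prefix,
        # The originator-id is set once, by the first RR that reflects the route.
        originator_id=update.originator_id or source_router_id,
        # An empty cluster-list is created on demand, then the local id is added.
        cluster_list=update.cluster_list + [local_cluster_id],
    )
```

With both RRs of a cluster sharing cluster-id 10, an update reflected by one of them carries 10 in its cluster-list and is ignored if it ever reaches the other.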

In the above diagram, RTD, RTE, RTF and RTH belong to one cluster with both RTD and RTH being RRs for the same cluster. Note the redundancy in that RTH has a fully meshed peering with all the RRs. In case RTD goes down, RTH will take its place. The following is the configuration of RTH, RTD, RTF and RTC:

RTH#

router bgp 100
neighbor 4.4.4.4 remote-as 100
neighbor 5.5.5.5 remote-as 100
neighbor 5.5.5.5 route-reflector-client
neighbor 6.6.6.6 remote-as 100
neighbor 6.6.6.6 route-reflector-client
neighbor 7.7.7.7 remote-as 100
neighbor 3.3.3.3 remote-as 100
neighbor 9.9.9.9 remote-as 300
bgp cluster-id 10 (This is the cluster-id)

RTD#

router bgp 100
neighbor 10.10.10.10 remote-as 100
neighbor 5.5.5.5 remote-as 100
neighbor 5.5.5.5 route-reflector-client
neighbor 6.6.6.6 remote-as 100
neighbor 6.6.6.6 route-reflector-client
neighbor 7.7.7.7 remote-as 100
neighbor 3.3.3.3 remote-as 100
neighbor 11.11.11.11 remote-as 400
bgp cluster-id 10 (This is the cluster-id)

RTF#

router bgp 100
neighbor 10.10.10.10 remote-as 100
neighbor 4.4.4.4 remote-as 100
neighbor 13.13.13.13 remote-as 500

RTC#

router bgp 100
neighbor 1.1.1.1 remote-as 100
neighbor 1.1.1.1 route-reflector-client
neighbor 2.2.2.2 remote-as 100
neighbor 2.2.2.2 route-reflector-client
neighbor 4.4.4.4 remote-as 100
neighbor 7.7.7.7 remote-as 100
neighbor 10.10.10.10 remote-as 100
neighbor 8.8.8.8 remote-as 200


Note that we did not need the cluster command for RTC because only one RR exists in that cluster.

An important thing to note is that peer groups were not used in the above configuration. If the clients inside a cluster do not have direct IBGP peers among one another and they exchange updates through the RR, peer groups should not be used. If peer groups were to be configured, then a potential withdrawal to the source of a route on the RR would be sent to all clients inside the cluster and could cause problems.

The router sub-command bgp client-to-client reflection is enabled by default on the RR. If BGP client-to-client reflection were turned off on the RR and redundant BGP peering was made between the clients, then using peer groups would be acceptable.


--------------------------------------------------------------------------------


RR and conventional BGP speakers
It is normal in an AS to have BGP speakers that do not understand the concept of route reflectors. We will call these routers conventional BGP speakers. The route reflector scheme will allow such conventional BGP speakers to coexist. These routers could be either members of a client group or a non-client group. This would allow easy and gradual migration from the current IBGP model to the route reflector model. One could start creating clusters by configuring a single router as RR and making other RRs and their clients normal IBGP peers. Then more clusters could be created gradually.

Example:



In the above diagram, RTD, RTE and RTF have the concept of route reflection. RTC, RTA and RTB are what we call conventional routers and cannot be configured as RRs. Normal IBGP mesh could be done between these routers and RTD. Later on, when we are ready to upgrade, RTC could be made a RR with clients RTA and RTB. Clients do not have to understand the route reflection scheme; it is only the RRs that would have to be upgraded.

The following is the configuration of RTD and RTC:

RTD#

router bgp 100
neighbor 6.6.6.6 remote-as 100
neighbor 6.6.6.6 route-reflector-client
neighbor 5.5.5.5 remote-as 100
neighbor 5.5.5.5 route-reflector-client
neighbor 3.3.3.3 remote-as 100
neighbor 2.2.2.2 remote-as 100
neighbor 1.1.1.1 remote-as 100
neighbor 13.13.13.13 remote-as 300

RTC#

router bgp 100
neighbor 4.4.4.4 remote-as 100
neighbor 2.2.2.2 remote-as 100
neighbor 1.1.1.1 remote-as 100
neighbor 14.14.14.14 remote-as 400


When we are ready to upgrade RTC and make it a RR, we would remove the IBGP full mesh and have RTA and RTB become clients of RTC.


--------------------------------------------------------------------------------


Avoiding looping of routing information
We have mentioned so far two attributes that are used to prevent potential information looping: the originator-id and the cluster-list.

Another means of controlling loops is to put more restrictions on the set clause of out-bound route-maps.

The set clause for out-bound route-maps does not affect routes reflected to IBGP peers.

More restrictions are also put on next-hop-self, which is a per-neighbor configuration option. When used on RRs, next-hop-self will only affect the next hop of EBGP learned routes because the next hop of reflected routes should not be changed.


--------------------------------------------------------------------------------


Route Flap Dampening
Route dampening (introduced in Cisco IOS version 11.0) is a mechanism to minimize the instability caused by route flapping and oscillation over the network. To accomplish this, criteria are defined to identify poorly behaved routes. A route which is flapping gets a penalty of 1000 for each flap. As soon as the cumulative penalty reaches a predefined "suppress-limit", the advertisement of the route will be suppressed. The penalty will be exponentially decayed based on a preconfigured "half-life time". Once the penalty decreases below a predefined "reuse-limit", the route advertisement will be un-suppressed.

Routes, external to an AS, learned via IBGP will not be dampened. This is to avoid the IBGP peers having higher penalty for routes external to the AS.

The penalty will be decayed at a granularity of 5 seconds and the routes will be un-suppressed at a granularity of 10 seconds. The dampening information is kept until the penalty becomes less than half of the "reuse-limit"; at that point the information is purged from the router.

Initially, dampening will be off by default. This might change if there is a need to have this feature enabled by default. The following are the commands used to control route dampening:

bgp dampening (will turn on dampening).
no bgp dampening (will turn off dampening).
bgp dampening <half-life-time> (will change the half-life-time).


A command that sets all parameters at the same time is:

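bgp dampening <half-life-time> <reuse> <suppress> <maximum-suppress-time>

<half-life-time> (range is 1-45 min, current default is 15 min).
<reuse> (range is 1-20000, default is 750).
<suppress> (range is 1-20000, default is 2000).
<maximum-suppress-time> (maximum duration a route can be suppressed, range is 1-255, default is 4 times half-life-time).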

Example:



RTB#
hostname RTB

interface Serial0
ip address 203.250.15.2 255.255.255.252

interface Serial1
ip address 192.208.10.6 255.255.255.252

router bgp 100
bgp dampening
network 203.250.15.0
neighbor 192.208.10.5 remote-as 300

RTD#
hostname RTD

interface Loopback0
ip address 192.208.10.174 255.255.255.192

interface Serial0/0
ip address 192.208.10.5 255.255.255.252

router bgp 300
network 192.208.10.0
neighbor 192.208.10.6 remote-as 100


RTB is configured for route dampening with default parameters. Assuming the EBGP link to RTD is stable, RTB's BGP table would look like this:

RTB#sh ip bgp
BGP table version is 24, local router ID is 203.250.15.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path
*> 192.208.10.0 192.208.10.5 0 0 300 i
*> 203.250.15.0 0.0.0.0 0 32768 i


In order to simulate a route flap, I will do a "clear ip bgp 192.208.10.6" on RTD. RTB's BGP table will look like this:

RTB#sh ip bgp
BGP table version is 24, local router ID is 203.250.15.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path
h 192.208.10.0 192.208.10.5 0 0 300 i
*> 203.250.15.0 0.0.0.0 0 32768 i


The BGP entry for 192.208.10.0 has been put in a "history" state, which means that we do not have a best path to the route but information about the route flapping still exists.

RTB#sh ip bgp 192.208.10.0
BGP routing table entry for 192.208.10.0 255.255.255.0, version 25
Paths: (1 available, no best path)
300 (history entry)
192.208.10.5 from 192.208.10.5 (192.208.10.174)
Origin IGP, metric 0, external
Dampinfo: penalty 910, flapped 1 times in 0:02:03
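
The Dampinfo line above is consistent with a single flap penalty of 1000 that has decayed for 2 minutes and 3 seconds at the default 15 minute half-life. The Python sketch below checks that arithmetic. It assumes a smooth exponential decay, whereas the router applies the decay in 5 second steps, and decayed_penalty is a name made up for this example.

```python
def decayed_penalty(penalty: float, elapsed_seconds: float,
                    half_life_minutes: float = 15.0) -> float:
    """Penalty left after elapsed_seconds, halving once every half-life."""
    half_lives = elapsed_seconds / (half_life_minutes * 60.0)
    return penalty * 0.5 ** half_lives


if __name__ == "__main__":
    # One flap (penalty 1000) seen 0:02:03 ago, default half-life of 15 minutes.
    print(round(decayed_penalty(1000, 2 * 60 + 3)))
```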


The route has been given a penalty for flapping but the penalty is still below the "suppress limit" (default is 2000). The route is not yet suppressed. If the route flaps a few more times we will see the following:

RTB#sh ip bgp
BGP table version is 32, local router ID is 203.250.15.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path
*d 192.208.10.0 192.208.10.5 0 0 300 i
*> 203.250.15.0 0.0.0.0 0 32768 i

RTB#sh ip bgp 192.208.10.0
BGP routing table entry for 192.208.10.0 255.255.255.0, version 32
Paths: (1 available, no best path)
300, (suppressed due to dampening)
192.208.10.5 from 192.208.10.5 (192.208.10.174)
Origin IGP, metric 0, valid, external
Dampinfo: penalty 2615, flapped 3 times in 0:05:18 , reuse in 0:27:00
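
The "reuse in 0:27:00" figure can be reproduced from the penalty, the reuse limit and the half-life. The following Python sketch solves the decay formula for time. It assumes the defaults (reuse limit 750, half-life 15 minutes) and a smooth exponential decay; minutes_until_below is a hypothetical helper name.

```python
import math


def minutes_until_below(penalty: float, limit: float,
                        half_life_minutes: float = 15.0) -> float:
    """Minutes of decay needed before the penalty drops to the given limit."""
    if penalty <= limit:
        return 0.0
    # penalty * 0.5 ** (t / half_life) = limit, solved for t.
    return half_life_minutes * math.log2(penalty / limit)


if __name__ == "__main__":
    print(minutes_until_below(2615, 750))        # time until the route is reused
    print(minutes_until_below(2615, 750 / 2))    # time until the history is purged
```

The second call uses half of the reuse limit, which is the point at which the dampening information is dropped; by construction that is exactly one more half-life after the route is reused.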


The route has been dampened (suppressed). The route will be reused when the penalty reaches the "reuse value", in our case 750 (default). The dampening information will be purged when the penalty becomes less than half of the reuse-limit, in our case 750/2 = 375. The following are the commands used to show and clear flap statistics information:

show ip bgp flap-statistics
(displays flap statistics for all the paths)

show ip bgp flap-statistics regexp <regexp>
(displays flap statistics for all paths that match the regexp)

show ip bgp flap-statistics filter-list <list>
(displays flap statistics for all paths that pass the filter)

show ip bgp flap-statistics A.B.C.D m.m.m.m
(displays flap statistics for a single entry)

show ip bgp flap-statistics A.B.C.D m.m.m.m longer-prefixes
(displays flap statistics for more specific entries)

show ip bgp neighbor [dampened-routes] [flap-statistics]
(displays flap statistics for all paths from a neighbor)

clear ip bgp flap-statistics
(clears flap statistics for all routes)

clear ip bgp flap-statistics regexp <regexp>
(clears flap statistics for all the paths that match the regexp)

clear ip bgp flap-statistics filter-list <list>
(clears flap statistics for all the paths that pass the filter)

clear ip bgp flap-statistics A.B.C.D m.m.m.m
(clears flap statistics for a single entry)

clear ip bgp A.B.C.D flap-statistics
(clears flap statistics for all paths from a neighbor)


--------------------------------------------------------------------------------


How BGP selects a Path
Now that we are familiar with the BGP attributes and terminology, the following list indicates how BGP selects the best path for a particular destination. Remember that we only select one path as the best path. We put that path in our routing table and we propagate it to our BGP neighbors.

Path selection is based on the following:

1-If the next hop is inaccessible, do not consider the path.
2-Prefer the largest Weight.
3-If the weights are the same, prefer the largest Local Preference.
4-If the Local Preferences are the same, prefer the route that the local router originated.
5-If no route was locally originated, prefer the shorter AS path.
6-If the AS path lengths are the same, prefer the lowest origin code (IGP < EGP < INCOMPLETE).
7-If the origin codes are the same, prefer the path with the lowest MED.
8-If the MEDs are the same, prefer the External path over the Internal path.
9-If IGP synchronization is disabled and only internal paths remain, prefer the path through the closest IGP neighbor.
10-Prefer the route with the lowest IP address value for the BGP router ID.
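
Because each step only acts as a tie-breaker for the one before it, the list can be approximated by an ordered comparison key. The Python sketch below is a simplified model, not the router's implementation: the Path fields and best_path are names invented for the example, MED is compared across all paths regardless of neighboring AS, and the synchronization condition of step 9 is reduced to a plain IGP metric comparison.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List, Optional

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    next_hop_reachable: bool
    weight: int
    local_pref: int
    locally_originated: bool
    as_path: List[int]
    origin: str          # "igp", "egp" or "incomplete"
    med: int
    external: bool       # True for an EBGP learned path
    igp_metric: int      # IGP distance to the next hop
    router_id: str       # BGP router ID of the advertising peer


def best_path(paths: List[Path]) -> Optional[Path]:
    """Pick one path by walking the selection steps in order."""
    candidates = [p for p in paths if p.next_hop_reachable]      # step 1
    if not candidates:
        return None
    return min(candidates, key=lambda p: (
        -p.weight,                          # step 2: largest weight
        -p.local_pref,                      # step 3: largest local preference
        not p.locally_originated,           # step 4: locally originated first
        len(p.as_path),                     # step 5: shorter AS path
        ORIGIN_RANK[p.origin],              # step 6: lowest origin code
        p.med,                              # step 7: lowest MED
        not p.external,                     # step 8: external over internal
        p.igp_metric,                       # step 9: closest IGP neighbor
        int(IPv4Address(p.router_id)),      # step 10: lowest router ID
    ))
```

The tuple is compared element by element, so a later field is only consulted when every earlier field is equal, which mirrors the "if same ... prefer ..." wording of the list.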

The following is a design example that is intended to show the configuration and routing tables as they actually appear on the Cisco routers.

(End of section 4)


--------------------------------------------------------------------------------
BGP4 Case Studies/Tutorial Section 3


--------------------------------------------------------------------------------


BGP Filtering
Sending and receiving BGP updates can be controlled by using a number of different filtering methods. BGP updates can be filtered based on route information, on path information or on communities. All methods can achieve the same results; choosing one over the other depends on the specific network configuration.

Route Filtering


In order to restrict the routing information that the router learns or advertises, you can filter BGP based on routing updates to or from a particular neighbor. In order to achieve this, an access-list is defined and applied to the updates to or from a neighbor. Use the following command in the router configuration mode:

neighbor {ip-address | peer-group-name} distribute-list access-list-number {in | out}

In the following example, RTB is originating network 160.10.0.0 and sending it to RTC. If RTC wanted to stop those updates from propagating to AS100, we would have to apply an access-list to filter those updates and apply it when talking to RTA:

RTC#
router bgp 300
network 170.10.0.0
neighbor 3.3.3.3 remote-as 200
neighbor 2.2.2.2 remote-as 100
neighbor 2.2.2.2 distribute-list 1 out

access-list 1 deny 160.10.0.0 0.0.255.255 (filter out all routing updates about 160.10.x.x)

access-list 1 permit 0.0.0.0 255.255.255.255

Using access-lists is a bit tricky when we are dealing with supernets that might cause some conflicts.

Assume in the above example that RTB has different subnets of 160.10.X.X and our goal is to filter updates and advertise only 160.0.0.0/8 (this notation means that we are using 8 bits of subnet mask starting from the far left of the IP address; this is equivalent to 160.0.0.0 255.0.0.0).

The following access list:

access-list 1 permit 160.0.0.0 0.255.255.255

will permit 160.0.0.0/8, 160.0.0.0/9 and so on. In order to restrict the update to only 160.0.0.0/8 we have to use an extended access list of the following format:

access-list <number> permit ip <ip address> <ip address don't care bits> <mask> <mask don't care bits>

ex: access-list 101 permit ip 160.0.0.0 0.255.255.255 255.0.0.0 0.0.0.0

This list will permit 160.0.0.0/8 only.
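
The difference between the two lists comes down to which fields are compared against a wildcard ("don't care") mask. The Python sketch below models that comparison. It is an illustration only: wildcard_match, standard_acl_permits and extended_acl_permits are hypothetical helpers, and the extended list is assumed to be "access-list 101 permit ip 160.0.0.0 0.255.255.255 255.0.0.0 0.0.0.0".

```python
from ipaddress import IPv4Address


def wildcard_match(value: str, base: str, wildcard: str) -> bool:
    """True if value equals base in every bit where the wildcard bit is 0."""
    v, b, w = (int(IPv4Address(x)) for x in (value, base, wildcard))
    care = ~w & 0xFFFFFFFF
    return (v & care) == (b & care)


def standard_acl_permits(network: str) -> bool:
    # access-list 1 permit 160.0.0.0 0.255.255.255
    # Only the network address is checked, so any prefix length gets through.
    return wildcard_match(network, "160.0.0.0", "0.255.255.255")


def extended_acl_permits(network: str, mask: str) -> bool:
    # Assumed list 101: the network address AND the subnet mask are both checked.
    return (wildcard_match(network, "160.0.0.0", "0.255.255.255")
            and wildcard_match(mask, "255.0.0.0", "0.0.0.0"))
```

With these helpers, 160.0.0.0 with mask 255.128.0.0 (a /9) passes the standard list but fails the extended one, because the mask must equal 255.0.0.0 exactly.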

Another type of filtering is path filtering, which is described in the next section.


--------------------------------------------------------------------------------


Path Filtering


You can specify an access list on both incoming and outgoing updates based on the BGP autonomous system paths information. In the above figure we can block updates about 160.10.0.0 from going to AS100 by defining an access list on RTC that prevents any updates that have originated from AS200 from being sent to AS100. To do this use the following statements.

ip as-path access-list access-list-number {permit | deny} as-regular-expression

neighbor {ip-address | peer-group-name} filter-list access-list-number {in | out}

The following example will stop RTC from sending RTA updates about 160.10.0.0

RTC#
router bgp 300
neighbor 3.3.3.3 remote-as 200
neighbor 2.2.2.2 remote-as 100
neighbor 2.2.2.2 filter-list 1 out (the 1 is the access list number below)

ip as-path access-list 1 deny ^200$
ip as-path access-list 1 permit .*

In the above example, access-list 1 states: deny any updates with path information that start with 200 (^) and end with 200 ($). The ^200$ is called a regular expression, with ^ meaning starts with and $ meaning ends with. Since RTB sends updates about 160.10.0.0 with path information starting with 200 and ending with 200, then this update will match the access list and will be denied.

The .* is another regular expression with the dot meaning any character and the * meaning the repetition of that character. So .* is actually any path information, which is needed to permit all other updates to be sent.

What would happen if, instead of using ^200$, we had used ^200?
If you have an AS400 (see figure above), updates originated by AS400 will have path information of the form (200, 400) with 200 being first and 400 being last. Those updates will match the access list ^200 because they start with 200 and will be prevented from being sent to RTA which is not the required behavior.
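
The difference between the two expressions is easy to try out. The sketch below uses Python's re module as a stand-in for the router's regular expression engine, with the AS path written as a space-separated string; both choices are assumptions made for the illustration, and matches is a made-up helper.

```python
import re


def matches(pattern: str, as_path: str) -> bool:
    """True if the pattern is found anywhere in the AS path string."""
    return re.search(pattern, as_path) is not None


if __name__ == "__main__":
    for path in ("200", "200 400"):
        # Show how each path fares against the anchored and unanchored patterns.
        print(repr(path), matches(r"^200$", path), matches(r"^200", path))
```

The path "200" (originated in AS200) matches both patterns, while "200 400" (originated in AS400) matches only ^200, which is why the unanchored form blocks more than intended.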

A good way to check whether we have implemented the correct regular expression is to do:

sh ip bgp regexp <regular expression>

This will show us all the paths that have matched the configured regular expression.

Regular expressions sound a bit complicated but actually they are not. The next section will explain what is involved in creating a regular expression.


--------------------------------------------------------------------------------


AS-Regular Expression
A regular expression is a pattern to match against an input string. By building a regular expression we specify a string that input must match. In case of BGP we are specifying a string consisting of path information that an input should match.

In the previous example we specified the string ^200$ and wanted path information coming inside updates to match it in order to perform a decision.

The regular expression is composed of the following:

A- Ranges:
A range is a sequence of characters contained within left and right square brackets. ex: [abcd]

B- Atoms
An atom is a single character

. (Matches any single character)
^ (Matches the beginning of the input string)
$ (Matches the end of the input string)
\character (Matches the character)
_ (Matches a comma (,), left brace ({), right brace (}), the beginning
of the input string, the end of the input string, or a space)

C-Pieces
A piece is an atom followed by one of the symbols:

* (Matches 0 or more sequences of the atom)
+ (Matches 1 or more sequences of the atom)
? (Matches the atom or the null string)

D- Branch
A branch is 0 or more concatenated pieces.

Examples of regular expressions follow:

a* any occurrence of the letter a, including none
a+ at least one occurrence of a should be present
ab?a this will match aa or aba

ex:
_100_ (via AS100)
^100$ (origin AS100)
^100 .* (coming from AS100)
^$ (originated from this AS)
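
The underscore atom has no direct equivalent in most regular expression libraries, but it can be emulated. The Python sketch below rewrites "_" into an alternation covering the delimiters listed above and then uses the re module. It is an approximation for experimenting with patterns, not the router's engine: ios_to_python and path_matches are hypothetical names, and the simple string replacement does not handle an escaped underscore.

```python
import re

# Comma, braces, space, or the start or end of the string.
_DELIM = r"(?:^|$|[ ,{}])"


def ios_to_python(pattern: str) -> str:
    """Rewrite the "_" atom into an equivalent Python sub-expression."""
    return pattern.replace("_", _DELIM)


def path_matches(pattern: str, as_path: str) -> bool:
    """Match an AS-path pattern against a space-separated AS path string."""
    return re.search(ios_to_python(pattern), as_path) is not None
```

For instance, _100_ matches the paths "100" and "200 100 300" but not "1000 200", because the digits 100 must be bounded by a delimiter on both sides.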


--------------------------------------------------------------------------------


BGP Community Filtering
We have already seen route filtering and as-path filtering. Another method is community filtering. Community has been discussed previously and here are a few examples of how we can use it.



We would like RTB above to set the community attribute to the BGP routes it is advertising such that RTC would not propagate these routes to its external peers. The no-export community attribute is used:

RTB#
router bgp 200
network 160.10.0.0
neighbor 3.3.3.1 remote-as 300
neighbor 3.3.3.1 send-community
neighbor 3.3.3.1 route-map setcommunity out

route-map setcommunity
match ip address 1
set community no-export

access-list 1 permit 0.0.0.0 255.255.255.255

Note that we have used the route-map setcommunity in order to set the community to no-export. Note also that we had to use the "neighbor send-community" command in order to send this attribute to RTC.

When RTC gets the updates with the attribute no-export, it will not propagate them to its external peer RTA.

Example 2:

RTB#
router bgp 200
network 160.10.0.0
neighbor 3.3.3.1 remote-as 300
neighbor 3.3.3.1 send-community
neighbor 3.3.3.1 route-map setcommunity out

route-map setcommunity
match ip address 2
set community 100 200 additive

access-list 2 permit 0.0.0.0 255.255.255.255

In the above example, RTB has set the community attribute to 100 200 additive. The value 100 200 will be added to any existing community value before being sent to RTC.

A community list is a group of communities that we use in a match clause of a route map which allows us to do filtering or setting attributes based on different lists of community numbers.

ip community-list community-list-number {permit | deny} community-number

For example we can define the following route map, match-on-community:

route-map match-on-community
match community 10 (10 is the community-list number)
set weight 20

ip community-list 10 permit 200 300 (200 300 is the community number)

We can use the above in order to filter or set certain parameters like weight and metric based on the community value in certain updates. In example two above, RTB was sending updates to RTC with a community of 100 200. If RTC wants to set the weight based on those values we could do the following:

RTC#
router bgp 300
neighbor 3.3.3.3 remote-as 200
neighbor 3.3.3.3 route-map check-community in

route-map check-community permit 10
match community 1
set weight 20

route-map check-community permit 20
match community 2 exact
set weight 10

route-map check-community permit 30
match community 3

ip community-list 1 permit 100
ip community-list 2 permit 200
ip community-list 3 permit internet

In the above example, any route that has 100 in its community attribute will match list 1 and will have the weight set to 20. Any route that has only 200 as community will match list 2 and will have the weight set to 10. The keyword exact states that the community should consist of 200 only and nothing else. The last community list is here to make sure that other updates are not dropped. Remember that anything that does not match will be dropped by default. The keyword internet means all routes because all routes are members of the internet community.
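
The way the check-community route map is walked can be sketched in a few lines. The Python model below mirrors the RTC configuration above under simplifying assumptions: each community list holds a single value, communities are plain strings, and the names COMMUNITY_LISTS, ROUTE_MAP and apply_route_map are invented for the example.

```python
from typing import FrozenSet, Optional

INTERNET = "internet"

# ip community-list <number> permit <value>
COMMUNITY_LISTS = {1: "100", 2: "200", 3: INTERNET}

# One entry per route-map instance: (community-list, exact keyword, weight to set).
ROUTE_MAP = [
    (1, False, 20),    # permit 10
    (2, True, 10),     # permit 20
    (3, False, None),  # permit 30: accept without changing anything
]


def list_matches(list_no: int, communities: FrozenSet[str], exact: bool) -> bool:
    wanted = COMMUNITY_LISTS[list_no]
    if wanted == INTERNET:
        return True                      # every route belongs to "internet"
    if exact:
        return communities == frozenset({wanted})
    return wanted in communities


def apply_route_map(communities: FrozenSet[str], weight: int = 0) -> Optional[int]:
    """Return the resulting weight, or None if the update is dropped."""
    for list_no, exact, new_weight in ROUTE_MAP:
        if list_matches(list_no, communities, exact):
            return weight if new_weight is None else new_weight
    return None  # nothing matched: implicit deny
```

An update carrying communities 100 and 200 stops at the first instance and gets weight 20; an update carrying only 200 falls through to the second instance and gets weight 10.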


--------------------------------------------------------------------------------


BGP Neighbors and Route maps


The neighbor command can be used in conjunction with route maps to perform either filtering or parameter setting on incoming and outgoing updates.

Route maps associated with the neighbor statement have no effect on incoming updates when matching based on the IP address:

neighbor ip-address route-map route-map-name

Assume in the above diagram we want RTC to learn from AS200 about networks that are local to AS200 and nothing else. Also, we want to set the weight on the accepted routes to 20. We can achieve this with a combination of neighbor and as-path access lists.

Example 1:

RTC#
router bgp 300
network 170.10.0.0
neighbor 3.3.3.3 remote-as 200
neighbor 3.3.3.3 route-map stamp in

route-map stamp
match as-path 1
set weight 20

ip as-path access-list 1 permit ^200$

Any updates that originate from AS200 have a path information that starts with 200 and ends with 200 and will be permitted. Any other updates will be dropped.

Example 2:

Assume that we want the following:
1- Updates originating from AS200 to be accepted with weight 20.
2- Updates originating from AS400 to be dropped.
3- Other updates to have a weight of 10.

RTC#
router bgp 300
network 170.10.0.0
neighbor 3.3.3.3 remote-as 200
neighbor 3.3.3.3 route-map stamp in

route-map stamp permit 10
match as-path 1
set weight 20

route-map stamp permit 20
match as-path 2
set weight 10

ip as-path access-list 1 permit ^200$
ip as-path access-list 2 permit ^200 600 .*

The above statement will set a weight of 20 for updates that are local to AS200, and will set a weight of 10 for updates that are behind AS400 and will drop updates coming from AS400.


--------------------------------------------------------------------------------


Use of set as-path prepend
In some situations we are forced to manipulate the PATH information in order to manipulate the BGP decision process. The command that is used with a route map is:

set as-path prepend ...

Suppose in the above diagram that RTC is advertising its own network 170.10.0.0 to two different ASs: AS100 and AS200. When the information is propagated to AS600, the routers in AS600 will have network reachability information about 170.10.0.0 via two different routes. The first route is via AS100 with PATH (100, 300) and the second one is via AS400 with PATH (400, 200, 300). Assuming that all other attributes are the same, AS600 will pick the shortest path and will choose the route via AS100.

AS300 will be getting all its traffic via AS100. If we want to influence this decision from the AS300 end we can make the PATH through AS100 look like it is longer than the PATH going through AS400. We can do this by prepending autonomous system numbers to the existing path info advertised to AS100. A common practice is to repeat our own AS number using the following:

RTC#
router bgp 300
network 170.10.0.0
neighbor 2.2.2.2 remote-as 100
neighbor 2.2.2.2 route-map SETPATH out

route-map SETPATH
set as-path prepend 300 300

Because of the above configuration, AS600 will receive updates about 170.10.0.0 via AS100 with a PATH information of (100, 300, 300, 300), which is longer than the PATH (400, 200, 300) received via AS400.
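
The effect of the prepend on the path length comparison can be checked with a short Python sketch. AS paths are modeled as plain lists of AS numbers and prepend is a made-up helper; nothing here is router code.

```python
from typing import List


def prepend(as_path: List[int], asn: int, times: int) -> List[int]:
    """Return the AS path with asn added in front the given number of times."""
    return [asn] * times + list(as_path)


if __name__ == "__main__":
    # RTC originates 170.10.0.0 in AS300 and prepends 300 twice towards AS100.
    sent_to_as100 = prepend([300], 300, 2)
    via_as100 = [100] + sent_to_as100      # as seen in AS600 via AS100
    via_as400 = [400, 200, 300]            # as seen in AS600 via AS400
    print(via_as100, len(via_as100))
    print(via_as400, len(via_as400))
```

Without the prepend the path via AS100 has two entries and wins; with it the path has four entries, so AS600 prefers the three-entry path via AS400.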


--------------------------------------------------------------------------------


BGP Peer Groups


A BGP peer group is a group of BGP neighbors with the same update policies. Update policies are usually set by route maps, distribute-lists, filter-lists and so on. Instead of defining the same policies for each separate neighbor, we define a peer group name and we assign these policies to the peer group.

Members of the peer group inherit all of the configuration options of the peer group. Members can also be configured to override these options if these options do not affect outbound updates; you can only override options set on the inbound.

To define a peer group use the following:

neighbor peer-group-name peer-group

In the following example we will see how peer groups are applied to internal and external BGP neighbors.

Example 1:

RTC#
router bgp 300
neighbor internalmap peer-group
neighbor internalmap remote-as 300
neighbor internalmap route-map SETMETRIC out
neighbor internalmap filter-list 1 out
neighbor internalmap filter-list 2 in
neighbor 5.5.5.2 peer-group internalmap
neighbor 6.6.6.2 peer-group internalmap
neighbor 3.3.3.2 peer-group internalmap
neighbor 3.3.3.2 filter-list 3 in

In the above configuration, we have defined a peer group named internalmap and we have defined some policies for that group, such as a route map SETMETRIC to set the metric to 5 and two different filter lists 1 and 2. We have applied the peer group to all internal neighbors RTE, RTF and RTG. We have defined a separate filter-list 3 for neighbor RTE, and this will override filter-list 2 inside the peer group. Note that we could only override options that affect inbound updates.
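
The inheritance rule can be pictured as a dictionary merge in which only inbound options may be replaced. The Python sketch below is a toy model of that rule; the option keys, the PEER_GROUP dictionary and effective_policy are all hypothetical and chosen only to mirror the internalmap example.

```python
from typing import Dict

# Policies attached to the peer group internalmap in the example above.
PEER_GROUP: Dict[str, object] = {
    "route-map out": "SETMETRIC",
    "filter-list out": 1,
    "filter-list in": 2,
}


def effective_policy(group: Dict[str, object],
                     overrides: Dict[str, object]) -> Dict[str, object]:
    """Combine peer group policy with per-neighbor overrides (inbound only)."""
    policy = dict(group)
    for option, value in overrides.items():
        if not option.endswith(" in"):
            raise ValueError(f"only inbound options can be overridden: {option}")
        policy[option] = value
    return policy
```

For neighbor 3.3.3.2 the override {"filter-list in": 3} replaces filter-list 2, while the outbound route map and filter list stay as defined for the group.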

Now, let us look at how we can use peer groups with external neighbors. In the same diagram we will configure RTC with a peer-group externalmap and we will apply it to external neighbors.

Example 2:

RTC#
router bgp 300
neighbor externalmap peer-group
neighbor externalmap route-map SETMETRIC
neighbor externalmap filter-list 1 out
neighbor externalmap filter-list 2 in
neighbor 2.2.2.2 remote-as 100
neighbor 2.2.2.2 peer-group externalmap
neighbor 4.4.4.2 remote-as 600
neighbor 4.4.4.2 peer-group externalmap
neighbor 1.1.1.2 remote-as 200
neighbor 1.1.1.2 peer-group externalmap
neighbor 1.1.1.2 filter-list 3 in

Note that in the above configs we have defined the remote-as statements outside of the peer group because we have to define different external ASs. Also we did an override for the inbound updates of neighbor 1.1.1.2 by assigning filter-list 3.

(End of section 3)


--------------------------------------------------------------------------------