MTU and Fragmentation Considerations in an IPsec VPN

One important consideration when deploying an IPsec VPN is how to deal with maximum transmission unit (MTU) and fragmentation issues caused by large IPsec packets. These issues arise because IPsec adds considerable overhead to the IP packets that it encapsulates.

IPsec Packet Overhead

When user data packets are transmitted over an IPsec VPN, IPsec adds considerable overhead. The exact amount of overhead depends on a number of factors, including the following:
Figure 7-64 illustrates the overhead added by an IPsec VPN gateway to a user packet as the user packet is sent over an IPsec tunnel.

Figure 7-64. Overhead by an IPsec VPN Gateway
The following sections examine the factors that affect the amount of overhead added by IPsec.

Overhead Added by Security Protocols

One determinant of the IPsec overhead is the set of security protocols (AH/ESP) that you configure in your IPsec transform set. If you take a look at Figure 6-13 on page 423, you can see that AH adds the following:
So, the AH header is 16 bytes long plus the number of bytes in the variable-length ICV field.

Calculating the amount of overhead added by ESP is more complicated, as you can see by looking at the ESP header in Figure 6-18 on page 427. As shown in Figure 6-18, the Security Parameter Index (SPI) field adds 4 bytes, the Sequence Number field adds 4 bytes, synchronization data (such as an Initialization Vector [IV]) in the Payload field adds a variable number of bytes, the Padding field can add from 0 to 255 bytes, the Pad Length field adds 1 byte, the Next Header field adds 1 byte, and the Authentication Data field adds a variable number of bytes. So, that is 10 bytes (SPI, Sequence Number, Pad Length, and Next Header) plus the number of bytes added by the variable-length synchronization data, Padding, and Authentication Data fields.

Overhead Added in Transport and Tunnel Modes

If you refer back to Figures 6-14, 6-15, 6-19, and 6-20 on pages 424, 428, and 429, you can see that in tunnel mode an additional (new) IP header is prepended to the packet. The length of an IP header is variable; in almost all cases, however, it is 20 bytes. In transport mode, no additional IP header is added.

Overhead Added by a GRE Tunnel

If you have configured a point-to-point GRE tunnel to carry multicast and multiprotocol traffic, this usually (assuming that optional fields are not present) adds 24 additional bytes of overhead. The 24 bytes are made up of the following:
mGRE adds 28 bytes of overhead because of the additional 4-byte Key field (which is not typically included in the GRE header when using a point-to-point tunnel). Figure 7-65 illustrates the overhead added by (point-to-point) GRE.

Figure 7-65. Overhead Added by GRE
Calculating Total Overhead

As you can see, calculating the precise overhead added to user packets when they are transmitted over an IPsec VPN is complicated, particularly in relation to the overhead added by AH and ESP. Table 7-2, however, shows the overhead added to a 1500-byte user packet sent over an IPsec or GRE/IPsec VPN tunnel when using AH and/or ESP (in tunnel and transport modes) with a variety of cryptographic algorithms.
Table 7-2 is fairly self-explanatory, but a few examples are worth walking through. Highlighted line 1 shows that when you configure an IPsec tunnel mode transform set with AH MD5 authentication and ESP 3DES encryption, the IPsec packet size that results is 1564 bytes (the original 1500 bytes, plus 64 bytes of IPsec overhead). Highlighted line 2 shows that when you configure an IPsec tunnel mode transform set with ESP MD5 authentication and DES encryption, the IPsec packet size that results is 1552 bytes (the original 1500 bytes, plus 52 bytes of IPsec overhead). Finally, highlighted line 3 shows that when you configure an IPsec transport mode transform set with ESP MD5 authentication and DES encryption to protect a point-to-point GRE tunnel, the GRE/IPsec packet size that results is 1560 bytes (the original 1500 bytes, plus 60 bytes of GRE/IPsec overhead).

It is worth emphasizing again that Table 7-2 shows the overhead that results for a 1500-byte user packet, but the overhead does vary according to the user packet size. So, for example, if you configure an IPsec transport mode transform set with ESP MD5 authentication and DES encryption to protect a GRE tunnel, the GRE/IPsec packet sizes that result for user packet sizes of 500 and 1000 bytes are 560 and 1056 bytes, respectively. Compare these with the resulting 1560-byte GRE/IPsec packet size shown in Table 7-2, highlighted line 3, for an original user packet size of 1500 bytes.

You might be slightly mystified as to why a 500-byte user packet incurs more overhead than a 1000-byte user packet (60 bytes versus 56 bytes). The reason is the Padding field in the ESP header (see Figure 6-18 on page 427). As previously described, the length of this field must be such that the payload (including the additional IP/GRE headers if applicable) plus any padding is a multiple of the block size of the encryption algorithm (64 bits for DES, in this example), and the Pad Length and Next Header fields are right-aligned within a 4-byte word.
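The padding arithmetic described above can be sketched in a few lines of Python. This is an illustrative model rather than the method any particular gateway uses: the function names are hypothetical, and it assumes ESP with DES (8-byte block, 8-byte IV), a 12-byte Authentication Data field for MD5 authentication, 20-byte IP headers without options, and a 4-byte GRE header.

```python
def pad_to_block(length: int, block: int = 8) -> int:
    """Round length up to the next multiple of the cipher block size."""
    return -(-length // block) * block


def esp_packet_size(user_len: int, tunnel_mode: bool = True, gre: bool = False,
                    iv_len: int = 8, icv_len: int = 12, block: int = 8) -> int:
    """Estimate the size of an ESP-protected IPv4 packet (assumed model).

    user_len is the full user IP packet, including its own IP header.
    """
    ip_hdr, gre_hdr = 20, 4
    # Packet handed to IPsec: the user packet, optionally wrapped in GRE.
    inner = user_len + (ip_hdr + gre_hdr if gre else 0)
    # Tunnel mode encrypts the whole packet; transport mode leaves the
    # existing IP header in the clear and reuses it as the outer header.
    protected = inner if tunnel_mode else inner - ip_hdr
    # Payload + Padding + Pad Length (1) + Next Header (1) fill whole blocks.
    encrypted = pad_to_block(protected + 2, block)
    # Outer IP header + SPI and Sequence Number (8) + IV + ciphertext + ICV.
    return ip_hdr + 8 + iv_len + encrypted + icv_len


for size in (500, 1000, 1500):
    total = esp_packet_size(size, tunnel_mode=False, gre=True)
    print(size, total, total - size)
# 500 560 60
# 1000 1056 56
# 1500 1560 60
```

Under these assumptions the sketch reproduces the 560-, 1056-, and 1560-byte GRE/IPsec sizes quoted above, and `esp_packet_size(1500)` gives the 1552 bytes cited for the tunnel mode transform. The differing overheads come entirely from how many padding bytes are needed to reach the next 8-byte boundary.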
The upshot of all this is that some smaller user packet sizes might incur slightly more overhead than larger user packet sizes.

Ensuring That Large IPsec Packets Are Not Fragmented or Dropped

Okay, so now you know that IPsec (and GRE) can add considerable overhead to user packets. This overhead can cause large IPsec or GRE/IPsec packets (those larger than the path maximum transmission unit [PMTU]) to be dropped or fragmented (broken into smaller pieces).

Note: An interface MTU is the maximum packet size in bytes that can be transmitted out of an interface. The MTU between two devices over an intervening network is called the path MTU.

If IPsec packets are dropped, this is clearly not good; but why is fragmentation bad? The answer is that fragmentation of IPsec packets can cause high processor and memory overhead and reduce overall throughput on the receiving IPsec VPN gateway. This happens because the receiving gateway has to reassemble fragmented IPsec packets using process switching before it can decapsulate (authenticate and decrypt) them. Performance is impacted even if you use a hardware accelerator card because IPsec packets have to be reassembled before they can be sent to the hardware accelerator.

Before discussing how to ensure that IPsec packets are not dropped or fragmented, it is useful to look at exactly how fragmentation and packet drops can occur in an IPsec VPN. Fragmentation of IPsec packets and IPsec packet drops are discussed in the next two sections. These sections reference the sample IPsec VPN topology shown in Figure 7-66.

Figure 7-66. Sample IPsec VPN Topology
In Figure 7-66, an IPsec VPN is configured between the Paris IPsec VPN gateway and the Hamburg IPsec VPN gateway. Host A is located at the Paris site, and Host B is located at the Hamburg site.

Fragmentation of IPsec and GRE/IPsec Packets

As previously discussed, if IPsec packets are fragmented, the receiving IPsec VPN gateway has to reassemble the fragments, and this causes high overhead and lowers IPsec VPN throughput.

Fragmentation of IPsec (and other IP) packets occurs when the Don't Fragment (DF) bit is cleared in the outer IP header of IPsec packets, and these packets are larger than the path or outgoing interface MTU. Prior to Cisco IOS Software Release 12.1(11b)E, Cisco IOS IPsec VPN gateways fragment IPsec packets when the packet size is larger than the path MTU. Routers in the path between IPsec VPN gateways fragment IPsec packets when they are larger than their outgoing interface MTU.

The DF bit is contained within the IP packet header, and, as the name suggests, it controls whether network devices are allowed to fragment an IP packet. If the DF bit is set (1), network devices cannot fragment the packet, and if the DF bit is cleared (0), network devices can fragment the packet. Figure 7-67 illustrates the IPv4 packet header.

Figure 7-67. IPv4 Packet Header
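The DF bit semantics just described can be checked programmatically. The following is a minimal Python sketch, assuming a raw 20-byte IPv4 header held in a bytes object; the helper names are hypothetical, and the demo header is built with a zero checksum purely for illustration. In the IPv4 header, the Flags field shares a 16-bit word with the Fragment Offset field, and the DF flag corresponds to the 0x4000 bit of that word.

```python
import struct

DF_MASK = 0x4000  # Don't Fragment flag within the Flags/Fragment Offset word
MF_MASK = 0x2000  # More Fragments flag, shown for comparison


def df_bit_set(ipv4_header: bytes) -> bool:
    """Return True if the DF bit is set in a raw IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # Bytes 6-7 hold the 3-bit Flags field and the 13-bit Fragment Offset.
    (flags_and_offset,) = struct.unpack("!H", ipv4_header[6:8])
    return bool(flags_and_offset & DF_MASK)


def build_demo_header(total_len: int, df: bool) -> bytes:
    """Build a minimal IPv4 header for demonstration (checksum left at 0)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                     # version 4, header length 5 words
        0,                        # type of service
        total_len,                # Total Length
        0,                        # Identification
        DF_MASK if df else 0,     # Flags + Fragment Offset
        64,                       # TTL
        1,                        # protocol (ICMP)
        0,                        # header checksum (not computed here)
        bytes([10, 1, 1, 2]),     # source: Host A
        bytes([10, 2, 2, 1]),     # destination: Host B
    )


print(df_bit_set(build_demo_header(1500, df=True)))   # True
print(df_bit_set(build_demo_header(1500, df=False)))  # False
```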
Notice in Figure 7-67 the apparent absence of a DF bit. In fact, it is the middle of the three bits contained within the Flags field.

Now that you know how fragmentation can occur, it is worth taking a look at one or two examples involving the fragmentation of plain (without GRE) IPsec packets and the fragmentation of GRE/IPsec packets.

Fragmentation of Plain IPsec Packets

The default behavior for IPsec is to copy (or maintain) the DF-bit setting from the IP header of the user packet to the (outer) IP header of the IPsec packet. In this example, a large, plain IPsec packet is fragmented as it crosses an IPsec VPN. Note that an IPsec tunnel mode transform with ESP MD5 authentication and DES encryption is used in this example.

Figure 7-68 illustrates fragmentation of a large, plain IPsec packet. Host A (10.1.1.2) sends a 1500-byte user packet (with the DF bit not set in the IP header) to Host B (10.2.2.1). The Paris gateway encapsulates it in IPsec, which adds 52 bytes for a total IPsec packet size of 1552 bytes (see Table 7-2). Because the path MTU to the Hamburg gateway (which is initially equal to the MTU of the outgoing link from the Paris gateway to the ISP1 router) is only 1500 bytes, the Paris gateway has to fragment the IPsec packet. The DF bit is not set (it is copied or maintained from the user packet IP header), and so the Paris gateway goes ahead and fragments the 1552-byte packet into a 1500-byte fragment and a 72-byte fragment.

Figure 7-68. Fragmentation of a Large IPsec Packet
Example 7-73 shows the fragmentation of the 1552-byte IPsec packet by the Paris gateway.

Caution: The debug ip packet (detail) command can cause high processor overhead and is used here for illustrative purposes only.

Example 7-73. Fragmentation of the 1552-Byte IPsec Packet on the Paris IPsec VPN Gateway
In highlighted line 1, the Paris gateway receives the 1500-byte user packet from Host A. The IP packet length is shown as the totlen parameter (this is actually the Total Length field in the IP header [see Figure 7-67]). Then, in highlighted lines 2 and 3, the Paris gateway sends the 1500- and 72-byte packet fragments (see the totlen field in each of these lines) onward toward the ISP1 router.

Hold on, you might be thinking, 1500 plus 72 is 1572; that is 20 bytes more than the original IPsec packet size of 1552 bytes. The explanation is that when the 1552-byte IPsec packet is broken into 1500-byte and 52-byte pieces, the 1500-byte fragment includes the original (though slightly modified) IP packet header from the 1552-byte IPsec packet, but the 52-byte piece does not include an IP header. Without an IP header, the 52-byte piece cannot be transmitted, and so the Paris gateway adds a new (20-byte) IP header to it, giving a total of 72 bytes.

Note that the datagramsize parameter shown in Example 7-73 specifies the datagram size including Layer 2 headers (such as Ethernet and PPP). Figure 7-69 illustrates the fragmentation of the 1552-byte IPsec packet on the Paris IPsec gateway.

Figure 7-69. Fragmentation of the 1552-Byte IPsec Packet on the Paris IPsec VPN Gateway
The ISP1 router now receives the 1500- and 72-byte IPsec packet fragments from the Paris IPsec VPN gateway. It forwards the 72-byte fragment to the ISP2 router unmodified, but there is a problem with the 1500-byte fragment: it is bigger than ISP1's outgoing interface MTU of 1000 bytes. Note that the 1000-byte MTU between the ISP1 and ISP2 routers does not correspond to any particular link type, and is used for illustrative purposes only.

At this point, the ISP1 router fragments the 1500-byte fragment into two further fragments of 996 bytes and 524 bytes. Again, the second of these two fragments (524 bytes) includes a new 20-byte IP header, so the total number of bytes in these two fragments is 996 + 524 = 1520 bytes. So, now three fragments constitute the original 1552-byte IPsec packet:
Example 7-74 shows the fragmentation of the 1500-byte IPsec packet fragment on the ISP1 router.

Example 7-74. Fragmentation of the 1500-Byte IPsec Packet Fragment on the ISP1 Router
In highlighted line 1, the ISP1 router receives the 1500-byte fragment from the Paris gateway. Then, in highlighted lines 2 and 3, the ISP1 router fragments the 1500-byte packet into the 996-byte and 524-byte fragments and sends them onward to the ISP2 router. Finally, in highlighted line 4, the ISP1 router forwards the 72-byte fragment to the ISP2 router. Figure 7-70 illustrates the fragmentation of the 1500-byte IPsec fragment on the ISP1 router.

Figure 7-70. Fragmentation of the 1500-Byte IPsec Packet Fragment on the ISP1 Router
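The fragment sizes seen on the Paris gateway and the ISP1 router follow from two rules of IPv4 fragmentation: every fragment carries its own IP header, and every fragment except the last must carry a payload that is a multiple of 8 bytes. The following Python sketch (the function name is hypothetical, and 20-byte IP headers without options are assumed) reproduces the arithmetic.

```python
IP_HDR = 20  # assumed IPv4 header length (no options)


def fragment(total_len: int, mtu: int) -> list[int]:
    """Return the Total Length of each fragment of an IPv4 packet."""
    if mtu < IP_HDR + 8:
        raise ValueError("MTU too small to carry any fragment payload")
    if total_len <= mtu:
        return [total_len]  # fits as-is, no fragmentation needed
    payload = total_len - IP_HDR
    # Largest payload per fragment, rounded down to a multiple of 8 bytes.
    chunk = (mtu - IP_HDR) // 8 * 8
    sizes = []
    while payload > chunk:
        sizes.append(chunk + IP_HDR)
        payload -= chunk
    sizes.append(payload + IP_HDR)  # last fragment carries the remainder
    return sizes


print(fragment(1552, 1500))  # [1500, 72]   on the Paris gateway
print(fragment(1500, 1000))  # [996, 524]   on the ISP1 router
print(fragment(72, 1000))    # [72]         forwarded unmodified
```

The 996-byte fragment is 4 bytes short of the 1000-byte MTU because 980 bytes of payload is not a multiple of 8, so the router rounds down to 976.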
The ISP2 router forwards the three fragments unmodified to the Hamburg IPsec VPN gateway (all three fragments are smaller than ISP2's outgoing interface MTU of 1500 bytes). Finally, the Hamburg gateway reassembles the fragments, decapsulates the user packet (removes the IPsec encapsulation), and forwards the packet to Host B, as shown in Example 7-75.

Example 7-75. Hamburg Gateway Reassembles the Fragments and Decapsulates the Packet
In highlighted lines 1 to 3, the three IPsec packet fragments are received (recv) on the Hamburg gateway. In highlighted line 4, the Hamburg gateway reassembles the IPsec packet (1552 bytes). The Hamburg gateway then decapsulates the 1500-byte user packet (removes the IPsec encapsulation by authenticating and decrypting the packet) in highlighted line 5. And finally, in highlighted line 6, the Hamburg gateway forwards the user packet to Host B. Figure 7-71 illustrates IPsec packet reassembly and decapsulation on the Hamburg gateway.

Figure 7-71. IPsec Packet Reassembly and Decapsulation on the Hamburg Gateway
So, all seemingly well and good: Host B has successfully received the packet from Host A. But there is one major problem with packet reassembly on the receiving IPsec VPN gateway (Hamburg in this example). As previously mentioned, packet reassembly is process switched, and this incurs high processor overhead on the receiving gateway and reduces packet throughput.

Fragmentation of GRE/IPsec Packets

The default behavior for GRE tunnels is not to copy the DF bit setting from the encapsulated packet to the outer GRE tunnel IP header. Because the DF bit is not set on GRE packets, and because IPsec copies the DF bit setting from the GRE packet, the DF bit is by default not set in the outer IP header of GRE/IPsec packets. Note that an IPsec transport mode transform with ESP MD5 authentication and DES encryption is used in this example. Figure 7-72 illustrates the fragmentation of a large GRE packet after it is encapsulated in IPsec.

Figure 7-72. Fragmentation of a 1500-Byte GRE Packet After It Is Encapsulated in IPsec
In Figure 7-72, Host A sends a 1500-byte user packet (with the DF bit set) to Host B. The 1500-byte user packet arrives at the Paris IPsec VPN gateway, but the user packet size is greater than the MTU on the GRE tunnel (1500 > 1476). The GRE tunnel MTU is automatically set to the outgoing link MTU (1500 bytes, the link to the ISP1 router) minus the total GRE overhead (24 bytes). Because the 1500-byte user packet is larger than the GRE tunnel interface MTU (1476 bytes) and the DF bit is set in the user packet, the user packet is dropped.

The Paris gateway then sends an ICMP unreachable message (type 3, code 4) to Host A. The purpose of this message is to inform Host A that the Paris gateway needs to fragment the 1500-byte user packet, but that the DF bit is set. The ICMP unreachable also contains the MTU of the GRE tunnel (1476 bytes).

Host A now sends a 1476-byte packet to Host B. Host A dynamically reduces the size of the packet that it sends (from 1500 bytes to 1476 bytes) because it is using a mechanism called Path MTU Discovery (PMTUD). PMTUD is discussed more fully in the next section. The packet arrives at the Paris gateway, and is encapsulated in GRE, adding 24 bytes for a total of 1500 bytes. The DF bit is not, however, copied from the user packet IP header (this is the default behavior for GRE).

In this example, the 1500-byte GRE packet is now encapsulated in IPsec using an IPsec transport mode transform with ESP MD5 authentication and DES encryption. If you refer back to Table 7-2, you can see that the overhead for this IPsec transform is 60 bytes for a 1500-byte packet. So, the total IPsec packet size is now 1560 bytes. Because the IPsec packet size is greater than the path MTU (which is initially set to the outgoing interface MTU to the ISP1 router), the IPsec packet is fragmented into a 1500-byte fragment and an 80-byte fragment.
The second fragment includes, as previously described, a new 20-byte IP header, which gives the 80-byte total (60 bytes plus 20 bytes for the new IP header). The Paris gateway is able to fragment the IPsec packet because the DF bit is not set in the GRE packet.

The Paris gateway then sends the two fragments across the network to the Hamburg gateway. In this example, the MTU on the links between the ISP1 and ISP2 routers, and the ISP2 and Hamburg gateway is 1500 bytes, so no further fragmentation is required. When the two fragments arrive at the Hamburg gateway, it reassembles the 1560-byte IPsec packet and decapsulates the 1500-byte GRE packet (by authenticating and decrypting it). Finally, the Hamburg gateway decapsulates the 1476-byte user packet from Host A (by removing the GRE headers) and sends it to Host B.

So, sending large (larger than the path MTU) GRE/IPsec packets with the DF bit not set results in IPsec packet reassembly on the receiving gateway.

PMTUD and IPsec Packet Drops

You have already seen that fragmentation results if the DF bit is not set in IPsec and GRE/IPsec packets. This section shows how you can use PMTUD to reduce or eliminate fragmentation. PMTUD is a mechanism that, as the name suggests, allows a host or network device to dynamically discover the lowest MTU along a path to a destination. PMTUD relies on four factors:
Again, a couple of examples are used to illustrate the concept.

PMTUD with Plain IPsec Packets

This section shows the interaction between PMTUD and plain IPsec packets, as illustrated in Figure 7-73. Note that an IPsec tunnel mode transform with ESP MD5 authentication and DES encryption is used in this example.

Figure 7-73. Interaction Between PMTUD and Plain IPsec Packets
In Figure 7-73, Host A sends a 1500-byte user packet (with the DF bit set) to Host B via the Paris gateway. The IPsec overhead (52 bytes) would increase the overall size of the packet to 1552 bytes, and because this is greater than the current path MTU (1552 bytes > 1500 bytes), the Paris gateway drops Host A's packet and sends an ICMP unreachable message back to Host A. This ICMP unreachable message lets Host A know that the Paris gateway needs to fragment the packet from Host A, but that the DF bit is set in the packet header.

Included in the ICMP unreachable message sent by the Paris gateway to Host A is the path MTU minus the overhead that would be imposed by IPsec. The Paris gateway effectively uses the ICMP unreachable message to tell Host A the packet size that Host A can send without Paris needing to fragment it.

The Paris gateway (and in fact, any IPsec gateway) records the path MTU to the destination IPsec gateway (Hamburg in this example) and the MTU of its outgoing interface (to ISP1 in this case) in the SADB. The contents of the SADB after Host A has sent its first packet (1500 bytes) can be viewed using the show crypto ipsec sa command, as shown in Example 7-76. Only the relevant portion of the output is shown.

Example 7-76. show crypto ipsec sa Command Output After Host A Has Sent Its First Packet
In the highlighted line, you can see the path MTU (1500 bytes) and the local outgoing interface (media) MTU. It is worth noting that the path MTU can be dynamically updated by the IPsec gateway.

Figure 7-74 shows a packet capture of the ICMP unreachable message sent by the Paris IPsec VPN gateway to Host A. The first highlighted line in Figure 7-74 shows the ICMP unreachable message sent from the Paris gateway (10.1.1.1) to Host A (10.1.1.2). If you look in the pane below the first highlighted line, you can see the packet detail, specifically that the ICMP message is a type 3 (destination unreachable), code 4 (fragmentation needed [but DF bit set]).

Figure 7-74. ICMP Unreachable Message Sent by the Paris IPsec VPN Gateway to Host A
In the second highlighted line in Figure 7-74, you can see that the MTU of the next hop (that is, the path MTU on the Paris gateway) minus the IPsec overhead is included in the ICMP message.

Before describing the rest of the packet flow, take a closer look at the second highlighted line. In particular, look at the MTU reported by the Paris gateway as the next-hop MTU: it is 1442 bytes. Hold on a second, you might be thinking, shouldn't this be 1448 bytes (1500 - 52 [the overhead added to a 1500-byte user packet by ESP MD5 authentication and DES encryption])? Well, no. The maximum overhead that can be added to a user packet for the IPsec transform used in this example is 58 bytes, so the Paris gateway simply informs Host A of the path MTU minus the maximum possible overhead rather than minus the overhead for a specific user packet size (1500 bytes, in this example). This is a good idea: if the Paris gateway did not signal a 1442-byte MTU, it might have to continually inform Host A of different MTU sizes depending on the size of user packets sent by Host A (because the IPsec overhead varies according to the user packet size).

Host A now sends another packet to Host B via the Paris gateway. This packet is 1442 bytes rather than 1500 bytes: Host A is using PMTUD, and so has dynamically adjusted the size of the packet in response to the ICMP unreachable message sent by the Paris gateway. The Paris gateway adds the IPsec encapsulation (specifically ESP), giving a total packet size of 1442 + 54 (IPsec overhead) = 1496 bytes. If you are wondering what happened to the 52 bytes of overhead, remember that the 52 bytes is the overhead that applies to a 1500-byte user packet, not a 1442-byte user packet (IPsec overhead varies according to user packet size). One other important fact to remember here is that the Paris gateway copies the DF bit setting from the user packet IP header to the new outer IP packet header (this is the default behavior for IPsec).
The 1496-byte packet size is less than the path MTU (1500), and so the Paris gateway is able to forward the packet onward to the ISP1 router.

The ISP1 router now attempts to forward the packet onward over its outgoing link to ISP2. Unfortunately, ISP1's outgoing interface MTU is less than the packet size (1000 bytes < 1496 bytes). The ISP1 router drops the packet and sends an ICMP unreachable message back to the Paris gateway to tell the Paris gateway that it needs to fragment the IPsec packet but that the DF bit is set. Included in the ICMP unreachable message is the outgoing interface MTU (ISP1 to ISP2, 1000 bytes); this is the packet size that the Paris gateway can send without ISP1 needing to fragment it.

The Paris gateway (192.168.1.1) receives the ICMP unreachable message from the ISP1 router (192.168.1.2), as shown in Example 7-77.

Example 7-77. Paris IPsec Gateway Receives an ICMP Unreachable Message from the ISP1 Router
As shown in Example 7-78, the Paris gateway has now updated the path MTU in its SADB.

Example 7-78. Paris IPsec VPN Gateway Updates the Path MTU in Its SADB
As you can see, the Paris gateway has updated the path MTU to be 1000 bytes (which is the MTU signaled in the ICMP unreachable sent by the ISP1 router).

You might be surprised to learn that the Paris gateway does not immediately send an ICMP unreachable message to inform Host A of the new path MTU. There is a good reason for this: the ICMP unreachable message from the ISP1 router does not contain enough information to allow the Paris gateway to work out that the IPsec packet which the ISP1 router dropped encapsulated a user packet from Host A (any number of hosts at the Paris site could be sending traffic over the IPsec tunnel).

Host A now sends another packet (its third). This packet again has a size of 1442 bytes. Because the Paris gateway has updated the path MTU in its SADB to 1000 (as shown in Example 7-78), it now sends another ICMP unreachable message to Host A specifying a next-hop MTU of 942. Again, this MTU is the path MTU of 1000 minus the maximum overhead of 58 bytes. Figure 7-75 shows a packet capture of the second ICMP unreachable message sent by the Paris IPsec VPN gateway to Host A.

Figure 7-75. Second ICMP Unreachable Message Sent by the Paris IPsec VPN Gateway to Host A
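The PMTUD exchange described in this section can be sketched as a small Python simulation. This is a simplified model, not a description of how Cisco IOS is implemented: the function names are hypothetical, the 58-byte maximum overhead is taken from the text, and the per-packet ESP size uses an assumed formula (20-byte outer IP header, 8 bytes of SPI and Sequence Number, 8-byte IV, payload padded to an 8-byte block, 12-byte Authentication Data) that may differ by a few bytes from what a specific gateway produces.

```python
MAX_IPSEC_OVERHEAD = 58  # maximum overhead the text gives for this transform


def esp_tunnel_size(user_len: int) -> int:
    """Assumed size model for tunnel mode ESP with DES and MD5."""
    padded = -(-(user_len + 2) // 8) * 8  # payload + 2 trailer bytes, 8-byte blocks
    return 20 + 8 + 8 + padded + 12


def host_packet_sizes(first_size: int, link_mtus: list[int]) -> list[int]:
    """Return the size of each packet the host sends (DF bit set) until one
    crosses every link. link_mtus[0] is the gateway's outgoing interface MTU;
    the remaining entries are the MTUs of the links further along the path."""
    path_mtu = link_mtus[0]  # the gateway's initial view of the path MTU
    sent = [first_size]
    while True:
        ipsec_size = esp_tunnel_size(sent[-1])
        if ipsec_size > path_mtu:
            # Gateway drops the packet and advertises the path MTU minus the
            # maximum IPsec overhead in an ICMP type 3, code 4 message.
            sent.append(path_mtu - MAX_IPSEC_OVERHEAD)
            continue
        too_small = [mtu for mtu in link_mtus[1:] if mtu < ipsec_size]
        if not too_small:
            return sent  # the packet reached the far gateway
        # A router drops the IPsec packet and reports its MTU to the gateway.
        # The host is not told yet, so it sends the same size again.
        path_mtu = too_small[0]
        sent.append(sent[-1])


# Paris -> ISP1 (1500), ISP1 -> ISP2 (1000), ISP2 -> Hamburg (1500)
print(host_packet_sizes(1500, [1500, 1000, 1500]))  # [1500, 1442, 1442, 942]
```

The repeated 1442 in the output corresponds to the second packet, which the ISP1 router drops, and the third packet, which the Paris gateway drops after it has learned the 1000-byte path MTU.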
In the second highlighted line in Figure 7-75, you can see that the next-hop MTU specified in the ICMP unreachable message that the Paris gateway sent to Host A is 942 bytes.

Finally, Host A sends a 942-byte packet to Host B. The Paris gateway encapsulates this packet with IPsec, giving a total packet size of 1000 bytes (the overhead for a 942-byte packet is 58 bytes). The Paris gateway forwards the IPsec packet to ISP1, which can now forward it over the link to ISP2. ISP2 then forwards the IPsec packet over the link to the Hamburg gateway (the ISP2 to Hamburg gateway link has a link MTU of 1500 bytes, so no problem there). The Hamburg gateway decapsulates the 942-byte user packet from Host A (by authenticating and decrypting it) and forwards it to Host B. Success!

PMTUD with GRE/IPsec Packets

You can also copy the DF bit setting from the encapsulated user packet and enable PMTUD for GRE packets by configuring the tunnel path-mtu discovery command on the GRE tunnel. Having the DF bit set is one of the factors that PMTUD relies on to function correctly. Remember that by default the DF bit setting is not copied from the user packet to the GRE packet, and so PMTUD (which relies on the DF bit being set) is disabled. Example 7-79 shows the configuration of the tunnel path-mtu discovery command on a GRE tunnel interface.

Example 7-79. GRE Tunnel Configured with the tunnel path-mtu discovery Command
Figure 7-76 illustrates the interaction between PMTUD and GRE/IPsec packets when the tunnel path-mtu discovery command is configured on the GRE tunnel interface. An IPsec transport mode transform with ESP MD5 authentication and DES encryption is used in this example.

Figure 7-76. Interaction Between PMTUD and GRE/IPsec Packets When the tunnel path-mtu discovery Command Is Configured on the GRE Tunnel Interface
In Figure 7-76, Host A sends a 1500-byte user packet (with the DF bit set) to Host B. The Paris IPsec VPN gateway drops the 1500-byte packet because the default MTU on the GRE tunnel interface is the outgoing physical interface MTU (toward the ISP1 router [1500 bytes]) minus the GRE tunnel overhead (24 bytes). In this example, therefore, the default GRE tunnel interface MTU is 1476 bytes (1500 bytes - 24 bytes). The Paris gateway sends an ICMP unreachable message to Host A specifying the MTU of the GRE tunnel interface (1476 bytes).

Host A then sends a 1476-byte user packet to Host B. The Paris gateway encapsulates the user packet in GRE, giving a total GRE packet size of 1500 bytes (1476 bytes + 24 bytes). Unfortunately, the 1500-byte GRE packet size plus the IPsec overhead is greater than the outgoing interface MTU (1500 bytes). The Paris gateway, therefore, drops the GRE packet, and sends an ICMP unreachable message internally to GRE informing it that the effective MTU of the outgoing interface is 1462 bytes. This 1462-byte effective MTU size takes into account the maximum IPsec overhead of 38 bytes for the IPsec transform used in this example (1500 bytes - 38 bytes = 1462 bytes).

The MTU of the GRE tunnel is now adjusted to 1438 bytes (1462 bytes - 24 bytes [GRE overhead] = 1438 bytes), as shown in Example 7-80.

Example 7-80. MTU of the GRE Tunnel Is Now Adjusted to 1438 Bytes
Note that the debug tunnel command is used here for illustrative purposes only.

Host A now sends another 1476-byte user packet. The Paris gateway drops the packet and sends an ICMP unreachable message to Host A specifying the new MTU of the GRE tunnel interface (1438 bytes).

Host A then sends a 1438-byte user packet. The Paris gateway encapsulates the user packet in GRE (giving a GRE packet size of 1462 bytes) and then encapsulates the GRE packet in IPsec (giving a total GRE/IPsec packet size of 1496 bytes). The IPsec packet transits the network to the Hamburg IPsec VPN gateway. The Hamburg IPsec VPN gateway decapsulates the 1462-byte GRE packet (by authenticating and decrypting it), decapsulates the 1438-byte user packet (by removing the GRE header), and forwards the user packet to Host B.

So, if you configure the tunnel path-mtu discovery command on a GRE tunnel, and the DF bit is set in user packets, GRE/IPsec packets are not fragmented as they cross the intervening network between IPsec VPN gateways.

User Packets Are Dropped When PMTUD Is Broken

So, fragmentation is bad, but PMTUD solves all your problems, right? Unfortunately, no. As previously described, PMTUD relies on a number of factors. It is worth just briefly reiterating these:
If just one of the factors listed does not operate as described, PMTUD will not work correctly, and large IPsec packets will be dropped. The most common cause of PMTUD breaking is a misconfigured firewall dropping ICMP unreachable (type 3, code 4) messages. Figure 7-77 illustrates a scenario in which a misconfigured firewall causes PMTUD to break.

Figure 7-77. Misconfigured Firewall Causes PMTUD to Break
In Figure 7-77, Host A sends a 1442-byte packet (with the DF bit set) to the Paris IPsec VPN gateway. The Paris gateway encapsulates the packet in IPsec and forwards the packet to the firewall, which then forwards the packet on to the ISP1 router. The ISP1 router drops the IPsec packet because it is too large (larger than its outgoing interface MTU) and the DF bit is set.

At this point, the ISP1 router sends an ICMP unreachable message to the Paris gateway, but this ICMP unreachable message is blocked by the firewall. Because the message is blocked, the Paris gateway is unaware that it should reduce the size of packets that it sends, and so it does not in turn inform Host A (via an ICMP unreachable message) that Host A should reduce the size of packets that it sends. Host A, therefore, continues sending packets that are too large, the Paris gateway continues to encapsulate them in IPsec, and the ISP1 router continues to drop them. Not good!

Solutions for IPsec Packet Fragmentation and Drops

There are two "evils" as far as large IPsec packets are concerned:
What can you do against these twin "evils?" There are a number of solutions, including the following:
These solutions are discussed in more detail in the following four sections. It is worth noting that to solve issues with fragmentation or packet drops, you might need to implement more than one of the solutions described.

Solution 1: Ensuring That End Hosts Send Smaller User Packets

Perhaps the best solution for fragmentation and IPsec packet drops is for end hosts to send smaller user packets, such that even with the added IPsec or GRE/IPsec overhead the resulting IPsec or GRE/IPsec packets are smaller than or equal to the path MTU. In this case, packets will neither be fragmented nor dropped.

It is possible to either manually configure end hosts to send smaller packets or indirectly get them to send smaller packets. The article "Adjusting IP MTU, TCP MSS, and PMTUD on Windows and Sun Systems" on Cisco.com summarizes a number of useful references that tell you how to manually configure Windows and Sun hosts to send smaller packet sizes. Manually configuring end hosts to send smaller packets is, however, often impractical. Two other ways to get end hosts to send smaller packets are to configure the ip mtu or ip tcp adjust-mss commands on IPsec VPN gateways.

Configuring the ip mtu Command on GRE Tunnel Interfaces

If PMTUD is functioning correctly between end hosts and their local IPsec VPN gateway, you can use the ip mtu command to limit the size of IP packets that these end hosts send. You can configure the ip mtu command on the GRE tunnel interface as shown in Example 7-81.

Example 7-81. Configuring the ip mtu Command on a GRE Tunnel Interface
In Example 7-81, the ip mtu command is used to specify a GRE tunnel MTU of 1418 bytes. Recall that the default MTU for a GRE tunnel interface is the MTU of the outgoing physical interface minus the GRE tunnel overhead of 24 bytes. So, if the outgoing physical interface has an MTU of 1500, the MTU of the GRE tunnel will, by default, be 1476 bytes (1500 - 24 = 1476). As you saw in Figure 7-72, assuming an outgoing interface MTU of 1500 bytes, if a 1476-byte user packet is sent on a GRE tunnel interface, fragmentation will still result after the GRE and IPsec overhead have been added (1476 + 24 [GRE] + IPsec overhead > 1500). In Example 7-81, the ip mtu 1418 command ensures that even after the GRE and IPsec overhead (with an IPsec tunnel mode ESP authentication and encryption transform set in this example) have been added, the overall packet size will still be less than 1500 bytes (actually, 1496 bytes).

Table 7-3 summarizes guideline maximum MTU sizes that you can configure on the GRE tunnel interfaces of IPsec VPN gateways using the ip mtu command such that, with the added GRE and IPsec overhead, user packets will not require fragmentation. Table 7-3 assumes an outgoing physical interface MTU of 1500 bytes on the IPsec VPN gateway.
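The arithmetic behind the 1418-byte and 1476-byte cases can be checked with a short sketch. The Python below is illustrative only and is not part of Cisco IOS: the function names are hypothetical, and the ESP parameters (an 8-byte cipher block and IV, as with DES or 3DES, and a 12-byte truncated HMAC ICV) are assumptions chosen because they reproduce the 1496-byte result quoted above. Other transform sets produce different overheads.

```python
GRE_OVERHEAD = 24      # new 20-byte IP header + 4-byte GRE header, no optional fields
OUTER_IP_HEADER = 20   # IP header added by IPsec tunnel mode (no IP options)


def esp_tunnel_size(inner_len, block=8, iv=8, icv=12):
    """Size of an ESP tunnel-mode packet carrying an inner packet of inner_len bytes.

    block, iv and icv are assumed values (DES/3DES with a 96-bit HMAC).
    """
    # Inner packet + Pad Length (1) + Next Header (1) must fill whole cipher blocks.
    to_encrypt = inner_len + 2
    padded = -(-to_encrypt // block) * block  # round up to a multiple of block
    # SPI (4) and Sequence Number (4) precede the IV; the ICV trails the packet.
    return OUTER_IP_HEADER + 4 + 4 + iv + padded + icv


def gre_ipsec_size(user_len):
    """Size of a user packet after GRE and then ESP tunnel-mode encapsulation."""
    return esp_tunnel_size(user_len + GRE_OVERHEAD)


if __name__ == "__main__":
    for tunnel_mtu in (1476, 1418):
        print(tunnel_mtu, "->", gre_ipsec_size(tunnel_mtu))
```

Under these assumptions, a 1476-byte user packet grows to 1552 bytes, which exceeds a 1500-byte MTU, whereas a 1418-byte user packet grows to 1496 bytes and fits.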
Note: When using mGRE tunnel interfaces with DMVPN, a good guideline MTU to configure on the interfaces is 1400 bytes (although you can adjust this based on specific requirements).

Configuring the ip tcp adjust-mss Command on the GRE Tunnel or Inside Physical Interface

The ip tcp adjust-mss command was introduced in Cisco IOS Software Release 12.2(4)T and can be used to configure gateways to dynamically adjust the TCP maximum segment size (MSS) in SYN and SYN/ACK packets (segments) sent by end hosts. Peer devices exchange SYN and SYN/ACK messages during TCP connection establishment.

The MSS is the largest amount of data, excluding the TCP and IP headers (20 bytes + 20 bytes = 40 bytes), that a device such as an end host will send using TCP. During TCP connection establishment, a host can optionally inform its peer of the MSS it can receive in the SYN or SYN/ACK packet it sends. If a host does not specify an MSS, its peer will infer an MSS of 536 bytes. A host will send data segments no larger than the smaller of the MSS value specified by its peer and the MTU of its own interface (minus 40 bytes for the TCP and IP headers). So, for example, if the MSS specified by a peer host is 1460 bytes, and the MTU of the local host's interface is 1410 bytes, the local host will use an MSS of 1370 bytes (the local interface MTU of 1410 bytes minus the 40 bytes of TCP/IP headers).

The ip tcp adjust-mss command allows a gateway to dynamically modify the MSS in SYN and SYN/ACK packets sent by end hosts (if the MSS sent by a host is greater than the MSS specified using the ip tcp adjust-mss command). This command ensures that end hosts do not send (TCP/IP) packets larger than the MSS specified plus 40 bytes (the TCP and IP headers). You can configure the ip tcp adjust-mss command on either the GRE tunnel interface (if you have one configured) or the inside physical interface of an IPsec VPN gateway. In Example 7-82, the ip tcp adjust-mss command is configured on a GRE tunnel interface.
In this example, the ip tcp adjust-mss command configures the gateway to dynamically adjust the TCP MSS in SYN and SYN/ACK packets to a value of 1378 bytes. TCP/IP packets sent by end hosts will, therefore, not exceed 1418 bytes (including TCP/IP headers).

Example 7-82. Configuring the ip tcp adjust-mss Command on a GRE Tunnel Interface
Figure 7-78 illustrates the function of the ip tcp adjust-mss command.

Figure 7-78. Function of the ip tcp adjust-mss Command
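The MSS rules described in this section can be expressed as a minimal Python sketch. The function names are hypothetical, and the code models only the arithmetic; it does not represent how Cisco IOS implements the feature. It assumes 20-byte IP and TCP headers with no options.

```python
TCP_IP_HEADERS = 40  # 20-byte IP header + 20-byte TCP header, no options


def effective_mss(peer_mss, local_mtu):
    """MSS a host uses: the smaller of the peer's MSS and its own MTU minus headers."""
    return min(peer_mss, local_mtu - TCP_IP_HEADERS)


def adjust_mss(advertised_mss, configured_mss):
    """Model of MSS adjustment on a gateway.

    The MSS option is rewritten only if the host advertised a larger value
    than the one configured on the gateway.
    """
    return min(advertised_mss, configured_mss)


if __name__ == "__main__":
    # Host example from the text: peer MSS 1460, local interface MTU 1410.
    print(effective_mss(1460, 1410))
    # A SYN advertising 1460 passes a gateway configured with an MSS of 1378.
    clamped = adjust_mss(1460, 1378)
    print(clamped, clamped + TCP_IP_HEADERS)
```

With a configured value of 1378 bytes, the largest TCP/IP packet that a host sends is 1378 + 40 = 1418 bytes. An advertised MSS that is already smaller than the configured value is left unchanged.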
In Figure 7-78, Host A sends a TCP SYN to Host B. The SYN packet specifies an MSS of 1460 bytes. The Paris IPsec VPN gateway intercepts the SYN packet and dynamically adjusts the MSS value contained within it to a value of 1378 bytes. Host B then receives the SYN. Host B sends a TCP SYN/ACK to Host A. This SYN/ACK specifies an MSS of 1460 bytes. The Hamburg gateway intercepts the SYN/ACK packet and dynamically adjusts the MSS value contained within it to a value of 1378 bytes. Host A then receives the SYN/ACK. Finally, Host A sends an ACK packet to Host B, and the TCP connection is established.

Host A will now not send a TCP data segment larger than 1378 bytes to Host B (a total IP packet size of 1418 bytes, including the [40-byte] TCP/IP headers). And Host B will also not send a data segment of more than 1378 bytes to Host A (again, a total IP packet size of 1418 bytes, including TCP/IP headers).

Although it is a good idea to configure the ip tcp adjust-mss command on both IPsec VPN gateways (as shown in Figure 7-78), it is possible to configure this command on just one of the gateways to achieve the same result. You might want to configure the ip tcp adjust-mss command on just one gateway if the other gateway has an older Cisco IOS version that does not support this command.

Solution 2: Fixing PMTUD If It Is Broken

If you suspect that user or IPsec packets are being dropped because PMTUD is broken, the best course of action is, of course, to fix PMTUD. Fixing PMTUD might involve doing one or more of the following things:
Be sure to check that PMTUD is operating correctly both between end hosts and their local IPsec VPN gateway and between IPsec VPN gateways. You can partially verify this by pinging a remote host using large packets (greater than the path MTU) with the DF bit set and ensuring that ICMP unreachable messages (type 3, code 4) are received back from intermediate network devices. Check that ICMP unreachables are received using, for example, a packet sniffer or, if testing from a Cisco router, by using (with caution) the debug ip packet [detailed] command.

Solution 3: Using Prefragmentation for IPsec Packets

One useful feature that you can use with Cisco routers is prefragmentation for IPsec VPNs. This feature was introduced in Cisco IOS Software Release 12.1(11b)E and allows an IPsec VPN gateway to fragment large user packets (which do not have their DF bit set) before IPsec encapsulation, if the gateway calculates that the packet including IPsec overhead would exceed the MTU.

The upshot of the prefragmentation for IPsec VPNs feature is that fragmented user packets are reassembled on the destination end hosts rather than on the receiving IPsec VPN gateway. Because IPsec VPN gateways do not have to reassemble IPsec packets, VPN performance and throughput can be greatly improved.

Note: The performance of end hosts is not (generally) noticeably affected by IP packet reassembly.

Figure 7-79 illustrates prefragmentation for IPsec VPNs when the DF bit is set in large user packets. An IPsec tunnel mode transform with ESP authentication and DES encryption is used in this example.

Figure 7-79. Prefragmentation for IPsec VPNs When the DF Bit Is Set in Large User Packets
In Figure 7-79, Host A sends a 1500-byte user packet with the DF bit set to Host B. Because the user packet plus the IPsec overhead would be greater than the path MTU to the Hamburg IPsec VPN gateway, the Paris VPN gateway sends an ICMP unreachable (type 3, code 4) message back to Host A. This ICMP unreachable message specifies an MTU of 1442 bytes. Host A then sends a 1442-byte user packet to Host B. The Paris gateway encapsulates it with IPsec and sends the IPsec packet to the Hamburg gateway. The IPsec packet is 1496 bytes long. The Hamburg gateway receives the IPsec packet, decapsulates the user packet, and sends it onward to Host B.

As you can see, if the DF bit is set in large user packets, the IPsec VPN gateway does not (by default) fragment the packet even if prefragmentation for IPsec is enabled.

Figure 7-80 illustrates prefragmentation for IPsec VPNs when the DF bit is not set in large user packets.

Figure 7-80. Prefragmentation for IPsec VPNs When the DF Bit Is Not Set in Large User Packets
In Figure 7-80, Host A sends a 1500-byte user packet with the DF bit not set to Host B. Because the user packet plus the IPsec overhead would exceed the MTU, the Paris gateway fragments the user packet into (roughly equal-sized) 788-byte and 732-byte fragments. The 732-byte fragment includes a new 20-byte IP header. Note that the prefragmentation feature causes large packets to be broken into roughly equal-sized fragments. This helps to prevent the further fragmentation of fragments, as illustrated in Figure 7-68 earlier in this chapter.

The Paris gateway then encapsulates each user packet fragment into a separate IPsec packet and transmits them to the Hamburg gateway. The Hamburg gateway receives the two IPsec packets, decapsulates each, and sends the two user packet fragments separately on to Host B. Crucially, packet reassembly is not required on the Hamburg gateway. Host B then reassembles the two user packet fragments.

Prefragmentation is enabled by default in Cisco IOS versions that support this feature. If, for any reason, you do want to configure fragmentation after IPsec encapsulation (the only kind of fragmentation available prior to Cisco IOS Software Release 12.1(11b)E), use the crypto ipsec fragmentation after-encryption global (affecting all interfaces) or interface (affecting a particular interface) configuration mode command. To reenable fragmentation before IPsec encapsulation (prefragmentation), use the crypto ipsec fragmentation before-encryption global or interface configuration mode command.

Note that if one gateway is running Cisco IOS Software Release 12.1(11b)E or later, but its peer is running an older version, you can still take advantage of this feature on the gateway running the newer release. In this case, IPsec traffic will be fragmented before encryption in one direction, and after encryption in the other direction across the IPsec VPN.
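The fragment sizes quoted for Figure 7-80 can be sanity-checked with a small sketch. This Python is illustrative only: split_packet is a hypothetical helper, and the 768-byte split point is taken from the figure's numbers rather than derived, because the text does not state how Cisco IOS chooses it. The sketch assumes a 20-byte IP header with no options.

```python
IP_HEADER = 20  # assumes an IP header without options


def split_packet(packet_len, first_payload):
    """Split an IP packet into two fragments; return their total lengths in bytes."""
    payload = packet_len - IP_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes,
    # because the IP Fragment Offset field counts in 8-byte units.
    if first_payload % 8 != 0 or not 0 < first_payload < payload:
        raise ValueError("first payload must be a multiple of 8 and smaller than the payload")
    first = IP_HEADER + first_payload
    second = IP_HEADER + (payload - first_payload)  # second fragment gets its own header
    return first, second


if __name__ == "__main__":
    # 1500-byte user packet, first fragment carrying 768 bytes of payload.
    print(split_packet(1500, 768))
```

The two lengths add up to 1520 bytes: the original 1500 bytes plus the additional 20-byte IP header carried by the second fragment.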
Note: Prefragmentation is only supported with IPsec tunnel mode transforms, not transport mode transforms.

Solution 4: As a Very Last Resort, Allowing Fragmentation of IPsec Packets

In the event that PMTUD is broken between peer IPsec VPN gateways (and IPsec packets are being dropped), as a last resort it is possible to clear the DF bit in all tunnel mode IPsec packets sent by an IPsec VPN gateway. Even if encapsulated user packets have their DF bit set, the DF bit is cleared in the outer IP header.

The upside to clearing the DF bit is that IPsec packets will not be dropped. The downside is that by clearing the DF bit in all IPsec packets you allow fragmentation of IPsec packets, and that means IPsec packet reassembly, with the concomitant performance and throughput degradation on the receiving IPsec VPN gateway. But at least the IPsec packets will get through.

You can use the crypto ipsec df-bit clear command to clear the DF bit on IPsec packets. This command can be configured in global configuration mode to clear the DF bit in IPsec packets sent on all interfaces. Alternatively, you can configure this command on a specific interface if you only want the DF bit in packets sent on that interface to be cleared. Example 7-83 shows the configuration of the crypto ipsec df-bit clear command on a specific interface.

Example 7-83. crypto ipsec df-bit clear Command Is Configured on a Specific Interface
If you do decide to use the crypto ipsec df-bit clear command, you should configure it selectively on IPsec VPN gateways. Configure it on a gateway only if IPsec packets (with the DF bit set) transmitted from that gateway are being dropped due to a break in PMTUD (and for some reason it cannot be fixed).

As already mentioned, you can only use the crypto ipsec df-bit clear command to clear the DF bit in tunnel mode IPsec packets. It is, however, also possible to indirectly ensure that the DF bit is cleared in transport mode IPsec packets. If you are using a transport mode transform, you can configure a route map on the inside interface of an IPsec VPN gateway to clear the DF bit in user packets. Because the DF bit setting of user packets is copied or maintained in IPsec packets by default, the route map also ensures that the DF bit is cleared in transport mode IPsec packets. Again, you should only configure route maps to clear the DF bit in user packets where absolutely necessary. Example 7-84 shows the configuration of a route map to clear the DF bit of user packets on the inside interface of an IPsec gateway.

Example 7-84. Route Map to Clear the DF Bit on User Packets Configured on the Inside Interface of an IPsec VPN Gateway
In highlighted line 1, access list 199 matches all user packets from inside network 10.1.1.0/24 going to destination network 10.2.2.0/24. In highlighted line 2, route map clear.user.packet.df.bits matches the user packets defined in access list 199, and clears the DF bit in these packets. Finally, in highlighted line 3, route map clear.user.packet.df.bits is applied to the inside interface of the IPsec VPN gateway.