IBM Support

Encrypted Traffic Fails Due to MTU Mismatch Between Physical and Virtual Hosts

Troubleshooting


Problem

Encrypted connections (DSE internode encryption and SSH) hang during the handshake (SSL/TLS handshake or SSH key exchange) when communicating between datacenters, specifically between physical hosts and VMware virtual hosts with mismatched MTU settings.

Symptom

  • DSE Symptoms:
    • Internode encryption fails between datacenters
    • Unencrypted traffic works normally
    • Local datacenter communication works fine
    • SSL handshake initiates but hangs during key exchange
  • SSH Symptoms:
    • SSH connections hang after key exchange algorithm negotiation
    • Telnet to SSH port (22) shows SSH-2.0-OpenSSH_X.X banner successfully
    • SSH verbose mode shows connection stuck at:

      debug1: kex: algorithm: curve25519-sha256
      debug1: kex: host key algorithm: ssh-rsa
      debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
      debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
      debug3: send packet: type 30
      debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
      
  • Network Symptoms:
    • tcpdump shows incoming packets as "IP11 (invalid)"
    • Packets appear corrupted or malformed
    • Only affects encrypted traffic, not plaintext

Cause

An MTU (Maximum Transmission Unit) mismatch between network interfaces, combined with the "Don't Fragment" (DF) bit being set on encrypted packets, causes oversized packets to be dropped.

Technical Details

  1. MTU Configuration Mismatch:
    • Physical hosts: MTU = 9000 (Jumbo frames)
    • VMware hypervisor: MTU = 1500 (Standard Ethernet)
    • Virtual hosts inherit 1500 MTU from hypervisor
  2. Encrypted Traffic Behavior:
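    • Encrypted packets (SSL/TLS, SSH) are sent with the "Don't Fragment" (DF) bit set in the IP header; on Linux, TCP sets DF by default as part of Path MTU Discovery
    • The DF bit prevents routers from fragmenting the packet along the network path
    • When a packet with the DF bit set is larger than the MTU of a link or interface on the path, it is dropped instead of fragmented. The sender only reduces its packet size if the resulting ICMP "Fragmentation Needed" message reaches it
  3. Why It Only Affects Encrypted Traffic:
    • The unencrypted traffic in this case was either small enough to fit within a 1500-byte MTU or sent without the DF bit, so it was delivered or fragmented normally
    • Key exchange and certificate packets in SSL/TLS and SSH are often larger than 1500 bytes and carry the DF bit, so they are dropped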
  4. Why Local DC Works:
    • All hosts in same DC have matching MTU (9000)
    • No MTU mismatch = no packet drops
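
The size arithmetic behind these points can be sketched in shell. This is an illustrative calculation only: the function names are hypothetical, the 40-byte overhead assumes IPv4 and TCP headers without options, and the 4000-byte handshake message is an assumed example size, not a measured value.

```shell
# tcp_mss MTU: largest TCP payload per packet
# (assumes a 20-byte IPv4 header and a 20-byte TCP header, no options)
tcp_mss() {
  echo $(( $1 - 40 ))
}

# segments_needed BYTES MTU: packets required to carry BYTES of TCP payload
segments_needed() {
  mss=$(tcp_mss "$2")
  echo $(( ($1 + mss - 1) / mss ))   # integer ceiling division
}

tcp_mss 1500               # prints 1460
tcp_mss 9000               # prints 8960
segments_needed 4000 9000  # prints 1
segments_needed 4000 1500  # prints 3
```

A sender that believes the path MTU is 9000 would send the assumed 4000-byte message as a single packet. With the DF bit set, that packet cannot cross a link limited to 1500 bytes.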

Environment

  • Affected Datacenter: DC1 (VMware virtual hosts, MTU 1500)
  • Working Datacenter: DC2 (Physical hosts, MTU 9000)
  • Affected Services: DSE with internode encryption, SSH
  • Network Path: Cross-datacenter communication

 

Diagnosing The Problem

1. Check MTU Settings

# On each host, check MTU
ip link show | grep mtu

# Or for specific interface
ip link show eth0 | grep mtu

# Expected output shows MTU value:
# eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP
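
When many interfaces are present, a compact interface/MTU listing is easier to compare between hosts. The following is a minimal sketch; `list_mtus` is a hypothetical helper name, and it assumes the one-line-per-interface output format of `ip -o link show`.

```shell
# list_mtus: read `ip -o link show` output on stdin and
# print "<interface> <mtu>" for each interface
list_mtus() {
  awk '{
    name = $2
    sub(/:$/, "", name)                  # strip the trailing colon from "eth0:"
    for (i = 3; i < NF; i++)
      if ($i == "mtu") { print name, $(i + 1); break }
  }'
}

# Usage on a live host:
#   ip -o link show | list_mtus
```

Running the same command on a physical host and on a virtual host makes a 9000 versus 1500 difference visible at a glance.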

2. Test Path MTU Discovery

# Test with different packet sizes (Don't Fragment bit set)
ping -M do -s 1472 <remote_host>  # Should work (1472 + 28 header = 1500)
ping -M do -s 8972 <remote_host>  # Will fail if path MTU < 9000 (8972 + 28 header = 9000)

# -M do: Set Don't Fragment bit (prohibit fragmentation)
# -s: ICMP payload size in bytes (add 28 bytes for the 20-byte IP header and 8-byte ICMP header)
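
If the two fixed sizes are not conclusive, the largest payload that passes can be found with a binary search over ping sizes. This is a sketch under stated assumptions: the function names are hypothetical, the ping flags are those of Linux iputils, the search assumes the LOW size itself passes, and the variables are global because POSIX sh has no `local`.

```shell
# icmp_payload_for_mtu MTU: largest `ping -s` value that fits in one
# unfragmented IPv4 packet (20-byte IP header + 8-byte ICMP header)
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# probe SIZE HOST: succeeds if a DF-flagged ping with SIZE payload bytes is answered
probe() {
  ping -M do -c 1 -W 2 -s "$1" "$2" >/dev/null 2>&1
}

# find_max_payload HOST [LOW] [HIGH]: binary search for the largest payload that passes
find_max_payload() {
  host=$1; lo=${2:-0}; hi=${3:-8972}
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))
    if probe "$mid" "$host"; then
      lo=$mid                 # mid passes: the answer is mid or larger
    else
      hi=$(( mid - 1 ))       # mid fails: the answer is smaller than mid
    fi
  done
  echo "$lo"
}

# Usage on a live host:
#   find_max_payload <remote_host>
```

Adding 28 to the reported payload gives an estimate of the path MTU. A result of 1472 across datacenters alongside 8972 within the local datacenter would match the mismatch described in this document.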

3. Capture Network Traffic

# Capture encrypted connection attempt
sudo tcpdump -i any -nn host <remote_host> and port 7001 -w /tmp/ssl-test.pcap

# Analyze capture
tcpdump -tttt -nn -r /tmp/ssl-test.pcap

# Look for "IP11 (invalid)" or dropped packets
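
It can also help to check whether ICMP "Fragmentation Needed" replies (type 3, code 4) reach the sending host at all, since Path MTU Discovery depends on them. The sketch below only builds a capture filter; `frag_needed_filter` is a hypothetical helper name, and the filter uses standard pcap filter syntax.

```shell
# frag_needed_filter HOST: print a pcap filter matching ICMP
# "Destination Unreachable / Fragmentation Needed" packets involving HOST
frag_needed_filter() {
  printf 'host %s and icmp[icmptype] = icmp-unreach and icmp[icmpcode] = 4\n' "$1"
}

# Usage on the sending host (requires root):
#   sudo tcpdump -i any -nn "$(frag_needed_filter <remote_host>)"
```

If no such packets appear while large DF-flagged packets are being dropped, the ICMP messages are probably being filtered somewhere on the path.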

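4. Test SSH with a Restricted Key Exchange

# Restrict the client to a single KEX algorithm, which shortens its key exchange proposal packet
ssh -o KexAlgorithms=ecdh-sha2-nistp521 user@remote_host

# If this works while the default settings hang, packet size is the likely cause, which points to an MTU issue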

5. Verify VMware MTU Settings

# On VMware host, check virtual switch MTU
esxcli network vswitch standard list

# Check physical NIC MTU
esxcli network nic list

Resolving The Problem

Option 1: Reduce MTU to 1500 on Physical Hosts

On all DSE hosts with MTU 9000:

# Temporary change (lost on reboot)
sudo ip link set <interface> mtu 1500

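# Permanent change - RHEL/CentOS 7/8/9
# (assumes the connection is defined in an ifcfg file; adjust if your system uses NetworkManager keyfiles)
sudo vi /etc/sysconfig/network-scripts/ifcfg-<interface>
# Add or modify:
MTU=1500

# Restart network
sudo systemctl restart NetworkManager
# or
sudo nmcli connection reload
sudo nmcli connection down <connection> && sudo nmcli connection up <connection>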

# Verify
ip link show <interface> | grep mtu

For Ubuntu/Debian (netplan):

# Edit netplan configuration (the file name under /etc/netplan/ varies by system)
sudo vi /etc/netplan/01-netcfg.yaml

# Add MTU setting:
network:
  version: 2
  ethernets:
    ens192:            # replace with your interface name
      mtu: 1500
      # ... other settings

# Apply
sudo netplan apply

 

Option 2: Increase VMware MTU to 9000 (If Infrastructure Supports Jumbo Frames)

Requirements:

  • All switches and routers in path must support jumbo frames
  • Physical NICs must support MTU 9000
  • May require network team coordination

On VMware ESXi:

# Set vSwitch MTU (replace vSwitch0 with your vSwitch name)
esxcli network vswitch standard set -v vSwitch0 -m 9000

# Set VMkernel interface MTU (replace vmk0 with your VMkernel interface)
esxcli network ip interface set -i vmk0 -m 9000

# Verify
esxcli network vswitch standard list

On Virtual Machines:

  • Follow same steps as Option 1, but set MTU to 9000
  • Reboot VM or restart network service

 

Option 3: Configure Path MTU Discovery (Workaround)

# Enable PMTU discovery on Linux hosts (0 enables it and is the kernel default)
sudo sysctl -w net.ipv4.ip_no_pmtu_disc=0

# Make permanent
echo "net.ipv4.ip_no_pmtu_disc=0" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Note: This is a workaround and may not fully resolve the issue if intermediate devices block or do not generate ICMP "Fragmentation Needed" messages.
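
Before changing the setting, it may be worth checking its current value, because 0 is already the default on most Linux systems. A minimal sketch follows; `pmtud_status` is a hypothetical helper name, and the optional file argument exists only so the function can be exercised without touching the live kernel value.

```shell
# pmtud_status [FILE]: interpret the ip_no_pmtu_disc setting
# FILE defaults to the live kernel value under /proc
pmtud_status() {
  file=${1:-/proc/sys/net/ipv4/ip_no_pmtu_disc}
  value=$(cat "$file") || return 1
  if [ "$value" = "0" ]; then
    echo "PMTU discovery enabled"
  else
    echo "PMTU discovery disabled or restricted (ip_no_pmtu_disc=$value)"
  fi
}

# Usage on a live host:
#   pmtud_status
```

If discovery is already enabled and connections still hang, the ICMP messages are likely being lost on the path, and Option 1 or Option 2 is the more reliable fix.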

Verification

1. Test DSE Internode Encryption

# Enable encryption in cassandra.yaml
server_encryption_options:
    internode_encryption: all

# Restart DSE
sudo systemctl restart dse

# Check logs for successful connections
grep -i "ssl\|handshake" /var/log/cassandra/system.log

# Verify with nodetool
nodetool status

2. Test SSH Connection

# SSH with verbose output
ssh -vvv user@remote_host

# Should complete without hanging at KEX_ECDH_REPLY

3. Test with OpenSSL

# Test SSL handshake
openssl s_client -connect <remote_host>:7001 -showcerts

# Should complete handshake and show certificate chain
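
Because the failure mode is a hang rather than an error, wrapping the checks in a deadline makes the result unambiguous. This is a sketch, not a definitive test harness: `run_with_deadline` is a hypothetical helper name, and it relies on the coreutils `timeout` command, which exits with status 124 when the deadline is reached.

```shell
# run_with_deadline SECONDS CMD...: report whether CMD finished before the deadline
run_with_deadline() {
  secs=$1; shift
  timeout "$secs" "$@" </dev/null >/dev/null 2>&1
  rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "timed out"                 # the handshake most likely hung
  elif [ "$rc" -eq 0 ]; then
    echo "completed"
  else
    echo "failed (exit $rc)"
  fi
}

# Usage on a live host:
#   run_with_deadline 10 openssl s_client -connect <remote_host>:7001
#   run_with_deadline 10 ssh -o BatchMode=yes user@remote_host true
```

A "timed out" result after the MTU change suggests that oversized packets are still being dropped somewhere on the path.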

 

Prevention

  1. Standardize MTU across all datacenters
    • Document MTU requirements in infrastructure standards
    • Use consistent MTU (1500 or 9000) across physical and virtual environments
  2. Validate MTU during host provisioning
    • Add MTU check to deployment automation
    • Include in pre-production validation checklist
  3. Monitor MTU settings
    • Add MTU monitoring to infrastructure monitoring tools
    • Alert on MTU mismatches between connected hosts
  4. Document network topology
    • Maintain documentation of MTU settings per datacenter
    • Include in network diagrams and runbooks
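
The provisioning and monitoring checks above could start from something as small as the following sketch. `check_mtu` is a hypothetical helper name, and the input format of one "<interface> <mtu>" pair per line is an assumption made for illustration.

```shell
# check_mtu EXPECTED: read "<interface> <mtu>" lines on stdin,
# print every interface whose MTU differs from EXPECTED,
# and return non-zero if any mismatch was found
check_mtu() {
  awk -v want="$1" '
    $2 != want { printf "MISMATCH %s mtu=%s expected=%s\n", $1, $2, want; bad = 1 }
    END { exit bad }
  '
}

# Usage on a live host (the awk step extracts interface names and MTU values):
#   ip -o link show \
#     | awk '{sub(/:$/,"",$2); for(i=3;i<NF;i++) if($i=="mtu") print $2, $(i+1)}' \
#     | check_mtu 1500
```

The non-zero exit status lets deployment automation or a monitoring job fail the run when a host deviates from the documented standard. Interfaces that are expected to differ, such as the loopback interface, would need to be filtered out first.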

Document Location

Worldwide

Product Synonym

dse; tls; ssl; server_encryption; mtu

Document Information

Modified date:
11 May 2026

UID

ibm17272588