Hi Hendrik,
Nice find! Thank you for reporting it! Actually, our goal is to avoid being too clever about ICMP, and certainly not to filter it too much. One way to do that is to let it be handled by netfilter and the Linux kernel, which have all the proper rules implemented to sort out whether an ICMP packet is legitimate or not. No need to try to mimic that behaviour in firewall rules.
So, what went wrong here? Because, obviously something went wrong, as you noticed.
Some background info:
For the public-facing HTTPS traffic, we use LVS (http://linuxvirtualserver.org/) to distribute incoming traffic for IP addresses to a bunch of web servers running Nginx. The load balancing method used is "Direct Return". The load balancer system receives traffic for the IP addresses that your computer is talking to (currently 2001:888:2177:1::f3 for packages.mendix.com). After receiving it, it changes the destination MAC address in the packet to that of one of the real nginx servers and throws it back on the wire, so the chosen real web server will receive it and process it. The nginx servers have the load balancer IP addresses configured on a dummy interface that does not participate in answering ARP requests from the network. Return traffic from nginx is sent back to your computer directly instead of hitting the load balancer again.
When there's a link between you and our servers with an MTU lower than 1500, the first packet that will likely hit the limit of that link is a return packet from the web server with TLS certificates in it, which should provide you with enough information to verify the identity of the server. The router at the ISP side of your tunnel drops the packet and generates an ICMPv6 Packet Too Big message to tell us that the outgoing packet size should be lowered.
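To make the size problem concrete, here is a small sketch of the arithmetic. The 1480-byte tunnel MTU is only an assumed example value, and the helper names are made up for illustration; the 40-byte IPv6 header and the 20-byte TCP header (without options) are the standard fixed sizes.

```python
IPV6_HEADER = 40  # fixed IPv6 header size in bytes
TCP_HEADER = 20   # TCP header size in bytes, assuming no TCP options


def tcp_mss_ipv6(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv6 packet for the given MTU."""
    return mtu - IPV6_HEADER - TCP_HEADER


def fits(packet_size: int, link_mtu: int) -> bool:
    """True if a packet of this size can cross the link without being dropped."""
    return packet_size <= link_mtu


ETHERNET_MTU = 1500
TUNNEL_MTU = 1480  # assumption: example MTU of a tunnel link on the path

# A full-sized segment as the web server would send it on its own Ethernet link.
full_segment = tcp_mss_ipv6(ETHERNET_MTU) + IPV6_HEADER + TCP_HEADER

if __name__ == "__main__":
    print("MSS on Ethernet:", tcp_mss_ipv6(ETHERNET_MTU))
    print("MSS through the tunnel:", tcp_mss_ipv6(TUNNEL_MTU))
    print("Full segment fits the tunnel:", fits(full_segment, TUNNEL_MTU))
```

Small packets such as the TCP handshake pass through such a link without trouble, which is why the problem only shows up once the server sends its first full-sized segment.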
However, this ICMP packet is sent to the IP address that the oversized packet has as its source. This means that the nginx web server transmits the packet to you directly, but the ICMP error ends up at the load balancer instead. Luckily, this should not be a problem, because the LVS code can figure out that an incoming ICMPv6 Packet Too Big message probably belongs to a connection tracked in LVS (which is actually quite amazing) and then forwards the ICMP packet to the nginx server that is currently handling the HTTPS connection. (This functionality was added years ago, search for "Add handling of incoming ICMPV6_PKT_TOOBIG messages" or commit 94b265514). The nginx server technically has no idea about the load balancer in between, thinks it got the ICMP message directly and processes it properly.
So it must be something else...
Since the load balancer host is only seeing half of the packets for each connection, it makes no sense to have connection tracking enabled on those connections in the Linux kernel. Connections will never get to an ESTABLISHED state. For this reason, we turned off connection tracking for incoming traffic on the load balancers. Because the pattern of traffic is really simple, a NOTRACK rule and an INPUT rule are all we need for the load balancer. Until now, it looked like this (only relevant rules):
*raw
:PREROUTING ACCEPT [0:0]
-A PREROUTING -p tcp -m multiport --dport 80,443 -j NOTRACK
*filter
:INPUT DROP [0:0]
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A INPUT -m conntrack --ctstate INVALID -j DROP
-A INPUT -p tcp -m multiport --dport 80,443 -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
And that's where it goes wrong. Packets on port 80 and 443 will be let through, and are handled by the ipvs code, picked up from the INPUT flow and retransmitted to the real web servers. But when an ICMPv6 Packet Too Big message comes in that is related to a load balanced HTTPS connection, it's stopped when it's examined by the INVALID rule. The connection tracking module won't be able to find a matching connection which could have caused this ICMP message, marks it as INVALID and the rule drops it, so the packet never reaches the -p ipv6-icmp -j ACCEPT. :-(
Right now, the rules look like this...
*raw
:PREROUTING ACCEPT [0:0]
-A PREROUTING -p tcp -m multiport --dport 80,443 -j NOTRACK
*filter
:INPUT DROP [0:0]
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmpv6 --icmpv6-type destination-unreachable -j ACCEPT
-A INPUT -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT
-A INPUT -p icmpv6 --icmpv6-type time-exceeded -j ACCEPT
-A INPUT -p icmpv6 --icmpv6-type parameter-problem -j ACCEPT
-A INPUT -m conntrack --ctstate INVALID -j DROP
-A INPUT -p tcp -m multiport --dport 80,443 -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
...which is already better.
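The difference between the two rule sets comes down to first-match ordering, which can be sketched with a toy model. This is not how netfilter is implemented; the Packet fields and helper names are invented for illustration, and the assumption that conntrack classifies the Packet Too Big message as INVALID is taken from the explanation above.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    proto: str           # "tcp" or "icmpv6"
    ctstate: str         # state assigned by conntrack, e.g. "INVALID", "UNTRACKED"
    dport: int = 0       # destination port (TCP only)
    icmp_type: str = ""  # ICMPv6 type name (ICMPv6 only)


ICMP6_ERRORS = {"destination-unreachable", "packet-too-big",
                "time-exceeded", "parameter-problem"}


def established(p): return p.ctstate in ("ESTABLISHED", "RELATED")
def invalid(p): return p.ctstate == "INVALID"
def web(p): return p.proto == "tcp" and p.dport in (80, 443)
def any_icmp6(p): return p.proto == "icmpv6"
def icmp6_error(p): return p.proto == "icmpv6" and p.icmp_type in ICMP6_ERRORS


def evaluate(rules, packet, policy="DROP"):
    """Return the verdict of the first matching rule, else the chain policy."""
    for matches, verdict in rules:
        if matches(packet):
            return verdict
    return policy


# INPUT chain before the fix: the INVALID rule comes before any ICMPv6 rule.
old_rules = [(established, "ACCEPT"), (invalid, "DROP"),
             (web, "ACCEPT"), (any_icmp6, "ACCEPT")]

# INPUT chain after the fix: ICMPv6 error types are accepted before INVALID.
new_rules = [(established, "ACCEPT"), (icmp6_error, "ACCEPT"),
             (invalid, "DROP"), (web, "ACCEPT"), (any_icmp6, "ACCEPT")]

# Assumption from the text: conntrack cannot match the error to a connection.
too_big = Packet(proto="icmpv6", ctstate="INVALID", icmp_type="packet-too-big")
https_syn = Packet(proto="tcp", ctstate="UNTRACKED", dport=443)

if __name__ == "__main__":
    print("old rules, packet too big:", evaluate(old_rules, too_big))
    print("new rules, packet too big:", evaluate(new_rules, too_big))
    print("new rules, https traffic: ", evaluate(new_rules, https_syn))
```

In this model the untracked web traffic is accepted by both rule sets, so only the handling of the ICMPv6 error changes.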
The most probable reason why I didn't catch this error earlier is that I usually consider PMTU discovery a bit broken on the web by default and put MSS clamping rules (http://lartc.org/howto/lartc.cookbook.mtu-mss.html) into routers (like at home with PPP over DSL, or at branch offices which use HE IPv6 tunnels if there's no native IPv6 available). And of course, because we didn't explicitly test this specific scenario. :|
When you're near our office in Rotterdam, don't forget to drop by for a free beer and/or a nice discussion. :)
Hans
Hi,
I believe the above issue could also be the cause of the reachability issues of mijn.nle.nl (and bms.baminfratechniek.nl) that were recently discussed on the Dutch IPv6 Task Force mailing list. Could you please confirm (on the IPv6 TF mailing list) that the issues with these websites are fixed now?
Please see the mails on the IPv6 mailing list with subject "mijn.nle.nl over IPv6" on https://lists.ams-ix.net/mailman/listinfo/ipv6-tf
See also the Twitter discussion on https://twitter.com/internetnl/status/639400994057678849 and https://twitter.com/Orwell84/status/642287457615134721
Cheers,
Baknu
Internet.nl