9

The situation is as follows:

http client ----> corporate firewall ----> http server

Because HTTP keep-alive is enabled, the server and client keep TCP connections open, and the client uses a connection pool for its HTTP requests.

The firewall has a rule to "kill" long-standing TCP connections after 1 hour. The problem is that our HTTP client did not detect that a TCP connection had been destroyed and tried to reuse connections that were effectively dead. From our side, it looked as if the client "hung" after a period of time: a request would hang, then the next one would work, presumably because a new connection had been established.

The question is: what mechanism does the firewall use to kill TCP connections such that our HTTP client is unable to detect it? I tried to reproduce this behavior locally in a few ways:

  1. Killed the TCP connection on our VyOS router; Wireshark on the client side captured a TCP FIN-ACK. OK
  2. Killed the TCP connection on the client side with TCPView on Windows; Wireshark detected a TCP RST on the client side. OK
  3. Blocked the port on the client-side firewall after the connection was established; this resulted in a socket reset exception. OK

I have a Wireshark dump from the server side and I tried to find out whether the firewall sends a FIN or RST with ip.dst==serverip && (tcp.flags.reset==1 || tcp.flags.fin==1), but nothing showed up.

Additionally, the Wireshark capture on the client side shows the problem as an HTTP request going out, followed by a dozen TCP retransmissions that ultimately go nowhere.

The HTTP client is either the native Java client or the Jetty HTTP client (I tried both); both failed to detect the dead TCP connection. I'd like to reproduce the behavior locally, but I am unable to figure out in what dodgy way the firewall is killing the connections, so I'm looking for possible answers.

Kindle Q
cen
  • Simply drop the packet in the firewall, i.e. do not forward it to the destination and do not send any FIN, RST etc. – Steffen Ullrich Nov 28 '17 at 11:37
  • You should edit your question to include the firewall model and configuration (obfuscate any passwords and public addresses). – Ron Maupin Nov 28 '17 at 17:27
  • "Firewall has a rule to 'kill' long standing TCP connections after 1 hour." That sounds like one seriously bizarre firewall rule. – reirab Nov 28 '17 at 22:06
  • @RonMaupin I would already if I knew it or had access – cen Nov 28 '17 at 23:01
  • Unfortunately, questions about networks over which you have no direct control are explicitly off-topic here. – Ron Maupin Nov 28 '17 at 23:15
  • Truly my bad, completely missed that. I knew I should have stuck to serverfault.. still grateful for all the help either way. – cen Nov 28 '17 at 23:45

4 Answers

11

You don't mention the kind of firewall, but I suspect most simply drop the packets.

I have a Wireshark dump from the server side and I tried to find out whether the firewall sends a FIN or RST with ip.dst==serverip && (tcp.flags.reset==1 || tcp.flags.fin==1), but nothing showed up.

Which would tend to confirm this.

Ron Trunk
  • Unfortunately I don't know anything about the firewall since it's on an internal network. Any suggestions on how I could reproduce this locally (preferably on the client side)? – cen Nov 28 '17 at 12:29
  • If your goal is to create a test bed, you could run a software firewall on a PC, or buy an inexpensive hardware firewall. Blocking the connection would seem to be a simple thing to do. – Ron Trunk Nov 28 '17 at 14:12
6

Most likely the firewall just dropped the packet without sending an RST packet, probably after hitting a session timeout value of some sort. This is typically configurable behaviour.
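As a rough illustration of that behaviour, here is a toy Python model of a session table that forgets a connection after a fixed lifetime and then silently drops its packets. The class name SessionTable, the handle method, and the example addresses are hypothetical, and whether a real firewall's timer is a maximum lifetime or an idle timeout depends on the product and its configuration.

```python
class SessionTable:
    """Toy model of a stateful firewall that silently forgets old TCP sessions."""

    def __init__(self, max_lifetime_seconds):
        self._max_lifetime = max_lifetime_seconds
        self._sessions = {}  # 4-tuple -> time the session was first seen

    def handle(self, four_tuple, is_syn, now):
        """Return "forward" or "drop" for one packet; a drop sends nothing back."""
        started = self._sessions.get(four_tuple)
        if started is not None and now - started >= self._max_lifetime:
            del self._sessions[four_tuple]    # expire: no FIN, no RST, just forget
            started = None
        if started is None:
            if not is_syn:
                return "drop"                 # mid-stream packet of an unknown session
            self._sessions[four_tuple] = now  # a new handshake creates a fresh entry
        return "forward"


fw = SessionTable(max_lifetime_seconds=3600)
old = ("10.0.0.5", 40000, "192.0.2.10", 80)   # hypothetical client/server 4-tuple
fw.handle(old, is_syn=True, now=0)            # handshake is forwarded
print(fw.handle(old, is_syn=False, now=3600)) # request on the old connection
```

In this model the request on the hour-old connection is dropped without any reply, while a SYN on a fresh 4-tuple would be forwarded, which matches the "one request hangs, the next one works" pattern from the question.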

I personally prefer having that RST packet sent precisely because it helps clients behave normally, but I have heard arguments to the effect that this should not be done on externally-facing firewalls to avoid providing any kind of feedback to potential attackers.

I have seen this cause quite a few issues because clients typically don't handle this kind of scenario very elegantly. Essentially, they keep retrying through the original TCP session (which is now dead) and never try to re-establish a new one. Eventually a client-side timeout triggers and the user gets a nasty error message. Setting up HTTP keepalive appropriately for the app can help to fix this.
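A related client-side mitigation is to stop reusing pooled connections before the firewall is likely to have forgotten them. The sketch below is a minimal Python illustration, assuming the cutoff is roughly one hour; AgeLimitedPool, its methods, and the connect factory are hypothetical names and not the API of any particular HTTP client. Many HTTP client libraries offer comparable lifetime or idle-timeout settings, so check the documentation of the one in use.

```python
import time
from collections import deque


class AgeLimitedPool:
    """Toy connection pool that refuses to reuse connections past a maximum age."""

    def __init__(self, connect, max_age_seconds, clock=time.monotonic):
        self._connect = connect          # hypothetical factory that opens a new connection
        self._max_age = max_age_seconds  # keep this below the firewall's cutoff
        self._clock = clock
        self._idle = deque()             # (created_at, connection) pairs

    def acquire(self):
        """Return (created_at, connection), discarding pooled entries that are too old."""
        now = self._clock()
        while self._idle:
            created_at, conn = self._idle.popleft()
            if now - created_at < self._max_age:
                return created_at, conn
            close = getattr(conn, "close", None)
            if close is not None:
                close()                  # too old: the firewall may already have dropped it
        return now, self._connect()

    def release(self, created_at, conn):
        """Hand a connection back, remembering when it was first opened."""
        self._idle.append((created_at, conn))
```

With max_age_seconds set to, say, 3000, a connection opened 50 minutes ago is closed and replaced instead of being reused across the one-hour mark. If the firewall actually applies an idle timeout rather than a maximum lifetime, the same idea applies with the age measured from the last use.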

Jeremy Gibbons
  • Sending a FIN or RST would require that the firewall implementation keep track of the sequence numbers on the connection (because it needs to fill in that data in the FIN/RST packet). In contrast, a "just drop it" policy would mean that the firewall implementation just needs to store the 4-tuple and kill it when the 1 hour time is up. – mere3ortal Nov 28 '17 at 16:14
  • I can understand the reasoning for external network, but for internal, this seems downright evil. – cen Nov 28 '17 at 16:23
  • It's evil to do this statefully. Only drop altogether if nothing should be listening on that port for that IP range. – Joshua Nov 28 '17 at 18:19
  • @Joshua, I entirely agree that it's evil, precisely because I've had to fix the mess this caused. Still, it does happen with sufficiently paranoid SecOps teams... – Jeremy Gibbons Nov 30 '17 at 02:42
4

@Ron Trunk is exactly correct: almost certainly the open connection is being dropped either actively (a deny rule is inserted) or passively (the connection is removed from the table of known connections and is not allowed to be recreated without a SYN). One of the comments suggested trying it out yourself. Here is a recipe for doing so using Linux network namespaces. It assumes that IP forwarding is enabled in your host's kernel, that you are root, and probably other things.

# Create network namespaces
ip netns add one; ip netns add two; ip netns add three
# Create interfaces between namespaces
ip link add dev i12 type veth peer name i21
ip link add dev i32 type veth peer name i23
# Bring interfaces up and assign them to respective namespaces
ip link set dev i12 netns one up
ip link set dev i21 netns two up
ip link set dev i32 netns three up
ip link set dev i23 netns two up
# Assign IP addresses
ip netns exec one ip addr add 1.1.1.1/24 dev i12
ip netns exec two ip addr add 1.1.1.2/24 dev i21
ip netns exec three ip addr add 3.3.3.3/24 dev i32
ip netns exec two ip addr add 3.3.3.2/24 dev i23
# Add routes when necessary
ip netns exec one ip route add default via 1.1.1.2
ip netns exec three ip route add default via 3.3.3.2

You then need three windows/shells/screens/terminals. Run each command below in a distinct terminal:

  • Start listening on server: ip netns exec three socat TCP-LISTEN:5001 STDIO
  • Start transmitting on client: ip netns exec one socat STDIO TCP:3.3.3.3:5001

Note that after running these commands, everything you type in one window will be reflected in the other, and vice versa (after hitting return). If that isn't true, you might need to enable IP forwarding.

  • Instantiate the deny rule: ip netns exec two iptables -I FORWARD -j DROP

Then nothing you type will be allowed through.

You can simulate a less active drop method with (untested) forward rules like:

ip netns exec two iptables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
ip netns exec two iptables -A FORWARD -p tcp -m tcp --dport 5001 --tcp-flags ALL SYN -j ACCEPT
ip netns exec two iptables -A FORWARD -j DROP

See https://unix.stackexchange.com/questions/127081/conntrack-tcp-timeout-for-state-stablished-not-working and https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.txt for information on how to adjust the timeouts--though it isn't clear to me that iptables natively supports a maximum connection lifetime; I believe all timeouts are idle timeouts.

Clean up with ip netns del one; ip netns del two; ip netns del three
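Where root access or network namespaces are not available, the application-level symptom can be approximated in user space. The Python sketch below is an assumption-laden stand-in, not an equivalent of the recipe above: a small TCP relay that forwards traffic for a while and then silently discards it without closing either socket. The names pump and serve_blackhole_relay are hypothetical. One important difference is that the client's kernel still receives ACKs from the relay, so a capture will not show the TCP retransmissions seen with a real packet drop; only the "request hangs with no FIN or RST" behaviour is reproduced.

```python
import socket
import threading
import time


def pump(src, dst, cutoff):
    """Copy src to dst until `cutoff` (monotonic time), then silently discard."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            if time.monotonic() < cutoff:
                dst.sendall(data)
            # after the cutoff the data is swallowed; nothing is closed or reset
    except OSError:
        pass


def serve_blackhole_relay(listener, upstream_addr, lifetime_seconds):
    """Accept clients on `listener` and relay each one to `upstream_addr`."""
    while True:
        try:
            client, _ = listener.accept()
        except OSError:
            return  # listener was closed
        upstream = socket.create_connection(upstream_addr)
        cutoff = time.monotonic() + lifetime_seconds
        for a, b in ((client, upstream), (upstream, client)):
            threading.Thread(target=pump, args=(a, b, cutoff), daemon=True).start()
```

To use it, create a listening socket (for example with socket.create_server), run serve_blackhole_relay in a thread pointing at the real server, and direct the HTTP client at the relay's address with a short lifetime_seconds so the cutoff is reached quickly.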

  • Host/server/VM configurations are off-topic here. – Ron Maupin Nov 28 '17 at 19:58
  • @Ron Maupin: But is creating a network testbed to test a network engineering theory off topic? – Seth Robertson Nov 28 '17 at 20:11
  • Some things, like configuring the network devices (routers, switches, etc.) or using something like Packet Tracer or GNS3 to mock something up, are on-topic. Configuring hosts/servers/VMs is off-topic here. How those, and their OSes, work is not part of the network. The OP needs to include the firewall model and configuration so we can see if we can help. – Ron Maupin Nov 28 '17 at 20:19
1

The firewall can send an ICMP packet indicating that the target was unreachable. For anything but TCP, that is the only possible error indication. For example, sending a packet to a closed UDP port will generate a "destination unreachable" message with the reason code set to "port unreachable".

It is also possible to send "port unreachable" messages in response to TCP packets, which can also terminate the connection. However, anyone analyzing packet dumps will notice that this is unusual, because the TCP convention is to indicate closed ports with a RST.

The sender is expected to map any ICMP error packets received back to the originating connection and handle them appropriately, so a firewall generated error packet can also be used to terminate a TCP connection. The ICMP packet contains a copy of the headers of the offending packet to allow this mapping.
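To make that mapping concrete, here is a minimal Python sketch that pulls the connection's 4-tuple out of an ICMPv4 Destination Unreachable message. In practice the operating system's TCP stack does this itself; the function name connection_from_icmp_error is hypothetical and the code only illustrates the packet layout (8-byte ICMP header, then the quoted IP header, then the first bytes of the quoted TCP header, which hold the ports).

```python
import socket
import struct


def connection_from_icmp_error(icmp_message):
    """Extract (src_ip, src_port, dst_ip, dst_port) of the offending TCP packet.

    `icmp_message` starts at the ICMP header. Returns None unless it is a
    Destination Unreachable (type 3) error that quotes an IPv4 TCP packet.
    """
    if len(icmp_message) < 8 + 20 + 4:
        return None
    if icmp_message[0] != 3:                  # ICMP type 3 = Destination Unreachable
        return None
    quoted = icmp_message[8:]                 # quoted IP header + start of TCP header
    version = quoted[0] >> 4
    ihl = (quoted[0] & 0x0F) * 4              # IP header length in bytes
    if version != 4 or ihl < 20 or len(quoted) < ihl + 4:
        return None
    if quoted[9] != socket.IPPROTO_TCP:       # protocol field of the quoted header
        return None
    src_ip = socket.inet_ntoa(quoted[12:16])
    dst_ip = socket.inet_ntoa(quoted[16:20])
    src_port, dst_port = struct.unpack("!HH", quoted[ihl:ihl + 4])
    return src_ip, src_port, dst_ip, dst_port
```

The returned tuple is what a receiver would look up in its table of open connections to decide which socket the error belongs to.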

Simon Richter