warning: Clock skew detected. Your sanity may be incomplete.

Published July 9th, 2009 by Barney Desmond

I’m a sysadmin, so staying up late most nights is kind of a personality trait. As a consequence, I sometimes don’t know what day it is, but I’ve usually got a handle on the month and year. Here at the Hosting company for Creative Anachronisms we like to think we can deal with just about any oddball request customers throw at us, but we have limits (asymptotic at times).

  • One customer, when answering our technical spec form for a new dedicated server, requested that we install a telnet daemon (locking it down of course, for security y’know)
  • Another recent request asked for IP-based vhosting – customer is presumably migrating their app from an old Netscape iPlanet webserver or something equally horrid, they wouldn’t say
  • A contract we’ve been chasing recently saw their tech guy raise the concern that Ping Of Death might cause problems if we allow ICMP through the firewall… we’re going to try to sell them our ping of death protection service, maybe see if they’re interested in teardrop as well

Firewalling VMware ESX for console access

Published February 23rd, 2009 by Barney Desmond

One of Anchor’s more recent product offerings is VMware-based virtual private servers. As one of my colleagues has already detailed, we take extra measures to secure the VMware host server to reduce the possibility of a compromise.

Our VPS offering uses VMware ESX, which runs on bare metal and doesn’t have a host operating system. This isn’t the full story: according to the documentation it boots a Red Hat Enterprise Linux 3 system, then loads the vmkernel, which is where the real work is done. One of the nice things about this approach is that there’s a userspace environment in which to run support software, like good monitoring components.

We ran into an odd problem recently with an ESX host server on a dedicated network segment, namely that we couldn’t view the console for VM guests. Nothing would happen for about 30 seconds, then the VMware Infrastructure Client (VIC) would report a connection failure.

Most people using VMware now have probably used the vanilla VMware Server once or twice. It’s pretty easy to understand, and because it runs on top of your usual OS, firewalling it is as simple as opening holes for legitimate clients to connect to TCP port 902. That’s not the case with VMware ESX, and without reading the manuals it’s not immediately obvious what the problem is.

As it turns out, the VIC is attempting to connect to TCP port 903 on the host server. If you assume that the VMware hypervisor is just a modified Linux system (which is what it looks like, but isn’t quite), you’d expect to see a listening port in the output of netstat -tnlp, but you can’t.

[root@miyuki root]# netstat -tnlp
Proto Local Address       Foreign Address     State       PID/Program name
tcp   0.0.0.0:5666        0.0.0.0:*           LISTEN      2684/nrpe
tcp   127.0.0.1:32771     0.0.0.0:*           LISTEN      1807/cimserver
tcp   0.0.0.0:5988        0.0.0.0:*           LISTEN      1807/cimserver
tcp   0.0.0.0:5989        0.0.0.0:*           LISTEN      1807/cimserver
tcp   127.0.0.1:8005      0.0.0.0:*           LISTEN      1652/webAccess
tcp   0.0.0.0:902         0.0.0.0:*           LISTEN      1598/xinetd
tcp   0.0.0.0:199         0.0.0.0:*           LISTEN      1489/snmpd
tcp   0.0.0.0:8009        0.0.0.0:*           LISTEN      1652/webAccess
tcp   0.0.0.0:80          0.0.0.0:*           LISTEN      1765/vmware-hostd
tcp   0.0.0.0:8080        0.0.0.0:*           LISTEN      1652/webAccess
tcp   0.0.0.0:22          0.0.0.0:*           LISTEN      1498/sshd
tcp   127.0.0.1:8889      0.0.0.0:*           LISTEN      1839/openwsmand
tcp   0.0.0.0:2265        0.0.0.0:*           LISTEN      1732/osirisd
tcp   0.0.0.0:443         0.0.0.0:*           LISTEN      1765/vmware-hostd

As a little more reading revealed, the VIC makes an extra connection to port 903 for console data. This marks a significant change from the earlier model of passing everything through the same connection, and the reason for such a change is unclear.

We’ll assume there’s some performance benefit to be had there. What we find more interesting/important is that port 903 is entirely “under the radar”, as it’s implemented in vmkernel. The other traffic on the box is subject to our standard iptables rules as far as we can tell, including port 902, which is used for a lot of other management-client to host-server interaction.

tcpdump is also oblivious to port 903, so we’re guessing VMware passes traffic through the netfilter stack when it’s deemed to be convenient or necessary. If we had a spare ESX host sitting around doing nothing, I’d be interested to see if the packet counters shown in the output of ifconfig are also affected.
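
With netstat and tcpdump on the service console both blind to port 903, a rough way to check it is from the client side. The sketch below is illustrative only: the probe function and its three-second timeout are assumptions, not part of any VMware tooling. It attempts a plain TCP connection and reports whether the port answered, refused, or stayed silent.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP port as seen from the client (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # three-way handshake completed
    except ConnectionRefusedError:
        return "refused"           # a TCP RST came back: nothing listening
    except socket.timeout:
        return "filtered"          # no reply at all: packets probably dropped
    except OSError:
        return "unreachable"       # routing or ICMP error, name lookup failure, etc.
```

Run from the machine hosting the VIC against the ESX host's address, a result of "open" for 902 alongside "filtered" for 903 would point at a firewall in between silently dropping the console connection.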


Firewall Hero III: Legends of Packet Filtering

Published October 27th, 2008 by Barney Desmond

Customer support can be fantastically rewarding sometimes. When your combination of skill, tenacity and knowledge produces a solution that’s straightforward and effective, the feeling of satisfaction is hard to match. It doesn’t even have to be something big; we’ll take those small victories gladly. Our work sometimes looks a bit like magic, and we don’t mind one bit.

A few weeks ago one of our customers reported problems with streaming media from their server. Clients were taking about 20-30sec to connect, which was of course unacceptable, and they were suspecting something was wrong on our side, perhaps congestion or some over-zealous border firewall. The redundant connectivity we purchase is well above requirements even in the face of failure, and we don’t oversell bandwidth, so the former wasn’t a possibility. We use Linux’s netfilter framework for firewalling and routing (some people are really surprised when they learn we’re not using an enterprise-y appliance box) so the latter wasn’t likely. Thanks, Cisco, but we know plenty about security; we don’t need you to silently break our SMTP transactions for us in the name of “security”.

The server had been set up to listen on port 443 (to get around annoying corporate firewalls and the like) instead of the defaults in the config file, which looked to be ports 8083 and 33840. This shouldn’t be a problem, but I checked the ports the server was listening on with netstat and confirmed that the firewall was letting traffic through (the media server is a Java app, by the way).

netstat -tunlp | grep java

The customer had been in contact with the vendor of the media streaming server software for a little while now, and they were adamant it was a problem with our network.

One of the most useful tools in our arsenal for diagnosing network issues is tcpdump. With a rule-specification syntax that’s probably rich enough to solve a three-body problem, it’s very easy to drill down and find what you need in the flow of information. In this case I got the customer’s IP address and asked them to attempt a few connections, using a filter something like this:

tcpdump -i any host 3.14.159.26 and not tcp port ssh

What I immediately saw was a wave of connection attempts to port 1935. “But there’s nothing listening there”, I thought. Puzzled, I dropped the firewall and asked them to try again; it worked immediately, and they were quite happy to leave the firewall off at that point. We didn’t want to do this of course, so I asked them to persevere for a bit.

After raising the firewall again I asked the customer to connect again and watched the packets intently. Requests were arriving on 1935, being dropped by the firewall, then retried, which is consistent with exponential backoff behaviour. Almost exactly 20sec later, a connection attempt arrived on port 443 and the customer commented that it had finally connected.
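
The timing fits a simple doubling schedule. As an illustrative sketch only, assume an initial retransmission timeout of 3 seconds and two retries. Both values are assumptions typical of client TCP stacks of the period, not numbers measured in this capture. The arithmetic then works out as follows.

```python
def syn_schedule(initial_rto=3.0, retries=2):
    """Return (send_times, give_up_time) for a SYN retried with a doubling timeout.

    initial_rto and retries are assumed values, not taken from the capture.
    """
    send_times = [0.0]   # the first SYN goes out immediately
    rto = initial_rto
    t = 0.0
    for _ in range(retries):
        t += rto         # wait out the current timeout, then resend
        send_times.append(t)
        rto *= 2         # exponential backoff: double the wait each time
    return send_times, t + rto  # final wait expires, the attempt is abandoned


print(syn_schedule())  # ([0.0, 3.0, 9.0], 21.0)
```

Under those assumptions the client sends SYNs at 0, 3 and 9 seconds and gives up at about 21 seconds, which is in the same ballpark as the roughly 20 seconds observed.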

Aha! Everything suddenly clicked. The client was attempting a connection on 1935, which it turns out is a standard port used by the Flash content server. With nothing listening there, the standard procedure is to have the firewall drop all such packets and not bother replying. The client, assuming the possibility of a congested/lossy network, keeps retrying for 20 seconds, gives up, tries another port, and immediately succeeds. With the firewall down, the OS instead replies with a TCP RST to tell the client there’s nothing there, so it tries the alternate port straight away.
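
The difference between a dropped and a refused connection can be reproduced with a few lines of Python. This is a sketch of the fallback behaviour described above, not the actual Flash client's logic; the function name try_ports and the default timeout are assumptions for illustration.

```python
import socket
import time


def try_ports(host, ports, timeout=20.0):
    """Try each port in order; return (port, seconds_elapsed) for the first success.

    Returns (None, seconds_elapsed) if no port accepts the connection.
    """
    start = time.monotonic()
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return port, time.monotonic() - start
        except ConnectionRefusedError:
            # A TCP RST came back, so we know straight away that nothing
            # is listening and can move on to the next port.
            continue
        except OSError:
            # Covers timeouts: with packets silently dropped, we only get
            # here after the full timeout has elapsed.
            continue
    return None, time.monotonic() - start
```

Against a server where the first port is refused, the call returns almost instantly with the second port; if the first port is silently dropped, it only gets there after the full timeout.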

The solution, then, was simple: ensure that the client gets a TCP RST for connections to port 1935. This is similar to what’s done for port 113 (the ident protocol that’s familiar to IRC users), which can also delay connections to the server. Filtergen doesn’t appear to have a way to specify how to REJECT, so it just uses the default ICMP port-unreachable. This should be sufficient, but testing showed that the client wasn’t receiving those ICMP replies, so I just allowed the connections through in the end. The IP stack caught them and all was well. If you’re using raw iptables rules, something like this will do the job.

iptables -I INPUT -p tcp --dport 1935 -j REJECT --reject-with tcp-reset

All up this probably only took 10-15min on the phone with the customer. For them, after a number of hours of fruitless grappling with vendor tech support, that’s magic.
