Sophos UTM when running on clustered Hyper-V 2012 R2 (or why is Linux different?)

Originally my post was going to be about how painful it was working on a bug in the Sophos UTM virtual appliance. But after doing some digging I’ve realised I need to point the finger in a different direction!

What caused me pain was this:

Build a Highly Available VM using the latest UTM download from Sophos (4x network cards and shared SMB storage in our case).

Once it is built, go into the console (log in as loginuser and then su to root):
ifconfig | less

Note the eth network cards that are available
Live migrate the UTM to another host in the cluster
ifconfig | less

Note all eth network cards have now disappeared

Live migrate back to the initial host
ifconfig | less

Note eth network cards have not come back
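If you want something more repeatable than eyeballing ifconfig, the check above can be sketched in Python. This is only a rough sketch: it assumes Python is available on the box you run it on (I have not checked whether the UTM ships it), and the function names are my own invention. It reads /sys/class/net, which lists every interface the kernel knows about, whereas plain ifconfig (without -a) only shows interfaces that are up.

```python
from pathlib import Path


def snapshot_interfaces(sys_net="/sys/class/net"):
    """Return {interface name: MAC address} for every interface the kernel exposes.

    Hypothetical helper: reads the 'address' file under each entry in sys_net.
    """
    result = {}
    root = Path(sys_net)
    if not root.is_dir():
        return result
    for entry in sorted(root.iterdir()):
        addr_file = entry / "address"
        if addr_file.is_file():
            result[entry.name] = addr_file.read_text().strip().lower()
    return result


def diff_snapshots(before, after):
    """Compare two snapshots taken before and after a migration."""
    common = set(before) & set(after)
    return {
        # Interface names that existed before but are gone now
        "missing": sorted(set(before) - set(after)),
        # Interface names that have appeared since the first snapshot
        "added": sorted(set(after) - set(before)),
        # Names present in both snapshots whose MAC address differs
        "changed_mac": sorted(n for n in common if before[n] != after[n]),
    }


if __name__ == "__main__":
    # Run once before the migration and once after, then compare the output.
    for name, mac in snapshot_interfaces().items():
        print(name, mac)
```

Save the output before the move and again afterwards, and feed the two sets of name/MAC pairs to diff_snapshots to see what vanished, what appeared and what changed.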

The system will continue to run and can be moved around hosts with no problems. The issue will only manifest itself once the UTM gets rebooted (in my case for an Up2date firmware fix).


After building a new UTM VM I did some testing to replicate the problem:

First thing - let’s shut the VM down, move it to the other host and bring it up; maybe the bug is in the Live Migration.

Unfortunately, no, exactly the same problem, no eth cards.

A-ha, I thought, so let’s shut the VM down, move it back to the initial host and bring it up; maybe it’s a bug in the network card stack and the cards will magically come back.

No, cards still missing.

So at this point I was speaking to Donald at Sophos technical support (top guy, many thanks to him) and we had come to the conclusion it was a bug in the Linux kernel that the UTM uses.

He was going to add a bug number in and I was going to leave the UTM sitting on a single host with instructions to never move it!

And there the story might have ended, and I would have gone away thinking that I’d found a bug, but I started to do some Googling (well, Binging) and hit the answer……

Those of you who have been using Linux on Hyper-V will have seen the problem straight away.
(for this part please forgive my lack of Linux knowledge, this is how I see the problem – but I really don’t want to start a flame war!)

And it’s all down to how Linux binds its network cards to the MAC address of the virtual machine. Windows boxes will happily use any MAC address they are given on boot, whereas the first MAC address that is given to a Linux box gets hard coded into the config files and will always be used. This means that when the virtual appliance (because to me that’s what it is) migrates between hosts, the MAC address changes and Linux gets unhappy.
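For anyone who wants to check this on their own box, here is a rough Python sketch of the idea. It rests on some big assumptions: the post only says the MAC ends up "hard coded into the config files", and I am guessing that this means a udev persistent naming file such as /etc/udev/rules.d/70-persistent-net.rules, as used by a number of older distributions. I have not confirmed that the UTM uses that file, and the function names are made up for illustration.

```python
import re
from pathlib import Path

# Assumed location of the persistent naming rules; not confirmed for the UTM.
RULES_FILE = "/etc/udev/rules.d/70-persistent-net.rules"

MAC_RE = re.compile(r'ATTR\{address\}=="([0-9A-Fa-f:]{17})"')
NAME_RE = re.compile(r'\bNAME="([^"]+)"')


def parse_persistent_rules(text):
    """Return {interface name: MAC address} recorded in udev-style rules text."""
    bindings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        mac = MAC_RE.search(line)
        name = NAME_RE.search(line)
        if mac and name:
            bindings[name.group(1)] = mac.group(1).lower()
    return bindings


def find_stale_bindings(bindings, live_macs):
    """Return names whose recorded MAC is not on any NIC currently present."""
    live = {m.lower() for m in live_macs}
    return sorted(name for name, mac in bindings.items() if mac not in live)


def live_mac_addresses(sys_net="/sys/class/net"):
    """Collect the MAC addresses of all interfaces the kernel currently exposes."""
    macs = set()
    root = Path(sys_net)
    if root.is_dir():
        for entry in root.iterdir():
            addr = entry / "address"
            if addr.is_file():
                macs.add(addr.read_text().strip().lower())
    return macs


if __name__ == "__main__":
    rules = Path(RULES_FILE)
    if rules.is_file():
        recorded = parse_persistent_rules(rules.read_text())
        for name in find_stale_bindings(recorded, live_mac_addresses()):
            print(name, "is bound to a MAC address that is no longer present")
    else:
        print("No rules file found at", RULES_FILE)
```

If that guess about the file is right, any name the script prints is an eth name still pinned to a MAC address from the previous host, which would explain why the cards appear to vanish.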

I’m sure that there are some people who will read this and say, “surely you should have read up on the Microsoft documentation for running Linux in Hyper-V.”

I’ll counter with, “nope, I was running an appliance, not a Linux box, so I had not even thought of reading up on this, and don’t call me Shirley.”

My plea to vendors who say that they have an appliance that will run in your virtual environment is that they make it clear in their documentation that this could happen so other hapless “Windoze” geeks like myself are not caught out in this way in the future.

And me – I destroyed my VM (for the 5th time!) and rebuilt it – this time with the MAC address set to a static address (side note: if you select a static MAC address but do not type one in, Hyper-V will assign you one automatically) and live migrated my Linux VM Appliance to my heart’s content.......

……..or so I thought: setting a static MAC on our WAN interface stopped all traffic over the WAN, so with time against us we had to set the WAN back to dynamic. This means that if the UTM moves hosts, it’ll need a reboot and config changes to get the WAN interface back onto the virtual NIC……

The story will continue if I get an answer back from Sophos prior to me moving to my new job!

3 comments:

  1. Hi Tobie
    This issue seems not to be fixed yet. Did you ever get an answer from Sophos?

    Regards, Michael

    1. Hi Michael,

      Sorry, didn't get a response from Sophos before I left my old company. I'll drop them a line but don't think it ever got fixed.

      Regards
      Tobie

    2. Hi Michael,

      I've spoken to my old company and the way around this is to have two UTMs running in Active/Passive mode:

      http://fastvue.co/sophos/blog/how-to-build-a-sophos-utm-high-availability-ha-cluster-in-hyper-v/

      Hope this helps.

      Regards
      Tobie
