Implementing XenApp 7.6 on Apache Cloudstack 4.3
Marcel A' Campo
Before XenApp 7.5 was released it was of course already possible to spin up instances in Cloudstack and install the various XenApp components, so what has changed? What Citrix added with XenApp 7.5 is support for Machine Creation Services (MCS) on Cloudstack. The same release also added support for Amazon AWS, but since my experience is with ACS I will not discuss that option. Without this integration you had two options for provisioning workers such as XenApp servers: the first was to use Citrix Provisioning Services (CPS), the other was to provision XenApp servers from Cloudstack templates.
The environment I had to build was a relatively small XenApp farm (100 concurrent users, 30 apps, 10 XenApp servers), so MCS is a perfect fit. Compared to CPS it lacks the added complexity of maintaining CPS servers. The main benefits of CPS are fewer IOPS (especially since you can now leverage the ‘RAM cache with overflow to disk’ option) and better versioning, but since my environment was relatively small those benefits would not outweigh the benefit of simplicity. The advantage of MCS compared to just deploying XenApp instances from a template is that linked-clone technology is used (resulting in lower storage requirements) and that it is completely integrated with Citrix Studio.
In the last 6 to 8 weeks I have built that XenApp environment, and there were a number of lessons learned that I want to share with you. First of all I noticed that there is not much documentation available on how to build such an environment. The most important document is this one. It describes how XenDesktop / XenApp can be integrated with Cloudplatform. We manage an Apache Cloudstack 4.3 environment, but everything described in the document was equally valid for our ACS. The overall feeling I had after reading it is that this Citrix document describes how to implement XenApp in a small Cloudplatform environment, like a lab. In our case we already have ACS running for several customers, which means that we are running it across several datacenters with many pods, existing service offerings, networks, delegated admins, etc. The way Citrix developed the integration does not take into account the complexities that come with such an enterprise ACS implementation.
For example, the document describes that you need to add ‘tags’ in ACS to the Volume Worker template and to the specific network in which you want to deploy the XenApp servers. The Volume Worker template needs the tag key “Citrix.XenDesktop.Template.Role” with the value “VolumeServiceWorkerRole”. The network needs the tag key “Citrix.XenDesktop.Network.Role” with the value “MachineIsolationRole”. These tags are then used by Citrix Studio and the Volume Worker instances to deploy new XenApp servers. The problem is that these are fixed tags. For our customer we needed to implement Test, Acceptance and Production (TAP) environments with separated networks. As an admin you do not have the option to set a specific tag per environment so that Studio / MCS understands that a network belongs to a certain environment. Instead it just builds a list of the networks, and the first network with these tags set is the one in which the machines are deployed. So that does not work for a TAP street.
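For reference, the tagging can also be done through the ACS API instead of the UI. Below is a minimal sketch of the two `createTags` calls involved; the resource IDs are placeholders, but the tag keys and values are the fixed ones from the Citrix document:

```python
# Sketch of the two Cloudstack "createTags" calls the MCS integration
# relies on. UUIDs are placeholders for your own template / network IDs.

def volume_worker_template_tag(template_id):
    """Parameters for tagging the Volume Worker template."""
    return {
        "command": "createTags",
        "resourceids": template_id,
        "resourcetype": "Template",
        "tags[0].key": "Citrix.XenDesktop.Template.Role",
        "tags[0].value": "VolumeServiceWorkerRole",
    }


def machine_isolation_network_tag(network_id):
    """Parameters for tagging the network MCS should deploy into."""
    return {
        "command": "createTags",
        "resourceids": network_id,
        "resourcetype": "Network",
        "tags[0].key": "Citrix.XenDesktop.Network.Role",
        "tags[0].value": "MachineIsolationRole",
    }
```

Because the keys and values are fixed, there is no room to encode the environment in them, which is exactly the limitation described above.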
The way to work around that is to create an ACS user per environment and give this ACS user (a sort of service account) permissions on the network, like in this picture:
Normal user permissions are sufficient to use MCS with these networks. Citrix Studio is then configured to use these ACS user accounts to communicate with ACS:
To configure that, simply copy the API and Secret keys of the ACS user account into the connection properties of Citrix Studio. As a result, all jobs on ACS run with these credentials.
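Under the hood, every such job is a signed ACS API request made with those keys. For anyone curious what Studio does with them, here is a sketch of the documented Cloudstack request-signing scheme (add the apiKey, URL-encode and sort the parameters, lowercase the string, HMAC-SHA1 it with the Secret key); the key values below are placeholders:

```python
import base64
import hashlib
import hmac
import urllib.parse


def sign_request(params, api_key, secret_key):
    """Build a signed Cloudstack API query string.

    Follows the documented scheme: add the apiKey, URL-encode the values,
    sort the key=value pairs, lowercase the whole string, HMAC-SHA1 it
    with the secret key and base64-encode the digest.
    """
    full = dict(params, apiKey=api_key, response="json")
    pairs = sorted(
        (k, urllib.parse.quote(str(v), safe="*")) for k, v in full.items()
    )
    query = "&".join(f"{k}={v}" for k, v in pairs)
    to_sign = "&".join(f"{k.lower()}={v.lower()}" for k, v in pairs)
    digest = hmac.new(secret_key.encode(), to_sign.encode(), hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return f"{query}&signature={signature}"
```

Anything you script against ACS with these per-environment accounts goes through the same scheme, so the delegation shown above applies equally to Studio jobs and to your own tooling.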
Next, we had to give these accounts permissions on all instances and templates within ACS that are used by Citrix Studio and live in these networks. In our case the Storefront servers are also in these networks, so these also had to be linked to these accounts. The Volume Worker template needed to be linked to these accounts as well (which means a separate Volume Worker template per environment), and so did the Golden Image server. The last thing we needed to be aware of is that instances that are not deployed with MCS but do live in these networks (Storefront, the Golden Image server) needed to be deployed with the API and Secret keys of these accounts. Within SBP we use Chef for our deployments, so we had to adjust our knife.rb files with these keys. The end result is that we have also separated our TAP environments nicely within ACS.
Another thing we ran into had to do with the service offerings that we used. Normally our instances in ACS have a service offering that is High Availability (HA) enabled. That means that if the instance goes down, ACS automatically tries to start the instance again. In our case (we are hosting mission-critical environments) that is of course essential. But those offerings cannot be selected by MCS when new XenApp servers are deployed, because during provisioning the Volume Worker instructs the new XenApp instance to shut down and stay down. ACS, however, starts the instance again, which results in a failed provisioning process. So during deployment we select an offering that is not HA enabled, and when all is done we make sure the instance gets an HA enabled service offering. Of course this can be scripted and automated, but it remains an extra step.
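The swap itself is easy to script against the ACS API. A sketch of the three calls per instance (these are real Cloudstack API commands; changing the offering requires the instance to be stopped, and the IDs are placeholders):

```python
# Sketch of the post-deployment swap onto an HA-enabled service offering.
# Cloudstack only allows changeServiceForVirtualMachine on a stopped VM,
# hence the stop/change/start sequence. IDs are placeholders.

def ha_offering_swap(vm_id, ha_offering_id):
    """Return the ordered Cloudstack API calls that move one freshly
    provisioned XenApp instance onto an HA-enabled service offering."""
    return [
        {"command": "stopVirtualMachine", "id": vm_id},
        {"command": "changeServiceForVirtualMachine",
         "id": vm_id, "serviceofferingid": ha_offering_id},
        {"command": "startVirtualMachine", "id": vm_id},
    ]
```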
It would also be handy if MCS could read a file of specific machine name / IP address combinations and create the new XenApp instances accordingly. Currently you only have the option to provide a machine name from a range (e.g. HOSTNAME##, where ## is a unique number in a range). In our case the Test and Acceptance instances run in one specific datacenter and our naming convention dictates that these are odd numbered. So deploying a number of these machines in one batch is normally not possible. As a workaround we create a few even-numbered AD objects first, so that MCS only deploys the odd ones. After the deployment we remove the even-numbered AD objects again. Because we cannot instruct MCS to use specific IP addresses, ACS selects them, which means that after deployment we have to verify which IPs are used and record those in our IP numbering plan. In itself not really an issue, but if we also needed to restrict traffic using firewalls for specific XenApp servers, it becomes more cumbersome. Not really something for enterprise scale.
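Generating the even-numbered placeholder names to pre-create in AD is at least trivial to script. A sketch (the XAPRD prefix is a made-up example, not our real naming convention):

```python
def placeholder_names(prefix, count, start=2):
    """Even-numbered placeholder computer names (e.g. XAPRD02, XAPRD04)
    to pre-create in AD so that MCS is forced onto the odd numbers.
    Remove these AD objects again after the MCS deployment."""
    names = []
    n = start
    while len(names) < count:
        names.append(f"{prefix}{n:02d}")
        n += 2
    return names
```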
We also ran into a Cloudstack bug (Cloudstack article ID: 6172), but to explain the impact of that bug I first need to tell a bit more about our MCS image update procedure in the Production environment. The production environment is split over 2 datacenters (DC A and DC B), which means 2 primary storage resources in 2 pods in those datacenters. In ACS we have service offerings linked to the specific pods. For every datacenter we created a separate machine catalogue in Citrix Studio. Our provisioning procedure follows these steps:
- We update our Golden Image server and shut it down.
- From within ACS we create a template.
- When updating both machine catalogues we use this same template, but different service offerings.
- Our Delivery Group contains instances from both machine catalogues.
Another thing: the Citrix documentation states that the image preparation phase must be done in an isolated network. I even noticed that if you do not tag your network before creating the Volume Worker, such an isolated network is created automatically. In our Cloudstack implementation we use normal Layer 3 networking (so no basic networking with security groups, and no advanced networking with SDN or VPCs), and if we used the automatically created isolated network the provisioning would fail. This is because the Delivery Controller on which we installed Studio needs to communicate over port 443 with the Volume Worker that is created in that network. Because there is no route to that isolated network, the provisioning fails. So we added the tags to our non-isolated network, removed the isolated network, and all is well.
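A quick way to verify this routing requirement up front is to test, from the Delivery Controller, whether the Volume Worker's HTTPS port is reachable at all. A small sketch (the worker IP would be whatever ACS assigned):

```python
import socket


def can_reach_volume_worker(worker_ip, port=443, timeout=3):
    """Return True if a TCP connection to the Volume Worker succeeds.
    In an unroutable isolated network this fails, which is exactly
    the condition that broke our provisioning."""
    try:
        with socket.create_connection((worker_ip, port), timeout=timeout):
            return True
    except OSError:
        return False
```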
One last suggestion towards Citrix is to improve the error handling of the provisioning process in Citrix Studio. Both the logging node in Studio and the Actions tab show only high-level progress information. An example we faced was that our primary storage became almost, but not completely, full. ACS uses a threshold: when storage capacity utilisation exceeds a configured level, new instances are no longer deployed. As a result the provisioning process fails. Studio only informs you that a disk is being copied and then shows a generic error message. We had to dive into the ACS logging to find out that the reason was that no suitable storage could be found. It would have been nice if Studio had picked this up and translated it into a suitable error message.
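For context, the allocator behaviour that tripped us up boils down to a simple check: Cloudstack skips a primary storage pool once its utilisation crosses a configurable disable threshold (0.85 by default, if I recall the global setting correctly), even though the pool is not actually full. A sketch:

```python
def pool_accepts_new_volumes(used_bytes, total_bytes, disable_threshold=0.85):
    """Mimics the allocation check: a pool above the disable threshold
    is skipped for new volumes even though free space remains,
    which surfaces in Studio only as a generic provisioning error."""
    return used_bytes / total_bytes < disable_threshold
```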
So there you have our real-world experiences with implementing XenApp on Cloudstack. Although we bumped into some issues, I have to say it all works quite nicely and it is possible to use this in a production environment. Whether this would also scale into the hundreds of desktops or XenApp servers I cannot tell; in such a scenario you would rather leverage Citrix Provisioning Services, which completely changes the provisioning process. But for our environments this works well.
The next step for us is to look at Application Orchestration so we can consolidate the several separate Citrix environments into one. In the next few months I will work on that, and when ready I will blog about that as well.