Monday, December 24, 2012

CloudStack Troubleshooting

 

You should have one Java process running CloudStack on the Management Server. This is the Management Server process.
After everything is configured CloudStack will perform its initialization. This can take 30 minutes or more depending on the speed of your network. During this initialization process several things happen:

  • CloudStack will start the Secondary Storage VM and Console Proxy VM from the system VM template downloaded into each Zone. In the System section, under Virtual Resources, System VMs, you will see the status of these VMs listed first as Creating, then as Starting, then as Running. You can click Refresh in the upper right to update the status.
  • After the Secondary Storage VM is running the Management Server will initiate the downloads of the CentOS templates. One is downloaded for each hypervisor type. The Management Server requests that the Secondary Storage VM perform this download. You can go to the Templates tab to check the status of this download. Go to Templates then My Templates when logged in as admin. The status will show “Storage agent or storage VM disconnected” until the Secondary Storage VM is running. Then the status will change to show that the download is in progress. You can click Refresh to update the download percentages.
  • Once the CentOS templates are downloaded they will be uncompressed by the Secondary Storage VM. This is a large file and this operation will take several minutes. The Management Server will then update each template’s status to Ready.
Once CloudStack has finished initializing, use the following steps to try creating a new virtual machine.
  1. Go to the Instances menu in the left hand column. Click on My Instances.
  2. Click the Add Instances button and follow the steps in the wizard.
    1. The template selection screen requires selecting a template. At this point you likely have only the provided CentOS template available.
    2. Select a service offering. Be sure that your hardware has enough available resources or the virtual machine will fail to start.
    3. Add any additional "data disk". This is a second volume that will be available to but not mounted in the guest. For example, in Linux on XenServer you will see /dev/xvdb in the guest.
    4. Choose the primary network for the guest. You will have only one option here. You must pick exactly one.
    5. Optionally give your VM a name and a group. The group is text that may be whatever you would like. Click Submit and your VM will be created and started.
If you decide to grow your deployment, you can add more Hosts, Primary Storage, Zones, Pods, and Clusters. Repeat the procedures above as needed.
If something goes wrong, search the Management Server log for exceptions, failures, and warnings:
grep -i -E 'exc|unable|fail|invalid|leak|warn' /var/log/cloud/management/management-server.log
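
For anyone who would rather script this scan, here is a minimal Python sketch of the same keyword filter as the grep pattern above. The function names find_suspect_lines and scan_log are illustrative and not part of CloudStack; only the keywords and the log path come from this post.

```python
import re

# Same keywords as the grep pattern above, matched case-insensitively.
SUSPECT = re.compile(r"exc|unable|fail|invalid|leak|warn", re.IGNORECASE)


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching any keyword."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SUSPECT.search(line)
    ]


def scan_log(path="/var/log/cloud/management/management-server.log"):
    """Scan a log file; the default path is the one used in this post."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling scan_log() on the Management Server should return the same lines the grep prints, with line numbers attached so you can jump to the surrounding context.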

Troubleshooting the Secondary Storage VM


Many install problems relate to the Secondary Storage VM (SSVM). The most common problems include:


  • SSVM cannot reach the DNS server

  • SSVM cannot reach the Management Server

  • SSVM cannot reach the outside world to download templates. It contacts download.cloud.com via HTTP.

  • The configured DNS server cannot resolve your internal hostnames.
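
The four failure modes above mostly come down to name resolution and TCP reachability, so a rough sketch of scripting those checks might look like the following. It assumes Python is available on a machine that shares the SSVM's network and DNS settings (the SSVM itself may not have it). The function names and the example hostnames are made up for illustration, and port 8250 is the agent port described later in this post. It does not probe the DNS server directly: a failed lookup covers both an unreachable server and one that cannot resolve the name.

```python
import socket


def can_resolve(hostname):
    """Return True if the configured resolver can look up hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(management_server, internal_hostnames,
               resolve=can_resolve, connect=can_connect):
    """Map each check description to True (passed) or False (failed).

    resolve and connect can be swapped out, which is handy for dry runs.
    """
    results = {
        "resolve download.cloud.com": resolve("download.cloud.com"),
        "HTTP to download.cloud.com": connect("download.cloud.com", 80),
        "management server port 8250": connect(management_server, 8250),
    }
    for name in internal_hostnames:
        results["resolve " + name] = resolve(name)
    return results
```

For example, run_checks("ms.example.internal", ["host1.example.internal"]) (both hostnames are placeholders) returns a dictionary in which any False value points at one of the problems listed above.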

I have put together some troubleshooting tips that will hopefully help you out of trouble. This has to be a collective community effort, though, so if you find more information or something that is not covered here, please add it so that we can learn from each other.

  1. Logging into the SSVM through the hypervisor - Run "ssh -i /opt/xensource/bin/id_rsa -p 3922 root@privateIP_or_LinkLocalIpofSSVM", or "ssh -i /root/.ssh/id_rsa.cloud -p 3922 root@LinkLocal" on XenServer. Use the private IP on VMware and the link-local IP on XenServer.
  2. SSVM health check - Run the following script inside the SSVM: /usr/local/cloud/systemvm/ssvm-check.sh
    It checks 1) connectivity with the DNS server, 2) resolution of domain names, 3) the status of secondary storage, 4) the ability to write to secondary storage, 5) connectivity with the Management Server on port 8250, and 6) the status of the Java process.
  3. Template not ready or not available when creating an instance - Often the SSVM is running but the templates still do not show as Ready, or are not available when creating an instance. Run the health check script above to diagnose. The most probable reason is that the agent running on the SSVM has not been able to connect to the Management Server (MS). You can confirm this by checking the host table in the DB: select * from host where type like 'SecondaryStorageVM'. If the status shows as Alert, that is the reason. The agent can fail to connect to the MS for a number of reasons; check the following three:

    1. Check that port 8250 is open on the MS and that no firewall rule blocks it. This is the port over which the agent and the MS communicate.
    2. Check that the SSVM is trying to connect to the right IP of the MS. If it is not, the wrong IP may be set for 'host' in the global settings (configuration table) on the MS. Correct it, restart the MS and the SSVM, and see whether that solves the issue.
    3. Check the agent status on the SSVM - See whether the agent is running by typing "service cloud status" in the SSVM. If it is not, try starting it and see whether it starts successfully and whether the Alert status clears.
  4. Checking whether a template has downloaded or hit an error - Log into the DB, check the template_host_ref table, and look at download_state and error_string.
  5. Templates stuck in download in progress - Stop and then start the SSVM, or run "service cloud restart" on the SSVM. You can also restart the MS. Either triggers a template sync, which tries to resume stuck templates and redo the download of templates that errored out.
  6. Connection refused as the status for the template - Check whether the config parameter "secstorage.allowed.internal.sites" has been set to allow the internal network URLs.
  7. Retrying the download of templates - Try restarting the MS or the SSVM.
  8. SSVM Logs - /var/log/cloud/cloud.log
  9. SSVM templates physical location - Find the mount point by typing the command "mount". Go to that directory; under template/tmpl you will find all the templates.
  10. SSVM Apache server - From 2.2 onwards the system VMs are Debian based. Type "service apache2 status" to find the status. The Apache root is at /var/www/html/
  11. Run script of the Java process - /usr/local/cloud/systemvm/run.sh
  12. Increasing log level
    1. Edit the file /usr/local/cloud/systemvm/conf/log4j-cloud.xml
    2. For the log file cloud.log, change the threshold to INFO: change <param name="Threshold" value="WARN"/> to <param name="Threshold" value="INFO"/>
    3. Change com.cloud to INFO: <category name="com.cloud"> <priority value="INFO"/> </category>
    If you are not getting sufficient logging, you can also try setting it to DEBUG.
  13. Download completes at 100% but fails with an error like "Failed post download script: /usr/sbin/vhd-utilvhd tool check /mnt/SecStorage/33e2e9f5/template/tmpl/345/447/dnld1469110483936142751tmp_ failed" - There are many possible reasons, among them wrong OS selection and VHD corruption.
    Test this in the lab by copying the template to one of the hosts, then running on that host:
    vhd-util check -n filename.vhd
    vhd-util scan filename.vhd
  14. SSVM RAM - Set the parameter secstorage.vm.ram.size to change the RAM size of the VM. The default in the code is 256.
  15. Multiple secondary storages - Support for multiple secondary storages was added in the 2.2.x series. This helps in scaling secondary storage for snapshots. Private templates are copied to one of the secondary storages and public templates to all of them. The template sync happens only for public templates.
  16. For each secondary storage there is a corresponding row created in the host table. 
  17. HTTP Server returned 403 (expected 200 OK) - Seen when copying templates.
    Find the first log entry for this template's copy. It should be logged with DownloadCommand and should contain the URL of the template on the source SSVM. Then go to the destination SSVM and try downloading that URL.
    See what errors you get. Also check the iptables rules to see whether the destination SSVM is blocked from accessing the source SSVM, and whether any .htaccess file in the Apache directories forbids downloading the template.
    One instance of this problem was reported as follows.
    The problem is that we're using basic networking and have the private network set up with the same gateway and subnet as the public network. When the storage VM comes up, the public network gets set up first, but then when the private network comes up on eth2 it clobbers the gateway and sets it to the eth2 interface. So when the copy is initiated between the storage VMs it happens across the private network, but the /var/www/html/copy/.htaccess file only allows the public IP of the other SSVM, hence the 403 errors.
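
For the 403 case in item 17, the manual step of downloading the URL from the destination SSVM can be approximated with a short Python sketch like the one below. It is only an illustration: probe_url and describe are hypothetical helper names, the URL would be the one taken from the DownloadCommand log entry, and disabling proxies is my assumption about how the SSVMs talk to each other.

```python
import urllib.error
import urllib.request


def probe_url(url, timeout=10.0):
    """Return the HTTP status code the server sends for url.

    Proxies are disabled so the request goes straight to the server
    (an assumption, see above). Connection failures such as a refused
    or timed-out connection are not caught and raise URLError.
    """
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; keep the code.
        return err.code


def describe(status):
    """Turn a status code into a hint based on the 403 case in this post."""
    if status == 200:
        return "download allowed"
    if status == 403:
        return "forbidden: check .htaccess and which source IP the request uses"
    return "unexpected status %d" % status
```

If describe(probe_url(url)) reports forbidden when run from the destination side, compare the source IP the request actually leaves from with the IPs allowed in the .htaccess file on the source SSVM.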

Troubleshooting the Console-proxy VM


If you see the message below when you launch the Console Viewer, it most likely means that the Console Proxy VM cannot connect from its private interface to port 8250 on the Management Server (or the load-balanced Management Server pool).
Access is denied for console session. Please close the window

Check these things:


  • Load balancer has port 8250 open

  • All Management Servers have port 8250 open

  • There is a network path from the CIDR in the Pod hosting the Console Proxy VM to the load balancer or Management Server

  • The "host" global configuration parameter is set to the load balancer if in use
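
The first two checks in the list above can be scripted. The following is a minimal Python sketch, assuming it is run from a machine on the same network as the Console Proxy VM's private interface; the function name unreachable and the hostnames in the usage note are hypothetical.

```python
import socket


def unreachable(hosts, port=8250, timeout=5.0):
    """Return the hosts that refuse or time out on a TCP connect to port."""
    failed = []
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # Connection succeeded; nothing to send.
        except OSError:
            failed.append(host)
    return failed
```

For example, unreachable(["lb.example.internal", "ms1.example.internal", "ms2.example.internal"]) returns an empty list when the load balancer and every Management Server accept connections on port 8250. A successful connect only shows that the port is open and routable; it does not verify the "host" global setting.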






