TSM and SAN administration notebook

Wednesday, July 23, 2014

ANS9365E TSM4VE back-up failed visdkCreateVmSnapshotMoRef API return code 67

The errors seen in the dsmerror.log were:
07/22/2014 13:27:59 ANS9365E VMware vStorage API error for virtual machine '<vm hostname>'.
TSM function name : visdkCreateVmSnapshotMoRef
TSM file : vmvisdk.cpp (5416)
API return code : 67
API error message : Another task is already in progress.
07/22/2014 13:27:59 ANS5250E An unexpected error was encountered.
TSM function name : vmVddkFullVMPrePareToOpenVMDKs
TSM function : snapshot targetMoRefP is null
TSM return code : 115
TSM file : ..\..\common\vm\vmbackvddk.cpp (12439)
07/22/2014 13:27:59 ANS1228E Sending of object '<vm hostname>' failed.
07/22/2014 13:27:59 ANS4015E Error processing '<vm hostname>': unexpected TSM error (115)

The problem was at the VM level. We found a hung VMware Tools install that was preventing the snapshot from running.
Resolution: On the vCenter console we manually stopped the VMware Tools install and were then able to run a TSM4VE back-up manually.
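
When the dsmerror.log is long, it can help to pull out just the ANS message codes and the "key : value" detail lines that follow them. Here is a minimal Python sketch of that idea. The function name parse_dsmerror and the regular expressions are my own assumptions, based only on the log layout shown above, so treat it as a starting point rather than a complete parser.

```python
import re

# Sample lines copied from the dsmerror.log excerpt above.
SAMPLE_LOG = """\
07/22/2014 13:27:59 ANS9365E VMware vStorage API error for virtual machine '<vm hostname>'.
TSM function name : visdkCreateVmSnapshotMoRef
TSM file : vmvisdk.cpp (5416)
API return code : 67
API error message : Another task is already in progress.
07/22/2014 13:27:59 ANS5250E An unexpected error was encountered.
"""

# Assumption: a message line is "MM/DD/YYYY HH:MM:SS ANSnnnnX text".
MESSAGE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) (ANS\d{4}[EWI]) (.*)$")
# Assumption: a detail line is "key : value".
DETAIL_RE = re.compile(r"^([^:]+?) : (.*)$")


def parse_dsmerror(text):
    """Group each ANS message with the detail lines that follow it."""
    entries = []
    for line in text.splitlines():
        message = MESSAGE_RE.match(line)
        if message:
            entries.append({
                "time": message.group(1),
                "code": message.group(2),
                "text": message.group(3),
                "details": {},
            })
            continue
        detail = DETAIL_RE.match(line)
        if detail and entries:
            # Attach the detail to the most recent message.
            entries[-1]["details"][detail.group(1).strip()] = detail.group(2).strip()
    return entries


if __name__ == "__main__":
    for entry in parse_dsmerror(SAMPLE_LOG):
        print(entry["code"], entry["details"].get("API error message", ""))
```

In this case the line to look for is the API error message ("Another task is already in progress."), which pointed us to something else running against the VM.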

Friday, July 11, 2014

ANS1311E Server out of Storage space -- TSM4VE back-ups fail

The Tivoli Storage Manager for Virtual Environments (TSM4VE) is a complex infrastructure to support and trouble-shoot.
Recently, this problem recurred, so I thought I'd give back to the online TSM community that has helped me resolve problems so many times.

TSM4VE back-ups can fail for many reasons. If you run the back-up manually, you can retrieve helpful errors, or you can look at the dsmerror.log. Many times the errors are generic enough to confuse admins who have little experience.

Our TSM4VE back-ups were failing with the error message:
TSM4VE back-ups fail ANS1311E Server out of Storage space

Now, TSM4VE back-ups use 2 storage pools: one for data and one for VMware control files.
We use a Falconstor VTL to store the data and a disk pool to store the control files.
So, first check the VTL for scratch volumes by logging into dsmadmc:
q libv vlib1
VLIB1        C1128OL4    Scratch                               3,929   LTO
VLIB1        C1128PL4    Scratch                               3,930   LTO
VLIB1        C1128QL4    Scratch                               3,931   LTO
VLIB1        C1128RL4    Scratch                               3,932   LTO
VLIB1        C1128SL4    Scratch                               3,933   LTO
VLIB1        C1128TL4    Scratch                               3,934   LTO
q libv vlib2
VLIB2        C2120BL4    Scratch                               3,628   LTO
VLIB2        C2120CL4    Scratch                               3,629   LTO
VLIB2        C2120DL4    Scratch                               3,630   LTO
VLIB2        C2120EL4    Scratch                               3,631   LTO
VLIB2        C2120FL4    Scratch                               3,632   LTO
VLIB2        C2120GL4    Scratch                               3,633   LTO
Well, we have scratch volumes, so there is plenty of space for the VMs' data.
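
If the library has a lot of volumes, counting the scratch ones by eye gets old. Here is a small Python sketch that counts scratch volumes per library from pasted q libv output. The function name count_scratch is made up for illustration, and it assumes the status is the third whitespace-separated column, as in the output above.

```python
# A few rows copied from the q libv output above.
SAMPLE_LIBV = """\
VLIB1        C1128OL4    Scratch                               3,929   LTO
VLIB1        C1128PL4    Scratch                               3,930   LTO
VLIB2        C2120BL4    Scratch                               3,628   LTO
"""


def count_scratch(libv_output):
    """Return {library name: number of Scratch volumes}."""
    counts = {}
    for line in libv_output.splitlines():
        fields = line.split()
        # Assumption: columns are library, volume name, status, ...
        if len(fields) >= 3 and fields[2] == "Scratch":
            counts[fields[0]] = counts.get(fields[0], 0) + 1
    return counts


if __name__ == "__main__":
    print(count_scratch(SAMPLE_LIBV))
```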
Next, we checked to see if we have space for the VMs' control files.
q stgpool VMCTLPOOL
Storage       Device        Estimated     Pct     Pct   High   Low   Next Stora-
Pool Name     Class Name     Capacity    Util    Migr    Mig   Mig   ge Pool
                                                         Pct   Pct
-----------   ----------   ----------   -----   -----   ----   ---   -----------
VMCTLPOOL     DISK            1,418 G    100.00   100.00     99    94

So, there is no space to store new VM control files.
So we created another volume in the VMCTLPOOL.
def volume VMCTLPOOL /vmctl1/stg33.dsm formatsize=42709

After this completed, the stalled back-ups started sessions with the TSM server and transferred their data.
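
The VMCTLPOOL check above can be sketched in Python as well. This reads the Pct Util value out of a pasted q stgpool data row and flags the pool when it is at or above a threshold. The function names and the 99.0 threshold are my own assumptions, and the column positions are taken from the one row shown above, so check them against your own output. The last helper only converts the formatsize value, which TSM takes in megabytes, into gigabytes.

```python
# Data row copied from the q stgpool VMCTLPOOL output above.
SAMPLE_ROW = "VMCTLPOOL     DISK            1,418 G    100.00   100.00     99    94"


def pct_util(stgpool_row):
    """Return the Pct Util column of a q stgpool data row as a float."""
    fields = stgpool_row.split()
    # Assumption: name, device class, capacity value, capacity unit, Pct Util, ...
    return float(fields[4])


def pool_is_full(stgpool_row, threshold=99.0):
    """True when utilization is at or above the (hypothetical) threshold."""
    return pct_util(stgpool_row) >= threshold


def formatsize_in_gb(formatsize_mb):
    """Convert a formatsize value (megabytes) to gigabytes."""
    return formatsize_mb / 1024


if __name__ == "__main__":
    print("full:", pool_is_full(SAMPLE_ROW))
    print("new volume GB:", round(formatsize_in_gb(42709), 1))
```

So the volume defined above with formatsize=42709 adds a little under 42 GB to the pool.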

Wednesday, September 28, 2011

XIV basics -- easy tutorial

So, I was asked to create 3 new volumes and map them to 2 different XIV
hosts. I had no training. I did have a log in to the XIV.

I found:
http://www-03.ibm.com/systems/data/flash/storage/disk/demos/xiv_gui.html

This demo really helped me learn some basics!

Wednesday, September 21, 2011

Expanding a LUN served from the SVC

Around here, this is a task that MUST be performed like at 1:30 in the afternoon on a Friday.
Why? Procrastination. The filespace monitor (whoever that is) decides that 1 minute after they get back from their lunch they will check 'their' server before they go out of town for the weekend to see what the system admin. overlooked.

This chore is easy and straightforward, depending on how much more space they want, the available resources, and how many data drives they currently have on the server in question.

First, ask how much more space they need (not want) and how many data drives they have on that server. If they have only one data drive then the drive name to extend is <Hostname1>.
If they have more than one data drive and it is Windows, then ask the Sys Admin. how large the drive is currently. If it is a VMware server then ask the system admin for the LUN unique ID.
It will look like this: 6005076801908128C0000000000003C0.
You can match it on the SVC with the next command.

Next, log into the SVC. I will be using the hostname Goliath in this explanation for my examples.
Then run this to find the disks mapped to the host:
svcinfo lshostvdiskmap Goliath
The output will look like this:

id               name              SCSI_id        vdisk_id       vdisk_name        vdisk_UID
35               Goliath           0              195            Goliath0          6005076801908128C0000000000003C0
35               Goliath           1              197            Goliath1          6005076801908128C0000000000003C1
This will show you all the disks mapped to the host.
Determine which of these drives you want to expand. If they have multiple data drives, then find the drive that is the same size the Windows sys admin told you, or the one with the LUN unique ID the VMware admin gave you.
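
Matching a long UID by eye is error-prone, so here is a minimal Python sketch that takes pasted lshostvdiskmap output and finds the vdisk name for a given UID. The helper names parse_table and find_vdisk_by_uid are hypothetical. It splits on whitespace, which only works when every column has a value. For output with empty columns (lsvdisk, for example) the svcinfo -delim option gives a safer format to split on.

```python
# Output copied from the svcinfo lshostvdiskmap example above.
SAMPLE_MAP = """\
id               name              SCSI_id        vdisk_id       vdisk_name        vdisk_UID
35               Goliath           0              195            Goliath0          6005076801908128C0000000000003C0
35               Goliath           1              197            Goliath1          6005076801908128C0000000000003C1
"""


def parse_table(text):
    """Turn a whitespace-separated table into a list of dicts keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def find_vdisk_by_uid(rows, uid):
    """Return the vdisk_name whose vdisk_UID matches uid, or None."""
    for row in rows:
        if row["vdisk_UID"].upper() == uid.upper():
            return row["vdisk_name"]
    return None


if __name__ == "__main__":
    rows = parse_table(SAMPLE_MAP)
    print(find_vdisk_by_uid(rows, "6005076801908128C0000000000003C0"))
```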

Next, run this command:
svcinfo lsvdisk -filtervalue name=Goliath1
id               name              IO_group_id    IO_group_name     status         mdisk_grp_id   mdisk_grp_name    capacity       type           FC_id             FC_name           RC_id          RC_name           vdisk_UID         fc_map_count   copy_count     fast_write_state
33              Goliath1         1              iog1              online         6              mdg2107_15_300    64.00GB        striped                                                                             6005076801908128C0000000000002D5 0              1              empty
Goliath's managed disk group for Goliath1 (its data drive or LUN) is mdg2107_15_300.

In the following step you are looking to see if there is enough space in the managed disk group to expand that LUN to the desired size.
svcinfo lsmdiskgrp -filtervalue name=mdg2107_15_300
The output will look like this:
id               name              status         mdisk_count    vdisk_count    capacity       extent_size    free_capacity virtual_capacity used_capacity real_capacity overallocation warning
102              mdg2107_15_300         online         30             31             54.6TB         512            5.7TB          48.89TB          48.89TB        48.89TB        89             0

So, in my example, free_capacity shows 5.7TB, so that extra 500GB is no big deal.
Had I not had enough space, I'd have to do one of two things:
1) Add an mdisk to the managed disk group
or
2) Find an unused vdisk (maybe an old flash copy), make sure it's not mapped to a host, and
    do a rmvdisk on it.
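
The free space check above is simple arithmetic, sketched here in Python. The helpers to_gb and has_room are hypothetical names, and the sketch assumes binary units (1 TB = 1024 GB) and capacity strings written like "5.7TB" or "64.00GB", as in the outputs above.

```python
import re

# Assumption: binary multiples, expressed in GB.
UNITS_IN_GB = {"MB": 1 / 1024, "GB": 1.0, "TB": 1024.0, "PB": 1024.0 * 1024}


def to_gb(capacity):
    """Convert a capacity string such as '5.7TB' to a number of GB."""
    match = re.fullmatch(r"([\d.]+)\s*(MB|GB|TB|PB)", capacity.strip(), re.IGNORECASE)
    if not match:
        raise ValueError(f"unrecognized capacity: {capacity!r}")
    return float(match.group(1)) * UNITS_IN_GB[match.group(2).upper()]


def has_room(free_capacity, extra_gb):
    """True when the mdisk group's free_capacity covers the requested growth."""
    return to_gb(free_capacity) >= extra_gb


if __name__ == "__main__":
    # free_capacity from the lsmdiskgrp example, and the 500 GB request.
    print(has_room("5.7TB", 500))
```

This only compares the raw numbers. It does not account for any headroom you want to keep free in the group.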

To expand the LUN by 500 GB execute:
svctask expandvdisksize -size 500 -unit gb Goliath1

where Goliath1 is the disk that needed to be expanded.
Now contact the Windows admin and tell her that she has the
needed space but she needs to expand the drive.

Tuesday, September 20, 2011

SVC Flashing a copy of a boot LUN

Why would you do this operation? Well, if your OS team wants to install patches, but wants to be able to fall back to a before-the-patches-were-installed state, then you'd flash the boot LUN. In my case the OS people believed that the production boot LUN had some kind of corruption in it. Since the boot LUN is Windows and there is no native OS back-up utility, they wanted a copy of the Test box's boot LUN presented to the production server. So, actually this should be titled "Flashing a copy of the boot LUN from test in order to give to production."
Well let's get to it!
What is the managed disk group for production? (Boot LUNs are named Hostname+0.)
svcinfo lsvdisk -filtervalue name=Eastwood0

From the output, find 2 facts: 1) the size of the boot LUN and 2) the name of your managed disk group.
Then check to see if you have enough space in the managed disk group by running:
svcinfo lsmdiskgrp -filtervalue name=mdg1746c_7_2t

If you have enough space, then make a new vdisk on which to place the copy.
From the first command, get the iogrp # and from the second command get the mdiskgrp #.

To make the new virtual disk, run the following command:
svctask mkvdisk -name Eastwood0New -mdiskgrp 6 -size 64 -unit gb -iogrp 1

Now, to make the flash:
First, get the source drive's id number with:
svcinfo lsvdisk -filtervalue name=EastwoodT0
If you want to flash copy the current production LUN instead, run:
svcinfo lsvdisk -filtervalue name=Eastwood0
with EastwoodT0 being the test server's LUN and
Eastwood0 being the production server's LUN.

Then get the destination drive's id (the drive you just made)
svcinfo lsvdisk -filtervalue name=Eastwood0New

Next, create the flash copy mapping from the source drive to the destination drive, using the two ids you just looked up:
svctask mkfcmap -source 250 -target 179 -name Eastwood0New
If this svctask works, then you will get the message:
"Flashcopy mapping successfully created"

Now, start the flashcopy (use the mapping name exactly as you created it):
svctask startfcmap -prep Eastwood0New

To check on how much has been copied, run:
svcinfo lsfcmap

Once the flash is done, the Windows system admin needs to shut down the Windows server before you switch boot LUNs on this server. Once this is done, you can unmap the current boot LUN:
svctask rmvdiskhostmap -host Eastwood Eastwood0

Then map the newly made flash of the test boot LUN:
svctask mkvdiskhostmap -host Eastwood Eastwood0New

Then call the system admin to boot that system.
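
To keep the order of operations straight, here is a Python sketch that only builds the command strings from the walk-through above so you can review them before typing anything into the SVC. It does not connect to or run anything. The function name and its parameters are hypothetical. In practice the target id is only known after the mkvdisk step, so you would look it up with lsvdisk before filling it in.

```python
def flash_boot_lun_commands(host, current_lun, new_lun, mdiskgrp_id, iogrp_id,
                            size_gb, source_id, target_id):
    """Return the SVC commands from the walk-through, in order, as strings."""
    return [
        # 1. Make the new vdisk that will receive the copy.
        f"svctask mkvdisk -name {new_lun} -mdiskgrp {mdiskgrp_id} "
        f"-size {size_gb} -unit gb -iogrp {iogrp_id}",
        # 2. Create the flash copy mapping (ids come from svcinfo lsvdisk).
        f"svctask mkfcmap -source {source_id} -target {target_id} -name {new_lun}",
        # 3. Start the flash copy.
        f"svctask startfcmap -prep {new_lun}",
        # 4. Check progress; wait until the copy is done.
        "svcinfo lsfcmap",
        # 5. Only after the server is shut down: swap the boot LUN mapping.
        f"svctask rmvdiskhostmap -host {host} {current_lun}",
        f"svctask mkvdiskhostmap -host {host} {new_lun}",
    ]


if __name__ == "__main__":
    # Values taken from the example above.
    for command in flash_boot_lun_commands(
            host="Eastwood", current_lun="Eastwood0", new_lun="Eastwood0New",
            mdiskgrp_id=6, iogrp_id=1, size_gb=64, source_id=250, target_id=179):
        print(command)
```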

Monday, September 19, 2011

Commands to look around the SVC -> svcinfo

When I started SVC admin, I wanted the informational commands that would not get me into trouble
and would just show me how everything is configured. The subcommands of svcinfo fit the bill.

To look at the last commands run to change things:
svcinfo catauditlog

To look at the SVC's error log:
svcinfo caterrlog

To see whether a flash copy has completed, or what flash copies you have:
svcinfo lsfcmap

Managed disk groups? These are LUNs given to the SVC and placed into
sets called managed disk groups or mdiskgrp. The point of the SVC is to be able to
'carve up' LUNs of desired sizes and then share them with clients of the SVC (hosts).

What managed disk groups are on your SVC?
svcinfo lsmdiskgrp

If you want to get useful information about an mdiskgrp:
svcinfo lsmdiskgrp mdg1746a_7_2t

or find all mdiskgrps that start with the same naming scheme:
svcinfo lsmdiskgrp -filtervalue name="mdg*"

Need to see all the vdisks that have been carved from a mdiskgrp?
svcinfo lsvdisk -filtervalue mdisk_grp_name=<name of a mdiskgrp>

Now, for the mdiskgrp and vdisk to be useful they must be mapped to hosts.
To see what host a vdisk is mapped to run:
svcinfo lsvdiskhostmap <vdiskname>
If the vdisk is not mapped to a host then you could map it to a host or remove it.

To see what vdisks are mapped to a host run:
svcinfo lshostvdiskmap <hostname>

These svcinfo commands will give you information only and will not change anything,
so feel free to run them and get familiar with your SVC environment.

Here is a great IBM link
http://publib.boulder.ibm.com/infocenter/svcic/v3r1m0/index.jsp?topic=%2Fcom.ibm.storage.svc.console.doc%2Fsvc_informationcomm_21pasg.html
under "Informational Commands" are all the sub-options of svcinfo

It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness

So my work partner (He did SAN; I did TSM) found a great new job with IBM.
Since he is the sole support for his family, I am glad that he found such a great job and jump in pay.
Since I understood little of what he did and am now expected to do it, I am afraid.
I was hired to take some of the AIX, TSM, and SAN workload off his shoulders and to back him up.
He was a patient teacher when I was learning TSM admin. His dry wit, compassionate ear of a husband and father who could share my joys and sorrows, being able to discuss our shared Christianity, his thoroughness and quick intelligence, his encyclopedic knowledge of music, movies, Texas cities, and County employees will be missed sorely. He was one of the best work partners that I have ever had the pleasure to work with... (dangling participle and all).

I am being expected to administer the following: IBM SVC, Brocade DSX, Xseries blade center switches, and N-series (Netapp). For storage I have IBM's DS8100, 1746, 1726, and three XIVs.
Yes, my employer is seeking to fill this position but I am not optimistic.
After searching high and low for blogs of other professionals and finding only business sponsored blogs with no real technical content, I decided that as I learned administrative commands and concepts I'd place my notes online. Not just to share, but also to have as a quick reference: with such a variety to manage, I will not be able to memorize it all. When I have learned how to do something, I will post it.