This document describes the procedure for cloning a system running Solaris 10 that uses ZFS as its root file system. The idea is to take a mirror disk out of the master server (source) and use it to boot up another server (target). Afterwards we fix the mirroring on both the source and target servers.
Suppose we have a server named vws01 with Solaris 10 SPARC installed. There are 4 disks on vws01, configured with ZFS file systems:

    slot#1: c1t0d0s0 (part of rpool)
    slot#2: c1t1d0s0 (part of datapool)
    slot#3: c1t2d0s0 (part of rpool)
    slot#4: c1t3d0s0 (part of datapool)
Remember that the device names could be different on your system; pay attention to the concept and adapt it to your device names.
We want to create another server similar to vws01, using the ZFS mirror disks to clone it. There are several tasks to do to achieve that goal; I will split them into two big parts:
- Procedure on the source machine (the existing server)
- Procedure on the target machine (the clone server we want to create)
Procedure On The Source Machine
- Make sure all ZFS pools (rpool & datapool) are in a healthy state.

    root@vws01:/# zpool status -x rpool
    pool 'rpool' is healthy
    root@vws01:/# zpool status -x datapool
    pool 'datapool' is healthy
    root@vws01:/# zpool status
      pool: datapool
     state: ONLINE
     scan: resilvered 32.6G in 0h10m with 0 errors on Thu Apr 28 01:14:45 2011
    config:

            NAME          STATE     READ WRITE CKSUM
            datapool      ONLINE       0     0     0
              mirror-0    ONLINE       0     0     0
                c1t1d0s0  ONLINE       0     0     0
                c1t3d0s0  ONLINE       0     0     0

    errors: No known data errors

      pool: rpool
     state: ONLINE
     scan: resilvered 32.6G in 0h10m with 0 errors on Thu Apr 28 01:14:45 2011
    config:

            NAME          STATE     READ WRITE CKSUM
            rpool         ONLINE       0     0     0
              mirror-0    ONLINE       0     0     0
                c1t0d0s0  ONLINE       0     0     0
                c1t2d0s0  ONLINE       0     0     0

    errors: No known data errors
    root@vws01:/#
- Shut down the source machine.

    root@vws01:/# shutdown -g0 -y -i5
- Take out one disk from rpool (slot#3 / c1t2d0) and one data disk from datapool (slot#4 / c1t3d0). Give each disk a label with clear information on it, for example:

    SOURCE_rpool_slot#3_c1t2d0
    SOURCE_datapool_slot#4_c1t3d0
- Boot the server using only one mirror leg for rpool (c1t0d0) & datapool (c1t1d0). During the boot process it is normal to get error messages like these:

    SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
    EVENT-TIME: Thu Apr 28 01:04:17 GMT 2011
    PLATFORM: SUNW,Netra-T5220, CSN: -, HOSTNAME: vws01
    SOURCE: zfs-diagnosis, REV: 1.0
    EVENT-ID: be9eeb00-66ab-6cf0-e27a-acab436e849d
    DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
    AUTO-RESPONSE: No automated response will occur.
    IMPACT: Fault tolerance of the pool may be compromised.
    REC-ACTION: Run 'zpool status -x' and replace the bad device.

    SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
    EVENT-TIME: Thu Apr 28 01:04:17 GMT 2011
    PLATFORM: SUNW,Netra-T5220, CSN: -, HOSTNAME: vws01
    SOURCE: zfs-diagnosis, REV: 1.0
    EVENT-ID: fc0231b2-d6a8-608e-de11-afe0ebde7508
    DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
    AUTO-RESPONSE: No automated response will occur.
    IMPACT: Fault tolerance of the pool may be compromised.
    REC-ACTION: Run 'zpool status -x' and replace the bad device.
These errors come from ZFS, informing us that there are faults in the ZFS pools. We can ignore them for a while, since both pools are still accessible and we intend to fix the broken mirrors soon.
- After the OS is ready, check the disk availability and the status of all pools. Solaris will only see the 2 disks installed right now. You will see that both rpool and datapool are in the DEGRADED state.

    root@vws01:/# format < /dev/null
    Searching for disks...done

    AVAILABLE DISK SELECTIONS:
           0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
              /pci@0/pci@0/pci@2/scsi@0/sd@0,0
           1. c1t1d0 <SEAGATE-ST930003SSUN300G-0868-279.40GB>
              /pci@0/pci@0/pci@2/scsi@0/sd@1,0
    Specify disk (enter its number):
    root@vws01:/# zpool status
      pool: datapool
     state: DEGRADED
    status: One or more devices could not be opened.  Sufficient replicas exist for
            the pool to continue functioning in a degraded state.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://www.sun.com/msg/ZFS-8000-2Q
     scrub: none requested
    config:

            NAME          STATE     READ WRITE CKSUM
            datapool      DEGRADED     0     0     0
              mirror-0    DEGRADED     0     0     0
                c1t1d0s0  ONLINE       0     0     0
                c1t3d0s0  UNAVAIL      0     0     0  cannot open

    errors: No known data errors

      pool: rpool
     state: DEGRADED
    status: One or more devices could not be opened.  Sufficient replicas exist for
            the pool to continue functioning in a degraded state.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://www.sun.com/msg/ZFS-8000-2Q
     scrub: none requested
    config:

            NAME          STATE     READ WRITE CKSUM
            rpool         DEGRADED     0     0     0
              mirror-0    DEGRADED     0     0     0
                c1t0d0s0  ONLINE       0     0     0
                c1t2d0s0  UNAVAIL      0     0     0  cannot open

    errors: No known data errors
    root@vws01:/#
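Since DEGRADED pools come up at several points in this procedure, a small helper to spot them saves scrolling through long status output. This is a hypothetical sketch (the `pools_degraded` name is mine, not a Solaris command); it only parses text, so it behaves the same on SPARC and x86:

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the
# name of every pool whose state is DEGRADED, one per line.
pools_degraded() {
  awk '
    /^[ ]*pool:/  { pool = $2 }
    /^[ ]*state:/ { if ($2 == "DEGRADED") print pool }
  '
}

# Typical use on a live system:
#   zpool status | pools_degraded
```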
- Plug in the replacement disks (they must be the same size as the previous ones): one disk goes into slot#3 (as c1t2d0) and the other into slot#4 (as c1t3d0). You can plug in the disks while the server is running if your server has a disk controller that supports hot plugging. If it does not, please shut down the server before inserting the replacement disks.
- Assuming your system supports hot-plug disks, the next step is to detach the broken mirror legs. First we need to detach the broken disk from rpool.

    root@vws01:/# zpool detach rpool c1t2d0s0
After detaching the missing root mirror, rpool is in the ONLINE state again.

    root@vws01:/# zpool status
      pool: rpool
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            rpool       ONLINE       0     0     0
              c1t0d0s0  ONLINE       0     0     0

    errors: No known data errors
    root@vws01:/#
- Repeat the detach process for datapool.

    root@vws01:/# zpool detach datapool c1t3d0s0
After detaching the missing data mirror, datapool is in the ONLINE state again.

    root@vws01:/# zpool status datapool
      pool: datapool
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            datapool    ONLINE       0     0     0
              c1t1d0s0  ONLINE       0     0     0

    errors: No known data errors
    root@vws01:/#
- Before attaching the replacement disk to rpool, we need to prepare the replacement disk (c1t2d0). First we label the disk (this step was verified on a SPARC system, but should be applicable to x86 systems too).

    root@vws01# format -e c1t2d0
    selecting c1t2d0
    [disk formatted]

    FORMAT MENU:
            disk       - select a disk
            type       - select (define) a disk type
            partition  - select (define) a partition table
            current    - describe the current disk
            format     - format and analyze the disk
            repair     - repair a defective sector
            label      - write label to the disk
            analyze    - surface analysis
            defect     - defect list management
            backup     - search for backup labels
            verify     - read and display labels
            save       - save new disk/partition definitions
            inquiry    - show vendor, product and revision
            scsi       - independent SCSI mode selects
            cache      - enable, disable or query SCSI disk cache
            volname    - set 8-character volume name
            !<cmd>     - execute <cmd>, then return
            quit
    format> label
    [0] SMI Label
    [1] EFI Label
    Specify Label type[0]: 0
    Ready to label disk, continue? yes
    format> quit
    root@vws01#
- Then we need to copy the partition table from the online root disk (c1t0d0) to the replacement disk (c1t2d0):

    root@vws01:/# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t2d0s2
    fmthard: New volume table of contents now in place.
    root@vws01:/#
- Attach the replacement disk (c1t2d0) to the rpool.

    root@vws01:/# zpool attach rpool c1t0d0s0 c1t2d0s0
    Please be sure to invoke installboot(1M) to make 'c1t2d0s0' bootable.
    Make sure to wait until resilver is done before rebooting.
    root@vws01:/#
Pay attention to the format of the zpool attach command:

    zpool attach <pool_name> <online_disk> <new_disk>
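Getting the argument order backwards is the classic mistake here. As a memory aid, here is a tiny hypothetical wrapper (the `attach_cmd` name is mine) that only prints the command it would run, so you can eyeball the order before executing it for real:

```shell
# Hypothetical dry-run wrapper: print the zpool attach command so the
# argument order (pool, existing online disk, new disk) can be checked.
attach_cmd() {
  printf 'zpool attach %s %s %s\n' "$1" "$2" "$3"
}

attach_cmd rpool c1t0d0s0 c1t2d0s0
# → zpool attach rpool c1t0d0s0 c1t2d0s0
```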
- After attaching the replacement disk to rpool, we can see that rpool is in the resilver (resynchronization) status.

    root@vws01:/# zpool status rpool
      pool: rpool
     state: ONLINE
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
     scrub: resilver in progress for 0h0m, 1.90% done, 0h47m to go
    config:

            NAME          STATE     READ WRITE CKSUM
            rpool         ONLINE       0     0     0
              mirror-0    ONLINE       0     0     0
                c1t0d0s0  ONLINE       0     0     0
                c1t2d0s0  ONLINE       0     0     0  1010M resilvered

    errors: No known data errors
    root@vws01:/#
- As in step #9, we also need to prepare the replacement disk (c1t3d0) before attaching it to datapool. We label the disk first (this step was verified on a SPARC system, but should be applicable to x86 systems too).

    root@vws01# format -e c1t3d0
    selecting c1t3d0
    [disk formatted]

    FORMAT MENU:
            disk       - select a disk
            type       - select (define) a disk type
            partition  - select (define) a partition table
            current    - describe the current disk
            format     - format and analyze the disk
            repair     - repair a defective sector
            label      - write label to the disk
            analyze    - surface analysis
            defect     - defect list management
            backup     - search for backup labels
            verify     - read and display labels
            save       - save new disk/partition definitions
            inquiry    - show vendor, product and revision
            scsi       - independent SCSI mode selects
            cache      - enable, disable or query SCSI disk cache
            volname    - set 8-character volume name
            !<cmd>     - execute <cmd>, then return
            quit
    format> label
    [0] SMI Label
    [1] EFI Label
    Specify Label type[0]: 0
    Ready to label disk, continue? yes
    format> quit
    root@vws01#
- Then we need to copy the partition table from the ONLINE data disk (c1t1d0) to the replacement disk (c1t3d0):

    root@vws01:/# prtvtoc /dev/rdsk/c1t1d0s2 | fmthard -s - /dev/rdsk/c1t3d0s2
    fmthard: New volume table of contents now in place.
    root@vws01:/#
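The copy-the-partition-table step is identical for the root and data disks, so it can be expressed once. A sketch, assuming the usual Solaris convention that slice 2 (`s2`) represents the whole disk; `raw_s2` and `copy_vtoc` are hypothetical names, and the real command must run as root on the live system:

```shell
# Hypothetical helpers around prtvtoc/fmthard.
# Assumption: a cXtYdZ disk name maps to /dev/rdsk/cXtYdZs2 (whole-disk slice).
raw_s2() {
  printf '/dev/rdsk/%ss2\n' "$1"
}

copy_vtoc() {
  src=$1 dst=$2
  # Copy the partition table from the online disk to the replacement disk.
  prtvtoc "$(raw_s2 "$src")" | fmthard -s - "$(raw_s2 "$dst")"
}

# On the source machine (root required):
#   copy_vtoc c1t0d0 c1t2d0   # root disks
#   copy_vtoc c1t1d0 c1t3d0   # data disks
```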
- Attach the replacement disk (c1t3d0) to the datapool.

    root@vws01:/# zpool attach datapool c1t1d0s0 c1t3d0s0
- After attaching the replacement disk to datapool, we can see that datapool is in the resilver (resynchronization) status.

    root@vws01:/# zpool status
      pool: datapool
     state: ONLINE
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
     scrub: resilver in progress for 0h0m, 2.52% done, 0h9m to go
    config:

            NAME          STATE     READ WRITE CKSUM
            datapool      ONLINE       0     0     0
              mirror-0    ONLINE       0     0     0
                c1t1d0s0  ONLINE       0     0     0
                c1t3d0s0  ONLINE       0     0     0  760M resilvered

    errors: No known data errors
    root@vws01:/#
- After the resilver process on rpool has completed, we need to reinstall the boot block on the new disk (c1t2d0). This is a crucial step; otherwise we cannot use c1t2d0 as boot media even though it contains the Solaris OS installation files. On a SPARC system, we reinstall the boot block on the new root mirror using the installboot command like this:

    root@vws01:/# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t2d0s0
On an x86 system we do it using the installgrub command, as shown below:

    /# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1d0s0
    stage1 written to partition 0 sector 0 (abs 16065)
    stage2 written to partition 0, 273 sectors starting at 50 (abs 16115)
    /#
Procedure On The Target Machine
This procedure is executed on the new server (the clone target). We will boot the new server using the 2 disks we took from the source machine vws01.
- Assume the target machine is powered off (if it is not, please shut it down first).
- Remove all attached disks from the target machine.
- Unplug all network cables attached to the target machine. This step is important if the target machine is located in the same network segment, because we are using disks from vws01: when the server boots up it will try to use the same IP address as the "original" vws01.
- Plug in the disks we took from the source machine (as explained in step #3 of the previous section).

    SOURCE_rpool_slot#3_c1t2d0
    SOURCE_datapool_slot#4_c1t3d0
- With only these 2 disks attached, boot the server. Remember to set the system to boot from slot#3; otherwise the system will fail to boot, since it looks for the default boot disk in slot#1.
- When the OS is ready, check the disk availability & status of all ZFS pools.

    root@vws01:/# format
    Searching for disks...done

    AVAILABLE DISK SELECTIONS:
           0. c1t2d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
              /pci@0/pci@0/pci@2/scsi@0/sd@2,0
           1. c1t3d0 <SEAGATE-ST930003SSUN300G-0868-279.40GB>
              /pci@0/pci@0/pci@2/scsi@0/sd@3,0
    Specify disk (enter its number): ^D
    root@vws01:/# zpool status
      pool: rpool
     state: DEGRADED
    status: One or more devices could not be opened.  Sufficient replicas exist for
            the pool to continue functioning in a degraded state.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://www.sun.com/msg/ZFS-8000-2Q
     scrub: none requested
    config:

            NAME          STATE     READ WRITE CKSUM
            rpool         DEGRADED     0     0     0
              mirror-0    DEGRADED     0     0     0
                c1t0d0s0  UNAVAIL      0     0     0  cannot open
                c1t2d0s0  ONLINE       0     0     0

    errors: No known data errors
    root@vws01:/#
If we only see rpool, we must import the other pool before proceeding to the next steps.

    root@vws01:/# zpool import
      pool: datapool
        id: 1782173212883254850
     state: DEGRADED
    status: The pool was last accessed by another system.
    action: The pool can be imported despite missing or damaged devices.  The
            fault tolerance of the pool may be compromised if imported.
       see: http://www.sun.com/msg/ZFS-8000-EY
    config:

            datapool      DEGRADED
              mirror-0    DEGRADED
                c1t1d0s0  UNAVAIL   cannot open
                c1t3d0s0  ONLINE
    root@vws01:/# zpool import -f datapool
    root@vws01:/# zpool status datapool
      pool: datapool
     state: DEGRADED
    status: One or more devices could not be opened.  Sufficient replicas exist for
            the pool to continue functioning in a degraded state.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://www.sun.com/msg/ZFS-8000-2Q
     scrub: none requested
    config:

            NAME                      STATE     READ WRITE CKSUM
            datapool                  DEGRADED     0     0     0
              mirror-0                DEGRADED     0     0     0
                10987259943749998344  UNAVAIL      0     0     0  was /dev/dsk/c1t1d0s0
                c1t3d0s0              ONLINE       0     0     0

    errors: No known data errors
    root@vws01:/#
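If you are unsure which pools are sitting on the attached disks waiting to be imported, the discovery output of a bare `zpool import` can be parsed. A hypothetical one-liner (the `importable_pools` name is mine):

```shell
# Hypothetical helper: read `zpool import` discovery output on stdin and
# print the name of every importable pool, one per line.
importable_pools() {
  awk '/^[ ]*pool:/ { print $2 }'
}

# Typical use:
#   zpool import | importable_pools
```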
- The next step is to detach the unavailable disk from rpool.

    root@vws01:/# zpool detach rpool c1t0d0s0
After detaching the missing root mirror, rpool is in the ONLINE state again.

    root@vws01:/# zpool status -x rpool
    pool 'rpool' is healthy
    root@vws01:/# zpool status rpool
      pool: rpool
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            rpool       ONLINE       0     0     0
              c1t2d0s0  ONLINE       0     0     0

    errors: No known data errors
    root@vws01:/#
- Repeat the detach process for the unavailable disk on datapool.

    root@vws01:/# zpool detach datapool /dev/dsk/c1t1d0s0
After detaching the missing data mirror, datapool is in the ONLINE state again.

    root@vws01:/# zpool status -x datapool
    pool 'datapool' is healthy
    root@vws01:/# zpool status datapool
      pool: datapool
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            datapool    ONLINE       0     0     0
              c1t3d0s0  ONLINE       0     0     0

    errors: No known data errors
    root@vws01:/#
- At this stage we probably want to change the hostname & IP address of the new machine before continuing to fix the mirroring. In this example the new machine will use hostname vws02. To change the hostname & IP address we need to edit several files:

    /etc/hosts
    /etc/hostname.*
    /etc/nodename
    /etc/netmasks
    /etc/defaultrouter

We can use the following commands to change the hostname:

    root@vws01:/# find /etc -name "hosts" -exec perl -pi -e 's/vws01/vws02/g' {} \;
    root@vws01:/# find /etc -name "nodename" -exec perl -pi -e 's/vws01/vws02/g' {} \;
    root@vws01:/# find /etc -name "hostname*" -exec perl -pi -e 's/vws01/vws02/g' {} \;
Reboot the server so it can pick up the new hostname.
root@vws01:/# shutdown -g0 -y -i6
- Once the server has rebooted successfully, insert the replacement disks in slot#1 and slot#2. Invoke the devfsadm command to let Solaris detect the new disks.

    root@vws02:/# devfsadm
    root@vws02:/# format < /dev/null
    Searching for disks...done

    AVAILABLE DISK SELECTIONS:
           0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
              /pci@0/pci@0/pci@2/scsi@0/sd@0,0
           1. c1t1d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
              /pci@0/pci@0/pci@2/scsi@0/sd@1,0
           2. c1t2d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
              /pci@0/pci@0/pci@2/scsi@0/sd@2,0
           3. c1t3d0 <SEAGATE-ST930003SSUN300G-0868-279.40GB>
              /pci@0/pci@0/pci@2/scsi@0/sd@3,0
    Specify disk (enter its number):
    root@vws02:/#
See that the new disks (c1t0d0 & c1t1d0) are already recognized by Solaris.
- Before fixing the mirroring on rpool we need to prepare the disk first (this step was verified on a SPARC system, but should be applicable to x86 systems too).

    root@vws02# format -e c1t0d0
    selecting c1t0d0
    [disk formatted]

    FORMAT MENU:
            disk       - select a disk
            type       - select (define) a disk type
            partition  - select (define) a partition table
            current    - describe the current disk
            format     - format and analyze the disk
            repair     - repair a defective sector
            label      - write label to the disk
            analyze    - surface analysis
            defect     - defect list management
            backup     - search for backup labels
            verify     - read and display labels
            save       - save new disk/partition definitions
            inquiry    - show vendor, product and revision
            scsi       - independent SCSI mode selects
            cache      - enable, disable or query SCSI disk cache
            volname    - set 8-character volume name
            !<cmd>     - execute <cmd>, then return
            quit
    format> label
    [0] SMI Label
    [1] EFI Label
    Specify Label type[0]: 0
    Ready to label disk, continue? yes
    format> quit
    root@vws02#
Then we need to copy the partition table from the ONLINE root disk (c1t2d0) to the replacement disk (c1t0d0):

    root@vws02:/# prtvtoc /dev/rdsk/c1t2d0s2 | fmthard -s - /dev/rdsk/c1t0d0s2
    fmthard: New volume table of contents now in place.
    root@vws02:/#
- Now we can attach the replacement disk to rpool.

    root@vws02:/# zpool attach rpool c1t2d0s0 c1t0d0s0
    Please be sure to invoke installboot(1M) to make 'c1t0d0s0' bootable.
    Make sure to wait until resilver is done before rebooting.
    root@vws02:/#
- After attaching the replacement disk to rpool, check that rpool is now resilvering.

    root@vws02:/# zpool status rpool
      pool: rpool
     state: ONLINE
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
     scrub: resilver in progress for 0h0m, 6.60% done, 0h12m to go
    config:

            NAME          STATE     READ WRITE CKSUM
            rpool         ONLINE       0     0     0
              mirror-0    ONLINE       0     0     0
                c1t2d0s0  ONLINE       0     0     0
                c1t0d0s0  ONLINE       0     0     0  3.43G resilvered

    errors: No known data errors
    root@vws02:/#
- Then we can continue to fix the mirroring on datapool. Before that we need to prepare the replacement disk (c1t1d0) first.

    root@vws02# format -e c1t1d0
    selecting c1t1d0
    [disk formatted]

    FORMAT MENU:
            disk       - select a disk
            type       - select (define) a disk type
            partition  - select (define) a partition table
            current    - describe the current disk
            format     - format and analyze the disk
            repair     - repair a defective sector
            label      - write label to the disk
            analyze    - surface analysis
            defect     - defect list management
            backup     - search for backup labels
            verify     - read and display labels
            save       - save new disk/partition definitions
            inquiry    - show vendor, product and revision
            scsi       - independent SCSI mode selects
            cache      - enable, disable or query SCSI disk cache
            volname    - set 8-character volume name
            !<cmd>     - execute <cmd>, then return
            quit
    format> label
    [0] SMI Label
    [1] EFI Label
    Specify Label type[0]: 0
    Ready to label disk, continue? yes
    format> quit
    root@vws02#
Then we need to copy the partition table from the ONLINE data disk (c1t3d0) to the replacement disk (c1t1d0):

    root@vws02:/# prtvtoc /dev/rdsk/c1t3d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
    fmthard: New volume table of contents now in place.
    root@vws02:/#
- Now we can attach the replacement disk to datapool.

    root@vws02:/# zpool attach datapool c1t3d0s0 c1t1d0s0
- After attaching the replacement disk to datapool, check that datapool is now resilvering.

    root@vws02:/# zpool status datapool
      pool: datapool
     state: ONLINE
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
     scrub: resilver in progress for 0h0m, 5.28% done, 0h6m to go
    config:

            NAME          STATE     READ WRITE CKSUM
            datapool      ONLINE       0     0     0
              mirror-0    ONLINE       0     0     0
                c1t3d0s0  ONLINE       0     0     0
                c1t1d0s0  ONLINE       0     0     0  1.56G resilvered

    errors: No known data errors
    root@vws02:/#
- After the resilver process on rpool has finished, we need to reinstall the boot block on the new disk (c1t0d0s0). This is a crucial step; otherwise we cannot use c1t0d0 as boot media even though it contains the Solaris OS installation files. On a SPARC system, we reinstall the boot block on the new root mirror using the installboot command like this:

    root@vws02:/# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t0d0s0

On an x86 system we do it using the installgrub command, as shown below:

    # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0
    stage1 written to partition 0 sector 0 (abs 16065)
    stage2 written to partition 0, 273 sectors starting at 50 (abs 16115)
    #
- Reboot the target machine once more to wrap everything up.
root@vws02:/# shutdown -g0 -y -i6
- Plug in all network cables and check the network connectivity.
Cloning Solaris 10 Server With ZFS Mirror Disk by Tedy Tirtawidjaja