Let’s wreck the cluster by zeroing the first 256 MB of the disk behind the GRID diskgroup:
[root@rico1 ~]# dd if=/dev/zero of=/dev/asm_grid bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 0.521837 s, 514 MB/s
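If you want to replay this in a lab, it may be worth keeping a copy of the region you are about to zero. This is only a minimal sketch, assuming GNU dd; the helper name `save_disk_head` and the backup path in the usage note are made up for illustration and are not part of the original procedure.

```shell
# Hypothetical helper (not from the original post): copy the first N MiB
# of a device or file into a backup file before overwriting it.
# $1 = source device/file, $2 = backup file, $3 = size in MiB (default 256)
save_disk_head() {
    # dd prints its transfer statistics on stderr; silence them here.
    dd if="$1" of="$2" bs=1M count="${3:-256}" 2>/dev/null
}
```

Something like `save_disk_head /dev/asm_grid /root/asm_grid_head.bak 256` would cover the same 256 MB that the dd above overwrites.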
After this operation, the OHASD-managed resources can end up in a state like this:
[root@rico1 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
1 ONLINE OFFLINE Instance Shutdown,ST
ora.cluster_interconnect.haip
1 ONLINE OFFLINE STABLE
1 ONLINE OFFLINE STABLE
1 ONLINE OFFLINE STABLE
1 ONLINE OFFLINE rico1 STARTING
1 ONLINE ONLINE rico1 STABLE
1 ONLINE OFFLINE STABLE
1 OFFLINE OFFLINE STABLE
1 ONLINE INTERMEDIATE rico1 STABLE
1 ONLINE ONLINE rico1 STABLE
1 ONLINE ONLINE rico1 STABLE
1 ONLINE ONLINE rico1 STABLE
1 ONLINE OFFLINE STABLE
--------------------------------------------------------------------------------
The cssd service cannot start, because it does not find any voting disks:
[root@rico1 ~]# tail -10 /u01/app/oracle/diag/crs/rico1/crs/trace/ocssd.trc
2016-06-10 10:28:32.331227 : CSSD:990865152: clssnmvDiskVerify: Successful discovery of 0 disks
2016-06-10 10:28:32.331229 : CSSD:990865152: clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2016-06-10 10:28:32.331231 : CSSD:990865152: clssnmvFindInitialConfigs: No voting files found
2016-06-10 10:28:32.331302 : CSSD:990865152: (:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2016-06-10 10:28:33.270863 : CSSD:1279616768: clsssc_CLSFAInit_CB: System not ready for CLSFA initialization
2016-06-10 10:28:33.270876 : CSSD:1279616768: clsssc_CLSFAInit_CB: clsfa fencing not ready yet
2016-06-10 10:28:34.271252 : CSSD:1279616768: clsssc_CLSFAInit_CB: System not ready for CLSFA initialization
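The same check can be scripted instead of reading the trace by eye. A minimal sketch: the function name `voting_files_missing` is hypothetical, and the message text it looks for is taken from the trace excerpt above, so it may differ on other Grid Infrastructure versions.

```shell
# Hypothetical helper: exit 0 if the last lines of a CSSD trace contain the
# "No voting files found" message, non-zero otherwise.
# $1 = path to ocssd.trc, $2 = number of trailing lines to inspect (default 50)
voting_files_missing() {
    tail -n "${2:-50}" "$1" | grep -q 'No voting files found'
}
```

For example: `voting_files_missing /u01/app/oracle/diag/crs/rico1/crs/trace/ocssd.trc && echo "cssd cannot see any voting files"`.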
OK, so let’s force the cluster services to stop:
[root@rico1 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rico1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rico1'
CRS-2677: Stop of 'ora.mdnsd' on 'rico1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rico1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rico1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rico1'
CRS-2677: Stop of 'ora.gipcd' on 'rico1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rico1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rico1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rico1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Now we have to start the stack in exclusive mode, without CRSD (-excl -nocrs), and start fixing things:
[root@rico1 ~]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'rico1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'rico1'
CRS-2676: Start of 'ora.mdnsd' on 'rico1' succeeded
CRS-2676: Start of 'ora.evmd' on 'rico1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rico1'
CRS-2676: Start of 'ora.gpnpd' on 'rico1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rico1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rico1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rico1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rico1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rico1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rico1'
CRS-2676: Start of 'ora.diskmon' on 'rico1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rico1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rico1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rico1'
CRS-2676: Start of 'ora.ctssd' on 'rico1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rico1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rico1'
CRS-2676: Start of 'ora.asm' on 'rico1' succeeded
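With that many interleaved messages it is easy to miss a resource that never reported success. Below is a small sketch that pairs each CRS-2672 ("Attempting to start") line with its CRS-2676 ("succeeded") line; the function name `unconfirmed_starts` is hypothetical, and the parsing relies only on the message layout shown above.

```shell
# Hypothetical helper: read crsctl start output on stdin and print every
# resource that has a CRS-2672 line but no matching CRS-2676 line.
unconfirmed_starts() {
    # Split on single quotes, so field 2 is the resource name.
    awk -F"'" '
        /CRS-2672:/ { pending[$2] = 1 }
        /CRS-2676:/ { delete pending[$2] }
        END { for (r in pending) print r }
    ' | sort
}
```

Usage could look like `crsctl start crs -excl -nocrs 2>&1 | tee /tmp/start.log | unconfirmed_starts`; empty output means every attempted start was confirmed.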
Of course, KFED will not help in this situation, since the wipe also destroyed the backup copy of the disk header that kfed repair relies on 🙂
[root@rico1 ~]# kfed repair /dev/asm_grid
KFED-00320: Invalid block num1 = [0], num2 = [1], error = [endian_kfbh]
So now we have to recreate the GRID diskgroup, the ASM spfile, and the password file:
  2  set asm_diskstring='/dev/asm*';
  2* set asm_diskgroups='GRID','DATA'
  1* select path, header_status from v$asm_disk
------------------------------ ------------
/dev/asm_grid CANDIDATE
Let’s recreate our GRID diskgroup:
  1  create diskgroup grid
  4  attribute 'compatible.asm'='12.1.0.2',
  5* 'compatible.rdbms'='12.1.0.2'
Now we have to recreate the SPFILE for ASM. The first step is to create a simple pfile:
[oracle@rico1 ~]$ cd $ORACLE_HOME/dbs
[oracle@rico1 dbs]$ vim init+ASM1.ora
[oracle@rico1 dbs]$ cat !$
*.asm_diskstring='/dev/asm*'
Next, we can create the spfile:
SQL> create spfile='+GRID' from pfile;
And the password file:
[oracle@rico1 dbs]$ orapwd file=+GRID password=oracle asm=yes
We are now ready to restore the OCR. Remember to restore from the newest backup:
[root@rico1 ~]# ocrconfig -restore /u01/app/12.1.0/grid/cdata/rico-cluster/backup_20160610_101746.ocr
[root@rico1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Total space (kbytes) : 409568
Used space (kbytes) : 1460
Available space (kbytes) : 408108
Device/File Name : +GRID
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
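Picking the newest backup can be scripted when the files carry a timestamp in their name, like the backup_20160610_101746.ocr used above. A minimal sketch under that naming assumption; the function name `newest_ocr_backup` is hypothetical, and automatic backups with names that do not match this pattern would be ignored. `ocrconfig -showbackup` remains the tool that lists the backups the cluster itself knows about.

```shell
# Hypothetical helper: print the newest backup_YYYYMMDD_HHMMSS.ocr file
# found directly in the given directory.
# $1 = backup directory
newest_ocr_backup() {
    # The timestamp in the name sorts chronologically, so after a plain
    # lexical sort the last entry is the newest backup.
    find "$1" -maxdepth 1 -type f -name 'backup_*.ocr' | sort | tail -n 1
}
```

It could then feed the restore: `ocrconfig -restore "$(newest_ocr_backup /u01/app/12.1.0/grid/cdata/rico-cluster)"`.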
Now we can create a new voting disk:
[root@rico1 ~]# crsctl query css votedisk
Located 0 voting disk(s).
[root@rico1 ~]# crsctl replace votedisk +GRID
Successful addition of voting disk 62a6bea00e4e4f01bf3ed09c345eedba.
Successfully replaced voting disk group with +GRID.
CRS-4266: Voting file(s) successfully replaced
[root@rico1 ~]# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 62a6bea00e4e4f01bf3ed09c345eedba (/dev/asm_grid) [GRID]
Located 1 voting disk(s).
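If this step ends up in a script, the votedisk listing is easy to turn into a number. A small sketch, assuming the row layout shown above (an index such as `1.` followed by the state); the function name `online_votedisks` is hypothetical.

```shell
# Hypothetical helper: read "crsctl query css votedisk" output on stdin
# and print how many voting disks are reported as ONLINE.
online_votedisks() {
    # Data rows start with an index such as "1." followed by the state;
    # "n + 0" makes awk print 0 when no row matched.
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}
```

For example: `[ "$(crsctl query css votedisk | online_votedisks)" -ge 1 ] || echo "no online voting disk yet"`.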
Everything looks fine, so it’s time to stop CRS:
[root@rico1 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rico1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rico1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rico1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rico1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rico1'
CRS-2677: Stop of 'ora.evmd' on 'rico1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rico1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rico1'
CRS-2677: Stop of 'ora.mdnsd' on 'rico1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rico1' succeeded
CRS-2677: Stop of 'ora.asm' on 'rico1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rico1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rico1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rico1'
CRS-2677: Stop of 'ora.cssd' on 'rico1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rico1'
CRS-2677: Stop of 'ora.gipcd' on 'rico1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rico1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
And start it in normal mode:
[root@rico1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
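The start command returns as soon as OHAS is up, while the rest of the stack keeps starting in the background, so a scripted version of this step needs to wait. Below is a generic retry sketch in plain POSIX shell; the name `retry_until` is hypothetical, and which command to poll is left to you, because it is an assumption worth verifying on your version that a given crsctl check reflects stack health in its exit status.

```shell
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# $1 = maximum number of attempts, $2 = delay in seconds, rest = command to run
retry_until() {
    tries=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

A call might look like `retry_until 30 10 my_stack_check`, where `my_stack_check` is your own small function that, for instance, greps the output of `crsctl check crs` for the services you care about.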
And we’re done 🙂
[root@rico1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
ONLINE ONLINE rico1 STABLE
ONLINE ONLINE rico2 STABLE
ONLINE ONLINE rico1 STABLE
ONLINE ONLINE rico2 STABLE
ONLINE ONLINE rico1 STABLE
ONLINE ONLINE rico2 STABLE
ONLINE ONLINE rico1 Started,STABLE
ONLINE ONLINE rico2 Started,STABLE
ONLINE ONLINE rico1 STABLE
ONLINE ONLINE rico2 STABLE
ONLINE ONLINE rico1 STABLE
ONLINE ONLINE rico2 STABLE
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rico2 STABLE
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE rico1 STABLE
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE rico1 STABLE
1 ONLINE ONLINE rico2 169.254.26.98 10.0.0
1 ONLINE ONLINE rico1 STABLE
1 ONLINE ONLINE rico1 Open,STABLE
2 ONLINE ONLINE rico2 Open,STABLE
1 ONLINE OFFLINE STABLE
1 ONLINE ONLINE rico1 STABLE
1 ONLINE ONLINE rico1 STABLE
1 ONLINE ONLINE rico2 STABLE
1 ONLINE ONLINE rico2 STABLE
1 ONLINE ONLINE rico1 STABLE
1 ONLINE ONLINE rico1 STABLE
--------------------------------------------------------------------------------
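In a listing this long, a single row that is not ONLINE is easy to overlook. The sketch below filters the tabular output for resources whose target is ONLINE but whose state is something else (OFFLINE, INTERMEDIATE and so on). The function name `offline_resources` is hypothetical, and it assumes resource names start with `ora.` and sit on their own line above their rows, as in real `crsctl stat res -t` output.

```shell
# Hypothetical helper: read "crsctl stat res -t" output on stdin and print
# each ora.* resource that has a row with Target ONLINE but State not ONLINE.
offline_resources() {
    awk '
        $1 ~ /^ora\./ { name = $1; next }
        name != "" {
            # Cluster resource rows begin with an instance number,
            # local resource rows begin directly with the target.
            if ($1 ~ /^[0-9]+$/) { target = $2; state = $3 }
            else                 { target = $1; state = $2 }
            if (target == "ONLINE" && state != "" && state != "ONLINE") print name
        }
    ' | sort -u
}
```

Usage: `crsctl stat res -t | offline_resources`. Resources that are deliberately switched off (target OFFLINE) are not reported.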
The last step is to recreate -MGMTDB:
[root@rico1 ~]# srvctl remove mgmtdb
Remove the database _mgmtdb? (y/[n]) y
[root@rico1 ~]# su - oracle
[oracle@rico1 ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[oracle@rico1 ~]$ export GI_HOME=$ORACLE_HOME
[oracle@rico1 ~]$ dbca -silent -createDatabase -sid -MGMTDB -createAsContainerDatabase true -templateName MGMTSeed_Database.dbc -gdbName _mgmtdb -storageType ASM -diskGroupName +grid -datafileJarLocation $GI_HOME/assistants/dbca/templates -characterset AL32UTF8 -autoGeneratePasswords -skipUserTemplateCheck
Registering database with Oracle Grid Infrastructure
Creating and starting Oracle instance
Completing Database Creation
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/_mgmtdb/_mgmtdb3.log" for further details.
I hope you won’t have to use this procedure in real life 🙂