ORACLE RAC 12c: restore OCR and VOTEDISK


14.06.2016
by Kamil Stawiarski

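Let’s fuck up the cluster by zeroing the first 256 MB of /dev/asm_grid, the only disk of the GRID diskgroup, which holds the OCR, the voting file, and the ASM spfile and password file: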

[root@rico1 ~]# dd if=/dev/zero of=/dev/asm_grid bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 0.521837 s, 514 MB/s

Of course, after this operation the lower-stack (-init) resources on rico1 can end up in a state like this:

[root@rico1 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown,ST
                                                             ABLE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE                               STABLE
ora.crf
      1        ONLINE  OFFLINE                               STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  OFFLINE      rico1                    STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       rico1                    STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE rico1                    STABLE
ora.gipcd
      1        ONLINE  ONLINE       rico1                    STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rico1                    STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rico1                    STABLE
ora.storage
      1        ONLINE  OFFLINE                               STABLE
--------------------------------------------------------------------------------
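In a longer listing it is easy to miss a resource that is not where it should be. The sketch below is a hypothetical helper (not an Oracle tool; the name not_online is made up here) that reads crsctl stat res -t output on stdin and prints every resource whose Target is ONLINE but whose State is something else. It assumes the default tabular layout shown above: resource names start in the first column, cluster resources carry an instance number, and wrapped State details continue on their own lines.

```shell
# Hypothetical helper: reads "crsctl stat res -t" (or "-t -init") output on
# stdin and prints "<resource> <state>" for each resource whose Target is
# ONLINE but whose State is not ONLINE (first such instance only).
not_online() {
  awk '
    /^-+$/ || /^Name / || /^(Local|Cluster) Resources/ { next }  # table decoration
    /^[^ ]/ { res = $1; next }          # resource names start in column 1
    {
      i = ($1 ~ /^[0-9]+$/) ? 2 : 1     # cluster resources have an instance number
      if ($i == "ONLINE" && $(i + 1) != "ONLINE" && !seen[res]++)
        print res, $(i + 1)
    }
  '
}
```

Usage, e.g.: crsctl stat res -t -init | not_online. For the listing above it should report ora.asm, ora.cluster_interconnect.haip, ora.crf, ora.crsd, ora.cssd, ora.ctssd, ora.evmd and ora.storage; ora.diskmon is skipped because its Target is OFFLINE, so it is not expected to run.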

The ora.cssd resource is stuck in STARTING because CSSD cannot find any voting files:

[root@rico1 ~]# tail -10 /u01/app/oracle/diag/crs/rico1/crs/trace/ocssd.trc
2016-06-10 10:28:32.331227 :    CSSD:990865152: clssnmvDiskVerify: Successful discovery of 0 disks
2016-06-10 10:28:32.331229 :    CSSD:990865152: clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2016-06-10 10:28:32.331231 :    CSSD:990865152: clssnmvFindInitialConfigs: No voting files found
2016-06-10 10:28:32.331302 :    CSSD:990865152: (:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2016-06-10 10:28:33.270863 :    CSSD:1279616768: clsssc_CLSFAInit_CB: System not ready for CLSFA initialization
2016-06-10 10:28:33.270876 :    CSSD:1279616768: clsssc_CLSFAInit_CB: clsfa fencing not ready yet
2016-06-10 10:28:34.271252 :    CSSD:1279616768: clsssc_CLSFAInit_CB: System not ready for CLSFA initialization
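If you want to detect this condition from a script, one minimal approach is to grep the end of ocssd.trc for the messages shown above. The function name no_voting_files and the default 50-line window are arbitrary choices for this sketch, not anything Oracle defines.

```shell
# Hypothetical helper: succeeds (exit 0) when the last lines of an ocssd.trc
# file contain the "no voting files" messages seen in the trace above.
# $1 = path to ocssd.trc, $2 = optional number of lines to inspect (default 50)
no_voting_files() {
  tail -n "${2:-50}" "$1" |
    grep -q -e 'No voting files found' -e 'CSSNM00070'
}
```

Usage, e.g.: no_voting_files /u01/app/oracle/diag/crs/rico1/crs/trace/ocssd.trc && echo 'CSSD cannot find any voting files'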

OK, so let’s try to stop the cluster services:

[root@rico1 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rico1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rico1'
CRS-2677: Stop of 'ora.mdnsd' on 'rico1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rico1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rico1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rico1'
CRS-2677: Stop of 'ora.gipcd' on 'rico1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rico1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rico1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rico1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Now we have to start the stack in exclusive mode without CRSD (-excl -nocrs) and start fixing things:

[root@rico1 ~]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'rico1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'rico1'
CRS-2676: Start of 'ora.mdnsd' on 'rico1' succeeded
CRS-2676: Start of 'ora.evmd' on 'rico1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rico1'
CRS-2676: Start of 'ora.gpnpd' on 'rico1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rico1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rico1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rico1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rico1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rico1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rico1'
CRS-2676: Start of 'ora.diskmon' on 'rico1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rico1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rico1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rico1'
CRS-2676: Start of 'ora.ctssd' on 'rico1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rico1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rico1'
CRS-2676: Start of 'ora.asm' on 'rico1' succeeded
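Before moving on, it is worth making sure that every resource the stack tried to start actually reported success. A hypothetical helper can do this by pairing the "Attempting to start" (CRS-2672) and "Start of ... succeeded" (CRS-2676) messages, and likewise CRS-2673/CRS-2677 for stop operations as in the shutdown output earlier. The function name is made up, and only the message codes visible in this post are relied on.

```shell
# Hypothetical helper: reads crsctl start/stop output on stdin and prints
# "<resource> on <node>" for every resource that was attempted (CRS-2672 start,
# CRS-2673 stop) but never reported success (CRS-2676 / CRS-2677).
# Any other outcome, e.g. a failure message, leaves the resource listed.
# Prints nothing when every attempted action succeeded.
unfinished_actions() {
  awk -F"'" '
    /^CRS-267[23]:/ { pending[$2 " on " $4] = 1 }      # Attempting to start/stop
    /^CRS-267[67]:/ { delete pending[$2 " on " $4] }   # Start/Stop ... succeeded
    END { for (r in pending) print r }
  ' | sort
}
```

Usage, e.g.: crsctl start crs -excl -nocrs 2>&1 | unfinished_actions. For the output above it should print nothing.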

Of course, in this situation kfed will not be helpful 🙂 (kfed repair restores the disk header from its backup copy, and dd has zeroed that copy as well):

[root@rico1 ~]# kfed repair /dev/asm_grid
KFED-00320: Invalid block num1 = [0], num2 = [1], error = [endian_kfbh]
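To confirm that the start of the disk really is nothing but zeros, you can compare it byte by byte with /dev/zero. Assuming the default 1 MB allocation unit, ASM keeps the backup disk header in the second allocation unit, so checking the first 2 MB covers both the primary header and its backup. The helper name is hypothetical, and it relies on the -n (byte limit) option of cmp from GNU diffutils.

```shell
# Hypothetical helper: succeeds when the first N bytes of a file or block
# device are all zero bytes, by comparing them with /dev/zero.
# $1 = file or block device, $2 = number of bytes to compare
is_zeroed() {
  cmp -s -n "$2" "$1" /dev/zero
}
```

Usage, e.g. as root: is_zeroed /dev/asm_grid $((2 * 1024 * 1024)) && echo 'header and backup header area are zeroed'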

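So now we have to recreate the GRID diskgroup, the ASM spfile and the ASM password file. First, we set the disk discovery string and the diskgroups to mount in the running ASM instance, and check the disk headers: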

SQL> alter system
  2  set asm_diskstring='/dev/asm*';

SQL> ed
Wrote file afiedt.buf

  1  alter system
  2* set asm_diskgroups='GRID','DATA'
SQL> /

System altered.

SQL> ;
  1* select path, header_status from v$asm_disk
SQL> /

PATH                   HEADER_STATU
------------------------------ ------------
/dev/asm_data              MEMBER
/dev/asm_grid              CANDIDATE

The zeroed /dev/asm_grid disk now shows up as CANDIDATE, so let’s recreate our GRID diskgroup on it:

SQL> ed
Wrote file afiedt.buf

  1  create diskgroup grid
  2  external redundancy
  3  disk '/dev/asm_grid'
  4  attribute 'compatible.asm'='12.1.0.2',
  5*        'compatible.rdbms'='12.1.0.2'
SQL> /

Diskgroup created.

Now we have to recreate the ASM spfile. The first step is to create a simple pfile:

[oracle@rico1 ~]$ cd $ORACLE_HOME/dbs
[oracle@rico1 dbs]$ vim init+ASM1.ora
[oracle@rico1 dbs]$ cat !$
cat init+ASM1.ora
*.asm_diskgroups='GRID'
*.asm_diskgroups='DATA'
*.asm_diskstring='/dev/asm*'
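If you prefer not to edit the file interactively, the same pfile can be written with a here-document. This sketch uses a hypothetical function name and lists both diskgroups in a single asm_diskgroups entry, which is equivalent to the two consecutive lines above.

```shell
# Hypothetical helper: writes a minimal ASM pfile.
# $1 = target directory (in this post: $ORACLE_HOME/dbs), $2 = ASM SID (e.g. +ASM1)
# The quoted 'EOF' delimiter keeps the body literal (no $ or backquote expansion).
write_asm_pfile() {
  cat > "$1/init$2.ora" <<'EOF'
*.asm_diskgroups='GRID','DATA'
*.asm_diskstring='/dev/asm*'
EOF
}
```

Usage, e.g.: write_asm_pfile "$ORACLE_HOME/dbs" +ASM1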

Next, we can create the spfile from it:

SQL> create spfile='+GRID' from pfile;

File created.

SQL> !rm init+ASM1.ora

And the password file:

[oracle@rico1 dbs]$ orapwd file=+GRID password=oracle asm=yes

We are now ready to restore the OCR. Remember to restore the newest backup:

[root@rico1 ~]# ocrconfig -restore /u01/app/12.1.0/grid/cdata/rico-cluster/backup_20160610_101746.ocr
[root@rico1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
     Version                  :          4
     Total space (kbytes)     :     409568
     Used space (kbytes)      :       1460
     Available space (kbytes) :     408108
     ID                       :  115130541
     Device/File Name         :      +GRID
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

     Cluster registry integrity check succeeded

     Logical corruption check succeeded
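Picking the newest backup by hand is error-prone when a directory holds many files. The sketch below is a hypothetical helper that reads backup paths on stdin and prints the newest one. It assumes the backup_YYYYMMDD_HHMMSS.ocr naming of the file restored above, for which sorting by name is the same as sorting by time; files named differently are simply ignored.

```shell
# Hypothetical helper: reads OCR backup paths (one per line) on stdin and
# prints the newest one. Only names of the form backup_YYYYMMDD_HHMMSS.ocr
# are considered. Each path is prefixed with its timestamped file name so
# that sort orders by time regardless of the directory part.
newest_ocr_backup() {
  grep -E 'backup_[0-9]{8}_[0-9]{6}\.ocr$' |
    sed -E 's/^(.*(backup_[0-9]{8}_[0-9]{6})\.ocr)$/\2 \1/' |
    sort | tail -n 1 | cut -d ' ' -f 2-
}
```

Usage, e.g.: find /u01/app/12.1.0/grid/cdata/rico-cluster -name 'backup_*.ocr' | newest_ocr_backup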

Now we can create a new voting disk in +GRID:

[root@rico1 ~]# crsctl query css votedisk
Located 0 voting disk(s).
[root@rico1 ~]# crsctl replace votedisk +GRID
Successful addition of voting disk 62a6bea00e4e4f01bf3ed09c345eedba.
Successfully replaced voting disk group with +GRID.
CRS-4266: Voting file(s) successfully replaced
[root@rico1 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   62a6bea00e4e4f01bf3ed09c345eedba (/dev/asm_grid) [GRID]
Located 1 voting disk(s).
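To check the result from a script rather than by eye, a hypothetical helper can count the voting files that crsctl query css votedisk reports as ONLINE. It assumes the output layout shown above, where each voting file line starts with a number followed by a dot.

```shell
# Hypothetical helper: reads "crsctl query css votedisk" output on stdin and
# prints how many voting files are listed with STATE = ONLINE.
online_votedisks() {
  awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}
```

Usage, e.g.: [ "$(crsctl query css votedisk | online_votedisks)" -ge 1 ] || echo 'no ONLINE voting file'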

So it seems that everything looks fine. It’s time to stop CRS:

[root@rico1 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rico1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rico1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rico1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rico1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rico1'
CRS-2677: Stop of 'ora.evmd' on 'rico1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rico1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rico1'
CRS-2677: Stop of 'ora.mdnsd' on 'rico1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rico1' succeeded
CRS-2677: Stop of 'ora.asm' on 'rico1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rico1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rico1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rico1'
CRS-2677: Stop of 'ora.cssd' on 'rico1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rico1'
CRS-2677: Stop of 'ora.gipcd' on 'rico1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rico1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

And start it again in normal mode:

[root@rico1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

And we’re almost done 🙂

[root@rico1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rico1                    STABLE
               ONLINE  ONLINE       rico2                    STABLE
ora.GRID.dg
               ONLINE  ONLINE       rico1                    STABLE
               ONLINE  ONLINE       rico2                    STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rico1                    STABLE
               ONLINE  ONLINE       rico2                    STABLE
ora.asm
               ONLINE  ONLINE       rico1                    Started,STABLE
               ONLINE  ONLINE       rico2                    Started,STABLE
ora.net1.network
               ONLINE  ONLINE       rico1                    STABLE
               ONLINE  ONLINE       rico2                    STABLE
ora.ons
               ONLINE  ONLINE       rico1                    STABLE
               ONLINE  ONLINE       rico2                    STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rico2                    STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rico1                    STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rico1                    STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rico2                    169.254.26.98 10.0.0
                                                             .12,STABLE
ora.cvu
      1        ONLINE  ONLINE       rico1                    STABLE
ora.dupa.db
      1        ONLINE  ONLINE       rico1                    Open,STABLE
      2        ONLINE  ONLINE       rico2                    Open,STABLE
ora.mgmtdb
      1        ONLINE  OFFLINE                               STABLE
ora.oc4j
      1        ONLINE  ONLINE       rico1                    STABLE
ora.rico1.vip
      1        ONLINE  ONLINE       rico1                    STABLE
ora.rico2.vip
      1        ONLINE  ONLINE       rico2                    STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rico2                    STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       rico1                    STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       rico1                    STABLE
--------------------------------------------------------------------------------

The last step is to recreate the -MGMTDB database (the Grid Infrastructure Management Repository), because ora.mgmtdb is still OFFLINE:

[root@rico1 ~]# srvctl remove mgmtdb
Remove the database _mgmtdb? (y/[n]) y
[root@rico1 ~]# su - oracle
[oracle@rico1 ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[oracle@rico1 ~]$ export GI_HOME=$ORACLE_HOME
[oracle@rico1 ~]$ dbca -silent -createDatabase -sid -MGMTDB -createAsContainerDatabase true -templateName MGMTSeed_Database.dbc -gdbName _mgmtdb -storageType ASM -diskGroupName +grid -datafileJarLocation $GI_HOME/assistants/dbca/templates -characterset AL32UTF8 -autoGeneratePasswords -skipUserTemplateCheck
Registering database with Oracle Grid Infrastructure
5% complete
Copying database files
7% complete
9% complete
16% complete
23% complete
30% complete
41% complete
Creating and starting Oracle instance
43% complete
48% complete
49% complete
50% complete
55% complete
60% complete
61% complete
64% complete
Completing Database Creation
68% complete
79% complete
89% complete
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/_mgmtdb/_mgmtdb3.log" for further details.

I hope you won’t have to use this procedure in real life 🙂


Contact us

Database Whisperers sp. z o. o. sp. k.
al. Jerozolimskie 200, 3rd floor, room 342
02-486 Warszawa
NIP: 5272744987
REGON:362524978
+48 508 943 051
+48 661 966 009
info@ora-600.pl
