Oracle Cluster Heartbeats and the misscount/disktimeout/reboottime Parameters



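1. OCSSD and CSS
OCSSD is a Linux or Unix process that manages and provides Cluster Synchronization Services (CSS). It runs as the oracle user and provides node membership management. If the process fails, the node reboots. CSS provides two heartbeat mechanisms: a network heartbeat and a disk heartbeat. Each has a maximum tolerated delay. For the network heartbeat it is called MC (misscount); for the disk heartbeat it is called IOT (I/O timeout), which is set through the disktimeout parameter.

Both parameters are expressed in seconds. By default, misscount < disktimeout.

The two heartbeat mechanisms are described below.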

2. Network heartbeat
As the name suggests, the network heartbeat checks the state of the nodes over the private network. If a hardware or software fault on the private network prevents the cluster nodes from communicating over it for a certain period, the result is split brain. Because the cluster uses shared storage, the failed node must then be isolated from the cluster to avoid data corruption. The network heartbeat works as follows:
    Every second, a sending thread in ocssd.bin sends a network TCP heartbeat to itself and to all other nodes. A receiving thread in ocssd.bin receives the heartbeats.
    If a network packet is dropped or corrupted, the TCP error-correction mechanism retransmits it.
    Oracle does not retransmit. In ocssd.log you will see a WARNING message about a missing heartbeat if a node has not received a heartbeat from another node for 15 seconds (50% of misscount). Another warning is reported in ocssd.log if the same node is still missing at 22 seconds (75% of misscount), and another at 27 seconds (90% of misscount). When the heartbeat has been missing for 100% of misscount, that is 30 seconds, the node is evicted.
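
The warning schedule quoted above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Oracle's implementation: the function names are hypothetical, and truncating the percentages to whole seconds is an assumption chosen because it reproduces the 15/22/27/30 figures given for misscount = 30.

```python
def heartbeat_warning_thresholds(misscount: int) -> dict:
    """Map each percentage of misscount to a number of seconds.

    Integer division truncates, which matches the 15/22/27 second
    figures quoted for misscount = 30 (an assumption, see lead-in).
    """
    return {pct: misscount * pct // 100 for pct in (50, 75, 90, 100)}


def classify_missed_heartbeat(seconds_missed: int, misscount: int = 30) -> str:
    """Describe what happens after `seconds_missed` seconds without a heartbeat."""
    thresholds = heartbeat_warning_thresholds(misscount)
    if seconds_missed >= thresholds[100]:
        return "evict"
    # Check the highest warning level first so the most severe one wins.
    for pct in (90, 75, 50):
        if seconds_missed >= thresholds[pct]:
            return f"warning ({pct}% of misscount)"
    return "ok"


if __name__ == "__main__":
    print(heartbeat_warning_thresholds(30))
    for seconds in (10, 15, 22, 27, 30):
        print(seconds, classify_missed_heartbeat(seconds))
```

Changing misscount shifts every warning point proportionally, so a larger value delays the eviction as well as the warnings that precede it.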

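   This maximum network heartbeat delay is called misscount. It can be queried and changed with the crsctl tool.
   [grid@Linux-01 ~]$ crsctl get css misscount
   CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.

   The result above means that if the network heartbeat between cluster nodes over the interconnect is missing for more than 30 s, Oracle considers that split brain has occurred and the failed node must be evicted from the cluster.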

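How is the failed node identified? Oracle decides with a voting algorithm. The following example of the algorithm is adapted from the book 大話Oracle RAC.

The nodes in a cluster use the heartbeat to report their health to one another. Suppose each report received from a node counts as one vote. In a three-node cluster running normally, every node holds 3 votes. If node A's heartbeat fails while node A itself keeps running, the cluster splits into two partitions: node A on its own, and the remaining two nodes together. One partition must then be removed to keep the cluster healthy. In this three-node cluster, after A's heartbeat fails, B and C form a partition with 2 votes, while A has only 1 vote. Under the voting algorithm, the partition formed by B and C takes control and A is evicted.

With only two nodes the voting algorithm breaks down, because each node holds just 1 vote. A third device has to be introduced: the quorum device. It is usually a shared disk, also called the quorum disk, and it represents one more vote. When the heartbeat between the two nodes fails, both nodes race to claim the quorum disk's vote, and the request that arrives first is served first. The node that claims the quorum disk first therefore holds 2 votes, and the other node is evicted.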
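
The vote counting described above can be sketched as follows. This is a simplified model of the textbook description, not Clusterware's actual code: the function name `surviving_partition` and the `quorum_disk_winner` argument (the node assumed to have claimed the quorum disk first) are hypothetical.

```python
from typing import List, Optional, Set


def surviving_partition(partitions: List[Set[str]],
                        quorum_disk_winner: Optional[str] = None) -> Set[str]:
    """Return the partition that keeps control of the cluster.

    Every node contributes one vote. If a quorum disk was claimed, the
    partition containing the claiming node gets one extra vote.
    """
    votes = []
    for partition in partitions:
        count = len(partition)
        if quorum_disk_winner in partition:
            count += 1  # the quorum disk's vote
        votes.append(count)

    best = max(votes)
    winners = [p for p, v in zip(partitions, votes) if v == best]
    if len(winners) != 1:
        # Two nodes with one vote each and no quorum disk: no decision possible.
        raise ValueError("tie: no partition holds more votes than the others")
    return winners[0]


if __name__ == "__main__":
    # Three nodes, A loses its heartbeat: B and C win with 2 votes to 1.
    print(surviving_partition([{"A"}, {"B", "C"}]))
    # Two nodes, B reaches the quorum disk first: B wins with 2 votes to 1.
    print(surviving_partition([{"A"}, {"B"}], quorum_disk_winner="B"))
```

Calling the function with two single-node partitions and no quorum disk raises the tie error, which is the case the quorum disk is introduced to resolve.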

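Before 11gR2, a node that had been isolated was normally rebooted. In 11gR2, Clusterware first tries to stop all resources on the node and to clean up the failed components, that is, to restart them. Only if that cleanup does not succeed is the node rebooted to force the cleanup.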

3. Disk heartbeat
A thread in ocssd.bin updates the voting disks every second.
If a node does not update the voting disks for 200 seconds, it is evicted.
However, ocssd.bin on the local node has logic to bring the node down if it gets I/O errors on the majority of the voting disks. Also, a CRS reconfiguration happens when the missed heartbeat time reaches 27 seconds, and the local node is rebooted. As a result, you rarely see an eviction caused by voting disk failure on 10.2.0.4 (it is more common on 10.2.0.1), because ocssd.bin aborts the node before another node can evict it if writing to the voting disk is the problem.
As described above, every node updates the voting disks once per second. The shared voting disks are used to check the disk heartbeat.

If the ocssd process takes longer than 200 s, the value set by disktimeout, to update a voting disk, Oracle considers that voting disk offline and writes a voting-disk-offline record to the Clusterware alert log. If the number of offline voting disks on the node is smaller than the number of online voting disks, the node survives. If the number of offline voting disks is greater than or equal to the number of online ones, Clusterware treats the disk heartbeat as failed, evicts the node from the cluster, and runs the automatic recovery process.

For example, take 3 voting disks. If one voting disk goes offline on node A, then offline disks (1) < online disks (2): Clusterware writes an offline record to the alert log but takes no action. If 2 or more voting disks go offline on the node, then offline disks (2) > online disks (1), and node A is evicted from the cluster.
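
The survival rule in the two paragraphs above reduces to one comparison. The sketch below is a minimal illustration under that reading; the function name `node_survives` is hypothetical and the code does not query any real voting disk.

```python
def node_survives(total_voting_disks: int, offline_voting_disks: int) -> bool:
    """Apply the rule above: a node survives only while it sees fewer
    offline voting disks than online ones."""
    if not 0 <= offline_voting_disks <= total_voting_disks:
        raise ValueError("offline count must be between 0 and the total")
    online_voting_disks = total_voting_disks - offline_voting_disks
    return offline_voting_disks < online_voting_disks


if __name__ == "__main__":
    for offline in range(4):
        print(f"3 voting disks, {offline} offline -> survives: "
              f"{node_survives(3, offline)}")
```

With an even number of voting disks, losing exactly half already fails the comparison, so by this rule a fourth voting disk tolerates no more failures than three do.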

4. The reboottime parameter
    Also note the reboottime parameter, which is important as well. Its default value is 3 s.
    Default 3 seconds: the amount of time allowed for a node to complete a reboot
    after the CSS daemon has been evicted.
    crsctl get css reboottime

5. Adjusting the heartbeat parameters
1) Procedure for versions 10.2.0.2 to 11.1.0.7
    a) Shut down CRS on all but one node. For exact steps use note 309542.1
    b) Execute crsctl as root to modify the misscount:
       $CRS_HOME/bin/crsctl set css misscount <n>    #### where <n> is the maximum private network latency in seconds
       $CRS_HOME/bin/crsctl set css reboottime <r> [-force] #### (<r> is seconds)
       $CRS_HOME/bin/crsctl set css disktimeout <d> [-force] #### (<d> is seconds)
    c) Reboot the node where adjustment was made
    d) Start all other nodes that were shut down in step a)
    e) Execute crsctl as root to confirm the change:
       $CRS_HOME/bin/crsctl get css misscount
       $CRS_HOME/bin/crsctl get css reboottime
       $CRS_HOME/bin/crsctl get css disktimeout

2) Procedure for 11gR2
With 11gR2, these settings can be changed online without taking any node down:

    a) Execute crsctl as root to modify the misscount:
       $CRS_HOME/bin/crsctl set css misscount <n>    #### where <n> is the maximum private network latency in seconds
       $CRS_HOME/bin/crsctl set css reboottime <r> [-force] #### (<r> is seconds)
       $CRS_HOME/bin/crsctl set css disktimeout <d> [-force] #### (<d> is seconds)
    b) Execute crsctl as root to confirm the change:
       $CRS_HOME/bin/crsctl get css misscount
       $CRS_HOME/bin/crsctl get css reboottime
       $CRS_HOME/bin/crsctl get css disktimeout
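
After a change, the three `crsctl get css` results can be checked together. The sketch below parses saved output text rather than running crsctl. It assumes that reboottime and disktimeout are reported in the same CRS-4678 message format shown earlier for misscount, which this article does not confirm, and the function names are hypothetical. The sample values 30, 200 and 3 are the defaults mentioned in the text.

```python
import re

# Message format taken from the misscount example in this article;
# using it for the other two parameters is an assumption.
CSS_GET_PATTERN = re.compile(
    r"CRS-4678: Successful get (\w+) (\d+) for Cluster Synchronization Services\."
)

EXPECTED = ("misscount", "disktimeout", "reboottime")


def parse_css_settings(output: str) -> dict:
    """Extract {parameter: seconds} from saved `crsctl get css ...` output."""
    return {name: int(value) for name, value in CSS_GET_PATTERN.findall(output)}


def check_css_settings(settings: dict) -> list:
    """Return a list of human-readable problems (empty when all looks sane)."""
    problems = [f"{name} not found in output" for name in EXPECTED
                if name not in settings]
    if {"misscount", "disktimeout"} <= settings.keys():
        # The text notes that misscount < disktimeout holds by default.
        if settings["misscount"] >= settings["disktimeout"]:
            problems.append("misscount is not smaller than disktimeout")
    return problems


if __name__ == "__main__":
    sample = """\
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
CRS-4678: Successful get reboottime 3 for Cluster Synchronization Services.
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.
"""
    settings = parse_css_settings(sample)
    print(settings)
    print(check_css_settings(settings) or "no problems found")
```

The misscount < disktimeout comparison is only a sanity check against the default relationship stated earlier, not a rule that crsctl itself is known to enforce.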
