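BEIJING, Jan. 15, 2024 /PRNewswire/ -- Data is a foundational resource for social development. With the arrival of the digital economy, explosive growth in data volume brings convenience to users' lives and powers the intelligent transformation of enterprises. Storage plays an important role as the device that carries data: it must keep pace with the current zettabyte-scale (ZB) growth in global data volume while ensuring that data is stored, read, and written securely, reliably, efficiently, and accurately. A data center's "stable data storage capacity" depends on every storage node running smoothly. Improving the security and reliability of data storage, and avoiding data loss in unexpected scenarios, has become a major challenge in the development of storage hardware platforms.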
Storage backup power: the safeguard of data storage security
The storage system is powered by power supply units (PSUs) in an "X+X" redundant configuration and is additionally fitted with a battery backup unit (BBU). The storage system monitors PSU status in real time and anticipates faults; when the data center's mains power fails or a PSU module becomes abnormal, it switches seamlessly to BBU power. The BBU supplies sustained power so that the data in the storage controller's write cache is written completely and safely to non-volatile media (HDDs and SSDs), avoiding data loss. To preserve the business continuity of data storage, the storage system must also be able to resume service quickly once mains power is restored after an unexpected outage.
As data volumes multiply and storage workloads grow more complex, storage hardware platforms are moving toward higher density and higher performance, and traditional power supply and backup control strategies struggle to meet the stability requirements of storage systems. Upgrades to high-end storage platforms, from system architecture to component performance, raise overall system power. In normal operation, the total power consumption of the storage array is 2 times that of the previous generation. At the moment of power loss the controller rapidly reduces power consumption, yet total power consumption while backing up data is still 2 times that of the previous generation, so the number of cells in a single BBU must increase 2-fold to meet the power demand of data backup during an abnormal power loss. Because of power density limits, as power demand rises the PSU hold-up time after an abnormal power loss shrinks by 3/4, and the output start-up time of a high-power BBU increases by 3 times. A new power architecture is therefore needed, one that switches seamlessly to BBU power when PSU power becomes abnormal.
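The timing problem described above can be illustrated with a short sketch. The baseline values (20 ms hold-up, 4 ms start-up) and both function names are hypothetical and chosen only for illustration; the sketch also assumes that "shrinks by 3/4" means one quarter of the previous value and that "increases by 3 times" means three times the previous value.

```python
def scaled_timing(prev_holdup_ms, prev_startup_ms,
                  holdup_factor=0.25, startup_factor=3.0):
    """Scale previous-generation timings by the ratios quoted in the text.

    holdup_factor=0.25 models a hold-up time reduced by 3/4 (assumption).
    startup_factor=3.0 models a start-up time of 3x the old value (assumption).
    """
    return prev_holdup_ms * holdup_factor, prev_startup_ms * startup_factor


def has_supply_gap(holdup_ms, startup_ms):
    """True if the BBU output comes up only after the PSU hold-up has expired."""
    return startup_ms > holdup_ms


# Hypothetical previous-generation figures, not taken from the source.
prev_holdup_ms, prev_startup_ms = 20.0, 4.0
new_holdup_ms, new_startup_ms = scaled_timing(prev_holdup_ms, prev_startup_ms)

print(has_supply_gap(prev_holdup_ms, prev_startup_ms))
print(has_supply_gap(new_holdup_ms, new_startup_ms))
```

Under these assumed numbers the previous generation has margin, while the new generation would see the BBU start after the PSU hold-up has run out, which is the gap a seamless-switchover architecture has to close.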
Higher-performance high-end storage raises the challenge for backup power solutions
Inspur Information proposes a control scheme that combines seamless switchover in BBU cold-backup mode with "X+X" redundant power supply:
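1. The PSUs and BBUs both implement "X+X" redundant power supply. Over the three-year product life cycle, the BBUs in the redundant state meet the requirement of two power-loss data backups, and in the non-redundant state they meet the storage product's requirement of one power-loss data backup.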
2. In place of traditional BBU hot-backup power supply, the scheme adopts an innovative cold-backup seamless-switchover control strategy for the BBU. While meeting the high-reliability requirements of the storage system, it extends battery service life, reduces BBU hot-backup energy consumption and the number of scrapped batteries, and lessens environmental pollution.
Inspur Information's intelligent backup power control scheme is built around a bidirectional charge/discharge control circuit. When the BBU needs charging, the charge/discharge module operates in BUCK charging mode to charge the BBU. When the BBU's discharge voltage falls below a certain threshold during data backup, the module operates in BOOST step-up discharge mode and holds the output voltage constant. A bypass discharge path plus an OR-ing (wired-OR) control circuit ensures that the BBU does not power the system while the PSU is working normally, and that supply switches seamlessly to the BBU when the PSU becomes abnormal. Because the BBU discharge module does not need to run while the PSU powers the system, BBU hot-backup energy consumption is reduced.
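The mode logic of the bidirectional circuit can be sketched as a small decision function. This is a minimal illustration only: the `Mode` names, the `select_mode` function, and the 11.0 V boost threshold are hypothetical, and the real selection is performed by hardware and firmware not described in this release.

```python
from enum import Enum


class Mode(Enum):
    IDLE = "idle"                          # PSU healthy, BBU in cold standby
    BUCK_CHARGE = "buck_charge"            # PSU healthy, BBU being charged
    BYPASS_DISCHARGE = "bypass_discharge"  # PSU abnormal, BBU feeds the bus directly
    BOOST_DISCHARGE = "boost_discharge"    # PSU abnormal, BBU voltage stepped up


def select_mode(psu_ok: bool, bbu_voltage: float, needs_charge: bool,
                boost_threshold: float = 11.0) -> Mode:
    """Pick the converter mode; boost_threshold is an assumed example value."""
    if psu_ok:
        # The discharge path stays off while the PSU supplies the system.
        return Mode.BUCK_CHARGE if needs_charge else Mode.IDLE
    if bbu_voltage < boost_threshold:
        # Battery voltage has sagged: step it up to keep the output constant.
        return Mode.BOOST_DISCHARGE
    return Mode.BYPASS_DISCHARGE


print(select_mode(psu_ok=False, bbu_voltage=10.5, needs_charge=False))
```

Keeping `IDLE` as a distinct state reflects the cold-backup idea in the text: nothing on the discharge side runs until the PSU actually fails.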
An intelligent control algorithm selects among charging control modes (pre-charge, CC, CV) according to the usage scenario to manage BBU charging and discharging. This keeps the BBU output at constant voltage while it supplies power, extends battery service life, and reduces both the number of scrapped BBU batteries and environmental pollution. An intelligent PID control algorithm that combines frequency modulation with amplitude modulation improves the accuracy of charge and discharge control. Different control algorithms are selected according to load size to improve conversion efficiency, saving energy and reducing emissions. BBU power supply is upgraded from a single node to 1+1 redundancy, with fully digital control algorithms and optimized algorithms for power-link detection and backup-capability assessment, eliminating data-loss hazards and improving the stability and reliability of the power supply.
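As a rough illustration of the three charging modes named above, the sketch below uses the conventional lithium-ion staging by cell voltage and charge current. The release does not state how its algorithm chooses a mode, so the `charge_stage` function and every threshold (3.0 V, 4.2 V, 0.1 A) are assumptions for illustration only.

```python
def charge_stage(cell_v: float, charge_current_a: float,
                 precharge_below_v: float = 3.0,
                 cv_at_v: float = 4.2,
                 cutoff_a: float = 0.1) -> str:
    """Return the charging stage for one cell (all thresholds are assumed)."""
    if cell_v < precharge_below_v:
        return "precharge"   # deeply discharged cell: small trickle current
    if cell_v < cv_at_v:
        return "cc"          # constant current until the voltage limit
    if charge_current_a > cutoff_a:
        return "cv"          # hold the voltage while the current tapers
    return "done"            # current has tapered below cutoff: stop charging


for cell_v, amps in [(2.5, 0.0), (3.7, 2.0), (4.2, 0.5), (4.2, 0.05)]:
    print(cell_v, amps, charge_stage(cell_v, amps))
```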
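The storage system is key to a smooth backup-power process, so its management of the BBU units is critical. Based on the status information from BBU self-diagnosis, it performs intelligent backup-power status monitoring and handling in the following five areas, and optimizes the traditional monitoring and handling algorithms: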
1) Periodically probe the BBU power link by simulating the storage system's power switchover process, so that hazards are identified in advance if the link is abnormal; the storage system does not power down when a hazard is found;
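2) Periodically assess the backup capability of each BBU unit, using deep discharge to improve assessment accuracy, to determine whether it meets the storage system's backup power demand while also eliminating accumulated BMS sampling error;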
3) Read the BBU unit's voltage and current, cell voltage and temperature, and charge/discharge MOSFET temperature in real time; when a value approaches the BMS built-in threshold, attempt repair, and raise an alarm if repair is not possible;
4) Automatically monitor the BBU's backup capability during charging, check whether it meets the storage system's demand for one backup, and update the BBU status in real time;
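5) Monitor the status register values of the BBU unit's BMS in real time; when an anomaly appears, attempt intelligent repair and raise a timely alarm if repair is not possible; in BBU non-redundant mode, enter the data backup exception handling flow.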
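The "approach threshold, attempt repair, otherwise alarm" pattern that recurs in the list above can be sketched as follows. The `Limit` type, the 5% margin, and the `try_repair` callback are hypothetical stand-ins; the actual BMS thresholds and repair actions are not given in this release.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Limit:
    low: float
    high: float


def near_limit(value: float, limit: Limit, margin: float = 0.05) -> bool:
    """True when value lies within `margin` of the range width from either bound."""
    band = margin * (limit.high - limit.low)
    return value <= limit.low + band or value >= limit.high - band


def handle_reading(name: str, value: float, limit: Limit,
                   try_repair: Callable[[str], bool]) -> str:
    """Classify one telemetry reading as 'ok', 'repaired', or 'alarm'."""
    if not near_limit(value, limit):
        return "ok"
    # Approaching a built-in threshold: try to recover before alarming.
    return "repaired" if try_repair(name) else "alarm"


cell_temp_limit = Limit(low=0.0, high=60.0)  # assumed example range in deg C
print(handle_reading("cell_temp", 30.0, cell_temp_limit, lambda n: True))
print(handle_reading("cell_temp", 59.0, cell_temp_limit, lambda n: False))
```

Passing the repair action in as a callback keeps the classification logic independent of whatever recovery steps a given platform supports.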
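Inspur Information's intelligent BBU status diagnosis identifies power supply hazards in advance, raising the identification rate of potential anomalies 5-fold and eliminating the risk of data loss. Once diagnosis is complete, the storage system intelligently analyzes the logs to pinpoint the root cause, such as a BBU cell fault, a control module fault, a discharge link fault, or a system cooling fault.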
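Guided by its philosophy of "ultimate storage, intelligence in data", Inspur Information focuses on innovative research and development of the underlying hardware for storage platforms. By implementing intelligent backup power control strategies at the source, it makes full use of the hardware platform's strengths in data backup so that data is stored efficiently and reliably, building storage platforms that are secure and reliable, cost-effective, and easy to use and manage.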