The design of HBase MVCC, and a scan problem caused by MVCC

ZHBMcCoy

2013-05-31


Recently, while using HBase 0.94, I occasionally saw the warning "HRegionInfo was null or empty in Meta":

java.io.IOException: HRegionInfo was null or empty in Meta for writetest, row=lot_let,9399239430349923234234,99999999999999
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:170)

The client-side implementation of MetaScanner.metaScan contains:

metaTable = new HTable(configuration, HConstants.META_TABLE_NAME);
Result startRowResult = metaTable.getRowOrBefore(searchRow, HConstants.CATALOG_FAMILY);
if (startRowResult == null) {
  throw new TableNotFoundException("Cannot find row in .META. for table: " +
      Bytes.toString(tableName) + ", row=" + Bytes.toStringBinary(searchRow));
}
byte[] value = startRowResult.getValue(HConstants.CATALOG_FAMILY,
    HConstants.REGIONINFO_QUALIFIER);
if (value == null || value.length == 0) {
  throw new IOException("HRegionInfo was null or empty in Meta for " +
      Bytes.toString(tableName) + ", row=" + Bytes.toStringBinary(searchRow));
}

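So during the MetaScanner scan, the range that contains the row key appears not to exist in the Meta table. Following the RPC leads to the server-side implementation.

In HRegion: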

public Result getClosestRowBefore(final byte[] row, final byte[] family)
    throws IOException {
  if (coprocessorHost != null) {
    Result result = new Result();
    if (coprocessorHost.preGetClosestRowBefore(row, family, result)) {
      return result;
    }
  }
  // look across all the HStores for this region and determine what the
  // closest key is across all column families, since the data may be sparse
  checkRow(row, "getClosestRowBefore");
  startRegionOperation();
  this.readRequestsCount.increment();
  try {
    Store store = getStore(family);
    // get the closest key. (HStore.getRowKeyAtOrBefore can return null)
    KeyValue key = store.getRowKeyAtOrBefore(row);
    Result result = null;
    if (key != null) {
      Get get = new Get(key.getRow());
      get.addFamily(family);
      result = get(get, null);
    }
    if (coprocessorHost != null) {
      coprocessorHost.postGetClosestRowBefore(row, family, result);
    }
    return result;
  } finally {
    closeRegionOperation();
  }
}

The call KeyValue key = store.getRowKeyAtOrBefore(row); did obtain a row key from the Meta table, but the code that follows it:

if (key != null) {
  Get get = new Get(key.getRow());
  get.addFamily(family);
  result = get(get, null);
}

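The result obtained here was empty, and that caused the problem.

Why does this happen?

First, a word on how MVCC works in HBase.

MVCC is a mechanism for keeping data consistent. A write in HBase goes through several stages: writing the HLog, writing the memstore, and updating the MVCC.

A memstore write only counts as successful once the MVCC has been updated. Transaction isolation is controlled by the MVCC; for example, a read must not see data that another thread has not yet committed.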
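To make the write number and read point idea concrete, here is a minimal sketch of such a control object. It is a simplified model, not HBase's actual MultiVersionConsistencyControl: the class name SimpleMvcc, the readPoint() accessor, and the choice not to block the writer are all assumptions made for illustration. The names memstoreWrite, memstoreRead, beginMemstoreInsert and completeMemstoreInsert mirror the ones used in this article.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified model of MVCC write numbers and the read point (illustrative only). */
final class SimpleMvcc {
    static final class WriteEntry {
        final long writeNumber;
        boolean completed = false;
        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private long memstoreWrite = 0; // last write number handed out
    private long memstoreRead = 0;  // highest write number readers may see
    private final Deque<WriteEntry> writeQueue = new ArrayDeque<>();

    /** Start a write: allocate the next write number and queue it as pending. */
    synchronized WriteEntry beginMemstoreInsert() {
        WriteEntry e = new WriteEntry(++memstoreWrite);
        writeQueue.addLast(e);
        return e;
    }

    /** Finish a write: the read point only moves past a contiguous run of completed writes. */
    synchronized void completeMemstoreInsert(WriteEntry e) {
        e.completed = true;
        while (!writeQueue.isEmpty() && writeQueue.peekFirst().completed) {
            memstoreRead = writeQueue.pollFirst().writeNumber;
        }
        notifyAll(); // wake anything waiting for the read point to advance
    }

    synchronized long readPoint() { return memstoreRead; }
}
```

In this model, if write 2 completes before write 1, the read point stays at 0 until write 1 also completes, so a reader never sees a later write while an earlier one is still pending.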

1. Both put and delete call applyFamilyMapToMemstore

In HRegion:

private long applyFamilyMapToMemstore(Map<byte[], List<KeyValue>> familyMap,
    MultiVersionConsistencyControl.WriteEntry localizedWriteEntry) {
  long size = 0;
  boolean freemvcc = false;
  try {
    if (localizedWriteEntry == null) {
      // Begin a memstore write: memstoreWrite++ in the mvcc, and the entry is
      // added to the queue of pending writes
      localizedWriteEntry = mvcc.beginMemstoreInsert();
      freemvcc = true;
    }
    for (Map.Entry<byte[], List<KeyValue>> e : familyMap.entrySet()) {
      byte[] family = e.getKey();
      List<KeyValue> edits = e.getValue();
      Store store = getStore(family);
      for (KeyValue kv : edits) {
        kv.setMemstoreTS(localizedWriteEntry.getWriteNumber());
        size += store.add(kv);
      }
    }
  } finally {
    if (freemvcc) {
      mvcc.completeMemstoreInsert(localizedWriteEntry);
    }
  }
  return size;
}

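mvcc.completeMemstoreInsert updates the mvcc's memstoreRead, that is, the position up to which data can be read, and calls readWaiters.notifyAll(), which releases threads blocked in the waitForRead call made by flushcache.

waitForRead looks like this: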

public void waitForRead(WriteEntry e) {
  boolean interrupted = false;
  synchronized (readWaiters) {
    // "less than" means there are still uncommitted writes
    while (memstoreRead < e.getWriteNumber()) {
      try {
        readWaiters.wait(0);
      } catch (InterruptedException ie) {
        // We were interrupted... finish the loop -- i.e. cleanup -- and then
        // on our way out, reset the interrupt flag.
        interrupted = true;
      }
    }
  }
  if (interrupted) Thread.currentThread().interrupt();
}

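2. During flushcache, once the KeyValues in the memstore have been obtained, mvcc.waitForRead(w) is called. The memstore holds every KeyValue, including ones that are not yet actually committed, so the flush has to wait for the other transactions to commit before it continues. This keeps the transactions consistent.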

w = mvcc.beginMemstoreInsert();
mvcc.advanceMemstore(w);
mvcc.waitForRead(w);
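The three calls above act as a barrier: the flush takes a write number of its own, marks it complete, and then waits until the read point reaches it, which can only happen once every earlier write has completed. The sketch below models that behaviour under stated assumptions. FlushBarrierSketch, flushBarrier() and readPoint() are hypothetical names, and the queue handling is a simplification rather than HBase's real code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative model of the flush-side barrier built from begin/advance/waitForRead. */
final class FlushBarrierSketch {
    static final class WriteEntry {
        final long writeNumber;
        boolean completed;
        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private final Object readWaiters = new Object();
    private final Deque<WriteEntry> writeQueue = new ArrayDeque<>();
    private long memstoreWrite = 0;
    private long memstoreRead = 0;

    WriteEntry beginMemstoreInsert() {
        synchronized (readWaiters) {
            WriteEntry e = new WriteEntry(++memstoreWrite);
            writeQueue.addLast(e);
            return e;
        }
    }

    /** Mark e complete and move the read point past all completed entries at the head. */
    void advanceMemstore(WriteEntry e) {
        synchronized (readWaiters) {
            e.completed = true;
            while (!writeQueue.isEmpty() && writeQueue.peekFirst().completed) {
                memstoreRead = writeQueue.pollFirst().writeNumber;
            }
            readWaiters.notifyAll();
        }
    }

    /** Block until the read point has reached e's write number. */
    void waitForRead(WriteEntry e) {
        boolean interrupted = false;
        synchronized (readWaiters) {
            while (memstoreRead < e.writeNumber) {
                try {
                    readWaiters.wait();
                } catch (InterruptedException ie) {
                    interrupted = true; // keep waiting; restore the flag on the way out
                }
            }
        }
        if (interrupted) Thread.currentThread().interrupt();
    }

    long readPoint() {
        synchronized (readWaiters) { return memstoreRead; }
    }

    /** Returns only after every write that began before this call has completed. */
    long flushBarrier() {
        WriteEntry w = beginMemstoreInsert();
        advanceMemstore(w);
        waitForRead(w);
        return readPoint();
    }
}
```

If a writer thread still holds a pending entry when flushBarrier() is called, the call blocks until that writer calls advanceMemstore on its entry.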

3. Scanning data

In the implementation of RegionScannerImpl.next:

public synchronized boolean next(List<KeyValue> outResults, int limit)
    throws IOException {
  if (this.filterClosed) {
    throw new UnknownScannerException("Scanner was closed (timed out?) " +
        "after we renewed it. Could be caused by a very slow scanner " +
        "or a lengthy garbage collection");
  }
  startRegionOperation();
  readRequestsCount.increment();
  try {
    // This could be a new thread from the last time we called next().
    // this.readPt is initialized when the scanner is constructed (from
    // memstoreRead in the current HRegion's mvcc, i.e. the current readable
    // point) and is bound to the current thread here
    MultiVersionConsistencyControl.setThreadReadPoint(this.readPt);
    // ...

The MemStore filters out transactions that have not yet committed (a new KeyValue carries the latest write point):

protected KeyValue getNext(Iterator<KeyValue> it) {
  long readPoint = MultiVersionConsistencyControl.getThreadReadPoint();
  while (it.hasNext()) {
    KeyValue v = it.next();
    // skip any KeyValue newer than the current thread's readPoint
    if (v.getMemstoreTS() <= readPoint) {
      return v;
    }
  }
  return null;
}
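The filtering rule can be tried out in isolation with a small stand-in for KeyValue. In this sketch, Cell and ReadPointFilterSketch are hypothetical names, and the read point is passed in as a parameter instead of being read from a thread-local, purely to keep the example self-contained.

```java
import java.util.Iterator;

/** Illustrative read-point filter over versioned cells. */
final class ReadPointFilterSketch {
    /** Hypothetical stand-in for a KeyValue: a row plus the write number that produced it. */
    static final class Cell {
        final String row;
        final long memstoreTS;
        Cell(String row, long memstoreTS) {
            this.row = row;
            this.memstoreTS = memstoreTS;
        }
    }

    /** Returns the next cell visible at readPoint, or null if none remain. */
    static Cell getNext(Iterator<Cell> it, long readPoint) {
        while (it.hasNext()) {
            Cell c = it.next();
            if (c.memstoreTS <= readPoint) {
                return c; // written at or before the read point: visible
            }
            // otherwise the cell belongs to a write that this reader must not see yet
        }
        return null;
    }
}
```

With cells carrying write numbers 5, 2 and 7 and a read point of 3, only the cell with write number 2 is returned; the other two are skipped as if they were not there.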

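With the whole MVCC flow in mind, look again at getClosestRowBefore in HRegion:

KeyValue key = store.getRowKeyAtOrBefore(row);

This call applies no MVCC control, so it can read all the data in the memstore.

The get method, on the other hand, is subject to MVCC control. So one possible scenario is that, at the time get is called, the key returned by store.getRowKeyAtOrBefore(row) has not been committed yet,

so everything is filtered out and the lookup comes back empty.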
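The suspected race can be reproduced deterministically in a toy model. The sketch below is not HBase code: ClosestRowRaceSketch, putUncommitted and commit are hypothetical names, the memstore is reduced to a sorted map from row to write number, commits are assumed to happen in order, and a null return stands in for the empty Result.

```java
import java.util.TreeMap;

/** Toy model: an unfiltered "closest row" lookup followed by an MVCC-filtered get. */
final class ClosestRowRaceSketch {
    private final TreeMap<String, Long> memstore = new TreeMap<>(); // row -> memstoreTS
    private long nextWrite = 0;
    private long readPoint = 0;

    /** The row is placed in the memstore immediately, before its write is committed. */
    long putUncommitted(String row) {
        long w = ++nextWrite;
        memstore.put(row, w);
        return w;
    }

    /** Commit a write by advancing the read point (assumes in-order commits). */
    void commit(long writeNumber) {
        readPoint = Math.max(readPoint, writeNumber);
    }

    /** No MVCC check: sees every row present in the memstore. */
    String getRowKeyAtOrBefore(String row) {
        return memstore.floorKey(row);
    }

    /** MVCC-filtered read: null when the row's write is newer than the read point. */
    String get(String row) {
        Long ts = memstore.get(row);
        return (ts != null && ts <= readPoint) ? row : null;
    }

    /** Same two-step shape as the method discussed above. */
    String getClosestRowBefore(String row) {
        String key = getRowKeyAtOrBefore(row);
        return key == null ? null : get(key);
    }
}
```

Calling getClosestRowBefore between putUncommitted and commit finds the key in the first step but gets nothing back from the second, which is the shape of the failure described above. After commit, the same call returns the row.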
