The design of HBase MVCC, and a scan problem caused by MVCC
While using HBase 0.94 recently, I occasionally saw the warning "HRegionInfo was null or empty in Meta":
java.io.IOException: HRegionInfo was null or empty in Meta for writetest, row=lot_let,9399239430349923234234,99999999999999
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:170)
The client-side implementation of MetaScanner.metaScan contains:
metaTable = new HTable(configuration, HConstants.META_TABLE_NAME);
Result startRowResult = metaTable.getRowOrBefore(searchRow, HConstants.CATALOG_FAMILY);
if (startRowResult == null) {
  throw new TableNotFoundException("Cannot find row in .META. for table: "
      + Bytes.toString(tableName) + ", row=" + Bytes.toStringBinary(searchRow));
}
byte[] value = startRowResult.getValue(HConstants.CATALOG_FAMILY,
    HConstants.REGIONINFO_QUALIFIER);
if (value == null || value.length == 0) {
  throw new IOException("HRegionInfo was null or empty in Meta for "
      + Bytes.toString(tableName) + ", row=" + Bytes.toStringBinary(searchRow));
}
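So during the MetaScanner scan, the region covering the row key appears not to exist in the Meta table. Following the RPC leads to the server-side implementation
in HRegion: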
public Result getClosestRowBefore(final byte[] row, final byte[] family)
    throws IOException {
  if (coprocessorHost != null) {
    Result result = new Result();
    if (coprocessorHost.preGetClosestRowBefore(row, family, result)) {
      return result;
    }
  }
  // look across all the HStores for this region and determine what the
  // closest key is across all column families, since the data may be sparse
  checkRow(row, "getClosestRowBefore");
  startRegionOperation();
  this.readRequestsCount.increment();
  try {
    Store store = getStore(family);
    // get the closest key. (HStore.getRowKeyAtOrBefore can return null)
    KeyValue key = store.getRowKeyAtOrBefore(row);
    Result result = null;
    if (key != null) {
      Get get = new Get(key.getRow());
      get.addFamily(family);
      result = get(get, null);
    }
    if (coprocessorHost != null) {
      coprocessorHost.postGetClosestRowBefore(row, family, result);
    }
    return result;
  } finally {
    closeRegionOperation();
  }
}
The call KeyValue key = store.getRowKeyAtOrBefore(row); did return a row key from the Meta table, but the code that follows it,
if (key != null) {
  Get get = new Get(key.getRow());
  get.addFamily(family);
  result = get(get, null);
}
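got back an empty result, and that caused the problem.
Why does this happen?
First, a few words on how MVCC works in HBase.
MVCC is a means of keeping data consistent. A write in HBase goes through several stages: write the HLog, write the memstore, update MVCC.
A memstore write only counts as successful once MVCC has been updated. MVCC also controls transaction isolation: for example, a read must not see data that another thread has not yet committed.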
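To make the write-number and read-point idea concrete, here is a minimal sketch of such a counter. It is a toy model, not HBase's MultiVersionConsistencyControl: the class ToyMvcc, its Entry type, and the method names begin, complete, and readPoint are all hypothetical names chosen for illustration. It assumes the read point may only advance over a contiguous prefix of completed writes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a region-level MVCC counter; hypothetical, not the HBase class. */
final class ToyMvcc {
    /** One in-flight write; a stand-in for the idea of a write entry. */
    static final class Entry {
        final long writeNumber;
        boolean completed;
        Entry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private long nextWrite = 0;   // last write number handed out
    private long readPoint = 0;   // highest write number visible to readers
    private final Deque<Entry> pending = new ArrayDeque<>();

    /** Start a write: allocate the next number and queue the entry as pending. */
    synchronized Entry begin() {
        Entry e = new Entry(++nextWrite);
        pending.addLast(e);
        return e;
    }

    /** Finish a write and advance the read point over the completed prefix of the queue. */
    synchronized void complete(Entry e) {
        e.completed = true;
        while (!pending.isEmpty() && pending.peekFirst().completed) {
            readPoint = pending.removeFirst().writeNumber;
        }
    }

    /** Highest write number that readers are allowed to see. */
    synchronized long readPoint() { return readPoint; }
}
```

In this sketch, if writes 1 and 2 are both started and write 2 completes first, readPoint() stays at 0 until write 1 completes as well; only then does it jump to 2. That is the property that keeps a reader from seeing a later write while an earlier one is still in flight.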
1. Both put and delete call applyFamilyMapToMemstore
In HRegion:
private long applyFamilyMapToMemstore(Map<byte[], List<KeyValue>> familyMap,
    MultiVersionConsistencyControl.WriteEntry localizedWriteEntry) {
  long size = 0;
  boolean freemvcc = false;
  try {
    if (localizedWriteEntry == null) {
      // Begin a memstore write: increment memstoreWrite in mvcc and
      // add the entry to the queue of pending writes
      localizedWriteEntry = mvcc.beginMemstoreInsert();
      freemvcc = true;
    }
    for (Map.Entry<byte[], List<KeyValue>> e : familyMap.entrySet()) {
      byte[] family = e.getKey();
      List<KeyValue> edits = e.getValue();
      Store store = getStore(family);
      for (KeyValue kv : edits) {
        kv.setMemstoreTS(localizedWriteEntry.getWriteNumber());
        size += store.add(kv);
      }
    }
  } finally {
    if (freemvcc) {
      mvcc.completeMemstoreInsert(localizedWriteEntry);
    }
  }
  return size;
}
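mvcc.completeMemstoreInsert updates mvcc's memstoreRead, that is, the position up to which data may be read, and calls readWaiters.notifyAll() to release any thread blocked in waitForRead, which flushcache calls.
waitForRead looks like this: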
public void waitForRead(WriteEntry e) {
  boolean interrupted = false;
  synchronized (readWaiters) {
    // Less than: some writes are still uncommitted
    while (memstoreRead < e.getWriteNumber()) {
      try {
        readWaiters.wait(0);
      } catch (InterruptedException ie) {
        // We were interrupted... finish the loop -- i.e. cleanup -- and then
        // on our way out, reset the interrupt flag.
        interrupted = true;
      }
    }
  }
  if (interrupted) Thread.currentThread().interrupt();
}
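2. During flushcache, once the KeyValues have been taken from the memstore, mvcc.waitForRead(w) is called. (The memstore holds every KeyValue, including ones that are not yet truly committed, so the flush has to wait for the other transactions to commit before it continues. This keeps the transactions consistent.)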
w = mvcc.beginMemstoreInsert();
mvcc.advanceMemstore(w);
mvcc.waitForRead(w);
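The three calls above can be pictured with a small sketch of a flush barrier. It is an illustration under assumptions, not the HBase implementation: ToyFlushBarrier and its methods are hypothetical names, and complete here folds "mark the entry done" and "advance the read point" into one step. The flusher takes a write number that is newer than every write already in flight, completes its own entry at once, and then blocks until the read point reaches that number, which can only happen after all earlier writes have committed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy flush barrier; a hypothetical stand-in, not HBase's MultiVersionConsistencyControl. */
final class ToyFlushBarrier {
    static final class Entry {
        final long writeNumber;
        boolean completed;
        Entry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private long nextWrite = 0;
    private long readPoint = 0;
    private final Deque<Entry> pending = new ArrayDeque<>();

    synchronized Entry begin() {
        Entry e = new Entry(++nextWrite);
        pending.addLast(e);
        return e;
    }

    /** Mark e done and advance the read point over the completed prefix of the queue. */
    synchronized void complete(Entry e) {
        e.completed = true;
        while (!pending.isEmpty() && pending.peekFirst().completed) {
            readPoint = pending.removeFirst().writeNumber;
        }
        notifyAll(); // wake any flusher blocked in waitForRead
    }

    /** Block until every write numbered up to e.writeNumber is visible. */
    synchronized void waitForRead(Entry e) {
        boolean interrupted = false;
        while (readPoint < e.writeNumber) {
            try {
                wait();
            } catch (InterruptedException ie) {
                interrupted = true; // keep waiting, restore the flag on exit
            }
        }
        if (interrupted) Thread.currentThread().interrupt();
    }

    synchronized long readPoint() { return readPoint; }

    public static void main(String[] args) throws InterruptedException {
        ToyFlushBarrier mvcc = new ToyFlushBarrier();
        Entry slowPut = mvcc.begin();       // a put that has not committed yet
        Thread flusher = new Thread(() -> {
            Entry w = mvcc.begin();         // numbered after every earlier write
            mvcc.complete(w);               // the flush itself inserts nothing
            mvcc.waitForRead(w);            // blocks until slowPut commits
            System.out.println("flush proceeds at read point " + mvcc.readPoint());
        });
        flusher.start();
        Thread.sleep(100);                  // simulate the put still being in flight
        mvcc.complete(slowPut);
        flusher.join();
    }
}
```

The design point is that the flusher never inspects individual writes. It only waits on a single number, and the ordering of the pending queue guarantees that reaching that number implies everything older has committed.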
3. Scanning data
In the implementation of RegionScannerImpl.next:
public synchronized boolean next(List<KeyValue> outResults, int limit)
    throws IOException {
  if (this.filterClosed) {
    throw new UnknownScannerException("Scanner was closed (timed out?) " +
        "after we renewed it. Could be caused by a very slow scanner " +
        "or a lengthy garbage collection");
  }
  startRegionOperation();
  readRequestsCount.increment();
  try {
    // This could be a new thread from the last time we called next().
    // this.readPt is initialized at construction time (from memstoreRead in the
    // current HRegion's mvcc, i.e. the current readable point) and is bound to
    // the current thread here
    MultiVersionConsistencyControl.setThreadReadPoint(this.readPt);
The MemStore then filters out transactions that have not yet committed (a newly written KeyValue carries the newest write point):
protected KeyValue getNext(Iterator<KeyValue> it) {
  long readPoint = MultiVersionConsistencyControl.getThreadReadPoint();
  while (it.hasNext()) {
    KeyValue v = it.next();
    // Skip any KeyValue newer than the current thread's readPoint
    if (v.getMemstoreTS() <= readPoint) {
      return v;
    }
  }
  return null;
}
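The filtering step can be tried in isolation with a small sketch. The Cell type and the ReadPointFilter class below are hypothetical stand-ins for KeyValue and the memstore scanner, and the read point is passed as a plain argument rather than read from a thread-local, purely to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy read-point filter; hypothetical names, not HBase classes. */
final class ReadPointFilter {
    /** A row key plus the write number of the put that produced it. */
    static final class Cell {
        final String row;
        final long memstoreTS;
        Cell(String row, long memstoreTS) {
            this.row = row;
            this.memstoreTS = memstoreTS;
        }
    }

    /** Return the next cell visible at readPoint, or null when none remain. */
    static Cell nextVisible(Iterator<Cell> it, long readPoint) {
        while (it.hasNext()) {
            Cell c = it.next();
            if (c.memstoreTS <= readPoint) {
                return c; // committed at or before the scanner's read point
            }
        }
        return null;
    }

    /** Collect the rows of every cell visible at readPoint, in iteration order. */
    static List<String> visibleRows(List<Cell> memstore, long readPoint) {
        List<String> rows = new ArrayList<>();
        Iterator<Cell> it = memstore.iterator();
        for (Cell c = nextVisible(it, readPoint); c != null; c = nextVisible(it, readPoint)) {
            rows.add(c.row);
        }
        return rows;
    }
}
```

With cells written at numbers 1, 3, and 2, a scanner whose read point is 2 sees only the first and third; a scanner whose read point is 0 sees nothing at all, which is exactly the situation that produces an empty result.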
With the whole MVCC flow in view, look again at the implementation of getClosestRowBefore in HRegion:
KeyValue key = store.getRowKeyAtOrBefore(row);
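This call applies no MVCC control, so it can read everything in the memstore.
The get method, on the other hand, is subject to MVCC. One possible scenario is therefore that, by the time get is called, the key returned by store.getRowKeyAtOrBefore(row) has still not been committed,
so every KeyValue is filtered out and the lookup comes back empty.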
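The suspected interleaving can be reproduced in a toy model. This is only a sketch of the hypothesis above, not HBase code: ClosestRowRace, putUncommitted, and commitUpTo are hypothetical names, the memstore is reduced to a TreeMap from row key to write number, and the row keys "t,a", "t,m", and "t,z" are made up.

```java
import java.util.Optional;
import java.util.TreeMap;

/** Toy reproduction of an unfiltered lookup followed by an MVCC-filtered get. */
final class ClosestRowRace {
    // row key -> write number of the put that inserted it (toy memstore)
    private final TreeMap<String, Long> memstore = new TreeMap<>();
    private long readPoint = 0;

    /** Insert a row whose write has not been made visible yet. */
    void putUncommitted(String row, long writeNumber) { memstore.put(row, writeNumber); }

    /** Make every write numbered up to writeNumber visible to readers. */
    void commitUpTo(long writeNumber) { readPoint = writeNumber; }

    /** Like the unfiltered lookup: sees every key, committed or not. */
    String rowAtOrBefore(String row) { return memstore.floorKey(row); }

    /** Like the MVCC-aware get: hides keys newer than the read point. */
    Optional<String> get(String row) {
        Long ts = memstore.get(row);
        return (ts != null && ts <= readPoint) ? Optional.of(row) : Optional.empty();
    }

    /** Two-step lookup with the same shape as getClosestRowBefore. */
    Optional<String> closestRowBefore(String row) {
        String key = rowAtOrBefore(row);
        return key == null ? Optional.empty() : get(key);
    }

    public static void main(String[] args) {
        ClosestRowRace region = new ClosestRowRace();
        region.putUncommitted("t,a", 1);
        region.commitUpTo(1);                 // "t,a" is committed
        region.putUncommitted("t,m", 2);      // "t,m" is in the memstore but not visible
        System.out.println(region.closestRowBefore("t,z")); // empty: key found, get filtered it
        region.commitUpTo(2);
        System.out.println(region.closestRowBefore("t,z")); // now returns "t,m"
    }
}
```

Note that in the first lookup the committed row "t,a" would have been a valid answer, but the unfiltered step has already settled on "t,m", so the caller gets nothing back. A client retrying shortly afterwards sees the row, which matches the warning appearing only occasionally.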