HBase Merging Regions

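I admit I did not previously know that HBase could merge regions at all, let alone when doing so makes sense. The article below offers some answers:

Sometimes having too many regions is not a good thing, so merging regions is a natural step.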

While it is much more common for regions to split automatically over time as you are adding data to the corresponding table, there might be situations where you need to merge regions, for example, after you have removed a large amount of data and you want to reduce the number of regions hosted by each server.

HBase ships with a tool that allows you to merge two adjacent regions as long as the cluster is not online. You can use the command line tool to get the usage details:

  $ ./bin/hbase org.apache.hadoop.hbase.util.Merge
  Usage: bin/hbase merge <table-name> <region-1> <region-2>

Here is an example of a table that has more than one region, two of which are then merged:

  $ ./bin/hbase shell

  hbase(main):001:0> create 'testtable', 'colfam1', \
   {SPLITS => ['row-10','row-20','row-30','row-40','row-50']}
  0 row(s) in 0.2640 seconds

  hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do \
   put 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
  0 row(s) in 1.0450 seconds

  hbase(main):003:0> flush 'testtable'
  0 row(s) in 0.2000 seconds

  hbase(main):004:0> scan '.META.', { COLUMNS => ['info:regioninfo']}
  ROW                                  COLUMN+CELL
   testtable,,1309614509037.612d1e0112 column=info:regioninfo, timestamp=130...
   406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY => 'row-10'
   testtable,row-10,1309614509040.2fba column=info:regioninfo, timestamp=130...
   fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10', ENDKEY => 'row-20'
   testtable,row-20,1309614509041.e7c1 column=info:regioninfo, timestamp=130...
   6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY => 'row-30'
   testtable,row-30,1309614509041.a9cd column=info:regioninfo, timestamp=130...
   e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY => 'row-40'
   testtable,row-40,1309614509041.d458 column=info:regioninfo, timestamp=130...
   236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40', ENDKEY => 'row-50'
   testtable,row-50,1309614509041.74a5 column=info:regioninfo, timestamp=130...
   7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50', ENDKEY => ''
  6 row(s) in 0.0440 seconds

  hbase(main):005:0> exit

  $ ./bin/stop-hbase.sh

  $ ./bin/hbase org.apache.hadoop.hbase.util.Merge testtable \
   testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. \
   testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.



The example creates a table with five split points, resulting in six regions. It then inserts some rows and flushes the data to ensure that there are store files for the subsequent merge. The scan is used to get the names of the regions, but you can also use the web UI of the master: click on the table name in the User Tables section to get the same list of regions.
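
To see why five split points give six regions, and how the 100 inserted rows are spread across them, here is a minimal sketch in plain Ruby. It does not talk to HBase. The helper names `region_ranges` and `region_for` are made up for illustration, and the sketch assumes that row keys are plain ASCII, so that Ruby's string comparison orders them the same way HBase orders row keys.

```ruby
# Split points taken from the create command in the example.
splits = %w[row-10 row-20 row-30 row-40 row-50]

# Hypothetical helper: build [start_key, end_key] pairs. The empty string
# marks an open boundary, as in the STARTKEY/ENDKEY values of the scan output.
def region_ranges(splits)
  bounds = [''] + splits + ['']
  bounds.each_cons(2).to_a
end

# Hypothetical helper: a row belongs to the region where
# start_key <= row < end_key (an empty end key means "no upper bound").
def region_for(row, ranges)
  ranges.find do |start_key, end_key|
    row >= start_key && (end_key.empty? || row < end_key)
  end
end

ranges = region_ranges(splits)

# Same row keys as the nested loop in the shell example: row-00 .. row-99.
rows = ('0'..'9').flat_map { |i| ('0'..'9').map { |j| "row-#{i}#{j}" } }

rows_per_region = rows.group_by { |row| region_for(row, ranges) }
                      .transform_values(&:size)

ranges.each do |range|
  start_key, end_key = range
  puts format('%-10s %-10s %3d rows',
              start_key.inspect, end_key.inspect, rows_per_region.fetch(range, 0))
end
```

Under these assumptions the first five regions hold ten rows each, and the last region, which starts at row-50 and has no end key, holds the remaining fifty.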

Note

Note how the shell wraps the values in each column. The region name is split over two lines, so you need to copy and paste the two parts separately. The web UI is easier to use in that respect, as it shows each name in one column and on a single line.
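
If you would rather not stitch the two halves together by hand, the wrapped names can be rejoined with a few lines of plain Ruby. This is only a sketch: `join_region_names` is a hypothetical helper, and it assumes that the first whitespace-separated token on each line is a fragment of the region name and that a complete name ends in a dot, 32 hexadecimal characters, and another dot, as the names in the scan output do.

```ruby
# Two wrapped entries copied from the scan output in the example.
scan_output = <<~OUTPUT
  testtable,row-20,1309614509041.e7c1 column=info:regioninfo, timestamp=130...
  6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY => 'row-30'
  testtable,row-30,1309614509041.a9cd column=info:regioninfo, timestamp=130...
  e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY => 'row-40'
OUTPUT

# Hypothetical helper: concatenate the first token of consecutive lines
# until the accumulated text looks like a complete region name.
def join_region_names(text)
  names = []
  current = String.new
  text.each_line do |line|
    fragment = line.split.first
    next if fragment.nil? # skip blank lines

    current << fragment
    # Assumption: a full name ends with ".<32 hex characters>."
    if current.match?(/\.\h{32}\.\z/)
      names << current
      current = String.new
    end
  end
  names
end

region_names = join_region_names(scan_output)
puts region_names
```

The two names it produces for this input are the same two arguments passed to the Merge tool in the example.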

The content of the column values is abbreviated to the start and end keys. You can see how the create command using the split keys has created the regions. The example goes on to exit the shell and stop the HBase cluster. Note that HDFS still needs to run for the merge to work, as it needs to read the store files of each region and merge them into a new combined one.
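
The start and end keys also tell you whether two regions are adjacent, which is what the tool expects: one region's end key must equal the other's start key. The following plain Ruby sketch checks that for the keys shown in the scan output and computes the key range a merged region would cover. `adjacent?` and `merged_range` are hypothetical helpers written for this illustration, not part of HBase.

```ruby
# Start and end keys copied from the scan output in the example.
region_20 = { start_key: 'row-20', end_key: 'row-30' }
region_30 = { start_key: 'row-30', end_key: 'row-40' }
region_50 = { start_key: 'row-50', end_key: '' }

# Hypothetical helper: two regions are adjacent when the end key of one is
# the start key of the other. An empty key marks the open start or end of
# the table, so it never counts as a shared boundary.
def adjacent?(a, b)
  (!a[:end_key].empty? && a[:end_key] == b[:start_key]) ||
    (!b[:end_key].empty? && b[:end_key] == a[:start_key])
end

# Hypothetical helper: the merged region spans from the start key of the
# lower region to the end key of the upper one.
def merged_range(a, b)
  lower, upper = a[:end_key] == b[:start_key] ? [a, b] : [b, a]
  [lower[:start_key], upper[:end_key]]
end

puts adjacent?(region_20, region_30)
puts adjacent?(region_20, region_50)
puts merged_range(region_20, region_30).inspect
```

For the two regions merged in the example, the check succeeds and the combined range runs from row-20 to row-40.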