HBase(1) Introduction and Installation

1. HBase Introduction
Hadoop Database ——> Hadoop HDFS
Hadoop Database ——> Hadoop MapReduce
Hadoop Database ——> Hadoop Zookeeper

Fundamentally Distributed: partitioning (sharding), replication
Column Oriented
Sequential Write: in memory, then flush to disk
Merged Read
Periodic Data Compaction

Pig (Data Flow), Hive (SQL), Sqoop (RDBMS import support)

HMaster Server: region assignment management (like the Hadoop master: NameNode, JobTracker)

HRegionServer #1: DataNode, TaskTracker


2. Install and Setup Hadoop
Install protoc
>wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
Unzip and cd to that directory
>./configure --prefix=/Users/carl/tool/protobuf-2.5.0
>make
>make install
>sudo ln -s /Users/carl/tool/protobuf-2.5.0 /opt/protobuf-2.5.0
>sudo ln -s /opt/protobuf-2.5.0 /opt/protobuf

Add this line to my environment
export PATH=/opt/protobuf/bin:$PATH

Check the Installation Environment
>protoc --version
libprotoc 2.5.0

Compile Hadoop
>svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0 hadoop-common-2.4.0

Read the document here for building
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/BUILDING.txt
>cd hadoop-common-2.4.0/
>mvn clean install -DskipTests
>cd hadoop-mapreduce-project
>mvn clean install assembly:assembly -Pnative

Maybe my machine is just too slow; I got a lot of timeout errors, so I redid it like this
>mvn clean -DskipTests install assembly:assembly -Pnative

I also needed to get rid of the native profile
>mvn clean -DskipTests install assembly:assembly

That did not work either, so I read the INSTALL document
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/hadoop-mapreduce-project/INSTALL
>cd ..
>mvn clean package -Pdist -Dtar -DskipTests
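
If the package build succeeds, the distribution tarball should show up under hadoop-dist/target (that is where the -Pdist -Dtar profile usually puts it; check BUILDING.txt if the path differs for your version):
>ls hadoop-dist/target/hadoop-2.4.0.tar.gz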

The latest documentation is here
/Users/carl/data/installation/hadoop-2.4.0/share/doc/hadoop/hadoop-project-dist/hadoop-common/SingleCluster.html 

Follow these blog posts and set up Hadoop on 4 machines
http://sillycat.iteye.com/blog/2084169
http://sillycat.iteye.com/blog/2090186

>sbin/start-dfs.sh
>sbin/start-yarn.sh
>sbin/mr-jobhistory-daemon.sh start historyserver
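
To verify the daemons came up, run jps on the master; based on my setup it should show at least NameNode, SecondaryNameNode, ResourceManager and JobHistoryServer (the exact list depends on which services run on which machine):
>jps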

3. Setup Zookeeper
Follow this blog post and set up ZooKeeper on 3 machines
http://sillycat.iteye.com/blog/2015175

>zkServer.sh start conf/zoo-cluster.cfg
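
Check that ZooKeeper is actually serving on each of the 3 machines; in the versions I have used, the status command takes the same config file argument as start:
>zkServer.sh status conf/zoo-cluster.cfg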

4. Install HBase - Standalone HBase

Download this version since I am using hadoop 2.4.x
>wget http://mirrors.gigenet.com/apache/hbase/hbase-0.98.4/hbase-0.98.4-hadoop2-bin.tar.gz
Unzip the file and move it to the work directory.

>sudo ln -s /home/carl/tool/hbase-0.98.4 /opt/hbase-0.98.4
>sudo ln -s /opt/hbase-0.98.4 /opt/hbase

Check and modify the configuration file
>cat conf/hbase-site.xml 
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///opt/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/carl/etc/hbase</value>
  </property>
</configuration>
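
If HBase cannot find Java at startup, also set JAVA_HOME in conf/hbase-env.sh; the path below is only an example, use the JDK location on your own machine:
>vi conf/hbase-env.sh
export JAVA_HOME=/opt/jdk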

Start the Service
>bin/start-hbase.sh 

>jps
2036 NameNode
4084 Jps
3340 HMaster
2403 ResourceManager
2263 SecondaryNameNode
2686 JobHistoryServer

Enter the Client Shell
>bin/hbase shell

Create the table
>create 'test', 'cf'

Check that the table exists
>list 'test'
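
list only shows the table name; describe prints the column family settings as well:
>describe 'test'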

Inserting data
>put 'test', 'row1', 'cf:a', 'value1'
>put 'test', 'row2', 'cf:a', 'value2'
>put 'test', 'row3', 'cf:a', 'value3'
Here row1 is the row key, cf:a is the column (family cf, qualifier a), and value1 is the value.

Get all the data
>scan 'test'
ROW                COLUMN+CELL
row1               column=cf:a, timestamp=1407169545627, value=value1
row2               column=cf:a, timestamp=1407169557668, value=value2
row3               column=cf:a, timestamp=1407169563458, value=value3
3 row(s) in 0.0630 seconds

Get a single row
>get 'test', 'row1'
COLUMN             CELL
cf:a               timestamp=1407169545627, value=value1

Some other commands
>disable 'test'
>enable 'test'
>drop 'test'
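
To remove individual cells or rows without dropping the whole table, the shell also has delete and deleteall, for example:
>delete 'test', 'row1', 'cf:a'
>deleteall 'test', 'row2'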

5. Pseudo-Distributed Local Install
Change the configuration as follows
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ubuntu-master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/carl/etc/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.wait.on.regionservers.mintostart</name>
    <value>1</value>
  </property>
</configuration>
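
After changing hbase-site.xml, restart HBase so it switches over to the HDFS root directory:
>bin/stop-hbase.sh
>bin/start-hbase.sh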

List the HDFS directory
>hadoop fs -ls /
Found 4 items
drwxr-xr-x   - carl supergroup          0 2014-07-09 13:22 /data
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase
drwxr-xr-x   - carl supergroup          0 2014-07-10 13:09 /output
drwxrwx---   - carl supergroup          0 2014-08-04 11:21 /tmp

>hadoop fs -ls /hbase
Found 6 items
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:48 /hbase/.tmp
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase/WALs
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:48 /hbase/data
-rw-r--r--   3 carl supergroup         42 2014-08-04 11:47 /hbase/hbase.id
-rw-r--r--   3 carl supergroup          7 2014-08-04 11:47 /hbase/hbase.version
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase/oldWALs

Start HMaster Backup Servers
The default ports for the HMaster are 16010, 16020, and 16030
>bin/local-master-backup.sh start 2 3 5 

That will start 3 backup HMaster servers, using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035

To stop a backup master, find its process id in /tmp/hbase-USER-X-master.pid

For example
>cat /tmp/hbase-carl-2-master.pid 
6442
>cat /tmp/hbase-carl-5-master.pid |xargs kill -9

Start and stop Additional RegionServers
Each RegionServer uses ports 16020 and 16030 by default, but the additional local instances use base ports 16200 and 16300.
>bin/local-regionservers.sh start 2 3 5

>bin/local-regionservers.sh stop 5
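
jps should now show the additional HRegionServer processes, and each one can be stopped by its instance number as above:
>jps | grep HRegionServer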

6. Fully Distributed
I have 4 machines, listed as follows:
ubuntu-master     hmaster
ubuntu-client1      hmaster-backup
ubuntu-client2      regionserver
ubuntu-client3      regionserver

Set up the Configuration
>cat conf/regionservers 
ubuntu-client2
ubuntu-client3

>cat conf/backup-masters 
ubuntu-client1

Since I already have ZooKeeper running externally, tell HBase not to manage its own instance:
>vi conf/hbase-env.sh
export HBASE_MANAGES_ZK=false

The main configuration file
>cat conf/hbase-site.xml 
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ubuntu-master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.wait.on.regionservers.mintostart</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ubuntu-client1,ubuntu-client2,ubuntu-client3</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/carl/etc/zookeeper</value>
  </property>
</configuration>
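
Before starting HBase, I like to confirm the external ZooKeeper quorum is reachable from ubuntu-master; this assumes the default client port 2181 and that nc is installed, and each node should answer imok:
>echo ruok | nc ubuntu-client1 2181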

The last step is just to start HBase from the master node
>bin/start-hbase.sh

Visit the web UI
http://ubuntu-master:60010/master-status
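
As a quick sanity check, jps on each machine should show the expected role: HMaster on ubuntu-master, the backup HMaster on ubuntu-client1, and HRegionServer on ubuntu-client2/ubuntu-client3 (alongside the Hadoop and ZooKeeper daemons already running there):
>jps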

References:
https://hbase.apache.org/
http://www.alidata.org/archives/1509

http://blog.csdn.net/heyutao007/article/details/6920882
http://blog.sina.com.cn/s/blog_5c5d5cdf0101dvgq.html      hadoop hbase zookeeper
http://www.cnblogs.com/ventlam/archive/2011/01/22/HBaseCluster.html

http://www.searchtb.com/2011/01/understanding-hbase.html
http://www.searchdatabase.com.cn/showcontent_31652.htm

hadoop
http://sillycat.iteye.com/blog/1556106
http://sillycat.iteye.com/blog/1556107

tips about hadoop
http://blog.chinaunix.net/uid-20682147-id-4229024.html
http://my.oschina.net/skyim/blog/228486
http://blog.huangchaosuper.cn/work/tech/2014/04/24/hadoop-install.html
http://blog.sina.com.cn/s/blog_45d2413b0102e2zx.html
http://www.it165.net/os/html/201405/8311.html
