HBase(1) Introduction and Installation
1. HBase Introduction
HBase is the Hadoop database, built on top of Hadoop HDFS, Hadoop MapReduce, and Hadoop ZooKeeper.
Fundamentally distributed: partitioning (sharding) and replication
Column oriented
Sequential writes in memory, flushed to disk
Merged reads
Periodic data compaction
Related tools: Pig (data flow), Hive (SQL), Sqoop (RDBMS import support)
HMaster server: region assignment and management (alongside the Hadoop master: NameNode, JobTracker)
HRegionServer #1: DataNode, TaskTracker
2. Install and Setup Hadoop
Install protoc
>wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
Unzip and cd to that directory
>./configure --prefix=/Users/carl/tool/protobuf-2.5.0
>make
>make install
>sudo ln -s /Users/carl/tool/protobuf-2.5.0 /opt/protobuf-2.5.0
>sudo ln -s /opt/protobuf-2.5.0 /opt/protobuf
Add this line to my environment
export PATH=/opt/protobuf/bin:$PATH
Check the Installation Environment
>protoc --version
libprotoc 2.5.0
Compile Hadoop
>svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0 hadoop-common-2.4.0
Read the document here for building
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/BUILDING.txt
>cd hadoop-common-2.4.0/
>mvn clean install -DskipTests
>cd hadoop-mapreduce-project
>mvn clean install assembly:assembly -Pnative
My machine may just be too slow; I got a lot of timeout errors, so I reran it like this
>mvn clean -DskipTests install assembly:assembly -Pnative
Then I needed to drop the native profile
>mvn clean -DskipTests install assembly:assembly
Still not working, so I read the INSTALL document
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/hadoop-mapreduce-project/INSTALL
>cd ..
>mvn clean package -Pdist -Dtar -DskipTests
The latest documentation is here
/Users/carl/data/installation/hadoop-2.4.0/share/doc/hadoop/hadoop-project-dist/hadoop-common/SingleCluster.html
Follow these blog posts and set up Hadoop on 4 machines
http://sillycat.iteye.com/blog/2084169
http://sillycat.iteye.com/blog/2090186
>sbin/start-dfs.sh
>sbin/start-yarn.sh
>sbin/mr-jobhistory-daemon.sh start historyserver
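To confirm the freshly built Hadoop is serving HDFS, here is a minimal Java sketch that lists the root directory through the FileSystem API, equivalent to hadoop fs -ls /. It is only an illustration: the class name HdfsCheck is made up, it assumes the 2.4.0 client jars are on the classpath, and fs.defaultFS is taken from the hdfs://ubuntu-master:9000 address used for hbase.rootdir later.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// HDFS smoke test: list the root directory, like "hadoop fs -ls /".
// fs.defaultFS below is an assumption based on this cluster's setup.
public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://ubuntu-master:9000");
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPermission() + "  " + status.getPath());
        }
        fs.close();
    }
}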
3. Setup Zookeeper
Follow this blog post and set up ZooKeeper on 3 machines
http://sillycat.iteye.com/blog/2015175
>zkServer.sh start conf/zoo-cluster.cfg
4. Install HBase - Standalone HBase
Download this version, since I am using Hadoop 2.4.x
>wget http://mirrors.gigenet.com/apache/hbase/hbase-0.98.4/hbase-0.98.4-hadoop2-bin.tar.gz
Unzip the file and move it to the work directory.
>sudo ln -s /home/carl/tool/hbase-0.98.4 /opt/hbase-0.98.4
>sudo ln -s /opt/hbase-0.98.4 /opt/hbase
Check and modify the configuration file
>cat conf/hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///opt/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/carl/etc/hbase</value>
</property>
</configuration>
Start the Service
>bin/start-hbase.sh
>jps
2036 NameNode
4084 Jps
3340 HMaster
2403 ResourceManager
2263 SecondaryNameNode
2686 JobHistoryServer
Enter the Client Shell
>bin/hbase shell
Create the table
>create 'test', 'cf'
Verify the table exists
>list 'test'
Insert some data
>put 'test', 'row1', 'cf:a', 'value1'
>put 'test', 'row2', 'cf:a', 'value2'
>put 'test', 'row3', 'cf:a', 'value3'
Here row1 is the row key, cf:a is the column (family cf, qualifier a), and value1 is the value. A Java client equivalent is sketched at the end of this section.
Get all the data
>scan 'test'
ROW          COLUMN+CELL
 row1        column=cf:a, timestamp=1407169545627, value=value1
 row2        column=cf:a, timestamp=1407169557668, value=value2
 row3        column=cf:a, timestamp=1407169563458, value=value3
3 row(s) in 0.0630 seconds
Get a single row
>get 'test', 'row1'
COLUMN       CELL
 cf:a        timestamp=1407169545627, value=value1
Some other commands
>disable 'test'
>enable 'test'
>drop 'test'
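The same operations can also be done from the Java client API. Below is a minimal sketch, not the exact code I ran: it assumes the hbase-0.98.x-hadoop2 client jars are on the classpath and the 'test' table with family 'cf' created above; the class name TestClient and the row4/value4 values are just illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TestClient {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath for the connection settings.
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test");

        // Equivalent of: put 'test', 'row4', 'cf:a', 'value4'
        Put put = new Put(Bytes.toBytes("row4"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("value4"));
        table.put(put);

        // Equivalent of: get 'test', 'row1'
        Result result = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println("row1 cf:a = "
                + Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("a"))));

        // Equivalent of: scan 'test'
        ResultScanner scanner = table.getScanner(new Scan());
        for (Result row : scanner) {
            System.out.println(row);
        }
        scanner.close();
        table.close();
    }
}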
5. Pseudo-Distributed Local Install
Change the configuration as follows
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://ubuntu-master:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/carl/etc/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master.wait.on.regionservers.mintostart</name>
<value>1</value>
</property>
</configuration>
List the HDFS directory
>hadoop fs -ls /
Found 4 items
drwxr-xr-x   - carl supergroup          0 2014-07-09 13:22 /data
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase
drwxr-xr-x   - carl supergroup          0 2014-07-10 13:09 /output
drwxrwx---   - carl supergroup          0 2014-08-04 11:21 /tmp
>hadoop fs -ls /hbase
Found 6 items
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:48 /hbase/.tmp
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase/WALs
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:48 /hbase/data
-rw-r--r--   3 carl supergroup         42 2014-08-04 11:47 /hbase/hbase.id
-rw-r--r--   3 carl supergroup          7 2014-08-04 11:47 /hbase/hbase.version
drwxr-xr-x   - carl supergroup          0 2014-08-04 11:47 /hbase/oldWALs
Start HMaster Backup Servers
The default ports for HMaster are 16010, 16020, and 16030.
>bin/local-master-backup.sh start 2 3 5
That will start 3 backup HMaster servers, using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.
To stop a backup master, find its process id in /tmp/hbase-USER-X-master.pid and kill it.
For example
>cat /tmp/hbase-carl-2-master.pid
6442
>cat /tmp/hbase-carl-5-master.pid |xargs kill -9
Start and stop Additional RegionServers
The default ports are 16020 and 16030, but the additional RegionServers use base ports 16200 and 16300.
>bin/local-regionservers.sh start 2 3 5
>bin/local-regionservers.sh stop 5
6. Fully Distributed
I have 4 machines; I will list them as follows:
ubuntu-master hmaster
ubuntu-client1 hmaster-backup
ubuntu-client2 regionserver
ubuntu-client3 regionserver
Set up the Configuration
>cat conf/regionservers
ubuntu-client2
ubuntu-client3
>cat conf/backup-masters
ubuntu-client1
Since I already have ZooKeeper running, disable HBase's managed ZooKeeper
>vi conf/hbase-env.sh
export HBASE_MANAGES_ZK=false
The main configuration file
>cat conf/hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://ubuntu-master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master.wait.on.regionservers.mintostart</name>
<value>1</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>ubuntu-client1,ubuntu-client2,ubuntu-client3</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/carl/etc/zookeeper</value>
</property>
</configuration>
The last step is just to start the servers
>bin/start-hbase.sh
Visit the web UI
http://ubuntu-master:60010/master-status
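Besides the web UI, the cluster can be checked from a client machine with a small Java sketch like the one below (the class name ClusterCheck is just illustrative); it assumes the hbase-0.98.x-hadoop2 client jars are on the classpath and uses the ZooKeeper quorum from hbase-site.xml above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ClusterCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Same quorum as in conf/hbase-site.xml above.
        conf.set("hbase.zookeeper.quorum", "ubuntu-client1,ubuntu-client2,ubuntu-client3");
        HBaseAdmin admin = new HBaseAdmin(conf);
        ClusterStatus status = admin.getClusterStatus();
        System.out.println("Active master: " + status.getMaster());
        System.out.println("Backup masters: " + status.getBackupMasters());
        for (ServerName server : status.getServers()) {
            System.out.println("Region server: " + server);
        }
        admin.close();
    }
}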
References:
https://hbase.apache.org/
http://www.alidata.org/archives/1509
http://blog.csdn.net/heyutao007/article/details/6920882
http://blog.sina.com.cn/s/blog_5c5d5cdf0101dvgq.html
hadoop hbase zookeeper
http://www.cnblogs.com/ventlam/archive/2011/01/22/HBaseCluster.html
http://www.searchtb.com/2011/01/understanding-hbase.html
http://www.searchdatabase.com.cn/showcontent_31652.htm
hadoop
http://sillycat.iteye.com/blog/1556106
http://sillycat.iteye.com/blog/1556107
tips about hadoop
http://blog.chinaunix.net/uid-20682147-id-4229024.html
http://my.oschina.net/skyim/blog/228486
http://blog.huangchaosuper.cn/work/tech/2014/04/24/hadoop-install.html
http://blog.sina.com.cn/s/blog_45d2413b0102e2zx.html
http://www.it165.net/os/html/201405/8311.html