Redis High-Availability Cluster in Practice
Redis version: Redis-5.0.8
OS version: Red Hat 6.8
Host | IP |
---|---|
redispri1 | 110.50.1.55 |
redispri2 | 110.50.1.56 |
redispri3 | 110.50.1.57 |
redisbck1 | 110.50.1.58 |
redisbck2 | 110.50.1.59 |
redisbck3 | 110.50.1.60 |
2. Building the Redis high-availability cluster
- On every node, stop NetworkManager and disable SELinux
```
# /etc/init.d/NetworkManager stop
# chkconfig NetworkManager off
# sed -i "s/SELINUX=.*/SELINUX=disabled/g" /etc/selinux/config && setenforce 0
```
- On every node, install Redis together with gcc, gcc-c++ and libstdc++-devel (a quick version check follows the commands below)
```
# tar xvf redis-5.0.8.tar.gz -C /usr/local/
# mv /usr/local/redis-5.0.8/ /usr/local/redis/
# yum install -y gcc gcc-c++ libstdc++-devel
# cd /usr/local/redis/src && make MALLOC=libc && make install prefix=/usr/local/redis
```
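A quick sanity check that the build and install succeeded (not part of the original steps; it just confirms the installed binaries report 5.0.8):

```
# /usr/local/bin/redis-server --version
# /usr/local/bin/redis-cli --version
```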
- On every node, configure iptables to open ports 6379 and 16379
```
# iptables -I INPUT -p TCP --dport 6379 -j ACCEPT
# iptables -I INPUT -p TCP --dport 16379 -j ACCEPT
# /etc/init.d/iptables save
# chkconfig iptables on
```
- Cluster configuration (a sketch for pushing the config out to every node follows the block below)
```
# Create directories for backups, data files, logs and AOF files
# mkdir -p /appdata/redis/{db,file,log,aof}

# Back up the default config, then edit the cluster config
# cp /usr/local/redis/redis.conf /usr/local/redis/redis.conf.bak
# vim /usr/local/redis/redis_6379.conf

bind <IP of the current node>
port 6379
daemonize yes
pidfile /var/run/redis_6379.pid
# debug for the test environment, notice for production
loglevel debug
logfile /appdata/redis/log/redis.log
save 900 1
save 300 10
save 60 10000
dbfilename dump.rdb
dir /appdata/redis/file/
requirepass 202004
maxclients 10000
maxmemory 100M
appendonly yes
appendfilename appendonly.aof
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
cluster-enabled yes
cluster-config-file nodes-6379.conf
cluster-node-timeout 5000
# disable flushing all data (FLUSHALL)
rename-command FLUSHALL ""
# disable flushing the current database (FLUSHDB)
rename-command FLUSHDB ""
# disable reconfiguring the server from a connected client
#rename-command CONFIG ""
# disable listing all existing keys from a connected client
rename-command KEYS ""
```
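Only the `bind` address differs from node to node, so the file can be pushed out from one machine and patched per node. A minimal sketch, assuming passwordless root SSH between the nodes and the node list from the table above (both are assumptions, not part of the original steps):

```bash
#!/bin/bash
# Copy the shared config to every node, then point "bind" at that node's own IP.
for ip in 110.50.1.55 110.50.1.56 110.50.1.57 110.50.1.58 110.50.1.59 110.50.1.60; do
    scp /usr/local/redis/redis_6379.conf root@${ip}:/usr/local/redis/redis_6379.conf
    ssh root@${ip} "sed -i 's/^bind .*/bind ${ip}/' /usr/local/redis/redis_6379.conf"
done
```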
- Start Redis on every node (a connectivity check across all six nodes follows below)
```
# /usr/local/bin/redis-server /usr/local/redis/redis_6379.conf

# What the files in the bin directory are for:
redis-benchmark      # performance / stress-testing tool
redis-check-aof      # checks and repairs AOF persistence files (AOF logs every write as it happens)
redis-check-rdb      # checks RDB persistence files (snapshots written at intervals)
redis-cli            # command-line client used to connect to Redis
redis-sentinel -> redis-server   # Sentinel mode for high availability (a symlink to redis-server)
redis-server         # starts the Redis server
```
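Before creating the cluster it is worth confirming that all six instances answer on port 6379. A minimal check, assuming the `requirepass 202004` from the config above (drop `-a 202004` if the nodes are started without a password, as the troubleshooting section later recommends for cluster creation):

```bash
for ip in 110.50.1.55 110.50.1.56 110.50.1.57 110.50.1.58 110.50.1.59 110.50.1.60; do
    echo -n "${ip}: "
    redis-cli -h ${ip} -p 6379 -a 202004 ping    # expect PONG from every node
done
```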
- Create the cluster
```
# redis-cli --cluster create --cluster-replicas 1 110.50.1.55:6379 110.50.1.56:6379 110.50.1.57:6379 110.50.1.58:6379 110.50.1.59:6379 110.50.1.60:6379
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 110.50.1.59:6379 to 110.50.1.55:6379
Adding replica 110.50.1.60:6379 to 110.50.1.56:6379
Adding replica 110.50.1.58:6379 to 110.50.1.57:6379
M: d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 110.50.1.55:6379
   slots:[0-5460] (5461 slots) master
M: ec79401490c99d77c878c9a8b7cfb08fe237b949 110.50.1.56:6379
   slots:[5461-10922] (5462 slots) master
M: 6a75af4717e1890a632ffffcad4d1e8b98cebcab 110.50.1.57:6379
   slots:[10923-16383] (5461 slots) master
S: b38d96efe145a810dd48e3349002794c98649d72 110.50.1.58:6379
   replicates 6a75af4717e1890a632ffffcad4d1e8b98cebcab
S: 00d4fdf753d8097637d3c96b1da6abe43ebb7e59 110.50.1.59:6379
   replicates d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515
S: f66e8c1f3e7e3cd35a86b11dd0506b1fc24691fe 110.50.1.60:6379
   replicates ec79401490c99d77c878c9a8b7cfb08fe237b949
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join ..
>>> Performing Cluster Check (using node 110.50.1.55:6379)
M: d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 110.50.1.55:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: b38d96efe145a810dd48e3349002794c98649d72 110.50.1.58:6379
   slots: (0 slots) slave
   replicates 6a75af4717e1890a632ffffcad4d1e8b98cebcab
S: 00d4fdf753d8097637d3c96b1da6abe43ebb7e59 110.50.1.59:6379
   slots: (0 slots) slave
   replicates d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515
M: 6a75af4717e1890a632ffffcad4d1e8b98cebcab 110.50.1.57:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: f66e8c1f3e7e3cd35a86b11dd0506b1fc24691fe 110.50.1.60:6379
   slots: (0 slots) slave
   replicates ec79401490c99d77c878c9a8b7cfb08fe237b949
M: ec79401490c99d77c878c9a8b7cfb08fe237b949 110.50.1.56:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
```
- Check the cluster status
```
# redis-cli -c -h 110.50.1.55 -p 6379

# List the cluster nodes
110.50.1.55:6379> cluster nodes
b38d96efe145a810dd48e3349002794c98649d72 110.50.1.58: slave 6a75af4717e1890a632ffffcad4d1e8b98cebcab 0 1587694145212 4 connected
00d4fdf753d8097637d3c96b1da6abe43ebb7e59 110.50.1.59: slave d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 0 1587694145000 5 connected
6a75af4717e1890a632ffffcad4d1e8b98cebcab 110.50.1.57: master - 0 1587694145000 3 connected 10923-16383
f66e8c1f3e7e3cd35a86b11dd0506b1fc24691fe 110.50.1.60: slave ec79401490c99d77c878c9a8b7cfb08fe237b949 0 1587694146623 6 connected
d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 110.50.1.55: myself,master - 0 1587694146000 1 connected 0-5460
ec79401490c99d77c878c9a8b7cfb08fe237b949 110.50.1.56: master - 0 1587694146219 2 connected 5461-10922

# Show cluster info
110.50.1.55:6379> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:828
cluster_stats_messages_pong_sent:826
cluster_stats_messages_sent:1654
cluster_stats_messages_ping_received:821
cluster_stats_messages_pong_received:828
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:1654

# Show the cluster slot layout
110.50.1.55:6379> cluster slots
1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "110.50.1.57"
      2) (integer) 6379
      3) "6a75af4717e1890a632ffffcad4d1e8b98cebcab"
   4) 1) "110.50.1.58"
      2) (integer) 6379
      3) "b38d96efe145a810dd48e3349002794c98649d72"
2) 1) (integer) 0
   2) (integer) 5460
   3) 1) "110.50.1.55"
      2) (integer) 6379
      3) "d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515"
   4) 1) "110.50.1.59"
      2) (integer) 6379
      3) "00d4fdf753d8097637d3c96b1da6abe43ebb7e59"
3) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "110.50.1.56"
      2) (integer) 6379
      3) "ec79401490c99d77c878c9a8b7cfb08fe237b949"
   4) 1) "110.50.1.60"
      2) (integer) 6379
      3) "f66e8c1f3e7e3cd35a86b11dd0506b1fc24691fe"
```
- Set the cluster password
requirepass: the password external services and clients use when connecting to Redis.
masterauth: the password a Redis replica uses when connecting to its master.
In other words, if you set requirepass on a master, you must also set masterauth on its replica to the same password; otherwise the replica can no longer replicate from the master. To apply this cluster-wide, use the loop sketched below.

Method 1: add the directives to the configuration file (requires a restart):

```
requirepass "passwd"
masterauth "passwd"
```

Method 2: run the following commands in redis-cli (no restart needed; `config rewrite` persists the `config set` changes to the Redis configuration file):

```
config set requirepass "passwd"
config set masterauth "passwd"
config rewrite
```
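Method 2 only changes the node you are connected to, so it has to be repeated on every member of the cluster. A sketch of the loop (the node list comes from the table above, `passwd` is a placeholder, and the nodes are assumed to have no password yet, as in the cluster-creation workaround described later; add `-a` to the first two calls otherwise):

```bash
for ip in 110.50.1.55 110.50.1.56 110.50.1.57 110.50.1.58 110.50.1.59 110.50.1.60; do
    redis-cli -h ${ip} -p 6379 config set masterauth  "passwd"   # replication password first (no auth needed yet)
    redis-cli -h ${ip} -p 6379 config set requirepass "passwd"   # from now on this node requires auth
    redis-cli -h ${ip} -p 6379 -a "passwd" config rewrite        # persist both settings to redis_6379.conf
done
```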
- Verification tests and conclusions
Randomly shut down one master node: its replica is promoted to master and the cluster keeps serving requests normally.
```
# Healthy cluster
110.50.1.55:6379> cluster nodes
d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 110.50.1.55: myself,master - 0 1587699843000 9 connected 0-5460
00d4fdf753d8097637d3c96b1da6abe43ebb7e59 110.50.1.59: slave d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 0 1587699845503 9 connected
b38d96efe145a810dd48e3349002794c98649d72 110.50.1.58: slave 6a75af4717e1890a632ffffcad4d1e8b98cebcab 0 1587699844597 4 connected
ec79401490c99d77c878c9a8b7cfb08fe237b949 110.50.1.56: master - 0 1587699845504 10 connected 5461-10922
f66e8c1f3e7e3cd35a86b11dd0506b1fc24691fe 110.50.1.60: slave ec79401490c99d77c878c9a8b7cfb08fe237b949 0 1587699844902 10 connected
6a75af4717e1890a632ffffcad4d1e8b98cebcab 110.50.1.57: master - 0 1587699844000 3 connected 10923-16383

# Shut down master 110.50.1.57
# Its state becomes master,fail and its former replica 110.50.1.58 is promoted to master
110.50.1.55:6379> cluster nodes
d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 110.50.1.55: myself,master - 0 1587699968000 9 connected 0-5460
00d4fdf753d8097637d3c96b1da6abe43ebb7e59 110.50.1.59: slave d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 0 1587699967723 9 connected
b38d96efe145a810dd48e3349002794c98649d72 110.50.1.58: master - 0 1587699969535 12 connected 10923-16383
ec79401490c99d77c878c9a8b7cfb08fe237b949 110.50.1.56: master - 0 1587699967519 10 connected 5461-10922
f66e8c1f3e7e3cd35a86b11dd0506b1fc24691fe 110.50.1.60: slave ec79401490c99d77c878c9a8b7cfb08fe237b949 0 1587699969739 10 connected
6a75af4717e1890a632ffffcad4d1e8b98cebcab 110.50.1.57: master,fail - 1587699941783 1587699941000 3 disconnected

# Set a key and read it from random nodes: the cluster is still usable
110.50.1.55:6379> set helloworld 111
OK
110.50.1.58:6379> get helloworld
-> Redirected to slot [2739] located at 110.50.1.55:6379
"111"
110.50.1.59:6379> get helloworld
-> Redirected to slot [2739] located at 110.50.1.55:6379
"111"
```
Randomly shut down one replica node manually: the cluster keeps serving requests normally.
```
# Shut down replica 110.50.1.59
# Its state becomes slave,fail
110.50.1.56:6379> cluster nodes
00d4fdf753d8097637d3c96b1da6abe43ebb7e59 110.50.1.59: slave,fail d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 1587703268981 1587703268000 9 disconnected
f66e8c1f3e7e3cd35a86b11dd0506b1fc24691fe 110.50.1.60: slave ec79401490c99d77c878c9a8b7cfb08fe237b949 0 1587703278821 10 connected
b38d96efe145a810dd48e3349002794c98649d72 110.50.1.58: slave 6a75af4717e1890a632ffffcad4d1e8b98cebcab 0 1587703278000 13 connected
ec79401490c99d77c878c9a8b7cfb08fe237b949 110.50.1.56: myself,master - 0 1587703277000 10 connected 5461-10922
d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 110.50.1.55: master - 0 1587703277819 9 connected 0-5460
6a75af4717e1890a632ffffcad4d1e8b98cebcab 110.50.1.57: master - 0 1587703278000 13 connected 10923-16383

# Set a key and read it from random nodes: the cluster is still usable
110.50.1.56:6379> set helloworld 222
-> Redirected to slot [2739] located at 110.50.1.55:6379
OK
110.50.1.58:6379> get helloworld
-> Redirected to slot [2739] located at 110.50.1.55:6379
"222"
```
Randomly shut down a master and its replica together.
```
# Shut down the master/replica pair 110.50.1.57 and 110.50.1.58
# Error: (error) CLUSTERDOWN The cluster is down -- the cluster is no longer usable
110.50.1.55:6379> cluster nodes
d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 110.50.1.55: myself,master - 0 1587703705000 9 connected 0-5460
00d4fdf753d8097637d3c96b1da6abe43ebb7e59 110.50.1.59: slave d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 0 1587703706000 9 connected
b38d96efe145a810dd48e3349002794c98649d72 110.50.1.58: slave,fail 6a75af4717e1890a632ffffcad4d1e8b98cebcab 1587703697645 1587703697546 13 disconnected
ec79401490c99d77c878c9a8b7cfb08fe237b949 110.50.1.56: master - 0 1587703705557 10 connected 5461-10922
f66e8c1f3e7e3cd35a86b11dd0506b1fc24691fe 110.50.1.60: slave ec79401490c99d77c878c9a8b7cfb08fe237b949 0 1587703706165 10 connected
6a75af4717e1890a632ffffcad4d1e8b98cebcab 110.50.1.57: master,fail - 1587703692596 1587703691995 13 disconnected 10923-16383
110.50.1.55:6379> set helloworld 333
(error) CLUSTERDOWN The cluster is down
```
Randomly shut down two master nodes.
```
# Shut down masters 110.50.1.57 and 110.50.1.56
# Error: (error) CLUSTERDOWN The cluster is down -- the cluster is no longer usable
110.50.1.55:6379> cluster nodes
d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 110.50.1.55: myself,master - 0 1587704575000 9 connected 0-5460
00d4fdf753d8097637d3c96b1da6abe43ebb7e59 110.50.1.59: slave d851fcaf25e413e9e58d0ad96c91ac2d7b4ae515 0 1587704575494 9 connected
b38d96efe145a810dd48e3349002794c98649d72 110.50.1.58: slave 6a75af4717e1890a632ffffcad4d1e8b98cebcab 0 1587704573466 13 connected
ec79401490c99d77c878c9a8b7cfb08fe237b949 110.50.1.56: master,fail? - 1587704561741 1587704560000 10 disconnected 5461-10922
f66e8c1f3e7e3cd35a86b11dd0506b1fc24691fe 110.50.1.60: slave ec79401490c99d77c878c9a8b7cfb08fe237b949 0 1587704575496 10 connected
6a75af4717e1890a632ffffcad4d1e8b98cebcab 110.50.1.57: master,fail? - 1587704566981 1587704566000 13 disconnected 10923-16383
110.50.1.55:6379> set helloworld 444
(error) CLUSTERDOWN The cluster is down
```
Conclusions from the tests:
1) How the cluster decides that a node is down and triggers a new election
Every node in a Redis cluster stores information about all of the cluster's masters and replicas. The nodes probe each other with PING/PONG gossip messages to check reachability. When more than half of the master nodes fail to get a response from a given node, the cluster marks that node as failed; if the failed node was a master, one of its replicas is then elected as the new master. A small polling loop to watch this happen is sketched after these conclusions.
2) When the cluster becomes unavailable
a. If any master goes down and that master has no replica, the cluster becomes unavailable.
b. If more than half of the masters go down, the cluster becomes unavailable regardless of whether they have replicas.
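One way to watch the failure detection and failover described in 1) is to poll `cluster nodes` from a surviving node while a master is killed, and then let `redis-cli --cluster check` confirm slot coverage afterwards. A rough sketch (node address and password are taken from this setup):

```bash
# watch the fail flag appear and the replica get promoted (Ctrl-C to stop)
while true; do
    redis-cli -h 110.50.1.55 -p 6379 -a 202004 cluster nodes | grep -E 'master|fail'
    echo '----'
    sleep 1
done

# after the failover settles, verify that all 16384 slots are still covered
redis-cli -a 202004 --cluster check 110.50.1.55:6379
```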
3. Redis's built-in benchmark tool
```
# redis-benchmark -h 110.50.1.59 -p 6382 -t set,get -c 10 -n 1000000
====== SET ======
  1000000 requests completed in 45.44 seconds
  10 parallel clients
  3 bytes payload
  keep alive: 1

89.95% <= 1 milliseconds
99.99% <= 2 milliseconds
100.00% <= 3 milliseconds
100.00% <= 3 milliseconds
22005.59 requests per second

====== GET ======
  1000000 requests completed in 45.68 seconds
  10 parallel clients
  3 bytes payload
  keep alive: 1

90.05% <= 1 milliseconds
99.99% <= 2 milliseconds
100.00% <= 3 milliseconds
100.00% <= 5 milliseconds
100.00% <= 7 milliseconds
21891.90 requests per second
```
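redis-benchmark in Redis 5.0 is not cluster-aware, so key-based tests like SET/GET against a cluster node only succeed for keys whose hash slot lives on that node; a slot-independent command such as PING still gives a per-node throughput and latency baseline. A sketch (addresses and password are taken from this setup):

```bash
for ip in 110.50.1.55 110.50.1.56 110.50.1.57 110.50.1.58 110.50.1.59 110.50.1.60; do
    echo "== ${ip} =="
    # PING involves no keys, so it is unaffected by MOVED redirections
    redis-benchmark -h ${ip} -p 6379 -a 202004 -t ping -c 10 -n 100000 -q
done
```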
4. Problems encountered during setup and how they were solved
1) Compilation error
```
# make && make install prefix=/usr/local/redis/
make[1]: Entering directory `/usr/local/redis/src'
    CC adlist.o
In file included from adlist.c:34:
zmalloc.h:50:31: error: jemalloc/jemalloc.h: No such file or directory
zmalloc.h:55:2: error: #error "Newer version of jemalloc required"
make[1]: *** [adlist.o] Error 1
make[1]: Leaving directory `/usr/local/redis/src'
make: *** [all] Error 2
```
1) The src directory contains the redis-server and redis-cli command sources, so the build should be run from that directory.
2) The README contains the following passage:
Allocator

Selecting a non-default memory allocator when building Redis is done by setting the `MALLOC` environment variable. Redis is compiled and linked against libc malloc by default, with the exception of jemalloc being the default on Linux systems. This default was picked because jemalloc has proven to have fewer fragmentation problems than libc malloc.

To force compiling against libc malloc, use:

```
% make MALLOC=libc
```

To compile against jemalloc on Mac OS X systems, use:

```
% make MALLOC=jemalloc
```
In other words, if the MALLOC environment variable is set, Redis is built with that allocator. On Linux the default allocator is jemalloc, because jemalloc has proven to have fewer fragmentation problems than libc malloc. The build failed because the system only had libc and no jemalloc.
Solutions:
(1) Pass the MALLOC parameter:
make MALLOC=libc
(2) Download and install jemalloc:
https://github.com/jemalloc/jemalloc/releases
./configure && make && make install
2) Startup errors
```
# dbfilename and appendfilename must not contain a path
# /usr/local/bin/redis-server /usr/local/redis/redis_6379.conf

*** FATAL CONFIG FILE ERROR ***
Reading the configuration file, at line 253
>>> 'dbfilename "/appdata/redis/db/dump.rdb"'
dbfilename can't be a path, just a filename

*** FATAL CONFIG FILE ERROR ***
Reading the configuration file, at line 706
>>> 'appendfilename "/appdata/redis/aof/appendonly.aof"'
appendfilename can't be a path, just a filename

# A disabled command may not be followed by an inline comment
# /usr/local/bin/redis-server /usr/local/redis/redis_6379.conf

*** FATAL CONFIG FILE ERROR ***
Reading the configuration file, at line 325
>>> 'rename-command FLUSHALL "" # disable flushing all data'
Bad directive or wrong number of arguments
```
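Put differently: keep the directory in `dir`, leave `dbfilename` and `appendfilename` as bare file names, and keep comments on their own lines rather than behind a directive. A corrected excerpt consistent with the parameters used earlier (a sketch of the relevant lines only):

```
# RDB and AOF files both land in the directory given by "dir"
dir /appdata/redis/file/
dbfilename dump.rdb
appendfilename "appendonly.aof"
# Disable FLUSHALL -- note the comment sits on its own line, not after the directive
rename-command FLUSHALL ""
```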
3) Connection error when creating the cluster
Neither redis-cli 5.x nor the older Ruby-based tooling can handle a Redis password during the cluster-create step: if the nodes already have a password set, `redis-cli --cluster create` fails with an authentication error.

```
# redis-cli --cluster create --cluster-replicas 1 110.50.1.55:6379 110.50.1.56:6379 110.50.1.57:6379 110.50.1.58:6379 110.50.1.59:6379 110.50.1.60:6379
[ERR] Node 110.50.1.55:6379 NOAUTH Authentication required.
```

Solution: remove the password from every Redis node before running `redis-cli --cluster create`. Once the cluster has been created, set the password dynamically on each node with `config set` (no restart needed, and after `config rewrite` the setting survives a restart):

```
redis-cli -h 127.0.0.1 -p 6379 -c
127.0.0.1:6379> config set requirepass 'password'   // set the client password
127.0.0.1:6379> config set masterauth 'password'    // set the replication password
127.0.0.1:6379> config rewrite                      // write the config set changes into the config file
```

Connect to the cluster once the password is set:

```
redis-cli -h 127.0.0.1 -p 6379 -c -a password
```
5. redis-cli --cluster help
```
redis-cli --cluster help
Cluster Manager Commands:
  create         host1:port1 ... hostN:portN          # create a cluster
                 --cluster-replicas <arg>             # number of replicas per master
  check          host:port                            # check the cluster
                 --cluster-search-multiple-owners     # check whether any slot is assigned to more than one node
  info           host:port                            # show cluster status
  fix            host:port                            # repair the cluster
                 --cluster-search-multiple-owners     # fix slots that are assigned to more than one node
  reshard        host:port                            # migrate slots (re-shard) via any node of the cluster
                 --cluster-from <arg>                 # source node ids to take slots from, comma separated; "all" uses every node; prompted for if omitted
                 --cluster-to <arg>                   # node id of the single destination node; prompted for if omitted
                 --cluster-slots <arg>                # number of slots to migrate; prompted for if omitted
                 --cluster-yes                        # answer yes to the migration confirmation automatically
                 --cluster-timeout <arg>              # timeout for the MIGRATE command
                 --cluster-pipeline <arg>             # number of keys fetched per CLUSTER GETKEYSINSLOT call (default 10)
                 --cluster-replace                    # use MIGRATE ... REPLACE to the destination node
  rebalance      host:port                            # rebalance slot counts across the cluster via any node
                 --cluster-weight <node1=w1...nodeN=wN>   # per-node weights
                 --cluster-use-empty-masters          # allow masters without slots to take part (not allowed by default)
                 --cluster-timeout <arg>              # timeout for the MIGRATE command
                 --cluster-simulate                   # simulate the rebalance without actually migrating
                 --cluster-pipeline <arg>             # number of keys fetched per CLUSTER GETKEYSINSLOT call (default 10)
                 --cluster-threshold <arg>            # only rebalance when the slot imbalance exceeds this threshold
                 --cluster-replace                    # use MIGRATE ... REPLACE to the destination node
  add-node       new_host:new_port existing_host:existing_port   # add a node to the given cluster (as a master by default)
                 --cluster-slave                      # add the new node as a replica (of a random master by default)
                 --cluster-master-id <arg>            # master the new replica should replicate
  del-node       host:port node_id                    # remove the given node and shut it down once removed
  call           host:port command arg arg .. arg     # run a command on every node of the cluster
  set-timeout    host:port milliseconds               # set cluster-node-timeout
  import         host:port                            # import data from an external Redis instance into the cluster
                 --cluster-from <arg>                 # instance to import the data from
                 --cluster-copy                       # use COPY during MIGRATE
                 --cluster-replace                    # use REPLACE during MIGRATE
  help

For check, fix, reshard, del-node, set-timeout you can specify the host and port of any working node in the cluster.
```
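As a worked example of the two most commonly used commands above, adding a fourth master and moving 1000 slots onto it might look like this (a sketch: 110.50.1.61 and the node id are placeholders, not part of the original setup, and `-a` assumes the cluster password set earlier):

```bash
# add a new, empty master to the existing cluster
redis-cli -a 202004 --cluster add-node 110.50.1.61:6379 110.50.1.55:6379

# move 1000 slots onto the new node, taken evenly from all current masters;
# <new-node-id> is the id of 110.50.1.61 as shown by "cluster nodes"
redis-cli -a 202004 --cluster reshard 110.50.1.55:6379 \
    --cluster-from all --cluster-to <new-node-id> \
    --cluster-slots 1000 --cluster-yes
```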