SolrCloud/ZooKeeper Optimization
SolrCloud optimization:
1: CPU clock speed
2: ZooKeeper tuning items. Reference: http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html
Things to Avoid
Here are some common problems you can avoid by configuring ZooKeeper correctly:
inconsistent lists of servers: The list of ZooKeeper servers used by the clients must match the list of ZooKeeper servers that each ZooKeeper server has. Things work okay if the client list is a subset of the real list, but things will really act strange if clients have a list of ZooKeeper servers that are in different ZooKeeper clusters. Also, the server lists in each ZooKeeper server configuration file should be consistent with one another.
incorrect placement of transaction log: The most performance-critical part of ZooKeeper is the transaction log. ZooKeeper syncs transactions to media before it returns a response. A dedicated transaction log device is key to consistently good performance. Putting the log on a busy device will adversely affect performance. If you only have one storage device, put trace files on NFS and increase the snapshotCount; it doesn't eliminate the problem, but it should mitigate it.
incorrect Java heap size: You should take special care to set your Java max heap size correctly. In particular, you should not create a situation in which ZooKeeper swaps to disk. The disk is death to ZooKeeper. Everything is ordered, so if processing one request swaps to disk, all other queued requests will probably do the same. DON'T SWAP.
Be conservative in your estimates: if you have 4G of RAM, do not set the Java max heap size to 6G or even 4G. For example, it is more likely you would use a 3G heap for a 4G machine, as the operating system and the cache also need memory. The best and only recommended practice for estimating the heap size your system needs is to run load tests, and then make sure you are well below the usage limit that would cause the system to swap.
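The first two pitfalls above can be checked mechanically. The following is only a minimal sketch: the helper names (parse_zoo_cfg, server_hosts, check_ensemble) are hypothetical and not part of ZooKeeper, and it relies only on the standard zoo.cfg keys server.N, dataDir and dataLogDir. Note that different paths do not prove the transaction log sits on a dedicated device; that still has to be verified on the host.

```python
def parse_zoo_cfg(text):
    """Parse key=value lines of a zoo.cfg-style file into a dict."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip()
    return cfg


def server_hosts(cfg):
    """Return the set of hosts named in server.N=host:port:port entries."""
    return {v.split(":")[0] for k, v in cfg.items() if k.startswith("server.")}


def check_ensemble(server_cfgs, client_connect_string):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    host_sets = [server_hosts(c) for c in server_cfgs]
    # Every server's config should list the same ensemble members.
    if any(hosts != host_sets[0] for hosts in host_sets[1:]):
        problems.append("server lists differ between ZooKeeper servers")
    # Drop an optional chroot suffix such as "/solr" before splitting hosts.
    host_part = client_connect_string.split("/", 1)[0]
    client_hosts = {h.split(":")[0].strip() for h in host_part.split(",") if h.strip()}
    # The client list may be a subset, but must not name servers outside the ensemble.
    known_hosts = set().union(*host_sets)
    for host in sorted(client_hosts - known_hosts):
        problems.append(f"client lists unknown server {host}")
    # Without dataLogDir, the transaction log is written under dataDir.
    for i, cfg in enumerate(server_cfgs):
        if cfg.get("dataLogDir", cfg.get("dataDir")) == cfg.get("dataDir"):
            problems.append(f"config {i}: transaction log shares dataDir")
    return problems
```

Feeding it the parsed zoo.cfg of every server plus the zkHost string used by the Solr nodes gives a quick sanity check before a deployment.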
3: Every maxBufferedDocs buffered documents are flushed as one segment, and every mergeFactor segments are merged into a single index file; tune maxBufferedDocs and mergeFactor to suit the workload.
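To get a feel for how the two parameters interact, here is a toy simulation. It is a simplified model under stated assumptions (one flush per maxBufferedDocs documents, and a merge whenever the newest mergeFactor segments are the same size), not the actual Lucene merge policy; the function name simulate_segments is hypothetical.

```python
def simulate_segments(num_docs, max_buffered_docs, merge_factor):
    """Return (segment sizes oldest first, docs still buffered) after indexing."""
    if max_buffered_docs < 1 or merge_factor < 2:
        raise ValueError("need max_buffered_docs >= 1 and merge_factor >= 2")
    segments = []
    buffered = 0
    for _ in range(num_docs):
        buffered += 1
        if buffered == max_buffered_docs:
            # Buffer is full: flush it as a new segment.
            segments.append(buffered)
            buffered = 0
            # Cascade: merge while the newest merge_factor segments are equal-sized.
            while (len(segments) >= merge_factor
                   and len(set(segments[-merge_factor:])) == 1):
                merged = sum(segments[-merge_factor:])
                del segments[-merge_factor:]
                segments.append(merged)
    return segments, buffered


if __name__ == "__main__":
    print(simulate_segments(500, 100, 3))   # ([300, 100, 100], 0)
    print(simulate_segments(1050, 100, 10))  # ([1000], 50)
```

In this model a larger maxBufferedDocs means fewer, larger flushes (more memory, less I/O), while a larger mergeFactor means fewer merges during indexing but more segments for searches to visit.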
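4: Clicking the Optimize button in the Solr admin UI merges the index files into a single index file. Optimize is a highly I/O-intensive task, and if the Solr data is updated frequently, the optimized index will not last long before it has to be optimized again.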
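The same operation can be requested over HTTP instead of through the admin UI, which makes it easier to schedule during quiet hours. The sketch below only builds the request URL for the update handler; the base URL and collection name are placeholders, and build_optimize_url is a hypothetical helper, so check the parameters against the Solr version in use before relying on it.

```python
from urllib.parse import quote, urlencode


def build_optimize_url(solr_base_url, collection, max_segments=1, wait_searcher=True):
    """Build an update-handler URL that asks Solr to optimize a collection."""
    params = urlencode({
        "optimize": "true",
        "maxSegments": max_segments,  # target number of segments after the merge
        "waitSearcher": str(wait_searcher).lower(),
    })
    return f"{solr_base_url.rstrip('/')}/{quote(collection)}/update?{params}"


if __name__ == "__main__":
    # Placeholder host and collection; adjust to the real deployment.
    print(build_optimize_url("http://localhost:8983/solr", "collection1"))
    # To actually send it: urllib.request.urlopen(url).read()
```

Allowing a few segments (maxSegments greater than 1) is a cheaper compromise when the index changes too often for a full optimize to pay off.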
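5: Reference: http://www.solr.cc/blog/?p=788
1. Update frequency: how large is the daily data increment, and is data updated continuously or on a schedule?
2. Total data volume: how long must the data be retained?
3. Consistency requirements: how soon should updated data become visible, and what is the maximum acceptable delay?
4. Data characteristics: what are the data sources, and what is the average size of a single record?
5. Business characteristics: what sorting requirements and search conditions are there?
6. Resource reuse: what is the existing hardware configuration, and is there an upgrade plan?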
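The first, second and fourth questions combine into a rough volume estimate. The sketch below is back-of-the-envelope arithmetic only: estimate_volume and all input numbers are hypothetical, and the result is the raw source size, since the on-disk index size depends on the schema (stored fields, analyzers) and has to be measured.

```python
def estimate_volume(docs_per_day, avg_record_bytes, retention_days):
    """Return (total documents, raw data size in bytes) at steady state."""
    total_docs = docs_per_day * retention_days
    raw_bytes = total_docs * avg_record_bytes
    return total_docs, raw_bytes


if __name__ == "__main__":
    # Hypothetical inputs: 1M new records per day, 2 KiB each, kept for 90 days.
    docs, size = estimate_volume(1_000_000, 2048, 90)
    print(docs, "documents,", round(size / 1024**3, 1), "GiB of raw data")
```

Comparing this figure with the existing hardware (question 6) shows early whether the index fits one node or needs to be sharded.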