Notes on using LBHttpSolrServer in SolrJ

At work I use Solr for search. Since I have no Lucene background, some of the theory goes over my head, but I still got the search tasks done without needing features such as tokenization/analysis. While using LBHttpSolrServer I hit a problem: roughly half of the data I wrote to the index was missing. The Solr wiki explains the usage of LBHttpSolrServer like this:

What is LBHttpSolrServer?

LBHttpSolrServer or "Load Balanced HttpSolrServer" is just a wrapper to CommonsHttpSolrServer. This is useful when you have multiple SolrServers and query requests need to be Load Balanced among them. It offers automatic failover when a server goes down and it detects when the server comes back up.

This should NOT be used for indexing in traditional master/slave architectures since updates have to be routed to the correct master. In SolrCloud architectures, use CloudSolrServer which will take advantage of this class automatically.
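
For context, here is a minimal sketch of the query-only usage the wiki describes, assuming a SolrJ 4.x-style API (LBHttpSolrServer was renamed LBHttpSolrClient in later versions); the node URLs are placeholders, not my real topology:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LoadBalancedQueryDemo {
  public static void main(String[] args) throws Exception {
    // Placeholder URLs: point these at your own query nodes.
    LBHttpSolrServer lbServer = new LBHttpSolrServer(
        "http://node1:8983/solr/core1",
        "http://node2:8983/solr/core1");
    // Queries are spread across the alive nodes; a node that goes down is
    // skipped until the background alive check brings it back.
    QueryResponse rsp = lbServer.query(new SolrQuery("*:*"));
    System.out.println("numFound = " + rsp.getResults().getNumFound());
    lbServer.shutdown();
  }
}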

That passage already points at the answer: the wiki says not to use LBHttpSolrServer for index updates, and that is exactly what I was doing. Reading the source code made LBHttpSolrServer's behavior clear, and it turned out to differ from what I had assumed, which is what caused the problem. I had assumed that LBHttpSolrServer always uses the first available node and only moves on to the next one when the current node goes down; in reality it rotates through the nodes round-robin.
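
To make the failure mode concrete, here is a hedged sketch of what I was effectively doing (placeholder URLs again, and the two cores are assumed to be independent standalone nodes). Because every request advances the internal counter, consecutive add() calls are routed to alternating nodes, so each core receives only about half of the documents:

import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RoundRobinIndexingPitfall {
  public static void main(String[] args) throws Exception {
    // Two nodes that are NOT kept in sync by the updates themselves.
    LBHttpSolrServer lbServer = new LBHttpSolrServer(
        "http://master1:8983/solr/core1",
        "http://master2:8983/solr/core1");

    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", String.valueOf(i));
      // Each add() is its own request, so it is routed round-robin:
      // roughly 500 documents land on each node.
      lbServer.add(doc);
    }
    // The commit is also a single request and therefore reaches only one node.
    lbServer.commit();
    lbServer.shutdown();
  }
}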

public class LBHttpSolrServer extends SolrServer {
  private final CopyOnWriteArrayList<ServerWrapper> aliveServers = new CopyOnWriteArrayList<ServerWrapper>();
  private final CopyOnWriteArrayList<ServerWrapper> zombieServers = new CopyOnWriteArrayList<ServerWrapper>();
  private ScheduledExecutorService aliveCheckExecutor;

  private HttpClient httpClient;
  private final AtomicInteger counter = new AtomicInteger(-1); // incremented on every request; drives the round-robin selection

  private ReentrantLock checkLock = new ReentrantLock();
  private static final SolrQuery solrQuery = new SolrQuery("*:*");
  // ... (remaining fields and methods omitted)

The excerpt above shows the relevant fields of LBHttpSolrServer. The one worth paying attention to is counter, which LBHttpSolrServer uses to decide which of its registered servers handles a given operation: counter is taken modulo the number of alive servers, as the following code shows:

public NamedList<Object> request(final SolrRequest request)
          throws SolrServerException, IOException {
    int count = counter.incrementAndGet();
    int attempts = 0;
    Exception ex;
    int startSize = aliveServers.size();
    while (true) {
      int size = aliveServers.size();
      if (size < 1) throw new SolrServerException("No live SolrServers available to handle this request");
      ServerWrapper solrServer;
      try {
        solrServer = aliveServers.get(count % size); // the key line: round-robin over the alive servers
      } catch (IndexOutOfBoundsException e) {
        //this list changes dynamically. so it is expected to get IndexOutOfBoundsException
        continue;
      }
      try {
        return solrServer.solrServer.request(request);
      } catch (SolrException e) {
        // Server is alive but the request was malformed or invalid
        throw e;
      } catch (SolrServerException e) {
        if (e.getRootCause() instanceof IOException) {
          ex = e;
          moveAliveToDead(solrServer);
        } else {
          throw e;
        }
      } catch (Exception e) {
        throw new SolrServerException(e);
      }
      attempts++;
      if (attempts >= startSize)
        throw new SolrServerException("No live SolrServers available to handle this request", ex);
    }
  }

The code itself already spells out how LBHttpSolrServer behaves. When you use LBHttpSolrServer to talk to Solr, keep your own project's characteristics in mind and pick an approach that fits. My project's setup is not a good fit for LBHttpSolrServer, for the following reasons:

My project does not rely on SolrCloud for replication. Instead, I designate one node as a so-called master, which handles all index reads and writes, and set up an identical environment on other machines as slaves, which only serve as backups. The master has no idea which nodes are replicating from it; to bring a slave in sync with the master, it is enough to run the following command on the slave:

http://slave:8984/solr/core1/replication?command=fetchindex&masterUrl=http://master:7674/solr/core1/replication

The reason I back up this way instead of using SolrCloud is that SolrCloud does not fit my application: it writes large batches of data to the index very frequently, and every commit would trigger replication, driving server load very high. Pulling the index on demand relieves that pressure; running the command above on the slave via curl (or any other HTTP client) is all it takes to complete a backup.
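
Putting it together, a minimal sketch of this setup (host names and ports are taken from the command above; HttpSolrServer simply stands in for any client that talks to the master directly, CommonsHttpSolrServer in older SolrJ): all updates go straight to the master, and the slave is told to pull the index only when a backup is actually wanted.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MasterSlaveBackupDemo {
  public static void main(String[] args) throws Exception {
    // 1) Route all index updates to the single master, not through LBHttpSolrServer.
    HttpSolrServer master = new HttpSolrServer("http://master:7674/solr/core1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    master.add(doc);
    master.commit();
    master.shutdown();

    // 2) When a backup is wanted, ask the slave to pull the index from the master,
    //    exactly like the curl command above.
    String masterUrl = URLEncoder.encode(
        "http://master:7674/solr/core1/replication", "UTF-8");
    URL fetchIndex = new URL("http://slave:8984/solr/core1/replication"
        + "?command=fetchindex&masterUrl=" + masterUrl);
    HttpURLConnection conn = (HttpURLConnection) fetchIndex.openConnection();
    try (InputStream in = conn.getInputStream()) {
      System.out.println("fetchindex triggered, HTTP " + conn.getResponseCode());
    }
  }
}

With this split, LBHttpSolrServer (if it is used at all) is reserved for queries, which is exactly what the wiki recommends.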
