1. Introduction
SolrCloud is the distributed search solution available since Solr 4.0, built on Solr and ZooKeeper. It is the ZooKeeper-based way of deploying Solr; Solr can also be deployed in other ways, such as standalone on a single machine or in a multi-machine master-slave setup.
2. Key Features
SolrCloud has several notable features:
Centralized configuration through ZooKeeper. At startup, Solr's configuration files can be uploaded to ZooKeeper and shared by all machines. These configs are not copied to a local cache; Solr reads them straight from ZooKeeper, so every machine notices when a configuration file changes. Solr also publishes some of its tasks through ZooKeeper, for fault tolerance: if a machine accepts a task but crashes while executing it, the unfinished task can be executed again after the machine restarts, or when the cluster elects a new candidate.
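As a concrete illustration of "upload configuration to ZooKeeper", here is a minimal sketch using SolrJ's ZkConfigManager, the same class the bootstrap code later in this article calls; the ZooKeeper address, timeout, local directory, and config-set name are illustrative assumptions:

import java.nio.file.Paths;
import org.apache.solr.common.cloud.SolrZkClient;
import org.apache.solr.common.cloud.ZkConfigManager;

public class UploadConfigSketch {
  public static void main(String[] args) throws Exception {
    // illustrative ZooKeeper ensemble address and 30s client timeout
    SolrZkClient zkClient = new SolrZkClient("zk1:2181,zk2:2181,zk3:2181", 30000);
    try {
      ZkConfigManager configManager = new ZkConfigManager(zkClient);
      // upload a local config directory into ZooKeeper as the config set "myconf"
      configManager.uploadConfigDir(Paths.get("/path/to/conf"), "myconf");
    } finally {
      zkClient.close();
    }
  }
}

This is the programmatic form of what the bootstrap_confdir startup parameter does automatically (see the ZkContainer code below).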
Automatic fault tolerance. SolrCloud shards the index and creates several replicas of each shard. Every replica can serve requests, so one replica going down does not take the index offline. Better yet, SolrCloud can automatically rebuild the failed machine's replicas on other machines and put them back into service.
Near-real-time search. Replication is push-based (an immediate push, with slow push also supported), so newly indexed documents become searchable within seconds.
Automatic load balancing of queries. The replicas of a SolrCloud index can be spread over many machines to balance the query load. If query pressure grows, you can add machines and replicas to absorb it.
Automatic routing of indexed documents and shards. Send a document to any node and it is forwarded to the correct one.
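With SolrJ's CloudSolrClient, for instance, you can index without knowing the shard layout: the client (or whichever node you hit) forwards each document to the leader of the shard its id hashes to. A minimal sketch, assuming a collection named "test" and an illustrative ZooKeeper address (the constructor style matches the Solr 5.x line this article analyzes):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexAnywhereSketch {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181"); // illustrative zkHost
    client.setDefaultCollection("test");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    doc.addField("title", "hello solrcloud");

    client.add(doc);   // routed to the correct shard leader, no matter where we connect
    client.commit();
    client.close();
  }
}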
Transaction log. The transaction log ensures that updates are never lost, even when documents have not yet been flushed to the on-disk index.
Other features worth mentioning:
Index storage on HDFS. Indexes usually run from a few GB to tens of GB; hundreds of GB are rare, so this feature may be hard to put to real use. But if you have hundreds of millions of documents to index, it is worth considering. Its biggest benefit probably comes from combining it with the MapReduce batch indexing below.
Batch index building with MapReduce. With this feature, slow index builds are much less of a worry.
A powerful RESTful API. Just about every management function you can think of can be invoked through this API, which makes writing maintenance and administration scripts much easier.
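For example, a script can fetch the whole cluster layout with one Collections API call. Below is the HTTP form and a SolrJ sketch of the same request; host, port, and ZooKeeper address are illustrative, and CLUSTERSTATUS is one of the actions dispatched by the Overseer code later in this article:

// HTTP equivalent:
//   http://host:8983/solr/admin/collections?action=CLUSTERSTATUS
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class ClusterStatusSketch {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181"); // illustrative
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "CLUSTERSTATUS");
    QueryRequest request = new QueryRequest(params);
    request.setPath("/admin/collections"); // the Collections API handler registered in CoreContainer.load()
    NamedList<Object> rsp = client.request(request);
    System.out.println(rsp);
    client.close();
  }
}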
A good admin UI. The important information is visible at a glance, the SolrCloud deployment layout is shown graphically, and the indispensable debugging tools are there too.
3. Concepts
Collection: a complete logical index in a SolrCloud cluster. It is usually divided into one or more Shards, all of which use the same Config Set. When there is more than one Shard the index is distributed, but SolrCloud lets you refer to it by collection name alone, without caring about the shard-related parameters a distributed query would otherwise require.
Config Set: the group of configuration files a Solr Core needs in order to serve requests. Each config set has a name and is stored in ZooKeeper. At minimum it contains solrconfig.xml (SolrConfigXml) and schema.xml (SchemaXml); depending on what those two files reference, more files may be needed. Config sets can be re-uploaded or updated with the upconfig command, and the bootstrap_confdir startup parameter initializes or updates them at startup.
Core: a Solr Core. A Solr instance contains one or more cores, each providing indexing and querying independently; each core corresponds to one index, or to one Shard of a Collection. Cores were introduced for more flexible administration and resource sharing. The difference in SolrCloud is that a core's configuration lives in ZooKeeper, while a traditional core's configuration lives in a directory on disk.
Leader: the shard replica that wins the election. Every Shard has several Replicas, and these Replicas elect one Leader among themselves. An election can happen at any time, but normally it is triggered only when some Solr instance fails. When documents are indexed, SolrCloud passes them to the leader of the corresponding shard, and the leader distributes them to all of the shard's replicas.
Replica: one copy of a Shard. Each Replica lives in a Solr Core. For example, a collection named "test" created with numShards=1 and replicationFactor=2 gets 2 replicas, and thus 2 cores, each on a different machine or Solr instance: one named test_shard1_replica1, the other test_shard1_replica2. One of the two is elected Leader. (A SolrJ sketch of exactly this setup follows this concept list.)
Shard: a logical slice of a Collection. Each Shard is divided into one or more replicas, and an election determines which of them is the Leader.
Zookeeper: ZooKeeper provides the distributed coordination (locks, leader election) that SolrCloud requires. Solr can run with an embedded ZooKeeper, but a standalone ensemble is recommended, preferably with at least three hosts.
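Tying these concepts together, the sketch below creates the "test" collection from the Replica example (numShards=1, replicationFactor=2) through the Collections API, then queries it by collection name alone, leaving shard and replica resolution to SolrCloud. The ZooKeeper address is an illustrative assumption:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ConceptsSketch {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181"); // illustrative

    // Create "test": 1 shard, 2 replicas -> cores test_shard1_replica1 and test_shard1_replica2
    ModifiableSolrParams create = new ModifiableSolrParams();
    create.set("action", "CREATE");
    create.set("name", "test");
    create.set("numShards", 1);
    create.set("replicationFactor", 2);
    QueryRequest createReq = new QueryRequest(create);
    createReq.setPath("/admin/collections");
    client.request(createReq);

    // Query by collection name only; no shards= parameter is needed
    client.setDefaultCollection("test");
    System.out.println(client.query(new SolrQuery("*:*")).getResults().getNumFound());
    client.close();
  }
}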
4. Architecture Diagrams
The original post illustrates this section with five diagrams, not reproduced here:
- the logical structure of an index (collection)
- the mapping between Solr instances and the index
- the index creation flow
- distributed querying
- shard splitting
5. Miscellaneous
NRT. Solr writes index data to disk at commit time; that is a hard commit, and it guarantees that even a power failure loses no data. To make search more real-time, Solr also defines a soft commit: the data is committed to memory only and becomes visible in the index, but nothing has been written to the index files on disk yet.
A common arrangement is to auto-trigger a hard commit every 1-10 minutes and a soft commit every second.
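Those intervals are normally configured as autoCommit/autoSoftCommit in solrconfig.xml; programmatically, the difference is a single flag on SolrJ's commit() call. A minimal sketch, assuming a collection named "test" and an illustrative ZooKeeper address:

import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class CommitSketch {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181"); // illustrative
    client.setDefaultCollection("test");

    // soft commit: waitFlush=false, waitSearcher=true, softCommit=true
    // -> visible to searches, but not yet flushed to the index files on disk
    client.commit("test", false, true, true);

    // hard commit: softCommit=false -> segments are written to disk
    client.commit("test", true, true, false);

    client.close();
  }
}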
RealTime Get. Real-time get returns the latest version of any document by its unique key, without reopening the searcher. It exists mainly so Solr can serve as a NoSQL data store rather than just a search engine. Realtime Get currently depends on the transaction log and is enabled by default. It returns the true, current data even under soft commits or commitWithin. Note: commitWithin is an update option that asks Solr to commit the data not immediately, but within a given time window.
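A sketch of a real-time get with SolrJ: the request goes to the /get handler, which answers from the transaction log when necessary. The collection name and id are illustrative; for a single id the handler returns the document under the "doc" key:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RealTimeGetSketch {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181"); // illustrative
    client.setDefaultCollection("test");

    SolrQuery q = new SolrQuery();
    q.setRequestHandler("/get"); // the realtime-get handler, backed by the transaction log
    q.set("id", "doc-1");        // unique-key lookup

    QueryResponse rsp = client.query(q);
    System.out.println(rsp.getResponse().get("doc")); // latest version, even before a searcher reopen
    client.close();
  }
}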
Source Code Analysis
1. SolrDispatchFilter initialization
@Override
public void init(FilterConfig config) throws ServletException {
  log.info("SolrDispatchFilter.init(): {}", this.getClass().getClassLoader());

  String exclude = config.getInitParameter("excludePatterns");
  if (exclude != null) {
    String[] excludeArray = exclude.split(",");
    excludePatterns = new ArrayList<>();
    for (String element : excludeArray) {
      excludePatterns.add(Pattern.compile(element));
    }
  }
  try {
    // extra properties and solr home may be handed in as servlet context attributes
    Properties extraProperties = (Properties) config.getServletContext().getAttribute(PROPERTIES_ATTRIBUTE);
    if (extraProperties == null)
      extraProperties = new Properties();

    String solrHome = (String) config.getServletContext().getAttribute(SOLRHOME_ATTRIBUTE);
    if (solrHome == null)
      solrHome = SolrResourceLoader.locateSolrHome();

    ExecutorUtil.addThreadLocalProvider(SolrRequestInfo.getInheritableThreadLocalProvider());

    // build the CoreContainer: this is where all cores are discovered and loaded
    this.cores = createCoreContainer(solrHome, extraProperties);

    if (this.cores.getAuthenticationPlugin() != null) {
      HttpClientConfigurer configurer = this.cores.getAuthenticationPlugin().getDefaultConfigurer();
      if (configurer != null) {
        configurer.configure((DefaultHttpClient) httpClient, new ModifiableSolrParams());
      }
    }
    log.info("user.dir=" + System.getProperty("user.dir"));
  } catch (Throwable t) {
    // catch this so our filter still works
    log.error("Could not start Solr. Check solr/home property and the logs");
    SolrCore.log(t);
    if (t instanceof Error) {
      throw (Error) t;
    }
  }
  log.info("SolrDispatchFilter.init() done");
}
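SolrDispatchFilter is a servlet filter declared in the web.xml that ships inside Solr's webapp; the container calls init() above once at startup. Purely to show that wiring, here is a hedged programmatic equivalent using the Servlet 3.0 API; the filter name, exclude patterns, and mapping are assumptions for illustration, not Solr's actual descriptor:

import java.util.EnumSet;
import java.util.Set;
import javax.servlet.DispatcherType;
import javax.servlet.FilterRegistration;
import javax.servlet.ServletContainerInitializer;
import javax.servlet.ServletContext;
import javax.servlet.ServletException;
import org.apache.solr.servlet.SolrDispatchFilter;

// Hedged sketch of what Solr's bundled web.xml does declaratively:
// register SolrDispatchFilter on every request so init() runs at webapp startup.
public class SolrFilterInitializer implements ServletContainerInitializer {
  @Override
  public void onStartup(Set<Class<?>> c, ServletContext ctx) throws ServletException {
    FilterRegistration.Dynamic reg = ctx.addFilter("SolrRequestFilter", SolrDispatchFilter.class);
    reg.setInitParameter("excludePatterns", "/css/.+,/js/.+"); // same init-param parsed in init() above
    reg.addMappingForUrlPatterns(EnumSet.of(DispatcherType.REQUEST), false, "/*");
  }
}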
2. CoreContainer runs its load() method
//-------------------------------------------------------------------
// Initialization / Cleanup
//-------------------------------------------------------------------

/**
 * Load the cores defined for this CoreContainer
 */
public void load() {
  log.info("Loading cores into CoreContainer [instanceDir={}]", loader.getInstanceDir());

  // add the sharedLib to the shared resource loader before initializing cfg based plugins
  String libDir = cfg.getSharedLibDirectory();
  if (libDir != null) {
    File f = FileUtils.resolvePath(new File(solrHome), libDir);
    log.info("loading shared library: " + f.getAbsolutePath());
    loader.addToClassLoader(libDir, null, false);
    loader.reloadLuceneSPI();
  }

  shardHandlerFactory = ShardHandlerFactory.newInstance(cfg.getShardHandlerFactoryPluginInfo(), loader);
  updateShardHandler = new UpdateShardHandler(cfg.getUpdateShardHandlerConfig());
  solrCores.allocateLazyCores(cfg.getTransientCacheSize(), loader);
  logging = LogWatcher.newRegisteredLogWatcher(cfg.getLogWatcherConfig(), loader);
  hostName = cfg.getNodeName();

  // connect to (or start embedded) ZooKeeper if we are in cloud mode
  zkSys.initZooKeeper(this, solrHome, cfg.getCloudConfig());

  initializeAuthenticationPlugin();
  if (isZooKeeperAware()) {
    intializeAuthorizationPlugin();
  }

  collectionsHandler = createHandler(cfg.getCollectionsHandlerClass(), CollectionsHandler.class);
  containerHandlers.put(COLLECTIONS_HANDLER_PATH, collectionsHandler);
  infoHandler = createHandler(cfg.getInfoHandlerClass(), InfoHandler.class);
  containerHandlers.put(INFO_HANDLER_PATH, infoHandler);
  coreAdminHandler = createHandler(cfg.getCoreAdminHandlerClass(), CoreAdminHandler.class);
  containerHandlers.put(CORES_HANDLER_PATH, coreAdminHandler);

  coreConfigService = ConfigSetService.createConfigSetService(cfg, loader, zkSys.zkController);

  containerProperties.putAll(cfg.getSolrProperties());

  // setup executor to load cores in parallel
  // do not limit the size of the executor in zk mode since cores may try and wait for each other.
  final ExecutorService coreLoadExecutor = ExecutorUtil.newMDCAwareFixedThreadPool(
      (zkSys.getZkController() == null ? cfg.getCoreLoadThreadCount() : Integer.MAX_VALUE),
      new DefaultSolrThreadFactory("coreLoadExecutor"));

  final List<Future<SolrCore>> futures = new ArrayList<>();
  try {
    List<CoreDescriptor> cds = coresLocator.discover(this);
    checkForDuplicateCoreNames(cds);

    for (final CoreDescriptor cd : cds) {
      if (cd.isTransient() || !cd.isLoadOnStartup()) {
        solrCores.putDynamicDescriptor(cd.getName(), cd);
      } else if (asyncSolrCoreLoad) {
        solrCores.markCoreAsLoading(cd);
      }
      if (cd.isLoadOnStartup()) {
        futures.add(coreLoadExecutor.submit(new Callable<SolrCore>() {
          @Override
          public SolrCore call() throws Exception {
            SolrCore core;
            try {
              if (zkSys.getZkController() != null) {
                zkSys.getZkController().throwErrorIfReplicaReplaced(cd);
              }
              core = create(cd, false);
            } finally {
              if (asyncSolrCoreLoad) {
                solrCores.markCoreAsNotLoading(cd);
              }
            }
            try {
              // announce the new core in ZooKeeper so the cluster can see it
              zkSys.registerInZk(core, true);
            } catch (Throwable t) {
              SolrException.log(log, "Error registering SolrCore", t);
            }
            return core;
          }
        }));
      }
    }

    // Start the background thread
    backgroundCloser = new CloserThread(this, solrCores, cfg);
    backgroundCloser.start();

  } finally {
    if (asyncSolrCoreLoad && futures != null) {
      Thread shutdownThread = new Thread() {
        public void run() {
          try {
            for (Future<SolrCore> future : futures) {
              try {
                future.get();
              } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
              } catch (ExecutionException e) {
                log.error("Error waiting for SolrCore to be created", e);
              }
            }
          } finally {
            ExecutorUtil.shutdownNowAndAwaitTermination(coreLoadExecutor);
          }
        }
      };
      coreContainerWorkExecutor.submit(shutdownThread);
    } else {
      ExecutorUtil.shutdownAndAwaitTermination(coreLoadExecutor);
    }
  }

  if (isZooKeeperAware()) {
    zkSys.getZkController().checkOverseerDesignate();
  }
}
3. ZkContainer reads the cloud configuration and initializes ZooKeeper
public void initZooKeeper(final CoreContainer cc, String solrHome, CloudConfig config) {
  ZkController zkController = null;

  String zkRun = System.getProperty("zkRun");

  if (zkRun != null && config == null)
    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
        "Cannot start Solr in cloud mode - no cloud config provided");

  if (config == null)
    return; // not in zk mode

  String zookeeperHost = config.getZkHost();

  // zookeeper in quorum mode currently causes a failure when trying to
  // register log4j mbeans. See SOLR-2369
  // TODO: remove after updating to an slf4j based zookeeper
  System.setProperty("zookeeper.jmx.log4j.disable", "true");

  if (zkRun != null) {
    // "zkRun" means Solr should start an embedded ZooKeeper server itself
    String zkDataHome = System.getProperty("zkServerDataDir", solrHome + "zoo_data");
    String zkConfHome = System.getProperty("zkServerConfDir", solrHome);
    zkServer = new SolrZkServer(stripChroot(zkRun), stripChroot(config.getZkHost()),
        zkDataHome, zkConfHome, config.getSolrHostPort());
    zkServer.parseConfig();
    zkServer.start();

    // set client from server config if not already set
    if (zookeeperHost == null) {
      zookeeperHost = zkServer.getClientString();
    }
  }

  int zkClientConnectTimeout = 30000;

  if (zookeeperHost != null) {
    // we are ZooKeeper enabled
    try {
      // If this is an ensemble, allow for a long connect time for other servers to come up
      if (zkRun != null && zkServer.getServers().size() > 1) {
        zkClientConnectTimeout = 24 * 60 * 60 * 1000; // 1 day for embedded ensemble
        log.info("Zookeeper client=" + zookeeperHost + "  Waiting for a quorum.");
      } else {
        log.info("Zookeeper client=" + zookeeperHost);
      }
      String confDir = System.getProperty("bootstrap_confdir");
      boolean boostrapConf = Boolean.getBoolean("bootstrap_conf");

      if (!ZkController.checkChrootPath(zookeeperHost, (confDir != null) || boostrapConf || zkRunOnly)) {
        throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR,
            "A chroot was specified in ZkHost but the znode doesn't exist. " + zookeeperHost);
      }
      zkController = new ZkController(cc, zookeeperHost, zkClientConnectTimeout, config,
          new CurrentCoreDescriptorProvider() {
            @Override
            public List<CoreDescriptor> getCurrentDescriptors() {
              List<CoreDescriptor> descriptors = new ArrayList<>(cc.getCoreNames().size());
              Collection<SolrCore> cores = cc.getCores();
              for (SolrCore core : cores) {
                descriptors.add(core.getCoreDescriptor());
              }
              return descriptors;
            }
          });

      if (zkRun != null && zkServer.getServers().size() > 1 && confDir == null && boostrapConf == false) {
        // we are part of an ensemble and we are not uploading the config - pause to give the config time
        // to get up
        Thread.sleep(10000);
      }

      if (confDir != null) {
        // bootstrap_confdir: upload a local config directory into ZooKeeper at startup
        Path configPath = Paths.get(confDir);
        if (!Files.isDirectory(configPath))
          throw new IllegalArgumentException("bootstrap_confdir must be a directory of configuration files");

        String confName = System.getProperty(ZkController.COLLECTION_PARAM_PREFIX + ZkController.CONFIGNAME_PROP,
            "configuration1");
        ZkConfigManager configManager = new ZkConfigManager(zkController.getZkClient());
        configManager.uploadConfigDir(configPath, confName);
      }

      if (boostrapConf) {
        ZkController.bootstrapConf(zkController.getZkClient(), cc, solrHome);
      }

    } catch (InterruptedException e) {
      // Restore the interrupted status
      Thread.currentThread().interrupt();
      log.error("", e);
      throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR, "", e);
    } catch (TimeoutException e) {
      log.error("Could not connect to ZooKeeper", e);
      throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR, "", e);
    } catch (IOException | KeeperException e) {
      log.error("", e);
      throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR, "", e);
    }
  }
  this.zkController = zkController;
}
4. ZkController's init() method joins the leader election
private void init(CurrentCoreDescriptorProvider registerOnReconnect) {
  try {
    boolean createdWatchesAndUpdated = false;
    Stat stat = zkClient.exists(ZkStateReader.LIVE_NODES_ZKNODE, null, true);
    if (stat != null && stat.getNumChildren() > 0) {
      zkStateReader.createClusterStateWatchersAndUpdate();
      createdWatchesAndUpdated = true;
      publishAndWaitForDownStates();
    }

    createClusterZkNodes(zkClient);

    // advertise this node as live via an ephemeral znode under /live_nodes
    createEphemeralLiveNode();

    ShardHandler shardHandler;
    UpdateShardHandler updateShardHandler;
    shardHandler = cc.getShardHandlerFactory().getShardHandler();
    updateShardHandler = cc.getUpdateShardHandler();

    if (!zkRunOnly) {
      // join the Overseer leader election
      overseerElector = new LeaderElector(zkClient);
      this.overseer = new Overseer(shardHandler, updateShardHandler,
          CoreContainer.CORES_HANDLER_PATH, zkStateReader, this, cloudConfig);
      ElectionContext context = new OverseerElectionContext(zkClient, overseer, getNodeName());
      overseerElector.setup(context);
      overseerElector.joinElection(context, false);
    }

    if (!createdWatchesAndUpdated) {
      zkStateReader.createClusterStateWatchersAndUpdate();
    }

  } catch (IOException e) {
    log.error("", e);
    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Can't create ZooKeeperController", e);
  } catch (InterruptedException e) {
    // Restore the interrupted status
    Thread.currentThread().interrupt();
    log.error("", e);
    throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR, "", e);
  } catch (KeeperException e) {
    log.error("", e);
    throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR, "", e);
  }
}
5. The election itself is implemented by LeaderElector.joinElection()
/**
 * Begin participating in the election process. Gets a new sequential number
 * and begins watching the node with the sequence number before it, unless it
 * is the lowest number, in which case, initiates the leader process. If the
 * node that is watched goes down, check if we are the new lowest node, else
 * watch the next lowest numbered node.
 *
 * @return sequential node number
 */
public int joinElection(ElectionContext context, boolean replacement, boolean joinAtHead)
    throws KeeperException, InterruptedException, IOException {
  context.joinedElectionFired();

  final String shardsElectZkPath = context.electionPath + LeaderElector.ELECTION_NODE;

  long sessionId = zkClient.getSolrZooKeeper().getSessionId();
  String id = sessionId + "-" + context.id;
  String leaderSeqPath = null;
  boolean cont = true;
  int tries = 0;
  while (cont) {
    try {
      if (joinAtHead) {
        // jump the queue: create our node with the sequence number of the current head
        log.info("Node {} trying to join election at the head", id);
        List<String> nodes = OverseerCollectionProcessor.getSortedElectionNodes(zkClient, shardsElectZkPath);
        if (nodes.size() < 2) {
          leaderSeqPath = zkClient.create(shardsElectZkPath + "/" + id + "-n_", null,
              CreateMode.EPHEMERAL_SEQUENTIAL, false);
        } else {
          String firstInLine = nodes.get(1);
          log.info("The current head: {}", firstInLine);
          Matcher m = LEADER_SEQ.matcher(firstInLine);
          if (!m.matches()) {
            throw new IllegalStateException("Could not find regex match in:" + firstInLine);
          }
          leaderSeqPath = shardsElectZkPath + "/" + id + "-n_" + m.group(1);
          zkClient.create(leaderSeqPath, null, CreateMode.EPHEMERAL, false);
        }
      } else {
        // normal case: an ephemeral sequential node; the lowest sequence number is the leader
        leaderSeqPath = zkClient.create(shardsElectZkPath + "/" + id + "-n_", null,
            CreateMode.EPHEMERAL_SEQUENTIAL, false);
      }

      log.info("Joined leadership election with path: {}", leaderSeqPath);
      context.leaderSeqPath = leaderSeqPath;
      cont = false;
    } catch (ConnectionLossException e) {
      // we don't know if we made our node or not...
      List<String> entries = zkClient.getChildren(shardsElectZkPath, null, true);
      boolean foundId = false;
      for (String entry : entries) {
        String nodeId = getNodeId(entry);
        if (id.equals(nodeId)) {
          // we did create our node...
          foundId = true;
          break;
        }
      }
      if (!foundId) {
        cont = true;
        if (tries++ > 20) {
          throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR, "", e);
        }
        try {
          Thread.sleep(50);
        } catch (InterruptedException e2) {
          Thread.currentThread().interrupt();
        }
      }
    } catch (KeeperException.NoNodeException e) {
      // we must have failed in creating the election node - someone else must
      // be working on it, lets try again
      if (tries++ > 20) {
        context = null;
        throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR, "", e);
      }
      cont = true;
      try {
        Thread.sleep(50);
      } catch (InterruptedException e2) {
        Thread.currentThread().interrupt();
      }
    }
  }
  checkIfIamLeader(context, replacement);

  return getSeq(context.leaderSeqPath);
}
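This is the standard ZooKeeper leader-election recipe: each candidate creates an EPHEMERAL_SEQUENTIAL znode under an election path, the lowest sequence number is the leader, and each non-leader watches only its immediate predecessor. A minimal, self-contained sketch of the recipe with the raw ZooKeeper client; this is not Solr's code, and the election path, timeout, and output are illustrative:

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class MiniElection implements Watcher {
  private final ZooKeeper zk;
  private final String electionPath = "/demo_election"; // illustrative path
  private String myNode;                                // e.g. /demo_election/n_0000000003

  public MiniElection(String zkHost) throws Exception {
    zk = new ZooKeeper(zkHost, 15000, this);
    if (zk.exists(electionPath, false) == null) {
      zk.create(electionPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
  }

  public void join() throws Exception {
    // ephemeral: the znode vanishes if this candidate dies, freeing its place in line
    myNode = zk.create(electionPath + "/n_", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    checkLeader();
  }

  private void checkLeader() throws Exception {
    List<String> children = zk.getChildren(electionPath, false);
    Collections.sort(children); // sequence suffixes sort lexicographically
    String me = myNode.substring(myNode.lastIndexOf('/') + 1);
    int idx = children.indexOf(me);
    if (idx == 0) {
      System.out.println("I am the leader: " + myNode);
    } else {
      // watch only the immediate predecessor to avoid a thundering herd
      String predecessor = electionPath + "/" + children.get(idx - 1);
      Stat stat = zk.exists(predecessor, this);
      if (stat == null) checkLeader(); // predecessor died between calls; re-check
    }
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getType() == Event.EventType.NodeDeleted) {
      try { checkLeader(); } catch (Exception e) { e.printStackTrace(); }
    }
  }
}

joinAtHead above is Solr's twist on the recipe: a designated overseer node may insert itself right behind the current leader instead of at the back of the queue.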
6. OverseerCollectionProcessor
It implements the Runnable interface, so its core method is run():
@Override public void run() { log.info("Process current queue of collection creations"); LeaderStatus isLeader = amILeader(); while (isLeader == LeaderStatus.DONT_KNOW) { log.debug("am_i_leader unclear {}", isLeader); isLeader = amILeader(); // not a no, not a yes, try ask again } String oldestItemInWorkQueue = null; // hasLeftOverItems - used for avoiding re-execution of async tasks that were processed by a previous Overseer. // This variable is set in case there's any task found on the workQueue when the OCP starts up and // the id for the queue tail is used as a marker to check for the task in completed/failed map in zk. // Beyond the marker, all tasks can safely be assumed to have never been executed. boolean hasLeftOverItems = true; try { oldestItemInWorkQueue = workQueue.getTailId(); } catch (KeeperException e) { // We don't need to handle this. This is just a fail-safe which comes in handy in skipping already processed // async calls. SolrException.log(log, "", e); } catch (InterruptedException e) { Thread.currentThread().interrupt(); } if (oldestItemInWorkQueue == null) hasLeftOverItems = false; else log.debug("Found already existing elements in the work-queue. Last element: {}", oldestItemInWorkQueue); try { prioritizeOverseerNodes(); } catch (Exception e) { log.error("Unable to prioritize overseer ", e); } // TODO: Make maxThreads configurable. this.tpe = new ExecutorUtil.MDCAwareThreadPoolExecutor(5, 100, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue(), new DefaultSolrThreadFactory("OverseerThreadFactory")); try { while (!this.isClosed) { try { isLeader = amILeader(); if (LeaderStatus.NO == isLeader) { break; } else if (LeaderStatus.YES != isLeader) { log.debug("am_i_leader unclear {}", isLeader); continue; // not a no, not a yes, try asking again } log.debug("Cleaning up work-queue. #Running tasks: {}", runningTasks.size()); cleanUpWorkQueue(); printTrackingMaps(); boolean waited = false; while (runningTasks.size() > maxParallelThreads) { synchronized (waitLock) { waitLock.wait(100);//wait for 100 ms or till a task is complete } waited = true; } if (waited) cleanUpWorkQueue(); List heads = workQueue.peekTopN(maxParallelThreads, runningZKTasks, 2000L); if (heads == null) continue; log.debug("Got {} tasks from work-queue : [{}]", heads.size(), heads.toString()); if (isClosed) break; for (QueueEvent head : heads) { final ZkNodeProps message = ZkNodeProps.load(head.getBytes()); String collectionName = message.containsKey(COLLECTION_PROP) ? message.getStr(COLLECTION_PROP) : message.getStr(NAME); final String asyncId = message.getStr(ASYNC); if (hasLeftOverItems) { if (head.getId().equals(oldestItemInWorkQueue)) hasLeftOverItems = false; if (asyncId != null && (completedMap.contains(asyncId) || failureMap.contains(asyncId))) { log.debug("Found already processed task in workQueue, cleaning up. 
AsyncId [{}]",asyncId ); workQueue.remove(head); continue; } } if (!checkExclusivity(message, head.getId())) { log.debug("Exclusivity check failed for [{}]", message.toString()); continue; } try { markTaskAsRunning(head, collectionName, asyncId, message); log.debug("Marked task [{}] as running", head.getId()); } catch (KeeperException.NodeExistsException e) { // This should never happen log.error("Tried to pick up task [{}] when it was already running!", head.getId()); } catch (InterruptedException e) { log.error("Thread interrupted while trying to pick task for execution.", head.getId()); Thread.currentThread().interrupt(); } log.info("Overseer Collection Processor: Get the message id:" + head.getId() + " message:" + message.toString()); String operation = message.getStr(Overseer.QUEUE_OPERATION); Runner runner = new Runner(message, operation, head); tpe.execute(runner); } } catch (KeeperException e) { if (e.code() == KeeperException.Code.SESSIONEXPIRED) { log.warn("Overseer cannot talk to ZK"); return; } SolrException.log(log, "", e); } catch (InterruptedException e) { Thread.currentThread().interrupt(); return; } catch (Exception e) { SolrException.log(log, "", e); } } } finally { this.close(); } }
run() hands each task off to an inner class, Runner. Runner likewise implements Runnable and is executed on the thread pool above; its core method is again run():
@Override
public void run() {
  final TimerContext timerContext = stats.time("collection_" + operation);

  boolean success = false;
  final String asyncId = message.getStr(ASYNC);
  String collectionName = message.containsKey(COLLECTION_PROP) ?
      message.getStr(COLLECTION_PROP) : message.getStr(NAME);

  try {
    try {
      log.debug("Runner processing {}", head.getId());
      // dispatch to the handler for this operation (create, split, delete, ...)
      response = processMessage(message, operation);
    } finally {
      timerContext.stop();
      updateStats();
    }

    if (asyncId != null) {
      if (response != null && (response.getResponse().get("failure") != null
          || response.getResponse().get("exception") != null)) {
        failureMap.put(asyncId, SolrResponse.serializable(response));
        log.debug("Updated failed map for task with zkid:[{}]", head.getId());
      } else {
        completedMap.put(asyncId, SolrResponse.serializable(response));
        log.debug("Updated completed map for task with zkid:[{}]", head.getId());
      }
    } else {
      head.setBytes(SolrResponse.serializable(response));
      log.debug("Completed task:[{}]", head.getId());
    }

    markTaskComplete(head.getId(), asyncId, collectionName);
    log.debug("Marked task [{}] as completed.", head.getId());
    printTrackingMaps();

    log.info("Overseer Collection Processor: Message id:" + head.getId() +
        " complete, response:" + response.getResponse().toString());
    success = true;
  } catch (KeeperException e) {
    SolrException.log(log, "", e);
  } catch (InterruptedException e) {
    // Reset task from tracking data structures so that it can be retried.
    resetTaskWithException(head.getId(), asyncId, collectionName);
    log.warn("Resetting task {} as the thread was interrupted.", head.getId());
    Thread.currentThread().interrupt();
  } finally {
    if (!success) {
      // Reset task from tracking data structures so that it can be retried.
      resetTaskWithException(head.getId(), asyncId, collectionName);
    }
    synchronized (waitLock) {
      waitLock.notifyAll();
    }
  }
}
The key call in the method above is processMessage(), which implements the various Collection operations:
protected SolrResponse processMessage(ZkNodeProps message, String operation) {
  log.warn("OverseerCollectionProcessor.processMessage : " + operation + " , " + message.toString());

  NamedList results = new NamedList();
  try {
    // force update the cluster state
    zkStateReader.updateClusterState();
    CollectionParams.CollectionAction action = CollectionParams.CollectionAction.get(operation);
    if (action == null) {
      throw new SolrException(ErrorCode.BAD_REQUEST, "Unknown operation:" + operation);
    }
    switch (action) {
      case CREATE:
        createCollection(zkStateReader.getClusterState(), message, results);
        break;
      case DELETE:
        deleteCollection(message, results);
        break;
      case RELOAD:
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set(CoreAdminParams.ACTION, CoreAdminAction.RELOAD.toString());
        collectionCmd(zkStateReader.getClusterState(), message, params, results, Replica.State.ACTIVE);
        break;
      case CREATEALIAS:
        createAlias(zkStateReader.getAliases(), message);
        break;
      case DELETEALIAS:
        deleteAlias(zkStateReader.getAliases(), message);
        break;
      case SPLITSHARD:
        splitShard(zkStateReader.getClusterState(), message, results);
        break;
      case DELETESHARD:
        deleteShard(zkStateReader.getClusterState(), message, results);
        break;
      case CREATESHARD:
        createShard(zkStateReader.getClusterState(), message, results);
        break;
      case DELETEREPLICA:
        deleteReplica(zkStateReader.getClusterState(), message, results);
        break;
      case MIGRATE:
        migrate(zkStateReader.getClusterState(), message, results);
        break;
      case ADDROLE:
        processRoleCommand(message, operation);
        break;
      case REMOVEROLE:
        processRoleCommand(message, operation);
        break;
      case ADDREPLICA:
        addReplica(zkStateReader.getClusterState(), message, results);
        break;
      case OVERSEERSTATUS:
        getOverseerStatus(message, results);
        break;
      case CLUSTERSTATUS: // TODO . deprecated. OCP does not need to do it .remove in a later release
        new ClusterStatus(zkStateReader, message).getClusterStatus(results);
        break;
      case ADDREPLICAPROP:
        processReplicaAddPropertyCommand(message);
        break;
      case DELETEREPLICAPROP:
        processReplicaDeletePropertyCommand(message);
        break;
      case BALANCESHARDUNIQUE:
        balanceProperty(message);
        break;
      case REBALANCELEADERS:
        processRebalanceLeaders(message);
        break;
      case MODIFYCOLLECTION:
        overseer.getInQueue(zkStateReader.getZkClient()).offer(Utils.toJSON(message));
        break;
      default:
        throw new SolrException(ErrorCode.BAD_REQUEST, "Unknown operation:" + operation);
    }
  } catch (Exception e) {
    String collName = message.getStr("collection");
    if (collName == null) collName = message.getStr(NAME);

    if (collName == null) {
      SolrException.log(log, "Operation " + operation + " failed", e);
    } else {
      SolrException.log(log, "Collection: " + collName + " operation: " + operation + " failed", e);
    }

    results.add("Operation " + operation + " caused exception:", e);
    SimpleOrderedMap nl = new SimpleOrderedMap();
    nl.add("msg", e.getMessage());
    nl.add("rspCode", e instanceof SolrException ? ((SolrException) e).code() : -1);
    results.add("exception", nl);
  }
  return new OverseerSolrResponse(results);
}
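Tasks reach this dispatcher as JSON-serialized ZkNodeProps messages placed on a ZooKeeper-backed queue by the node that received the HTTP request. A hedged sketch of what such a task message looks like and how it is offered onto a queue, using the same pattern as the MODIFYCOLLECTION branch above; the property keys are illustrative, and the actual Collections API handler uses its own work-queue path:

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.cloud.Overseer;
import org.apache.solr.common.cloud.ZkNodeProps;
import org.apache.solr.common.cloud.ZkStateReader;
import org.apache.solr.common.util.Utils;

public class EnqueueTaskSketch {
  // Hedged sketch of the producer side of the Overseer's queue.
  static void enqueueCreate(ZkStateReader zkStateReader) throws Exception {
    Map<String, Object> props = new HashMap<>();
    props.put(Overseer.QUEUE_OPERATION, "create"); // routed to the CREATE branch of the switch above
    props.put("name", "test");                     // illustrative keys and values
    props.put("numShards", "1");
    props.put("replicationFactor", "2");
    Overseer.getInQueue(zkStateReader.getZkClient())
        .offer(Utils.toJSON(new ZkNodeProps(props)));
  }
}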
Let's take SPLITSHARD as an example:
private boolean splitShard(ClusterState clusterState, ZkNodeProps message, NamedList results) {
  String collectionName = message.getStr("collection");
  String slice = message.getStr(ZkStateReader.SHARD_ID_PROP);
  log.info("Split shard invoked");
  String splitKey = message.getStr("split.key");
  ShardHandler shardHandler = shardHandlerFactory.getShardHandler();

  DocCollection collection = clusterState.getCollection(collectionName);
  DocRouter router = collection.getRouter() != null ? collection.getRouter() : DocRouter.DEFAULT;

  Slice parentSlice = null;

  if (slice == null) {
    // no shard given: resolve the parent shard from split.key
    if (router instanceof CompositeIdRouter) {
      Collection<Slice> searchSlices = router.getSearchSlicesSingle(splitKey, new ModifiableSolrParams(), collection);
      if (searchSlices.isEmpty()) {
        throw new SolrException(ErrorCode.BAD_REQUEST, "Unable to find an active shard for split.key: " + splitKey);
      }
      if (searchSlices.size() > 1) {
        throw new SolrException(ErrorCode.BAD_REQUEST,
            "Splitting a split.key: " + splitKey + " which spans multiple shards is not supported");
      }
      parentSlice = searchSlices.iterator().next();
      slice = parentSlice.getName();
      log.info("Split by route.key: {}, parent shard is: {} ", splitKey, slice);
    } else {
      throw new SolrException(ErrorCode.BAD_REQUEST,
          "Split by route key can only be used with CompositeIdRouter or subclass. Found router: "
              + router.getClass().getName());
    }
  } else {
    parentSlice = clusterState.getSlice(collectionName, slice);
  }

  if (parentSlice == null) {
    if (clusterState.hasCollection(collectionName)) {
      throw new SolrException(ErrorCode.BAD_REQUEST, "No shard with the specified name exists: " + slice);
    } else {
      throw new SolrException(ErrorCode.BAD_REQUEST, "No collection with the specified name exists: " + collectionName);
    }
  }

  // find the leader for the shard
  Replica parentShardLeader = null;
  try {
    parentShardLeader = zkStateReader.getLeaderRetry(collectionName, slice, 10000);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
  }

  DocRouter.Range range = parentSlice.getRange();
  if (range == null) {
    range = new PlainIdRouter().fullRange();
  }

  // work out the hash sub-ranges for the new sub-shards
  List<DocRouter.Range> subRanges = null;
  String rangesStr = message.getStr(CoreAdminParams.RANGES);
  if (rangesStr != null) {
    String[] ranges = rangesStr.split(",");
    if (ranges.length == 0 || ranges.length == 1) {
      throw new SolrException(ErrorCode.BAD_REQUEST, "There must be at least two ranges specified to split a shard");
    } else {
      subRanges = new ArrayList<>(ranges.length);
      for (int i = 0; i < ranges.length; i++) {
        String r = ranges[i];
        try {
          subRanges.add(DocRouter.DEFAULT.fromString(r));
        } catch (Exception e) {
          throw new SolrException(ErrorCode.BAD_REQUEST, "Exception in parsing hexadecimal hash range: " + r, e);
        }
        if (!subRanges.get(i).isSubsetOf(range)) {
          throw new SolrException(ErrorCode.BAD_REQUEST,
              "Specified hash range: " + r + " is not a subset of parent shard's range: " + range.toString());
        }
      }
      List<DocRouter.Range> temp = new ArrayList<>(subRanges); // copy to preserve original order
      Collections.sort(temp);
      if (!range.equals(new DocRouter.Range(temp.get(0).min, temp.get(temp.size() - 1).max))) {
        throw new SolrException(ErrorCode.BAD_REQUEST,
            "Specified hash ranges: " + rangesStr + " do not cover the entire range of parent shard: " + range);
      }
      for (int i = 1; i < temp.size(); i++) {
        if (temp.get(i - 1).max + 1 != temp.get(i).min) {
          throw new SolrException(ErrorCode.BAD_REQUEST, "Specified hash ranges: " + rangesStr
              + " either overlap with each other or " + "do not cover the entire range of parent shard: " + range);
        }
      }
    }
  } else if (splitKey != null) {
    if (router instanceof CompositeIdRouter) {
      CompositeIdRouter compositeIdRouter = (CompositeIdRouter) router;
      subRanges = compositeIdRouter.partitionRangeByKey(splitKey, range);
      if (subRanges.size() == 1) {
        throw new SolrException(ErrorCode.BAD_REQUEST, "The split.key: " + splitKey
            + " has a hash range that is exactly equal to hash range of shard: " + slice);
      }
      for (DocRouter.Range subRange : subRanges) {
        if (subRange.min == subRange.max) {
          throw new SolrException(ErrorCode.BAD_REQUEST, "The split.key: " + splitKey + " must be a compositeId");
        }
      }
      log.info("Partitioning parent shard " + slice + " range: " + parentSlice.getRange() + " yields: " + subRanges);
      rangesStr = "";
      for (int i = 0; i < subRanges.size(); i++) {
        DocRouter.Range subRange = subRanges.get(i);
        rangesStr += subRange.toString();
        if (i < subRanges.size() - 1) rangesStr += ',';
      }
    }
  } else {
    // todo: fixed to two partitions?
    subRanges = router.partitionRange(2, range);
  }

  try {
    List<String> subSlices = new ArrayList<>(subRanges.size());
    List<String> subShardNames = new ArrayList<>(subRanges.size());
    String nodeName = parentShardLeader.getNodeName();
    for (int i = 0; i < subRanges.size(); i++) {
      String subSlice = slice + "_" + i;
      subSlices.add(subSlice);
      String subShardName = collectionName + "_" + subSlice + "_replica1";
      subShardNames.add(subShardName);

      Slice oSlice = clusterState.getSlice(collectionName, subSlice);
      if (oSlice != null) {
        final Slice.State state = oSlice.getState();
        if (state == Slice.State.ACTIVE) {
          throw new SolrException(ErrorCode.BAD_REQUEST,
              "Sub-shard: " + subSlice + " exists in active state. Aborting split shard.");
        } else if (state == Slice.State.CONSTRUCTION || state == Slice.State.RECOVERY) {
          // delete the shards left over from a previous, failed split attempt
          for (String sub : subSlices) {
            log.info("Sub-shard: {} already exists therefore requesting its deletion", sub);
            Map<String, Object> propMap = new HashMap<>();
            propMap.put(Overseer.QUEUE_OPERATION, "deleteshard");
            propMap.put(COLLECTION_PROP, collectionName);
            propMap.put(SHARD_ID_PROP, sub);
            ZkNodeProps m = new ZkNodeProps(propMap);
            try {
              deleteShard(clusterState, m, new NamedList());
            } catch (Exception e) {
              throw new SolrException(ErrorCode.SERVER_ERROR, "Unable to delete already existing sub shard: " + sub, e);
            }
          }
        }
      }
    }

    // do not abort splitshard if the unloading fails
    // this can happen because the replicas created previously may be down
    // the only side effect of this is that the sub shard may end up having more replicas than we want
    collectShardResponses(results, false, null, shardHandler);

    final String asyncId = message.getStr(ASYNC);
    HashMap<String, String> requestMap = new HashMap<>();

    for (int i = 0; i < subRanges.size(); i++) {
      String subSlice = subSlices.get(i);
      String subShardName = subShardNames.get(i);
      DocRouter.Range subRange = subRanges.get(i);

      log.info("Creating slice " + subSlice + " of collection " + collectionName + " on " + nodeName);

      Map<String, Object> propMap = new HashMap<>();
      propMap.put(Overseer.QUEUE_OPERATION, CREATESHARD.toLower());
      propMap.put(ZkStateReader.SHARD_ID_PROP, subSlice);
      propMap.put(ZkStateReader.COLLECTION_PROP, collectionName);
      propMap.put(ZkStateReader.SHARD_RANGE_PROP, subRange.toString());
      propMap.put(ZkStateReader.SHARD_STATE_PROP, Slice.State.CONSTRUCTION.toString());
      propMap.put(ZkStateReader.SHARD_PARENT_PROP, parentSlice.getName());
      DistributedQueue inQueue = Overseer.getInQueue(zkStateReader.getZkClient());
      inQueue.offer(Utils.toJSON(new ZkNodeProps(propMap)));

      // wait until we are able to see the new shard in cluster state
      waitForNewShard(collectionName, subSlice);

      // refresh cluster state
      clusterState = zkStateReader.getClusterState();

      log.info("Adding replica " + subShardName + " as part of slice " + subSlice + " of collection "
          + collectionName + " on " + nodeName);
      propMap = new HashMap<>();
      propMap.put(Overseer.QUEUE_OPERATION, ADDREPLICA.toLower());
      propMap.put(COLLECTION_PROP, collectionName);
      propMap.put(SHARD_ID_PROP, subSlice);
      propMap.put("node", nodeName);
      propMap.put(CoreAdminParams.NAME, subShardName);
      // copy over property params:
      for (String key : message.keySet()) {
        if (key.startsWith(COLL_PROP_PREFIX)) {
          propMap.put(key, message.getStr(key));
        }
      }
      // add async param
      if (asyncId != null) {
        propMap.put(ASYNC, asyncId);
      }
      addReplica(clusterState, new ZkNodeProps(propMap), results);
    }

    collectShardResponses(results, true, "SPLITSHARD failed to create subshard leaders", shardHandler);

    completeAsyncRequest(asyncId, requestMap, results);

    for (String subShardName : subShardNames) {
      // wait for parent leader to acknowledge the sub-shard core
      log.info("Asking parent leader to wait for: " + subShardName + " to be alive on: " + nodeName);
      String coreNodeName = waitForCoreNodeName(collectionName, nodeName, subShardName);
      CoreAdminRequest.WaitForState cmd = new CoreAdminRequest.WaitForState();
      cmd.setCoreName(subShardName);
      cmd.setNodeName(nodeName);
      cmd.setCoreNodeName(coreNodeName);
      cmd.setState(Replica.State.ACTIVE);
      cmd.setCheckLive(true);
      cmd.setOnlyIfLeader(true);

      ModifiableSolrParams p = new ModifiableSolrParams(cmd.getParams());
      sendShardRequest(nodeName, p, shardHandler, asyncId, requestMap);
    }

    collectShardResponses(results, true, "SPLITSHARD timed out waiting for subshard leaders to come up", shardHandler);

    completeAsyncRequest(asyncId, requestMap, results);

    log.info("Successfully created all sub-shards for collection " + collectionName + " parent shard: " + slice
        + " on: " + parentShardLeader);

    log.info("Splitting shard " + parentShardLeader.getName() + " as part of slice " + slice + " of collection "
        + collectionName + " on " + parentShardLeader);

    // ask the parent shard leader's core to actually split its index
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set(CoreAdminParams.ACTION, CoreAdminAction.SPLIT.toString());
    params.set(CoreAdminParams.CORE, parentShardLeader.getStr("core"));
    for (int i = 0; i < subShardNames.size(); i++) {
      String subShardName = subShardNames.get(i);
      params.add(CoreAdminParams.TARGET_CORE, subShardName);
    }
    params.set(CoreAdminParams.RANGES, rangesStr);

    sendShardRequest(parentShardLeader.getNodeName(), params, shardHandler, asyncId, requestMap);

    collectShardResponses(results, true, "SPLITSHARD failed to invoke SPLIT core admin command", shardHandler);
    completeAsyncRequest(asyncId, requestMap, results);

    log.info("Index on shard: " + nodeName + " split into two successfully");

    // apply buffered updates on sub-shards
    for (int i = 0; i < subShardNames.size(); i++) {
      String subShardName = subShardNames.get(i);

      log.info("Applying buffered updates on : " + subShardName);

      params = new ModifiableSolrParams();
      params.set(CoreAdminParams.ACTION, CoreAdminAction.REQUESTAPPLYUPDATES.toString());
      params.set(CoreAdminParams.NAME, subShardName);

      sendShardRequest(nodeName, params, shardHandler, asyncId, requestMap);
    }

    collectShardResponses(results, true,
        "SPLITSHARD failed while asking sub shard leaders to apply buffered updates", shardHandler);

    completeAsyncRequest(asyncId, requestMap, results);

    log.info("Successfully applied buffered updates on : " + subShardNames);

    // Replica creation for the new Slices
    // look at the replication factor and see if it matches reality
    // if it does not, find best nodes to create more cores

    // TODO: Have replication factor decided in some other way instead of numShards for the parent
    int repFactor = clusterState.getSlice(collectionName, slice).getReplicas().size();

    // we need to look at every node and see how many cores it serves
    // add our new cores to existing nodes serving the least number of cores
    // but (for now) require that each core goes on a distinct node.

    // TODO: add smarter options that look at the current number of cores per node?
    // for now we just go random
    Set<String> nodes = clusterState.getLiveNodes();
    List<String> nodeList = new ArrayList<>(nodes.size());
    nodeList.addAll(nodes);

    // TODO: Have maxShardsPerNode param for this operation?

    // Remove the node that hosts the parent shard for replica creation.
    nodeList.remove(nodeName);

    // TODO: change this to handle sharding a slice into > 2 sub-shards.
    // ... (the remainder of the method is truncated in the original post)
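For completeness, this is roughly how a client triggers the operation above through the Collections API with SolrJ; the ZooKeeper address, collection, and shard names are illustrative, and the action/collection/shard parameters are the standard SPLITSHARD parameters:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class SplitShardSketch {
  public static void main(String[] args) throws Exception {
    CloudSolrClient client = new CloudSolrClient("zk1:2181"); // illustrative zkHost
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "SPLITSHARD"); // lands in the SPLITSHARD branch of processMessage()
    params.set("collection", "test");
    params.set("shard", "shard1");
    QueryRequest request = new QueryRequest(params);
    request.setPath("/admin/collections");
    client.request(request);
    client.close();
  }
}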
Summary:
We have walked the SolrCloud flow end to end, from ZooKeeper initialization step by step down to the execution of a concrete command. For reasons of space the finer details were not covered; follow-up articles will analyze each of them.