Redis Sentinel vs Clustering – Redis Sentinel与集群的比较

最后修改: 2022年 10月 11日

1. Overview


In this tutorial, we’ll talk about Redis and, more importantly, its two different deployment strategies, Redis Sentinel and Redis Cluster. Then, we’ll discuss the differences between those strategies and their nuances.

在本教程中,我们将谈论Redis,更重要的是,它的两种不同的部署策略,即Redis Sentinel和Redis Cluster。然后,我们将讨论这些策略之间的差异和它们的细微差别。

At the end of it, we hope to understand enough about Redis to judge which deployment strategy better fulfils our needs.


2. Introduction to Redis


Redis is an open-source in-memory data structure store that can be used as a key-value database, cache, and for many other use cases. It aims to provide high-speed access to data.


Our goal is to analyze and compare the two different strategies, Redis Sentinel and Redis Cluster.

我们的目标是分析和比较两种不同的策略,Redis Sentinel和Redis Cluster。

Redis Sentinel is a separate process that Redis provides. Its goal is to monitor the Redis instances, offer notification capabilities, master discovery, automatic failover in case of failure, and master election via majority voting. In other words, Sentinel is a distributed system that provides extra capabilities to Redis, such as high availability and failover. Redis Sentinel combines forces with the standard Redis deployment.

Redis Sentinel是Redis提供的一个独立进程。它的目标是监控 Redis 实例,提供通知功能、主站发现、故障情况下的自动故障转移,以及通过多数投票选举主站。换句话说,Sentinel是一个分布式系统,它为Redis提供了额外的功能,如高可用性和故障转移。Redis Sentinel与标准Redis部署相结合。

Redis Cluster is a deployment strategy that scales even further. Similar to Sentinel, it provides failover, configuration management, etc. The difference is the sharding capabilities, which allow us to scale out capacity almost linearly up to 1000 nodes.

Redis Cluster是一种可进一步扩展的部署策略。与Sentinel类似,它提供故障转移、配置管理等。不同之处在于分片功能,它使我们能够将容量几乎线性地扩展到1000个节点。

3. Basic Concepts


To help us understand the nuances of both strategies, let’s first try to internalize some of the basic concepts and building blocks.


Some concepts may not be strictly related to Redis Cluster or Sentinel. However, others may apply to both, but knowing them will help us understand Redis.

有些概念可能与Redis Cluster或Sentinel没有严格关系。然而,其他的可能适用于两者,但了解它们将有助于我们理解Redis。

3.1. Databases


Redis has support for multiple logical databases. Although still persisted in the same file, they allow the user to have the same key with different values in each database. They’re like different database schemas.


By default, Redis provides 16 logical databases, but the user can change this number. Such databases are identified by their index, starting from zero.


Another essential thing to reinforce is that Redis is a single-threaded data store. Therefore, all database operations go to the same pipeline of execution.


3.2. Hash Slots


Redis Cluster works a bit differently from the standard one. For example, to automatically shard the data and distribute it across the various nodes of the cluster, Redis Cluster uses the so-called hash slots.

Redis Cluster的工作方式与标准的有点不同。例如,为了自动分片并在集群的各个节点上分配数据,Redis Cluster使用了所谓的哈希槽

Redis Cluster doesn’t use consistent hashing to do the distribution job. Instead, it uses the hash slots.

Redis Cluster不使用一致的散列来做分配工作。相反,它使用散列槽。

Each cluster has 16384 hash slots that can be distributed across all the nodes, and during all operations, the Redis client uses the key to calculate the hash by taking the CRC16 of the key modulo 16384. Then, it uses it to route the command to the correct node.


Each node has a subset of hash slots assigned to it, and it’s possible to move slots between nodes using resharding or rebalance operations. Also, given the particularities of this approach, the Redis Cluster doesn’t allow multiple logical databases per node. Therefore, only the database zero is available in each node.

每个节点都有一个分配给它的哈希槽的子集,并且可以使用重新分片或重新平衡操作在节点之间移动槽。另外,鉴于这种方法的特殊性,Redis Cluster不允许每个节点有多个逻辑数据库。因此,每个节点中只有数据库0可用

Last, due to the distribution of the keys in hash slots, Redis Cluster has a caveat when it comes to multiple key operations. Those operations are still available, although all the keys involved have to belong to the same hash slot. Otherwise, Redis rejects the request.

最后,由于键在哈希槽中的分布,Redis Cluster在涉及多键操作时有一个注意事项。这些操作仍然可用,尽管所有涉及的键必须属于同一个哈希槽。否则,Redis将拒绝该请求。

3.3. Hash Tags


This mechanism helps users guarantee a group of keys go to the same hash slot. To define a hash tag, the user has to add a substring between brackets in a key. For instance, the keys app1{user:123}.mykey1 and app1{user:123}.mykey2 would go to the same hash lot.


By doing this, Redis only uses the substring to generate the hash of the key and, consequently, for routing all keys to the same hash slot.


3.4. Asynchronous Replication


Both Redis Cluster and the standard one use asynchronous replication. That means Redis doesn’t wait for replicas to acknowledge writes.

Redis Cluster和标准的都使用异步复制。这意味着Redis不会等待复制体确认写入。

Moreover, there will always be a tiny window of time where a failure may cause an acknowledged write to be lost. However, there are ways to mitigate this risk by limiting this window of time as much as possible. But once again, the risk is always there, so Redis doesn’t guarantee strong consistency.


3.5. Failover


Redis offers different mechanisms to deal with failures and guarantee some level of fault tolerance. Both Redis Cluster and the standard one using Sentinel have such tools. Such systems come with a failure detector based on health checks and timeouts.

Redis提供了不同的机制来处理故障并保证一定程度的容错。Redis Cluster和使用Sentinel的标准的都有这样的工具。这样的系统带有一个基于健康检查和超时的故障检测器。

In the case of Redis Cluster, it uses heartbeat and gossip protocol. For the purpose of this article, we’ll not go deeper into those. But to sum up, each one talks to the other nodes, exchanges packets with metadata, and calculates some timeouts in case a particular node doesn’t respond.

就Redis Cluster而言,它使用心跳和八卦协议。为了本文的目的,我们不会深入研究这些。但总的来说,每个节点都会与其他节点对话,交换带有元数据的数据包,并计算一些超时,以防某个节点没有回应。

Regarding Sentinel, each Sentinel instance monitors a Redis instance. The Sentinel instances also communicate between themselves and, based on the configuration, may execute failovers due to problems with communication and timeouts.

关于Sentinel,每个Sentinel实例监控一个Redis实例。Sentinel 实例之间也进行通信,并且基于配置,可能会因通信和超时问题而执行故障转移。

3.6. Master Election


Again, both implementations have strategies for voting, and the process has different phases. But the main goal’s to decide if the failover should happen and which to promote.


Sentinel works with the concept of quorum. The quorum is the minimum number of Sentinel instances that need to find the consensus about whether a master node is reachable or not.


Once this happens, the master is marked as failing, and eventually, a failover process will start if possible. After that, at least a majority of Sentinels must authorize the failover and elect the Sentinel instance responsible for the failover. Then, this instance selects the best read replica to promote and executes the failover. Finally, the instance starts broadcasting the new setup to the other Sentinel instances.


The process changes a bit when it comes to Redis Cluster. This time, the replica nodes are in charge. As we mentioned before, in this case, all nodes communicate between them, so once one or more replicas detect the failure, they can start the election. The next step is to request the vote of the masters. Once the voting happens, a replica receives the failover rights.

当涉及到Redis Cluster时,这个过程发生了一些变化。这一次,由复制节点负责。正如我们之前提到的,在这种情况下,所有节点之间都会进行通信,所以一旦一个或多个副本检测到故障,他们就可以开始选举。下一步是请求主人的投票。一旦投票发生,一个副本就会收到故障转移的权利。

Given the nature of those vote mechanisms, Redis recommends always using an odd number of nodes in the cluster, which applies to both implementations.


3.7. Network Partition


Redis can survive many different failures, and its design is robust enough to provide continuous operation even when nodes go down. However, one of the critical problem scenarios we can face is the so-called network partition. Due to its topology, Redis faces challenging situations when dealing with such problems.


The so-called split brain’s one of the most straightforward problems in this category. Let’s use the following example:


Split brain step 1
Here we can observe that we have a network partition, and from one side, a client is communicating with our cluster that now only can reach two instances, 1 and 2.

Split brain step 1

On the other side, another client communicates with the other partition where only 3 and 4 are reachable in the partition. Imagining our quorum is 2, at some point, the failover will happen on the right side, and we’ll reach the outcome:
Split brain step 2

Split brain step 2

Now, we’ve two clusters (Redis 3 has been promoted to master). At some point, when the network partition goes away, the cluster may have a problem in case the same key was written on both sides with different values during the separation.

现在,我们有两个集群(Redis 3已经被提升为主集群)。在某些时候,当网络分区消失后,集群可能会出现问题,如果在分离过程中,同一键被写入两边的值不同。

We can apply the same principles to a Redis Cluster, and the outcome would be similar. This illustrates why odd numbers are recommended and configurations such as min-replicas-to-write are used. The idea is always to have majority and minority partitions where only the majority would be able to perform the failover.


As we may imagine, there are many other possible scenarios during network partitions. For that reason, it’s essential to understand and have it in mind when designing our cluster, no matter which option we choose.


4. Redis Sentinel vs Clustering

4.Redis Sentinel vs Clustering

The table below compares some of their main features:


Comparing Redis standard with Sentinel vs Cluster

We can connect the dots by using all the details and nuances of Redis Cluster and the standard implementation combined with Redis Sentinel.

我们可以通过使用Redis Cluster的所有细节和细微差别以及与Redis Sentinel相结合的标准实现来连接这些点。

We can conclude that Sentinel provides monitoring, fault tolerance, and notifications. It’s also a configuration provider. Moreover, it can provide authentication capabilities and client service discovery. It can provide 16 logical databases, async replication and high availability for your cluster.


Nonetheless, the single-threaded nature of Redis and the barriers of vertical scaling are limiting factors to Sentinel’s scaling capabilities. However, for small and medium projects, it may be ideal.


It’s also true that deployment with Sentinel requires fewer nodes. Therefore, it’s more cost-effective.


A point worth mentioning is that we can also scale read-only operations further by using read replicas.


Redis Cluster is a fully distributed implementation with automated sharding capabilities (horizontal scaling capabilities), designed for high performance and linear scaling up to 1000 nodes. In addition, it provides a reasonable degree of writing safety and means to survive disasters as long as all the data of masters are reachable from the majority side (by the master or replicas cable to do the failover).

Redis Cluster是一个完全分布式的实现,具有自动分片功能(水平扩展能力),旨在实现高性能和线性扩展,最高可达1000个节点。此外,只要主站的所有数据都能从多数站(由主站或复制站电缆进行故障转移)到达,它就能提供合理程度的写入安全性和在灾难中生存的手段。

Another point is that during a network partition, the minority part of the cluster will stop accepting requests. On the other hand, a Sentinel deployment may continue working partially, depending on the configuration. As mentioned previously, the availability of the cluster and the fault tolerance depends on the cluster configuration and composition.


The Cluster version also offers the ability to reconfigure the mapping between masters and replicas to rebalance in case of multiple independent failures of a single node occur. However, once again, more nodes are necessary for a more robust cluster and, therefore, more cost. Besides, managing more nodes and balancing shards is extra complexity that can’t be overlooked. 


To sum up, Redis Cluster shines when it comes to large deployment with big data sets that require high throughput and scaling capabilities.

总之,当涉及到需要高吞吐量和扩展能力的大数据集的大型部署时,Redis Cluster大放异彩。

5. Conclusion


In this article, we discussed Redis and its different implementations, Redis Cluster and Redis standard using Sentinel.


We also understood all the basic building blocks of the solution and how to use them to extract the most out of it. We hope to have the tools to decide which option to use based on the use case.