Why do I need at least three nodes for a cluster?

In a cluster, each node must be able to decide whether it is still part of the active cluster. The simplest and most reliable way to do this is by checking for a majority, also known as a quorum. If a node can communicate with a majority of the cluster’s nodes, it remains a member and continues to accept work. If it cannot reach a majority, it stops accepting incoming connections to prevent the data inconsistency that arises when two partitions operate independently (a situation known as split-brain).
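
The majority rule itself is simple arithmetic. The following short Python sketch (illustrative only, not part of CipherMail) shows the check a node effectively performs:

    def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
        """A node keeps serving only if it can reach a strict majority
        of all cluster members, counting itself."""
        return reachable_nodes > total_nodes / 2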

Two-node clusters are a special case. If the connection between the two nodes fails, neither node can form a majority on its own. As a result, both nodes will refuse new connections until quorum is restored. For this reason, we strongly recommend deploying an odd number of nodes so that a majority can be formed even during a network split.
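
Applied to the toy has_quorum function above, the difference between two and three nodes becomes clear:

    # Two-node cluster, link failure: each node reaches only itself.
    has_quorum(reachable_nodes=1, total_nodes=2)   # False -> both nodes stop accepting work

    # Three-node cluster, one node or link fails: the remaining pair
    # still reaches two out of three members.
    has_quorum(reachable_nodes=2, total_nodes=3)   # True -> the majority side keeps serving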

Practical guidance:

  • Use an odd number of nodes (for example, 3 or 5). A 3-node cluster can tolerate the failure of one node or link; a 5-node cluster can tolerate two (see the sketch after this list).

  • If you must run with only two data nodes, add a third quorum component (often called a witness or arbiter) to provide the tie-breaking vote. This allows one data node to continue operating if the other or the network link fails.

  • During maintenance, ensure that the remaining online nodes can still form a majority. If quorum is not available, service will pause by design until connectivity or nodes are restored.

  • Monitor cluster health and network connectivity so you can quickly restore quorum if a failure occurs.
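
The fault-tolerance figures mentioned above follow directly from the majority rule: with n voting members, the quorum is floor(n/2) + 1 and the cluster can lose floor((n - 1)/2) of them. A small illustrative sketch (the helper name is ours, not a CipherMail command):

    def tolerated_failures(total_voters: int) -> int:
        """How many voters can fail while the remainder still forms
        a strict majority: floor((n - 1) / 2)."""
        return (total_voters - 1) // 2

    for n in (2, 3, 4, 5):
        quorum = n // 2 + 1
        print(f"{n} voters -> quorum {quorum}, tolerates {tolerated_failures(n)} failure(s)")
    # 2 voters -> quorum 2, tolerates 0 failure(s)
    # 3 voters -> quorum 2, tolerates 1 failure(s)
    # 4 voters -> quorum 3, tolerates 1 failure(s)
    # 5 voters -> quorum 3, tolerates 2 failure(s)

Note that a witness or arbiter counts as a full voter, which is why two data nodes plus a witness behave like a 3-node cluster with respect to quorum.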

Can a node run in a different data center than the other nodes?

Using two data centers is possible, but the reliability of the network link between them is critical. Whether two data centers are sufficient depends on the level of high availability (HA) you need. HA refers to the system’s ability to remain operational with minimal downtime, even when components fail.

Most clustered systems rely on a quorum (a majority of nodes) to decide which part of the cluster is allowed to continue operating. This prevents split-brain situations, where two partitions both believe they are the primary.

Example with two data centers and three nodes:

  • If you deploy two nodes in Data Center A and one node in Data Center B (an odd total of three nodes), the cluster requires a majority of two to stay available.

  • If the link between A and B fails, the two nodes in A still have a majority (two out of three) and continue to serve requests. The single node in B loses the majority and will stop accepting connections to protect data integrity.

  • If Data Center A is completely lost (for example, due to a fire), only the single node in B remains. Because it no longer has a majority, it will not accept connections until you manually promote or bootstrap it to form a new, healthy cluster. This manual step is required to avoid data divergence.
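
The same counting, written out for this 2+1 layout (a toy model; the site names and helper function are only for illustration):

    SITES = {"A": 2, "B": 1}           # nodes per data center
    TOTAL_VOTES = sum(SITES.values())  # 3

    def keeps_quorum(reachable_sites):
        """True if the nodes in the listed sites together hold a strict majority."""
        return sum(SITES[s] for s in reachable_sites) > TOTAL_VOTES / 2

    keeps_quorum(["A", "B"])   # True  -> normal operation while the link is healthy
    keeps_quorum(["A"])        # True  -> the two nodes in A keep serving (2 of 3)
    keeps_quorum(["B"])        # False -> the node in B stops accepting connections,
                               #          whether the link failed or site A was lost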

Key implications:

  • Two data centers with three nodes can tolerate a link failure but cannot automatically fail over if the data center hosting the majority is lost.

  • To achieve higher levels of HA with automatic failover during a full data center loss, you should use an odd number of nodes distributed across an odd number of locations. A common recommendation is at least five nodes across three data centers (for example, 2-2-1). With five nodes, the quorum is three; losing any single site still leaves a majority and allows the cluster to continue automatically.
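
To see why the 2-2-1 layout survives the loss of any single site, the votes can be counted per site (again an illustrative sketch, not CipherMail code):

    SITES = {"A": 2, "B": 2, "C": 1}   # five voters in total, quorum is 3
    TOTAL = sum(SITES.values())

    for lost_site, lost_votes in SITES.items():
        remaining = TOTAL - lost_votes
        status = "still quorate" if remaining > TOTAL / 2 else "no quorum"
        print(f"lose {lost_site}: {remaining} of {TOTAL} votes remain -> {status}")
    # lose A: 3 of 5 votes remain -> still quorate
    # lose B: 3 of 5 votes remain -> still quorate
    # lose C: 4 of 5 votes remain -> still quorate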

Practical guidance:

  • Always use an odd number of nodes in total to maintain a clear quorum.

  • Prefer three locations when you need automatic failover across data centers. If a third full data center is not feasible, consider a lightweight witness or quorum service in a third site or cloud region to break ties.

  • Validate latency and reliability of inter-site links; unstable connectivity can cause frequent failovers or service interruptions.

  • Document and rehearse the manual bootstrap procedure for two-site deployments so operators can restore service safely if the majority site is lost.

If two nodes are running in data center A and one node in data center B, and the connection between the data centers fails, can I keep CipherMail functional in both data centers?

If the connection between the data centers fails, the two nodes in data center A continue to operate because they hold the majority (2 out of 3). The single node in data center B stops accepting new connections.

You can force-bootstrap the node in data center B, but this creates two independent clusters: one in data center A and one in data center B. These clusters are not synchronized (a split-brain situation). Any changes made in either data center will diverge, and when connectivity is later restored, some changes may be overwritten, depending on which node is used to re-establish the cluster.

Only use forced bootstrapping if the outage is expected to be long or if data center recovery will be significantly delayed (for example, if data center A has been destroyed).
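
The risk can be illustrated with a deliberately simplified model (this is not how CipherMail replicates data; it only shows why changes diverge and can be lost):

    # State as replicated to both sides just before the split.
    state_a = {"setting_x": "v1"}
    state_b = dict(state_a)

    # Both sides accept changes while disconnected (B was force-bootstrapped).
    state_a["setting_x"] = "changed in A"
    state_b["setting_y"] = "added in B"

    # Connectivity returns and the cluster is re-established from A:
    # B is resynchronized from A, so the change made in B is lost.
    state_b = dict(state_a)
    print(state_b)   # {'setting_x': 'changed in A'} -- 'setting_y' is gone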