Improve Membership Change API with Trait Abstraction and Granular Node Functions #1395

@drmingdrmer

Background

The current batch membership change implementation has several limitations and semantic issues, identified in PR #1351.

Key Problems to Address

1. Retain Parameter Conflicts in Concurrent Changes

When multiple membership changes happen concurrently, their retain parameters can "pollute" each other, leading to unpredictable behavior where nodes may be unexpectedly retained or removed based on the order of operations.

Example:

Starting cluster membership: [{n1,n2,n3}]
Task 1: RemoveVoter(n1), retain = true
Task 2: RemoveVoter(n2), retain = false

Interleaving:
1. First Task1: [{n1,n2,n3}] → [{n1,n2,n3}, {n2,n3}]
2. Then Task2: [{n1,n2,n3}, {n2,n3}] → [{n2,n3}, {n3}]
   - At this point, n1 is removed as a learner because Task2 has retain = false
3. Then either Task1 or Task2: [{n2,n3}, {n3}] → [{n3}]
   - If Task1 runs first: n2 is kept as a learner
   - If Task2 runs first: n2 is removed as a learner

Result: The retain behavior depends on execution order, not the original intent.
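
The interleaving above can be replayed with a toy model. This is only a sketch under simplifying assumptions: `ToyMembership` and `step` are hypothetical names, the transition rule (flatten if the last config already equals the target, otherwise form `[last, target]`) is a reduced stand-in for the real coherent-config logic, and `nodes` simply tracks every known node id.

```rust
use std::collections::BTreeSet;

type NodeId = u64;

/// Toy membership: voter configs plus the set of all known nodes (voters and learners).
#[derive(Debug, Clone)]
struct ToyMembership {
    configs: Vec<BTreeSet<NodeId>>,
    nodes: BTreeSet<NodeId>,
}

impl ToyMembership {
    /// Advance one step toward `target`.
    ///
    /// Hypothetical simplification: if the last config already equals `target`,
    /// flatten to it; otherwise build the joint config `[last, target]`.
    /// With `retain == false`, nodes that are no longer voters are dropped.
    fn step(&self, target: &BTreeSet<NodeId>, retain: bool) -> ToyMembership {
        let last = self.configs.last().expect("at least one config").clone();
        let configs = if &last == target {
            vec![last]
        } else {
            vec![last, target.clone()]
        };

        let voters: BTreeSet<NodeId> = configs.iter().flatten().copied().collect();
        let nodes = if retain {
            self.nodes.clone()
        } else {
            self.nodes.intersection(&voters).copied().collect()
        };
        ToyMembership { configs, nodes }
    }
}

fn set(ids: &[NodeId]) -> BTreeSet<NodeId> {
    ids.iter().copied().collect()
}

/// Replays the interleaving; `last_retain` is the flag of whichever task
/// happens to perform the final flattening step.
fn run(last_retain: bool) -> ToyMembership {
    let start = ToyMembership { configs: vec![set(&[1, 2, 3])], nodes: set(&[1, 2, 3]) };
    let m1 = start.step(&set(&[2, 3]), true); // Task 1: RemoveVoter(n1), retain = true
    let m2 = m1.step(&set(&[3]), false); // Task 2: RemoveVoter(n2), retain = false
    m2.step(&set(&[3]), last_retain) // final flatten by either task
}

fn main() {
    println!("Task1 flattens: nodes = {:?}", run(true).nodes);
    println!("Task2 flattens: nodes = {:?}", run(false).nodes);
}
```

In this model n1 is gone in both runs even though Task 1 asked to retain it, and whether n2 survives as a learner depends only on which task performs the last step.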

2. Mixing Voter and Node Changes

The current implementation does not properly handle removing a voter and its corresponding node in the same batch. This can violate the constraint checked by ensure_voter_nodes.

Example:

Batch operation (with retain = true): RemoveVoters(n1), RemoveNode(n1)
Starting config: [{n3,n1}, {n2,n1}]

What happens:
1. RemoveVoters(n1) requires a joint config transition
2. RemoveNode(n1) tries to execute immediately
3. ERROR: violates ensure_voter_nodes constraint because n1 is still a voter in the config

The issue: Nodes can only be removed after the corresponding voter has been removed from the config.
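
A minimal sketch of the constraint, assuming it amounts to "every voter id in every config has an entry in the node map". The `ToyMembership` type and this version of `ensure_voter_nodes` are illustrative stand-ins, not the actual openraft code.

```rust
use std::collections::{BTreeMap, BTreeSet};

type NodeId = u64;

/// Toy version of the current structure: voter configs plus a node map.
struct ToyMembership {
    configs: Vec<BTreeSet<NodeId>>,
    nodes: BTreeMap<NodeId, String>,
}

impl ToyMembership {
    /// Hypothetical stand-in for `ensure_voter_nodes`: every voter in every
    /// config must have an entry in `nodes`. Returns the first missing id.
    fn ensure_voter_nodes(&self) -> Result<(), NodeId> {
        for id in self.configs.iter().flatten() {
            if !self.nodes.contains_key(id) {
                return Err(*id);
            }
        }
        Ok(())
    }
}

/// The starting config from the example: [{n3,n1}, {n2,n1}].
fn sample() -> ToyMembership {
    let configs = vec![BTreeSet::from([3, 1]), BTreeSet::from([2, 1])];
    let nodes = (1..=3).map(|id| (id, format!("addr-{id}"))).collect();
    ToyMembership { configs, nodes }
}

fn main() {
    let mut m = sample();
    assert!(m.ensure_voter_nodes().is_ok());

    // RemoveNode(n1) applied immediately, while n1 is still a voter.
    m.nodes.remove(&1);
    assert_eq!(m.ensure_voter_nodes(), Err(1));

    // Only once n1 has left every voter config is the removal valid.
    for c in m.configs.iter_mut() {
        c.remove(&1);
    }
    assert!(m.ensure_voter_nodes().is_ok());
}
```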

3. Confusing Semantics

Learner changes happen immediately while voter changes go through joint configurations, making batch operations with mixed change types confusing and error-prone.

Example:

Batch operation: AddVoter(n4), AddNode(n5)
Starting config: [{n1,n2,n3}]

What happens:
1. AddVoter(n4): Creates joint config [{n1,n2,n3}, {n1,n2,n3,n4}] (staged change)
2. AddNode(n5): Immediately adds n5 to nodes map (immediate change)

Result: 
- n4 is not available as a node until the joint config is flattened
- n5 is immediately available as a learner
- Mixed timing makes it hard to reason about the state

Proposed Solution

Implement explicit learner tracking in the membership structure:

Current structure:

struct Membership {
    configs: Vec<BTreeSet<VoterId>>,
    nodes: BTreeMap<NodeId, Node>, 
}

Proposed structure:

/// Defines the role capabilities of a node in the Raft cluster.
///
/// In standard Raft:
/// - A voter can start elections and vote in elections, so it has `Elect` and `Vote`
/// - A learner only receives logs from the leader and cannot vote or start elections, thus it has `AcceptLog`
///
/// This enum breaks down these capabilities into granular roles for more flexible node configurations.
/// 
/// | Functions              | Vote Storage | Log Storage | Description                                                                                                                     |
/// | :---                   | :---         | :---        | :---                                                                                                                            |
/// | `Elect`                | No           | No          | Can become a leader, without local storage                                                                                      |
/// | `Vote`                 | Yes          | No          | Participate in election voting, counts toward **read-quorum**                                                                   |
/// | `LearnLog`             | No           | Yes         | Receive log replication. A `Learner` in std Raft. Saving `Vote` is not required, but reduces conflicts when becoming an `Elect` |
/// | `AcceptLog`            | Yes          | Yes         | Receive log replication, counts toward **write-quorum** for commits. `AcceptLog` implies `LearnLog`                             |
/// | `Elect|Vote|AcceptLog` | Yes          | Yes         | A `Voter` in std Raft                                                                                                           |
///
/// Note:
/// - `AcceptLog` always requires `Vote` storage, to prevent accepting logs from an older leader.
/// - As of 2025-07-23, read/write quorum separation is not yet supported.
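// `Ord` is required because `Member` stores functions in a `BTreeSet<NodeFunction>`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum NodeFunction {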
    /// Node can initiate elections and become a leader.
    ///
    /// All nodes, including learners, can potentially be elected as leader.
    /// 
    /// In special cases, a node with only `Elect` can elect itself as leader without writing logs to its local storage.
    Elect,
    
    /// Participates in election voting and counts toward the **read-quorum**, without local log storage.
    Vote,
    
    /// Receive log replication, counts toward **write-quorum** for commits
    AcceptLog,
}

/// Represents a cluster member with specific functions and node information.
///
/// A member combines the functional capabilities of a node (what it can do)
/// with the actual node data (how to reach it, metadata, etc.).
struct Member<C: RaftTypeConfig> {
    /// Set of functions this member can perform in the cluster.
    ///
    /// This determines whether the member can vote, accept logs, or initiate elections.
    /// Multiple functions can be combined (e.g., a voter has both Vote and AcceptLog).
    functions: BTreeSet<NodeFunction>,
    
    /// The actual node information for this member.
    ///
    /// Contains network address, metadata, and other node-specific data
    /// as defined by the application's RaftTypeConfig.
    node: C::Node,
}

/// Represents a single configuration step in the membership.
///
/// In standard Raft, this would be the set of voters at a given point in time.
struct Config<C: RaftTypeConfig> {
    /// Map of all members participating in this configuration step.
    ///
    /// The key is the NodeId, and the value contains both the member's
    /// functional capabilities and connection information.
    members: BTreeMap<NodeId, Member<C>>,
}

/// Defines the complete membership configuration for a Raft cluster.
///
/// The membership can contain one or more configuration steps:
/// - Single config: Standard membership with one set of members
/// - Joint config: Transition state with multiple configs during membership changes
///
/// During a joint configuration, operations require quorum from ALL configs,
/// making membership changes safer but potentially slower.
struct EnhancedMembership<C: RaftTypeConfig> {
    /// One or more config entries.
    ///
    /// With more than one `Config`, this is a **joint config**:
    /// a quorum of a joint config is the union of a quorum from each `Config`.
    configs: Vec<Config<C>>,
}
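
To illustrate how per-function quorums might fall out of this structure, here is a rough sketch. It assumes a simple majority among the members carrying a given function, required in every config of a joint config. The `is_quorum` and `with_function` helpers are hypothetical, the `node` payload is dropped to keep it short, and separate read/write quorums are shown only as a possibility, since they are not supported yet.

```rust
use std::collections::{BTreeMap, BTreeSet};

type NodeId = u64;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum NodeFunction {
    Elect,
    Vote,
    AcceptLog,
}
use NodeFunction::{AcceptLog, Elect, Vote};

/// Simplified `Member`: the `node` payload is omitted for brevity.
struct Member {
    functions: BTreeSet<NodeFunction>,
}

struct Config {
    members: BTreeMap<NodeId, Member>,
}

struct EnhancedMembership {
    configs: Vec<Config>,
}

impl Config {
    /// Ids of the members that carry function `f`.
    fn with_function(&self, f: NodeFunction) -> BTreeSet<NodeId> {
        self.members
            .iter()
            .filter(|(_, m)| m.functions.contains(&f))
            .map(|(id, _)| *id)
            .collect()
    }
}

impl EnhancedMembership {
    /// Assumption: a majority of the members with `f`, in every config.
    /// A config with no such member can never reach quorum in this sketch.
    fn is_quorum(&self, f: NodeFunction, granted: &BTreeSet<NodeId>) -> bool {
        self.configs.iter().all(|c| {
            let eligible = c.with_function(f);
            eligible.intersection(granted).count() > eligible.len() / 2
        })
    }
}

/// A std Raft voter.
fn voter() -> Member {
    Member { functions: BTreeSet::from([Elect, Vote, AcceptLog]) }
}

/// A member that only votes and stores no log.
fn vote_only() -> Member {
    Member { functions: BTreeSet::from([Vote]) }
}

/// Joint config: n1..n3 are voters; n4 joins the second config as vote-only.
fn sample() -> EnhancedMembership {
    let c0 = Config { members: BTreeMap::from([(1, voter()), (2, voter()), (3, voter())]) };
    let c1 = Config { members: BTreeMap::from([(2, voter()), (3, voter()), (4, vote_only())]) };
    EnhancedMembership { configs: vec![c0, c1] }
}

fn main() {
    let m = sample();
    let granted = BTreeSet::from([1, 2, 4]);
    assert!(m.is_quorum(Vote, &granted)); // n4 counts as a vote in the second config
    assert!(!m.is_quorum(AcceptLog, &granted)); // but n4 does not accept logs
}
```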

Benefits

  • Better State Representation: Explicitly track learners for each config step
  • Improved Batch Operations: Enable proper handling of mixed voter/learner changes
  • Clearer Semantics: Remove ambiguity around immediate vs. staged changes
  • Enhanced Flexibility: Support operations that weren't possible before

Implementation Considerations

  • Must be implemented as a backward-compatible change to avoid breaking existing applications
  • Consider renaming AddNodes/RemoveNodes to AddLearners/RemoveLearners for clarity
  • Review semantics of SetNodes and ReplaceAllNodes operations

TODO:

  • Design RaftMembership trait and update RaftTypeConfig

    • Define trait with methods: quorum calculation, member lookup, voter/learner separation
    • Add type Membership: RaftMembership to RaftTypeConfig
    • Implement trait for existing Membership struct
    • Update RaftCore to use C::Membership
  • Create EnhancedMembership with new semantics

    • Implement granular node functions (Elect, Vote, AcceptLog)
    • Fix retain parameter conflicts and batch operation issues
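
As a starting point for the first TODO item, the trait could look roughly like the sketch below. Every method name here is a placeholder for one of the three groups listed above (quorum calculation, member lookup, voter/learner separation). The `Membership` struct is a simplified stand-in for the existing one, and `describe` stands in for code such as `RaftCore` that would be written against `C::Membership`.

```rust
use std::collections::{BTreeMap, BTreeSet};

type NodeId = u64;

/// Hypothetical trait; method names are placeholders, not a settled API.
trait RaftMembership {
    type Node;

    /// Quorum calculation: does `granted` form a quorum in every config?
    fn is_quorum(&self, granted: &BTreeSet<NodeId>) -> bool;
    /// Member lookup.
    fn get_node(&self, id: &NodeId) -> Option<&Self::Node>;
    /// Voter/learner separation.
    fn voter_ids(&self) -> BTreeSet<NodeId>;
    fn learner_ids(&self) -> BTreeSet<NodeId>;
}

/// Simplified stand-in for the existing `Membership` struct.
struct Membership {
    configs: Vec<BTreeSet<NodeId>>,
    nodes: BTreeMap<NodeId, String>,
}

impl RaftMembership for Membership {
    type Node = String;

    fn is_quorum(&self, granted: &BTreeSet<NodeId>) -> bool {
        // Majority in each config of a (possibly joint) membership.
        self.configs.iter().all(|c| c.intersection(granted).count() > c.len() / 2)
    }

    fn get_node(&self, id: &NodeId) -> Option<&String> {
        self.nodes.get(id)
    }

    fn voter_ids(&self) -> BTreeSet<NodeId> {
        self.configs.iter().flatten().copied().collect()
    }

    fn learner_ids(&self) -> BTreeSet<NodeId> {
        // Learners are the known nodes that appear in no voter config.
        let voters = self.voter_ids();
        self.nodes.keys().filter(|id| !voters.contains(*id)).copied().collect()
    }
}

/// Generic over the trait, so it works with any membership implementation.
fn describe<M: RaftMembership>(m: &M) -> String {
    format!("voters={:?} learners={:?}", m.voter_ids(), m.learner_ids())
}

/// Voters n1..n3 plus a learner n4.
fn sample() -> Membership {
    let configs = vec![BTreeSet::from([1, 2, 3])];
    let nodes = (1..=4).map(|id| (id, format!("addr-{id}"))).collect();
    Membership { configs, nodes }
}

fn main() {
    let m = sample();
    println!("{}", describe(&m));
    println!("quorum with n1,n2: {}", m.is_quorum(&BTreeSet::from([1, 2])));
}
```

An `EnhancedMembership` implementation would then provide the same methods on top of per-member functions, leaving callers unchanged.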
