LogCabin
An implementation of the Raft consensus algorithm.
#include <RaftConsensus.h>
Classes | |
struct | Entry |
This is returned by getNextEntry(). | |
Public Types | |
enum | ClientResult { SUCCESS, FAIL, RETRY, NOT_LEADER } |
typedef RaftConsensusInternal::Invariants | Invariants |
typedef RaftConsensusInternal::Server | Server |
typedef RaftConsensusInternal::LocalServer | LocalServer |
typedef RaftConsensusInternal::Peer | Peer |
typedef RaftConsensusInternal::Configuration | Configuration |
typedef RaftConsensusInternal::ConfigurationManager | ConfigurationManager |
typedef RaftConsensusInternal::ClusterClock | ClusterClock |
typedef RaftConsensusInternal::Mutex | Mutex |
typedef RaftConsensusInternal::Clock | Clock |
typedef RaftConsensusInternal::TimePoint | TimePoint |
Public Member Functions | |
RaftConsensus (Globals &globals) | |
Constructor. | |
~RaftConsensus () | |
Destructor. | |
void | init () |
Initialize. Must be called before any other method. | |
void | exit () |
Signal the consensus module to exit (shut down threads, etc). | |
void | bootstrapConfiguration () |
Initialize the log with a configuration consisting of just this server. | |
ClientResult | getConfiguration (Protocol::Raft::SimpleConfiguration &configuration, uint64_t &id) const |
Get the current leader's active, committed, simple cluster configuration. | |
std::pair< ClientResult, uint64_t > | getLastCommitIndex () const |
Return the most recent entry ID that has been externalized by the replicated log. | |
std::string | getLeaderHint () const |
Return the network address for a recent leader, if known, or empty string otherwise. | |
Entry | getNextEntry (uint64_t lastIndex) const |
This returns the entry following lastIndex in the replicated log. | |
SnapshotStats::SnapshotStats | getSnapshotStats () const |
Return statistics that may be useful in deciding when to snapshot. | |
void | handleAppendEntries (const Protocol::Raft::AppendEntries::Request &request, Protocol::Raft::AppendEntries::Response &response) |
Process an AppendEntries RPC from another server. | |
void | handleInstallSnapshot (const Protocol::Raft::InstallSnapshot::Request &request, Protocol::Raft::InstallSnapshot::Response &response) |
Process an InstallSnapshot RPC from another server. | |
void | handleRequestVote (const Protocol::Raft::RequestVote::Request &request, Protocol::Raft::RequestVote::Response &response) |
Process a RequestVote RPC from another server. | |
std::pair< ClientResult, uint64_t > | replicate (const Core::Buffer &operation) |
Submit an operation to the replicated log. | |
ClientResult | setConfiguration (const Protocol::Client::SetConfiguration::Request &request, Protocol::Client::SetConfiguration::Response &response) |
Change the cluster's configuration. | |
void | setSupportedStateMachineVersions (uint16_t minSupported, uint16_t maxSupported) |
Register which versions of client commands/behavior the local state machine supports. | |
std::unique_ptr < Storage::SnapshotFile::Writer > | beginSnapshot (uint64_t lastIncludedIndex) |
Start taking a snapshot. | |
void | snapshotDone (uint64_t lastIncludedIndex, std::unique_ptr< Storage::SnapshotFile::Writer > writer) |
Complete taking a snapshot for the log entries in range [1, lastIncludedIndex]. | |
void | updateServerStats (Protocol::ServerStats &serverStats) const |
Add information about the consensus state to the given structure. | |
Public Attributes | |
uint64_t | serverId |
This server's unique ID. | |
std::string | serverAddresses |
The addresses that this server is listening on. | |
Private Types | |
enum | State { FOLLOWER, CANDIDATE, LEADER } |
See state. | |
Private Member Functions | |
void | leaderDiskThreadMain () |
Flush log entries to stable storage in the background on leaders. | |
void | timerThreadMain () |
Start new elections when it's time to do so. | |
void | peerThreadMain (std::shared_ptr< Peer > peer) |
Initiate RPCs to a specific server as necessary. | |
void | stateMachineUpdaterThreadMain () |
Append advance state machine version entries to the log as leader once all servers can support a new state machine version. | |
void | stepDownThreadMain () |
Return to follower state when, as leader, this server is not able to communicate with a quorum. | |
void | advanceCommitIndex () |
Move forward commitIndex if possible. | |
void | append (const std::vector< const Storage::Log::Entry * > &entries) |
Append entries to the log, set the configuration if this contains a configuration entry, and notify stateChanged. | |
void | appendEntries (std::unique_lock< Mutex > &lockGuard, Peer &peer) |
Send an AppendEntries RPC to the server (either a heartbeat or containing an entry to replicate). | |
void | installSnapshot (std::unique_lock< Mutex > &lockGuard, Peer &peer) |
Send an InstallSnapshot RPC to the server (containing part of a snapshot file to replicate). | |
void | becomeLeader () |
Transition to being a leader. | |
void | discardUnneededEntries () |
Remove the prefix of the log that is redundant with this server's snapshot. | |
uint64_t | getLastLogTerm () const |
Return the term corresponding to log->getLastLogIndex(). | |
void | interruptAll () |
Notify the stateChanged condition variable and cancel all current RPCs. | |
uint64_t | packEntries (uint64_t nextIndex, Protocol::Raft::AppendEntries::Request &request) const |
Helper for appendEntries() to put the right number of entries into the request. | |
void | readSnapshot () |
Try to read the latest good snapshot from disk. | |
std::pair< ClientResult, uint64_t > | replicateEntry (Storage::Log::Entry &entry, std::unique_lock< Mutex > &lockGuard) |
Append an entry to the log and wait for it to be committed. | |
void | requestVote (std::unique_lock< Mutex > &lockGuard, Peer &peer) |
Send a RequestVote RPC to the server. | |
void | printElectionState () const |
Dumps serverId, currentTerm, state, leaderId, and votedFor to the debug log. | |
void | setElectionTimer () |
Set the timer to start a new election and notify stateChanged. | |
void | startNewElection () |
Transitions to being a candidate from being a follower or candidate. | |
void | stepDown (uint64_t newTerm) |
Transition to being a follower. | |
void | updateLogMetadata () |
Persist critical state, such as the term and the vote, to stable storage. | |
bool | upToDateLeader (std::unique_lock< Mutex > &lockGuard) const |
Return true if every entry that might have already been marked committed on any leader is marked committed on this leader by the time this call returns. | |
Private Attributes | |
const std::chrono::nanoseconds | ELECTION_TIMEOUT |
A follower waits for about this much inactivity before becoming a candidate and starting a new election. | |
const std::chrono::nanoseconds | HEARTBEAT_PERIOD |
A leader sends RPCs at least this often, even if there is no data to send. | |
uint64_t | MAX_LOG_ENTRIES_PER_REQUEST |
A leader will pack at most this many entries into an AppendEntries request message. | |
const std::chrono::nanoseconds | RPC_FAILURE_BACKOFF |
A candidate or leader waits this long after an RPC fails before sending another one, so as to not overwhelm the network with retries. | |
const std::chrono::nanoseconds | STATE_MACHINE_UPDATER_BACKOFF |
How long the state machine updater thread should sleep if: | |
uint64_t | SOFT_RPC_SIZE_LIMIT |
Prefer to keep RPC requests under this size. | |
Globals & | globals |
The LogCabin daemon's top-level objects. | |
Storage::Layout | storageLayout |
Where the files for the log and snapshots are stored. | |
Client::SessionManager | sessionManager |
Used to create new sessions. | |
Mutex | mutex |
This class behaves mostly like a monitor. | |
Core::ConditionVariable | stateChanged |
Notified when basically anything changes. | |
bool | exiting |
Set to true when this class is about to be destroyed. | |
uint32_t | numPeerThreads |
The number of Peer::thread threads that are still using this RaftConsensus object. | |
std::unique_ptr< Storage::Log > | log |
Provides all storage for this server. | |
bool | logSyncQueued |
Flag to indicate that leaderDiskThreadMain should flush recent log writes to stable storage. | |
std::atomic< bool > | leaderDiskThreadWorking |
Used for stepDown() to wait on leaderDiskThread without releasing mutex. | |
std::unique_ptr< Configuration > | configuration |
Defines the servers that are part of the cluster. | |
std::unique_ptr < ConfigurationManager > | configurationManager |
Ensures that 'configuration' reflects the latest state of the log and snapshot. | |
uint64_t | currentTerm |
The latest term this server has seen. | |
State | state |
The server's current role in the cluster (follower, candidate, or leader). | |
uint64_t | lastSnapshotIndex |
The latest good snapshot covers entries 1 through 'lastSnapshotIndex' (inclusive). | |
uint64_t | lastSnapshotTerm |
The term of the last entry covered by the latest good snapshot, or 0 if we have no snapshot. | |
uint64_t | lastSnapshotClusterTime |
The cluster time of the last entry covered by the latest good snapshot, or 0 if we have no snapshot. | |
uint64_t | lastSnapshotBytes |
The size of the latest good snapshot in bytes, or 0 if we have no snapshot. | |
std::unique_ptr < Storage::SnapshotFile::Reader > | snapshotReader |
If not NULL, this is a Storage::SnapshotFile::Reader that covers up through lastSnapshotIndex. | |
std::unique_ptr < Storage::SnapshotFile::Writer > | snapshotWriter |
This is used in handleInstallSnapshot when receiving a snapshot from the current leader. | |
uint64_t | commitIndex |
The largest entry ID for which a quorum is known to have stored the same entry as this server has. | |
uint64_t | leaderId |
The server ID of the leader for this term. | |
uint64_t | votedFor |
The server ID that this server voted for during this term's election, if any. | |
uint64_t | currentEpoch |
A logical clock used to confirm leadership and connectivity. | |
ClusterClock | clusterClock |
Tracks the passage of "cluster time". | |
TimePoint | startElectionAt |
The earliest time at which timerThread should begin a new election with startNewElection(). | |
TimePoint | withholdVotesUntil |
The earliest time at which RequestVote messages should be processed. | |
uint64_t | numEntriesTruncated |
The total number of entries ever truncated from the end of the log. | |
std::thread | leaderDiskThread |
The thread that executes leaderDiskThreadMain() to flush log entries to stable storage in the background on leaders. | |
std::thread | timerThread |
The thread that executes timerThreadMain() to begin new elections after periods of inactivity. | |
std::thread | stateMachineUpdaterThread |
The thread that executes stateMachineUpdaterThreadMain() to append advance state machine version entries to the log on leaders. | |
std::thread | stepDownThread |
The thread that executes stepDownThreadMain() to return to the follower state if the leader becomes disconnected from a quorum of servers. | |
Invariants | invariants |
Friends | |
class | RaftConsensusInternal::LocalServer |
class | RaftConsensusInternal::Peer |
class | RaftConsensusInternal::Invariants |
std::ostream & | operator<< (std::ostream &os, const RaftConsensus &raft) |
Print out the contents of this class for debugging purposes. | |
std::ostream & | operator<< (std::ostream &os, ClientResult clientResult) |
Print out a ClientResult for debugging purposes. | |
std::ostream & | operator<< (std::ostream &os, State state) |
Print out a State for debugging purposes. |
An implementation of the Raft consensus algorithm.
The algorithm is described at https://raftconsensus.github.io . In brief, Raft divides time into terms and elects a leader at the beginning of each term. This election mechanism guarantees that the emerging leader has at least all committed log entries. Once a candidate has received votes from a quorum, it replicates its own log entries in order to the followers. The leader is the only machine that serves client requests.
Definition at line 883 of file RaftConsensus.h.
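typedef RaftConsensusInternal::Invariants LogCabin::Server::RaftConsensus::Invariants |
Definition at line 885 of file RaftConsensus.h.
typedef RaftConsensusInternal::Server LogCabin::Server::RaftConsensus::Server |
Definition at line 886 of file RaftConsensus.h.
typedef RaftConsensusInternal::LocalServer LogCabin::Server::RaftConsensus::LocalServer |
Definition at line 887 of file RaftConsensus.h.
typedef RaftConsensusInternal::Peer LogCabin::Server::RaftConsensus::Peer |
Definition at line 888 of file RaftConsensus.h.
typedef RaftConsensusInternal::Configuration LogCabin::Server::RaftConsensus::Configuration |
Definition at line 889 of file RaftConsensus.h.
typedef RaftConsensusInternal::ConfigurationManager LogCabin::Server::RaftConsensus::ConfigurationManager |
Definition at line 890 of file RaftConsensus.h.
typedef RaftConsensusInternal::ClusterClock LogCabin::Server::RaftConsensus::ClusterClock |
Definition at line 891 of file RaftConsensus.h.
typedef RaftConsensusInternal::Mutex LogCabin::Server::RaftConsensus::Mutex |
Definition at line 892 of file RaftConsensus.h.
typedef RaftConsensusInternal::Clock LogCabin::Server::RaftConsensus::Clock |
Definition at line 893 of file RaftConsensus.h.
typedef RaftConsensusInternal::TimePoint LogCabin::Server::RaftConsensus::TimePoint |
Definition at line 894 of file RaftConsensus.h.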
enum LogCabin::Server::RaftConsensus::ClientResult |
SUCCESS |
Request completed successfully. |
FAIL |
Returned by setConfiguration() if the configuration could not be set because the previous configuration was unsuitable or because the new servers could not be caught up. |
RETRY |
Returned by getConfiguration() if the configuration is not stable or is not committed. The client should wait and retry later. |
NOT_LEADER |
Cannot process the request because this server is not leader or temporarily lost its leadership. |
Definition at line 964 of file RaftConsensus.h.
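The four ClientResult values suggest a natural branching pattern on the caller's side. The following is a minimal sketch of that pattern, not LogCabin's own client code: the `Action` enum and the `nextAction()` helper are hypothetical names introduced for illustration, and the enum is redefined locally so that the block compiles on its own.

```cpp
#include <cassert>
#include <string>

// Mirrors the ClientResult values documented above; redefined here so the
// sketch is self-contained.
enum class ClientResult { SUCCESS, FAIL, RETRY, NOT_LEADER };

// Hypothetical: what a caller might do next. Not part of LogCabin.
enum class Action {
    DONE,               // request completed
    GIVE_UP,            // report the failure to the user
    WAIT_AND_RETRY,     // try the same server again later
    REDIRECT_TO_LEADER, // resend to the hinted leader address
    FIND_LEADER         // no hint available: probe other servers
};

// Hypothetical helper: choose the next step from a result and a leader hint
// such as the string returned by getLeaderHint() (empty if unknown).
Action nextAction(ClientResult result, const std::string& leaderHint)
{
    switch (result) {
        case ClientResult::SUCCESS:
            return Action::DONE;
        case ClientResult::FAIL:
            return Action::GIVE_UP;
        case ClientResult::RETRY:
            return Action::WAIT_AND_RETRY;
        case ClientResult::NOT_LEADER:
            return leaderHint.empty() ? Action::FIND_LEADER
                                      : Action::REDIRECT_TO_LEADER;
    }
    assert(false && "unhandled ClientResult value");
    return Action::GIVE_UP;
}
```

Keeping the decision in one function makes it easy to check that every enumerator is handled when a new result value is added.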
enum LogCabin::Server::RaftConsensus::State [private] |
See state.
FOLLOWER |
A follower does not initiate RPCs. It becomes a candidate with startNewElection() when a timeout elapses without hearing from a candidate/leader. This is the initial state for servers when they start up. |
CANDIDATE |
A candidate sends RequestVote RPCs in an attempt to become a leader. It steps down to be a follower if it discovers a current leader, and it becomes leader if it collects votes from a quorum. |
LEADER |
A leader sends AppendEntries RPCs to replicate its log onto followers. It also sends heartbeats periodically during periods of inactivity to delay its followers from becoming candidates. It steps down to be a follower if it discovers a server with a higher term, if it can't communicate with a quorum, or if it is not part of the latest committed configuration. |
Definition at line 1169 of file RaftConsensus.h.
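The transitions between the three states described above can be sketched as a small state machine. This is an illustrative model only, not the RaftConsensus implementation: the `Role` struct and the `on...` functions are hypothetical, and the sketch assumes (as in standard Raft) that a candidate votes for itself and that a quorum is a strict majority of the cluster.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the State values documented above.
enum class State { FOLLOWER, CANDIDATE, LEADER };

// Hypothetical, stripped-down view of one server's election state.
struct Role {
    State state = State::FOLLOWER; // servers start up as followers
    uint64_t currentTerm = 0;
    uint64_t votes = 0;            // votes collected in currentTerm
};

// Smallest number of servers that forms a majority of clusterSize.
uint64_t quorumSize(uint64_t clusterSize)
{
    assert(clusterSize >= 1);
    return clusterSize / 2 + 1;
}

// The election timeout elapsed without hearing from a candidate or leader.
void onElectionTimeout(Role& role, uint64_t clusterSize)
{
    if (role.state == State::LEADER)
        return; // assumption: leaders ignore the election timer
    role.state = State::CANDIDATE;
    ++role.currentTerm;
    role.votes = 1; // assumption: a candidate votes for itself
    if (role.votes >= quorumSize(clusterSize))
        role.state = State::LEADER; // a single server is its own quorum
}

// Another server granted its vote for the current term.
void onVoteGranted(Role& role, uint64_t clusterSize)
{
    if (role.state != State::CANDIDATE)
        return;
    ++role.votes;
    if (role.votes >= quorumSize(clusterSize))
        role.state = State::LEADER;
}

// A message carrying `term` arrived; fromLeader is true if a leader sent it.
void onTermObserved(Role& role, uint64_t term, bool fromLeader)
{
    if (term > role.currentTerm) {
        // Any server steps down when it sees a higher term.
        role.currentTerm = term;
        role.state = State::FOLLOWER;
        role.votes = 0;
    } else if (term == role.currentTerm && fromLeader &&
               role.state == State::CANDIDATE) {
        // A candidate steps down when it discovers a current leader.
        role.state = State::FOLLOWER;
    }
}
```

The sketch omits the other reasons a leader steps down that are listed above (losing contact with a quorum, or being removed from the latest committed configuration).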
LogCabin::Server::RaftConsensus::RaftConsensus | ( | Globals & | globals | ) | [explicit] |
Constructor.
globals | Handle to LogCabin's top-level objects. |
Definition at line 933 of file RaftConsensus.cc.
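LogCabin::Server::RaftConsensus::~RaftConsensus | ( | ) |
Destructor.
Definition at line 1002 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::init | ( | ) |
Initialize. Must be called before any other method.
Definition at line 1032 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::exit | ( | ) |
Signal the consensus module to exit (shut down threads, etc).
Definition at line 1120 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::bootstrapConfiguration | ( | ) |
Initialize the log with a configuration consisting of just this server.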
This should be called just once the very first time the very first server in your cluster is started. PANICs if any log entries or snapshots already exist.
Definition at line 1131 of file RaftConsensus.cc.
RaftConsensus::ClientResult LogCabin::Server::RaftConsensus::getConfiguration | ( | Protocol::Raft::SimpleConfiguration & | configuration, |
uint64_t & | id | ||
) | const |
Get the current leader's active, committed, simple cluster configuration.
Definition at line 1159 of file RaftConsensus.cc.
std::pair< RaftConsensus::ClientResult, uint64_t > LogCabin::Server::RaftConsensus::getLastCommitIndex | ( | ) | const |
Return the most recent entry ID that has been externalized by the replicated log.
This is used to provide non-stale reads to the state machine.
Definition at line 1176 of file RaftConsensus.cc.
std::string LogCabin::Server::RaftConsensus::getLeaderHint | ( | ) | const |
Return the network address for a recent leader, if known, or empty string otherwise.
Definition at line 1186 of file RaftConsensus.cc.
RaftConsensus::Entry LogCabin::Server::RaftConsensus::getNextEntry | ( | uint64_t | lastIndex | ) | const |
This returns the entry following lastIndex in the replicated log.
Some entries may be used internally by the consensus module. These will have Entry.hasData set to false. They are exposed to the state machine because the state machine sometimes waits to be caught up to the latest committed entry in the replicated log; if that entry were for internal use and were hidden, it would never reach the state machine.
Core::Util::ThreadInterruptedException | Thread should exit. |
Definition at line 1193 of file RaftConsensus.cc.
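To illustrate why internal entries are still handed to the state machine, here is a minimal sketch of a consumer that advances its position on every entry but only applies those that carry data. Only the `hasData` field name comes from the text above; the `index` and `data` fields, the `Entry` layout, and the `StateMachine` type are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the Entry returned by getNextEntry().
struct Entry {
    uint64_t index;   // position in the replicated log (assumed field)
    bool hasData;     // false for entries used internally by consensus
    std::string data; // client command payload (assumed field)
};

// Hypothetical state machine that records the commands it applies.
struct StateMachine {
    uint64_t lastApplied = 0;
    std::vector<std::string> applied;

    // Consume the entry that follows lastApplied.
    void consume(const Entry& entry)
    {
        assert(entry.index == lastApplied + 1);
        if (entry.hasData)
            applied.push_back(entry.data);
        // Advance even for internal entries; otherwise a caller waiting
        // for lastApplied to reach the commit index could wait forever.
        lastApplied = entry.index;
    }
};
```

A caller that waits for `lastApplied` to reach a given index will make progress in this sketch even when the most recent committed entry is an internal one.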
SnapshotStats::SnapshotStats LogCabin::Server::RaftConsensus::getSnapshotStats | ( | ) | const |
Return statistics that may be useful in deciding when to snapshot.
Definition at line 1248 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::handleAppendEntries | ( | const Protocol::Raft::AppendEntries::Request & | request, |
Protocol::Raft::AppendEntries::Response & | response | ||
) |
Process an AppendEntries RPC from another server.
Called by RaftService.
[in] | request | The request that was received from the other server. |
[out] | response | Where the reply should be placed. |
Definition at line 1263 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::handleInstallSnapshot | ( | const Protocol::Raft::InstallSnapshot::Request & | request, |
Protocol::Raft::InstallSnapshot::Response & | response | ||
) |
Process an InstallSnapshot RPC from another server.
Called by RaftService.
[in] | request | The request that was received from the other server. |
[out] | response | Where the reply should be placed. |
Definition at line 1430 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::handleRequestVote | ( | const Protocol::Raft::RequestVote::Request & | request, |
Protocol::Raft::RequestVote::Response & | response | ||
) |
Process a RequestVote RPC from another server.
Called by RaftService.
[in] | request | The request that was received from the other server. |
[out] | response | Where the reply should be placed. |
Definition at line 1526 of file RaftConsensus.cc.
std::pair< RaftConsensus::ClientResult, uint64_t > LogCabin::Server::RaftConsensus::replicate | ( | const Core::Buffer & | operation | ) |
Submit an operation to the replicated log.
operation | If the cluster accepts this operation, then it will be added to the log and the state machine will eventually apply it. |
Definition at line 1585 of file RaftConsensus.cc.
RaftConsensus::ClientResult LogCabin::Server::RaftConsensus::setConfiguration | ( | const Protocol::Client::SetConfiguration::Request & | request, |
Protocol::Client::SetConfiguration::Response & | response | ||
) |
Change the cluster's configuration.
Returns successfully once the operation has completed and the old servers are no longer needed.
Definition at line 1595 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::setSupportedStateMachineVersions | ( | uint16_t | minSupported, |
uint16_t | maxSupported | ||
) |
Register which versions of client commands/behavior the local state machine supports.
Invoked just once on boot (though calling this multiple times is safe). This information is used to support upgrades to the running replicated state machine version, and it is transmitted to other servers as needed. See stateMachineUpdaterThreadMain.
minSupported | The smallest version the local state machine can support. |
maxSupported | The largest version the local state machine can support. |
Definition at line 1729 of file RaftConsensus.cc.
std::unique_ptr< Storage::SnapshotFile::Writer > LogCabin::Server::RaftConsensus::beginSnapshot | ( | uint64_t | lastIncludedIndex | ) |
Start taking a snapshot.
Called by the state machine when it wants to take a snapshot.
lastIncludedIndex | The snapshot will cover log entries in the range [1, lastIncludedIndex]. lastIncludedIndex must be committed (must have been previously returned by getNextEntry()). |
Definition at line 1746 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::snapshotDone | ( | uint64_t | lastIncludedIndex, |
std::unique_ptr< Storage::SnapshotFile::Writer > | writer | ||
) |
Complete taking a snapshot for the log entries in range [1, lastIncludedIndex].
Called by the state machine when it is done taking a snapshot.
lastIncludedIndex | The snapshot will cover log entries in the range [1, lastIncludedIndex]. |
writer | A writer that has not yet been saved: the consensus module may have to discard the snapshot in case it's gotten a better snapshot from another server. If this snapshot is to be saved (normal case), the consensus module will call save() on it. |
Definition at line 1814 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::updateServerStats | ( | Protocol::ServerStats & | serverStats | ) | const |
Add information about the consensus state to the given structure.
Definition at line 1865 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::leaderDiskThreadMain | ( | ) | [private] |
Flush log entries to stable storage in the background on leaders.
Once they're flushed, it tries to advance the commitIndex. This is the method that leaderDiskThread executes.
Definition at line 2025 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::timerThreadMain | ( | ) | [private] |
Start new elections when it's time to do so.
This is the method that timerThread executes.
Definition at line 2057 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::peerThreadMain | ( | std::shared_ptr< Peer > | peer | ) | [private] |
Initiate RPCs to a specific server as necessary.
One thread for each remote server calls this method (see Peer::thread).
Definition at line 2069 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::stateMachineUpdaterThreadMain | ( | ) | [private] |
Append advance state machine version entries to the log as leader once all servers can support a new state machine version.
Definition at line 1941 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::stepDownThreadMain | ( | ) | [private] |
Return to follower state when, as leader, this server is not able to communicate with a quorum.
This helps in two ways in cases where a quorum is not available to this leader but clients can still communicate with the leader. First, it returns to clients in a timely manner so that they can try to find another current leader, if one exists. Second, it frees up the resources associated with those clients' RPCs on the server. This is the method that stepDownThread executes.
Definition at line 2123 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::advanceCommitIndex | ( | ) | [private] |
Move forward commitIndex if possible.
Called only on leaders after receiving RPC responses and flushing entries to disk. If commitIndex changes, this will notify stateChanged. It will also change the configuration or step down due to a configuration change when appropriate.
commitIndex can jump by more than 1 on new leaders, since their commitIndex may be well out of date until they figure out which log entries their followers have.
Definition at line 2174 of file RaftConsensus.cc.
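The core of advancing the commit index is finding the largest log index that a quorum of servers has stored. A common way to compute it is to sort the per-server indexes and pick the one at the majority position. The sketch below shows that technique under stated assumptions: `quorumMin()` and `advancedCommitIndex()` are hypothetical helpers, the configuration is a single simple majority, and the additional Raft rule that the entry at the new index must belong to the leader's current term is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper. matchIndexes holds, for every server in the
// configuration (leader included), the highest log index known to be stored
// on that server. Returns the largest index stored on a majority.
uint64_t quorumMin(std::vector<uint64_t> matchIndexes)
{
    assert(!matchIndexes.empty());
    std::sort(matchIndexes.begin(), matchIndexes.end());
    // In ascending order, the value at position (n - 1) / 2 is matched or
    // exceeded by the servers from that position upward, a strict majority.
    return matchIndexes[(matchIndexes.size() - 1) / 2];
}

// The commit index never moves backward.
uint64_t advancedCommitIndex(uint64_t commitIndex,
                             const std::vector<uint64_t>& matchIndexes)
{
    return std::max(commitIndex, quorumMin(matchIndexes));
}
```

For example, a new leader with `commitIndex` 2 that learns its three servers hold entries up to 10, 9, and 7 would move directly to 9 in this sketch, which is the kind of jump by more than 1 described above.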
void LogCabin::Server::RaftConsensus::append | ( | const std::vector< const Storage::Log::Entry * > & | entries | ) | [private] |
Append entries to the log, set the configuration if this contains a configuration entry, and notify stateChanged.
Definition at line 2226 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::appendEntries | ( | std::unique_lock< Mutex > & | lockGuard, |
Peer & | peer | ||
) | [private] |
Send an AppendEntries RPC to the server (either a heartbeat or containing an entry to replicate).
lockGuard | Used to temporarily release the lock while invoking the RPC, so as to allow for some concurrency. |
peer | State used in communicating with the follower, building the RPC request, and processing its result. |
Definition at line 2249 of file RaftConsensus.cc.
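The lockGuard parameter reflects a common monitor idiom: drop the lock around a blocking call and reacquire it afterwards. A minimal sketch of that idiom follows; `callUnlocked()` is a hypothetical helper, not a LogCabin function, and it works with any mutex type usable with `std::unique_lock`.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical helper: run blockingCall with the lock released, then
// reacquire the lock before returning (also when the call throws).
template <typename MutexType, typename Fn>
void callUnlocked(std::unique_lock<MutexType>& lockGuard, Fn&& blockingCall)
{
    assert(lockGuard.owns_lock());
    lockGuard.unlock();
    try {
        blockingCall(); // e.g. a network RPC; other threads may run now
    } catch (...) {
        lockGuard.lock();
        throw;
    }
    lockGuard.lock();
    // Shared state may have changed while the lock was released, so the
    // caller should re-check it (for example, the term or the role)
    // before acting on the result of the call.
}
```

The important design consequence is in the final comment: anything read before the call must be treated as possibly stale once the lock is held again.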
void LogCabin::Server::RaftConsensus::installSnapshot | ( | std::unique_lock< Mutex > & | lockGuard, |
Peer & | peer | ||
) | [private] |
Send an InstallSnapshot RPC to the server (containing part of a snapshot file to replicate).
lockGuard | Used to temporarily release the lock while invoking the RPC, so as to allow for some concurrency. |
peer | State used in communicating with the follower, building the RPC request, and processing its result. |
Definition at line 2387 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::becomeLeader | ( | ) | [private] |
Transition to being a leader.
This is called when a candidate has received votes from a quorum.
Definition at line 2493 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::discardUnneededEntries | ( | ) | [private] |
Remove the prefix of the log that is redundant with this server's snapshot.
Definition at line 2531 of file RaftConsensus.cc.
uint64_t LogCabin::Server::RaftConsensus::getLastLogTerm | ( | ) | const [private] |
Return the term corresponding to log->getLastLogIndex().
This may come from the log, from the snapshot, or it may be 0.
Definition at line 2550 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::interruptAll | ( | ) | [private] |
Notify the stateChanged condition variable and cancel all current RPCs.
This should be called when stepping down, starting a new election, becoming leader, or exiting.
Definition at line 2562 of file RaftConsensus.cc.
uint64_t LogCabin::Server::RaftConsensus::packEntries | ( | uint64_t | nextIndex, |
Protocol::Raft::AppendEntries::Request & | request | ||
) | const [private] |
Helper for appendEntries() to put the right number of entries into the request.
nextIndex | First entry to send to the follower. |
request | AppendEntries request ProtoBuf in which to pack the entries. |
Definition at line 2571 of file RaftConsensus.cc.
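Choosing "the right number of entries" plausibly involves the two limits documented on this page, MAX_LOG_ENTRIES_PER_REQUEST and SOFT_RPC_SIZE_LIMIT. The sketch below shows one way such a packing loop could look. It is an assumption-laden illustration rather than the actual packEntries() logic: `countEntriesToPack()` is a hypothetical function, entry sizes are given directly in bytes, and the "soft" limit is interpreted as allowing the first entry through even when it alone exceeds the limit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper. entrySizes lists the serialized size in bytes of each
// log entry starting at nextIndex. Returns how many of them to send.
uint64_t countEntriesToPack(const std::vector<uint64_t>& entrySizes,
                            uint64_t maxEntries,
                            uint64_t softSizeLimit)
{
    uint64_t count = 0;
    uint64_t bytes = 0;
    for (uint64_t size : entrySizes) {
        if (count == maxEntries)
            break; // hard cap on the number of entries
        // Assumption: the size limit is soft, so one oversized entry is
        // still sent by itself instead of stalling replication.
        if (count > 0 && bytes + size > softSizeLimit)
            break;
        bytes += size;
        ++count;
    }
    assert(count <= maxEntries);
    return count;
}
```

With many tiny entries the entry-count cap is what bounds the request, which matches the stated purpose of MAX_LOG_ENTRIES_PER_REQUEST.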
void LogCabin::Server::RaftConsensus::readSnapshot | ( | ) | [private] |
Try to read the latest good snapshot from disk.
Loads the header of the snapshot file, which is used internally by the consensus module. The rest of the file reader is kept in snapshotReader for the state machine to process upon a future getNextEntry().
If the snapshot file on disk is no good, snapshotReader will remain NULL.
Definition at line 2635 of file RaftConsensus.cc.
std::pair< RaftConsensus::ClientResult, uint64_t > LogCabin::Server::RaftConsensus::replicateEntry | ( | Storage::Log::Entry & | entry, |
std::unique_lock< Mutex > & | lockGuard | ||
) | [private] |
Append an entry to the log and wait for it to be committed.
Definition at line 2742 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::requestVote | ( | std::unique_lock< Mutex > & | lockGuard, |
Peer & | peer | ||
) | [private] |
Send a RequestVote RPC to the server.
This is used by candidates to request a server's vote and by new leaders to retrieve information about the server's log.
lockGuard | Used to temporarily release the lock while invoking the RPC, so as to allow for some concurrency. |
peer | State used in communicating with the follower, building the RPC request, and processing its result. |
Definition at line 2762 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::printElectionState | ( | ) | const [private] |
Dumps serverId, currentTerm, state, leaderId, and votedFor to the debug log.
This is intended to be easy to grep and parse.
Definition at line 2835 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::setElectionTimer | ( | ) | [private] |
Set the timer to start a new election and notify stateChanged.
The timer is set for ELECTION_TIMEOUT plus some random jitter from now.
Definition at line 2822 of file RaftConsensus.cc.
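A randomized election deadline can be sketched with the standard `<chrono>` and `<random>` facilities. The text above only says "some random jitter"; the range used here, uniform in [0, ELECTION_TIMEOUT], is an assumption, and `nextElectionTime()` and the local `Clock` and `TimePoint` aliases are hypothetical names for this example.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// Local aliases for the sketch; not the typedefs used by RaftConsensus.
using Clock = std::chrono::steady_clock;
using TimePoint = std::chrono::time_point<Clock, std::chrono::nanoseconds>;

// Hypothetical helper: returns now + electionTimeout + jitter, where jitter
// is drawn uniformly from [0, electionTimeout] (assumed range).
template <typename Rng>
TimePoint nextElectionTime(TimePoint now,
                           std::chrono::nanoseconds electionTimeout,
                           Rng& rng)
{
    assert(electionTimeout.count() > 0);
    std::uniform_int_distribution<std::chrono::nanoseconds::rep> jitter(
        0, electionTimeout.count());
    return now + electionTimeout + std::chrono::nanoseconds(jitter(rng));
}
```

Randomizing the deadline makes it less likely that several followers time out at the same moment and split the vote.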
void LogCabin::Server::RaftConsensus::startNewElection | ( | ) | [private] |
Transitions to being a candidate from being a follower or candidate.
This is called when a timeout elapses. If the configuration is blank, it does nothing. Moreover, if this server forms a quorum (it is the only server in the configuration), this will immediately transition to leader.
Definition at line 2858 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::stepDown | ( | uint64_t | newTerm | ) | [private] |
Transition to being a follower.
This is called when we receive an RPC request with a newer term, receive an RPC response indicating our term is stale, or discover a current leader while a candidate. In this last case, newTerm will be the same as currentTerm. This will call setElectionTimer for you if no election timer is currently set.
Definition at line 2907 of file RaftConsensus.cc.
void LogCabin::Server::RaftConsensus::updateLogMetadata | ( | ) | [private] |
Persist critical state, such as the term and the vote, to stable storage.
Definition at line 2955 of file RaftConsensus.cc.
bool LogCabin::Server::RaftConsensus::upToDateLeader | ( | std::unique_lock< Mutex > & | lockGuard | ) | const [private] |
Return true if every entry that might have already been marked committed on any leader is marked committed on this leader by the time this call returns.
This is used to provide non-stale read operations to clients. It gives up after ELECTION_TIMEOUT, since stepDownThread will return to the follower state after that time.
Definition at line 2965 of file RaftConsensus.cc.
friend class RaftConsensusInternal::LocalServer [friend] |
Definition at line 1717 of file RaftConsensus.h.
friend class RaftConsensusInternal::Peer [friend] |
Definition at line 1718 of file RaftConsensus.h.
friend class RaftConsensusInternal::Invariants [friend] |
Definition at line 1719 of file RaftConsensus.h.
std::ostream& operator<< | ( | std::ostream & | os, |
const RaftConsensus & | raft | ||
) | [friend] |
Print out the contents of this class for debugging purposes.
Definition at line 1905 of file RaftConsensus.cc.
std::ostream& operator<< | ( | std::ostream & | os, |
RaftConsensus::ClientResult | clientResult | ||
) | [friend] |
Print out a ClientResult for debugging purposes.
Definition at line 2998 of file RaftConsensus.cc.
std::ostream& operator<< | ( | std::ostream & | os, |
RaftConsensus::State | state | ||
) | [friend] |
Print out a State for debugging purposes.
Definition at line 3019 of file RaftConsensus.cc.
const std::chrono::nanoseconds LogCabin::Server::RaftConsensus::ELECTION_TIMEOUT [private] |
A follower waits for about this much inactivity before becoming a candidate and starting a new election.
Definition at line 1418 of file RaftConsensus.h.
const std::chrono::nanoseconds LogCabin::Server::RaftConsensus::HEARTBEAT_PERIOD [private] |
A leader sends RPCs at least this often, even if there is no data to send.
Definition at line 1424 of file RaftConsensus.h.
uint64_t LogCabin::Server::RaftConsensus::MAX_LOG_ENTRIES_PER_REQUEST [private] |
A leader will pack at most this many entries into an AppendEntries request message.
This helps bound processing time when entries are very small in size. Const except for unit tests.
Definition at line 1432 of file RaftConsensus.h.
const std::chrono::nanoseconds LogCabin::Server::RaftConsensus::RPC_FAILURE_BACKOFF [private] |
A candidate or leader waits this long after an RPC fails before sending another one, so as to not overwhelm the network with retries.
Definition at line 1438 of file RaftConsensus.h.
const std::chrono::nanoseconds LogCabin::Server::RaftConsensus::STATE_MACHINE_UPDATER_BACKOFF [private] |
How long the state machine updater thread should sleep if:
Definition at line 1448 of file RaftConsensus.h.
uint64_t LogCabin::Server::RaftConsensus::SOFT_RPC_SIZE_LIMIT [private] |
Prefer to keep RPC requests under this size.
Const except for unit tests.
Definition at line 1454 of file RaftConsensus.h.
uint64_t LogCabin::Server::RaftConsensus::serverId |
This server's unique ID.
Not available until init() is called.
Definition at line 1460 of file RaftConsensus.h.
std::string LogCabin::Server::RaftConsensus::serverAddresses |
The addresses that this server is listening on.
Not available until init() is called.
Definition at line 1466 of file RaftConsensus.h.
Globals& LogCabin::Server::RaftConsensus::globals [private] |
The LogCabin daemon's top-level objects.
Definition at line 1473 of file RaftConsensus.h.
Storage::Layout LogCabin::Server::RaftConsensus::storageLayout [private] |
Where the files for the log and snapshots are stored.
Definition at line 1478 of file RaftConsensus.h.
Client::SessionManager LogCabin::Server::RaftConsensus::sessionManager [private] |
Used to create new sessions.
Definition at line 1483 of file RaftConsensus.h.
Mutex LogCabin::Server::RaftConsensus::mutex [mutable, private] |
This class behaves mostly like a monitor.
This protects all the state in this class and almost all of the Peer class (with some documented exceptions).
Definition at line 1490 of file RaftConsensus.h.
Core::ConditionVariable LogCabin::Server::RaftConsensus::stateChanged [mutable, private] |
Notified when basically anything changes.
Specifically, this is notified when any of the following events occur:
Definition at line 1509 of file RaftConsensus.h.
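The monitor style described for mutex and stateChanged can be sketched with standard library primitives. This is only an illustration: `TermMonitor` and its methods are hypothetical, and std::mutex and std::condition_variable stand in for the project's own Mutex and Core::ConditionVariable types.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical monitor: every public method takes the mutex on entry, and
// every state change notifies a single shared condition variable.
class TermMonitor {
  public:
    // Raise the term (never lower it) and wake all waiters on a change.
    void advanceTerm(uint64_t newTerm) {
        std::lock_guard<std::mutex> lock(mutex);
        if (newTerm > currentTerm) {
            currentTerm = newTerm;
            stateChanged.notify_all();
        }
    }

    // Block until the term reaches at least 'term', then return it.
    // The loop re-checks the condition after every wakeup, since the
    // condition variable is shared by unrelated events.
    uint64_t waitForTermAtLeast(uint64_t term) const {
        std::unique_lock<std::mutex> lock(mutex);
        while (currentTerm < term)
            stateChanged.wait(lock);
        return currentTerm;
    }

  private:
    mutable std::mutex mutex;
    mutable std::condition_variable stateChanged;
    uint64_t currentTerm = 0;
};
```

Using one condition variable for "anything changed" trades some spurious wakeups for simplicity: waiters must always re-test their own predicate.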
bool LogCabin::Server::RaftConsensus::exiting [private] |
Set to true when this class is about to be destroyed.
When this is true, threads must exit right away and no more RPCs should be sent or processed.
Definition at line 1516 of file RaftConsensus.h.
uint32_t LogCabin::Server::RaftConsensus::numPeerThreads [private] |
The number of Peer::thread threads that are still using this RaftConsensus object.
When they exit, they decrement this and notify stateChanged.
Definition at line 1523 of file RaftConsensus.h.
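A shutdown path that waits for this counter to reach zero could look roughly like the following. `PeerThreadTracker` and its methods are hypothetical names used for illustration, with standard library types in place of the project's own.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical tracker for threads that still reference a shared object.
struct PeerThreadTracker {
    std::mutex mutex;
    std::condition_variable stateChanged;
    uint32_t numPeerThreads = 0;

    // Called before a peer thread is launched.
    void threadStarted() {
        std::lock_guard<std::mutex> lock(mutex);
        ++numPeerThreads;
    }

    // Called by a peer thread as its last action.
    void threadExited() {
        std::lock_guard<std::mutex> lock(mutex);
        --numPeerThreads;
        stateChanged.notify_all();
    }

    // Called during shutdown: returns once no peer thread remains.
    void waitForAllToExit() {
        std::unique_lock<std::mutex> lock(mutex);
        while (numPeerThreads > 0)
            stateChanged.wait(lock);
    }
};
```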
std::unique_ptr<Storage::Log> LogCabin::Server::RaftConsensus::log [private] |
Provides all storage for this server.
Keeps track of all log entries and some additional metadata.
If you modify this, be sure to keep configurationManager consistent.
Definition at line 1531 of file RaftConsensus.h.
bool LogCabin::Server::RaftConsensus::logSyncQueued [private] |
Flag to indicate that leaderDiskThreadMain should flush recent log writes to stable storage.
This is always false for followers and candidates and is only used for leaders.
When a server steps down, it waits for all syncs to complete. That way, followers can assume that all of their log entries are durable when replying to leaders.
Definition at line 1542 of file RaftConsensus.h.
std::atomic<bool> LogCabin::Server::RaftConsensus::leaderDiskThreadWorking [private] |
Used for stepDown() to wait on leaderDiskThread without releasing mutex.
This is true while leaderDiskThread is writing to disk. It's set to true while holding mutex; set to false without mutex.
Definition at line 1549 of file RaftConsensus.h.
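The locking discipline described above (set to true while holding the mutex, cleared without it, and polled by a caller that keeps the mutex) can be sketched as follows. `DiskSync` and its method names are hypothetical, and the yield-based polling loop is an assumption about how such a wait might be written.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical sketch of the hand-off between a disk thread and stepDown().
struct DiskSync {
    std::mutex mutex;
    bool logSyncQueued = false;                  // protected by mutex
    std::atomic<bool> diskThreadWorking{false};  // readable without mutex

    // Disk thread: claim a queued sync while holding the mutex. The caller
    // then performs the slow disk write with the mutex released.
    bool beginSync() {
        std::lock_guard<std::mutex> lock(mutex);
        if (!logSyncQueued)
            return false;
        logSyncQueued = false;
        diskThreadWorking = true;
        return true;
    }

    // Disk thread: mark the write as finished. No mutex is needed because
    // the flag is atomic.
    void finishSync() {
        diskThreadWorking = false;
    }

    // Step-down path: may be called while the caller still holds the mutex,
    // since clearing the flag never requires it.
    void waitForDiskThread() const {
        while (diskThreadWorking.load())
            std::this_thread::yield();
    }
};
```

Because finishSync() never takes the mutex, the waiter cannot deadlock against the disk thread even though it holds the mutex for the whole wait.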
std::unique_ptr<Configuration> LogCabin::Server::RaftConsensus::configuration [private] |
Defines the servers that are part of the cluster.
See Configuration.
Definition at line 1554 of file RaftConsensus.h.
std::unique_ptr<ConfigurationManager> LogCabin::Server::RaftConsensus::configurationManager [private] |
Ensures that 'configuration' reflects the latest state of the log and snapshot.
Definition at line 1560 of file RaftConsensus.h.
uint64_t LogCabin::Server::RaftConsensus::currentTerm [private] |
The latest term this server has seen.
This value monotonically increases over time. It gets updated in stepDown(), startNewElection(), and when a candidate receives a vote response with a newer term.
Definition at line 1570 of file RaftConsensus.h.
State LogCabin::Server::RaftConsensus::state [private] |
The server's current role in the cluster (follower, candidate, or leader).
See State.
Definition at line 1576 of file RaftConsensus.h.
uint64_t LogCabin::Server::RaftConsensus::lastSnapshotIndex [private] |
The latest good snapshot covers entries 1 through 'lastSnapshotIndex' (inclusive).
It is known that these are committed. They are safe to remove from the log, but it may be advantageous to keep them around for a little while (to avoid shipping snapshots to straggling followers). Thus, the log may or may not have some of the entries in this range.
Definition at line 1585 of file RaftConsensus.h.
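One consequence of the log possibly lacking entries covered by the snapshot is that a leader has to choose between shipping log entries and shipping a snapshot to a lagging follower. A minimal sketch of that choice follows; `chooseCatchUpAction`, `nextIndex`, and `logStartIndex` are hypothetical names, and the sketch assumes the log holds a contiguous range of entries starting at `logStartIndex`.

```cpp
#include <cstdint>

enum class CatchUpAction { APPEND_ENTRIES, INSTALL_SNAPSHOT };

// Hypothetical helper: decide how to bring a follower up to date.
// nextIndex is the first entry the follower still needs; logStartIndex is
// the first entry still present in the leader's log (assumption).
CatchUpAction
chooseCatchUpAction(uint64_t nextIndex, uint64_t logStartIndex)
{
    // Entries before logStartIndex were discarded after snapshotting, so
    // only the snapshot can supply them.
    if (nextIndex < logStartIndex)
        return CatchUpAction::INSTALL_SNAPSHOT;
    return CatchUpAction::APPEND_ENTRIES;
}
```

Keeping some already-snapshotted entries in the log lowers `logStartIndex` and so makes the cheaper APPEND_ENTRIES branch more likely for stragglers.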
uint64_t LogCabin::Server::RaftConsensus::lastSnapshotTerm [private] |
The term of the last entry covered by the latest good snapshot, or 0 if we have no snapshot.
Definition at line 1591 of file RaftConsensus.h.
uint64_t LogCabin::Server::RaftConsensus::lastSnapshotClusterTime [private] |
The cluster time of the last entry covered by the latest good snapshot, or 0 if we have no snapshot.
Definition at line 1597 of file RaftConsensus.h.
uint64_t LogCabin::Server::RaftConsensus::lastSnapshotBytes [private] |
The size of the latest good snapshot in bytes, or 0 if we have no snapshot.
Definition at line 1603 of file RaftConsensus.h.
std::unique_ptr<Storage::SnapshotFile::Reader> LogCabin::Server::RaftConsensus::snapshotReader [mutable, private] |
If not NULL, this is a Storage::SnapshotFile::Reader that covers up through lastSnapshotIndex.
This is ready for the state machine to process and is returned to the state machine in getNextEntry(). It's just a cache which can be repopulated with readSnapshot().
Definition at line 1611 of file RaftConsensus.h.
std::unique_ptr<Storage::SnapshotFile::Writer> LogCabin::Server::RaftConsensus::snapshotWriter [private] |
This is used in handleInstallSnapshot when receiving a snapshot from the current leader.
The leader is assumed to send at most one snapshot at a time, and any partial snapshots here are discarded when the term changes.
Definition at line 1619 of file RaftConsensus.h.
uint64_t LogCabin::Server::RaftConsensus::commitIndex [private] |
The largest entry ID for which a quorum is known to have stored the same entry as this server has.
Entries 1 through commitIndex as stored in this server's log are guaranteed to never change. This value will monotonically increase over time.
Definition at line 1627 of file RaftConsensus.h.
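As a rough illustration of the quorum rule, the largest index stored by a majority can be computed from the per-server match indexes. The functions below are hypothetical and simplified: they assume a single simple configuration, include the leader's own last index in the input, and omit the Raft requirement that a leader only commit entries from its current term this way.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: largest index that a majority of servers has stored.
// matchIndexes holds one value per server, the leader included.
uint64_t
quorumMatchIndex(std::vector<uint64_t> matchIndexes)
{
    if (matchIndexes.empty())
        return 0;
    std::sort(matchIndexes.begin(), matchIndexes.end());
    // In ascending order, the element at (n - 1) / 2 is matched or exceeded
    // by a strict majority of the n servers.
    return matchIndexes[(matchIndexes.size() - 1) / 2];
}

// The commit index only ever moves forward.
uint64_t
advanceCommitIndex(uint64_t commitIndex,
                   const std::vector<uint64_t>& matchIndexes)
{
    return std::max(commitIndex, quorumMatchIndex(matchIndexes));
}
```

For example, with match indexes 5, 3, 9, 4, 7 the sorted order is 3, 4, 5, 7, 9, and three of the five servers hold index 5 or later.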
uint64_t LogCabin::Server::RaftConsensus::leaderId [private] |
The server ID of the leader for this term.
This is used to help point clients to the right server. The special value 0 means either there is no leader for this term yet or this server does not know who it is yet.
Definition at line 1634 of file RaftConsensus.h.
uint64_t LogCabin::Server::RaftConsensus::votedFor [private] |
The server ID that this server voted for during this term's election, if any.
The special value 0 means no vote has been given out during this term.
Definition at line 1644 of file RaftConsensus.h.
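The relationship between currentTerm, leaderId, and votedFor can be sketched as below. `TermState`, `observeTerm`, and `tryGrantVote` are hypothetical names; the sketch follows the usual Raft rule of one vote per term and leaves out the log up-to-date check that a real vote decision also needs.

```cpp
#include <cstdint>

// Hypothetical per-term state; 0 means "none" for both server IDs.
struct TermState {
    uint64_t currentTerm = 0;
    uint64_t leaderId = 0;
    uint64_t votedFor = 0;
};

// Moving to a newer term forgets the old leader and the old vote.
// Older or equal terms leave the state untouched, so currentTerm only grows.
void
observeTerm(TermState& state, uint64_t term)
{
    if (term > state.currentTerm) {
        state.currentTerm = term;
        state.leaderId = 0;
        state.votedFor = 0;
    }
}

// Grant at most one vote per term; repeating the same vote is allowed.
bool
tryGrantVote(TermState& state, uint64_t candidateId)
{
    if (state.votedFor == 0 || state.votedFor == candidateId) {
        state.votedFor = candidateId;
        return true;
    }
    return false;
}
```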
uint64_t LogCabin::Server::RaftConsensus::currentEpoch [mutable, private] |
A logical clock used to confirm leadership and connectivity.
Definition at line 1650 of file RaftConsensus.h.
Tracks the passage of "cluster time".
See ClusterClock.
Definition at line 1655 of file RaftConsensus.h.
The earliest time at which timerThread should begin a new election with startNewElection().
It is safe for increases to startElectionAt to not notify the condition variable. Decreases to this value, however, must notify the condition variable to make sure the timerThread gets woken in a timely manner. Unfortunately, startElectionAt does not monotonically increase because of the random jitter that is applied to the follower timeout, and it would reduce the jitter's effectiveness for the thread to wait as long as the largest startElectionAt value.
Definition at line 1669 of file RaftConsensus.h.
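The notification rule above can be sketched in a few lines. The function names are hypothetical, and the jitter range of one to two election timeouts is an assumption chosen for illustration; this page only says that random jitter is applied.

```cpp
#include <chrono>
#include <random>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: pick a randomized deadline between one and two
// election timeouts from 'now' (the range is an assumption). The timeout
// must be non-negative.
template<typename Rng>
Clock::time_point
nextElectionDeadline(Clock::time_point now,
                     std::chrono::nanoseconds electionTimeout,
                     Rng& rng)
{
    std::uniform_int_distribution<std::chrono::nanoseconds::rep>
        jitter(0, electionTimeout.count());
    std::chrono::nanoseconds wait =
        electionTimeout + std::chrono::nanoseconds(jitter(rng));
    return now + std::chrono::duration_cast<Clock::duration>(wait);
}

// Store the new deadline and report whether the timer thread must be woken.
// Only a move to an earlier time requires a notification; a later deadline
// is picked up when the thread wakes at the old one.
bool
setStartElectionAt(Clock::time_point& startElectionAt,
                   Clock::time_point newValue)
{
    bool mustNotify = newValue < startElectionAt;
    startElectionAt = newValue;
    return mustNotify;
}
```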
The earliest time at which RequestVote messages should be processed.
Until this time, they are rejected, as processing them risks causing the cluster leader to needlessly step down. For more motivation, see the "disruptive servers" issue in membership changes described in the Raft paper/thesis.
This is set to the current time + an election timeout when a heartbeat is received, and it's set to infinity for leaders (who begin processing RequestVote messages again immediately when they step down).
Definition at line 1682 of file RaftConsensus.h.
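The three cases described above (heartbeat received, acting as leader, stepping down) can be summarized in a small sketch. `VoteGate` and its methods are hypothetical; "infinity" is represented here by the clock's maximum time point.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical gate that decides whether RequestVote messages are processed.
struct VoteGate {
    Clock::time_point withholdVotesUntil = Clock::time_point::min();

    // A heartbeat means a leader is alive: reject vote requests for one
    // election timeout from now.
    void onHeartbeat(Clock::time_point now,
                     std::chrono::nanoseconds electionTimeout) {
        withholdVotesUntil =
            now + std::chrono::duration_cast<Clock::duration>(electionTimeout);
    }

    // Leaders reject vote requests for as long as they remain leader.
    void onBecomeLeader() {
        withholdVotesUntil = Clock::time_point::max();
    }

    // A leader that steps down processes vote requests again immediately.
    void onStepDown() {
        withholdVotesUntil = Clock::time_point::min();
    }

    bool mayProcessRequestVote(Clock::time_point now) const {
        return now >= withholdVotesUntil;
    }
};
```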
uint64_t LogCabin::Server::RaftConsensus::numEntriesTruncated [private] |
The total number of entries ever truncated from the end of the log.
This happens only when a new leader tells this server to remove extraneous uncommitted entries from its log.
Definition at line 1689 of file RaftConsensus.h.
std::thread LogCabin::Server::RaftConsensus::leaderDiskThread [private] |
The thread that executes leaderDiskThreadMain() to flush log entries to stable storage in the background on leaders.
Definition at line 1695 of file RaftConsensus.h.
std::thread LogCabin::Server::RaftConsensus::timerThread [private] |
The thread that executes timerThreadMain() to begin new elections after periods of inactivity.
Definition at line 1701 of file RaftConsensus.h.
std::thread LogCabin::Server::RaftConsensus::stateMachineUpdaterThread [private] |
The thread that executes stateMachineUpdaterThreadMain() to append "advance state machine version" entries to the log on leaders.
Definition at line 1707 of file RaftConsensus.h.
std::thread LogCabin::Server::RaftConsensus::stepDownThread [private] |
The thread that executes stepDownThreadMain() to return to the follower state if the leader becomes disconnected from a quorum of servers.
Definition at line 1713 of file RaftConsensus.h.
Definition at line 1715 of file RaftConsensus.h.