LogCabin
LogCabin::Server::RaftConsensus Class Reference

An implementation of the Raft consensus algorithm.

#include <RaftConsensus.h>


Classes

struct  Entry
 This is returned by getNextEntry().

Public Types

enum  ClientResult {
  SUCCESS,
  FAIL,
  RETRY,
  NOT_LEADER
}
typedef RaftConsensusInternal::Invariants Invariants
typedef RaftConsensusInternal::Server Server
typedef RaftConsensusInternal::LocalServer LocalServer
typedef RaftConsensusInternal::Peer Peer
typedef RaftConsensusInternal::Configuration Configuration
typedef RaftConsensusInternal::ConfigurationManager ConfigurationManager
typedef RaftConsensusInternal::ClusterClock ClusterClock
typedef RaftConsensusInternal::Mutex Mutex
typedef RaftConsensusInternal::Clock Clock
typedef RaftConsensusInternal::TimePoint TimePoint

Public Member Functions

 RaftConsensus (Globals &globals)
 Constructor.
 ~RaftConsensus ()
 Destructor.
void init ()
 Initialize. Must be called before any other method.
void exit ()
 Signal the consensus module to exit (shut down threads, etc).
void bootstrapConfiguration ()
 Initialize the log with a configuration consisting of just this server.
ClientResult getConfiguration (Protocol::Raft::SimpleConfiguration &configuration, uint64_t &id) const
 Get the current leader's active, committed, simple cluster configuration.
std::pair< ClientResult, uint64_t > getLastCommitIndex () const
 Return the most recent entry ID that has been externalized by the replicated log.
std::string getLeaderHint () const
 Return the network address for a recent leader, if known, or empty string otherwise.
Entry getNextEntry (uint64_t lastIndex) const
 This returns the entry following lastIndex in the replicated log.
SnapshotStats::SnapshotStats getSnapshotStats () const
 Return statistics that may be useful in deciding when to snapshot.
void handleAppendEntries (const Protocol::Raft::AppendEntries::Request &request, Protocol::Raft::AppendEntries::Response &response)
 Process an AppendEntries RPC from another server.
void handleInstallSnapshot (const Protocol::Raft::InstallSnapshot::Request &request, Protocol::Raft::InstallSnapshot::Response &response)
 Process an InstallSnapshot RPC from another server.
void handleRequestVote (const Protocol::Raft::RequestVote::Request &request, Protocol::Raft::RequestVote::Response &response)
 Process a RequestVote RPC from another server.
std::pair< ClientResult, uint64_t > replicate (const Core::Buffer &operation)
 Submit an operation to the replicated log.
ClientResult setConfiguration (const Protocol::Client::SetConfiguration::Request &request, Protocol::Client::SetConfiguration::Response &response)
 Change the cluster's configuration.
void setSupportedStateMachineVersions (uint16_t minSupported, uint16_t maxSupported)
 Register which versions of client commands/behavior the local state machine supports.
std::unique_ptr< Storage::SnapshotFile::Writer > beginSnapshot (uint64_t lastIncludedIndex)
 Start taking a snapshot.
void snapshotDone (uint64_t lastIncludedIndex, std::unique_ptr< Storage::SnapshotFile::Writer > writer)
 Complete taking a snapshot for the log entries in range [1, lastIncludedIndex].
void updateServerStats (Protocol::ServerStats &serverStats) const
 Add information about the consensus state to the given structure.

Public Attributes

uint64_t serverId
 This server's unique ID.
std::string serverAddresses
 The addresses that this server is listening on.

Private Types

enum  State {
  FOLLOWER,
  CANDIDATE,
  LEADER
}
 See state.

Private Member Functions

void leaderDiskThreadMain ()
 Flush log entries to stable storage in the background on leaders.
void timerThreadMain ()
 Start new elections when it's time to do so.
void peerThreadMain (std::shared_ptr< Peer > peer)
 Initiate RPCs to a specific server as necessary.
void stateMachineUpdaterThreadMain ()
 Append advance state machine version entries to the log as leader once all servers can support a new state machine version.
void stepDownThreadMain ()
 Return to follower state when, as leader, this server is not able to communicate with a quorum.
void advanceCommitIndex ()
 Move forward commitIndex if possible.
void append (const std::vector< const Storage::Log::Entry * > &entries)
 Append entries to the log, set the configuration if this contains a configuration entry, and notify stateChanged.
void appendEntries (std::unique_lock< Mutex > &lockGuard, Peer &peer)
 Send an AppendEntries RPC to the server (either a heartbeat or containing an entry to replicate).
void installSnapshot (std::unique_lock< Mutex > &lockGuard, Peer &peer)
 Send an InstallSnapshot RPC to the server (containing part of a snapshot file to replicate).
void becomeLeader ()
 Transition to being a leader.
void discardUnneededEntries ()
 Remove the prefix of the log that is redundant with this server's snapshot.
uint64_t getLastLogTerm () const
 Return the term corresponding to log->getLastLogIndex().
void interruptAll ()
 Notify the stateChanged condition variable and cancel all current RPCs.
uint64_t packEntries (uint64_t nextIndex, Protocol::Raft::AppendEntries::Request &request) const
 Helper for appendEntries() to put the right number of entries into the request.
void readSnapshot ()
 Try to read the latest good snapshot from disk.
std::pair< ClientResult, uint64_t > replicateEntry (Storage::Log::Entry &entry, std::unique_lock< Mutex > &lockGuard)
 Append an entry to the log and wait for it to be committed.
void requestVote (std::unique_lock< Mutex > &lockGuard, Peer &peer)
 Send a RequestVote RPC to the server.
void printElectionState () const
 Dumps serverId, currentTerm, state, leaderId, and votedFor to the debug log.
void setElectionTimer ()
 Set the timer to start a new election and notify stateChanged.
void startNewElection ()
 Transitions to being a candidate from being a follower or candidate.
void stepDown (uint64_t newTerm)
 Transition to being a follower.
void updateLogMetadata ()
 Persist critical state, such as the term and the vote, to stable storage.
bool upToDateLeader (std::unique_lock< Mutex > &lockGuard) const
 Return true if every entry that might have already been marked committed on any leader is marked committed on this leader by the time this call returns.

Private Attributes

const std::chrono::nanoseconds ELECTION_TIMEOUT
 A follower waits for about this much inactivity before becoming a candidate and starting a new election.
const std::chrono::nanoseconds HEARTBEAT_PERIOD
 A leader sends RPCs at least this often, even if there is no data to send.
uint64_t MAX_LOG_ENTRIES_PER_REQUEST
 A leader will pack at most this many entries into an AppendEntries request message.
const std::chrono::nanoseconds RPC_FAILURE_BACKOFF
 A candidate or leader waits this long after an RPC fails before sending another one, so as to not overwhelm the network with retries.
const std::chrono::nanoseconds STATE_MACHINE_UPDATER_BACKOFF
 How long the state machine updater thread should sleep before retrying when it cannot yet advance the state machine version.
uint64_t SOFT_RPC_SIZE_LIMIT
 Prefer to keep RPC requests under this size.
Globals & globals
 The LogCabin daemon's top-level objects.
Storage::Layout storageLayout
 Where the files for the log and snapshots are stored.
Client::SessionManager sessionManager
 Used to create new sessions.
Mutex mutex
 This class behaves mostly like a monitor.
Core::ConditionVariable stateChanged
 Notified when basically anything changes.
bool exiting
 Set to true when this class is about to be destroyed.
uint32_t numPeerThreads
 The number of Peer::thread threads that are still using this RaftConsensus object.
std::unique_ptr< Storage::Log > log
 Provides all storage for this server.
bool logSyncQueued
 Flag to indicate that leaderDiskThreadMain should flush recent log writes to stable storage.
std::atomic< bool > leaderDiskThreadWorking
 Used for stepDown() to wait on leaderDiskThread without releasing mutex.
std::unique_ptr< Configuration > configuration
 Defines the servers that are part of the cluster.
std::unique_ptr< ConfigurationManager > configurationManager
 Ensures that 'configuration' reflects the latest state of the log and snapshot.
uint64_t currentTerm
 The latest term this server has seen.
State state
 The server's current role in the cluster (follower, candidate, or leader).
uint64_t lastSnapshotIndex
 The latest good snapshot covers entries 1 through 'lastSnapshotIndex' (inclusive).
uint64_t lastSnapshotTerm
 The term of the last entry covered by the latest good snapshot, or 0 if we have no snapshot.
uint64_t lastSnapshotClusterTime
 The cluster time of the last entry covered by the latest good snapshot, or 0 if we have no snapshot.
uint64_t lastSnapshotBytes
 The size of the latest good snapshot in bytes, or 0 if we have no snapshot.
std::unique_ptr< Storage::SnapshotFile::Reader > snapshotReader
 If not NULL, this is a Storage::SnapshotFile::Reader that covers up through lastSnapshotIndex.
std::unique_ptr< Storage::SnapshotFile::Writer > snapshotWriter
 This is used in handleInstallSnapshot when receiving a snapshot from the current leader.
uint64_t commitIndex
 The largest entry ID for which a quorum is known to have stored the same entry as this server has.
uint64_t leaderId
 The server ID of the leader for this term.
uint64_t votedFor
 The server ID that this server voted for during this term's election, if any.
uint64_t currentEpoch
 A logical clock used to confirm leadership and connectivity.
ClusterClock clusterClock
 Tracks the passage of "cluster time".
TimePoint startElectionAt
 The earliest time at which timerThread should begin a new election with startNewElection().
TimePoint withholdVotesUntil
 The earliest time at which RequestVote messages should be processed.
uint64_t numEntriesTruncated
 The total number of entries ever truncated from the end of the log.
std::thread leaderDiskThread
 The thread that executes leaderDiskThreadMain() to flush log entries to stable storage in the background on leaders.
std::thread timerThread
 The thread that executes timerThreadMain() to begin new elections after periods of inactivity.
std::thread stateMachineUpdaterThread
 The thread that executes stateMachineUpdaterThreadMain() to append advance state machine version entries to the log on leaders.
std::thread stepDownThread
 The thread that executes stepDownThreadMain() to return to the follower state if the leader becomes disconnected from a quorum of servers.
Invariants invariants

Friends

class RaftConsensusInternal::LocalServer
class RaftConsensusInternal::Peer
class RaftConsensusInternal::Invariants
std::ostream & operator<< (std::ostream &os, const RaftConsensus &raft)
 Print out the contents of this class for debugging purposes.
std::ostream & operator<< (std::ostream &os, ClientResult clientResult)
 Print out a ClientResult for debugging purposes.
std::ostream & operator<< (std::ostream &os, State state)
 Print out a State for debugging purposes.

Detailed Description

An implementation of the Raft consensus algorithm.

The algorithm is described at https://raftconsensus.github.io . In brief, Raft divides time into terms and elects a leader at the beginning of each term. This election mechanism guarantees that the emerging leader has at least all committed log entries. Once a candidate has received votes from a quorum, it becomes leader and replicates its own log entries, in order, to the followers. The leader is the only machine that serves client requests.

Definition at line 883 of file RaftConsensus.h.


Member Typedef Documentation

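typedef RaftConsensusInternal::Invariants LogCabin::Server::RaftConsensus::Invariants

Definition at line 885 of file RaftConsensus.h.

typedef RaftConsensusInternal::Server LogCabin::Server::RaftConsensus::Server

Definition at line 886 of file RaftConsensus.h.

typedef RaftConsensusInternal::LocalServer LogCabin::Server::RaftConsensus::LocalServer

Definition at line 887 of file RaftConsensus.h.

typedef RaftConsensusInternal::Peer LogCabin::Server::RaftConsensus::Peer

Definition at line 888 of file RaftConsensus.h.

typedef RaftConsensusInternal::Configuration LogCabin::Server::RaftConsensus::Configuration

Definition at line 889 of file RaftConsensus.h.

typedef RaftConsensusInternal::ConfigurationManager LogCabin::Server::RaftConsensus::ConfigurationManager

Definition at line 890 of file RaftConsensus.h.

typedef RaftConsensusInternal::ClusterClock LogCabin::Server::RaftConsensus::ClusterClock

Definition at line 891 of file RaftConsensus.h.

typedef RaftConsensusInternal::Mutex LogCabin::Server::RaftConsensus::Mutex

Definition at line 892 of file RaftConsensus.h.

typedef RaftConsensusInternal::Clock LogCabin::Server::RaftConsensus::Clock

Definition at line 893 of file RaftConsensus.h.

typedef RaftConsensusInternal::TimePoint LogCabin::Server::RaftConsensus::TimePoint

Definition at line 894 of file RaftConsensus.h.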


Member Enumeration Documentation

enum LogCabin::Server::RaftConsensus::ClientResult

Enumerator:
SUCCESS 

Request completed successfully.

FAIL 

Returned by setConfiguration() if the configuration could not be set because the previous configuration was unsuitable or because the new servers could not be caught up.

RETRY 

Returned by getConfiguration() if the configuration is not stable or is not committed.

The client should wait and retry later.

NOT_LEADER 

Cannot process the request because this server is not the leader or has temporarily lost its leadership.

Definition at line 964 of file RaftConsensus.h.
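A caller of replicate(), getConfiguration(), or setConfiguration() has to branch on the returned ClientResult. The sketch below mirrors the four enumerators in a standalone enum and maps each one to a plausible client-side reaction; the Action type and actionFor() are hypothetical names for illustration and are not part of LogCabin.

```cpp
#include <cassert>

// Standalone mirror of RaftConsensus::ClientResult (illustration only).
enum class ClientResult { SUCCESS, FAIL, RETRY, NOT_LEADER };

// Hypothetical client-side reactions; not a LogCabin API.
enum class Action { Done, GiveUp, WaitAndRetry, RedirectToLeader };

Action actionFor(ClientResult result) {
    switch (result) {
        case ClientResult::SUCCESS:
            return Action::Done;
        case ClientResult::FAIL:
            // e.g. setConfiguration() could not catch up the new servers.
            return Action::GiveUp;
        case ClientResult::RETRY:
            // e.g. getConfiguration() saw an unstable or uncommitted configuration.
            return Action::WaitAndRetry;
        case ClientResult::NOT_LEADER:
            // Try the address from getLeaderHint(), if one is known.
            return Action::RedirectToLeader;
    }
    assert(false && "invalid ClientResult");
    return Action::GiveUp;
}
```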

enum LogCabin::Server::RaftConsensus::State [private]

See state.

Enumerator:
FOLLOWER 

A follower does not initiate RPCs.

It becomes a candidate with startNewElection() when a timeout elapses without hearing from a candidate/leader. This is the initial state for servers when they start up.

CANDIDATE 

A candidate sends RequestVote RPCs in an attempt to become a leader.

It steps down to be a follower if it discovers a current leader, and it becomes leader if it collects votes from a quorum.

LEADER 

A leader sends AppendEntries RPCs to replicate its log onto followers.

It also sends heartbeats periodically during periods of inactivity to delay its followers from becoming candidates. It steps down to be a follower if it discovers a server with a higher term, if it can't communicate with a quorum, or if it is not part of the latest committed configuration.

Definition at line 1169 of file RaftConsensus.h.


Constructor & Destructor Documentation

LogCabin::Server::RaftConsensus::RaftConsensus ( Globals & globals )

Constructor.

Parameters:
globals Handle to LogCabin's top-level objects.

Definition at line 933 of file RaftConsensus.cc.

LogCabin::Server::RaftConsensus::~RaftConsensus ( )

Destructor.

Definition at line 1002 of file RaftConsensus.cc.


Member Function Documentation

void LogCabin::Server::RaftConsensus::init ( )

Initialize. Must be called before any other method.

Definition at line 1032 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::exit ( )

Signal the consensus module to exit (shut down threads, etc).

Definition at line 1120 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::bootstrapConfiguration ( )

Initialize the log with a configuration consisting of just this server.

This should be called only once: the very first time the very first server in your cluster is started. PANICs if any log entries or snapshots already exist.

Definition at line 1131 of file RaftConsensus.cc.

RaftConsensus::ClientResult LogCabin::Server::RaftConsensus::getConfiguration ( Protocol::Raft::SimpleConfiguration &  configuration,
uint64_t &  id 
) const

Get the current leader's active, committed, simple cluster configuration.

Definition at line 1159 of file RaftConsensus.cc.

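std::pair< RaftConsensus::ClientResult, uint64_t > LogCabin::Server::RaftConsensus::getLastCommitIndex ( ) const

Return the most recent entry ID that has been externalized by the replicated log.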

This is used to provide non-stale reads to the state machine.

Definition at line 1176 of file RaftConsensus.cc.

std::string LogCabin::Server::RaftConsensus::getLeaderHint ( ) const

Return the network address for a recent leader, if known, or empty string otherwise.

Definition at line 1186 of file RaftConsensus.cc.

RaftConsensus::Entry LogCabin::Server::RaftConsensus::getNextEntry ( uint64_t lastIndex ) const

This returns the entry following lastIndex in the replicated log.

Some entries may be used internally by the consensus module. These will have Entry.hasData set to false. They are still exposed to the state machine because the state machine sometimes waits until it has caught up to the latest committed entry in the replicated log; if that entry were for internal use only, it would otherwise never reach the state machine.

Exceptions:
Core::Util::ThreadInterruptedException Thread should exit.

Definition at line 1193 of file RaftConsensus.cc.
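The hasData rule above means a state machine's apply loop must step past internal entries without applying them. A minimal sketch, assuming a simplified Entry with only an index, hasData, and a hypothetical command string, and replacing the blocking getNextEntry() with a plain vector lookup; snapshot entries are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for RaftConsensus::Entry (hypothetical fields).
struct Entry {
    uint64_t index;
    bool hasData;         // false for entries used internally by consensus
    std::string command;  // hypothetical payload
};

// Non-blocking stand-in for getNextEntry(): entries[i] holds entry i + 1.
Entry nextEntry(const std::vector<Entry>& entries, uint64_t lastIndex) {
    return entries.at(lastIndex);
}

// Applies every available entry after lastApplied. Internal entries are
// skipped but still advance lastApplied, so a reader waiting for the
// latest committed index is not left behind.
std::vector<std::string> applyAll(const std::vector<Entry>& entries,
                                  uint64_t& lastApplied) {
    std::vector<std::string> applied;
    while (lastApplied < entries.size()) {
        Entry entry = nextEntry(entries, lastApplied);
        assert(entry.index == lastApplied + 1);  // no snapshots in this sketch
        if (entry.hasData)
            applied.push_back(entry.command);
        lastApplied = entry.index;
    }
    return applied;
}
```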

SnapshotStats::SnapshotStats LogCabin::Server::RaftConsensus::getSnapshotStats ( ) const

Return statistics that may be useful in deciding when to snapshot.

Definition at line 1248 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::handleAppendEntries ( const Protocol::Raft::AppendEntries::Request &  request,
Protocol::Raft::AppendEntries::Response &  response 
)

Process an AppendEntries RPC from another server.

Called by RaftService.

Parameters:
[in] request The request that was received from the other server.
[out] response Where the reply should be placed.

Definition at line 1263 of file RaftConsensus.cc.
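A core step in processing AppendEntries is Raft's log-matching check on the entry immediately preceding the new ones. A minimal sketch, assuming a hypothetical in-memory vector of terms and taking the prevLogIndex/prevLogTerm names from the Raft paper rather than from LogCabin's protobuf definitions; the real handler must also account for entries covered by a snapshot, which this ignores.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// terms[i] is the term of the follower's log entry i + 1 (hypothetical
// representation). prevLogIndex/prevLogTerm identify the entry that
// precedes the new entries in the leader's log.
bool logMatches(const std::vector<uint64_t>& terms,
                uint64_t prevLogIndex, uint64_t prevLogTerm) {
    if (prevLogIndex == 0)
        return true;   // nothing precedes the first entry
    if (prevLogIndex > terms.size())
        return false;  // the follower is missing entries
    return terms[prevLogIndex - 1] == prevLogTerm;
}
```

If the check fails, the follower rejects the request and the leader retries with an earlier prevLogIndex until the logs agree.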

void LogCabin::Server::RaftConsensus::handleInstallSnapshot ( const Protocol::Raft::InstallSnapshot::Request &  request,
Protocol::Raft::InstallSnapshot::Response &  response 
)

Process an InstallSnapshot RPC from another server.

Called by RaftService.

Parameters:
[in] request The request that was received from the other server.
[out] response Where the reply should be placed.

Definition at line 1430 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::handleRequestVote ( const Protocol::Raft::RequestVote::Request &  request,
Protocol::Raft::RequestVote::Response &  response 
)

Process a RequestVote RPC from another server.

Called by RaftService.

Parameters:
[in] request The request that was received from the other server.
[out] response Where the reply should be placed.

Definition at line 1526 of file RaftConsensus.cc.
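Part of processing RequestVote is Raft's election restriction: a server grants its vote only if the candidate's log is at least as up-to-date as its own. The sketch shows only that log comparison, with hypothetical parameter names; term handling, votedFor, and withholdVotesUntil are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Logs are compared by the term of their last entry first; only when
// those terms are equal does the longer log win.
bool candidateLogIsOk(uint64_t candidateLastTerm, uint64_t candidateLastIndex,
                      uint64_t ourLastTerm, uint64_t ourLastIndex) {
    if (candidateLastTerm != ourLastTerm)
        return candidateLastTerm > ourLastTerm;
    return candidateLastIndex >= ourLastIndex;
}
```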

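std::pair< RaftConsensus::ClientResult, uint64_t > LogCabin::Server::RaftConsensus::replicate ( const Core::Buffer & operation )

Submit an operation to the replicated log.

Parameters:
operation If the cluster accepts this operation, then it will be added to the log and the state machine will eventually apply it.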
Returns:
First component is status code. If SUCCESS, second component is the log index at which the entry has been committed to the replicated log.

Definition at line 1585 of file RaftConsensus.cc.

RaftConsensus::ClientResult LogCabin::Server::RaftConsensus::setConfiguration ( const Protocol::Client::SetConfiguration::Request &  request,
Protocol::Client::SetConfiguration::Response &  response 
)

Change the cluster's configuration.

Returns successfully once the operation has completed and the old servers are no longer needed.

Returns:
NOT_LEADER, or other code with response filled in.

Definition at line 1595 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::setSupportedStateMachineVersions ( uint16_t  minSupported,
uint16_t  maxSupported 
)

Register which versions of client commands/behavior the local state machine supports.

Invoked just once on boot (though calling this multiple times is safe). This information is used to support upgrades to the running replicated state machine version, and it is transmitted to other servers as needed. See stateMachineUpdaterThreadMain.

Parameters:
minSupported The smallest version the local state machine can support.
maxSupported The largest version the local state machine can support.

Definition at line 1729 of file RaftConsensus.cc.
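Once every server has reported its range, the leader can look for a version that all of them support. A minimal sketch, assuming the goal is the highest commonly supported version (LogCabin's actual selection policy may differ); VersionRange and highestCommonVersion() are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one server's range, as registered through
// setSupportedStateMachineVersions(minSupported, maxSupported).
struct VersionRange {
    uint16_t minSupported;
    uint16_t maxSupported;
};

// Finds the highest version every server supports. Returns false (leaving
// version untouched) if no servers reported or the ranges do not overlap;
// an updater thread would then back off and try again later.
bool highestCommonVersion(const std::vector<VersionRange>& servers,
                          uint16_t& version) {
    if (servers.empty())
        return false;
    uint16_t lo = 0;
    uint16_t hi = UINT16_MAX;
    for (const VersionRange& range : servers) {
        assert(range.minSupported <= range.maxSupported);
        lo = std::max(lo, range.minSupported);
        hi = std::min(hi, range.maxSupported);
    }
    if (lo > hi)
        return false;
    version = hi;
    return true;
}
```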

std::unique_ptr< Storage::SnapshotFile::Writer > LogCabin::Server::RaftConsensus::beginSnapshot ( uint64_t  lastIncludedIndex)

Start taking a snapshot.

Called by the state machine when it wants to take a snapshot.

Parameters:
lastIncludedIndex The snapshot will cover log entries in the range [1, lastIncludedIndex]. lastIncludedIndex must be committed (must have been previously returned by getNextEntry()).
Returns:
A file the state machine can dump its snapshot into.

Definition at line 1746 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::snapshotDone ( uint64_t  lastIncludedIndex,
std::unique_ptr< Storage::SnapshotFile::Writer > writer 
)

Complete taking a snapshot for the log entries in range [1, lastIncludedIndex].

Called by the state machine when it is done taking a snapshot.

Parameters:
lastIncludedIndex The snapshot will cover log entries in the range [1, lastIncludedIndex].
writer A writer that has not yet been saved: the consensus module may have to discard the snapshot in case it's gotten a better snapshot from another server. If this snapshot is to be saved (normal case), the consensus module will call save() on it.

Definition at line 1814 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::updateServerStats ( Protocol::ServerStats &  serverStats) const

Add information about the consensus state to the given structure.

Definition at line 1865 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::leaderDiskThreadMain ( ) [private]

Flush log entries to stable storage in the background on leaders.

Once they're flushed, it tries to advance the commitIndex. This is the method that leaderDiskThread executes.

Definition at line 2025 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::timerThreadMain ( ) [private]

Start new elections when it's time to do so.

This is the method that timerThread executes.

Definition at line 2057 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::peerThreadMain ( std::shared_ptr< Peer > peer) [private]

Initiate RPCs to a specific server as necessary.

One thread for each remote server calls this method (see Peer::thread).

Definition at line 2069 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::stateMachineUpdaterThreadMain ( ) [private]

Append advance state machine version entries to the log as leader once all servers can support a new state machine version.

Definition at line 1941 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::stepDownThreadMain ( ) [private]

Return to follower state when, as leader, this server is not able to communicate with a quorum.

This helps in two ways when a quorum is not available to this leader but clients can still communicate with it. First, it returns to clients in a timely manner so that they can try to find another current leader, if one exists. Second, it frees up the resources associated with those clients' RPCs on the server. This is the method that stepDownThread executes.

Definition at line 2123 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::advanceCommitIndex ( ) [private]

Move forward commitIndex if possible.

Called only on leaders after receiving RPC responses and flushing entries to disk. If commitIndex changes, this will notify stateChanged. It will also change the configuration or step down due to a configuration change when appropriate.

commitIndex can jump by more than 1 on new leaders, since their commitIndex may be well out of date until they figure out which log entries their followers have.

Precondition:
state is LEADER.

Definition at line 2174 of file RaftConsensus.cc.
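The quorum rule behind advanceCommitIndex() can be sketched for the simple case of a single configuration with majority quorums; LogCabin also handles joint configurations, which this ignores. computeCommitIndex(), matchIndex, and termAt are hypothetical names, and the current-term check follows the commitment rule from the Raft paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// matchIndex has one element per server in the configuration, including
// the leader itself (its own durably written last index). termAt(i)
// returns the term of log entry i. Returns the new commit index.
uint64_t computeCommitIndex(std::vector<uint64_t> matchIndex,
                            uint64_t commitIndex, uint64_t currentTerm,
                            const std::function<uint64_t(uint64_t)>& termAt) {
    assert(!matchIndex.empty());
    // Sorted in descending order, the element at position n / 2 is stored
    // on at least n / 2 + 1 servers, which is a majority.
    std::sort(matchIndex.begin(), matchIndex.end(), std::greater<uint64_t>());
    uint64_t quorumIndex = matchIndex[matchIndex.size() / 2];
    // Only entries from the current term are committed by counting
    // replicas; earlier entries then become committed indirectly.
    if (quorumIndex > commitIndex && termAt(quorumIndex) == currentTerm)
        return quorumIndex;
    return commitIndex;
}
```

Because the result can be any index up to quorumIndex, this also shows how commitIndex can jump by more than 1 on a new leader.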

void LogCabin::Server::RaftConsensus::append ( const std::vector< const Storage::Log::Entry * > &  entries) [private]

Append entries to the log, set the configuration if this contains a configuration entry, and notify stateChanged.

Definition at line 2226 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::appendEntries ( std::unique_lock< Mutex > &  lockGuard,
Peer & peer 
) [private]

Send an AppendEntries RPC to the server (either a heartbeat or containing an entry to replicate).

Parameters:
lockGuard Used to temporarily release the lock while invoking the RPC, so as to allow for some concurrency.
peer State used in communicating with the follower, building the RPC request, and processing its result.

Definition at line 2249 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::installSnapshot ( std::unique_lock< Mutex > &  lockGuard,
Peer & peer 
) [private]

Send an InstallSnapshot RPC to the server (containing part of a snapshot file to replicate).

Parameters:
lockGuard Used to temporarily release the lock while invoking the RPC, so as to allow for some concurrency.
peer State used in communicating with the follower, building the RPC request, and processing its result.

Definition at line 2387 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::becomeLeader ( ) [private]

Transition to being a leader.

This is called when a candidate has received votes from a quorum.

Definition at line 2493 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::discardUnneededEntries ( ) [private]

Remove the prefix of the log that is redundant with this server's snapshot.

Definition at line 2531 of file RaftConsensus.cc.

uint64_t LogCabin::Server::RaftConsensus::getLastLogTerm ( ) const [private]

Return the term corresponding to log->getLastLogIndex().

This may come from the log, from the snapshot, or it may be 0.

Definition at line 2550 of file RaftConsensus.cc.
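The three possible sources for the last log term (log, snapshot, or 0) can be sketched as follows, assuming a hypothetical vector of terms whose first element is the entry at logStartIndex; lastSnapshotIndex and lastSnapshotTerm play the same roles as the members of those names. lastLogTerm() is a hypothetical helper, not the LogCabin implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// terms[i] is the term of log entry logStartIndex + i. Entries before
// logStartIndex were discarded because a snapshot covers them.
uint64_t lastLogTerm(uint64_t logStartIndex, const std::vector<uint64_t>& terms,
                     uint64_t lastSnapshotIndex, uint64_t lastSnapshotTerm) {
    assert(logStartIndex >= 1);  // log indexes start at 1
    uint64_t lastLogIndex = logStartIndex + terms.size() - 1;
    if (lastLogIndex >= logStartIndex)
        return terms.back();  // the last entry is still in the log
    // The log is empty, so its last index is the snapshot's last index;
    // with no snapshot either, both the index and the term are 0.
    assert(lastLogIndex == lastSnapshotIndex);
    return lastSnapshotTerm;
}
```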

void LogCabin::Server::RaftConsensus::interruptAll ( ) [private]

Notify the stateChanged condition variable and cancel all current RPCs.

This should be called when stepping down, starting a new election, becoming leader, or exiting.

Definition at line 2562 of file RaftConsensus.cc.

uint64_t LogCabin::Server::RaftConsensus::packEntries ( uint64_t  nextIndex,
Protocol::Raft::AppendEntries::Request &  request 
) const [private]

Helper for appendEntries() to put the right number of entries into the request.

Parameters:
nextIndex First entry to send to the follower.
request AppendEntries request ProtoBuf in which to pack the entries.
Returns:
Number of entries in the request.

Definition at line 2571 of file RaftConsensus.cc.
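packEntries() has to respect two limits documented under Member Data: MAX_LOG_ENTRIES_PER_REQUEST and SOFT_RPC_SIZE_LIMIT. A minimal sketch of the counting logic, assuming each entry is already serialized to a string and that "soft" means the first entry is sent even if it alone exceeds the byte limit (an assumption not confirmed by this reference); countEntriesToPack() is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// entries[i] holds the serialized bytes of log entry i + 1. Returns how
// many entries, starting at nextIndex, fit in one request under a cap on
// the entry count and a soft byte limit. When maxEntries > 0 and an entry
// is available, at least one entry is included, so a single oversized
// entry can still be sent on its own.
uint64_t countEntriesToPack(const std::vector<std::string>& entries,
                            uint64_t nextIndex, uint64_t maxEntries,
                            uint64_t softSizeLimit) {
    assert(nextIndex >= 1);  // log indexes start at 1
    uint64_t count = 0;
    uint64_t bytes = 0;
    for (uint64_t index = nextIndex; index <= entries.size(); ++index) {
        if (count >= maxEntries)
            break;
        uint64_t size = entries[index - 1].size();
        if (count > 0 && bytes + size > softSizeLimit)
            break;
        bytes += size;
        ++count;
    }
    return count;
}
```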

void LogCabin::Server::RaftConsensus::readSnapshot ( ) [private]

Try to read the latest good snapshot from disk.

Loads the header of the snapshot file, which is used internally by the consensus module. The rest of the file reader is kept in snapshotReader for the state machine to process upon a future getNextEntry().

If the snapshot file on disk is no good, snapshotReader will remain NULL.

Definition at line 2635 of file RaftConsensus.cc.

std::pair< RaftConsensus::ClientResult, uint64_t > LogCabin::Server::RaftConsensus::replicateEntry ( Storage::Log::Entry &  entry,
std::unique_lock< Mutex > &  lockGuard 
) [private]

Append an entry to the log and wait for it to be committed.

Definition at line 2742 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::requestVote ( std::unique_lock< Mutex > &  lockGuard,
Peer & peer 
) [private]

Send a RequestVote RPC to the server.

This is used by candidates to request a server's vote and by new leaders to retrieve information about the server's log.

Parameters:
lockGuard Used to temporarily release the lock while invoking the RPC, so as to allow for some concurrency.
peer State used in communicating with the follower, building the RPC request, and processing its result.

Definition at line 2762 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::printElectionState ( ) const [private]

Dumps serverId, currentTerm, state, leaderId, and votedFor to the debug log.

This is intended to be easy to grep and parse.

Definition at line 2835 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::setElectionTimer ( ) [private]

Set the timer to start a new election and notify stateChanged.

The timer is set for ELECTION_TIMEOUT plus some random jitter from now.

Definition at line 2822 of file RaftConsensus.cc.
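The jitter distribution is not specified here; the sketch below assumes a uniform draw from [0, ELECTION_TIMEOUT), and the TimePoint alias and nextElectionTime() are hypothetical stand-ins rather than LogCabin's own types. Randomizing the deadline makes it unlikely that several followers time out together and split the vote.

```cpp
#include <cassert>
#include <chrono>
#include <random>

using TimePoint =
    std::chrono::time_point<std::chrono::steady_clock, std::chrono::nanoseconds>;

// Returns when the next election should start: electionTimeout after now,
// plus a random jitter drawn uniformly from [0, electionTimeout).
TimePoint nextElectionTime(TimePoint now, std::chrono::nanoseconds electionTimeout,
                           std::mt19937_64& rng) {
    assert(electionTimeout.count() > 0);
    std::uniform_int_distribution<std::chrono::nanoseconds::rep> jitter(
        0, electionTimeout.count() - 1);
    return now + electionTimeout + std::chrono::nanoseconds(jitter(rng));
}
```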

void LogCabin::Server::RaftConsensus::startNewElection ( ) [private]

Transitions to being a candidate from being a follower or candidate.

This is called when a timeout elapses. If the configuration is blank, it does nothing. Moreover, if this server forms a quorum (it is the only server in the configuration), this will immediately transition to leader.

Definition at line 2858 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::stepDown ( uint64_t  newTerm) [private]

Transition to being a follower.

This is called when we receive an RPC request with a newer term, receive an RPC response indicating our term is stale, or discover a current leader while a candidate. In this last case, newTerm will be the same as currentTerm. This will call setElectionTimer for you if no election timer is currently set.

Definition at line 2907 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::updateLogMetadata ( ) [private]

Persist critical state, such as the term and the vote, to stable storage.

Definition at line 2955 of file RaftConsensus.cc.

bool LogCabin::Server::RaftConsensus::upToDateLeader ( std::unique_lock< Mutex > &  lockGuard) const [private]

Return true if every entry that might have already been marked committed on any leader is marked committed on this leader by the time this call returns.

This is used to provide non-stale read operations to clients. It gives up after ELECTION_TIMEOUT, since stepDownThread will return to the follower state after that time.

Definition at line 2965 of file RaftConsensus.cc.


Friends And Related Function Documentation

friend class RaftConsensusInternal::LocalServer [friend]

Definition at line 1717 of file RaftConsensus.h.

friend class RaftConsensusInternal::Peer [friend]

Definition at line 1718 of file RaftConsensus.h.

friend class RaftConsensusInternal::Invariants [friend]

Definition at line 1719 of file RaftConsensus.h.

std::ostream& operator<< ( std::ostream &  os,
const RaftConsensus & raft 
) [friend]

Print out the contents of this class for debugging purposes.

Definition at line 1905 of file RaftConsensus.cc.

std::ostream& operator<< ( std::ostream &  os,
RaftConsensus::ClientResult  clientResult 
) [friend]

Print out a ClientResult for debugging purposes.

Definition at line 2998 of file RaftConsensus.cc.

std::ostream& operator<< ( std::ostream &  os,
RaftConsensus::State  state 
) [friend]

Print out a State for debugging purposes.

Definition at line 3019 of file RaftConsensus.cc.


Member Data Documentation

const std::chrono::nanoseconds LogCabin::Server::RaftConsensus::ELECTION_TIMEOUT [private]

A follower waits for about this much inactivity before becoming a candidate and starting a new election.

Definition at line 1418 of file RaftConsensus.h.

const std::chrono::nanoseconds LogCabin::Server::RaftConsensus::HEARTBEAT_PERIOD [private]

A leader sends RPCs at least this often, even if there is no data to send.

Definition at line 1424 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::MAX_LOG_ENTRIES_PER_REQUEST [private]

A leader will pack at most this many entries into an AppendEntries request message.

This helps bound processing time when entries are very small in size. Const except for unit tests.

Definition at line 1432 of file RaftConsensus.h.

const std::chrono::nanoseconds LogCabin::Server::RaftConsensus::RPC_FAILURE_BACKOFF [private]

A candidate or leader waits this long after an RPC fails before sending another one, so as to not overwhelm the network with retries.

Definition at line 1438 of file RaftConsensus.h.

const std::chrono::nanoseconds LogCabin::Server::RaftConsensus::STATE_MACHINE_UPDATER_BACKOFF [private]

How long the state machine updater thread should sleep if:

  • The servers do not currently support a common version, or
  • This server has not yet received version information from all other servers, or
  • An advance state machine entry failed to commit (probably due to lost leadership).

Definition at line 1448 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::SOFT_RPC_SIZE_LIMIT [private]

Prefer to keep RPC requests under this size.

Const except for unit tests.

Definition at line 1454 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::serverId

This server's unique ID.

Not available until init() is called.

Definition at line 1460 of file RaftConsensus.h.

std::string LogCabin::Server::RaftConsensus::serverAddresses

The addresses that this server is listening on.

Not available until init() is called.

Definition at line 1466 of file RaftConsensus.h.

Globals& LogCabin::Server::RaftConsensus::globals [private]

The LogCabin daemon's top-level objects.

Definition at line 1473 of file RaftConsensus.h.

Storage::Layout LogCabin::Server::RaftConsensus::storageLayout [private]

Where the files for the log and snapshots are stored.

Definition at line 1478 of file RaftConsensus.h.

Client::SessionManager LogCabin::Server::RaftConsensus::sessionManager [private]

Used to create new sessions.

Definition at line 1483 of file RaftConsensus.h.

Mutex LogCabin::Server::RaftConsensus::mutex [private]

This class behaves mostly like a monitor.

This protects all the state in this class and almost all of the Peer class (with some documented exceptions).

Definition at line 1490 of file RaftConsensus.h.

Core::ConditionVariable LogCabin::Server::RaftConsensus::stateChanged [private]

Notified when basically anything changes.

Specifically, this is notified when any of the following events occur:

  • term changes.
  • state changes.
  • log changes.
  • commitIndex changes.
  • exiting is set.
  • numPeerThreads is decremented.
  • configuration changes.
  • startElectionAt changes (see note under startElectionAt).
  • an acknowledgement from a peer is received.
  • a server goes from not caught up to caught up.
  • a heartbeat is scheduled.

TODO(ongaro): Should there be multiple condition variables? This one is used by a lot of threads for a lot of different conditions.

Definition at line 1509 of file RaftConsensus.h.
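The mutex/stateChanged pair follows the monitor pattern: every state change happens under the one mutex and ends with a broadcast, and every waiter re-checks its own condition after waking. A minimal sketch, with std::mutex and std::condition_variable standing in for LogCabin's Mutex and Core::ConditionVariable; the CommitMonitor class and its methods are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical monitor guarding a commit index and an exit flag.
class CommitMonitor {
  public:
    // Called by whichever thread learns that more entries are committed.
    void advanceCommitIndex(uint64_t newIndex) {
        std::lock_guard<std::mutex> lock(mutex);
        if (newIndex > commitIndex) {
            commitIndex = newIndex;
            stateChanged.notify_all();  // wake every waiter; each re-checks
        }
    }

    // Analogous to setting 'exiting': waiters must give up promptly.
    void exit() {
        std::lock_guard<std::mutex> lock(mutex);
        exiting = true;
        stateChanged.notify_all();
    }

    // Blocks until index is committed or exit() is called. Returns true
    // only if the index was committed.
    bool waitForCommit(uint64_t index) {
        std::unique_lock<std::mutex> lock(mutex);
        stateChanged.wait(lock, [&] { return exiting || commitIndex >= index; });
        assert(exiting || commitIndex >= index);  // guaranteed by the predicate
        return commitIndex >= index;
    }

  private:
    std::mutex mutex;
    std::condition_variable stateChanged;
    uint64_t commitIndex = 0;
    bool exiting = false;
};
```

Using a single condition variable keeps the locking simple at the cost of spurious wakeups for threads whose condition did not change, which is the trade-off the TODO above raises.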

bool LogCabin::Server::RaftConsensus::exiting [private]

Set to true when this class is about to be destroyed.

When this is true, threads must exit right away and no more RPCs should be sent or processed.

Definition at line 1516 of file RaftConsensus.h.

uint32_t LogCabin::Server::RaftConsensus::numPeerThreads [private]

The number of Peer::thread threads that are still using this RaftConsensus object.

When they exit, they decrement this and notify stateChanged.

Definition at line 1523 of file RaftConsensus.h.

std::unique_ptr< Storage::Log > LogCabin::Server::RaftConsensus::log [private]

Provides all storage for this server.

Keeps track of all log entries and some additional metadata.

If you modify this, be sure to keep configurationManager consistent.

Definition at line 1531 of file RaftConsensus.h.

bool LogCabin::Server::RaftConsensus::logSyncQueued [private]

Flag to indicate that leaderDiskThreadMain should flush recent log writes to stable storage.

This is always false for followers and candidates and is only used for leaders.

When a server steps down, it waits for all syncs to complete so that followers can assume that all of their log entries are durable when replying to leaders.

Definition at line 1542 of file RaftConsensus.h.

Used by stepDown() to wait for leaderDiskThread without releasing the mutex.

This is true while leaderDiskThread is writing to disk. It is set to true while holding the mutex and set to false without holding it.

Definition at line 1549 of file RaftConsensus.h.
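Because the flag is cleared without the mutex, it must be safe to read concurrently. A minimal sketch, assuming std::atomic for the flag and a spin-yield loop in stepDown(); the struct, the durableIndex field, and the polling strategy are hypothetical and may differ from what LogCabin actually does. Holding the mutex while waiting guarantees that no new flush can start, since the disk thread only sets the flag under the mutex.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the leader's disk-flush bookkeeping.
struct LeaderDiskState {
    std::mutex mutex;
    uint64_t lastLogIndex = 0;                    // guarded by mutex
    std::atomic<uint64_t> durableIndex{0};         // written without mutex
    std::atomic<bool> leaderDiskThreadWorking{false};

    // One pass of a leader disk thread.
    void flushOnce() {
        std::unique_lock<std::mutex> lock(mutex);
        uint64_t target = lastLogIndex;
        leaderDiskThreadWorking = true;   // set while holding the mutex
        lock.unlock();                    // do the slow I/O without the mutex
        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // fake fsync
        durableIndex = target;
        leaderDiskThreadWorking = false;  // cleared without the mutex
    }

    // Waits for an in-progress flush while keeping the mutex held, so the
    // disk thread cannot begin another one. Returns the durable index.
    uint64_t stepDown() {
        std::lock_guard<std::mutex> lock(mutex);
        while (leaderDiskThreadWorking)
            std::this_thread::yield();
        return durableIndex;
    }
};
```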

Defines the servers that are part of the cluster.

See Configuration.

Definition at line 1554 of file RaftConsensus.h.

Ensures that 'configuration' reflects the latest state of the log and snapshot.

Definition at line 1560 of file RaftConsensus.h.

The latest term this server has seen.

This value monotonically increases over time. It gets updated in stepDown(), startNewElection(), and when a candidate receives a vote response with a newer term.

Warning:
After setting this value, you must call updateLogMetadata() to persist it.

Definition at line 1570 of file RaftConsensus.h.
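The update rule for currentTerm, together with the persistence warning, can be sketched as follows. This is a hypothetical in-memory model: observeTerm() and the persisted* fields are illustrative names, and updateLogMetadata() here merely copies fields rather than writing to disk. A newer term is adopted, the vote for the new term is cleared, and the metadata is persisted before the server acts on it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical in-memory model of the term/vote metadata.
struct TermState {
    uint64_t currentTerm = 0;
    uint64_t votedFor = 0;       // 0 means no vote this term
    uint64_t persistedTerm = 0;  // what has reached "stable storage"
    uint64_t persistedVote = 0;
    unsigned persistCount = 0;

    // Stand-in for updateLogMetadata(): copy volatile fields to "disk".
    void updateLogMetadata() {
        persistedTerm = currentTerm;
        persistedVote = votedFor;
        ++persistCount;
    }

    // Adopt a newer term seen in an RPC or response; older terms are ignored.
    void observeTerm(uint64_t term) {
        if (term > currentTerm) {
            currentTerm = term;
            votedFor = 0;         // no vote has been cast in the new term
            updateLogMetadata();  // persist before replying to anyone
        }
        assert(currentTerm >= term);  // currentTerm never decreases
    }
};
```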

The server's current role in the cluster (follower, candidate, or leader).

See State.

Definition at line 1576 of file RaftConsensus.h.

The latest good snapshot covers entries 1 through 'lastSnapshotIndex' (inclusive).

It is known that these are committed. They are safe to remove from the log, but it may be advantageous to keep them around for a little while (to avoid shipping snapshots to straggling followers). Thus, the log may or may not have some of the entries in this range.

Definition at line 1585 of file RaftConsensus.h.

The term of the last entry covered by the latest good snapshot, or 0 if we have no snapshot.

Definition at line 1591 of file RaftConsensus.h.

The cluster time of the last entry covered by the latest good snapshot, or 0 if we have no snapshot.

Definition at line 1597 of file RaftConsensus.h.

The size of the latest good snapshot in bytes, or 0 if we have no snapshot.

Definition at line 1603 of file RaftConsensus.h.

If not NULL, this is a Storage::SnapshotFile::Reader that covers up through lastSnapshotIndex.

This is ready for the state machine to process and is returned to the state machine in getNextEntry(). It's just a cache which can be repopulated with readSnapshot().

Definition at line 1611 of file RaftConsensus.h.
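Since the snapshot covers entries 1 through lastSnapshotIndex and the log may no longer hold some of them, a state machine that is behind the snapshot has to be given the snapshot instead of individual entries. A hypothetical sketch of that decision; planNextEntry() and NextKind are illustrative names, and the WAIT case stands in for blocking on the condition variable.

```cpp
#include <cstdint>

// Hypothetical outcome of a getNextEntry(lastIndex)-style call.
enum class NextKind { SNAPSHOT, LOG_ENTRY, WAIT };

struct NextEntryPlan {
    NextKind kind;
    uint64_t index;  // snapshot's last index, or the next entry's index
};

NextEntryPlan planNextEntry(uint64_t lastIndex,
                            uint64_t lastSnapshotIndex,
                            uint64_t commitIndex) {
    if (lastIndex < lastSnapshotIndex) {
        // The state machine is behind the snapshot. Some of the entries it
        // needs may already be gone from the log, so hand over the snapshot,
        // which covers entries 1..lastSnapshotIndex.
        return {NextKind::SNAPSHOT, lastSnapshotIndex};
    }
    uint64_t next = lastIndex + 1;
    if (next <= commitIndex)
        return {NextKind::LOG_ENTRY, next};  // only committed entries
    return {NextKind::WAIT, next};           // real code would block here
}
```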

This is used in handleInstallSnapshot when receiving a snapshot from the current leader.

The leader is assumed to send at most one snapshot at a time, and any partial snapshots here are discarded when the term changes.

Definition at line 1619 of file RaftConsensus.h.

The largest entry ID for which a quorum is known to have stored the same entry as this server has.

Entries 1 through commitIndex as stored in this server's log are guaranteed to never change. This value will monotonically increase over time.

Definition at line 1627 of file RaftConsensus.h.
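On a leader, "the largest entry ID for which a quorum is known to have stored the same entry" can be computed by sorting the per-server match indexes in descending order and taking the element at position n/2: at least n/2 + 1 servers hold an index that large. The sketch below uses hypothetical function names. It also applies the Raft paper's rule (not stated in this section) that a leader only advances commitIndex by counting replicas of an entry from its own current term.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Largest index stored by a majority. matchIndexes has one value per voting
// server, including the leader's own last log index.
uint64_t quorumMin(std::vector<uint64_t> matchIndexes) {
    if (matchIndexes.empty())
        return 0;
    std::sort(matchIndexes.begin(), matchIndexes.end(),
              std::greater<uint64_t>());
    return matchIndexes[matchIndexes.size() / 2];
}

// Hypothetical leader-side update; termOf(i) returns the term of entry i.
uint64_t advanceCommitIndex(uint64_t commitIndex,
                            const std::vector<uint64_t>& matchIndexes,
                            const std::function<uint64_t(uint64_t)>& termOf,
                            uint64_t currentTerm) {
    uint64_t candidate = quorumMin(matchIndexes);
    if (candidate > commitIndex && termOf(candidate) == currentTerm)
        return candidate;
    return commitIndex;  // commitIndex never decreases
}
```

For example, with five servers whose match indexes are 9, 8, 7, 3, and 2, quorumMin() returns 7, and three servers (a majority) have stored entry 7.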

The server ID of the leader for this term.

This is used to help point clients to the right server. The special value 0 means either there is no leader for this term yet or this server does not know who it is yet.

Definition at line 1634 of file RaftConsensus.h.

The server ID that this server voted for during this term's election, if any.

The special value 0 means no vote has been given out during this term.

Warning:
After setting this value, you must call updateLogMetadata() to persist it.

Definition at line 1644 of file RaftConsensus.h.
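The votedFor convention (0 means no vote yet) supports Raft's one-vote-per-term rule. The sketch is hypothetical: it assumes the caller has already rejected stale terms and adopted any newer term, and it adds the standard "candidate's log is at least as up-to-date" check from the Raft paper, which this section does not describe. The vote is persisted before the function returns, per the warning above.

```cpp
#include <cstdint>

// Hypothetical vote-granting logic for a single term.
struct VoteState {
    uint64_t votedFor = 0;  // 0 means no vote given this term
    uint64_t lastLogTerm = 0;
    uint64_t lastLogIndex = 0;
    unsigned persistCount = 0;

    void updateLogMetadata() { ++persistCount; }  // stand-in for persisting

    bool handleRequestVote(uint64_t candidateId,
                           uint64_t candLastLogTerm,
                           uint64_t candLastLogIndex) {
        // Raft's up-to-date check: compare last terms, then last indexes.
        bool logOk = candLastLogTerm > lastLogTerm ||
                     (candLastLogTerm == lastLogTerm &&
                      candLastLogIndex >= lastLogIndex);
        // At most one candidate per term (re-granting to the same one is fine).
        if (logOk && (votedFor == 0 || votedFor == candidateId)) {
            votedFor = candidateId;
            updateLogMetadata();  // persist before replying
            return true;
        }
        return false;
    }
};
```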

uint64_t LogCabin::Server::RaftConsensus::currentEpoch [mutable, private]

A logical clock used to confirm leadership and connectivity.

Definition at line 1650 of file RaftConsensus.h.
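This section does not spell out how currentEpoch confirms leadership. One plausible scheme, shown here only as a hypothetical sketch: the leader bumps the epoch, stamps outgoing heartbeats with it, and treats leadership as confirmed once a majority of the cluster (counting itself) has acknowledged that epoch or a later one. The [mutable] qualifier suggests the counter is also bumped from const methods, presumably for leadership checks made on behalf of reads. All names below are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical epoch-based leadership confirmation.
struct EpochTracker {
    uint64_t currentEpoch = 0;
    std::vector<uint64_t> lastAckEpoch;  // one slot per peer (excludes leader)

    explicit EpochTracker(std::size_t numPeers) : lastAckEpoch(numPeers, 0) {}

    // Start a round; heartbeats sent now would carry this epoch.
    uint64_t beginRound() { return ++currentEpoch; }

    // A peer acknowledged a heartbeat stamped with 'epoch'.
    void onAck(std::size_t peer, uint64_t epoch) {
        lastAckEpoch[peer] = std::max(lastAckEpoch[peer], epoch);
    }

    // Confirmed once a majority of the whole cluster has acked 'epoch'.
    bool confirmed(uint64_t epoch) const {
        std::size_t acks = 1;  // the leader counts itself
        for (uint64_t e : lastAckEpoch)
            if (e >= epoch)
                ++acks;
        std::size_t clusterSize = lastAckEpoch.size() + 1;
        return acks > clusterSize / 2;
    }
};
```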

Tracks the passage of "cluster time".

See ClusterClock.

Definition at line 1655 of file RaftConsensus.h.

The earliest time at which timerThread should begin a new election with startNewElection().

It is safe for increases to startElectionAt to not notify the condition variable. Decreases to this value, however, must notify the condition variable to make sure the timerThread gets woken in a timely manner. Unfortunately, startElectionAt does not increase monotonically because of the random jitter applied to the follower timeout, and having the thread wait until the largest startElectionAt value ever set would reduce the jitter's effectiveness.

Definition at line 1669 of file RaftConsensus.h.
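The notification rule for startElectionAt can be illustrated with a small election timer. In this hypothetical sketch, the deadline is drawn uniformly from [now + T, now + 2T) in millisecond steps (an assumed jitter range, not one this section specifies). Only decreases notify; on an increase, the timer thread wakes at the old, earlier deadline, re-reads the value, and waits again. A separate branch avoids wait_until() on time_point::max(), which some implementations handle poorly.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <random>

using Clock = std::chrono::steady_clock;

// Hypothetical election timer with a jittered, possibly decreasing deadline.
class ElectionTimer {
  public:
    explicit ElectionTimer(std::chrono::milliseconds electionTimeout)
        : timeout(electionTimeout), rng(std::random_device{}()) {}

    void resetElectionTimer() {
        std::lock_guard<std::mutex> lock(mutex);
        std::uniform_int_distribution<long long> jitter(0, timeout.count() - 1);
        Clock::time_point old = startElectionAt;
        startElectionAt =
            Clock::now() + timeout + std::chrono::milliseconds(jitter(rng));
        if (startElectionAt < old)
            stateChanged.notify_all();  // decreases must wake the timer thread
    }

    // One timer-thread wait: returns true once the deadline has passed.
    bool waitForElectionTimeout() {
        std::unique_lock<std::mutex> lock(mutex);
        while (!exiting && Clock::now() < startElectionAt) {
            Clock::time_point deadline = startElectionAt;
            if (deadline == Clock::time_point::max())
                stateChanged.wait(lock);
            else
                stateChanged.wait_until(lock, deadline);
        }
        return !exiting;
    }

    void exit() {
        std::lock_guard<std::mutex> lock(mutex);
        exiting = true;
        stateChanged.notify_all();
    }

    Clock::time_point deadline() {
        std::lock_guard<std::mutex> lock(mutex);
        return startElectionAt;
    }

  private:
    std::chrono::milliseconds timeout;
    std::mt19937_64 rng;
    std::mutex mutex;
    std::condition_variable stateChanged;
    Clock::time_point startElectionAt = Clock::time_point::max();
    bool exiting = false;
};
```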

The earliest time at which RequestVote messages should be processed.

Until this time, they are rejected, as processing them risks causing the cluster leader to needlessly step down. For more motivation, see the "disruptive servers" issue in membership changes described in the Raft paper/thesis.

This is set to the current time + an election timeout when a heartbeat is received, and it's set to infinity for leaders (who begin processing RequestVote messages again immediately when they step down).

Definition at line 1682 of file RaftConsensus.h.
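The rules above for this timestamp can be sketched as a small guard. The struct and method names are hypothetical, and `now` is passed in explicitly so the logic is easy to test. Resetting to time_point::min() on step-down is an assumption that matches "begin processing RequestVote messages again immediately".

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical guard against disruptive RequestVote messages.
struct VoteGuard {
    std::chrono::milliseconds electionTimeout;
    Clock::time_point withholdVotesUntil = Clock::time_point::min();

    // A follower heard from the current leader at time 'now'.
    void onHeartbeat(Clock::time_point now) {
        withholdVotesUntil = now + electionTimeout;
    }

    // Leaders never process RequestVote while they remain leader...
    void onBecomeLeader() { withholdVotesUntil = Clock::time_point::max(); }

    // ...and start processing them again as soon as they step down.
    void onStepDown() {
        if (withholdVotesUntil == Clock::time_point::max())
            withholdVotesUntil = Clock::time_point::min();
    }

    // True if a RequestVote arriving at 'now' should be rejected outright.
    bool shouldRejectRequestVote(Clock::time_point now) const {
        return now < withholdVotesUntil;
    }
};
```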

The total number of entries ever truncated from the end of the log.

This happens only when a new leader tells this server to remove extraneous uncommitted entries from its log.

Definition at line 1689 of file RaftConsensus.h.

The thread that executes leaderDiskThreadMain() to flush log entries to stable storage in the background on leaders.

Definition at line 1695 of file RaftConsensus.h.

The thread that executes timerThreadMain() to begin new elections after periods of inactivity.

Definition at line 1701 of file RaftConsensus.h.

The thread that executes stateMachineUpdaterThreadMain() to append "advance state machine version" entries to the log on leaders.

Definition at line 1707 of file RaftConsensus.h.

The thread that executes stepDownThreadMain() to return to the follower state if the leader becomes disconnected from a quorum of servers.

Definition at line 1713 of file RaftConsensus.h.

Definition at line 1715 of file RaftConsensus.h.


The documentation for this class was generated from the following files: