An implementation of the Raft consensus algorithm.

#include <RaftConsensus.h>

Classes
struct	Entry
	This is returned by getNextEntry().
Public Types
enum	ClientResult { SUCCESS, FAIL, RETRY, NOT_LEADER }
typedef RaftConsensusInternal::Invariants	Invariants
typedef RaftConsensusInternal::Server	Server
typedef RaftConsensusInternal::LocalServer	LocalServer
typedef RaftConsensusInternal::Peer	Peer
typedef RaftConsensusInternal::Configuration	Configuration
typedef RaftConsensusInternal::ConfigurationManager	ConfigurationManager
typedef RaftConsensusInternal::ClusterClock	ClusterClock
typedef RaftConsensusInternal::Mutex	Mutex
typedef RaftConsensusInternal::Clock	Clock
typedef RaftConsensusInternal::TimePoint	TimePoint
Public Member Functions
	RaftConsensus (Globals &globals)
	Constructor.
	~RaftConsensus ()
	Destructor.
void	init ()
	Initialize. Must be called before any other method.
void	exit ()
	Signal the consensus module to exit (shut down threads, etc).
void	bootstrapConfiguration ()
	Initialize the log with a configuration consisting of just this server.
ClientResult	getConfiguration (Protocol::Raft::SimpleConfiguration &configuration, uint64_t &id) const
	Get the current leader's active, committed, simple cluster configuration.
std::pair< ClientResult, uint64_t >	getLastCommitIndex () const
	Return the most recent entry ID that has been externalized by the replicated log.
std::string	getLeaderHint () const
	Return the network address for a recent leader, if known, or empty string otherwise.
Entry	getNextEntry (uint64_t lastIndex) const
	This returns the entry following lastIndex in the replicated log.
SnapshotStats::SnapshotStats	getSnapshotStats () const
	Return statistics that may be useful in deciding when to snapshot.
void	handleAppendEntries (const Protocol::Raft::AppendEntries::Request &request, Protocol::Raft::AppendEntries::Response &response)
	Process an AppendEntries RPC from another server.
void	handleInstallSnapshot (const Protocol::Raft::InstallSnapshot::Request &request, Protocol::Raft::InstallSnapshot::Response &response)
	Process an InstallSnapshot RPC from another server.
void	handleRequestVote (const Protocol::Raft::RequestVote::Request &request, Protocol::Raft::RequestVote::Response &response)
	Process a RequestVote RPC from another server.
std::pair< ClientResult, uint64_t >	replicate (const Core::Buffer &operation)
	Submit an operation to the replicated log.
ClientResult	setConfiguration (const Protocol::Client::SetConfiguration::Request &request, Protocol::Client::SetConfiguration::Response &response)
	Change the cluster's configuration.
void	setSupportedStateMachineVersions (uint16_t minSupported, uint16_t maxSupported)
	Register which versions of client commands/behavior the local state machine supports.
std::unique_ptr < Storage::SnapshotFile::Writer >	beginSnapshot (uint64_t lastIncludedIndex)
	Start taking a snapshot.
void	snapshotDone (uint64_t lastIncludedIndex, std::unique_ptr< Storage::SnapshotFile::Writer > writer)
	Complete taking a snapshot for the log entries in range [1, lastIncludedIndex].
void	updateServerStats (Protocol::ServerStats &serverStats) const
	Add information about the consensus state to the given structure.
Public Attributes
uint64_t	serverId
	This server's unique ID.
std::string	serverAddresses
	The addresses that this server is listening on.
Private Types
enum	State { FOLLOWER, CANDIDATE, LEADER }
	See state.
Private Member Functions
void	leaderDiskThreadMain ()
	Flush log entries to stable storage in the background on leaders.
void	timerThreadMain ()
	Start new elections when it's time to do so.
void	peerThreadMain (std::shared_ptr< Peer > peer)
	Initiate RPCs to a specific server as necessary.
void	stateMachineUpdaterThreadMain ()
	Append advance-state-machine-version entries to the log as leader once all servers can support a new state machine version.
void	stepDownThreadMain ()
	Return to follower state when, as leader, this server is not able to communicate with a quorum.
void	advanceCommitIndex ()
	Move forward commitIndex if possible.
void	append (const std::vector< const Storage::Log::Entry * > &entries)
	Append entries to the log, set the configuration if this contains a configuration entry, and notify stateChanged.
void	appendEntries (std::unique_lock< Mutex > &lockGuard, Peer &peer)
	Send an AppendEntries RPC to the server (either a heartbeat or containing an entry to replicate).
void	installSnapshot (std::unique_lock< Mutex > &lockGuard, Peer &peer)
	Send an InstallSnapshot RPC to the server (containing part of a snapshot file to replicate).
void	becomeLeader ()
	Transition to being a leader.
void	discardUnneededEntries ()
	Remove the prefix of the log that is redundant with this server's snapshot.
uint64_t	getLastLogTerm () const
	Return the term corresponding to log->getLastLogIndex().
void	interruptAll ()
	Notify the stateChanged condition variable and cancel all current RPCs.
uint64_t	packEntries (uint64_t nextIndex, Protocol::Raft::AppendEntries::Request &request) const
	Helper for appendEntries() to put the right number of entries into the request.
void	readSnapshot ()
	Try to read the latest good snapshot from disk.
std::pair< ClientResult, uint64_t >	replicateEntry (Storage::Log::Entry &entry, std::unique_lock< Mutex > &lockGuard)
	Append an entry to the log and wait for it to be committed.
void	requestVote (std::unique_lock< Mutex > &lockGuard, Peer &peer)
	Send a RequestVote RPC to the server.
void	printElectionState () const
	Dumps serverId, currentTerm, state, leaderId, and votedFor to the debug log.
void	setElectionTimer ()
	Set the timer to start a new election and notify stateChanged.
void	startNewElection ()
	Transitions to being a candidate from being a follower or candidate.
void	stepDown (uint64_t newTerm)
	Transition to being a follower.
void	updateLogMetadata ()
	Persist critical state, such as the term and the vote, to stable storage.
bool	upToDateLeader (std::unique_lock< Mutex > &lockGuard) const
	Return true if every entry that might have already been marked committed on any leader is marked committed on this leader by the time this call returns.
Private Attributes
const std::chrono::nanoseconds	ELECTION_TIMEOUT
	A follower waits for about this much inactivity before becoming a candidate and starting a new election.
const std::chrono::nanoseconds	HEARTBEAT_PERIOD
	A leader sends RPCs at least this often, even if there is no data to send.
uint64_t	MAX_LOG_ENTRIES_PER_REQUEST
	A leader will pack at most this many entries into an AppendEntries request message.
const std::chrono::nanoseconds	RPC_FAILURE_BACKOFF
	A candidate or leader waits this long after an RPC fails before sending another one, so as to not overwhelm the network with retries.
const std::chrono::nanoseconds	STATE_MACHINE_UPDATER_BACKOFF
	How long the state machine updater thread should sleep if:
uint64_t	SOFT_RPC_SIZE_LIMIT
	Prefer to keep RPC requests under this size.
Globals &	globals
	The LogCabin daemon's top-level objects.
Storage::Layout	storageLayout
	Where the files for the log and snapshots are stored.
Client::SessionManager	sessionManager
	Used to create new sessions.
Mutex	mutex
	This class behaves mostly like a monitor.
Core::ConditionVariable	stateChanged
	Notified when basically anything changes.
bool	exiting
	Set to true when this class is about to be destroyed.
uint32_t	numPeerThreads
	The number of Peer::thread threads that are still using this RaftConsensus object.
std::unique_ptr< Storage::Log >	log
	Provides all storage for this server.
bool	logSyncQueued
	Flag to indicate that leaderDiskThreadMain should flush recent log writes to stable storage.
std::atomic< bool >	leaderDiskThreadWorking
	Used for stepDown() to wait on leaderDiskThread without releasing mutex.
std::unique_ptr< Configuration >	configuration
	Defines the servers that are part of the cluster.
std::unique_ptr < ConfigurationManager >	configurationManager
	Ensures that 'configuration' reflects the latest state of the log and snapshot.
uint64_t	currentTerm
	The latest term this server has seen.
State	state
	The server's current role in the cluster (follower, candidate, or leader).
uint64_t	lastSnapshotIndex
	The latest good snapshot covers entries 1 through 'lastSnapshotIndex' (inclusive).
uint64_t	lastSnapshotTerm
	The term of the last entry covered by the latest good snapshot, or 0 if we have no snapshot.
uint64_t	lastSnapshotClusterTime
	The cluster time of the last entry covered by the latest good snapshot, or 0 if we have no snapshot.
uint64_t	lastSnapshotBytes
	The size of the latest good snapshot in bytes, or 0 if we have no snapshot.
std::unique_ptr < Storage::SnapshotFile::Reader >	snapshotReader
	If not NULL, this is a Storage::SnapshotFile::Reader that covers up through lastSnapshotIndex.
std::unique_ptr < Storage::SnapshotFile::Writer >	snapshotWriter
	This is used in handleInstallSnapshot when receiving a snapshot from the current leader.
uint64_t	commitIndex
	The largest entry ID for which a quorum is known to have stored the same entry as this server has.
uint64_t	leaderId
	The server ID of the leader for this term.
uint64_t	votedFor
	The server ID that this server voted for during this term's election, if any.
uint64_t	currentEpoch
	A logical clock used to confirm leadership and connectivity.
ClusterClock	clusterClock
	Tracks the passage of "cluster time".
TimePoint	startElectionAt
	The earliest time at which timerThread should begin a new election with startNewElection().
TimePoint	withholdVotesUntil
	The earliest time at which RequestVote messages should be processed.
uint64_t	numEntriesTruncated
	The total number of entries ever truncated from the end of the log.
std::thread	leaderDiskThread
	The thread that executes leaderDiskThreadMain() to flush log entries to stable storage in the background on leaders.
std::thread	timerThread
	The thread that executes timerThreadMain() to begin new elections after periods of inactivity.
std::thread	stateMachineUpdaterThread
	The thread that executes stateMachineUpdaterThreadMain() to append advance-state-machine-version entries to the log on leaders.
std::thread	stepDownThread
	The thread that executes stepDownThreadMain() to return to the follower state if the leader becomes disconnected from a quorum of servers.
Invariants	invariants
Friends
class	RaftConsensusInternal::LocalServer
class	RaftConsensusInternal::Peer
class	RaftConsensusInternal::Invariants
std::ostream &	operator<< (std::ostream &os, const RaftConsensus &raft)
	Print out the contents of this class for debugging purposes.
std::ostream &	operator<< (std::ostream &os, ClientResult clientResult)
	Print out a ClientResult for debugging purposes.
std::ostream &	operator<< (std::ostream &os, State state)
	Print out a State for debugging purposes.

Detailed Description

An implementation of the Raft consensus algorithm.

The algorithm is described at https://raftconsensus.github.io. In brief, Raft divides time into terms and elects a leader at the beginning of each term. The election mechanism guarantees that the emerging leader holds at least all committed log entries. Once a candidate has received votes from a quorum, it becomes leader and replicates its own log entries, in order, to the followers. The leader is the only machine that serves client requests.

Definition at line 883 of file RaftConsensus.h.

Member Typedef Documentation

typedef RaftConsensusInternal::Invariants LogCabin::Server::RaftConsensus::Invariants

Definition at line 885 of file RaftConsensus.h.

typedef RaftConsensusInternal::Server LogCabin::Server::RaftConsensus::Server

Definition at line 886 of file RaftConsensus.h.

typedef RaftConsensusInternal::LocalServer LogCabin::Server::RaftConsensus::LocalServer

Definition at line 887 of file RaftConsensus.h.

typedef RaftConsensusInternal::Peer LogCabin::Server::RaftConsensus::Peer

Definition at line 888 of file RaftConsensus.h.

typedef RaftConsensusInternal::Configuration LogCabin::Server::RaftConsensus::Configuration

Definition at line 889 of file RaftConsensus.h.

typedef RaftConsensusInternal::ConfigurationManager LogCabin::Server::RaftConsensus::ConfigurationManager

Definition at line 890 of file RaftConsensus.h.

typedef RaftConsensusInternal::ClusterClock LogCabin::Server::RaftConsensus::ClusterClock

Definition at line 891 of file RaftConsensus.h.

typedef RaftConsensusInternal::Mutex LogCabin::Server::RaftConsensus::Mutex

Definition at line 892 of file RaftConsensus.h.

typedef RaftConsensusInternal::Clock LogCabin::Server::RaftConsensus::Clock

Definition at line 893 of file RaftConsensus.h.

typedef RaftConsensusInternal::TimePoint LogCabin::Server::RaftConsensus::TimePoint

Definition at line 894 of file RaftConsensus.h.

Member Enumeration Documentation

enum LogCabin::Server::RaftConsensus::ClientResult

Enumerator:

SUCCESS	Request completed successfully.
FAIL	Returned by setConfiguration() if the configuration could not be set because the previous configuration was unsuitable or because the new servers could not be caught up.
RETRY	Returned by getConfiguration() if the configuration is not stable or is not committed. The client should wait and retry later.
NOT_LEADER	Cannot process the request because this server is not leader or temporarily lost its leadership.

Definition at line 964 of file RaftConsensus.h.

enum LogCabin::Server::RaftConsensus::State [private]

See state.

Enumerator:

FOLLOWER

A follower does not initiate RPCs.

It becomes a candidate with startNewElection() when a timeout elapses without hearing from a candidate/leader. This is the initial state for servers when they start up.

CANDIDATE

A candidate sends RequestVote RPCs in an attempt to become a leader.

It steps down to be a follower if it discovers a current leader, and it becomes leader if it collects votes from a quorum.

LEADER

A leader sends AppendEntries RPCs to replicate its log onto followers.

It also sends heartbeats periodically during periods of inactivity to delay its followers from becoming candidates. It steps down to be a follower if it discovers a server with a higher term, if it can't communicate with a quorum, or if it is not part of the latest committed configuration.

Definition at line 1169 of file RaftConsensus.h.
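The role transitions described for FOLLOWER, CANDIDATE, and LEADER can be sketched as a small transition function. This is an illustrative sketch only, not the code in RaftConsensus.cc: the Event enum and nextState() are hypothetical names introduced here, and the step-down that follows a configuration change is left out.

```cpp
#include <cassert>

// Mirrors the three roles documented above.
enum class State { FOLLOWER, CANDIDATE, LEADER };

// Hypothetical event names, introduced only for this sketch.
enum class Event {
    ELECTION_TIMEOUT,   // no word from a candidate/leader in time
    VOTES_FROM_QUORUM,  // a candidate collected votes from a quorum
    DISCOVERED_LEADER,  // a candidate learned of a current leader
    HIGHER_TERM,        // some server reported a higher term
    LOST_QUORUM         // a leader can no longer reach a quorum
};

// Returns the role a server holds after `event`, given its current role.
State nextState(State current, Event event) {
    switch (event) {
    case Event::ELECTION_TIMEOUT:
        // Followers and candidates start (or restart) an election.
        return current == State::LEADER ? State::LEADER : State::CANDIDATE;
    case Event::VOTES_FROM_QUORUM:
        return current == State::CANDIDATE ? State::LEADER : current;
    case Event::DISCOVERED_LEADER:
        return current == State::CANDIDATE ? State::FOLLOWER : current;
    case Event::HIGHER_TERM:
        return State::FOLLOWER;
    case Event::LOST_QUORUM:
        return current == State::LEADER ? State::FOLLOWER : current;
    }
    assert(false && "unhandled event");
    return current;
}
```

Keeping the transitions in one pure function makes them easy to check in isolation; the real class spreads them across startNewElection(), becomeLeader(), and stepDown().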

Constructor & Destructor Documentation

LogCabin::Server::RaftConsensus::RaftConsensus ( Globals & globals ) [explicit]

Constructor.

Parameters:

globals Handle to LogCabin's top-level objects.

Definition at line 933 of file RaftConsensus.cc.

LogCabin::Server::RaftConsensus::~RaftConsensus ( )

Destructor.

Definition at line 1002 of file RaftConsensus.cc.

Member Function Documentation

void LogCabin::Server::RaftConsensus::init ( )

Initialize. Must be called before any other method.

Definition at line 1032 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::exit ( )

Signal the consensus module to exit (shut down threads, etc).

Definition at line 1120 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::bootstrapConfiguration ( )

Initialize the log with a configuration consisting of just this server.

This should be called just once, the very first time the very first server in your cluster is started. PANICs if any log entries or snapshots already exist.

Definition at line 1131 of file RaftConsensus.cc.

RaftConsensus::ClientResult LogCabin::Server::RaftConsensus::getConfiguration	(	Protocol::Raft::SimpleConfiguration &	configuration,
		uint64_t &	id
	)		const

Get the current leader's active, committed, simple cluster configuration.

Definition at line 1159 of file RaftConsensus.cc.

std::pair< RaftConsensus::ClientResult, uint64_t > LogCabin::Server::RaftConsensus::getLastCommitIndex ( ) const

Return the most recent entry ID that has been externalized by the replicated log.

This is used to provide non-stale reads to the state machine.

Definition at line 1176 of file RaftConsensus.cc.

std::string LogCabin::Server::RaftConsensus::getLeaderHint ( ) const

Return the network address for a recent leader, if known, or empty string otherwise.

Definition at line 1186 of file RaftConsensus.cc.

RaftConsensus::Entry LogCabin::Server::RaftConsensus::getNextEntry ( uint64_t lastIndex ) const

This returns the entry following lastIndex in the replicated log.

Some entries may be used internally by the consensus module; these will have Entry.hasData set to false. They are exposed to the state machine because the state machine sometimes waits to be caught up to the latest committed entry in the replicated log, and if that entry were for internal use only, it would otherwise never reach the state machine.

Exceptions:

Core::Util::ThreadInterruptedException Thread should exit.

Definition at line 1193 of file RaftConsensus.cc.
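To illustrate why internal entries are still handed to the state machine, here is a minimal sketch of an apply loop. The Entry struct below is a simplified, hypothetical stand-in (only hasData is named in the text above), and the vector of entries stands in for repeated getNextEntry(lastApplied) calls.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for RaftConsensus::Entry.
struct Entry {
    uint64_t index;
    bool hasData;      // false for entries internal to the consensus module
    std::string data;
};

// Applies every entry after lastApplied and returns the new lastApplied.
// `stateMachine` simply records the applied commands in order.
uint64_t applyEntries(const std::vector<Entry>& entries, uint64_t lastApplied,
                      std::vector<std::string>& stateMachine) {
    for (const Entry& entry : entries) {
        if (entry.index <= lastApplied)
            continue;  // already applied
        assert(entry.index == lastApplied + 1);  // entries arrive in order
        if (entry.hasData)
            stateMachine.push_back(entry.data);
        // Advance even when hasData is false; otherwise a caller waiting
        // for lastApplied to reach this index would wait forever.
        lastApplied = entry.index;
    }
    return lastApplied;
}
```

With entries 1 through 4 where 2 and 4 are internal, only two commands are applied, yet lastApplied still reaches 4.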

SnapshotStats::SnapshotStats LogCabin::Server::RaftConsensus::getSnapshotStats ( ) const

Return statistics that may be useful in deciding when to snapshot.

Definition at line 1248 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::handleAppendEntries	(	const Protocol::Raft::AppendEntries::Request &	request,
		Protocol::Raft::AppendEntries::Response &	response
	)

Process an AppendEntries RPC from another server.

Called by RaftService.

Parameters:

[in]	request	The request that was received from the other server.
[out]	response	Where the reply should be placed.

Definition at line 1263 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::handleInstallSnapshot	(	const Protocol::Raft::InstallSnapshot::Request &	request,
		Protocol::Raft::InstallSnapshot::Response &	response
	)

Process an InstallSnapshot RPC from another server.

Called by RaftService.

Parameters:

[in]	request	The request that was received from the other server.
[out]	response	Where the reply should be placed.

Definition at line 1430 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::handleRequestVote	(	const Protocol::Raft::RequestVote::Request &	request,
		Protocol::Raft::RequestVote::Response &	response
	)

Process a RequestVote RPC from another server.

Called by RaftService.

Parameters:

[in]	request	The request that was received from the other server.
[out]	response	Where the reply should be placed.

Definition at line 1526 of file RaftConsensus.cc.

std::pair< RaftConsensus::ClientResult, uint64_t > LogCabin::Server::RaftConsensus::replicate ( const Core::Buffer & operation )

Submit an operation to the replicated log.

Parameters:

operation If the cluster accepts this operation, then it will be added to the log and the state machine will eventually apply it.

Returns: First component is status code. If SUCCESS, second component is the log index at which the entry has been committed to the replicated log.

Definition at line 1585 of file RaftConsensus.cc.

RaftConsensus::ClientResult LogCabin::Server::RaftConsensus::setConfiguration	(	const Protocol::Client::SetConfiguration::Request &	request,
		Protocol::Client::SetConfiguration::Response &	response
	)

Change the cluster's configuration.

Returns successfully once the operation has completed and the old servers are no longer needed.

Returns: NOT_LEADER, or other code with response filled in.

Definition at line 1595 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::setSupportedStateMachineVersions	(	uint16_t	minSupported,
		uint16_t	maxSupported
	)

Register which versions of client commands/behavior the local state machine supports.

Invoked just once on boot (though calling this multiple times is safe). This information is used to support upgrades to the running replicated state machine version, and it is transmitted to other servers as needed. See stateMachineUpdaterThreadMain.

Parameters:

minSupported	The smallest version the local state machine can support.
maxSupported	The largest version the local state machine can support.

Definition at line 1729 of file RaftConsensus.cc.
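One way to picture how the registered ranges could be combined across servers: the cluster can only run a version that falls inside every server's [minSupported, maxSupported] range. The sketch below computes the highest such version. VersionRange and highestCommonVersion() are hypothetical names, and this is an assumption about the general approach rather than a copy of stateMachineUpdaterThreadMain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical record of what one server registered.
struct VersionRange {
    uint16_t minSupported;
    uint16_t maxSupported;
};

// first: whether any version is supported by every server;
// second: the highest such version (0 when first is false).
std::pair<bool, uint16_t>
highestCommonVersion(const std::vector<VersionRange>& servers) {
    assert(!servers.empty());
    uint16_t lo = 0;
    uint16_t hi = UINT16_MAX;
    for (const VersionRange& range : servers) {
        lo = std::max(lo, range.minSupported);  // largest lower bound
        hi = std::min(hi, range.maxSupported);  // smallest upper bound
    }
    if (lo > hi)
        return std::pair<bool, uint16_t>(false, 0);  // ranges do not overlap
    return std::pair<bool, uint16_t>(true, hi);
}
```

The "no common version" result corresponds to the first backoff condition listed under STATE_MACHINE_UPDATER_BACKOFF.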

std::unique_ptr< Storage::SnapshotFile::Writer > LogCabin::Server::RaftConsensus::beginSnapshot ( uint64_t lastIncludedIndex )

Start taking a snapshot.

Called by the state machine when it wants to take a snapshot.

Parameters:

lastIncludedIndex The snapshot will cover log entries in the range [1, lastIncludedIndex]. lastIncludedIndex must be committed (must have been previously returned by getNextEntry()).

Returns: A file the state machine can dump its snapshot into.

Definition at line 1746 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::snapshotDone	(	uint64_t	lastIncludedIndex,
		std::unique_ptr< Storage::SnapshotFile::Writer >	writer
	)

Complete taking a snapshot for the log entries in range [1, lastIncludedIndex].

Called by the state machine when it is done taking a snapshot.

Parameters:

lastIncludedIndex	The snapshot will cover log entries in the range [1, lastIncludedIndex].
writer	A writer that has not yet been saved: the consensus module may have to discard the snapshot in case it's gotten a better snapshot from another server. If this snapshot is to be saved (normal case), the consensus module will call save() on it.

Definition at line 1814 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::updateServerStats ( Protocol::ServerStats & serverStats ) const

Add information about the consensus state to the given structure.

Definition at line 1865 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::leaderDiskThreadMain ( ) [private]

Flush log entries to stable storage in the background on leaders.

Once they're flushed, it tries to advance the commitIndex. This is the method that leaderDiskThread executes.

Definition at line 2025 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::timerThreadMain ( ) [private]

Start new elections when it's time to do so.

This is the method that timerThread executes.

Definition at line 2057 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::peerThreadMain ( std::shared_ptr< Peer > peer ) [private]

Initiate RPCs to a specific server as necessary.

One thread for each remote server calls this method (see Peer::thread).

Definition at line 2069 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::stateMachineUpdaterThreadMain ( ) [private]

Append advance-state-machine-version entries to the log as leader once all servers can support a new state machine version.

Definition at line 1941 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::stepDownThreadMain ( ) [private]

Return to follower state when, as leader, this server is not able to communicate with a quorum.

This helps in two ways when a quorum is not available to this leader but clients can still communicate with it. First, it returns to clients in a timely manner so that they can try to find another current leader, if one exists. Second, it frees up the resources associated with those clients' RPCs on the server. This is the method that stepDownThread executes.

Definition at line 2123 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::advanceCommitIndex ( ) [private]

Move forward commitIndex if possible.

Called only on leaders after receiving RPC responses and flushing entries to disk. If commitIndex changes, this will notify stateChanged. It will also change the configuration or step down due to a configuration change when appropriate.

commitIndex can jump by more than 1 on new leaders, since their commitIndex may be well out of date until they figure out which log entries their followers have.

Precondition: state is LEADER.

Definition at line 2174 of file RaftConsensus.cc.
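A minimal sketch of the core calculation, assuming a simple (non-joint) configuration: the new commitIndex is the largest index that a majority of servers are known to have stored. quorumMin() and advance() are illustrative names chosen for this sketch. The sketch also omits the Raft rule that a leader only commits entries by counting replicas once an entry from its own term is included.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest index that at least a majority of servers have stored.
// matchIndexes holds one value per voting server, the leader included.
uint64_t quorumMin(std::vector<uint64_t> matchIndexes) {
    assert(!matchIndexes.empty());
    std::sort(matchIndexes.begin(), matchIndexes.end());
    std::size_t quorum = matchIndexes.size() / 2 + 1;
    // The `quorum` highest values are all >= this element.
    return matchIndexes[matchIndexes.size() - quorum];
}

// commitIndex never moves backwards.
uint64_t advance(uint64_t commitIndex,
                 const std::vector<uint64_t>& matchIndexes) {
    return std::max(commitIndex, quorumMin(matchIndexes));
}
```

For a new leader whose commitIndex is 2 and whose servers report 9, 9, and 8, advance() returns 9, which is the "jump by more than 1" case mentioned above.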

void LogCabin::Server::RaftConsensus::append ( const std::vector< const Storage::Log::Entry * > & entries ) [private]

Append entries to the log, set the configuration if this contains a configuration entry, and notify stateChanged.

Definition at line 2226 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::appendEntries	(	std::unique_lock< Mutex > &	lockGuard,
		Peer &	peer
	)		[private]

Send an AppendEntries RPC to the server (either a heartbeat or containing an entry to replicate).

Parameters:

lockGuard	Used to temporarily release the lock while invoking the RPC, so as to allow for some concurrency.
peer	State used in communicating with the follower, building the RPC request, and processing its result.

Definition at line 2249 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::installSnapshot	(	std::unique_lock< Mutex > &	lockGuard,
		Peer &	peer
	)		[private]

Send an InstallSnapshot RPC to the server (containing part of a snapshot file to replicate).

Parameters:

lockGuard	Used to temporarily release the lock while invoking the RPC, so as to allow for some concurrency.
peer	State used in communicating with the follower, building the RPC request, and processing its result.

Definition at line 2387 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::becomeLeader ( ) [private]

Transition to being a leader.

This is called when a candidate has received votes from a quorum.

Definition at line 2493 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::discardUnneededEntries ( ) [private]

Remove the prefix of the log that is redundant with this server's snapshot.

Definition at line 2531 of file RaftConsensus.cc.
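As a rough illustration of prefix truncation, the sketch below drops every entry that a snapshot already covers. The Log struct is a hypothetical in-memory stand-in, not Storage::Log, and discardPrefix() is a name introduced for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical in-memory log: terms[i] is the term of entry startIndex + i.
struct Log {
    uint64_t startIndex;
    std::deque<uint64_t> terms;
    uint64_t lastIndex() const { return startIndex + terms.size() - 1; }
};

// Drops every entry with index <= lastSnapshotIndex.
void discardPrefix(Log& log, uint64_t lastSnapshotIndex) {
    assert(log.startIndex >= 1);  // log indexes start at 1
    while (!log.terms.empty() && log.startIndex <= lastSnapshotIndex) {
        log.terms.pop_front();
        ++log.startIndex;
    }
    // If the snapshot extends past the end of the log, the log is now
    // empty and must restart right after the snapshot.
    log.startIndex = std::max(log.startIndex, lastSnapshotIndex + 1);
}
```

After discarding, an empty log has lastIndex() equal to lastSnapshotIndex, which is why getLastLogTerm() may need to fall back to the snapshot's term.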

uint64_t LogCabin::Server::RaftConsensus::getLastLogTerm ( ) const [private]

Return the term corresponding to log->getLastLogIndex().

This may come from the log, from the snapshot, or it may be 0.

Definition at line 2550 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::interruptAll ( ) [private]

Notify the stateChanged condition variable and cancel all current RPCs.

This should be called when stepping down, starting a new election, becoming leader, or exiting.

Definition at line 2562 of file RaftConsensus.cc.

uint64_t LogCabin::Server::RaftConsensus::packEntries	(	uint64_t	nextIndex,
		Protocol::Raft::AppendEntries::Request &	request
	)		const [private]

Helper for appendEntries() to put the right number of entries into the request.

Parameters:

nextIndex	First entry to send to the follower.
request	AppendEntries request ProtoBuf in which to pack the entries.

Returns: Number of entries in the request.

Definition at line 2571 of file RaftConsensus.cc.
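The interaction between MAX_LOG_ENTRIES_PER_REQUEST and SOFT_RPC_SIZE_LIMIT can be sketched as follows. This is a simplified model that works on entry sizes rather than ProtoBuf messages; Limits and countEntriesToPack() are hypothetical names, and the choice to always send the first entry even when it exceeds the soft limit is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for MAX_LOG_ENTRIES_PER_REQUEST and
// SOFT_RPC_SIZE_LIMIT.
struct Limits {
    uint64_t maxEntries;
    uint64_t softSizeLimit;  // bytes
};

// entrySizes[i] is the serialized size of log entry i + 1 (1-indexed log).
// Returns how many entries, starting at nextIndex, fit into one request.
uint64_t countEntriesToPack(const std::vector<uint64_t>& entrySizes,
                            uint64_t nextIndex, const Limits& limits) {
    assert(nextIndex >= 1);
    uint64_t packed = 0;
    uint64_t bytes = 0;
    for (uint64_t index = nextIndex; index <= entrySizes.size(); ++index) {
        if (packed == limits.maxEntries)
            break;  // entry-count cap reached
        uint64_t size = entrySizes[index - 1];
        // Soft limit: an oversized first entry is still sent by itself.
        if (packed > 0 && bytes + size > limits.softSizeLimit)
            break;
        bytes += size;
        ++packed;
    }
    return packed;
}
```

A return value of 0 means the follower is already caught up, so the request carries no entries and acts as a heartbeat.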

void LogCabin::Server::RaftConsensus::readSnapshot ( ) [private]

Try to read the latest good snapshot from disk.

Loads the header of the snapshot file, which is used internally by the consensus module. The rest of the file reader is kept in snapshotReader for the state machine to process upon a future getNextEntry().

If the snapshot file on disk is no good, snapshotReader will remain NULL.

Definition at line 2635 of file RaftConsensus.cc.

std::pair< RaftConsensus::ClientResult, uint64_t > LogCabin::Server::RaftConsensus::replicateEntry	(	Storage::Log::Entry &	entry,
		std::unique_lock< Mutex > &	lockGuard
	)		[private]

Append an entry to the log and wait for it to be committed.

Definition at line 2742 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::requestVote	(	std::unique_lock< Mutex > &	lockGuard,
		Peer &	peer
	)		[private]

Send a RequestVote RPC to the server.

This is used by candidates to request a server's vote and by new leaders to retrieve information about the server's log.

Parameters:

lockGuard	Used to temporarily release the lock while invoking the RPC, so as to allow for some concurrency.
peer	State used in communicating with the follower, building the RPC request, and processing its result.

Definition at line 2762 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::printElectionState ( ) const [private]

Dumps serverId, currentTerm, state, leaderId, and votedFor to the debug log.

This is intended to be easy to grep and parse.

Definition at line 2835 of file RaftConsensus.cc.

void LogCabin::Server::RaftConsensus::setElectionTimer ( ) [private]

Set the timer to start a new election and notify stateChanged.

The timer is set for ELECTION_TIMEOUT plus some random jitter from now.

Definition at line 2822 of file RaftConsensus.cc.
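A minimal sketch of a randomized election deadline. The jitter range used here, uniform in [0, timeout], is an assumption of this sketch; the text above only says "some random jitter". nextElectionTime() and the aliases are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

using Clock = std::chrono::steady_clock;
using TimePoint = std::chrono::time_point<Clock, std::chrono::nanoseconds>;

// Returns now + timeout + jitter, with jitter drawn uniformly from
// [0, timeout]. Randomizing the deadline makes it unlikely that several
// followers time out and start competing elections at the same moment.
TimePoint nextElectionTime(TimePoint now, std::chrono::nanoseconds timeout,
                           std::mt19937_64& rng) {
    assert(timeout.count() > 0);
    std::uniform_int_distribution<int64_t> jitter(0, timeout.count());
    return now + timeout + std::chrono::nanoseconds(jitter(rng));
}
```

A caller would store the result in something like startElectionAt and wake the timer thread so it can re-evaluate its deadline.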

void LogCabin::Server::RaftConsensus::startNewElection ( ) [private]

Transitions to being a candidate from being a follower or candidate.

This is called when a timeout elapses. If the configuration is blank, it does nothing. Moreover, if this server forms a quorum (it is the only server in the configuration), this will immediately transition to leader.

Definition at line 2858 of file RaftConsensus.cc.
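The three cases described for startNewElection() (blank configuration, single-server cluster, and the general case) can be sketched like this. The sketch assumes a simple configuration where a quorum is a strict majority, and it models the standard Raft steps of incrementing the term and voting for oneself. ElectionOutcome and the free functions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class State { FOLLOWER, CANDIDATE, LEADER };

// Hypothetical result type for this sketch.
struct ElectionOutcome {
    State state;
    uint64_t term;
};

// A quorum is a strict majority of the servers in the configuration.
bool isQuorum(std::size_t votes, std::size_t clusterSize) {
    return clusterSize > 0 && votes > clusterSize / 2;
}

ElectionOutcome startNewElection(State state, uint64_t currentTerm,
                                 std::size_t clusterSize) {
    assert(state != State::LEADER);  // only followers and candidates call this
    if (clusterSize == 0)
        return {state, currentTerm};  // blank configuration: do nothing
    uint64_t term = currentTerm + 1;  // a new election begins a new term
    std::size_t votes = 1;            // the server votes for itself
    if (isQuorum(votes, clusterSize))
        return {State::LEADER, term};  // single-server cluster
    return {State::CANDIDATE, term};
}
```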

void LogCabin::Server::RaftConsensus::stepDown ( uint64_t newTerm ) [private]

Transition to being a follower.

This is called when we receive an RPC request with a newer term, receive an RPC response indicating our term is stale, or discover a current leader while a candidate. In this last case, newTerm will be the same as currentTerm. This will call setElectionTimer for you if no election timer is currently set.

Definition at line 2907 of file RaftConsensus.cc.
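A sketch of the bookkeeping that stepping down involves, under stated assumptions: ElectionState is a hypothetical struct, 0 is used to mean "no leader known" and "no vote cast", and the timerSet flag stands in for the real election timer. Clearing the vote only when the term advances follows the Raft rule that a server casts at most one vote per term.

```cpp
#include <cassert>
#include <cstdint>

enum class State { FOLLOWER, CANDIDATE, LEADER };

// Hypothetical subset of the election-related fields.
struct ElectionState {
    uint64_t currentTerm;
    State state;
    uint64_t leaderId;  // 0: no leader known (assumption of this sketch)
    uint64_t votedFor;  // 0: no vote cast this term (assumption)
    bool timerSet;      // stands in for the election timer
};

void stepDown(ElectionState& s, uint64_t newTerm) {
    assert(newTerm >= s.currentTerm);  // terms never go backwards
    if (newTerm > s.currentTerm) {
        // A new term starts with no known leader and no vote cast.
        s.currentTerm = newTerm;
        s.leaderId = 0;
        s.votedFor = 0;
    }
    s.state = State::FOLLOWER;
    if (!s.timerSet)
        s.timerSet = true;  // the real code would call setElectionTimer()
}
```

When a candidate discovers a leader in its own term, newTerm equals currentTerm, so the vote it already cast is kept.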

void LogCabin::Server::RaftConsensus::updateLogMetadata ( ) [private]

Persist critical state, such as the term and the vote, to stable storage.

Definition at line 2955 of file RaftConsensus.cc.

bool LogCabin::Server::RaftConsensus::upToDateLeader ( std::unique_lock< Mutex > & lockGuard ) const [private]

Return true if every entry that might have already been marked committed on any leader is marked committed on this leader by the time this call returns.

This is used to provide non-stale read operations to clients. It gives up after ELECTION_TIMEOUT, since stepDownThread will return to the follower state after that time.

Definition at line 2965 of file RaftConsensus.cc.

Friends And Related Function Documentation

friend class RaftConsensusInternal::LocalServer [friend]

Definition at line 1717 of file RaftConsensus.h.

friend class RaftConsensusInternal::Peer [friend]

Definition at line 1718 of file RaftConsensus.h.

friend class RaftConsensusInternal::Invariants [friend]

Definition at line 1719 of file RaftConsensus.h.

std::ostream& operator<<	(	std::ostream &	os,
		const RaftConsensus &	raft
	)		[friend]

Print out the contents of this class for debugging purposes.

Definition at line 1905 of file RaftConsensus.cc.

std::ostream& operator<<	(	std::ostream &	os,
		RaftConsensus::ClientResult	clientResult
	)		[friend]

Print out a ClientResult for debugging purposes.

Definition at line 2998 of file RaftConsensus.cc.

std::ostream& operator<<	(	std::ostream &	os,
		RaftConsensus::State	state
	)		[friend]

Print out a State for debugging purposes.

Definition at line 3019 of file RaftConsensus.cc.

Member Data Documentation

const std::chrono::nanoseconds LogCabin::Server::RaftConsensus::ELECTION_TIMEOUT [private]

A follower waits for about this much inactivity before becoming a candidate and starting a new election.

Definition at line 1418 of file RaftConsensus.h.

const std::chrono::nanoseconds LogCabin::Server::RaftConsensus::HEARTBEAT_PERIOD [private]

A leader sends RPCs at least this often, even if there is no data to send.

Definition at line 1424 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::MAX_LOG_ENTRIES_PER_REQUEST [private]

A leader will pack at most this many entries into an AppendEntries request message.

This helps bound processing time when entries are very small in size. Const except for unit tests.

Definition at line 1432 of file RaftConsensus.h.

const std::chrono::nanoseconds LogCabin::Server::RaftConsensus::RPC_FAILURE_BACKOFF [private]

A candidate or leader waits this long after an RPC fails before sending another one, so as to not overwhelm the network with retries.

Definition at line 1438 of file RaftConsensus.h.

const std::chrono::nanoseconds LogCabin::Server::RaftConsensus::STATE_MACHINE_UPDATER_BACKOFF [private]

How long the state machine updater thread should sleep if:

The servers do not currently support a common version, or
This server has not yet received version information from all other servers, or
An advance state machine entry failed to commit (probably due to lost leadership).

Definition at line 1448 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::SOFT_RPC_SIZE_LIMIT [private]

Prefer to keep RPC requests under this size.

Const except for unit tests.
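
One way the two request limits (MAX_LOG_ENTRIES_PER_REQUEST and SOFT_RPC_SIZE_LIMIT) could interact is sketched below. The function `countEntriesForRequest` is a hypothetical helper, not LogCabin's code, and it assumes that "soft" means the first entry is always admitted even if it alone exceeds the size limit, so that an oversized entry can still be replicated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: given the serialized sizes (in bytes) of the entries
// waiting to be sent, return how many of them go into one request.
std::size_t
countEntriesForRequest(const std::vector<uint64_t>& entrySizes,
                       uint64_t maxEntries,
                       uint64_t softSizeLimit)
{
    assert(maxEntries > 0);
    std::size_t count = 0;
    uint64_t bytes = 0;
    for (uint64_t size : entrySizes) {
        // Hard cap on the number of entries bounds processing time even
        // when every entry is tiny.
        if (count >= maxEntries)
            break;
        // Assumption: the size limit is only a preference, so it never
        // blocks the first entry.
        if (count > 0 && bytes + size > softSizeLimit)
            break;
        bytes += size;
        ++count;
    }
    return count;
}
```

With small entries the count limit is the binding constraint; with large entries the size limit is.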

Definition at line 1454 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::serverId

This server's unique ID.

Not available until init() is called.

Definition at line 1460 of file RaftConsensus.h.

std::string LogCabin::Server::RaftConsensus::serverAddresses

The addresses that this server is listening on.

Not available until init() is called.

Definition at line 1466 of file RaftConsensus.h.

Globals& LogCabin::Server::RaftConsensus::globals [private]

The LogCabin daemon's top-level objects.

Definition at line 1473 of file RaftConsensus.h.

Storage::Layout LogCabin::Server::RaftConsensus::storageLayout [private]

Where the files for the log and snapshots are stored.

Definition at line 1478 of file RaftConsensus.h.

Client::SessionManager LogCabin::Server::RaftConsensus::sessionManager [private]

Used to create new sessions.

Definition at line 1483 of file RaftConsensus.h.

Mutex LogCabin::Server::RaftConsensus::mutex [mutable, private]

This class behaves mostly like a monitor.

This protects all the state in this class and almost all of the Peer class (with some documented exceptions).

Definition at line 1490 of file RaftConsensus.h.

Core::ConditionVariable LogCabin::Server::RaftConsensus::stateChanged [mutable, private]

Notified when basically anything changes.

Specifically, this is notified when any of the following events occur:

term changes.
state changes.
log changes.
commitIndex changes.
exiting is set.
numPeerThreads is decremented.
configuration changes.
startElectionAt changes (see note under startElectionAt).
an acknowledgement from a peer is received.
a server goes from not caught up to caught up.
a heartbeat is scheduled.

TODO(ongaro): Should there be multiple condition variables? This one is used by a lot of threads for a lot of different conditions.
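
The single-condition-variable monitor pattern described for stateChanged can be sketched as follows. This is a minimal illustration only: `MiniMonitor` and its methods are hypothetical, and `std::mutex` and `std::condition_variable` stand in for LogCabin's Mutex and Core::ConditionVariable. Because one condition variable is shared by many kinds of events, every waiter re-checks its own predicate after each wakeup.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical miniature monitor (not LogCabin's RaftConsensus).
class MiniMonitor {
  public:
    // State changes happen under the mutex and are followed by a broadcast.
    void advanceCommitIndex(uint64_t newIndex) {
        std::lock_guard<std::mutex> lock(mutex);
        if (newIndex > commitIndex) {
            commitIndex = newIndex;
            stateChanged.notify_all();
        }
    }

    void requestExit() {
        std::lock_guard<std::mutex> lock(mutex);
        exiting = true;
        stateChanged.notify_all();
    }

    // Blocks until the given index is committed or the monitor is exiting.
    // Returns true if the index is committed, false otherwise.
    bool waitForCommit(uint64_t index) {
        std::unique_lock<std::mutex> lock(mutex);
        // The predicate filters out wakeups caused by unrelated events.
        stateChanged.wait(lock, [this, index] {
            return exiting || commitIndex >= index;
        });
        return commitIndex >= index;
    }

  private:
    std::mutex mutex;
    std::condition_variable stateChanged;
    uint64_t commitIndex = 0;
    bool exiting = false;
};
```

Splitting this into several condition variables would reduce spurious wakeups at the cost of having to track which events each waiter depends on.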

Definition at line 1509 of file RaftConsensus.h.

bool LogCabin::Server::RaftConsensus::exiting [private]

Set to true when this class is about to be destroyed.

When this is true, threads must exit right away and no more RPCs should be sent or processed.

Definition at line 1516 of file RaftConsensus.h.

uint32_t LogCabin::Server::RaftConsensus::numPeerThreads [private]

The number of Peer::thread threads that are still using this RaftConsensus object.

When they exit, they decrement this and notify stateChanged.

Definition at line 1523 of file RaftConsensus.h.

std::unique_ptr<Storage::Log> LogCabin::Server::RaftConsensus::log [private]

Provides all storage for this server.

Keeps track of all log entries and some additional metadata.

If you modify this, be sure to keep configurationManager consistent.

Definition at line 1531 of file RaftConsensus.h.

bool LogCabin::Server::RaftConsensus::logSyncQueued [private]

Flag to indicate that leaderDiskThreadMain should flush recent log writes to stable storage.

This is always false for followers and candidates and is only used for leaders.

When a server steps down, it waits for all syncs to complete so that followers can assume all of their log entries are durable when replying to leaders.

Definition at line 1542 of file RaftConsensus.h.

std::atomic<bool> LogCabin::Server::RaftConsensus::leaderDiskThreadWorking [private]

Used for stepDown() to wait on leaderDiskThread without releasing mutex.

This is true while leaderDiskThread is writing to disk. It's set to true while holding mutex; set to false without mutex.

Definition at line 1549 of file RaftConsensus.h.

std::unique_ptr<Configuration> LogCabin::Server::RaftConsensus::configuration [private]

Defines the servers that are part of the cluster.

See Configuration.

Definition at line 1554 of file RaftConsensus.h.

std::unique_ptr<ConfigurationManager> LogCabin::Server::RaftConsensus::configurationManager [private]

Ensures that 'configuration' reflects the latest state of the log and snapshot.

Definition at line 1560 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::currentTerm [private]

The latest term this server has seen.

This value monotonically increases over time. It gets updated in stepDown(), startNewElection(), and when a candidate receives a vote response with a newer term.

Warning: After setting this value, you must call updateLogMetadata() to persist it.

Definition at line 1570 of file RaftConsensus.h.

State LogCabin::Server::RaftConsensus::state [private]

The server's current role in the cluster (follower, candidate, or leader).

See State.

Definition at line 1576 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::lastSnapshotIndex [private]

The latest good snapshot covers entries 1 through 'lastSnapshotIndex' (inclusive).

It is known that these are committed. They are safe to remove from the log, but it may be advantageous to keep them around for a little while (to avoid shipping snapshots to straggling followers). Thus, the log may or may not have some of the entries in this range.

Definition at line 1585 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::lastSnapshotTerm [private]

The term of the last entry covered by the latest good snapshot, or 0 if we have no snapshot.

Definition at line 1591 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::lastSnapshotClusterTime [private]

The cluster time of the last entry covered by the latest good snapshot, or 0 if we have no snapshot.

Definition at line 1597 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::lastSnapshotBytes [private]

The size of the latest good snapshot in bytes, or 0 if we have no snapshot.

Definition at line 1603 of file RaftConsensus.h.

std::unique_ptr<Storage::SnapshotFile::Reader> LogCabin::Server::RaftConsensus::snapshotReader [mutable, private]

If not NULL, this is a Storage::SnapshotFile::Reader that covers up through lastSnapshotIndex.

This is ready for the state machine to process and is returned to the state machine in getNextEntry(). It's just a cache which can be repopulated with readSnapshot().

Definition at line 1611 of file RaftConsensus.h.

std::unique_ptr<Storage::SnapshotFile::Writer> LogCabin::Server::RaftConsensus::snapshotWriter [private]

This is used in handleInstallSnapshot when receiving a snapshot from the current leader.

The leader is assumed to send at most one snapshot at a time, and any partial snapshots here are discarded when the term changes.

Definition at line 1619 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::commitIndex [private]

The largest entry ID for which a quorum is known to have stored the same entry as this server has.

Entries 1 through commitIndex as stored in this server's log are guaranteed to never change. This value will monotonically increase over time.
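
To make the quorum condition concrete, the sketch below computes the largest index that a majority of servers are known to have stored, given one known index per server (including this one). The function name `largestIndexOnQuorum` is hypothetical, and the sketch assumes a simple (non-joint) configuration. In Raft, a leader also checks that the entry at that index is from its current term before advancing commitIndex; that check is omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: matchIndexes holds, for every server in the cluster,
// the highest log index known to be stored on that server.
uint64_t
largestIndexOnQuorum(std::vector<uint64_t> matchIndexes)
{
    assert(!matchIndexes.empty());
    // In descending order, position n/2 is the (n/2 + 1)-th largest value,
    // so a strict majority of servers store at least that index.
    std::size_t k = matchIndexes.size() / 2;
    std::nth_element(matchIndexes.begin(),
                     matchIndexes.begin() + k,
                     matchIndexes.end(),
                     std::greater<uint64_t>());
    return matchIndexes[k];
}
```

For five servers with indexes 5, 3, 4, 1, and 2, three servers store index 3 or higher, so the result is 3.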

Definition at line 1627 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::leaderId [private]

The server ID of the leader for this term.

This is used to help point clients to the right server. The special value 0 means either there is no leader for this term yet or this server does not know who it is yet.

Definition at line 1634 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::votedFor [private]

The server ID that this server voted for during this term's election, if any.

The special value 0 means no vote has been given out during this term.

Warning: After setting this value, you must call updateLogMetadata() to persist it.
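
The set-then-persist discipline that applies to both currentTerm and votedFor can be sketched as below. `TermState`, `PersistedMetadata`, `observeTerm`, and `grantVote` are hypothetical names, and the `disk` member merely stands in for stable storage; only the name updateLogMetadata() comes from this documentation. The log up-to-date check that Raft performs before granting a vote is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for whatever is written to stable storage.
struct PersistedMetadata {
    uint64_t currentTerm;
    uint64_t votedFor;
};

// Hypothetical miniature of the term and vote bookkeeping.
struct TermState {
    uint64_t currentTerm = 0;
    uint64_t votedFor = 0;  // 0 means no vote given in this term
    PersistedMetadata disk = PersistedMetadata{0, 0};

    // Copies the volatile values to the stand-in for stable storage.
    void updateLogMetadata() {
        disk.currentTerm = currentTerm;
        disk.votedFor = votedFor;
    }

    // Terms only move forward; a new term clears the vote.
    void observeTerm(uint64_t newTerm) {
        if (newTerm > currentTerm) {
            currentTerm = newTerm;
            votedFor = 0;
            updateLogMetadata();  // persist before acting on the new term
        }
    }

    // At most one candidate receives this server's vote per term.
    bool grantVote(uint64_t candidateId) {
        assert(candidateId != 0);
        if (votedFor != 0 && votedFor != candidateId)
            return false;
        votedFor = candidateId;
        updateLogMetadata();  // persist before replying to the candidate
        return true;
    }
};
```

Persisting before replying matters because a server that forgets its vote after a crash could vote twice in the same term.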

Definition at line 1644 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::currentEpoch [mutable, private]

A logical clock used to confirm leadership and connectivity.

Definition at line 1650 of file RaftConsensus.h.

ClusterClock LogCabin::Server::RaftConsensus::clusterClock [private]

Tracks the passage of "cluster time".

See ClusterClock.

Definition at line 1655 of file RaftConsensus.h.

TimePoint LogCabin::Server::RaftConsensus::startElectionAt [private]

The earliest time at which timerThread should begin a new election with startNewElection().

It is safe for increases to startElectionAt to not notify the condition variable. Decreases to this value, however, must notify the condition variable to make sure the timerThread gets woken in a timely manner. Unfortunately, startElectionAt does not monotonically increase because of the random jitter that is applied to the follower timeout, and it would reduce the jitter's effectiveness for the thread to wait as long as the largest startElectionAt value.
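
The notify-on-decrease rule can be illustrated with the sketch below. `ElectionTimer` is a hypothetical type, the `notifications` counter stands in for notifying the condition variable, and drawing the jitter uniformly from one to two election timeouts is an assumption of this example rather than something stated above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

using Nanos = std::chrono::nanoseconds;
using TimePoint = std::chrono::time_point<std::chrono::steady_clock, Nanos>;

// Hypothetical election timer (not LogCabin's implementation).
struct ElectionTimer {
    TimePoint startElectionAt = TimePoint::max();
    uint64_t notifications = 0;  // stands in for notifying stateChanged

    // Picks a new jittered deadline relative to 'now'.
    void set(TimePoint now, Nanos electionTimeout, std::mt19937_64& rng) {
        assert(electionTimeout > Nanos::zero());
        // Assumed jitter range: [electionTimeout, 2 * electionTimeout).
        std::uniform_int_distribution<int64_t> jitter(
            electionTimeout.count(), 2 * electionTimeout.count() - 1);
        TimePoint next = now + Nanos(jitter(rng));
        bool decreased = next < startElectionAt;
        startElectionAt = next;
        // Only a decrease needs to wake the timer thread: after an
        // increase, the thread wakes at the old deadline, sees that it is
        // too early, and goes back to sleep.
        if (decreased)
            ++notifications;
    }
};
```

Skipping the notification on increases avoids waking the timer thread on every heartbeat, which is the common case.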

Definition at line 1669 of file RaftConsensus.h.

TimePoint LogCabin::Server::RaftConsensus::withholdVotesUntil [private]

The earliest time at which RequestVote messages should be processed.

Until this time, they are rejected, as processing them risks causing the cluster leader to needlessly step down. For more motivation, see the "disruptive servers" issue in membership changes described in the Raft paper/thesis.

This is set to the current time + an election timeout when a heartbeat is received, and it's set to infinity for leaders (who begin processing RequestVote messages again immediately when they step down).
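
The lifecycle of withholdVotesUntil described above can be sketched as a small state holder. `VoteGate` and its method names are hypothetical, and `TimePoint::max()` is used here to represent "infinity"; resetting to `TimePoint::min()` on step-down is one assumed way to let a former leader process RequestVote messages again immediately.

```cpp
#include <cassert>
#include <chrono>

using Nanos = std::chrono::nanoseconds;
using TimePoint = std::chrono::time_point<std::chrono::steady_clock, Nanos>;

// Hypothetical holder for the vote-withholding deadline.
struct VoteGate {
    TimePoint withholdVotesUntil = TimePoint::min();

    // A heartbeat means a leader is alive, so votes are withheld for one
    // election timeout.
    void onHeartbeat(TimePoint now, Nanos electionTimeout) {
        assert(electionTimeout >= Nanos::zero());
        withholdVotesUntil = now + electionTimeout;
    }

    // Leaders never process RequestVote messages while they lead.
    void onBecomeLeader() {
        withholdVotesUntil = TimePoint::max();
    }

    // Assumption: a leader that steps down clears the deadline entirely.
    void onStepDown() {
        if (withholdVotesUntil == TimePoint::max())
            withholdVotesUntil = TimePoint::min();
    }

    // RequestVote messages are rejected until the deadline has passed.
    bool shouldProcessRequestVote(TimePoint now) const {
        return now >= withholdVotesUntil;
    }
};
```

A server removed from the cluster stops receiving heartbeats and will time out and request votes; the gate keeps servers that can still hear the leader from being disrupted by those requests.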

Definition at line 1682 of file RaftConsensus.h.

uint64_t LogCabin::Server::RaftConsensus::numEntriesTruncated [private]

The total number of entries ever truncated from the end of the log.

This happens only when a new leader tells this server to remove extraneous uncommitted entries from its log.

Definition at line 1689 of file RaftConsensus.h.

std::thread LogCabin::Server::RaftConsensus::leaderDiskThread [private]

The thread that executes leaderDiskThreadMain() to flush log entries to stable storage in the background on leaders.

Definition at line 1695 of file RaftConsensus.h.

std::thread LogCabin::Server::RaftConsensus::timerThread [private]

The thread that executes timerThreadMain() to begin new elections after periods of inactivity.

Definition at line 1701 of file RaftConsensus.h.

std::thread LogCabin::Server::RaftConsensus::stateMachineUpdaterThread [private]