Preface:
Diego Ongaro hastily prepared and presented this talk at Scale
Computing's Engineering Week in July 2015. Though its organization is
poor, it contains some good content, some of which cannot be found
elsewhere.
There's sort of two axes to this talk: (1) usage, operations,
internals, and (2) aspect of LogCabin such as writes, reads, membership
changes, compaction.
Right now it's all mushed together. On the next revision,
it'd be better to do a conceptual overview, then organize by (2),
interspersing the (1)s.
Copyright 2015 Diego Ongaro.
Source code available at https://github.com/logcabin/logcabin-talk.
This work is licensed under the Creative Commons Attribution 4.0 International License.
$ logcabin -h Run various operations on a LogCabin replicated state machine. Usage: logcabin [options] <command> [<args>] Commands: mkdir <path> If no directory exists at <path>, create it. list <path> List keys within directory at <path>. dump [<path>] Recursively print keys and values within directory at <path>. Defaults to printing all keys and values from root of tree. rmdir <path> Recursively remove directory at <path>, if any. write <path> Set/create value of file at <path> to stdin. read <path> Print value of file at <path>. remove <path> Remove file at <path>, if any. Options: -c <addresses>, --cluster=<addresses> Network addresses of the LogCabin servers, comma-separated [default: logcabin:5254] -d <path>, --dir=<path> Set working directory [default: /] -p <pred>, --condition=<pred> Set predicate on the operation of the form <path>:<value>, indicating that the key at <path> must have the given value. -t <time>, --timeout=<time> Set timeout for the operation (0 means wait forever) [default: 0s]
An operation is linearizable if it appears to occur instantaneously and exactly once at some point in time between its invocation and its response.
An operation is linearizable if it appears to occur instantaneously and exactly once at some point in time between its invocation and its response.
For more detail, see Linearizability: A correctness condition for concurrent objects, Maurice P. Herlihy and Jeannette M. Wing, 1990; and Linearizability versus Serializability, a blog post by Peter Bailis, 2014.
LogCabin is one of these.
Very first server needs to be bootstrapped
Initializes the very first server's log with a configuration entry made up of just itself.
Server1$ cat logcabin.conf serverId = 1 listenAddresses = 192.168.5.12 storagePath = storage
Server1$ logcabind --config=logcabin.conf --bootstrap ... Server/RaftConsensus.cc:572 in setConfiguration(): Activating configuration 1: prev_configuration { servers { server_id: 1 addresses: "192.168.5.12" } } Server/Main.cc:346 in main(): Done bootstrapping configuration. Exiting. ...
The directory structure shows a single log segment with a single entry, and no snapshot.
Server1$ tree -sh --du -F storage storage └── [ 16K] server1/ ├── [ 0] lock ├── [8.1K] log/ │ └── [4.1K] Segmented-Binary/ │ ├── [ 51] 00000000000000000001-00000000000000000001 │ ├── [ 35] metadata1 │ └── [ 35] metadata2 └── [4.0K] snapshot/
logcabin-storage
dumps out the log.
Server1$ logcabin-storage --config=logcabin.conf ... Storage/Tool.cc:255 in main(): Log contents start Log: metadata start: current_term: 1 voted_for: 0 end of metadata startIndex: 1 Entry 1 start: term: 1 type: CONFIGURATION configuration { prev_configuration { servers { server_id: 1 addresses: "192.168.5.12" } } } index: 1 cluster_time: 0 end of entry 1 Storage/Tool.cc:257 in main(): Log contents end ... Storage/Tool.cc:260 in main(): Reading snapshot at storage/server1 Storage/Tool.cc:153 in readSnapshot(): Snapshot file not found in storage/server1/snapshot
First, let's start up the servers.
Server2$ cat logcabin.conf serverId = 2 listenAddresses = 192.168.5.13 storagePath = storage
Server3$ cat logcabin.conf serverId = 3 listenAddresses = 192.168.5.14 storagePath = storage
Server1$ logcabind --config=logcabin.conf --daemon --log=server1.log Server2$ logcabind --config=logcabin.conf --daemon --log=server2.log Server3$ logcabind --config=logcabin.conf --daemon --log=server3.log
The first server (bootstrapped) gets to become leader of itself.
$ logcabinctl --server=$SERVER1IP stats get ... server_id: 1 addresses: "192.168.5.12" ... raft { current_term: 2 state: LEADER commit_index: 6 last_log_index: 6 leader_id: 1 voted_for: 1 ... last_snapshot_index: 0 last_snapshot_bytes: 0 log_start_index: 1 log_bytes: 679 ... peer { server_id: 1 addresses: "192.168.5.12" old_member: true new_member: false staging_member: false last_synced_index: 6 } } ...
Other servers just sit idle, since they don't have a configuration.
$ logcabinctl --server=$SERVER2IP stats get ... server_id: 2 addresses: "192.168.5.13" ... raft { current_term: 0 state: FOLLOWER commit_index: 0 last_log_index: 0 leader_id: 0 voted_for: 0 ... last_snapshot_index: 0 last_snapshot_bytes: 0 log_start_index: 1 log_bytes: 1 ... peer { server_id: 2 addresses: "" old_member: false new_member: false staging_member: false } } ...
We can ask the leader to add the other two.
$ export CLUSTER=$SERVER1IP,$SERVER2IP,$SERVER3IP $ logcabin-reconfigure --cluster=$CLUSTER set $SERVER1IP $SERVER2IP $SERVER3IP Current configuration: Configuration 1: - 1: 192.168.5.12 Attempting to change cluster membership to the following: 1: 192.168.5.12 (given as 192.168.5.12) 2: 192.168.5.13 (given as 192.168.5.13) 3: 192.168.5.14 (given as 192.168.5.14) Membership change result: OK Current configuration: Configuration 11: - 1: 192.168.5.12 - 2: 192.168.5.13 - 3: 192.168.5.14
Now servers 2 and 3 are part of the cluster, have the latest configuration, and are proper followers.
$ logcabinctl --server=$SERVER2IP stats get ... raft { current_term: 17 state: FOLLOWER commit_index: 1045 last_log_index: 1045 leader_id: 1 voted_for: 0 ... peer { server_id: 1 addresses: "192.168.5.12" old_member: true new_member: false staging_member: false } peer { server_id: 2 addresses: "192.168.5.13" old_member: true new_member: false staging_member: false } peer { server_id: 3 addresses: "192.168.5.14" old_member: true new_member: false staging_member: false } }
LogCabin uses the joint consensus approach to Raft membership changes.
More details in Membership Changes chapter of Raft dissertation.
More details in Membership Changes chapter of Raft dissertation.
Log from adding servers after bootstrapping
Server1$ cat server1.log ... Server/RaftConsensus.cc:1595 in setConfiguration(): Attempting to change the configuration from 1 Server/RaftConsensus.cc:1603 in setConfiguration(): Adding server 1 at 192.168.5.12 to staging servers Server/RaftConsensus.cc:1603 in setConfiguration(): Adding server 2 at 192.168.5.13 to staging servers Server/RaftConsensus.cc:1603 in setConfiguration(): Adding server 3 at 192.168.5.14 to staging servers ... Server/RaftConsensus.cc:1625 in setConfiguration(): Done catching up servers Server/RaftConsensus.cc:1650 in setConfiguration(): Writing transitional configuration entry ... Server/RaftConsensus.cc:572 in setConfiguration(): Activating configuration 10: prev_configuration { servers { server_id: 1, addresses: "192.168.5.12" } } next_configuration { servers { server_id: 1, addresses: "192.168.5.12" }, servers { server_id: 2, addresses: "192.168.5.13" }, servers { server_id: 3, addresses: "192.168.5.14" } } Server/RaftConsensus.cc:572 in setConfiguration(): Activating configuration 11: prev_configuration { servers { server_id: 1, addresses: "192.168.5.12" }, servers { server_id: 2, addresses: "192.168.5.13" }, servers { server_id: 3, addresses: "192.168.5.14" } } ...
$ logcabin-reconfigure --cluster=$CLUSTER set $SERVER1IP Current configuration: Configuration 11: - 1: 192.168.5.12 - 2: 192.168.5.13 - 3: 192.168.5.14 Attempting to change cluster membership to the following: 1: 192.168.5.12 (given as 192.168.5.12) Membership change result: OK Current configuration: Configuration 2357: - 1: 192.168.5.12 $ logcabin-reconfigure --cluster=$CLUSTER set $SERVER2IP $SERVER3IP Current configuration: Configuration 2357: - 1: 192.168.5.12 Attempting to change cluster membership to the following: 2: 192.168.5.13 (given as 192.168.5.13) 3: 192.168.5.14 (given as 192.168.5.14) [...wait for leader election...] Membership change result: OK Current configuration: Configuration 2378: - 2: 192.168.5.13 - 3: 192.168.5.14
Snapshotting is discussed later, but you'll probably need some concept of it to understand half of logcabinctl.
Goal: reclaim log space
$ logcabinctl --help Inspect or modify the state of a single LogCabin server. ... Commands: info get Print server ID and addresses. debug filename get Print the server's debug log filename. debug filename set <path> Change the server's debug log filename. debug policy get Print the server's debug log policy. debug policy set <value> Change the server's debug log policy. debug rotate Rotate the server's debug log file. snapshot inhibit get Print the remaining time for which the server was prevented from taking snapshots. snapshot inhibit set [<time>] Abort the server's current snapshot if one is in progress, and disallow the server from starting automated snapshots for the given duration [default: 1week]. snapshot inhibit clear Allow the server to take snapshots normally. snapshot start Begin taking a snapshot if none is in progress. snapshot stop Abort the current snapshot if one is in progress. snapshot restart Abort the current snapshot if one is in progress, then begin taking a new snapshot. stats get Print detailed server metrics. stats dump Write detailed server metrics to server's debug log.
$ logcabinctl --server=$SERVER1IP stats get | grep bytes last_snapshot_bytes: 0 log_bytes: 744921 open_segment_bytes: 744867
$ logcabinctl --server=$SERVER1IP snapshot start...wait a second...
$ logcabinctl --server=$SERVER1IP stats get | grep bytes last_snapshot_bytes: 2785 log_bytes: 1863 open_segment_bytes: 1863
scqad
(runs Scale's distributed tests) prevents automatic snapshots for one week after a test failure.
$ logcabinctl --server=$SERVER1IP snapshot inhibit set 1week $ logcabinctl --server=$SERVER1IP snapshot inhibit get 604795.121117807 s
Allows you to see more Raft log history with logcabin-storage
.
Undo:
$ logcabinctl --server=$SERVER1IP snapshot inhibit clear $ logcabinctl --server=$SERVER1IP snapshot inhibit get 0 ns
Change the server's verbosity at runtime
$ logcabinctl --server=$SERVER1IP debug policy get NOTICE $ logcabinctl --server=$SERVER1IP debug policy set \ Server/RaftConsensus.cc@VERBOSE,Storage@WARNING,NOTICE
Server1$ tail server1.log Server/RaftConsensus.cc:1393 in handleAppendEntries() VERBOSE: New commitIndex: 4997 Server/RaftConsensus.cc:2804 in setElectionTimer() VERBOSE: Will become candidate in 581 ms ...
Clients have similar control with:
LogCabin::Client::Debug::setLogPolicy(
LogCabin::Client::Debug::logPolicyFromString(
"Client@VERBOSE,NOTICE"));
Set server's default verbosity in config file
Event::Loop::Lock
:
Address
: DNS resolutionGlobals
)LeaderRPC
: connect to leaderMockClientImpl
: in-memory Tree used for testing applicationslogcabin-reconfigure
logcabin
CLI tool
/// Return once the state machine has
/// applied at least the given entry.
void
StateMachine::wait(uint64_t index) const
{
std::unique_lock<Mutex> lockGuard(mutex);
while (lastApplied < index)
entriesApplied.wait(lockGuard);
}
struct WaitHelper {
explicit WaitHelper(StateMachine& stateMachine)
: stateMachine(stateMachine)
, iter(0) {
}
void operator()() {
++iter;
if (iter == 1) {
EXPECT_EQ(0U, stateMachine.lastApplied);
stateMachine.lastApplied = 2;
} else if (iter == 2) {
EXPECT_EQ(2U, stateMachine.lastApplied);
stateMachine.lastApplied = 3;
}
}
StateMachine& stateMachine;
uint64_t iter;
};
TEST_F(ServerStateMachineTest, wait)
{
WaitHelper helper(*stateMachine);
stateMachine->entriesApplied.callback =
std::ref(helper);
stateMachine->wait(3);
EXPECT_EQ(2U, helper.iter);
}
Operations:
Challenge: writing consistent snapshot while taking requests
More details in Log Compaction chapter of Raft dissertation.