LogCabin
This class persists a log on the filesystem efficiently.
#include <SegmentedLog.h>
Classes | |
class | PreparedSegments |
A producer/consumer monitor for a queue of files to use for open segments. | |
struct | Segment |
An open or closed segment. | |
struct | SegmentHeader |
This goes at the start of every segment. | |
class | Sync |
Queues various operations on files, such as writes and fsyncs, to be executed later. | |
Public Types | |
enum | Encoding { TEXT, BINARY } |
Specifies how individual records are serialized. | |
Public Member Functions | |
SegmentedLog (const FilesystemUtil::File &parentDir, Encoding encoding, const Core::Config &config) | |
Constructor. | |
~SegmentedLog () | |
std::pair< uint64_t, uint64_t > | append (const std::vector< const Entry * > &entries) |
Start to append new entries to the log. | |
const Entry & | getEntry (uint64_t) const |
Look up an entry by its log index. | |
uint64_t | getLogStartIndex () const |
Get the index of the first entry in the log (whether or not this entry exists). | |
uint64_t | getLastLogIndex () const |
Get the index of the most recent entry in the log. | |
std::string | getName () const |
Return the name of the log implementation as it would be specified in the config file. | |
uint64_t | getSizeBytes () const |
Get the size of the entire log in bytes. | |
std::unique_ptr< Log::Sync > | takeSync () |
Get and remove the Log's Sync object in order to wait on it. | |
void | syncCompleteVirtual (std::unique_ptr< Log::Sync > sync) |
void | truncatePrefix (uint64_t firstIndex) |
Delete the log entries before the given index. | |
void | truncateSuffix (uint64_t lastIndex) |
Delete the log entries past the given index. | |
void | updateMetadata () |
Call this after changing metadata. | |
void | updateServerStats (Protocol::ServerStats &serverStats) const |
Add information about the log's state to the given structure. | |
Private Types | |
typedef Core::Time::SteadyClock | Clock |
Clock used for measuring disk performance. | |
typedef Clock::time_point | TimePoint |
Time point for measuring disk performance. | |
Private Member Functions | |
std::vector< Segment > | readSegmentFilenames () |
List the files in dir and create Segment objects for any of them that look like segments. | |
bool | readMetadata (const std::string &filename, SegmentedLogMetadata::Metadata &metadata, bool quiet) const |
Read a metadata file from disk. | |
bool | loadClosedSegment (Segment &segment, uint64_t logStartIndex) |
Read the given closed segment from disk, issuing PANICs and WARNINGs appropriately. | |
bool | loadOpenSegment (Segment &segment, uint64_t logStartIndex) |
Read the given open segment from disk, issuing PANICs and WARNINGs appropriately, and closing the segment. | |
void | checkInvariants () |
Run through a bunch of assertions of class invariants (for debugging). | |
void | closeSegment () |
Close the open segment if one is open. | |
Segment & | getOpenSegment () |
Return a reference to the current open segment (the one that new writes should go into). | |
const Segment & | getOpenSegment () const |
void | openNewSegment () |
Set up a new open segment for the log head. | |
std::string | readProtoFromFile (const FilesystemUtil::File &file, FilesystemUtil::FileContents &reader, uint64_t *offset, google::protobuf::Message *out) const |
Read the next ProtoBuf record out of 'file'. | |
Core::Buffer | serializeProto (const google::protobuf::Message &in) const |
Prepare a ProtoBuf record to be written to disk. | |
std::pair< std::string, FilesystemUtil::File > | prepareNewSegment (uint64_t fileId) |
Opens a file for a new segment and allocates its space on disk. | |
void | segmentPreparerMain () |
The main function for the segmentPreparer thread. | |
Private Attributes | |
const Encoding | encoding |
Specifies how individual records are stored. | |
const std::string | checksumAlgorithm |
The algorithm to use when writing new records. | |
const uint64_t | MAX_SEGMENT_SIZE |
The maximum size in bytes for newly written segments. | |
const bool | shouldCheckInvariants |
Set to true if checkInvariants() should do its job, or set to false for performance. | |
const std::chrono::milliseconds | diskWriteDurationThreshold |
If a disk operation exceeds this much time, log a warning. | |
SegmentedLogMetadata::Metadata | metadata |
The metadata this class maintains. | |
FilesystemUtil::File | dir |
The directory containing every file this log creates. | |
FilesystemUtil::File | openSegmentFile |
A writable OS-level file that contains the entries for the current open segment. | |
uint64_t | logStartIndex |
The index of the first entry in the log, see getLogStartIndex(). | |
std::map< uint64_t, Segment > | segmentsByStartIndex |
Ordered map of all closed segments and the open segment, indexed by the startIndex of each segment. | |
uint64_t | totalClosedSegmentBytes |
The total number of bytes occupied by the closed segments on disk. | |
PreparedSegments | preparedSegments |
See PreparedSegments. | |
std::unique_ptr < SegmentedLog::Sync > | currentSync |
Accumulates deferred filesystem operations for append() and truncatePrefix(). | |
Core::RollingStat | metadataWriteNanos |
Tracks the time it takes to write a metadata file. | |
Core::RollingStat | filesystemOpsNanos |
Tracks the time it takes to execute wait() on a Sync object. | |
std::thread | segmentPreparer |
Opens files, allocates them to full size, and places them on preparedSegments for the log to use. |
This class persists a log on the filesystem efficiently.
The log entries on disk are stored in a series of files called segments, and each segment is about 8MB in size. Thus, most small appends do not need to update filesystem metadata and can proceed with a single consecutive disk write.
The disk files consist of metadata files, closed segments, and open segments. Metadata files are used to track Raft metadata, such as the server's current term, and also the log's start index. Segments contain contiguous entries that are part of the log. Closed segments are never written to again (but may be renamed and truncated if a suffix of the log is truncated). Open segments are where newly appended entries go. Once an open segment reaches MAX_SEGMENT_SIZE, it is closed and a new one is used.
Metadata files are named "metadata1" and "metadata2". The code alternates between these so that there is always at least one readable metadata file. On boot, the readable metadata file with the higher version number is used.
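The boot-time choice between the two metadata files can be illustrated with a minimal sketch. The `MetadataCandidate` struct and `chooseMetadata()` function are hypothetical names introduced for this example only; they are not part of SegmentedLog, and the real code reads and validates the files itself.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the outcome of reading one metadata file.
struct MetadataCandidate {
    std::string filename;  // "metadata1" or "metadata2"
    bool readable;         // false if the file was missing or unreadable
    uint64_t version;      // only meaningful when readable is true
};

// Returns a pointer to the readable candidate with the higher version
// number, or nullptr if neither file is readable. The caller must keep
// both candidates alive while using the returned pointer.
const MetadataCandidate*
chooseMetadata(const MetadataCandidate& a, const MetadataCandidate& b)
{
    if (a.readable && b.readable)
        return (a.version >= b.version) ? &a : &b;
    if (a.readable)
        return &a;
    if (b.readable)
        return &b;
    return nullptr;
}
```

Because writes alternate between the two files, a crash during a write can damage at most the file being written; the other file remains readable and is chosen here.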
Closed segments are named by the format string "%020lu-%020lu" with their start and end indexes, both inclusive. Closed segments always contain at least one entry; the end index is always at least as large as the start index. Closed segment files may occasionally include data past their filename's end index (these are ignored but a WARNING is issued). This can happen if the suffix of the segment is truncated and a crash occurs at an inopportune time (the segment file is first renamed, then truncated, and a crash occurs in between).
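As an illustration of the closed-segment naming scheme, the sketch below formats and parses such names. The helper names are hypothetical and not taken from SegmentedLog.cc. The sketch uses `PRIu64` rather than `%lu` so that it does not assume a 64-bit `unsigned long`, and the parser is deliberately strict (exactly 20 digits on each side), which is an assumption of this example rather than documented behavior.

```cpp
#include <cctype>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Format a closed segment filename from its inclusive start and end indexes.
std::string closedSegmentFilename(uint64_t startIndex, uint64_t endIndex)
{
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%020" PRIu64 "-%020" PRIu64,
                  startIndex, endIndex);
    return buf;
}

// Parse a name of the form <20 digits>-<20 digits>. Returns false if the
// name does not match, if a number overflows 64 bits, or if the segment
// would be empty (end index below start index).
bool parseClosedSegmentFilename(const std::string& name,
                                uint64_t* startIndex, uint64_t* endIndex)
{
    if (name.size() != 41 || name[20] != '-')
        return false;
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (i != 20 && !std::isdigit(static_cast<unsigned char>(name[i])))
            return false;
    }
    try {
        uint64_t start = std::stoull(name.substr(0, 20));
        uint64_t end = std::stoull(name.substr(21));
        if (start > end)
            return false;
        *startIndex = start;
        *endIndex = end;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

The zero padding makes lexicographic filename order agree with numeric index order, so a plain directory listing sorts segments by start index.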
Open segments are named by the format string "open-%lu" with a unique number. These should not exist when the server shuts down cleanly, but they exist while the server is running and may be left around during a crash. Open segments either contain entries which come after the last closed segment or are full of zeros. When the server crashes while appending to an open segment, the end of that file may be corrupt. We can't distinguish between a corrupt file and a partially written entry. The code assumes it's a partially written entry, issues a WARNING, and ignores it.
Truncating a suffix of the log will remove all entries that are no longer part of the log. Truncating a prefix of the log will only remove complete segments that are before the new log start index. For example, if a segment has entries 10 through 20 and the prefix of the log is truncated to start at entry 15, that entire segment will be retained.
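The prefix-truncation rule can be sketched as a filter over a map shaped like segmentsByStartIndex. `SegmentRange` and `segmentsToRemove()` are hypothetical names for this example, and the sketch ignores the open segment and all file operations; it only shows which segments lie wholly before the new start index.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified view of a segment: its inclusive index range.
struct SegmentRange {
    uint64_t startIndex;
    uint64_t endIndex;
};

// Returns the start indexes of all segments whose entries all come
// before newStartIndex. Segments that straddle newStartIndex are kept.
std::vector<uint64_t>
segmentsToRemove(const std::map<uint64_t, SegmentRange>& segments,
                 uint64_t newStartIndex)
{
    std::vector<uint64_t> removable;
    for (const auto& kv : segments) {
        if (kv.second.endIndex < newStartIndex)
            removable.push_back(kv.first);
    }
    return removable;
}
```

With segments covering entries 1 to 9, 10 to 20, and 21 to 30, truncating the prefix to start at entry 15 selects only the first segment; the segment holding entries 10 through 20 is retained in full, matching the example above.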
Each segment file starts with a segment header, which currently contains just a one-byte version number for the format of that segment. The current format (version 1) is just a concatenation of serialized entry records.
Definition at line 89 of file SegmentedLog.h.
typedef Core::Time::SteadyClock LogCabin::Storage::SegmentedLog::Clock [private] |
Clock used for measuring disk performance.
Definition at line 93 of file SegmentedLog.h.
typedef Clock::time_point LogCabin::Storage::SegmentedLog::TimePoint [private] |
Time point for measuring disk performance.
Definition at line 97 of file SegmentedLog.h.
enum LogCabin::Storage::SegmentedLog::Encoding
Specifies how individual records are serialized.
Definition at line 104 of file SegmentedLog.h.
LogCabin::Storage::SegmentedLog::SegmentedLog | ( | const FilesystemUtil::File & | parentDir, |
Encoding | encoding, | ||
const Core::Config & | config | ||
) |
Constructor.
parentDir | A filesystem directory in which all the files for this storage module are kept. |
encoding | Specifies how individual records are stored. |
config | Settings. |
Definition at line 346 of file SegmentedLog.cc.
LogCabin::Storage::SegmentedLog::~SegmentedLog | ( | ) |
Definition at line 465 of file SegmentedLog.cc.
std::pair< uint64_t, uint64_t > LogCabin::Storage::SegmentedLog::append | ( | const std::vector< const Entry * > & | entries | ) | [virtual] |
Start to append new entries to the log.
The entries may not be on disk yet when this returns; see Sync.
entries | Entries to place at the end of the log. |
Implements LogCabin::Storage::Log.
Definition at line 491 of file SegmentedLog.cc.
const SegmentedLog::Entry & LogCabin::Storage::SegmentedLog::getEntry | ( | uint64_t | index | ) | const [virtual] |
Look up an entry by its log index.
index | Must be in the range [getLogStartIndex(), getLastLogIndex()]. Otherwise, this will crash the server. |
Implements LogCabin::Storage::Log.
Definition at line 575 of file SegmentedLog.cc.
uint64_t LogCabin::Storage::SegmentedLog::getLogStartIndex | ( | ) | const [virtual] |
Get the index of the first entry in the log (whether or not this entry exists).
Implements LogCabin::Storage::Log.
Definition at line 592 of file SegmentedLog.cc.
uint64_t LogCabin::Storage::SegmentedLog::getLastLogIndex | ( | ) | const [virtual] |
Get the index of the most recent entry in the log.
Implements LogCabin::Storage::Log.
Definition at line 598 of file SegmentedLog.cc.
std::string LogCabin::Storage::SegmentedLog::getName | ( | ) | const [virtual] |
Return the name of the log implementation as it would be specified in the config file.
Implements LogCabin::Storage::Log.
Definition at line 610 of file SegmentedLog.cc.
uint64_t LogCabin::Storage::SegmentedLog::getSizeBytes | ( | ) | const [virtual] |
Get the size of the entire log in bytes.
Implements LogCabin::Storage::Log.
Definition at line 619 of file SegmentedLog.cc.
std::unique_ptr< Log::Sync > LogCabin::Storage::SegmentedLog::takeSync | ( | ) | [virtual] |
Get and remove the Log's Sync object in order to wait on it.
This Sync object must later be returned to the Log with syncComplete().
While takeSync() and syncComplete() may not be done concurrently with other Log operations, Sync::wait() may be done concurrently with all operations except truncateSuffix().
Implements LogCabin::Storage::Log.
Definition at line 625 of file SegmentedLog.cc.
void LogCabin::Storage::SegmentedLog::syncCompleteVirtual | ( | std::unique_ptr< Log::Sync > | sync | ) |
Definition at line 635 of file SegmentedLog.cc.
void LogCabin::Storage::SegmentedLog::truncatePrefix | ( | uint64_t | firstIndex | ) | [virtual] |
Delete the log entries before the given index.
Once you truncate a prefix from the log, there's no way to undo this. The entries may still be on disk when this returns and file descriptors and other resources may remain open; see Sync.
firstIndex | After this call, the log will contain no entries indexed less than firstIndex. This can be any log index, including 0 and those past the end of the log. |
Implements LogCabin::Storage::Log.
Definition at line 642 of file SegmentedLog.cc.
void LogCabin::Storage::SegmentedLog::truncateSuffix | ( | uint64_t | lastIndex | ) | [virtual] |
Delete the log entries past the given index.
This will not affect the log start index.
lastIndex | After this call, the log will contain no entries indexed greater than lastIndex. This can be any log index, including 0 and those past the end of the log. |
Implements LogCabin::Storage::Log.
Definition at line 679 of file SegmentedLog.cc.
void LogCabin::Storage::SegmentedLog::updateMetadata | ( | ) | [virtual] |
Call this after changing metadata.
Implements LogCabin::Storage::Log.
Definition at line 758 of file SegmentedLog.cc.
void LogCabin::Storage::SegmentedLog::updateServerStats | ( | Protocol::ServerStats & | serverStats | ) | const [virtual] |
Add information about the log's state to the given structure.
Used for diagnostics.
Reimplemented from LogCabin::Storage::Log.
Definition at line 803 of file SegmentedLog.cc.
std::vector< SegmentedLog::Segment > LogCabin::Storage::SegmentedLog::readSegmentFilenames | ( | ) | [private] |
List the files in dir and create Segment objects for any of them that look like segments.
This is only used during initialization. These segments are passed through loadClosedSegment() and loadOpenSegment() next. Also updates SegmentPreparer::filenameCounter.
Definition at line 818 of file SegmentedLog.cc.
bool LogCabin::Storage::SegmentedLog::readMetadata | ( | const std::string & | filename, |
SegmentedLogMetadata::Metadata & | metadata, | ||
bool | quiet | ||
) | const [private] |
Read a metadata file from disk.
This is only used during initialization.
filename | Filename within dir to attempt to open and read. | |
[out] | metadata | Where the contents of the file end up. |
quiet | Set to true to avoid warnings when the file can't be read; used in unit tests. |
Definition at line 876 of file SegmentedLog.cc.
bool LogCabin::Storage::SegmentedLog::loadClosedSegment | ( | Segment & | segment, |
uint64_t | logStartIndex | ||
) | [private] |
Read the given closed segment from disk, issuing PANICs and WARNINGs appropriately.
This is only used during initialization.
Deletes segment if its last index is below logStartIndex.
Reads every entry described in the filename, and PANICs if any of those can't be read.
[in,out] | segment | Closed segment to read from disk. |
logStartIndex | The index of the first entry in the log, according to the log metadata. |
Definition at line 910 of file SegmentedLog.cc.
bool LogCabin::Storage::SegmentedLog::loadOpenSegment | ( | Segment & | segment, |
uint64_t | logStartIndex | ||
) | [private] |
Read the given open segment from disk, issuing PANICs and WARNINGs appropriately, and closing the segment.
This is only used during initialization.
Reads up through the end of the file or the last entry with a valid checksum. If any valid entries are read, the segment is truncated and closed. Otherwise, it is removed.
Deletes segment if its last index is below logStartIndex.
[in,out] | segment | Open segment to read from disk. |
logStartIndex | The index of the first entry in the log, according to the log metadata. |
Definition at line 980 of file SegmentedLog.cc.
void LogCabin::Storage::SegmentedLog::checkInvariants | ( | ) | [private] |
Run through a bunch of assertions of class invariants (for debugging).
For example, there should always be one open segment. See shouldCheckInvariants, controlled by the config option 'storageDebug', and the BUILDTYPE.
Definition at line 1091 of file SegmentedLog.cc.
void LogCabin::Storage::SegmentedLog::closeSegment | ( | ) | [private] |
Close the open segment if one is open.
This removes the open segment if it is empty, or closes it otherwise. Since it's a class invariant that there is always an open segment, the caller should open a new segment after calling this (unless it's shutting down).
Definition at line 1141 of file SegmentedLog.cc.
SegmentedLog::Segment & LogCabin::Storage::SegmentedLog::getOpenSegment | ( | ) | [private] |
Return a reference to the current open segment (the one that new writes should go into).
Crashes if there is no open segment (but it's an invariant of this class to maintain one).
Definition at line 1180 of file SegmentedLog.cc.
const SegmentedLog::Segment & LogCabin::Storage::SegmentedLog::getOpenSegment | ( | ) | const [private] |
Definition at line 1187 of file SegmentedLog.cc.
void LogCabin::Storage::SegmentedLog::openNewSegment | ( | ) | [private] |
Set up a new open segment for the log head.
This is called when append() needs more space but also when the end of the log is truncated with truncatePrefix() or truncateSuffix().
Definition at line 1194 of file SegmentedLog.cc.
std::string LogCabin::Storage::SegmentedLog::readProtoFromFile | ( | const FilesystemUtil::File & | file, |
FilesystemUtil::FileContents & | reader, | ||
uint64_t * | offset, | ||
google::protobuf::Message * | out | ||
) | const [private] |
Read the next ProtoBuf record out of 'file'.
file | The open file, useful for error messages. | |
reader | A reader for 'file'. | |
[in,out] | offset | The byte offset in the file at which to start reading as input. The byte just after the last byte of data as output if successful, otherwise unmodified. |
[out] | out | An empty ProtoBuf to fill in. |
Format:
|checksum|dataLen|data|
The checksum is up to Core::Checksum::MAX_LENGTH bytes and is terminated by a null character. It covers both dataLen and data.
dataLen is an unsigned integer (8 bytes, big-endian byte order) that specifies the length in bytes of data.
data is a protobuf encoded as binary or text, depending on encoding.
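The record layout above can be illustrated with a small framing sketch. All function names here are hypothetical, and the checksum is treated as an opaque, caller-supplied text string: the sketch does not compute or verify a checksum and does not use Core::Checksum. It only shows the null terminator and the 8-byte big-endian dataLen field.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Encode v as 8 bytes in big-endian byte order.
std::string encodeBigEndian64(uint64_t v)
{
    std::string out(8, '\0');
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<char>(v & 0xff);
        v >>= 8;
    }
    return out;
}

// Decode 8 big-endian bytes starting at offset; false if too few bytes.
bool decodeBigEndian64(const std::string& bytes, std::size_t offset,
                       uint64_t* value)
{
    if (offset > bytes.size() || bytes.size() - offset < 8)
        return false;
    uint64_t v = 0;
    for (std::size_t i = 0; i < 8; ++i)
        v = (v << 8) | static_cast<unsigned char>(bytes[offset + i]);
    *value = v;
    return true;
}

// Build |checksum|dataLen|data|. checksumText is a placeholder and must
// not contain a null character.
std::string frameRecord(const std::string& checksumText,
                        const std::string& data)
{
    return checksumText + '\0' + encodeBigEndian64(data.size()) + data;
}

// Split a framed record back into its parts. Returns false if the record
// is truncated, as a partially written entry would be.
bool unframeRecord(const std::string& bytes, std::string* checksumText,
                   std::string* data)
{
    std::size_t nul = bytes.find('\0');
    if (nul == std::string::npos)
        return false;
    uint64_t dataLen = 0;
    if (!decodeBigEndian64(bytes, nul + 1, &dataLen))
        return false;
    std::size_t dataStart = nul + 1 + 8;
    if (bytes.size() - dataStart < dataLen)
        return false;
    *checksumText = bytes.substr(0, nul);
    *data = bytes.substr(dataStart, static_cast<std::size_t>(dataLen));
    return true;
}
```

A reader following this layout can reject a record as soon as the length field or the data runs past the end of the file, which is the situation described for a crash in the middle of an append.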
Definition at line 1214 of file SegmentedLog.cc.
Core::Buffer LogCabin::Storage::SegmentedLog::serializeProto | ( | const google::protobuf::Message & | in | ) | const [private] |
Prepare a ProtoBuf record to be written to disk.
in | ProtoBuf to be serialized. |
Definition at line 1273 of file SegmentedLog.cc.
std::pair< std::string, FS::File > LogCabin::Storage::SegmentedLog::prepareNewSegment | ( | uint64_t | fileId | ) | [private] |
Opens a file for a new segment and allocates its space on disk.
fileId | ID to use to generate filename; see SegmentPreparer::filenameCounter. |
Definition at line 1321 of file SegmentedLog.cc.
void LogCabin::Storage::SegmentedLog::segmentPreparerMain | ( | ) | [private] |
The main function for the segmentPreparer thread.
Definition at line 1353 of file SegmentedLog.cc.
const Encoding LogCabin::Storage::SegmentedLog::encoding [private] |
Specifies how individual records are stored.
Definition at line 565 of file SegmentedLog.h.
const std::string LogCabin::Storage::SegmentedLog::checksumAlgorithm [private] |
The algorithm to use when writing new records.
When reading records, any available checksum is used.
Definition at line 571 of file SegmentedLog.h.
const uint64_t LogCabin::Storage::SegmentedLog::MAX_SEGMENT_SIZE [private] |
The maximum size in bytes for newly written segments.
Controlled by the 'storageSegmentBytes' config option.
Definition at line 577 of file SegmentedLog.h.
const bool LogCabin::Storage::SegmentedLog::shouldCheckInvariants [private] |
Set to true if checkInvariants() should do its job, or set to false for performance.
Definition at line 583 of file SegmentedLog.h.
const std::chrono::milliseconds LogCabin::Storage::SegmentedLog::diskWriteDurationThreshold [private] |
If a disk operation exceeds this much time, log a warning.
Definition at line 588 of file SegmentedLog.h.
SegmentedLogMetadata::Metadata LogCabin::Storage::SegmentedLog::metadata [private] |
The metadata this class maintains.
This should be combined with the superclass's metadata when being written out to disk.
Reimplemented from LogCabin::Storage::Log.
Definition at line 594 of file SegmentedLog.h.
FilesystemUtil::File LogCabin::Storage::SegmentedLog::dir [private] |
The directory containing every file this log creates.
Definition at line 599 of file SegmentedLog.h.
FilesystemUtil::File LogCabin::Storage::SegmentedLog::openSegmentFile [private] |
A writable OS-level file that contains the entries for the current open segment.
It is a class invariant that this is always a valid file.
Definition at line 605 of file SegmentedLog.h.
uint64_t LogCabin::Storage::SegmentedLog::logStartIndex [private] |
The index of the first entry in the log, see getLogStartIndex().
Definition at line 610 of file SegmentedLog.h.
std::map<uint64_t, Segment> LogCabin::Storage::SegmentedLog::segmentsByStartIndex [private] |
Ordered map of all closed segments and the open segment, indexed by the startIndex of each segment.
This is used to support all the key operations, such as looking up an entry and truncation.
Definition at line 617 of file SegmentedLog.h.
uint64_t LogCabin::Storage::SegmentedLog::totalClosedSegmentBytes [private] |
The total number of bytes occupied by the closed segments on disk.
Used to calculate getSizeBytes() efficiently.
Definition at line 623 of file SegmentedLog.h.
PreparedSegments LogCabin::Storage::SegmentedLog::preparedSegments [private] |
See PreparedSegments.
Definition at line 628 of file SegmentedLog.h.
std::unique_ptr<SegmentedLog::Sync> LogCabin::Storage::SegmentedLog::currentSync [private] |
Accumulates deferred filesystem operations for append() and truncatePrefix().
Definition at line 634 of file SegmentedLog.h.
Core::RollingStat LogCabin::Storage::SegmentedLog::metadataWriteNanos [private] |
Tracks the time it takes to write a metadata file.
Definition at line 639 of file SegmentedLog.h.
Core::RollingStat LogCabin::Storage::SegmentedLog::filesystemOpsNanos [private] |
Tracks the time it takes to execute wait() on a Sync object.
Definition at line 644 of file SegmentedLog.h.
std::thread LogCabin::Storage::SegmentedLog::segmentPreparer [private] |
Opens files, allocates them to full size, and places them on preparedSegments for the log to use.
Definition at line 650 of file SegmentedLog.h.