LogCabin
LogCabin::Storage::SegmentedLog Class Reference

This class persists a log on the filesystem efficiently.

#include <SegmentedLog.h>

Inheritance diagram for LogCabin::Storage::SegmentedLog:
LogCabin::Storage::Log


Classes

class  PreparedSegments
 A producer/consumer monitor for a queue of files to use for open segments.
struct  Segment
 An open or closed segment.
struct  SegmentHeader
 This goes at the start of every segment.
class  Sync
 Queues various operations on files, such as writes and fsyncs, to be executed later.

Public Types

enum  Encoding {
  TEXT,
  BINARY
}
 Specifies how individual records are serialized.

Public Member Functions

 SegmentedLog (const FilesystemUtil::File &parentDir, Encoding encoding, const Core::Config &config)
 Constructor.
 ~SegmentedLog ()
std::pair< uint64_t, uint64_t > append (const std::vector< const Entry * > &entries)
 Start to append new entries to the log.
const Entry & getEntry (uint64_t index) const
 Look up an entry by its log index.
uint64_t getLogStartIndex () const
 Get the index of the first entry in the log (whether or not this entry exists).
uint64_t getLastLogIndex () const
 Get the index of the most recent entry in the log.
std::string getName () const
 Return the name of the log implementation as it would be specified in the config file.
uint64_t getSizeBytes () const
 Get the size of the entire log in bytes.
std::unique_ptr< Log::Sync > takeSync ()
 Get and remove the Log's Sync object in order to wait on it.
void syncCompleteVirtual (std::unique_ptr< Log::Sync > sync)
void truncatePrefix (uint64_t firstIndex)
 Delete the log entries before the given index.
void truncateSuffix (uint64_t lastIndex)
 Delete the log entries past the given index.
void updateMetadata ()
 Call this after changing metadata.
void updateServerStats (Protocol::ServerStats &serverStats) const
 Add information about the log's state to the given structure.

Private Types

typedef Core::Time::SteadyClock Clock
 Clock used for measuring disk performance.
typedef Clock::time_point TimePoint
 Time point for measuring disk performance.

Private Member Functions

std::vector< Segment > readSegmentFilenames ()
 List the files in dir and create Segment objects for any of them that look like segments.
bool readMetadata (const std::string &filename, SegmentedLogMetadata::Metadata &metadata, bool quiet) const
 Read a metadata file from disk.
bool loadClosedSegment (Segment &segment, uint64_t logStartIndex)
 Read the given closed segment from disk, issuing PANICs and WARNINGs appropriately.
bool loadOpenSegment (Segment &segment, uint64_t logStartIndex)
 Read the given open segment from disk, issuing PANICs and WARNINGs appropriately, and closing the segment.
void checkInvariants ()
 Run through a bunch of assertions of class invariants (for debugging).
void closeSegment ()
 Close the open segment if one is open.
Segment & getOpenSegment ()
 Return a reference to the current open segment (the one that new writes should go into).
const Segment & getOpenSegment () const
void openNewSegment ()
 Set up a new open segment for the log head.
std::string readProtoFromFile (const FilesystemUtil::File &file, FilesystemUtil::FileContents &reader, uint64_t *offset, google::protobuf::Message *out) const
 Read the next ProtoBuf record out of 'file'.
Core::Buffer serializeProto (const google::protobuf::Message &in) const
 Prepare a ProtoBuf record to be written to disk.
std::pair< std::string, FilesystemUtil::File > prepareNewSegment (uint64_t fileId)
 Opens a file for a new segment and allocates its space on disk.
void segmentPreparerMain ()
 The main function for the segmentPreparer thread.

Private Attributes

const Encoding encoding
 Specifies how individual records are stored.
const std::string checksumAlgorithm
 The algorithm to use when writing new records.
const uint64_t MAX_SEGMENT_SIZE
 The maximum size in bytes for newly written segments.
const bool shouldCheckInvariants
 Set to true if checkInvariants() should do its job, or set to false for performance.
const std::chrono::milliseconds diskWriteDurationThreshold
 If a disk operation exceeds this much time, log a warning.
SegmentedLogMetadata::Metadata metadata
 The metadata this class maintains.
FilesystemUtil::File dir
 The directory containing every file this log creates.
FilesystemUtil::File openSegmentFile
 A writable OS-level file that contains the entries for the current open segment.
uint64_t logStartIndex
 The index of the first entry in the log, see getLogStartIndex().
std::map< uint64_t, Segment > segmentsByStartIndex
 Ordered map of all closed segments and the open segment, indexed by the startIndex of each segment.
uint64_t totalClosedSegmentBytes
 The total number of bytes occupied by the closed segments on disk.
PreparedSegments preparedSegments
 See PreparedSegments.
std::unique_ptr< SegmentedLog::Sync > currentSync
 Accumulates deferred filesystem operations for append() and truncatePrefix().
Core::RollingStat metadataWriteNanos
 Tracks the time it takes to write a metadata file.
Core::RollingStat filesystemOpsNanos
 Tracks the time it takes to execute wait() on a Sync object.
std::thread segmentPreparer
 Opens files, allocates them to full size, and places them on preparedSegments for the log to use.

Detailed Description

This class persists a log on the filesystem efficiently.

The log entries on disk are stored in a series of files called segments, and each segment is about 8MB in size. Thus, most small appends do not need to update filesystem metadata and can proceed with a single consecutive disk write.

The disk files consist of metadata files, closed segments, and open segments. Metadata files are used to track Raft metadata, such as the server's current term, and also the log's start index. Segments contain contiguous entries that are part of the log. Closed segments are never written to again (but may be renamed and truncated if a suffix of the log is truncated). Open segments are where newly appended entries go. Once an open segment reaches MAX_SEGMENT_SIZE, it is closed and a new one is used.

Metadata files are named "metadata1" and "metadata2". The code alternates between these so that there is always at least one readable metadata file. On boot, the readable metadata file with the higher version number is used.
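The selection rule above can be sketched as follows. This is a minimal illustration, not the code in SegmentedLog.cc: the MetadataCandidate struct and both function names are hypothetical, and the alternation is modeled as simply switching to the other filename, which is an assumption about how the two names are rotated.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical summary of what was learned from one metadata file.
struct MetadataCandidate {
    std::string filename;  // "metadata1" or "metadata2"
    bool readable;         // false if the file is missing or failed to parse
    uint64_t version;      // only meaningful when readable is true
};

// Returns the readable candidate with the higher version number,
// or nullptr if neither file could be read.
const MetadataCandidate*
chooseMetadata(const MetadataCandidate& a, const MetadataCandidate& b)
{
    if (a.readable && b.readable)
        return a.version >= b.version ? &a : &b;
    if (a.readable)
        return &a;
    if (b.readable)
        return &b;
    return nullptr;
}

// Assumed alternation rule: write the next version to the other file,
// so the most recent readable file is left intact if the write is
// interrupted by a crash.
std::string nextMetadataFilename(const std::string& current)
{
    return current == "metadata1" ? "metadata2" : "metadata1";
}
```

Writing to the file that was not chosen is what keeps at least one readable copy on disk at all times.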

Closed segments are named by the format string "%020lu-%020lu" with their start and end indexes, both inclusive. Closed segments always contain at least one entry; the end index is always at least as large as the start index. Closed segment files may occasionally include data past their filename's end index (these are ignored but a WARNING is issued). This can happen if the suffix of the segment is truncated and a crash occurs at an inopportune time (the segment file is first renamed, then truncated, and a crash occurs in between).
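As an illustration of the closed-segment naming scheme, the sketch below formats and parses such filenames. The helper names closedSegmentName and parseClosedSegmentName are hypothetical and do not appear in SegmentedLog. The sketch uses the PRIu64 macro rather than a literal "%lu" so that it stays portable on platforms where uint64_t is not unsigned long.

```cpp
#include <cassert>
#include <cctype>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Formats a closed-segment filename: two zero-padded 20-digit indexes,
// both inclusive, separated by '-'.
std::string closedSegmentName(uint64_t startIndex, uint64_t endIndex)
{
    assert(startIndex <= endIndex);  // closed segments hold >= 1 entry
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%020" PRIu64 "-%020" PRIu64,
                  startIndex, endIndex);
    return buf;
}

// Parses a filename of the form produced above. Returns false (and
// leaves the outputs untouched) if the name does not look like a
// closed segment.
bool parseClosedSegmentName(const std::string& name,
                            uint64_t* startIndex, uint64_t* endIndex)
{
    if (name.size() != 41 || name[20] != '-')
        return false;
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (i != 20 && !std::isdigit(static_cast<unsigned char>(name[i])))
            return false;
    }
    try {
        uint64_t start = std::stoull(name.substr(0, 20));
        uint64_t end = std::stoull(name.substr(21));
        if (start > end)
            return false;
        *startIndex = start;
        *endIndex = end;
        return true;
    } catch (const std::out_of_range&) {
        return false;  // 20 digits that do not fit in 64 bits
    }
}
```

Zero-padding to 20 digits means every uint64_t value fits, and a plain lexicographic directory listing sorts segments in log order.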

Open segments are named by the format string "open-%lu" with a unique number. These should not exist when the server shuts down cleanly, but they exist while the server is running and may be left behind after a crash. Open segments either contain entries which come after the last closed segment or are full of zeros. When the server crashes while appending to an open segment, the end of that file may be corrupt. We can't distinguish between a corrupt file and a partially written entry. The code assumes it's a partially written entry, issues a WARNING, and ignores it.

Truncating a suffix of the log will remove all entries that are no longer part of the log. Truncating a prefix of the log will only remove complete segments that are before the new log start index. For example, if a segment has entries 10 through 20 and the prefix of the log is truncated to start at entry 15, that entire segment will be retained.
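The prefix-truncation rule can be sketched as a filter over closed segments. This is a simplified model rather than the actual truncatePrefix() code: SegmentRange and segmentsRemovedByTruncatePrefix are hypothetical names, and the open segment is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a closed segment's index range (inclusive).
struct SegmentRange {
    uint64_t startIndex;
    uint64_t endIndex;
};

// Returns the start indexes of the segments that lie entirely before
// firstIndex. A segment that straddles firstIndex is kept whole.
std::vector<uint64_t> segmentsRemovedByTruncatePrefix(
    const std::map<uint64_t, SegmentRange>& segmentsByStartIndex,
    uint64_t firstIndex)
{
    std::vector<uint64_t> removed;
    for (const auto& kv : segmentsByStartIndex) {
        if (kv.second.endIndex < firstIndex)
            removed.push_back(kv.first);
    }
    return removed;
}
```

With segments covering 1 through 9, 10 through 20, and 21 through 30, truncating the prefix to index 15 removes only the first segment; the segment holding 10 through 20 stays on disk even though entries 10 through 14 are no longer part of the log.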

Each segment file starts with a segment header, which currently contains just a one-byte version number for the format of that segment. The current format (version 1) is just a concatenation of serialized entry records.

Definition at line 89 of file SegmentedLog.h.


Member Typedef Documentation

typedef Core::Time::SteadyClock LogCabin::Storage::SegmentedLog::Clock [private]

Clock used for measuring disk performance.

Definition at line 93 of file SegmentedLog.h.

typedef Clock::time_point LogCabin::Storage::SegmentedLog::TimePoint [private]

Time point for measuring disk performance.

Definition at line 97 of file SegmentedLog.h.


Member Enumeration Documentation

enum LogCabin::Storage::SegmentedLog::Encoding

Specifies how individual records are serialized.

Enumerator:
TEXT  ProtoBuf human-readable text format.
BINARY  ProtoBuf binary format.

Definition at line 104 of file SegmentedLog.h.


Constructor & Destructor Documentation

LogCabin::Storage::SegmentedLog::SegmentedLog ( const FilesystemUtil::File &  parentDir,
Encoding  encoding,
const Core::Config &  config 
)

Constructor.

Parameters:
parentDir  A filesystem directory in which all the files for this storage module are kept.
encoding  Specifies how individual records are stored.
config  Settings.

Definition at line 346 of file SegmentedLog.cc.

LogCabin::Storage::SegmentedLog::~SegmentedLog ( )

Definition at line 465 of file SegmentedLog.cc.


Member Function Documentation

std::pair< uint64_t, uint64_t > LogCabin::Storage::SegmentedLog::append ( const std::vector< const Entry * > &  entries) [virtual]

Start to append new entries to the log.

The entries may not be on disk yet when this returns; see Sync.

Parameters:
entries  Entries to place at the end of the log.
Returns:
Range of indexes of the new entries in the log, inclusive.

Implements LogCabin::Storage::Log.

Definition at line 491 of file SegmentedLog.cc.

const SegmentedLog::Entry & LogCabin::Storage::SegmentedLog::getEntry ( uint64_t  index) const [virtual]

Look up an entry by its log index.

Parameters:
index  Must be in the range [getLogStartIndex(), getLastLogIndex()]. Otherwise, this will crash the server.
Returns:
The entry corresponding to that index. This reference is only guaranteed to be valid until the next time the log is modified.

Implements LogCabin::Storage::Log.

Definition at line 575 of file SegmentedLog.cc.

uint64_t LogCabin::Storage::SegmentedLog::getLogStartIndex ( ) const [virtual]

Get the index of the first entry in the log (whether or not this entry exists).

Returns:
1 for logs that have never had truncatePrefix called, otherwise the largest index passed to truncatePrefix.

Implements LogCabin::Storage::Log.

Definition at line 592 of file SegmentedLog.cc.

uint64_t LogCabin::Storage::SegmentedLog::getLastLogIndex ( ) const [virtual]

Get the index of the most recent entry in the log.

Returns:
The index of the most recent entry in the log, or getLogStartIndex() - 1 if the log is empty.

Implements LogCabin::Storage::Log.

Definition at line 598 of file SegmentedLog.cc.
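The return-value conventions of getLogStartIndex() and getLastLogIndex() imply some simple index arithmetic, sketched below. The helpers entryCount and containsIndex are hypothetical and not part of the class; they take the two indexes as plain arguments.

```cpp
#include <cassert>
#include <cstdint>

// Number of entries in the log, given the values that
// getLogStartIndex() and getLastLogIndex() would return.
uint64_t entryCount(uint64_t logStartIndex, uint64_t lastLogIndex)
{
    // An empty log reports lastLogIndex == logStartIndex - 1.
    assert(lastLogIndex + 1 >= logStartIndex);
    return lastLogIndex + 1 - logStartIndex;
}

// True if 'index' is in the range that getEntry() accepts.
bool containsIndex(uint64_t logStartIndex, uint64_t lastLogIndex,
                   uint64_t index)
{
    return index >= logStartIndex && index <= lastLogIndex;
}
```

For a fresh log the start index is 1 and the last index is 0, so the count is 0 and no index passes the range check.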

std::string LogCabin::Storage::SegmentedLog::getName ( ) const [virtual]

Return the name of the log implementation as it would be specified in the config file.

Implements LogCabin::Storage::Log.

Definition at line 610 of file SegmentedLog.cc.

uint64_t LogCabin::Storage::SegmentedLog::getSizeBytes ( ) const [virtual]

Get the size of the entire log in bytes.

Implements LogCabin::Storage::Log.

Definition at line 619 of file SegmentedLog.cc.

std::unique_ptr< Log::Sync > LogCabin::Storage::SegmentedLog::takeSync ( ) [virtual]

Get and remove the Log's Sync object in order to wait on it.

This Sync object must later be returned to the Log with syncComplete().

While takeSync() and syncComplete() may not be done concurrently with other Log operations, Sync::wait() may be done concurrently with all operations except truncateSuffix().

Implements LogCabin::Storage::Log.

Definition at line 625 of file SegmentedLog.cc.
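The take, wait, return protocol described for takeSync() can be illustrated with a toy model of the ownership hand-off. ToySync and ToyLog below are hypothetical stand-ins and do not reproduce LogCabin's Log::Sync or its locking; in particular, replacing the taken object with a fresh one is an assumption of this sketch.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy stand-in for a Sync object: counts deferred operations.
struct ToySync {
    int pendingOps = 0;
    bool completed = false;
    // Stands in for flushing the queued operations to disk.
    void wait() { pendingOps = 0; completed = true; }
};

// Toy stand-in for the log side of the protocol.
class ToyLog {
  public:
    ToyLog() : currentSync(new ToySync()) {}

    // Each append queues one deferred operation on the current Sync.
    void append() { ++currentSync->pendingOps; }

    // Hands the accumulated Sync to the caller and starts a fresh one
    // (assumed behavior), so later appends queue on the new object.
    std::unique_ptr<ToySync> takeSync() {
        std::unique_ptr<ToySync> taken(new ToySync());
        std::swap(taken, currentSync);
        return taken;
    }

    // The caller returns the Sync after wait() has finished.
    void syncComplete(std::unique_ptr<ToySync> sync) {
        assert(sync->completed);
        // The toy model just lets 'sync' be destroyed here.
    }

  private:
    std::unique_ptr<ToySync> currentSync;
};
```

A caller would take the Sync, call wait() on it (possibly on another thread, subject to the concurrency rules above), and then hand it back with syncComplete(std::move(sync)).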

void LogCabin::Storage::SegmentedLog::syncCompleteVirtual ( std::unique_ptr< Log::Sync >  sync)

Definition at line 635 of file SegmentedLog.cc.

void LogCabin::Storage::SegmentedLog::truncatePrefix ( uint64_t  firstIndex) [virtual]

Delete the log entries before the given index.

Once you truncate a prefix from the log, there's no way to undo this. The entries may still be on disk when this returns and file descriptors and other resources may remain open; see Sync.

Parameters:
firstIndex  After this call, the log will contain no entries indexed less than firstIndex. This can be any log index, including 0 and those past the end of the log.

Implements LogCabin::Storage::Log.

Definition at line 642 of file SegmentedLog.cc.

void LogCabin::Storage::SegmentedLog::truncateSuffix ( uint64_t  lastIndex) [virtual]

Delete the log entries past the given index.

This will not affect the log start index.

Parameters:
lastIndex  After this call, the log will contain no entries indexed greater than lastIndex. This can be any log index, including 0 and those past the end of the log.
Warning:
Callers should wait() on all Sync objects prior to calling truncateSuffix(). This never happens on leaders, so it's not a real limitation, but things may go wonky otherwise.

Implements LogCabin::Storage::Log.

Definition at line 679 of file SegmentedLog.cc.

void LogCabin::Storage::SegmentedLog::updateMetadata ( ) [virtual]

Call this after changing metadata.

Implements LogCabin::Storage::Log.

Definition at line 758 of file SegmentedLog.cc.

void LogCabin::Storage::SegmentedLog::updateServerStats ( Protocol::ServerStats &  serverStats) const [virtual]

Add information about the log's state to the given structure.

Used for diagnostics.

Reimplemented from LogCabin::Storage::Log.

Definition at line 803 of file SegmentedLog.cc.

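std::vector< SegmentedLog::Segment > LogCabin::Storage::SegmentedLog::readSegmentFilenames ( ) [private]

List the files in dir and create Segment objects for any of them that look like segments.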

This is only used during initialization. These segments are passed through loadClosedSegment() and loadOpenSegment() next. Also updates SegmentPreparer::filenameCounter.

Returns:
Partially initialized Segment objects, one per discovered filename.

Definition at line 818 of file SegmentedLog.cc.

bool LogCabin::Storage::SegmentedLog::readMetadata ( const std::string &  filename,
SegmentedLogMetadata::Metadata &  metadata,
bool  quiet 
) const [private]

Read a metadata file from disk.

This is only used during initialization.

Parameters:
filename  Filename within dir to attempt to open and read.
[out]  metadata  Where the contents of the file end up.
quiet  Set to true to avoid warnings when the file can't be read; used in unit tests.
Returns:
True if the file was read successfully, false otherwise.

Definition at line 876 of file SegmentedLog.cc.

bool LogCabin::Storage::SegmentedLog::loadClosedSegment ( Segment &  segment,
uint64_t  logStartIndex 
) [private]

Read the given closed segment from disk, issuing PANICs and WARNINGs appropriately.

This is only used during initialization.

Deletes segment if its last index is below logStartIndex.

Reads every entry described in the filename, and PANICs if any of those can't be read.

Parameters:
[in,out]  segment  Closed segment to read from disk.
logStartIndex  The index of the first entry in the log, according to the log metadata.
Returns:
True if the segment is valid; false if it has been removed entirely from disk.

Definition at line 910 of file SegmentedLog.cc.

bool LogCabin::Storage::SegmentedLog::loadOpenSegment ( Segment &  segment,
uint64_t  logStartIndex 
) [private]

Read the given open segment from disk, issuing PANICs and WARNINGs appropriately, and closing the segment.

This is only used during initialization.

Reads up through the end of the file or the last entry with a valid checksum. If any valid entries are read, the segment is truncated and closed. Otherwise, it is removed.

Deletes segment if its last index is below logStartIndex.

Parameters:
[in,out]  segment  Open segment to read from disk.
logStartIndex  The index of the first entry in the log, according to the log metadata.
Returns:
True if the segment is valid; false if it has been removed entirely from disk.

Definition at line 980 of file SegmentedLog.cc.
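The recovery decision described for loadOpenSegment() can be sketched as a scan that stops at the first record whose checksum fails. This is a simplified model with hypothetical types (ScannedRecord, RecoveryPlan); it assumes the records have already been parsed and checked, and it leaves out the comparison against logStartIndex.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result of attempting to read one record from the file.
struct ScannedRecord {
    uint64_t sizeBytes;   // bytes the record occupies on disk
    bool checksumValid;   // false for a corrupt or partial record
};

// What to do with the open segment once scanning is finished.
struct RecoveryPlan {
    bool removeSegment;        // true if no valid entry was found
    uint64_t validEntries;     // entries read before the first failure
    uint64_t truncateToBytes;  // file length to keep when closing
};

RecoveryPlan planOpenSegmentRecovery(
    uint64_t headerBytes, const std::vector<ScannedRecord>& records)
{
    RecoveryPlan plan = {true, 0, headerBytes};
    for (const ScannedRecord& record : records) {
        if (!record.checksumValid)
            break;  // treat everything from here on as a partial write
        ++plan.validEntries;
        plan.truncateToBytes += record.sizeBytes;
    }
    plan.removeSegment = (plan.validEntries == 0);
    return plan;
}
```

Stopping at the first bad record matters: data after a failed checksum is never trusted, even if it happens to look like a well-formed record.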

void LogCabin::Storage::SegmentedLog::checkInvariants ( ) [private]

Run through a bunch of assertions of class invariants (for debugging).

For example, there should always be one open segment. See shouldCheckInvariants, controlled by the config option 'storageDebug', and the BUILDTYPE.

Definition at line 1091 of file SegmentedLog.cc.

void LogCabin::Storage::SegmentedLog::closeSegment ( ) [private]

Close the open segment if one is open.

This removes the open segment if it is empty, or closes it otherwise. Since it's a class invariant that there is always an open segment, the caller should open a new segment after calling this (unless it's shutting down).

Definition at line 1141 of file SegmentedLog.cc.

SegmentedLog::Segment & LogCabin::Storage::SegmentedLog::getOpenSegment ( ) [private]

Return a reference to the current open segment (the one that new writes should go into).

Crashes if there is no open segment (but it's an invariant of this class to maintain one).

Definition at line 1180 of file SegmentedLog.cc.

const SegmentedLog::Segment & LogCabin::Storage::SegmentedLog::getOpenSegment ( ) const [private]

Definition at line 1187 of file SegmentedLog.cc.

void LogCabin::Storage::SegmentedLog::openNewSegment ( ) [private]

Set up a new open segment for the log head.

This is called when append() needs more space but also when the end of the log is truncated with truncatePrefix() or truncateSuffix().

Precondition:
There is no currently open segment.

Definition at line 1194 of file SegmentedLog.cc.

std::string LogCabin::Storage::SegmentedLog::readProtoFromFile ( const FilesystemUtil::File &  file,
FilesystemUtil::FileContents &  reader,
uint64_t *  offset,
google::protobuf::Message *  out 
) const [private]

Read the next ProtoBuf record out of 'file'.

Parameters:
file  The open file, useful for error messages.
reader  A reader for 'file'.
[in,out]  offset  On input, the byte offset in the file at which to start reading. On output, the offset of the byte just after the last byte of data if successful, otherwise unmodified.
[out]  out  An empty ProtoBuf to fill in.
Returns:
Empty string if successful, otherwise error message.

Format:

|checksum|dataLen|data|

The checksum is up to Core::Checksum::MAX_LENGTH bytes and is terminated by a null character. It covers both dataLen and data.

dataLen is an unsigned integer (8 bytes, big-endian byte order) that specifies the length in bytes of data.

data is a protobuf encoded as binary or text, depending on encoding.

Definition at line 1214 of file SegmentedLog.cc.
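The |checksum|dataLen|data| layout can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions: toyChecksum is a made-up placeholder and is not one of the algorithms provided by Core::Checksum, the payload is treated as opaque bytes rather than a ProtoBuf, and the Core::Checksum::MAX_LENGTH bound is not enforced. The decoder mirrors the documented contract of returning an empty string on success and leaving the offset unmodified on failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Placeholder checksum for illustration only (NOT Core::Checksum).
std::string toyChecksum(const std::string& bytes)
{
    uint32_t sum = 0;
    for (unsigned char c : bytes)
        sum = sum * 31 + c;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "TOY:%08x", static_cast<unsigned>(sum));
    return buf;
}

// Encodes 'len' as 8 bytes, most significant byte first.
std::string encodeLengthBigEndian(uint64_t len)
{
    std::string out(8, '\0');
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<char>(len & 0xff);
        len >>= 8;
    }
    return out;
}

// Decodes the 8-byte big-endian integer starting at bytes[pos].
uint64_t decodeLengthBigEndian(const std::string& bytes, std::size_t pos)
{
    uint64_t len = 0;
    for (std::size_t i = 0; i < 8; ++i)
        len = (len << 8) | static_cast<unsigned char>(bytes[pos + i]);
    return len;
}

// Builds one record: checksum, null terminator, dataLen, data.
// The checksum covers both dataLen and data.
std::string encodeRecord(const std::string& data)
{
    std::string body = encodeLengthBigEndian(data.size()) + data;
    std::string record = toyChecksum(body);
    record.push_back('\0');
    record += body;
    return record;
}

// Reads the record starting at *offset. Returns an empty string and
// advances *offset on success; returns an error message and leaves
// *offset unmodified otherwise.
std::string decodeRecord(const std::string& file, uint64_t* offset,
                         std::string* data)
{
    std::size_t nul = file.find('\0', *offset);
    if (nul == std::string::npos)
        return "missing checksum terminator";
    std::string checksum = file.substr(*offset, nul - *offset);
    std::size_t bodyStart = nul + 1;
    if (file.size() - bodyStart < 8)
        return "truncated length field";
    uint64_t len = decodeLengthBigEndian(file, bodyStart);
    if (file.size() - bodyStart - 8 < len)
        return "truncated data";
    std::string body = file.substr(bodyStart, 8 + len);
    if (toyChecksum(body) != checksum)
        return "checksum mismatch";
    *data = body.substr(8);
    *offset = bodyStart + 8 + len;
    return "";
}
```

Because the checksum covers the length field as well as the data, a corrupted dataLen is detected instead of causing the reader to skip to an arbitrary offset.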

Core::Buffer LogCabin::Storage::SegmentedLog::serializeProto ( const google::protobuf::Message &  in) const [private]

Prepare a ProtoBuf record to be written to disk.

Parameters:
in  ProtoBuf to be serialized.
Returns:
Buffer containing serialized record.

Definition at line 1273 of file SegmentedLog.cc.

std::pair< std::string, FS::File > LogCabin::Storage::SegmentedLog::prepareNewSegment ( uint64_t  fileId) [private]

Opens a file for a new segment and allocates its space on disk.

Parameters:
fileId  ID to use to generate filename; see SegmentPreparer::filenameCounter.
Returns:
Filename and writable OS-level file.

Definition at line 1321 of file SegmentedLog.cc.

void LogCabin::Storage::SegmentedLog::segmentPreparerMain ( ) [private]

The main function for the segmentPreparer thread.

Definition at line 1353 of file SegmentedLog.cc.


Member Data Documentation

const Encoding LogCabin::Storage::SegmentedLog::encoding [private]

Specifies how individual records are stored.

Definition at line 565 of file SegmentedLog.h.

const std::string LogCabin::Storage::SegmentedLog::checksumAlgorithm [private]

The algorithm to use when writing new records.

When reading records, any available checksum is used.

Definition at line 571 of file SegmentedLog.h.

const uint64_t LogCabin::Storage::SegmentedLog::MAX_SEGMENT_SIZE [private]

The maximum size in bytes for newly written segments.

Controlled by the 'storageSegmentBytes' config option.

Definition at line 577 of file SegmentedLog.h.

const bool LogCabin::Storage::SegmentedLog::shouldCheckInvariants [private]

Set to true if checkInvariants() should do its job, or set to false for performance.

Definition at line 583 of file SegmentedLog.h.

const std::chrono::milliseconds LogCabin::Storage::SegmentedLog::diskWriteDurationThreshold [private]

If a disk operation exceeds this much time, log a warning.

Definition at line 588 of file SegmentedLog.h.

SegmentedLogMetadata::Metadata LogCabin::Storage::SegmentedLog::metadata [private]

The metadata this class maintains.

This should be combined with the superclass's metadata when being written out to disk.

Reimplemented from LogCabin::Storage::Log.

Definition at line 594 of file SegmentedLog.h.

FilesystemUtil::File LogCabin::Storage::SegmentedLog::dir [private]

The directory containing every file this log creates.

Definition at line 599 of file SegmentedLog.h.

FilesystemUtil::File LogCabin::Storage::SegmentedLog::openSegmentFile [private]

A writable OS-level file that contains the entries for the current open segment.

It is a class invariant that this is always a valid file.

Definition at line 605 of file SegmentedLog.h.

uint64_t LogCabin::Storage::SegmentedLog::logStartIndex [private]

The index of the first entry in the log, see getLogStartIndex().

Definition at line 610 of file SegmentedLog.h.

std::map< uint64_t, Segment > LogCabin::Storage::SegmentedLog::segmentsByStartIndex [private]

Ordered map of all closed segments and the open segment, indexed by the startIndex of each segment.

This is used to support all the key operations, such as looking up an entry and truncation.

Definition at line 617 of file SegmentedLog.h.

uint64_t LogCabin::Storage::SegmentedLog::totalClosedSegmentBytes [private]

The total number of bytes occupied by the closed segments on disk.

Used to calculate getSizeBytes() efficiently.

Definition at line 623 of file SegmentedLog.h.

PreparedSegments LogCabin::Storage::SegmentedLog::preparedSegments [private]

See PreparedSegments.

Definition at line 628 of file SegmentedLog.h.

std::unique_ptr< SegmentedLog::Sync > LogCabin::Storage::SegmentedLog::currentSync [private]

Accumulates deferred filesystem operations for append() and truncatePrefix().

Definition at line 634 of file SegmentedLog.h.

Core::RollingStat LogCabin::Storage::SegmentedLog::metadataWriteNanos [private]

Tracks the time it takes to write a metadata file.

Definition at line 639 of file SegmentedLog.h.

Core::RollingStat LogCabin::Storage::SegmentedLog::filesystemOpsNanos [private]

Tracks the time it takes to execute wait() on a Sync object.

Definition at line 644 of file SegmentedLog.h.

std::thread LogCabin::Storage::SegmentedLog::segmentPreparer [private]

Opens files, allocates them to full size, and places them on preparedSegments for the log to use.

Definition at line 650 of file SegmentedLog.h.


The documentation for this class was generated from the following files:
SegmentedLog.h
SegmentedLog.cc