Failed to connect to database: Access denied for user 'mogilefs #1

Open
wants to merge 418 commits into
base: master
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Mar 30, 2012

  1. t/httpfile: more tests for larger (100M) file

    We need to ensure we're sane when dealing with larger files
    requiring multiple reads.
    Eric Wong authored and dormando committed Mar 30, 2012
    853b6c6
  2. increase MD5 buffers to 1 megabyte (from 16K)

    This matches the buffer size used by replication, and showed a
    performance increase when timing the 100M large file test in
    t/40-httpfile.t
    
    With the following timing patch applied, both MD5 methods dropped
    from ~46s to ~27s after the buffer size increase.
    
      --- a/t/40-httpfile.t
      +++ b/t/40-httpfile.t
      @@ -125,5 +125,12 @@ $expect = $expect->digest;
       @paths = $mogc->get_paths("largefile");
       $file = MogileFS::HTTPFile->at($paths[0]);
       ok($size == $file->size, "big file size match $size");
      +use Time::HiRes qw/tv_interval gettimeofday/;
      +
      +my $t0;
      +$t0 = [gettimeofday];
       ok($file->md5_mgmt(sub {}) eq $expect, "md5_mgmt on big file");
      +print "mgmt ", tv_interval($t0), "\n";
      +$t0 = [gettimeofday];
       ok($file->md5_http(sub {}) eq $expect, "md5_http on big file");
      +print "http ", tv_interval($t0), "\n";
    Eric Wong authored and dormando committed Mar 30, 2012
    6bf8714
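
The buffer-size change can be illustrated with a short sketch. The commit itself touches Perl code in MogileFS; the Python below is only an illustration, and `md5_hex_stream` and `BUF_SIZE` are hypothetical names rather than MogileFS identifiers. It shows that the read size affects how many reads are needed, not the resulting digest.

```python
import hashlib
import io

BUF_SIZE = 1024 * 1024  # 1 MiB, the size named in the commit (previously 16K)

def md5_hex_stream(stream, buf_size=BUF_SIZE):
    """Return the hex MD5 of a binary stream, reading buf_size bytes at a time."""
    ctx = hashlib.md5()
    while True:
        chunk = stream.read(buf_size)
        if not chunk:
            break
        ctx.update(chunk)
    return ctx.hexdigest()

# The same payload digested with the new and the old buffer size.
data = b"x" * (3 * BUF_SIZE + 123)
digest_large = md5_hex_stream(io.BytesIO(data))
digest_small = md5_hex_stream(io.BytesIO(data), buf_size=16 * 1024)
```

Both calls return the same digest; the larger buffer simply needs far fewer read calls for a 100M file.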
  3. side_channel: switch to hexdigest for exchanging md5 checksums

    Base64 requires further escaping for our tracker protocol which
    gets ugly and confusing.  It's also easier to interact/verify
    with existing command-line tools using hex.
    Eric Wong authored and dormando committed Mar 30, 2012
    175e328
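
A small sketch of why hex is easier to pass around than Base64. It assumes the tracker protocol uses URL-style escaping (suggested by replies such as `No+domain+provided` elsewhere in this log); the variable names are illustrative only.

```python
import base64
import hashlib
from urllib.parse import quote_plus

raw = hashlib.md5(b"hello world").digest()          # 16 raw bytes

hex_form = raw.hex()                                # 32 chars from [0-9a-f]
b64_form = base64.b64encode(raw).decode("ascii")    # 24 chars, ends in "=="

# Hex survives URL-style escaping untouched. Base64 does not: a 16-byte
# digest always carries "==" padding and may also contain "+" or "/".
hex_escaped = quote_plus(hex_form)
b64_escaped = quote_plus(b64_form)
```

The hex form is also what command-line tools such as `md5sum` print, so it can be compared by eye.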
  4. add MogileFS::Checksum class

    We need a place to store mappings for various checksum
    types we'll support.
    Eric Wong authored and dormando committed Mar 30, 2012
    857e5da
  5. store: update class table with checksumtype column

    This is needed to wire up checksums to classes.
    Eric Wong authored and dormando committed Mar 30, 2012
    bf24853
  6. replicate: optional digest support

    Digest::MD5 and Digest::SHA1 both support the same API for
    streaming data for the calculation, so we can validate our
    content as we stream it.
    Eric Wong authored and dormando committed Mar 30, 2012
    c497912
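
The idea of validating content while streaming it can be sketched as follows. This is a Python illustration, not the replication code: `copy_with_digest` is a hypothetical helper, and `hashlib.new` stands in for the shared streaming API that the commit attributes to Digest::MD5 and Digest::SHA1.

```python
import hashlib
import io

def copy_with_digest(src, dst, algo="md5", buf_size=1024 * 1024):
    """Copy src to dst and return (bytes_copied, hexdigest) in a single pass."""
    ctx = hashlib.new(algo)      # same update()/hexdigest() API for md5 and sha1
    total = 0
    while True:
        chunk = src.read(buf_size)
        if not chunk:
            break
        dst.write(chunk)         # forward the data ...
        ctx.update(chunk)        # ... and feed the digest with the same bytes
        total += len(chunk)
    return total, ctx.hexdigest()

payload = b"abc" * 1000
dst = io.BytesIO()
copied, digest = copy_with_digest(io.BytesIO(payload), dst, algo="sha1", buf_size=7)
```

Because the digest is updated with each chunk as it is written, no second read of the source is needed.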
  7. class: wire up checksum support to this

    Checksum usage will be decided on a per-class basis.
    Eric Wong authored and dormando committed Mar 30, 2012
    380f21c
  8. t/40-httpfile.t: speedup test with working clear_cache

    This branch is now rebased against my latest clear_cache
    which allows much faster metadata updates for testing.
    Eric Wong authored and dormando committed Mar 30, 2012
    cc3a551
  9. doc: add checksums.txt for basic design/implementation notes

    Helps me keep my head straight.
    Eric Wong authored and dormando committed Mar 30, 2012
    81e7917
  10. checksum: add "from_string" and "save" function

    This can come in handy.
    Eric Wong authored and dormando committed Mar 30, 2012
    5713152
  11. checksums: genericize to be algorithm-independent, add SHA*

    We'll use the "Digest" class in Perl as a guide for this.
    Only MD5 is officially supported.
    
    However, this *should* support SHA-(1|256|384|512) and it's easy
    to add more algorithms.
    Eric Wong authored and dormando committed Mar 30, 2012
    893ba78
  12. wire up checksum to create_close/file_info/create_class commands

    We can now:
    * enable checksums for classes
    * save client-provided checksums to the database
    * verify them on create_close
    * read them in file_info
    Eric Wong authored and dormando committed Mar 30, 2012
    85e9bea
  13. test for update_class with checksumtype=NONE

    We need to be able to both enable and disable checksumming for a class.
    Eric Wong authored and dormando committed Mar 30, 2012
    7ecf77d
  14. add MogileFS::FID->checksum function

    This returns undef if a checksum is missing for a class,
    and a MogileFS::Checksum object if it exists.
    Eric Wong authored and dormando committed Mar 30, 2012
    0323554
  15. add checksum generation/verification to replication worker

    replication now lazily generates checksums if they're not
    provided by the client (but required by the storage class).
    
    replication may also verify checksums if they're available
    in the database.
    
    replication now sets the Content-MD5 header on PUT requests,
    in case the remote server is capable of rejecting corrupt
    transfers based on it
    
    replication attempts to verify the checksum of the freshly
    PUT-ed file.
    
    TODO: monitor will attempt "test-write" with mangled Content-MD5
          to determine if storage backends are Content-MD5-capable
          so replication can avoid reading checksum on destination
    Eric Wong authored and dormando committed Mar 30, 2012
    fdab72f
  16. monitor observes Content-MD5-rejectability

    This functionality (and a server capable of rejecting bad MD5s)
    will allow us to skip an expensive MogileFS::HTTPFile->digest
    request at replication time.
    
    Also testing with the following patch to Perlbal:
    
    --- a/lib/mogdeps/Perlbal/ClientHTTP.pm
    +++ b/lib/mogdeps/Perlbal/ClientHTTP.pm
    @@ -22,6 +22,7 @@ use fields ('put_in_progress', # 1 when we're currently waiting for an async job
                 'content_length',  # length of document being transferred
                 'content_length_remain', # bytes remaining to be read
                 'chunked_upload_state', # bool/obj:  if processing a chunked upload, Perlbal::ChunkedUploadState object, else undef
    +            'md5_ctx',
                 );
    
     use HTTP::Date ();
    @@ -29,6 +30,7 @@ use File::Path;
    
     use Errno qw( EPIPE );
     use POSIX qw( O_CREAT O_TRUNC O_WRONLY O_RDONLY ENOENT );
    +use Digest::MD5;
    
     # class list of directories we know exist
     our (%VerifiedDirs);
    @@ -61,6 +63,7 @@ sub init {
         $self->{put_fh} = undef;
         $self->{put_pos} = 0;
         $self->{chunked_upload_state} = undef;
    +    $self->{md5_ctx} = undef;
     }
    
     sub close {
    @@ -134,6 +137,8 @@ sub handle_put {
    
         return $self->send_response(403) unless $self->{service}->{enable_put};
    
    +    $self->{md5_ctx} = $hd->header('Content-MD5') ? Digest::MD5->new : undef;
    +
         return if $self->handle_put_chunked;
    
         # they want to put something, so let's setup and wait for more reads
    @@ -421,6 +426,8 @@ sub put_writeout {
    
         my $data = join("", map { $$_ } @{$self->{read_buf}});
         my $count = length $data;
    +    my $md5_ctx = $self->{md5_ctx};
    +    $md5_ctx->add($data) if $md5_ctx;
    
         # reset our input buffer
         $self->{read_buf}   = [];
    @@ -460,6 +467,17 @@ sub put_close {
    
         if (CORE::close($self->{put_fh})) {
             $self->{put_fh} = undef;
    +
    +        my $md5_ctx = $self->{md5_ctx};
    +        if ($md5_ctx) {
    +            my $actual = $md5_ctx->b64digest;
    +            my $expect = $self->{req_headers}->header("Content-MD5");
    +            $expect =~ s/=+\s*\z//;
    +            if ($actual ne $expect) {
    +                return $self->send_response(400,
    +                    "Content-MD5 mismatch, expected: $expect actual: $actual");
    +            }
    +        }
             return $self->send_response(200);
         } else {
             return $self->system_error("Error saving file", "error in close: $!");
    Eric Wong authored and dormando committed Mar 30, 2012
    bd624ed
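
The check in the Perlbal patch above, and the monitor's probe for it, can be sketched in Python. This is a rough illustration under assumptions: `content_md5`, `put_status` and `rejects_bad_md5` are hypothetical names, and treating any non-2xx reply as a rejection is an assumption (the patch itself answers 400).

```python
import base64
import hashlib

def content_md5(body):
    """Content-MD5 header value: Base64 of the raw 16-byte MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

def put_status(body, header_value):
    """Server side: 200 if the header is absent or matches, otherwise 400."""
    if not header_value:
        return 200                      # no header, nothing to verify
    # Compare without "=" padding, as the patch does on the expected value.
    actual = content_md5(body).rstrip("=")
    expect = header_value.strip().rstrip("=")
    return 200 if actual == expect else 400

def rejects_bad_md5(put):
    """Monitor side: test-write with a deliberately wrong Content-MD5."""
    body = b"test-write"
    wrong = content_md5(body + b"mangled")
    status = put(body, wrong)
    return not (200 <= status < 300)    # assumption: any non-2xx is a rejection

capable = rejects_bad_md5(put_status)
incapable = rejects_bad_md5(lambda body, header: 200)
```

A backend that accepts the mangled test-write cannot be relied on to catch corrupt transfers, so replication would still need to re-read the file on the destination.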
  17. replication skips HTTPFile->digest if device can reject bad MD5s

    Rereading a large file is expensive.  If the monitor observes
    that a storage node can reject bad MD5s, we can rely on that
    instead of having anybody reread the entire file to calculate
    its MD5.
    Eric Wong authored and dormando committed Mar 30, 2012
    55b29e5
  18. doc: update checksums document

    Only the fsck part remains to be implemented... And I've never
    studied/used fsck much :x
    Eric Wong authored and dormando committed Mar 30, 2012
    7b417cd
  19. ensure checksum row is deleted when FID is deleted

    Stale rows are bad.
    Eric Wong authored and dormando committed Mar 30, 2012
    3bc57a8
  20. replicate generates proper CRLF for Content-MD5 header

    TODO: see if we can use LWP to avoid mistakes like this :x
    Eric Wong authored and dormando committed Mar 30, 2012
    6ea29a4
  21. flesh out fsck functionality for checksums

    Fsck behavior is based on existing behavior for size mismatches.
    size failures take precedence, since it's much cheaper to verify
    size match/mismatches than checksum mismatches.
    
    Checksum calculations are expensive and fsck is already
    parallel, so we do not parallelize checksum calculations on
    a per-FID basis.
    Eric Wong authored and dormando committed Mar 30, 2012
    de77caf
  22. rename checksum{type,name} => hash{type,name}

    It reads more easily this way, at least to me.
    Eric Wong authored and dormando committed Mar 30, 2012
    e8ceb02
  23. Newer SQLite _can_ ALTER TABLE .. ADD COLUMN in some cases

    I'll be testing checksum functionality on my home installation
    before testing it on other installations, and I run SQLite at
    home.
    
    ref: http://www.sqlite.org/lang_altertable.html
    Eric Wong authored and dormando committed Mar 30, 2012
    739e275
  24. always use HTTPFile->digest with the ping callback

    We need to ensure the worker stays alive during MD5
    generation, especially on large files that can take
    many seconds to verify.
    Eric Wong authored and dormando committed Mar 30, 2012
    9fa5453
  25. get_domains returns hashtype as a string

    This special-cases "NONE" for no hash for our users.
    Eric Wong authored and dormando committed Mar 30, 2012
    51bf680
  26. doc/checksums: clarify binary column type for various DBs

    We don't actually use the BLOB type anywhere, as checksums
    are definitely not "L"(arge) objects.
    Eric Wong authored and dormando committed Mar 30, 2012
    d9527ab
  27. httpfile: fix timeout comparison when digesting via mogstored

    The timeout comparison is wrong and causing ping_cb to never
    fire.  This went unnoticed since I have reasonably fast disks
    on my storage nodes and the <$sock> operation was able to
    complete before being hit by a watchdog timeout.
    Eric Wong authored and dormando committed Mar 30, 2012
    dce4bb5
  28. fsck: add fsck_auto_checksum server setting

    Enabling this setting allows fsck to checksum all replicas on
    all devices and report any corrupted copies regardless of
    per-class settings.
    
    This feature is useful for determining if enabling checksums on
    certain classes is necessary and will also benefit users who
    cannot afford to store checksums in the database.
    Eric Wong authored and dormando committed Mar 30, 2012
    a26811c
  29. checksums: disable all hash algorithms except MD5

    MD5 is faster than SHA1, and much faster than any of the SHA2
    variants.  Given the time penalty of fsck is already high with
    MD5, prevent folks from shooting themselves in the foot with
    extremely expensive hash algorithms.
    Eric Wong authored and dormando committed Mar 30, 2012
    37ad4cc
  30. fsck_checksum setting replaces fsck_auto_checksum

    Unlike the setting it replaces, this new setting can be used to disable
    checksumming entirely, regardless of per-class options.
    
    fsck_checksum=(class|off|MD5)
    
    class - is the default, fsck based on per-class hashtype
    off - skip all checksumming regardless of per-class setting
    MD5 - same as the previous fsck_auto_checksum=MD5
    Eric Wong authored and dormando committed Mar 30, 2012
    35dcf3e
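
The three values of fsck_checksum can be read as a small decision function. The sketch below is an interpretation of the commit message, not the fsck code: `effective_hashtype` is a hypothetical name, and how a forced `MD5` setting interacts with a class that already uses a different hash is an assumption.

```python
def effective_hashtype(fsck_checksum, class_hashtype):
    """Return the hash fsck should verify for one file, or None to skip."""
    if fsck_checksum == "off":
        return None                          # skip regardless of class setting
    if fsck_checksum == "class":
        # Default: follow the per-class hashtype, where "NONE" means no hash.
        return None if class_hashtype in (None, "NONE") else class_hashtype
    # Otherwise a forced algorithm such as "MD5"; assumed here to apply even
    # to classes with checksums disabled, as fsck_auto_checksum=MD5 did.
    return fsck_checksum

decisions = {
    ("class", "MD5"): effective_hashtype("class", "MD5"),
    ("class", "NONE"): effective_hashtype("class", "NONE"),
    ("off", "MD5"): effective_hashtype("off", "MD5"),
    ("MD5", "NONE"): effective_hashtype("MD5", "NONE"),
}
```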
  31. checksums: use a low-priority task queue for fsck digests

    MD5 is I/O-intensive, and having fsck request MD5s concurrently
    ends up causing I/O contention on rotational drives with high
    seek latency.  So limit fsck MD5 requests to a single job per
    device.
    Eric Wong authored and dormando committed Mar 30, 2012
    a4af4ca
  32. DevFID size caching for fsck with checksumming

    The digest path relies on having a known file size to calculate
    the MD5 timeout, so save an HTTP HEAD request since we always
    check file sizes in fsck before we checksum the file.
    Eric Wong authored and dormando committed Mar 30, 2012
    e596d52
  33. re-enable SHA-1 for checksums

    Optimized SHA-1 implementations aren't significantly slower than
    MD5 and some folks (e.g. Tomas Doran) may already have SHA-1 in
    place for their data.
    
    A liberally licensed, GPL-compatible collection of SHA-1
    primitives is available from one of the OpenSSL developers:
    
      http://www.openssl.org/~appro/cryptogams/
    
    It would be nice to allow the Perl Digest module to
    transparently take advantage of architecture-specific
    optimizations.
    
    Note there is no standardized equivalent to the HTTP Content-MD5
    header/trailer for any of the SHA variants, so verification for
    replication/uploads may take significantly longer.
    
    Requested-by: Tomas Doran
    Eric Wong authored and dormando committed Mar 30, 2012
    1fd5ef4
  34. doc/checksums: use $HASHTYPE for referring to hash names

    $NAME is potentially ambiguous and $HASHTYPE matches the
    database column name.
    Eric Wong authored and dormando committed Mar 30, 2012
    4937ade
  35. b396949
  36. Fix fsck status when running for the first time

    Fsck would print "Status: N / 0 " if it had never been started before.
    Now it finds the max(fid) on its own internally.
    dormando committed Mar 30, 2012
    b4cca74
  37. make fsck_checksum == off honored in more places

    If fsck_checksum was set to off, fsck would ignore the checksums deep in
    the code, but would still attempt to "fix" the FIDs each time, which runs
    far more code and UPDATEs each FID's devcount even if you tell it not to.
    
    Now it does what it should. However, fsck with checksums enabled will still
    UPDATE devcount on each check.
    dormando committed Mar 30, 2012
    f19e095
  38. Checking in changes prior to tagging of version 2.60.

    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index 770e518..5b59d7f 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,12 @@
    +2012-03-30: Release version 2.60
    +
    +   * Fix fsck status when running for the first time (dormando <[email protected]>)
    +
    +   * Checksum support (Major update!) (Eric Wong <[email protected]>)
    +     See doc/checksums.txt for an overview of how the new checksum system
    +     works. Also keep an eye on the wiki (http://www.mogilefs.org) for more
    +     complete documentation in the coming weeks.
    +
     2012-02-29: Release version 2.59
    
        * don't make SQLite error out on lock calls (dormando <[email protected]>)
    dormando committed Mar 30, 2012
    d9bb8e8

Commits on Apr 12, 2012

  1. log: enable autoflush for stdout logging

    Buffering log output in memory makes it difficult to view debug
    and error output during development.  Since MogileFS does not
    write to stdout frequently, there should be no noticeable
    performance loss from this change.
    
    This also prevents mangling of TAP output which caused
    test failures if DEBUG=1 is set.
    Eric Wong authored and dormando committed Apr 12, 2012
    c0dcf7f

Commits on Apr 14, 2012

  1. worker: delete_domain returns has_classes error

    I noticed that attempting to delete a domain with classes returns
    an unhelpful "Operation failed" error message.
    Eric Wong authored and dormando committed Apr 14, 2012
    ea5d78d

Commits on Apr 21, 2012

  1. monitor: only broadcast reject_bad_md5 on change

    There's no need to broadcast changes to other workers if there
    are no changes.  Since HTTP servers rarely (if ever) change
    their ability to toggle Content-MD5 rejection, this was causing
    needless wakeups in every monitor round.
    
    Tested by running mogilefsd with DEBUG=1 and toggling
    Content-MD5 rejection in mogstored + perlbal 1.80 via:
    
    	SET mogstored.enable_md5 = (0|1)
    
    to the mgmt port while watching syslog output.
    
    Noticed-by: dormando
    Eric Wong authored and dormando committed Apr 21, 2012
    508a958
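
The "broadcast only on change" behavior can be sketched as a small state tracker. This is a Python illustration with hypothetical names (`RejectMd5Tracker`, `observe`); the real monitor is a Perl worker and its internals are not shown in this log.

```python
class RejectMd5Tracker:
    """Remember the last observed value per device; notify only on change."""

    def __init__(self, broadcast):
        self._broadcast = broadcast   # callable(devid, rejects_bad_md5)
        self._last = {}               # devid -> last observed value

    def observe(self, devid, rejects_bad_md5):
        if self._last.get(devid) == rejects_bad_md5:
            return False              # unchanged: no wakeup for other workers
        self._last[devid] = rejects_bad_md5
        self._broadcast(devid, rejects_bad_md5)
        return True

sent = []
tracker = RejectMd5Tracker(lambda devid, value: sent.append((devid, value)))
for value in (True, True, True, False, False):   # five monitor rounds
    tracker.observe(1, value)
```

Five monitor rounds produce two broadcasts here: the first observation and the single change.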
  2. tests: add test for fsck functionality

    Before we make changes to the fsck code, we should ensure
    we don't break existing use cases.
    
    Behavior I'm uncertain about is documented with "XXX".
    Eric Wong authored and dormando committed Apr 21, 2012
    0bdec9b
  3. tests: fix fsck test to work on older LWP::UserAgent

    The LWP::UserAgent module found in my Debian 6.0 installation
    does not have a "delete" convenience wrapper.
    Eric Wong authored and dormando committed Apr 21, 2012
    5aa8dc4

Commits on Apr 22, 2012

  1. add fsck test to MANIFEST

    Eric Wong committed Apr 22, 2012
    644be50
  2. tests: fsck test use MogileFS::Store API when possible

    No point in using DBI directly if a task can be done directly
    via the MogileFS::Store API.
    
    Noticed-by: dormando
    Eric Wong committed Apr 22, 2012
    8789f2f

Commits on Apr 23, 2012

  1. t/60-fsck.t: fix overly long sleep when waiting for fsck log

    This was left over from when I was monitoring the test with DEBUG=1 :x
    Eric Wong committed Apr 23, 2012
    5255026

Commits on Apr 29, 2012

  1. t/60-fsck: additional test cases

    * ensure fsck can handle a stall from an unresponsive mogstored
    
    * ensure over-replicated files are cleaned up
    
    * ensure fsck can work correctly with dead devices if it beats
      reaper to an FID
    Eric Wong committed Apr 29, 2012
    ccab7b4
  2. fsck: log bad count correctly instead of policy violation

    A BCNT error is more descriptive than a generic POVI entry
    and more accurately reflects the change made to an FID entry.
    
    This also removes a dependency from /using/ the devcount column
    and simplifies the code.  The devcount column remains invaluable
    and informative to users, but MogileFS should not trust it for
    making decisions when it has access to the file_on table.
    Eric Wong committed Apr 29, 2012
    eba5cad

Commits on May 2, 2012

  1. additional tests for fsck stop, resume and stats

    We need to ensure fsck can resume and returns sane
    stats output.
    Eric Wong committed May 2, 2012
    f54abb1

Commits on May 4, 2012

  1. t/60-fsck: retry SQL statements on deadlock

    Hopefully this can eliminate some random test failures.
    Eric Wong committed May 4, 2012
    e7b9f1f
  2. t/60-fsck: allow fsck_highest_fid_checked to be zero

    After resetting, fsck_highest_fid_checked ends up at zero.
    Eric Wong committed May 4, 2012
    2da72ed

Commits on May 9, 2012

  1. t/60-fsck: fix typo resulting in useless check

    We checked the incorrect return value, so the second
    mogstored failing would've gone unnoticed.
    Eric Wong committed May 9, 2012
    29bc491
  2. t/60-fsck: fix potential race conditions

    These race conditions were causing this test to fail occasionally.
    These test failures were more common on SQLite and Postgres,
    but not unheard of when using MySQL.
    
    Some of these race conditions were due to fsck/job_master not
    receiving settings changes in time, so we now resort to killing
    existing processes and forcing them to reload.
    Eric Wong committed May 9, 2012
    db02848

Commits on May 11, 2012

  1. fsck: update devcount, forget devs on unfixable FIDs

    Whenever an FID is unfixable, be sure to update devcount (to
    zero) to easily inform the user via mogstats.  If the FID
    magically reappears in the future, the desperate fallback mode
    will still find the file.
    Eric Wong committed May 11, 2012
    3c1c680

Commits on May 12, 2012

  1. fsck: cleanup and reduce unnecessary devcount updates

    fix_fid(): we no longer blindly update devcount on every
    call.  This is important because we call fix_fid() on checksum
    checks regardless, and devcount updates entail unnecessary
    updates to the `file' table.
    
    While we're at it, consolidate the places where we check the
    skip_devcount flag and log bad devcounts.
    Eric Wong committed May 12, 2012
    56ff97b

Commits on May 17, 2012

  1. t/60-fsck: detect fsck stalls via '!watch'

    Instead of blindly sleeping, we can '!watch' through the tracker
    and detect the error message fsck sends.
    Eric Wong committed May 17, 2012
    d51d0bd
  2. t/60-fsck: use "!to <job> :shutdown" to kill workers

    This is a simpler implementation and lets us be notified of
    worker death (and pending replacement) as soon as the tracker
    notices it.
    Eric Wong committed May 17, 2012
    fa1a6af

Commits on May 18, 2012

  1. fsck: prevent running over 100% completion

    FIDs may be created while fsck is running, causing "mogadm fsck
    status" to report completion above 100% (and thus confusing
    users).  Stopping fsck when it reaches fsck_fid_at_end (set to
    MAX(fid) at fsck startup) prevents this.
    
    This change should also have a pleasant side effect of reducing
    contention with replicate workers on newly-uploaded FIDs.
    
    ref: http://code.google.com/p/mogilefs/issues/detail?id=50
    Eric Wong committed May 18, 2012
    5151d6d
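
Why snapshotting MAX(fid) at startup caps progress at 100% can be shown with a toy model. `run_fsck` is a hypothetical Python function, and the percentage formula (highest checked FID over fsck_fid_at_end) is an assumption about how "mogadm fsck status" reports progress.

```python
def run_fsck(existing_fids, created_during_run):
    """Check FIDs in order, stopping at the MAX(fid) snapshot taken at startup."""
    fid_at_end = max(existing_fids, default=0)      # plays fsck_fid_at_end
    checked = []
    for fid in sorted(existing_fids + created_during_run):
        if fid > fid_at_end:
            break      # newer FIDs are left for replication and the next run
        checked.append(fid)
    return checked, fid_at_end

checked, fid_at_end = run_fsck([1, 2, 3], created_during_run=[4, 5])
percent = 100.0 * checked[-1] / fid_at_end
```

Without the stop condition, FIDs 4 and 5 would also be checked and the same formula would report more than 100%.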
  2. remove redundant code to ignore SIGPIPE

    MogileFS::Server and Perlbal both ignore SIGPIPE for
    us.  So there's no need to ever ignore it for socket
    writes in HTTPFile, either.
    Eric Wong authored and dormando committed May 18, 2012
    39dd796
  3. delete: prevent orphan files from replicator race

    Use the replicate lock here to prevent a DevFID from being
    orphaned by a replicate process.  This prevents orphaned
    requests if a user issues a delete request on a file while
    a replication worker is copying it.
    Eric Wong authored and dormando committed May 18, 2012
    d8b5805
  4. don't attempt to broadcast undef config values to children

    This avoids an uninitialized value warning from Perl when
    choosing a value for the deprecated listen_jobs value.  Neither
    the child nor the parent processes are capable of handling undef
    values from :set_config_from_*.
    Eric Wong authored and dormando committed May 18, 2012
    e8d276a
  5. eliminate dead code for invalid_meta*

    Most of this was already nuked in the following commits:
    ebf8a5a
    3db8a84
    Eric Wong authored and dormando committed May 18, 2012
    67f95e3
  6. store: remove unused random_fids_on_device() sub

    Unused since commit 0be2f97
    when the old drain/rebalance code was dropped.
    Eric Wong authored and dormando committed May 18, 2012
    7a8ebda
  7. monitor: remove unnecessary conditional assignments

    Based on my reading of the code, the conditional assignments
    of $hostip, $get_port, $devid, $timeout, and $url are needless.
    Eric Wong authored and dormando committed May 18, 2012
    b1c80a2
  8. tests: use done_testing() instead of test counts

    Keeping track of explicit test counts causes needless merge
    conflicts.  done_testing() is sufficient to note test
    completeness and detect failures.
    Eric Wong authored and dormando committed May 18, 2012
    19ab0fa
  9. Bump Test::More req to get done_testing()

    May annoy CentOS 5 users.
    dormando committed May 18, 2012
    b44b12a
  10. store: remove get_fids_above_id() subroutine

    Unused since commit 6c23c9d
    ("make fsck worker distributed").  Since this always seemed
    fsck-specific, it's also unlikely plugins are using this.
    Eric Wong authored and dormando committed May 18, 2012
    56206f2
  11. worker/query: Add optional callid parameter

    Queries are computed in parallel, so replies may arrive out of
    order. This patch adds an optional callid parameter so that a
    caller can match replies back to queries. ERR lines return the
    callid as the third parameter. If the callid parameter is
    missing, the protocol is the same as before.
    
    Example:
    GET_PATHS callid=1
    ERR no_domain No+domain+provided 1
    GET_DOMAINS callid=1
    OK domains=0&callid=1
    notti authored and dormando committed May 18, 2012
    c744b23
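
The callid matching described in the worker/query commit above could look roughly like this on the client side. This is a minimal Python sketch, not MogileFS client code: `parse_reply` and `match_replies` are hypothetical names, and the parsing assumes only the two reply shapes shown in the commit's example (callid as the trailing field of an ERR line, or as a `callid=` pair in an OK line).

```python
from urllib.parse import parse_qs, unquote_plus


def parse_reply(line):
    """Split a tracker reply into (callid, status, payload).

    Assumes the two shapes from the commit message:
      OK key=value&callid=N
      ERR code Message+text N
    """
    status, _, rest = line.strip().partition(" ")
    if status == "OK":
        fields = {k: v[0] for k, v in parse_qs(rest).items()}
        return fields.pop("callid", None), status, fields
    if status == "ERR":
        parts = rest.split(" ")
        # Without a callid the ERR line carries only the code and message.
        callid = parts[2] if len(parts) > 2 else None
        message = unquote_plus(parts[1]) if len(parts) > 1 else ""
        return callid, status, {"code": parts[0], "message": message}
    raise ValueError("unrecognized reply: %r" % line)


def match_replies(pending, lines):
    """Pair each reply with the request that carried the same callid."""
    matched = {}
    for line in lines:
        callid, status, payload = parse_reply(line)
        matched[pending[callid]] = (status, payload)
    return matched


# Hypothetical pipelined session: replies come back in the opposite order.
pending = {"1": "GET_PATHS", "2": "GET_DOMAINS"}
replies = ["OK domains=0&callid=2", "ERR no_domain No+domain+provided 1"]
print(match_replies(pending, replies))
```

Keying the pending requests by callid is what lets a client pipeline commands without caring which worker answers first.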
  12. 3ba3dd5

Commits on May 19, 2012

  1. sqlite: implement locking via tables

    This is adapted from the Postgres implementation, but
    since SQLite runs on one machine, we can use kill 0
    to detect whether a process got nuked before it could
    release a lock.
    Eric Wong authored and dormando committed May 19, 2012
    fc8ba6b
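
The "kill 0" idea from the sqlite locking commit above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the MogileFS implementation: the `lock` table layout, `pid_alive`, `get_lock`, and `release_lock` here are hypothetical, and the code is Unix-only because it relies on signal 0.

```python
import os
import sqlite3
import subprocess
import sys
import time


def pid_alive(pid):
    """Signal 0 delivers nothing; it only reports whether pid exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def get_lock(db, name, pid):
    """Try to take the named lock for pid; return True on success."""
    row = db.execute("SELECT pid FROM lock WHERE lockname = ?", (name,)).fetchone()
    if row is not None and not pid_alive(row[0]):
        # The holder died before releasing, so its row is stale.
        db.execute("DELETE FROM lock WHERE lockname = ? AND pid = ?", (name, row[0]))
    try:
        db.execute(
            "INSERT INTO lock (lockname, pid, acquiredat) VALUES (?, ?, ?)",
            (name, pid, int(time.time())),
        )
    except sqlite3.IntegrityError:
        return False  # a live process still holds the lock
    db.commit()
    return True


def release_lock(db, name, pid):
    db.execute("DELETE FROM lock WHERE lockname = ? AND pid = ?", (name, pid))
    db.commit()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lock (lockname TEXT PRIMARY KEY, pid INTEGER, acquiredat INTEGER)")

# Simulate a holder that exited without releasing: run and reap a child process.
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()
db.execute("INSERT INTO lock VALUES ('demo', ?, 0)", (child.pid,))

took_stale = get_lock(db, "demo", os.getpid())  # the stale row is replaced
took_again = get_lock(db, "demo", os.getpid())  # refused: the holder is alive
print(took_stale, took_again)
```

The check only makes sense when every lock holder runs on the same machine as the database, which is the single-host property the commit message relies on.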
  2. move common lock code into base Store module

    Since all Store implementations implement get_lock/release_lock,
    we can safely share the same implementation of both
    should_begin_replication_fidid() and note_done_replicating().
    Eric Wong authored and dormando committed May 19, 2012
    3e7c94c
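
The sharing described in the commit above works because both replication helpers need nothing beyond the two lock primitives. A minimal Python sketch of that shape follows; the method names come from the commit message, while the lock-name format, the timeout argument, and the `InMemoryStore` backend are assumptions made for illustration.

```python
class Store:
    """Base class: each backend supplies get_lock/release_lock only."""

    def get_lock(self, name, timeout):
        raise NotImplementedError

    def release_lock(self, name):
        raise NotImplementedError

    # Shared by every backend because it only needs the two primitives above.
    def should_begin_replication_fidid(self, fidid):
        # Assumed lock-name format, for illustration only.
        return self.get_lock("fid:%d:replicate" % fidid, 1)

    def note_done_replicating(self, fidid):
        self.release_lock("fid:%d:replicate" % fidid)


class InMemoryStore(Store):
    """Toy backend standing in for a real database implementation."""

    def __init__(self):
        self.held = set()

    def get_lock(self, name, timeout):
        if name in self.held:
            return False
        self.held.add(name)
        return True

    def release_lock(self, name):
        self.held.discard(name)


store = InMemoryStore()
first = store.should_begin_replication_fidid(42)   # lock taken
second = store.should_begin_replication_fidid(42)  # already replicating
store.note_done_replicating(42)
print(first, second)
```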
  3. sqlite: delete expired locks regardless of hostname

    For rare SQLite setups, drop locks after 3600s regardless of the
    hostname of the lock holder.  This can work around weird setups
    that change hostnames (frequently) or share SQLite DBs over NFS.
    Eric Wong authored and dormando committed May 19, 2012
    4ff90dc
  4. avoid unnecessary devcount update in create_close

    The devcount of a newly uploaded file is always 1, so
    we do not need another set of trips to the DB to set
    this in the file row.
    Eric Wong authored and dormando committed May 19, 2012
    f54e6fd
  5. fix issue #57 by Pyry and Eric

    Specifying "alivetypo" as the host status would cause mogilefs to implode.
    dormando committed May 19, 2012
    5605390
  6. 5f48f8d
  7. 3f8fffd
  8. t/50-checksum: /possibly/ fix a stuck test

    This possible fix could also be hiding another bug, but the
    original test ordering was suspect...
    Eric Wong authored and dormando committed May 19, 2012
    27a51c2
  9. Checking in changes prior to tagging of version 2.61.

    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index 5b59d7f..de3ba9b 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,33 @@
    +2012-05-18: Release version 2.61
    +
    +   * fix issue #57 by Pyry and Eric (dormando <[email protected]>)
    +     (mogadm host status sometimes allowed typos)
    +
    +   * avoid unnecessary devcount update in create_close (Eric Wong <[email protected]>)
    +
    +   * sqlite: implement locking via tables (Eric Wong <[email protected]>)
    +
    +   * worker/query: Add optional callid parameter (Gernot Vormayr <[email protected]>)
    +     (allows command pipelining)
    +
    +   * delete: prevent orphan files from replicator race (Eric Wong <[email protected]>)
    +
    +   * fsck: prevent running over 100% completion (Eric Wong <[email protected]>)
    +
    +   * fsck: cleanup and reduce unnecessary devcount updates (Eric Wong <[email protected]>)
    +
    +   * fsck: update devcount, forget devs on unfixable FIDs (Eric Wong <[email protected]>)
    +
    +   * fsck: log bad count correctly instead of policy violation (Eric Wong <[email protected]>)
    +
    +   * tests: add test for fsck functionality (Eric Wong <[email protected]>)
    +
    +   * monitor: only broadcast reject_bad_md5 on change (Eric Wong <[email protected]>)
    +
    +   * worker: delete_domain returns has_classes error (Eric Wong <[email protected]>)
    +
    +   * log: enable autoflush for stdout logging (Eric Wong <[email protected]>)
    +
     2012-03-30: Release version 2.60
    
        * Fix fsck status when running for the first time (dormando <[email protected]>)
    dormando committed May 19, 2012
    7b058a9
  10. 28fad96
  11. Postgres: Fix v15 schema upgrade.

    Schema upgrade needs to use Pg-specific column types for the v15 upgrade
    adding class.hashtype. Only CREATE TABLE is auto-converted where
    possible, not ALTER TABLE.
    
    Signed-off-by: Robin H. Johnson <[email protected]>
    robbat2 authored and dormando committed May 19, 2012
    b643f33
  12. fix CHANGES for 2.60 :P

    dormando committed May 19, 2012
    0d83c44
  13. Checking in changes prior to tagging of version 2.62.

    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index b526b6a..6333f9f 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,9 @@
    +2012-05-19: Release version 2.62
    +
    +   * Critical bugfix for a compilation error (dormando, reported by Robin)
    +
    +   * Fix for upgrading a Postgres install for checksums (Robin * H. Johnson <[email protected]>)
    +
     2012-05-18: Release version 2.61
    
        * fix issue #57 by Pyry and Eric (dormando <[email protected]>)
    dormando committed May 19, 2012
    bfdfc42

Commits on May 30, 2012

  1. postgres: fix replace_into_file regression in 2.61

    commit f54e6fd botched
    the ordering of parameters when updating the file table.
    Eric Wong authored and dormando committed May 30, 2012
    ac5534a
  2. Checking in changes prior to tagging of version 2.63.

    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index 6333f9f..07790f3 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,8 @@
    +2012-05-29: Release version 2.63
    +
    +   * Critical bugfix for Postgres users introduced by 2.61. New file uploads
    +     would fail. (noticed by robin H. Johnson, fixed by Eric Wong)
    +
     2012-05-19: Release version 2.62
    
        * Critical bugfix for a compilation error (dormando, reported by Robin)
    dormando committed May 30, 2012
    768f03b

Commits on Jun 20, 2012

  1. monitor skips hosts marked dead or down

    Hosts don't define readability/writability themselves, so the
    Host::should_get_new_files sub is renamed to "alive" and
    Device->can_read_from respects host status.
    
    Also, queryworker now skips down/dead hosts in cmd_get_paths.
    
    ref: http://code.google.com/p/mogilefs/issues/detail?id=46
    Eric Wong authored and dormando committed Jun 20, 2012
    4fe092f
  2. Device->observed_* all respects observed host state

    The observed unreachable state of the host should always supersede the
    observed state of the device.  This is already the case with
    observed_writeable, but not with observed_readable or
    observed_unreachable.
    
    The monitor worker does not (and should not, to save bandwidth) update
    states of all devices when a host goes down.
    Eric Wong authored and dormando committed Jun 20, 2012
    c84760f
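
The rule in the commit above (host state supersedes device state) can be illustrated with a small Python sketch. The method names mirror the commit message, but the classes, the state strings, and the exact comparisons are assumptions, not the MogileFS code.

```python
class Host:
    def __init__(self, observed_state):
        self.observed_state = observed_state  # assumed: "reachable" or "unreachable"

    def observed_reachable(self):
        return self.observed_state == "reachable"


class Device:
    def __init__(self, host, observed_state):
        self.host = host
        # Assumed values: "writeable", "readable", "unreachable".
        self.observed_state = observed_state

    def observed_writeable(self):
        return self.host.observed_reachable() and self.observed_state == "writeable"

    def observed_readable(self):
        return self.host.observed_reachable() and self.observed_state == "readable"

    def observed_unreachable(self):
        # An unreachable host makes every device on it unreachable,
        # even if the device's own last observation looked healthy.
        return (not self.host.observed_reachable()) or self.observed_state == "unreachable"


# The device still carries a stale "writeable" observation from before
# its host went down; the monitor never had to rewrite it.
stale = Device(Host("unreachable"), "writeable")
print(stale.observed_writeable(), stale.observed_unreachable())
```

Checking the host inside each device accessor is what lets the monitor skip per-device updates when a whole host goes down.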
  3. Device->should_read_from respects all host/device states

    should_read_from() should replace all uses of can_read_from() in
    non-Monitor workers.  This avoids the overhead of needlessly
    rechecking devices either the monitor or user marked down.
    
    This simplifies queryworker logic a bit.
    Eric Wong authored and dormando committed Jun 20, 2012
    2dd9a1b
  4. DevFID size/checksum respects Device->should_read_from

    Avoid needlessly attempting connections for checking files on
    host/devices the monitor (or user) marked as unreadable.
    
    This also makes the Fsck->size_on_disk function redundant.
    Eric Wong authored and dormando committed Jun 20, 2012
    d0de5c5
  5. get_paths: deprioritize devs in "drain" state

    URLs pointing to devices set to drain are undesirable.
    Files may disappear off draining devices immediately
    after we've queried the file_on table and invalidate
    the paths the client sends us.
    Eric Wong authored and dormando committed Jun 20, 2012
    ae7b415
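
One simple way to deprioritize draining devices, as the get_paths commit above describes, is a stable sort that moves them to the end while leaving the remaining order alone. The sketch below assumes devices are plain dicts with a `status` field; `order_paths` is a hypothetical helper, and the actual queryworker may weight devices differently.

```python
def order_paths(devices):
    """Stable sort: non-draining devices first, original order otherwise."""
    return sorted(devices, key=lambda dev: dev["status"] == "drain")


devices = [
    {"devid": 1, "status": "drain"},
    {"devid": 2, "status": "alive"},
    {"devid": 3, "status": "drain"},
    {"devid": 4, "status": "alive"},
]
print([dev["devid"] for dev in order_paths(devices)])  # [2, 4, 1, 3]
```

Draining devices stay in the list as a last resort, so a file whose only copies sit on draining devices is still reachable.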
  6. t/00-startup: explicit fid test for create_open/close

    This is mainly to prevent bugs like the fix in
    commit ac5534a
    from popping up again.
    Eric Wong authored and dormando committed Jun 20, 2012
    1386dc3
  7. connection/mogstored: remove sock_if_connected()

    This subroutine has been unused since MogileFS 2.52
    commit 18a40d2
    ("Throw out old HTTPFile->size code and use LWP")
    Eric Wong authored and dormando committed Jun 20, 2012
    62b3bdb
  8. test for existing case-insensitive list_keys behavior

    We cannot break existing case-insensitive behavior for list_keys
    right now, even if it's broken.  This means SQLite/MySQL will
    use case-insensitive LIKE statements for list_keys and Postgres
    remains case-sensitive.
    Eric Wong authored and dormando committed Jun 20, 2012
    42b2992
  9. implement "case_sensitive_list_keys" server setting

    Enabling this boolean will make the "after" and "prefix" params
    of "list_keys" behave case-sensitively.
    
    If this setting is /not/ enabled, clients will hit
    after_mismatch errors when iterating through keys if they are
    using an uppercase "prefix" argument and a subsequent list_keys
    is called with an "after" that only matches case-insensitively.
    
    If unset, this defaults to false (0) to match historical (buggy)
    behavior.  Historical behavior is preserved (even if broken) as
    users with small namespaces may rely on case-insensitive
    matching.
    
    Postgres users are not affected by this change, as the LIKE
    operator in Postgres is always case-sensitive.
    
    This change is tested on all three databases: Postgres, MySQL,
    and SQLite.
    Eric Wong authored and dormando committed Jun 20, 2012
    12fa688
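
The difference the case_sensitive_list_keys commit above describes can be reproduced with SQLite alone. In this Python sketch the table, the keys, and the `list_keys` helper are hypothetical, and GLOB is used only as a convenient case-sensitive operator for the demonstration; the commit does not say how the server implements the setting. Wildcard characters in the prefix are not escaped here.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE file (dkey TEXT)")
db.executemany("INSERT INTO file VALUES (?)", [("Foo1",), ("foo2",), ("FOO3",)])


def list_keys(prefix, case_sensitive):
    if case_sensitive:
        # GLOB is always case-sensitive in SQLite.
        sql = "SELECT dkey FROM file WHERE dkey GLOB ? ORDER BY dkey"
        pattern = prefix + "*"
    else:
        # LIKE ignores ASCII case in SQLite by default.
        sql = "SELECT dkey FROM file WHERE dkey LIKE ? ORDER BY dkey"
        pattern = prefix + "%"
    return [row[0] for row in db.execute(sql, (pattern,))]


print(list_keys("Foo", case_sensitive=False))  # ['FOO3', 'Foo1', 'foo2']
print(list_keys("Foo", case_sensitive=True))   # ['Foo1']
```

In the case-insensitive listing, a key such as `foo2` is the kind of "after" value that matches the "Foo" prefix only case-insensitively, which is the situation the commit message ties to after_mismatch errors.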

Commits on Jun 21, 2012

  1. t/02-host-device: unit tests for device/host state checks

    Device state reporting functions should respect whatever the
    underlying Host state is.
    Eric Wong authored and dormando committed Jun 21, 2012
    f2aca49
  2. Delete memcache data when we replicate fids

    Because memcache TTL is now user configurable, data in memcached might
    be valid for a long time, and as such invalid paths might be returned.
    
    It would be possible to populate memcache instead of just removing
    entries, but that might be wasteful: when a device is marked as dead,
    the replicated fids might not need to be in memcached.
    
    There is still one TODO left. If someone modifies mindevcount and runs
    FSCK, then the mappings might become incorrect, but I reasoned that it
    would be rather rare.
    Pyry Hakulinen authored and dormando committed Jun 21, 2012
    2ec4798

Commits on Jun 22, 2012

  1. Checking in changes prior to tagging of version 2.64.

    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index 07790f3..c552089 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,15 @@
    +2012-06-21: Release version 2.64
    +
    +   * Delete memcache data when we replicate fids (Pyry Hakulinen <[email protected]>)
    +
    +   * implement "case_sensitive_list_keys" server setting (Eric Wong <[email protected]>)
    +
    +   * get_paths: deprioritize devs in "drain" state (Eric Wong <[email protected]>)
    +
    +   * make marking a host down cause devices to act as down (Eric Wong <[email protected]>)
    +
    +   * monitor skips hosts marked dead or down (Eric Wong <[email protected]>)
    +
     2012-05-29: Release version 2.63
    
        * Critical bugfix for Postgres users introduced by 2.61. New file uploads
    dormando committed Jun 22, 2012
    23322a1

Commits on Aug 12, 2012

  1. When a mogstored is down, die with a more informative message.

    Dave Lambley authored and dormando committed Aug 12, 2012
    6750311
  2. iostat: allow MOG_IOSTAT_CMD env override

    This makes it easier to test mock or alternative iostat
    implementations.  This can be used for emulating the
    iostat output on other platforms.
    Eric Wong authored and dormando committed Aug 12, 2012
    159cb25
  3. iostat: increase flexibility of iostat parser

    The parser now looks for contiguous lines of statistics and (if
    it's previously captured stats) emits whenever the first
    non-stats line appears.  Relying on the "Device:" line is not
    portable to FreeBSD (and possibly other iostat implementations).
    The parser also ignores leading/trailing whitespace on each
    statistics line.
    
    Tested on Linux (sysstat 10.0.5) and FreeBSD 9.
    
    For testing iostat output on FreeBSD, I used MOG_IOSTAT_CMD
    like this on my GNU/Linux system:
    
      MOG_IOSTAT_CMD="ssh fbsd9vm iostat -dx 1 30" mogstored ...
    
    ref: http://code.google.com/p/mogilefs/issues/detail?id=9
    Eric Wong authored and dormando committed Aug 12, 2012
    6f0a20a
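
The parsing strategy in the iostat commit above (collect contiguous statistics lines, emit on the first non-statistics line, ignore surrounding whitespace) can be sketched in a few lines of Python. What counts as a "statistics line" is an assumption here: a device name starting with a letter, followed by one or more numeric columns. `parse_iostat` and the sample input are hypothetical and do not reproduce mogstored's parser.

```python
import re

# Assumption: a statistics line is a device name followed by numeric columns.
STATS_LINE = re.compile(r"^([A-Za-z]\S*)((?:\s+\d+(?:\.\d+)?)+)$")


def parse_iostat(lines):
    """Yield one {device: [floats]} dict per contiguous run of stats lines."""
    stats = {}
    for raw in lines:
        line = raw.strip()  # ignore leading/trailing whitespace
        match = STATS_LINE.match(line)
        if match:
            stats[match.group(1)] = [float(x) for x in match.group(2).split()]
        elif stats:
            # First non-stats line after a run: emit what was captured.
            yield stats
            stats = {}


# Made-up sample mixing a Linux-style and a FreeBSD-style header.
sample = [
    "Device:  rrqm/s  wrqm/s  %util",
    "  sda    0.00    1.50    12.30  ",
    "sdb 0.00 0.00 0.00",
    "",
    "device  r/s  w/s  %b",
    "ada0  1.0  2.0  3",
    "",
]
for report in parse_iostat(sample):
    print(report)
```

Because the header line is never inspected, the same loop handles both header styles; a run that is still open when the input ends is not emitted, which matches emitting only when a non-statistics line appears.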
  4. remove old rebalance knobs from server settings

    This hasn't been used since the old rebalance code was nuked
    for 2.40 (commit 0be2f97)
    Eric Wong authored and dormando committed Aug 12, 2012
    b70fef9
  5. fix tests when /etc/mogilefs/mogstored.conf exists

    mogstored gains a --skipconfig switch which we use in tests to
    ignore the default config file.  mogilefsd has had this switch
    (with identical semantics) since 2004.
    Eric Wong authored and dormando committed Aug 12, 2012
    6e47a56

Commits on Aug 13, 2012

  1. tests: add basic test for reaper

    Reaper isn't tested anywhere else.  We plan on changing it
    slightly so ensure we don't introduce regressions.
    Eric Wong authored and dormando committed Aug 13, 2012
    8c4554c
  2. reaper: factor out reap_fid sub from the work loop

    A subroutine with a 20-line comment deserves to be its own sub.
    This will make it easier to see what the future reaper lock
    will guard without needing to scroll on small terminals.
    Eric Wong authored and dormando committed Aug 13, 2012
    15eca7a
  3. reaper: global lock around DB interaction

    We don't want multiple reaper processes stepping on each other
    during UPDATE/INSERT, causing needless conflicts/failures at the
    DB level for every single FID.
    
    JobMaster already locks its queues in a similar way to prevent
    conflicts, so this should not noticeably harm performance (and
    may improve performance due to the DB conflict reduction).
    Eric Wong authored and dormando committed Aug 13, 2012
    57a5099
  4. reaper: add "queue_rate_for_reaper" server setting

    This controls the number of FIDs the reaper can inject into the
    replication queue for each dead device, per wakeup.
    
    This defaults to 1000, the same value it's had since (at least) 2006.
    Eric Wong authored and dormando committed Aug 13, 2012
    8cc75d8
  5. move ENDOFTIME constant from replicate to store

    This will make it easier to reuse this constant in other
    workers that can check the queues (e.g. job_master/reaper).
    Eric Wong authored and dormando committed Aug 13, 2012
    e51ee46
  6. reaper: add queue_size_for_reaper server setting

    Users may now configure the queue_size_for_reaper server setting to
    limit the size of the non-urgent replication queue.
    
    The urgent replication queue (nexttry == 0) is unaffected,
    as are other processes (fsck) which may inject into the
    replication queue.
    
    The default remains unlimited: the reaper will queue as fast as
    it possibly can, 1000 FIDs every 5 seconds (per process).
    Eric Wong authored and dormando committed Aug 13, 2012
    0715d95
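
How the two reaper settings above might interact can be sketched as a small calculation. This is a guess at the arithmetic, not the reaper's code: `fids_to_queue` is a hypothetical helper, and treating queue_size_for_reaper as a cap on the current non-urgent queue length is an assumption drawn from the commit messages.

```python
def fids_to_queue(queue_rate, queue_size_limit, current_queue_size):
    """How many FIDs one wakeup may inject for a dead device.

    queue_size_limit=None models the default (unlimited) setting.
    """
    if queue_size_limit is None:
        return queue_rate
    room = max(0, queue_size_limit - current_queue_size)
    return min(queue_rate, room)


print(fids_to_queue(1000, None, 50000))  # default: always the full rate
print(fids_to_queue(1000, 1500, 1000))   # only 500 slots left under the cap
print(fids_to_queue(1000, 1500, 2000))   # queue already over the cap
```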
  7. reaper: switch to Danga::Socket for scheduling

    Reaper now schedules the first batch of files on a newly-dead
    device for replication after delaying (on the reaper itself) for
    DEVICE_SUMMARY_CACHE_TIMEOUT+1 (16) seconds.  This allows all
    subsequent files to replicate sooner, without the 16s delay.
    Since Danga::Socket is used to schedule this 16s delay, reaping
    of other dead devices won't be impacted by this delay.
    
    The DEVICE_SUMMARY_CACHE_TIMEOUT+1 delay still remains to
    offer a small level of protection against replicators with
    out-of-date internal device caches and writable-but-"dead"
    devices.
    
    As an additional countermeasure against out-of-date device
    caches, reapers will now slowly back off of a device over the
    course of 4 hours after it fails to find new, unreaped FIDs.
    Previously, any files that got replicated to an already "dead"
    device would remain there until a reaper restart.
    Eric Wong authored and dormando committed Aug 13, 2012
    6218dc7
  8. reaper: remove update_devcount call

    Since reaper now schedules replications with the same priority as
    fsck, we will also rely on the replicator to call
    update_devcount for us, allowing us to avoid making an
    unnecessary write to the database.
    Eric Wong authored and dormando committed Aug 13, 2012
    f015219
  9. reaper: better handling of lock failures

    The delay backoff should only occur if we got a successful lock.
    Backing off the delay on lock failure can result in the
    delay becoming undef and (incorrectly) making the reaper give up
    on a device.
    
    Fortunately, lock failures with the extremely long (60s) lock
    timeout are unlikely to be a problem in practice.
    Eric Wong authored and dormando committed Aug 13, 2012
    84dee81

Commits on Aug 14, 2012

  1. Postgres advisory lock instead of table-based lock

    Update Pg locking code to use Postgres advisory locks. Now requires
    Postgres 8.4 as min version.
    
    Signed-off-by: Robin H. Johnson <[email protected]>
    robbat2 authored and dormando committed Aug 14, 2012
    208b43f
  2. Cleanup lock timeout sleep location per Eric.

    Signed-off-by: Robin H. Johnson <[email protected]>
    robbat2 authored and dormando committed Aug 14, 2012
    21a6694
  3. Checking in changes prior to tagging of version 2.65.

    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index c552089..64455f4 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,28 @@
    +2012-08-13: Release version 2.65
    +
    +   * Postgres advisory lock instead of table-based lock (Robin H. Johnson <[email protected]>)
    +     Now requires minimum Postgres version of pg8.4.
    +
    +   * reaper: switch to Danga::Socket for scheduling (Eric Wong <[email protected]>)
    +
    +   * reaper: add queue_size_for_reaper server setting (Eric Wong <[email protected]>)
    +
    +   * reaper: add "queue_rate_for_reaper" server setting (Eric Wong <[email protected]>)
    +     defaults to 1000, same as previously.
    +
    +   * reaper: global lock around DB interaction (Eric Wong <[email protected]>)
    +     prevents reapers clobbering each other, causing a reduction in DB writes.
    +
    +   * tests: add basic test for reaper (Eric Wong <[email protected]>)
    +
    +   * fix tests when /etc/mogilefs/mogstored.conf exists (Eric Wong <[email protected]>)
    +
    +   * iostat: increase flexibility of iostat parser (Eric Wong <[email protected]>)
    +
    +   * iostat: allow MOG_IOSTAT_CMD env override (Eric Wong <[email protected]>)
    +
    +   * When a mogstored is down, die with a more informative message. (Dave Lambley <[email protected]>)
    +
     2012-06-21: Release version 2.64
    
        * Delete memcache data when we replicate fids (Pyry Hakulinen <[email protected]>)
    dormando committed Aug 14, 2012
    fae9663

Commits on Nov 3, 2012

  1. 823becc
  2. 7205d52
  3. f8cfff6
  4. [#58] disable logging and move the pid to the data docroot to make nginx backend less architecture dependent

    frett committed Nov 3, 2012
    33fef38

Commits on Nov 12, 2012

  1. [#58] remove a couple unnecessary configuration directives per Gernot's recommendation
    
    tcp_nodelay defaults to on, so there is no need to specify it
    remove unnecessary error_page config, there is no need for pretty error pages
    frett committed Nov 12, 2012
    9424a04
  2. [#58] only specify the root once in the server directive instead of for each configured location

    frett committed Nov 12, 2012
    bc37655
  3. [#58] relocate the prefix directory to keep nginx from conflicting with other running copies. Thanks Gernot for the heads up

    frett committed Nov 12, 2012
    d69de20
  4. 0d6e7cd
  5. 88a39f3

Commits on Nov 13, 2012

  1. 7904b38
  2. d31497a
  3. [#58] relocate all temp_path's to a temp path specific to mogstored

    this attempts to prevent conflicts with other running instances of nginx
    frett committed Nov 13, 2012
    1510088
  4. 5a635e6

Commits on Dec 23, 2012

  1. 02ed229

Commits on Dec 24, 2012

  1. if one really wants to be root - let him be

    notti authored and frett committed Dec 24, 2012
    db1187d

Commits on Jan 5, 2013

  1. Fix "skip_devcount" during rebalance

    We were updating the devcount field even when skip_devcount was true. We
    should not use $sto here because we already have an FID object and a nice
    method available for this.
    
    Signed-off-by: Eric Wong <[email protected]>
    Pyry Hakulinen authored and Eric Wong committed Jan 5, 2013
    7c89846
  2. fix use_dest_devs for rebalance

    The caller expects an array ref; currently, using use_dest_devs kills
    JobMaster with:
    
    Oct 18 09:45:25 storage22 mogilefsd[23263]: crash log: rebalance cannot find suitable destination devices at /usr/local/share/perl/5.10.1/MogileFS/Worker/JobMaster.pm line 233
    Oct 18 09:45:26 storage22 mogilefsd[22044]: Child 23263 (job_master) died: 256 (UNEXPECTED)
    
    Signed-off-by: Eric Wong <[email protected]>
    Pyry Hakulinen authored and Eric Wong committed Jan 5, 2013
    7bc3c35

Commits on Jan 6, 2013

  1. prevent reqid mismatches (and queryworker death)

    On certain errors, the queryworker may send two "ERR" responses, causing
    the ProcManager to terminate the queryworker upon reading the second
    response if the queryworker is immediately fed another query.
    
    This can affect busy setups, but is also easy to reproduce with a single
    queryworker that's receiving a pipelined request to an
    invalid/non-existent domain:
    
    	(
    		printf 'list_keys domain=\r\nlist_keys domain=\r\n'
    		sleep 2
    	) | socat - TCP:127.0.0.1:7001
    
    The queryworker strace will look like this (writing 4 lines):
    
      write(14, "4981-1 0.0005 ERR no_domain No+domain+provided\r\n", 48) = 48
      write(14, "4981-1 ERR domain_not_found Domain+not+found\r\n", 46) = 46
      write(14, "4981-2 0.0005 ERR no_domain No+domain+provided\r\n", 48) = 48
      write(14, "4981-2 ERR domain_not_found Domain+not+found\r\n", 46) = 46
    
    And a message like this will appear for "!watch" users:
    
      Worker responded with id <undef> (line: [4981-1 ERR domain_not_found Domain+not+found]), but expected id 4981-2, killing
    
    This is because ProcManager immediately calls NoteIdleQueryWorker upon
    writing the first ERR response to the client (at the end of
    HandleQueryWorkerResponse).  This means the idle query worker may
    immediately start processing a second request before the ProcManager has
    a chance to process the second ERR response line (from the first
    request).
    
    Preventing err_line() from calling send_to_parent() with "ERR"
    if querystarttime is undef prevents this issue, but there may be
    better ways to fix this bug.  A similar, preventative fix may be
    appropriate for ok_line().
    Eric Wong committed Jan 6, 2013
    ff6ac2c
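
The guard described at the end of the commit above (do not send "ERR" when querystarttime is undef) can be modeled in a few lines. This Python sketch is a toy, not the queryworker: the class, `start_query`, and the `sent` list that stands in for send_to_parent() are hypothetical, and clearing the start time after the first response is an assumption about how the flag gets reset.

```python
class QueryWorker:
    def __init__(self):
        self.reqid = None
        self.querystarttime = None
        self.sent = []  # stands in for send_to_parent()

    def start_query(self, reqid, now):
        self.reqid = reqid
        self.querystarttime = now

    def err_line(self, code, message, now):
        if self.querystarttime is None:
            # A response for this request already went out; a second
            # line would be read as the reply to the next request.
            return
        elapsed = now - self.querystarttime
        self.sent.append("%s %.4f ERR %s %s" % (self.reqid, elapsed, code, message))
        self.querystarttime = None


worker = QueryWorker()
worker.start_query("4981-1", 0.0)
worker.err_line("no_domain", "No+domain+provided", 0.0005)
worker.err_line("domain_not_found", "Domain+not+found", 0.0006)  # suppressed
print(worker.sent)
```

With the guard in place each request produces exactly one response line, so the parent never sees a stray line while it is waiting on the next request id.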
  2. test: expose try_for() as a common test function

    This saves us from reinventing it in every test and will
    help us detect stuck tests more easily.
    Eric Wong committed Jan 6, 2013
    3cebc87
  3. t/50-checksum.t: use common try_for() function

    I've had this test get stuck intermittently in different
    places; this should make it easier to track down stuck
    tests.
    Eric Wong committed Jan 6, 2013
    3be61cf
  4. t/50-checksum.t: ensure replicate worker is really down

    Workers do not receive nor respond to messages as soon as
    the ProcManager dispatches the request to stop/start them,
    so wait until ProcManager no longer knows about a process
    before proceeding.
    Eric Wong committed Jan 6, 2013
    9614260
  5. checksum: avoid comparison on uninitialized value

    $class->{hashtype} is undef by default for classes where no
    checksums are configured.
    
    Since clients can force checksum verification in create_close
    regardless of class, we can end up with a Checksum object for
    FIDs regardless of which class the FID is in.
    Eric Wong committed Jan 6, 2013
    9457434
  6. query: allow "0" key on all commands which take keys

    delete, file_info, get_paths, rename, file_debug, updateclass
    were all broken when handling a key named "0".
    Eric Wong committed Jan 6, 2013
    ff5a3da
  7. fsck: use replicate lock when fixing FID

    We need to ensure neither replicate (nor delete) are changing
    the devids list when fixing an FID.  This should ensure we're
    safely modifying the devid list for a given FID when forgetting
    about bad ones.
    Eric Wong committed Jan 6, 2013
    7851e44
  8. fsck: skip non-existent FIDs properly

    We should not waste time stat()-ing FIDs that no longer
    exist at all.
    Eric Wong committed Jan 6, 2013
    a549b00
  9. improve handling of classes which change hash algorithm

    In replicate, we validate via existing FID checksum regardless
    of class.  This failed when the class.hashtype was altered after
    uploading but before replication.
    
    The following sequence of events caused replication to fail:
    
      1. class.hashtype = MD5
      2. FID created and stores MD5:...
      3. FID enqueued for replication
      4. class.hashtype = NONE
      5. FID begins replicating
    
    Replication (Step 5) failed since the existing MD5 digest is
    trying to validate against a (now) non-existent class digest.
    Since we stored the checksum in the database anyway, calculate
    and validate it regardless, as an admin may have wanted to
    alter a class only temporarily, not permanently.
    
    An admin may also decide to switch checksum algorithms.
    Fsck now logs hash algorithm mismatches as "BALG" and
    emits a descriptive message to syslog
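    
    The validation rule can be illustrated with a short Python sketch
    (MogileFS itself is Perl). The "ALG:hexdigest" string format, the
    verify_copy() name and the treatment of a class with no hashtype
    (no BALG note) are assumptions made for the example; only the idea
    of trusting the stored checksum's algorithm and flagging mismatches
    as "BALG" comes from the message above.

```python
import hashlib
from typing import List, Optional, Tuple


def verify_copy(
    data: bytes, stored: str, class_hashtype: Optional[str]
) -> Tuple[bool, List[str]]:
    """Validate `data` against a stored "ALG:hexdigest" checksum.

    The algorithm comes from the stored checksum, not from the class, so
    a class whose hashtype was changed (or unset) still validates.
    """
    alg, _, expected = stored.partition(":")
    notes: List[str] = []
    if class_hashtype is not None and class_hashtype.upper() != alg.upper():
        # Class and stored checksum disagree on the algorithm.
        notes.append("BALG")
    actual = hashlib.new(alg.lower(), data).hexdigest()
    return actual == expected.lower(), notes
```

    In the five-step sequence above, step 5 would call
    verify_copy(data, "MD5:...", None) and still validate the copy
    against the MD5 digest recorded in step 2.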
    Eric Wong committed Jan 6, 2013
    25e344a
  10. reaper: validate DB connection before reaping

    This helps prevent the reaper process from dying if
    the DB disconnected us for idleness.
    
    This should fix #75: ("reaper dies if DB connection closes")
    http://code.google.com/p/mogilefs/issues/detail?id=75
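    
    The validate-before-use pattern can be sketched in Python with the
    standard sqlite3 module. The Store class, validate_connection() and
    reap_once() are hypothetical names for this example and do not
    reflect the actual Perl implementation; a closed SQLite handle
    stands in for a server-side idle disconnect.

```python
import sqlite3


class Store:
    """Holds a DB handle and re-creates it if a liveness probe fails."""

    def __init__(self, path: str = ":memory:") -> None:
        self.path = path
        self.reconnects = 0
        self.dbh = sqlite3.connect(path)

    def validate_connection(self) -> None:
        try:
            self.dbh.execute("SELECT 1").fetchone()
        except sqlite3.Error:
            # The handle is unusable (e.g. closed); replace it.
            self.dbh = sqlite3.connect(self.path)
            self.reconnects += 1


def reap_once(store: Store) -> int:
    """One reaper pass: validate first, then run the real query."""
    store.validate_connection()
    return store.dbh.execute("SELECT 42").fetchone()[0]
```

    The probe costs one trivial query per pass, which is cheap for a
    worker that wakes up infrequently.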
    Eric Wong committed Jan 6, 2013
    811c621
  11. t/30-rebalance: remove redundant try_for() function

    I missed this when moving try_for() into Test.pm
    Eric Wong committed Jan 6, 2013
    f79d69e

Commits on Jan 7, 2013

  1. t/00-startup: fix updateclass test

    We called MogileFS::Client::update_class incorrectly without
    the key.  Additionally, the test for checking the number of
    copies was also incorrect.
    Eric Wong authored and dormando committed Jan 7, 2013
    d9da3b2
  2. support updating the class to the default class which has an id of 0

    [ew: added trivial test]
    Signed-off-by: Eric Wong <[email protected]>
    frett authored and dormando committed Jan 7, 2013
    affc065
  3. add a hook to cmd_updateclass

    Signed-off-by: Eric Wong <[email protected]>
    frett authored and dormando committed Jan 7, 2013
    79e1c5b
  4. Checking in changes prior to tagging of version 2.66.

    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index 64455f4..bd7e38a 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,29 @@
    +2013-01-06: Release version 2.66
    +
    +   * add a hook to cmd_updateclass (Daniel Frett <[email protected]>)
    +
    +   * support updating the class to the default class which has an id of 0 (Daniel Frett <[email protected]>)
    +
    +   * reaper: validate DB connection before reaping (Eric Wong <[email protected]>)
    +     Fixes occasional crash in reaper process.
    +
    +   * improve handling of classes which change hash algorithm (Eric Wong <[email protected]>)
    +
    +   * fsck: skip non-existent FIDs properly (Eric Wong <[email protected]>)
    +
    +   * fsck: use replicate lock when fixing FID (Eric Wong <[email protected]>)
    +
    +   * query: allow "0" key on all commands which take keys (Eric Wong <[email protected]>)
    +
    +   * prevent reqid mismatches (and queryworker death) (Eric Wong <[email protected]>)
    +     Fixes crash case with specific error types.
    +
    +   * fix use_dest_devs for rebalance (Pyry Hakulinen <[email protected]>)
    +     Fixes "use_dest_devs" argument during rebalance.
    +
    +   * Fix "skip_devcount" during rebalance (Pyry Hakulinen <[email protected]>)
    +     Now actually skips updating devcount column during rebalance.
    +
     2012-08-13: Release version 2.65
    
        * Postgres advisory lock instead of table-based lock (Robin H. Johnson <[email protected]>)
    dormando committed Jan 7, 2013
    37ab849

Commits on Jan 9, 2013

  1. Reseed the random number generator after forking.

    Signed-off-by: Eric Wong <[email protected]>
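    
    The commit gives no detail beyond its title, so the following is
    only a Python illustration of why reseeding after fork matters, not
    the Perl change itself. Forking is simulated by copying the
    parent's generator state; inherited_child_rngs() is a hypothetical
    helper.

```python
import os
import random
from typing import List


def inherited_child_rngs(
    parent: random.Random, children: int, reseed: bool
) -> List[random.Random]:
    """Simulate forked children that each inherit the parent's RNG state."""
    rngs = []
    for _ in range(children):
        child = random.Random()
        child.setstate(parent.getstate())  # what fork() effectively does
        if reseed:
            child.seed(os.urandom(16))  # fresh entropy per child
        rngs.append(child)
    return rngs
```

    Without reseeding, every child produces the same "random" sequence,
    so choices that are meant to spread load (such as picking among
    equally good devices) would be made identically by all workers.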
    Dave Lambley authored and Eric Wong committed Jan 9, 2013
    b656875
  2. Pull out device sorting into its own method for overriding.

    Signed-off-by: Eric Wong <[email protected]>
    Dave Lambley authored and Eric Wong committed Jan 9, 2013
    70c8d58
  3. Do both sorts in one method, to save on shared initialisation.

    Signed-off-by: Eric Wong <[email protected]>
    Dave Lambley authored and Eric Wong committed Jan 9, 2013
    818f802
  4. worker: set monitor_has_run flag at initialization

    This way the queryworker will know it won't have to
    wait again for the monitor to run.  This allows
    users to (manually) set higher intervals in Monitor.pm
    without noticing ill effects.
    Eric Wong committed Jan 9, 2013
    92046c1
  5. monitor: remove dead iostats code/comments

    * set_observed_utilization() is a no-op, and can be
      safely removed.
    
    * Looking up the device via factory does not incur DB hit
      since the factory changes of May 2011.
    
    * Really avoids propagating invalid devids with correct ordering
      of the hash assignment.  This prevents an invalid devid from
      hitting even the {devutil}->{cur} hash which lasts the
      lifetime of the monitor process.
    Eric Wong committed Jan 9, 2013
    decedfe
  6. ProcManager: favor using recently-used queryworkers

    As HTTP/1.1 servers tend to disconnect idle connections over
    time, recently-used queryworkers are more likely to have
    reusable HTTP connections.  This can reduce the number of open
    HTTP sockets across the cluster during non-peak periods.
    
    This may improve performance in two ways:
    
    * recently-used worker processes should have better memory locality
    
    * can avoid the chance for TCP slow-start-after-idle behavior
      to kick in for DB connections.
    
    The downside of this patch is memory/CPU usage between workers
    may appear lopsided and probably confuse users.  This change
    should also make potential memory leaks more noticeable.
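    
    The scheduling choice can be sketched as a stack of idle workers.
    This is a Python illustration with hypothetical names
    (IdleWorkerPool, note_idle, take); the real ProcManager is Perl and
    its data structures may differ.

```python
from typing import List, Optional


class IdleWorkerPool:
    """Idle workers kept as a stack: the most recently idle one is reused first."""

    def __init__(self) -> None:
        self._idle: List[str] = []

    def note_idle(self, worker: str) -> None:
        self._idle.append(worker)

    def take(self) -> Optional[str]:
        # pop() from the end gives LIFO order; pop(0) would give FIFO
        # and rotate work evenly across all workers instead.
        return self._idle.pop() if self._idle else None
```

    Under light load the same worker is taken and returned repeatedly,
    which is the lopsided usage the message warns about.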
    Eric Wong committed Jan 9, 2013
    ca46b42
  7. disable Nagle's algorithm for accepted clients

    Normally, disabling Nagle's algorithm would have little effect
    on typical MogileFS traffic:
    
       < read one request from client
       - process request in queryworker
       > write one response to client
       < read one request from client
       - process request in queryworker
       > write one response to client
       < read one request from client
       - process request in queryworker
       > write one response to client
       ...
    
    However, in certain cases, clients may pipeline requests (and
    sort responses on the client side).  This causes tracker traffic
    to end up like this:
    
       < read multiple requests from client
       - process requests in parallel on multiple queryworkers
       > write one response to client
       > write one response to client
       > write one response to client
       ...
    
    Since Nagle's algorithm waits for an ACK from each response the
    server writes before sending the next response, it limits the
    rate at which the client can receive responses.
    
    Informal testing over loopback running the "file_info" command
    on two batches of 1000 keys each (2000 keys total) consistently
    reveals a small, ~20-60ms reduction (580-600ms -> 540-580ms) on
    a somewhat active machine with four queryworkers (and four
    cores).
    
    Like SO_KEEPALIVE, TCP_NODELAY is inherited from the listener by
    accepted sockets in every system I've checked, so there's no
    additional overhead in userspace when accepting new clients.
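    
    Setting the option once on the listening socket might look like
    this in Python (the tracker itself is Perl). make_listener() and
    nodelay_enabled() are hypothetical helpers; whether accepted
    sockets inherit the option is platform behavior, as noted above,
    and is not guaranteed by this sketch.

```python
import socket


def make_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Create a listening TCP socket with Nagle's algorithm disabled."""
    lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Set once on the listener; the commit message reports that accepted
    # sockets inherited the option on every system its author checked.
    lsock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    lsock.bind((host, port))
    lsock.listen(128)
    return lsock


def nodelay_enabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is set on `sock`."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

    Calling nodelay_enabled() on a socket returned by accept() is a
    quick way to confirm the inheritance on a given platform.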
    Eric Wong committed Jan 9, 2013
    ee5f196
  8. sqlite: use immediate transactions to prevent busy errors

    The default (deferred) transaction mode in SQLite delays
    locking, potentially leading to "database is locked" errors on
    concurrent access.  Immediate transactions lock the database
    immediately, preventing unnecessary errors at the cost of
    reduced concurrency.
    
    I've still occasionally encountered a "database is locked"
    or two on my SQLite deployment with many workers over the
    months.
    
    Tested on MySQL, Postgres, and DBD::SQLite 1.29 and 1.37.
    This feature appeared in DBD::SQLite 1.30, but the extra
    attribute for DBI->connect is harmless for drivers which
    do not support this attribute.
    
    ref: http://search.cpan.org/dist/DBD-SQLite/lib/DBD/SQLite.pm
    
    Using the following instrumentation patch, I have not hit
    busy/locked errors while putting my SQLite-based MogileFS
    instance through heavy activity (fsck, uploads, deletes):
    
      --- a/lib/MogileFS/Store/SQLite.pm
      +++ b/lib/MogileFS/Store/SQLite.pm
      @@ -164,7 +164,12 @@ use constant SQLITE_LOCKED => 6; # A table in the database is locked
       sub was_deadlock_error {
           my $err = $_[0]->dbh->err or return 0;
    
      -    ($err == SQLITE_BUSY || $err == SQLITE_LOCKED);
      +    if ($err == SQLITE_BUSY || $err == SQLITE_LOCKED) {
      +       Mgd::log('info', "DB locked");
      +       1;
      +    } else {
      +       0;
      +    }
       }
    
       sub was_duplicate_error {
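    
    The effect of immediate transactions described in this message can
    be reproduced with Python's sqlite3 module. This sketch issues
    BEGIN IMMEDIATE explicitly rather than using the DBI connect
    attribute; open_db() and second_writer_is_blocked() are
    hypothetical helpers.

```python
import os
import sqlite3
import tempfile


def open_db(path: str) -> sqlite3.Connection:
    # isolation_level=None: we issue BEGIN/ROLLBACK ourselves.
    # timeout=0: fail at once instead of waiting on a busy database.
    return sqlite3.connect(path, timeout=0, isolation_level=None)


def second_writer_is_blocked(path: str) -> bool:
    """Return True if a second BEGIN IMMEDIATE fails while the first is open."""
    first, second = open_db(path), open_db(path)
    try:
        first.execute("BEGIN IMMEDIATE")  # takes the write lock right away
        try:
            second.execute("BEGIN IMMEDIATE")
        except sqlite3.OperationalError:
            return True  # contention surfaces here, before any work is done
        second.execute("ROLLBACK")
        return False
    finally:
        if first.in_transaction:
            first.execute("ROLLBACK")
        first.close()
        second.close()
```

    The second writer learns about the conflict at BEGIN time, where a
    retry is simple, instead of partway through a deferred transaction.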
    Eric Wong committed Jan 9, 2013
    86129e5

Commits on Jan 11, 2013

  1. Merge remote-tracking branch 'pull/26/head' into next-nginx

    * pull/26/head:
      if one really wants to be root - let him be
      Moved utf-8 config to http block
      [#58] support nginx server type in command line options
      [#58] relocate all temp_path's to a temp path specific to mogstored
      [#58] utilize non-daemon mode for nginx >= 1.0.9
      [#58] die if nginx fails to start
      clean up formatting, no functional changes
      [#58] store the nginx pid in the prefix dir and reduce the scope of the variable
      [#58] relocate the prefix directory to keep nginx from conflicting with other running copies. Thanks Gernot for the heads up
      [#58] only specify the root once in the server directive instead of for each configured location
      [#58] remove a couple unnecessary configuration directives per Gernot's recommendation
      [#58] disable logging and move the pid to the data docroot to make nginx backend less architecture dependent
      [#58] fix the code generating sections for each device
      [#58] load the Nginx server file so it can be used
      [#58] load the latest version of the nginx module
    Eric Wong committed Jan 11, 2013
    3ab1d90

Commits on Jan 12, 2013

  1. nginx: additional version check for uwsgi and scgi

    Debian squeeze (stable as of 2013/01) uses nginx 0.7.67, so
    there are likely many users still using this older version.
    Attempting to specify a dummy {uwsgi,scgi}_temp_path causes
    errors at startup for me.
    
    According to the nginx CHANGES file, uwsgi appeared in 0.8.40
    and scgi appeared in 0.8.42.
    
    ref: http://nginx.org/en/CHANGES
    Eric Wong committed Jan 12, 2013
    b05bced
  2. use the ngx_version function for determining non-daemon support

    also, only calculate the actual version once
    frett committed Jan 12, 2013
    80c09e2

Commits on Jan 13, 2013

  1. move checksum and tempfile delete to delete worker

    This removes two DB calls from the latency-critical queryworker
    process.  This may widen a race condition with reused explicit
    FIDs, but explicit FIDs are a bad idea anyways and reusing
    FIDs likely had problems before this change.
    
    I've also removed the Postgres-specific delete_fidid()
    function.  Commit 7dbfb44 (Make postgres use new delete worker
    code) removed the
    Postgres-specific code path and made it functionally
    identical to the generic version.
    Eric Wong authored and dormando committed Jan 13, 2013
    c8942c0
  2. httpfile: avoid killing worker on down sidechannel

    The lack of a mogstored sidechannel listener should not
    be fatal to a replication worker (or any other worker).
    This bug only affects checksums users who misconfigure
    mogstored.
    Eric Wong authored and dormando committed Jan 13, 2013
    81b0067

Commits on Jan 15, 2013

  1. typo fix with root check in nginx module

    Signed-off-by: Eric Wong <[email protected]>
    notti authored and Eric Wong committed Jan 15, 2013
    e5b0b0b

Commits on Jan 17, 2013

  1. query: avoid redundant calls to err_line()

    Additionally, log redundant calls to err_line so we have a
    chance at figuring out what is causing redundant calls to
    err_line()
    Eric Wong committed Jan 17, 2013
    eee892b
  2. query: fix error reporting for _do_fsck_reset

    Failed set_server_settings calls just die with errors, so we
    need to track and log that to syslog.  We'll also report we had
    a database error back to the client (but avoid propagating the
    exact error message, in case there is any sensitive
    information).
    Eric Wong committed Jan 17, 2013
    3ff5eec

Commits on Jan 18, 2013

  1. debian/control: sysstat contains /usr/bin/iostat

    Signed-off-by: Eric Wong <[email protected]>
    Dave Lambley authored and Eric Wong committed Jan 18, 2013
    2aac660
  2. Filter the devices before we do an expensive sort.

    Signed-off-by: Eric Wong <[email protected]>
    Dave Lambley authored and Eric Wong committed Jan 18, 2013
    c23cc6b
  3. mogstored: fix kqueue usage with daemonization

    Calling Mogstored::HTTPServer::Perlbal->start() creates a kqueue
    descriptor.  kqueue descriptors are invalidated across fork,
    so we must avoid kqueue creation until after daemonization.
    
    We continue starting non-Perlbal HTTP servers before
    daemonization, as error reporting can be easier if stderr/stdout
    are not redirected to /dev/null.
    
    ref: http://code.google.com/p/mogilefs/issues/detail?id=72
    Cc: [email protected]
    Eric Wong committed Jan 18, 2013
    d8de2ba
  4. postgres: remove Pg-specific create_class

    This version is still racy after several years.  More importantly,
    it's missing the change in commit 5d01811
    which allows the default class to be overridden.
    Eric Wong committed Jan 18, 2013
    857f2cd
  5. store: wrap create_class in a transaction to avoid races

    Race conditions in create_class are unlikely to be a problem
    in normal usage, but this will discourage code duplication
    which can lead to maintainability issues.
    Eric Wong committed Jan 18, 2013
    d92e2a1
  6. domain removal also removes its default class

    A default class may enter the class table if its settings
    (e.g. mindevcount) are altered.  The queryworker does not
    allow removing the default class, so the only way to remove
    it is to remove it when the domain goes away.
    Eric Wong committed Jan 18, 2013
    7eb1674

Commits on Jan 19, 2013

  1. tests: add "!want <count> <jobclass>" helper

    This is used in several places, and will make code
    easier to maintain going forward.
    Eric Wong committed Jan 19, 2013
    61e1d3b
  2. reaper: ensure worker can be stopped via "!want"

    Now, all of our job classes may be controlled via "!want"
    Eric Wong committed Jan 19, 2013
    6c9f5f6

Commits on Feb 3, 2013

  1. Serialize tempfile reaping

    Prevents dogpiling on some slowish queries if your DB is hosed, or if you have
    many tempfile rows that need processing.
    dormando committed Feb 3, 2013
    e3f7601
  2. Checking in changes prior to tagging of version 2.67.

    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index bd7e38a..f0f578c 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,36 @@
    +2013-02-02: Release version 2.67
    +
    +   * Serialize tempfile reaping (dormando <[email protected]>)
    +
    +   * reaper: ensure worker can be stopped via "!want" (Eric Wong <[email protected]>)
    +
    +   * domain removal also removes its default class (Eric Wong <[email protected]>)
    +
    +   * store: wrap create_class in a transaction to avoid races (Eric Wong <[email protected]>)
    +
    +   * mogstored: fix kqueue usage with daemonization (Eric Wong <[email protected]>)
    +
    +   * Filter the devices before we do an expensive sort. (Dave Lambley <[email protected]>)
    +
    +   * httpfile: avoid killing worker on down sidechannel (Eric Wong <[email protected]>)
    +
    +   * move checksum and tempfile delete to delete worker (Eric Wong <[email protected]>)
    +
    +   * sqlite: use immediate transactions to prevent busy errors (Eric Wong <[email protected]>)
    +
    +   * disable Nagle's algorithm for accepted clients (Eric Wong <[email protected]>)
    +
    +   * ProcManager: favor using recently-used queryworkers (Eric Wong <[email protected]>)
    +
    +   * Do both sorts in one method, to save on shared initialisation. (Dave Lambley <[email protected]>)
    +
    +   * Pull out device sorting into it's own method for overriding. (Dave Lambley <[email protected]>)
    +
    +   * Reseed the random number generator after forking. (Dave Lambley <[email protected]>)
    +
    +   * support nginx server type in mogstored command line options (Daniel Frett <[email protected]>)
    +     (also Gernot Vormayr <[email protected]>, others)
    +
     2013-01-06: Release version 2.66
    
        * add a hook to cmd_updateclass (Daniel Frett <[email protected]>)
    dormando committed Feb 3, 2013
    221808c

Commits on Feb 7, 2013

  1. list_keys: consistent ESCAPE usage across DB types

    Without specifying an ESCAPE character for LIKE queries, the '\'
    we use for escaping is treated as a literal and improperly
    matched keys with '\' in them under SQLite.
    
    This is only needed for SQLite, as the SQLite language reference
    makes no mention of a default ESCAPE character in
    http://www.sqlite.org/lang_expr.html
    
    ESCAPE is supported in MySQL and Postgres, too; and defaults to
    '\'.  We specify it anyways to reduce code differences between
    different databases.
    
    Tested on MySQL 5.1.66 and Postgres 8.4.13 on Debian 6.0
    and SQLite 3.7.13 on Debian 7.0
    Eric Wong committed Feb 7, 2013
    461b1e3
  2. list_keys: escape in Store, allow [%\\] as prefix

    If we support non-SQL DBs in the future, escaping rules could
    become store-specific, so Worker/Query is not the right place for
    it.
    
    Since '%' and '\' may be escaped just like any other character,
    we may also allow these characters as prefixes by properly
    escaping them.
    
    Tested on MySQL 5.1.66 and Postgres 8.4.13 on Debian 6.0
    and SQLite 3.7.13 on Debian 7.0
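    
    Prefix escaping with an explicit ESCAPE clause can be sketched in
    Python against SQLite. The table and column names (file, dkey) and
    both function names are assumptions for this example, and escaping
    '_' as well as '%' and '\' is this sketch's own choice; the Perl
    code in the Store classes may differ.

```python
import sqlite3
from typing import List


def like_prefix_pattern(prefix: str) -> str:
    """Escape LIKE metacharacters in `prefix`, then append the wildcard."""
    escaped = (
        prefix.replace("\\", "\\\\")  # escape the escape character first
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    return escaped + "%"


def list_keys(dbh: sqlite3.Connection, prefix: str) -> List[str]:
    """Return keys starting with `prefix`, treating it as a literal string."""
    rows = dbh.execute(
        "SELECT dkey FROM file WHERE dkey LIKE ? ESCAPE '\\' ORDER BY dkey",
        (like_prefix_pattern(prefix),),
    )
    return [row[0] for row in rows]
```

    With the escaping done next to the query, a prefix such as "50%" is
    matched literally instead of acting as a wildcard.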
    Eric Wong committed Feb 7, 2013
    0889c79

Commits on Feb 12, 2013

  1. reaper: detect resurrection of "dead" devices

    Although never officially supported in MogileFS, some users will
    manage to change "dead" devices to another state.  When running
    fsck, this may cause the desperate search to continually fail
    as any files found and added to file_on table will just be
    reaped.
    
    Reported-by: Ask Bjørn Hansen <[email protected]>
    Subject: fsck/FOND not adding a row to file_on
    Message-ID: <[email protected]>
    Eric Wong committed Feb 12, 2013
    a92cfe7
  2. fsck: do not log FOND if note_on_device croaks

    note_on_device may croak, so avoid logging FOND until we've
    successfully called note_on_device to ensure the fsck log is
    consistent with what was done.
    Eric Wong committed Feb 12, 2013
    47b710f

Commits on Feb 19, 2013

  1. Tell the kernel we're doing sequential reads.

    [ew: squashed Dave's change to make IO::AIO optional]
    
    Signed-off-by: Eric Wong <[email protected]>
    Dave Lambley authored and Eric Wong committed Feb 19, 2013
    32c2313
  2. Don't emit warnings if we're lacking the space free of a device.

    If we don't find space on devices with known space free, try the unknowns.
    
    Signed-off-by: Eric Wong <[email protected]>
    Dave Lambley authored and Eric Wong committed Feb 19, 2013
    534177a

Commits on Feb 23, 2013

  1. ProcManager: only log times_out_of_qworkers for new queries

    Logging times_out_of_qworkers in ProcessQueues is not accurate:
    recently-idle queryworkers may not be noticed and marked idle while
    ProcessQueues is looping and draining the @IdleQueryWorkers pool.
    
    Instead, only log times_out_of_qworkers when new requests are
    enqueued.
    Eric Wong committed Feb 23, 2013
    93eac88

Commits on Feb 26, 2013

  1. Merge remote-tracking branch 'bogomips/list_keys' into next-small

    * bogomips/list_keys:
      list_keys: escape in Store, allow [%\\] as prefix
      list_keys: consistent ESCAPE usage across DB types
    Eric Wong committed Feb 26, 2013
    55f3e41
  2. Merge remote-tracking branch 'bogomips/pending_queries' into next-small

    * bogomips/pending_queries:
      ProcManager: only log times_out_of_qworkers for new queries
    Eric Wong committed Feb 26, 2013
    2f05313

Commits on Feb 27, 2013

  1. mogstored: avoid bareword on IO::AIO w/o fadvise

    IO::AIO 2.4 on Debian stable lacks IO::AIO::FADV_SEQUENTIAL
    constant, causing compilation to fail on the bareword.
    Accessing the constant as a subroutine call (via "()") avoids
    the bareword and defers the error to runtime (which is trapped
    by eval).
    
    Tested under IO::AIO 2.4 on Debian stable and IO::AIO 4.15 on
    Debian testing (verified fadvise64() syscall under strace).
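    
    The same "look the feature up at run time and degrade gracefully"
    idea can be sketched in Python, where os.posix_fadvise is likewise
    missing on some platforms. advise_sequential() is a hypothetical
    helper; this is an analogy to the Perl fix, not a port of it.

```python
import os


def advise_sequential(fd: int) -> bool:
    """Hint sequential access if the platform supports it; never raise."""
    fadvise = getattr(os, "posix_fadvise", None)
    flag = getattr(os, "POSIX_FADV_SEQUENTIAL", None)
    if fadvise is None or flag is None:
        return False  # feature missing: resolved at run time, not import time
    try:
        fadvise(fd, 0, 0, flag)  # offset 0, length 0 means "the whole file"
    except OSError:
        return False
    return True
```

    Callers can ignore the return value, since the hint is only an
    optimization and reads work the same without it.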
    Eric Wong committed Feb 27, 2013
    21049f9
  2. httpfile: correct FILE_MISSING check in digest_mgmt

    This is unlikely to be an issue in fsck, since fsck checks
    file size/existence before digesting the file.
    Eric Wong committed Feb 27, 2013
    947ae4b
  3. httpfile: correct timeouts for sidechannel digest

    fsck digests are deprioritized and serialized in mogstored, so
    it's nearly impossible to tell what's in the queue before our
    request.  Since fsck is not latency critical, extend the timeout
    for that.
    
    We also need to account for normal seek/network latency for
    non-fsck digest requests, so add node_timeout to that.
    
    These bugs were mostly hidden since we are relying on <> to
    read, which may incur watchdog timeouts.
    Eric Wong committed Feb 27, 2013
    265ccef

Commits on Mar 9, 2013

  1. fix "drain" handling used by MultipleHosts replpolicy

    MogileFS::DeviceState was never updated for the 2.40 drain changes.
    The broken-since-2.40 should_have_files sub caused
    ReplicationPolicy::MultipleHosts to overreplicate files, as it
    was not counting drain devices in the total disks check.
    
    Thanks to Tim for reporting this to the mailing list at
    [email protected]
    Eric Wong committed Mar 9, 2013
    f369b15

Commits on Mar 30, 2013

  1. fsck: avoid redundant fsck log entries

    With many fsck workers and slow fsck (due to checksumming large
    files and/or high network latency), it may be possible for fsck
    workers to start working on the same FID without a lock.
    
    ref: ML Subject: "FSCK Status/Log Entries"
    Eric Wong committed Mar 30, 2013
    9bba043

Commits on Apr 1, 2013

  1. remove unused *::get_dbh subroutines

    These subroutines are unused.
    Eric Wong committed Apr 1, 2013
    2c52aed

Commits on Jul 10, 2013

  1. ProcManager: log socketpair errors correctly

    Log the correct name of the failed function and the error string
    associated with the OS errno to aid in debugging.
    
    ML Ref:
      Date: Mon, 8 Jul 2013 17:14:40 -0700 (PDT)
      From: Tim <[email protected]>
      To: [email protected]
      Message-Id: <[email protected]>
      Subject: MogileFS crashes
    Eric Wong committed Jul 10, 2013
    b87f38a
  2. httpfile: log mogstored I/O errors when checksumming

    Mogstored/SideChannelClient.pm may hit the following on I/O error:
    
      $self->write("ERR read $uri at $offset failed\r\n");
    
    Be prepared to show that error to tracker watchers (and any
    other possible errors mogstored may return in the future).
    Eric Wong committed Jul 10, 2013
    d45bbf0

Commits on Aug 4, 2013

  1. add naive MultipleDevice replication policy

    This can be useful when MultipleHosts is too noisy when hosts differ
    greatly in storage capacity.
    
    The intended target of this policy is a low-priority backup cluster
    where a single host contains the bulk of the storage with a handful
    of random machines helping out.  The MultipleHosts policy can be too
    noisy with log messages about running out of suggestions in this case.
    Eric Wong authored and dormando committed Aug 4, 2013
    74844dc
  2. store: do not auto-reconnect while holding a lock

    On all networked databases, auto-reconnect is probably always
    unsafe while holding a lock.
    
    While we do not use the builtin auto-reconnect functionality of
    MySQL, any auto-reconnect implementation should be affected by
    the same issues upon connection failure:
    https://dev.mysql.com/doc/refman/5.6/en/auto-reconnect.html
    
    With auto-reconnect, we could be operating under the false
    assumption we have a lock after the reconnect when we do not.
    
    For now, the easiest method of recovery is to just let the
    worker die while working on the current task and have the
    ProcManager restart it.
    Eric Wong authored and dormando committed Aug 4, 2013
    1f8bc08
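The hazard described in 1f8bc08 can be sketched outside of Perl. The Python below is a minimal illustration, not the MogileFS::Store code: `StoreSketch`, `LockLostError`, and `FakeConn` are hypothetical names, and the advisory lock is modeled as a plain counter. It shows a handle accessor that reconnects freely when no lock is held, but raises once a lock is outstanding so the worker can die and be restarted instead of continuing without the lock.

```python
class LockLostError(RuntimeError):
    """The connection dropped while an advisory lock was held."""


class FakeConn:
    """Hypothetical stand-in for a database connection."""

    def __init__(self):
        self.up = True

    def ping(self):
        return self.up


class StoreSketch:
    def __init__(self, connect):
        self._connect = connect   # callable that returns a fresh connection
        self._conn = None
        self.locks_held = 0       # advisory locks we believe we hold

    def dbh(self):
        if self._conn is not None and self._conn.ping():
            return self._conn
        if self.locks_held:
            # A new connection would not own the server-side lock, so
            # reconnecting here would leave us with a false belief.
            raise LockLostError("connection lost while holding a lock")
        self._conn = self._connect()
        return self._conn

    def get_lock(self):
        self.dbh()
        self.locks_held += 1

    def release_lock(self):
        self.locks_held -= 1
```

With no lock held, a dead connection is replaced transparently; with `locks_held > 0`, the same failure surfaces as an exception.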
  3. store: do not disconnect for max_handles while locked

    Dropping a connection while holding an advisory lock with
    MySQL or Postgres will cause a fatal error, so hold the
    connection open until the next time the dbh is requested
    without holding a lock.
    Eric Wong authored and dormando committed Aug 4, 2013
    8a5bad0
  4. allow startup without job_master (and dependent workers)

    And when running without a job_master, do not spawn job_master-dependent
    workers (delete, fsck, replicate) as those workers will never get work.
    
    Running a queryworker+monitor in a remote datacenter makes sense with
    the MogileFS::Network plugins since the "create_close" size verification
    is faster and more reliable if the queryworker is in the same datacenter
    as the client, even if the master DB is in a remote datacenter.
    
    Being in a remote datacenter, (master)DB-intensive operations from
    delete, fsck and replicate workers can encounter high latency and an
    unreliable link, so admins may disable those workers in this situation.
    
    However, disabling delete, fsck, and replicate workers individually
    still allows the job_master to fill the initial queues (which are never
    processed) and prevent other trackers from processing items for 1000
    seconds.
    
    Future commits may allow job_master to ignore certain queues if there
    are zero workers for that queue, but for now, stopping job_master
    entirely should be sufficient for most users with trackers in a
    different datacenter than the DB.
    
    P.S. It also makes sense to disable reaper in remote datacenters, too,
    but reaper does not rely on job_master.
    Eric Wong authored and dormando committed Aug 4, 2013
    ad8de9f
  5. increase receive buffers for large state events

    The monitor may send large state events for large installations
    with many hosts, devices, domains, or classes.  The 1K default
    is too small and leads to excessive syscalls and string operations.
    
    This increases startup performance for a mock instance with 10K domains
    and 10K non-default classes.
    
    Using the parent_ping function and "No simple reply" warning as an
    informal benchmark, this change reduces the loop count from 12 to 10.
    Eric Wong authored and dormando committed Aug 4, 2013
    18e8a6c
  6. monitor: do not repeat join() for the debug statement

    This join() takes about 20ms on my mock instance with 10K domains
    and 10K classes, so it has some impact on startup performance.
    Eric Wong authored and dormando committed Aug 4, 2013
    be0f167
  7. do not replay :monitor_events back to the monitor

    This was excessively expensive for my instance with 10K domains and
    10K classes.  Applying state information without incurring
    IPC/scheduling costs allows non-monitor workers to start up within
    ~4 seconds of the monitor starting up.
    Eric Wong authored and dormando committed Aug 4, 2013
    b5db211

Commits on Aug 8, 2013

  1. Checking in changes prior to tagging of version 2.68.

    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index f0f578c..b74f7f4 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,35 @@
    +2013-08-07: Release version 2.68
    +
    +   * optimize monitor worker for large installs (Eric Wong <[email protected]>)
    +
    +   * allow startup without job_master (and dependent workers) (Eric Wong <[email protected]>)
    +
    +   * store: do not disconnect for max_handles while locked (Eric Wong <[email protected]>)
    +
    +   * store: do not auto-reconnect while holding a lock (Eric Wong <[email protected]>)
    +
    +   * add naive MultipleDevice replication policy (Eric Wong <[email protected]>)
    +
    +   * httpfile: log mogstored I/O errors when checksumming (Eric Wong <[email protected]>)
    +
    +   * ProcManager: log socketpair errors correctly (Eric Wong <[email protected]>)
    +
    +   * fix "drain" handling used by MultipleHosts replpolicy (Eric Wong <[email protected]>)
    +
    +   * httpfile: correct timeouts for sidechannel digest (Eric Wong <[email protected]>)
    +
    +   * httpfile: correct FILE_MISSING check in digest_mgmt (Eric Wong <[email protected]>)
    +
    +   * mogstored: avoid bareword on IO::AIO w/o fadvise (Eric Wong <[email protected]>)
    +
    +   * ProcManager: only log times_out_of_qworkers for new queries (Eric Wong <[email protected]>)
    +
    +   * Don't emit warnings if we're lacking the space free of a device.  If we don't find space on devices with known space free, try the unknowns. (Dave Lambley <[email protected]>)
    +
    +   * list_keys: escape in Store, allow [%\\] as prefix (Eric Wong <[email protected]>)
    +
    +   * list_keys: consistent ESCAPE usage across DB types (Eric Wong <[email protected]>)
    +
     2013-02-02: Release version 2.67
    
        * Serialize tempfile reaping (dormando <[email protected]>)
    dormando committed Aug 8, 2013
    08c83b1

Commits on Aug 10, 2013

  1. move Danga::Socket->Reset to ProcManager

    We will be using Danga::Socket in more (possibly all) workers,
    not just the Monitor and Reaper.  Resetting in workers that do
    not use Danga::Socket is harmless and will not allocate
    epoll/kqueue descriptors until the worker actually uses
    Danga::Socket.
    Eric Wong committed Aug 10, 2013
    d9d3a9b
  2. monitor: refactor/rewrite to use new async API

    In order to migrate to the upcoming Danga::Socket-based
    HTTP API, we'll first refactor monitor to use the new API
    (but preserve LWP usage behind-the-scenes).
    
    DEBUG=1 users will see the elapsed time for all device refreshes
    each time monitor runs.
    
    While we're at it, also guard against race conditions on the
    PUT/GET test by double-checking on failure.  (A long-standing
    TODO item)
    
    also squashed the following commit:
    
      use conn_timeout in monitor, node_timeout in other workers
    
      This matches the behavior in MogileFS::Server 2.65.
    
      It makes sense to use a different, lower timeout in monitor to
      quickly detect overloaded nodes and avoid propagating their
      liveness for a monitoring period.
    
      It also makes sense to use a higher value for node_timeout in
      other workers since other actions are less fault-tolerant.
    
      For example, a timed-out size check in create_close may cause a
      client to eventually reupload the file, creating even more load
      on the cluster.
    Eric Wong committed Aug 10, 2013
    d8cd470
  3. monitor: switch to non-blocking HTTP device checks

    Net::HTTP::NB is usable with Danga::Socket and may be used to
    make HTTP requests in parallel.
    
    The new connection pool supports persistent connection pooling
    similar to LWP::ConnCache.  Total connection capacity is
    enforced to prevent out-of-FD situations on the workers.
    
    Unlike LWP::ConnCache, MogileFS::ConnectionPool is designed for
    use with concurrent, active connections.  It also supports
    queueing (when any enforced capacity or system limits are
    reached) and relies on Danga::Socket for scheduling queued
    connections.
    
    In addition to total capacity limits, MogileFS::ConnectionPool
    also supports limiting concurrency on a per-destination basis to
    avoid potentially overloading a single destination.
    
    Currently, we limit ourselves to 20 connections from a single
    worker (matching the old LWP limit) and also limit ourselves
    to 20 connections to a single host (again matching our previous
    LWP behavior).
    Eric Wong committed Aug 10, 2013
    333e071
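The two limits described in 333e071 (a total capacity plus a per-destination cap, with queueing when either is hit) can be illustrated with a small sketch. This is an assumption-laden Python model, not MogileFS::ConnectionPool: `PoolSketch`, `start`, and `finish` are hypothetical names, and "running a task" stands in for handing out a connection. Only the default of 20 for both limits comes from the commit message.

```python
from collections import defaultdict


class PoolSketch:
    def __init__(self, total_capacity=20, dest_capacity=20):
        self.total_capacity = total_capacity   # cap across all destinations
        self.dest_capacity = dest_capacity     # cap per destination
        self.in_use = defaultdict(int)         # destination -> active count
        self.queue = []                        # (destination, task) waiting

    def _can_start(self, dest):
        return (sum(self.in_use.values()) < self.total_capacity
                and self.in_use[dest] < self.dest_capacity)

    def _run(self, dest, task):
        self.in_use[dest] += 1
        task(dest)

    def start(self, dest, task):
        """Run task now if both limits allow it, otherwise queue it."""
        if self._can_start(dest):
            self._run(dest, task)
            return True
        self.queue.append((dest, task))
        return False

    def finish(self, dest):
        """Release one slot and start the first queued task that now fits."""
        self.in_use[dest] -= 1
        for i, (qdest, task) in enumerate(self.queue):
            if self._can_start(qdest):
                del self.queue[i]
                self._run(qdest, task)
                break
```

A queued request for a busy destination does not block later requests for idle ones, since `finish` scans for the first entry that fits both limits.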
  4. JobMaster: use Danga::Socket to schedule

    In the future, this will allow JobMaster to write concurrently to
    ProcManager (or even individual workers) without blocking.
    
    (tweaked to accommodate "!want 0 job_master" support)
    Eric Wong committed Aug 10, 2013
    f281556
  5. httpfile: remove size check failure backoff handling

    This backoff handling in HTTPFile is redundant for several reasons:
    
    * We rely on the monitor worker anyways to inform us of unreachable hosts
    
    * Monitor runs much faster nowadays, giving us a smaller window for
      out-of-date information about host reachability
    
    * HTTPFile->size no longer connects to the sidechannel port,
      only HTTP, so we waste fewer syscalls on failure if a
      host went down before the last monitor run.
    Eric Wong committed Aug 10, 2013
    a78bc66
  6. fsck: parallelize size checks for any given FID

    This allows us to speed up fsck on high-latency clusters
    by issuing parallel HEAD requests.
    Eric Wong committed Aug 10, 2013
    81433c2
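As a rough illustration of why parallel size checks help in 81433c2: when each check is dominated by network latency, issuing them concurrently costs roughly one round trip instead of one per replica. The Python sketch below uses a thread pool purely as a stand-in for the event-driven HTTP client; `check_sizes` and its `head_size` callback are hypothetical and no real HTTP request is made.

```python
from concurrent.futures import ThreadPoolExecutor


def check_sizes(paths, head_size, max_workers=20):
    """Call head_size(path) for every path concurrently.

    head_size is any callable returning the size reported for a path,
    for example a function that issues an HTTP HEAD request.
    Returns a dict mapping each path to its reported size.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(paths, pool.map(head_size, paths)))
```

A caller would then compare every reported size against the expected length of the FID and flag the paths that disagree.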
  7. httpfile: use Net::HTTP::NB, remove LWP::UserAgent

    This allows us to use the same HTTP connections between
    digest and HTTP size checks, reducing the number of open
    connections we need in the Fsck worker.
    Eric Wong committed Aug 10, 2013
    d5cd4cf
  8. httpfile: use HTTP connection pool for DELETE

    This simplifies the delete subroutine and should reduce
    the number of sockets created during rebalance.
    Eric Wong committed Aug 10, 2013
    d45c8a6
  9. delete worker uses persistent HTTP connections

    This allows us to avoid running ourselves out of local ports
    when handling massive delete storms.
    
    Eventually, we can parallelize deletes in a manner similar
    to fsck size checking.
    Eric Wong committed Aug 10, 2013
    13e5fe2
  10. device: reuse HTTP connections for MKCOL

    This can reduce latency for folks still stuck with MKCOL.
    This creates no new sockets for replicate and monitor in
    all cases, as connections to the HTTP DAV server are already
    used in those workers.
    
    This only adds new persistent connections to the queryworker if
    GET-only HTTP ports are configured (queryworker already may call
    HTTPFile->size).
    Eric Wong committed Aug 10, 2013
    bdeaaf9
  11. create_open: parallelize directory vivification

    For setups stuck needing MKCOL, we can parallelize
    directory vivification for multi-destination uploads.
    Eric Wong committed Aug 10, 2013
    7e7e530
  12. replicate: enforce expected Content-Length in http_copy

    There's no reason we should ever skip Content-Length validation
    if we know which FID we're replicating and have an FID object
    handy.
    
    Conflicts:
    	lib/MogileFS/Worker/Replicate.pm
    Eric Wong committed Aug 10, 2013
    b8a0674
  13. replicate: use persistent connection from pool if possible

    This should reduce the number of TIME-WAIT sockets and TCP
    handshakes when replicating, especially with small files.
    
    An attempt was previously made to use the Net::HTTP::NB API
    directly, but that resulted in complicated callback nesting
    and state management needed to throttle the reader if the
    sender socket were blocked in any way.
    
    There were many bugs in the early version of this code as
    a result of the complicated code.  Even after all the bugs
    got fixed, a small performance reduction due to the extra
    buffer copies was difficult to avoid.
    
    Thus I started using the synchronous version to keep the code
    simple and fast while preserving the ability to use persistent
    sockets to avoid excessive TIME-WAIT and handshaking for small
    file replication.
    Eric Wong committed Aug 10, 2013
    9054405
  14. host: handle case where conn_get may return undef

    MogileFS::ConnectionPool::conn_get may return undef on some
    errors, so we must account for that and not kill the replicate
    worker.
    Eric Wong committed Aug 10, 2013
    c562af2
  15. ConnectionPool: improve reporting of socket creation errors

    Send the entire error message (including intended host:port) so
    it is more informative when it propagates to
    Connection::HTTP::err_response.  We also do not need to log
    the error in ConnectionPool, as the error will be logged by
    the caller.
    
    While we're at it, fix the documentation and a spelling error in
    err_response, too.
    Eric Wong committed Aug 10, 2013
    72758a5
  16. t/http.t: test error handling on non-running server

    We need to ensure we don't blow up a worker process if a
    server is shut down and a connection is attempted before the
    monitor notices.
    Eric Wong committed Aug 10, 2013
    66c8827
  17. connection/{poolable,http}: common retry logic for timeouts

    We will want similar logic for Mogstored sidechannel to avoid
    retrying on timeout.
    Eric Wong committed Aug 10, 2013
    9b2b87a
  18. connection/poolable: stricter timeout key check

    String representations of small floating point values may
    be in (scientific) E notation, so we must ensure the entire
    string is free of decimal digits before considering it a
    configuration key.
    Eric Wong committed Aug 10, 2013
    fa80d20
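The pitfall behind fa80d20 is easy to reproduce in other languages, because small floats stringify in E notation there as well. The Python sketch below is only an illustration of the rule stated in the commit message (a configuration key contains no decimal digits anywhere); `is_timeout_key` is a hypothetical helper, not the Perl check itself.

```python
import re


def is_timeout_key(value):
    """True if value looks like a configuration key rather than a number.

    A check such as "contains a letter" would misclassify "1e-05",
    so the whole string must be free of decimal digits.
    """
    return re.search(r"\d", str(value)) is None
```

For example, `str(0.00001)` is `"1e-05"`, which contains the letter `e` but is still rejected as a key because it contains digits, whereas `"node_timeout"` is accepted.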
  19. ProcManager: SetAsChild drops inherited IPC sockets

    Workers only need to inherit the minimum amount necessary from the
    parent ProcManager.  Keeping the socket of unrelated workers in each
    worker is wasteful and may contribute to premature resource
    exhaustion.
    
    Additionally, we will be using Danga::Socket in more (possibly all)
    workers, not just the Monitor and Reaper.  Resetting in workers that
    do not use Danga::Socket is harmless and will not allocate
    epoll/kqueue descriptors until the worker actually uses
    Danga::Socket.
    Eric Wong committed Aug 10, 2013
    11e3cdc
  20. monitor: remove misleading error message for timeout

    The timeout we're removing includes time spent in the queue waiting
    to even start, so reporting it in the syslog is confusing,
    especially since we already log the timeout via Connection::Poolable.
    
    This avoids a confusing sequence of error messages like the following:
    
    [monitor(666)] node_timeout: 2 (elapsed: 2.00099802017212): GET http://127.0.0.1:7500/dev666/usage
    [monitor(666)] Timeout contacting 127.0.0.1 dev 666 (http://127.0.0.1:7500/dev666/usage):  took 2.25 seconds out of 2 allowed
    
    Now, we only display the first message.
    Eric Wong committed Aug 10, 2013
    4dd1d6f

Commits on Aug 19, 2013

  1. Checking in changes prior to tagging of version 2.70.

    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index b74f7f4..a6b2872 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,26 @@
    +2013-08-18: Release version 2.70
    +
    +   * This release features a very large rewrite to the Monitor worker to run
    +     checks in parallel. There are no DB schema changes.
    +
    +   * replicate: use persistent connection from pool if possible (Eric Wong <[email protected]>)
    +
    +   * replicate: enforce expected Content-Length in http_copy (Eric Wong <[email protected]>)
    +
    +   * create_open: parallelize directory vivification (Eric Wong <[email protected]>)
    +
    +   * device: reuse HTTP connections for MKCOL (Eric Wong <[email protected]>)
    +
    +   * delete worker uses persistent HTTP connections (Eric Wong <[email protected]>)
    +
    +   * httpfile: use HTTP connection pool for DELETE (Eric Wong <[email protected]>)
    +
    +   * httpfile: use Net::HTTP::NB, remove LWP::UserAgent (Eric Wong <[email protected]>)
    +
    +   * fsck: parallelize size checks for any given FID (Eric Wong <[email protected]>)
    +
    +   * monitor: refactor/rewrite to use new async API (Eric Wong <[email protected]>)
    +
     2013-08-07: Release version 2.68
    
        * optimize monitor worker for large installs (Eric Wong <[email protected]>)
    dormando committed Aug 19, 2013
    7969ce4

Commits on Dec 15, 2014

  1. add LICENSE file to distro

    Clarified by Brad Fitzpatrick
    dormando committed Dec 15, 2014
    202bbef

Commits on Dec 16, 2014

  1. host: add "readonly" state to override device "alive" state

    Marking an entire host as "readonly" before a host maintenance
    window can be useful and easier than marking each device "readonly"
    and reduces the likelihood a device will be incorrectly marked
    as "alive" again when it is intended to stay down.
    Eric Wong authored and dormando committed Dec 16, 2014
    b7aff32
  2. enable TCP keepalives for iostat watcher sockets

    This allows the monitor to eventually notice a client socket is
    totally gone if a machine death was not detected earlier.  We enable
    TCP keepalive everywhere else, too.
    Eric Wong authored and dormando committed Dec 16, 2014
    569cbb5
  3. add conn_pool_size configuration option

    This defines the size of the HTTP connection pool.  This affects
    all workers at the moment, but is likely most interesting to the
    Monitor as it affects the number of devices the monitor may
    concurrently update.
    
    This defaults to 20 (the long-existing, hard-coded value).
    
    In the future, there may be a(n easy) way to specify this on a
    per-worker basis, but for now it affects all workers.
    Eric Wong authored and dormando committed Dec 16, 2014
    8593b0e
  4. connection/poolable: do not write before event_write

    Blindly attempting to write to a socket before a TCP connection can be
    established returns EAGAIN on Linux, but not on FreeBSD 8/9.  This
    causes Danga::Socket to error out, as it won't attempt to buffer on
    anything but EAGAIN on write() attempts.
    
    Now, we buffer writes explicitly after the initial socket creation and
    connect(), and only call Danga::Socket::write when we've established
    writability.  This works on Linux, too, and avoids an unnecessary
    syscall in most cases.
    
    Reported-by: Alex Yakovenko <[email protected]>
    Eric Wong authored and dormando committed Dec 16, 2014
    fcd13ab
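The approach in fcd13ab (queue writes until the socket is known to be writable) can be modeled in a few lines. This Python sketch is a simplification under stated assumptions: `BufferedWriter` is a hypothetical class, `send` stands in for the real socket write, and partial writes and errors after the connection is established are ignored.

```python
class BufferedWriter:
    def __init__(self, send):
        self._send = send        # callable that performs the real write
        self._pending = []       # data queued before the socket is writable
        self._writable = False

    def write(self, data):
        """Write immediately if writable, otherwise queue the data."""
        if self._writable:
            self._send(data)
        else:
            self._pending.append(data)

    def event_write(self):
        """Called by the event loop once the socket reports writability."""
        self._writable = True
        pending, self._pending = self._pending, []
        for data in pending:
            self._send(data)
```

Nothing reaches the socket before the first writability event, so the outcome no longer depends on how a given OS reports a write attempted on a socket that is still connecting.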
  5. connection/poolable: disable watch_write before retrying write

    Otherwise we'll end up constantly waking up when there's nothing
    to write.
    Eric Wong authored and dormando committed Dec 16, 2014
    828ed8e
  6. connection/poolable: defer expiry of timed out connections

    The timeout check may run on a socket before epoll_wait/kevent has
    a chance to run, giving the application no chance for any readiness
    callbacks to fire.
    
    This prevents timeouts in the monitor if the database is slow during
    synchronous UPDATE device calls (or there are just thousands of active
    connections).
    Eric Wong authored and dormando committed Dec 16, 2014
    fef1fe7
  7. monitor: defer DB updates until all HTTP requests are done

    HTTP requests timed out because we had to wait synchronously for DBI;
    this is very noticeable on a high-latency connection.  So avoid
    running synchronous code while asynchronous code (which is subject
    to timeouts) is running.
    Eric Wong authored and dormando committed Dec 16, 2014
    e654401
  8. monitor: ping parent during deferred DB updates

    With enough devices and high enough network latency to the DB,
    we bump into the watchdog timeout of 30s easily.
    Eric Wong authored and dormando committed Dec 16, 2014
    b0f05b7
  9. monitor: batch MySQL device table updates

    Issuing many UPDATE statements slows down monitoring on high-latency
    connections between the monitor and DB.  Under MySQL, it is possible
    to do multiple UPDATEs in a single statement using CASE/WHEN
    syntax.
    
    We limit ourselves to 10000 devices per update for now; this should
    keep us comfortably under the max_allowed_packet size of most
    MySQL deployments (where the default is 1M).
    
    A compatibility function is provided for SQLite and Postgres users.
    SQLite users are not expected to run this over high-latency NFS, and
    interested Postgres users should submit their own implementation.
    Eric Wong authored and dormando committed Dec 16, 2014
    7b6fe2e
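The CASE/WHEN batching mentioned in 7b6fe2e can be sketched as a statement builder. The Python below is a minimal illustration and not the MogileFS::Store::MySQL code: `build_case_update` and `batched` are hypothetical helpers, and the column names `devid` and `mb_used` are assumptions. Python's bundled `sqlite3` module is used only to exercise the generated statement; the commit itself targets MySQL and gives SQLite a separate compatibility path.

```python
import sqlite3


def batched(rows, size=10000):
    """Split rows into chunks of at most `size` (10000 per the commit)."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_case_update(rows, table="device", key="devid", column="mb_used"):
    """Build one UPDATE covering every (key, value) pair in rows."""
    whens = " ".join("WHEN ? THEN ?" for _ in rows)
    marks = ", ".join("?" for _ in rows)
    sql = (f"UPDATE {table} SET {column} = CASE {key} {whens} "
           f"ELSE {column} END WHERE {key} IN ({marks})")
    params = [value for pair in rows for value in pair]
    params += [devid for devid, _ in rows]
    return sql, params


def demo():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE device (devid INTEGER PRIMARY KEY, mb_used INTEGER)")
    db.executemany("INSERT INTO device VALUES (?, 0)", [(1,), (2,), (3,)])
    for chunk in batched([(1, 10), (2, 20)]):
        db.execute(*build_case_update(chunk))
    return db.execute(
        "SELECT devid, mb_used FROM device ORDER BY devid").fetchall()
```

Each chunk costs one round trip to the database regardless of how many devices it covers, which is the point of the change on high-latency links.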
  10. remove users of unreachable_fids table

    mark_fidid_unreachable has not been used since MogileFS 2.35
    commit 53528c7
    ("Wipe out old replication code.", r1432)
    Eric Wong authored and dormando committed Dec 16, 2014
    196e928
  11. remove update_host_property

    No longer used since commit ebf8a5a
    ("Mass nuke unused code and fix most tests") in MogileFS 2.50
    Eric Wong authored and dormando committed Dec 16, 2014
    a8dbbea
  12. Work with DBD::SQLite's latest lock errors

    "is not unique" => "UNIQUE constraint failed". String matching is lovely.
    dormando committed Dec 16, 2014
    d0ee2a2
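The fragility noted in d0ee2a2 comes from classifying errors by message text. One hedged way to stay compatible across driver versions is to accept both wordings quoted in the commit message. The Python sketch below shows the idea; `is_duplicate_key_error` is a hypothetical helper and the sample messages in the usage note are illustrative, not captured driver output.

```python
import re

# Both wordings quoted in the commit message: the older
# "is not unique" and the newer "UNIQUE constraint failed".
_DUPLICATE_RE = re.compile(r"is not unique|UNIQUE constraint failed")


def is_duplicate_key_error(message):
    """True if an error message indicates a uniqueness violation."""
    return _DUPLICATE_RE.search(message) is not None
```

Under this sketch, messages such as "column lockid is not unique" and "UNIQUE constraint failed: lock.lockid" are both treated as duplicates, while an unrelated message such as "database is locked" is not.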
  13. Checking in changes prior to tagging of version 2.72.

    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index a6b2872..441b328 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,29 @@
    +2014-12-15: Release version 2.72
    +
    +   * Work with DBD::SQLite's latest lock errors (dormando <[email protected]>)
    +
    +   * remove update_host_property (Eric Wong <[email protected]>)
    +
    +   * remove users of unreachable_fids table (Eric Wong <[email protected]>)
    +
    +   * monitor: batch MySQL device table updates (Eric Wong <[email protected]>)
    +
    +   * monitor: defer DB updates until all HTTP requests are done (Eric Wong <[email protected]>)
    +
    +   * connection/poolable: defer expiry of timed out connections (Eric Wong <[email protected]>)
    +
    +   * connection/poolable: disable watch_write before retrying write (Eric Wong <[email protected]>)
    +
    +   * connection/poolable: do not write before event_write (Eric Wong <[email protected]>)
    +
    +   * add conn_pool_size configuration option (Eric Wong <[email protected]>)
    +
    +   * enable TCP keepalives for iostat watcher sockets (Eric Wong <[email protected]>)
    +
    +   * host: add "readonly" state to override device "alive" state (Eric Wong <[email protected]>)
    +
    +   * add LICENSE file to distro (dormando <[email protected]>)
    +
     2013-08-18: Release version 2.70
    
        * This release features a very large rewrite to the Monitor worker to run
    dormando committed Dec 16, 2014
    6832b1b

Commits on Apr 17, 2015

  1. replicate: reduce backoff for too_happy FIDs

    Due to a bug in the MultipleNetworks replication policy
    <[email protected]>, a network split caused an
    instance to explode with overreplicated files.  Since every
    too_happy pruning increases failcount, it could end up taking days
    to clean up a file with far too many replicas.
    Eric Wong committed Apr 17, 2015
    f9c9d68

Commits on Jun 12, 2015

  1. enable DB upgrade for host readonly state

    The readonly host state was not enabled via mogdbsetup and
    could not be used although the code supports it, making
    the schema version bump to 16 a no-op.
    
    This bumps the schema version to 17.
    
    Add a test using mogadm to ensure the setting is changeable, as the
    existing test for this state did not rely on the database.
    
    This was also completely broken with Postgres before, as
    Postgres currently offers no way to modify constraints in-place.
    Constraints must be dropped and re-added instead.
    
    Note: it seems the upgrade_add_device_* functions in Postgres.pm
    are untested as well and never got used.  Perhaps they ought
    to be removed entirely since those device columns predate Postgres
    support.
    Eric Wong committed Jun 12, 2015
    e2bd3ff

Commits on Dec 17, 2015

  1. replicate: avoid buffered IO on reads

    Perl buffered IO is only reading 8K at a time (or only 4K on older
    versions!) despite us requesting to read in 1MB chunks.  This wastes
    syscalls and can affect TCP window scaling when MogileFS is
    replicating across long fat networks (LFN).
    
    While we're at it, this fixes a long-standing FIXME item to perform
    proper timeouts when reading headers as we're forced to do sysread
    instead of line-buffered I/O.
    
    ref: https://rt.perl.org/Public/Bug/Display.html?id=126403
    (and confirmed by strace-ing replication workers)
    Eric Wong committed Dec 17, 2015
    8b76d98
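The change in 8b76d98 replaces buffered reads with direct system-level reads so that each call can request the full 1MB chunk. As a loose analogy only (Python's I/O layers differ from Perl's), the sketch below reads from a file descriptor with `os.read`, which issues one read call per invocation and bypasses the buffered file object. `read_unbuffered` and `demo` are hypothetical names, and a temporary file stands in for the replication socket.

```python
import os
import tempfile

CHUNK = 1 << 20  # 1MB, the chunk size mentioned in the commit message


def read_unbuffered(fd, chunk=CHUNK):
    """Yield data from fd, requesting up to `chunk` bytes per read call."""
    while True:
        data = os.read(fd, chunk)
        if not data:
            return
        yield data


def demo():
    payload = b"x" * (2 * CHUNK + 123)
    with tempfile.TemporaryFile() as tmp:
        tmp.write(payload)
        tmp.flush()
        os.lseek(tmp.fileno(), 0, os.SEEK_SET)
        return b"".join(read_unbuffered(tmp.fileno())) == payload
```

On a socket, each call may still return less than the requested size, so callers must loop until the expected length has arrived, as the generator above does.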

Commits on Feb 9, 2017

  1. Merge remote-tracking branch 'bogomips/fix-readonly' into testing

    * bogomips/fix-readonly:
      enable DB upgrade for host readonly state
    Eric Wong committed Feb 9, 2017
    8d1d685
  2. Merge remote-tracking branch 'bogomips/fsck-recheck' into testing

    * bogomips/fsck-recheck:
      fsck: avoid redundant fsck log entries
    Eric Wong committed Feb 9, 2017
    aae24cf
  3. Merge remote-tracking branch 'bogomips/fsck-found-order' into testing

    * bogomips/fsck-found-order:
      fsck: do not log FOND if note_on_device croaks
    Eric Wong committed Feb 9, 2017
    f2c011a
  4. Merge remote-tracking branch 'bogomips/prune-too_happy-v3' into testing

    * bogomips/prune-too_happy-v3:
      replicate: reduce backoff for too_happy FIDs
    Eric Wong committed Feb 9, 2017
    32fba34
  5. Merge remote-tracking branch 'bogomips/resurrect-device' into testing

    * bogomips/resurrect-device:
      reaper: detect resurrection of "dead" devices
    Eric Wong committed Feb 9, 2017
    7eaab8e

Commits on Feb 13, 2017

  1. ConnectionPool: avoid undefined behavior for hash iteration

    Perl 5.18 stable and later (commit a7b39f85d7caac) introduced a
    warning for restarting `each` after hash modification.  While we
    accounted for this undefined behavior and documented it in the
    past, this may still cause maintenance problems in the future
    despite our current workarounds being sufficient.
    
    In any case, keeping idle sockets around is cheap with modern
    APIs, and conn_pool_size was introduced in 2.72 to avoid
    dropping idle connections at all; so _conn_drop_idle may
    never be called on a properly configured tracker.
    
    Mailing list references:
    
    <CABJfL5jiAGC+5JzZjuW7R_NXs1DShHPGsKnjzXrPbjWOy2wi3g@mail.gmail.com>
    <[email protected]>
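
One generic way to sidestep the `each` problem is sketched below. This is an illustration, not the change made in this commit: `drop_idle` is a hypothetical helper and the pool layout (key mapped to a last-used timestamp) is an assumption. It loops over the list returned by `keys` instead of calling `each`, so no deletion happens under a live iterator.

```perl
use strict;
use warnings;

# Hypothetical helper: remove entries idle for more than $max_idle
# seconds and return the removed keys.  The list from keys() is built
# before the loop body runs, so delete never disturbs an `each` iterator.
sub drop_idle {
    my ($pool, $now, $max_idle) = @_;
    my @dropped;
    for my $key (sort keys %$pool) {
        next unless $now - $pool->{$key} > $max_idle;
        delete $pool->{$key};
        push @dropped, $key;
    }
    return @dropped;
}

# Demo with made-up host:port keys and last-used timestamps.
my %pool = ('10.0.0.1:7500' => 0, '10.0.0.2:7500' => 95, '10.0.0.3:7500' => 10);
my @gone = drop_idle(\%pool, 100, 30);
print "dropped: @gone\n";
print "kept: ", join(' ', sort keys %pool), "\n";
```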
    Eric Wong committed Feb 13, 2017
    2acba9e

Commits on Apr 6, 2017

  1. client connection should always be nonblocking

    On *BSD platforms, the accept()-ed clients inherit the
    O_NONBLOCK file flag from the listen socket.
    
    This is not true on Linux, and I noticed sockets blocking on
    write() syscalls via strace.  Checking the octal 04000
    (O_NONBLOCK) flag in /proc/$PID/fdinfo/$FD for client TCP
    sockets confirms O_NONBLOCK was not set.
    
    This also makes us resilient to spurious wakeups causing
    event_read to get stuck, as documented in the Linux select(2)
    manpage.
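
A small sketch of checking and setting the flag from Perl with the core Fcntl module, under the assumption that a socketpair can stand in for an accept()-ed client socket. `is_nonblocking` and `set_nonblocking` are hypothetical names, not functions from the tracker.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Report whether O_NONBLOCK is set in the file status flags of $fh.
sub is_nonblocking {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0);
    die "F_GETFL failed: $!" unless defined $flags;
    return (($flags + 0) & O_NONBLOCK) ? 1 : 0;
}

# Add O_NONBLOCK to whatever flags $fh already has.
sub set_nonblocking {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0);
    die "F_GETFL failed: $!" unless defined $flags;
    fcntl($fh, F_SETFL, ($flags + 0) | O_NONBLOCK)
        or die "F_SETFL failed: $!";
    return;
}

# Demo: the socketpair is only a stand-in for an accepted client.
socketpair(my $client, my $peer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
print "before: ", is_nonblocking($client), "\n";
set_nonblocking($client);
print "after: ", is_nonblocking($client), "\n";
```

Setting the flag explicitly after accept() is harmless where the flag is already inherited, which is why one code path can serve both kinds of platform.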
    Eric Wong committed Apr 6, 2017
    56b39b8

Commits on Apr 7, 2017

  1. tracker: client fairness, backpressure, and expiry

    Make client query processing less aggressive and more fair by
    only enqueueing a single worker request at a time.  Pipelined
    requests in the read buffer will only be handled after
    successful writes, and any incomplete writes will block further
    request processing.
    
    Furthermore, add a watchdog for clients we're writing to,
    expiring clients which are not reading our responses.
    
    Danga::Socket allows clients to use an infinite amount of
    space for buffering, and it's possible for dead sockets
    to go undetected for hours by the OS.
    
    Use a watchdog to kick out any sockets which have made no
    forward progress after two minutes.
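
The policy described above can be modelled in a few lines. This is a toy model, not the tracker's Danga::Socket-based code: the `Sketch::Client` package, its method names, and the integer timestamps are all assumptions made for illustration. Only one request is dispatched at a time, pipelined requests wait in the read buffer until the pending write finishes, and a client with a write pending for longer than the timeout is reported as expired.

```perl
use strict;
use warnings;

package Sketch::Client;

sub new {
    my ($class, $now) = @_;
    return bless { read_buf => '', busy => 0, last_progress => $now }, $class;
}

# Append bytes received from the client to its read buffer.
sub feed {
    my ($self, $bytes) = @_;
    $self->{read_buf} .= $bytes;
    return;
}

# Hand out at most one request line; nothing while a write is pending.
sub next_request {
    my ($self, $now) = @_;
    return undef if $self->{busy};
    return undef unless $self->{read_buf} =~ s/\A(.*?)\r?\n//;
    my $line = $1;
    $self->{busy} = 1;
    $self->{last_progress} = $now;
    return $line;
}

# The response was fully written: allow the next request.
sub write_done {
    my ($self, $now) = @_;
    $self->{busy} = 0;
    $self->{last_progress} = $now;
    return;
}

# True if a write has been pending for more than $timeout seconds.
sub expired {
    my ($self, $now, $timeout) = @_;
    return ($self->{busy} && $now - $self->{last_progress} > $timeout) ? 1 : 0;
}

package main;

my $c = Sketch::Client->new(0);
$c->feed("get_paths a\r\nget_paths b\r\n");
print $c->next_request(0), "\n";
print defined($c->next_request(1)) ? "dispatched\n" : "held back\n";
print $c->expired(121, 120) ? "expire\n" : "keep\n";
```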
    Eric Wong committed Apr 7, 2017
    05cdf17
  2. client: use single write for admin commands

    This avoids the odd case where the first write completes, but
    the second one (for 3 bytes: ".\r\n") does not, leaving a
    client with both read and write watchability enabled now that
    the previous commit stops reads when writes do not complete.
    
    This would not be fatal, but breaks the rule where clients
    should only be reading or writing exclusively, never doing
    both; as that could lead to pathological memory usage.
    
    This also reduces client wakeups and TCP overhead with
    TCP_NODELAY sockets by avoiding a small packet (".\r\n")
    after the main response.
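
As a rough illustration of the single-write idea (the helper name `admin_response` and the sample lines are made up, and this is not the commit's code), the payload and the ".\r\n" terminator can be joined into one buffer before writing:

```perl
use strict;
use warnings;

# Hypothetical helper: build the whole admin response, terminator
# included, so the caller needs only one write call.
sub admin_response {
    my (@lines) = @_;
    return join('', map { "$_\r\n" } @lines) . ".\r\n";
}

my $out = admin_response('line one', 'line two');
# One syscall for payload plus terminator instead of two.
syswrite(STDOUT, $out);
```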
    Eric Wong committed Apr 7, 2017
    5d18499
  3. client: always disable watch_read after a command

    Otherwise it'll be possible to pipeline admin (!) commands
    and event_read will trigger EOF before all the admin commands
    are processed in read_buf.
    Eric Wong committed Apr 7, 2017
    7748d36

Commits on May 8, 2017

  1. Merge branch 'client-backpressure' into next

    * client-backpressure:
      client: always disable watch_read after a command
      client: use single write for admin commands
      tracker: client fairness, backpressure, and expiry
      client connection should always be nonblocking
    Eric Wong committed May 8, 2017
    ed1aac3
  2. Merge remote-tracking branch 'bogomips/replicate-nobuf' into next

    * bogomips/replicate-nobuf:
      replicate: avoid buffered IO on reads
    Eric Wong committed May 8, 2017
    acc475c

Commits on May 9, 2017

  1. Merge remote-tracking branch 'bogomips/conn-pool-each' into next

    * bogomips/conn-pool-each:
      ConnectionPool: avoid undefined behavior for hash iteration
    Eric Wong committed May 9, 2017
    591b6d2

Commits on Jun 7, 2017

  1. fsck: avoid infinite wait on dead devices

    If DevFID::size_on_disk encounters an unreadable (dead) device
    AND there are no HTTP requests pending, we must ensure
    Danga::Socket runs the PostLoopCallback to check if the event
    loop is complete.  Do that by scheduling another timer to
    run immediately.
    Eric Wong committed Jun 7, 2017
    2c50b8b

Commits on Sep 18, 2017

  1. Merge branch 'fsck-timeout' into next

    * fsck-timeout:
      fsck: avoid infinite wait on dead devices
    Eric Wong committed Sep 18, 2017
    3204f2c

Commits on Jan 19, 2018

  1. Checking in changes prior to tagging of version 2.73.

    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index 441b328..e053851 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,29 @@
    +2018-01-18: Release version 2.73
    +
    +   * fsck: avoid infinite wait on dead devices (Eric Wong <[email protected]>)
    +
    +   * client: always disable watch_read after a command (Eric Wong <[email protected]>)
    +
    +   * client: use single write for admin commands (Eric Wong <[email protected]>)
    +
    +   * tracker: client fairness, backpressure, and expiry (Eric Wong <[email protected]>)
    +
    +   * client connection should always be nonblocking (Eric Wong <[email protected]>)
    +
    +   * ConnectionPool: avoid undefined behavior for hash iteration (Eric Wong <[email protected]>)
    +
    +   * replicate: avoid buffered IO on reads (Eric Wong <[email protected]>)
    +
    +   * enable DB upgrade for host readonly state (Eric Wong <[email protected]>)
    +
    +   * replicate: reduce backoff for too_happy FIDs (Eric Wong <[email protected]>)
    +
    +   * fsck: this avoid redundant fsck log entries (Eric Wong <[email protected]>)
    +
    +   * fsck: do not log FOND if note_on_device croaks (Eric Wong <[email protected]>)
    +
    +   * reaper: detect resurrection of "dead" devices (Eric Wong <[email protected]>)
    +
     2014-12-15: Release version 2.72
    
        * Work with DBD::SQLite's latest lock errors (dormando <[email protected]>)
    dormando committed Jan 19, 2018
    92b6914