Skip to content

Latest commit

 

History

History
196 lines (153 loc) · 7.64 KB

README.md

File metadata and controls

196 lines (153 loc) · 7.64 KB

BucketStore

An abstraction layer on the top of file cloud storage systems such as Google Cloud Storage or S3. This module exposes a generic interface that allows interoperability between different storage options. Callers don't need to worry about the specifics of where and how a file is stored and retrieved as long as the given key is valid.

Keys within the BucketStorage are URI strings that can universally locate an object in the given provider. A valid key example would be gs://a-gcs-bucket/file/path.json.

Usage

This library is distributed as a Ruby gem, and we recommend adding it to your Gemfile:

gem "bucket_store"

Some attributes can be configured via BucketStore.configure. If using Rails, you want to add a new initializer for BucketStore. Example:

BucketStore.configure do |config|
  config.logger = Logger.new($stderr)
end

If using RSpec, you'll probably want to add this line to RSpec's config block (see the Adapters section for more details):

config.before { BucketStore::InMemory.reset! }

For our policy on compatibility with Ruby versions, see COMPATIBILITY.md.

Design and Architecture

The main principle behind BucketStore is that each resource or group of resources must be unequivocally identifiable by a URI. The URI is always composed of three parts:

  • the "adapter" used to fetch the resource (see "adapters" below)
  • the "bucket" where the resource lives
  • the path to the resource(s)

As an example, all the following are valid URIs:

  • gs://gcs-bucket/path/to/file.xml
  • inmemory://bucket/separator/file.xml
  • disk://hello/path/to/file.json

Even though BucketStore's main goal is to be an abstraction layer on top of systems such as S3 or Google Cloud Storage where the "path" to a resource is in practice a unique identifier as a whole (i.e. the / is not a directory separator but rather part of the key's name), we assume that clients will actually want some sort of hierarchical separation of resources and assume that such separation is achieved by defining each part of the hierarchy via /.

This means that the following are also valid URIs in BucketStore but they refer to all the resources under that specific hierarchy:

  • gs://gcs-bucket/path/subpath/
  • inmemory://bucket/separator/
  • disk://hello/path

Configuration

BucketStore exposes some configurable attributes via BucketStore.configure. If necessary this should be called at startup time before any other method is invoked.

  • logger: custom logger class. By default, logs will be sent to stdout.

Adapters

BucketStore comes with 4 built-in adapters:

  • gs: the Google Cloud Storage adapter
  • s3: the S3 adapter
  • disk: a disk-based adapter
  • inmemory: an in-memory store

GS adapter

This is the adapter for Google Cloud Storage. BucketStore assumes that the authorisation for accessing the resources has been set up outside of the gem.

S3 adapter

This is the adapter for S3. BucketStore assumes that the authorisation for accessing the resources has been set up outside of the gem (see also https://docs.aws.amazon.com/sdk-for-ruby/v3/api/index.html#Configuration).

Disk adapter

A disk-backed key-value store. This adapter will create a temporary directory where all the files will be written into/read from. The base directory can be explicitly defined by setting the DISK_ADAPTER_BASE_DIR environment variable, otherwise a temporary directory will be created.

In-memory adapter

An in-memory key-value storage. This works just like the disk adapter, except that the content of all the files is stored in memory, which is particularly useful for testing. Note that content added to this adapter will persist for the lifetime of the application as it's not possible to create different instances of the same adapter. In general, this is not what's expected during testing where the content of the bucket should be reset between different tests. The adapter provides a way to easily reset the content though via a .reset! method. In RSpec this would translate to adding this line in the spec_helper:

config.before { BucketStore::InMemory.reset! }

BucketStore vs ActiveStorage

ActiveStorage is a common framework to access cloud storage systems that is included in the ActiveSupport library. In general, ActiveStorage provides a lot more than BucketStore does (including many more adapters) however the two libraries have different use cases in mind:

  • ActiveStorage requires you to define every possible bucket you're planning to use ahead of time in a YAML file. This works well for most cases, however if you plan to use a lot of buckets this soon becomes impractical. We think that BucketStore approach works much better in this case.
  • BucketStore does not provide ways to manipulate the content whereas ActiveStorage does. If you plan to apply transformations to the content before uploading or after downloading them, then probably ActiveStorage is the library for you. With that said, it's still possible to do these transformations outside of BucketStore and in fact we've found the explicitness of this approach a desirable property.
  • BucketStore approach makes any resource on a cloud storage system uniquely identifiable via a single URI, which means normally it's enough to pass that string around different systems to be able to access the resource without ambiguity. As the URI also includes the adapter, it's possible for example to download a disk://dir/input_file and upload it to a gs://bucket/output_file all going through a single interface. ActiveStorage is instead focused on persisting an equivalent reference on a Rails model. If your application does not use Rails, or does not need to persist the reference or just requires more flexibility in general, then BucketStore is probably the library for you.

Examples

Uploading a string to a bucket

BucketStore.for("inmemory://bucket/path/file.xml").upload!("hello world")
=> "inmemory://bucket/path/file.xml"

Accessing a string in a bucket

BucketStore.for("inmemory://bucket/path/file.xml").download
=> {:bucket=>"bucket", :key=>"path/file.xml", :content=>"hello world"}

Uploading a file-like object to a bucket

buffer = StringIO.new("This could also be an actual file")
BucketStore.for("inmemory://bucket/path/file.xml").stream.upload!(file: buffer)
=> "inmemory://bucket/path/file.xml"

Downloading to a file-like object from a bucket

buffer = StringIO.new
BucketStore.for("inmemory://bucket/path/file.xml").stream.download(file: buffer)
=> {:bucket=>"bucket", :key=>"path/file.xml", :file=>buffer}
buffer.string 
=> "This could also be an actual file"

Listing all keys under a prefix

BucketStore.for("inmemory://bucket/path/").list
=> ["inmemory://bucket/path/file.xml"]

Delete a file

BucketStore.for("inmemory://bucket/path/file.xml").delete!
=> true

Development

Running tests

BucketStore comes with both unit and integration tests. While unit tests can be run by simply executing bundle exec rspec, integration tests require running minio locally. We provide a docker-compose file that spins up pre-configured simulator instances for S3 and GCS with test buckets. Running the integration tests is as easy as:

docker-compose up
bundle exec rspec --tag integration

License & Contributing

GoCardless ♥ open source. If you do too, come join us.