
NodeFSStorageAdapter does not preload any document #397

Open
Daru13 opened this issue Nov 8, 2024 · 2 comments


@Daru13

Daru13 commented Nov 8, 2024

I am developing a server that stores the current state of multiple repositories on disk using NodeFSStorageAdapter and exposes their content, i.e., the documents they contain, to its clients. This works great as long as the documents are created while the server is running. However, whenever I stop the server and start it again, the server seems to be unaware of the locally available content of the repositories: no existing document seems to be (pre)loaded by NodeFSStorageAdapter unless I explicitly request it using Repo.find. This, in turn, requires knowing the IDs of all the documents of the repository that exist locally. Yet I did not find any good way to get that information using the current APIs of automerge-repo and the storage adapter I'm using.

My current workaround is to make some assumptions about how NodeFSStorageAdapter stores the data on disk, and to walk the two-level hierarchy of directories it uses to encode the document IDs immediately after creating the repository:

```typescript
import type * as A from "@automerge/automerge-repo";
import * as fs from "node:fs";
import * as path from "node:path";

function loadAutomergeDocumentsFromFiles(
    repo: A.Repo,
    pathToStorageDirectory: string
): void {
    // To retrieve all the documents IDs, we assume that NodeFSStorageAdapter
    // stores them in the given root directory under the following structure:
    //
    // <root directory>
    //   └ <first two characters of the ID>         (1st level directories below)
    //       └ <remaining characters of the ID>     (2nd level directories below)
    //           └ ...
    //
    // Moreover, all repositories written on disk with this utility seem to
    // store an entry called "storage-adapter-id" (under st/orage-adapter-id),
    // which must therefore NOT be treated as a document ID.

    const automergeDocumentIds = new Set<string>();

    // Only descend into directories: a stray file at the first level
    // (e.g. OS metadata) would otherwise make the inner readdirSync throw.
    const firstLevelDirectories = fs
        .readdirSync(pathToStorageDirectory, { withFileTypes: true })
        .filter((entry) => entry.isDirectory());
    for (const firstLevel of firstLevelDirectories) {
        const secondLevelNames = fs.readdirSync(path.join(pathToStorageDirectory, firstLevel.name));
        for (const secondLevelName of secondLevelNames) {
            automergeDocumentIds.add(`${firstLevel.name}${secondLevelName}`);
        }
    }

    automergeDocumentIds.delete("storage-adapter-id");

    for (const id of automergeDocumentIds) {
        try {
            // Only synchronous errors are caught here; if repo.find returns
            // a promise in your automerge-repo version, await it instead.
            repo.find(id as A.DocumentId);
        } catch (error) {
            console.warn(`Error while preloading Automerge document ${id}:`, error);
        }
    }
}
```

...but it feels a bit hacky, since this code relies on implementation details of NodeFSStorageAdapter.
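For reference, the on-disk layout this workaround depends on can be written down as two small helpers. This is a sketch inferred from the directory structure described above (the first key component is split after its first two characters, and any further key components become nested directories); it is not a documented NodeFSStorageAdapter API, and the helper names `keyToRelativePath` and `idFromDirectoryNames` are hypothetical.

```typescript
import * as path from "node:path";

// Assumed layout, inferred from the directory tree above (not a documented API):
// the first key component is split after two characters, and any remaining
// key components become nested path segments.
function keyToRelativePath(key: string[]): string {
  const [first, ...rest] = key;
  return path.posix.join(first.slice(0, 2), first.slice(2), ...rest);
}

// Inverse of the first two directory levels: rebuilds the first key component
// (a document ID, or the adapter's own "storage-adapter-id") from the two names.
function idFromDirectoryNames(level1: string, level2: string): string {
  return level1 + level2;
}
```

Under this assumption, the adapter's own `["storage-adapter-id"]` entry lands at `st/orage-adapter-id`, which is exactly why the workaround has to filter it out.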

It would be great to have an "official" way to automatically preload all the documents of a repository that are locally available, or, at the very least, to get the list of IDs of all the documents that are stored on the disk.

For example, I imagine that it could take the form of a flag passed to the constructor of NodeFSStorageAdapter, and/or of a new (possibly optional) method on StorageAdapterInterface to preload documents (which could, in turn, be exposed as a flag when creating a repository, regardless of the storage adapter actually in use).
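To make the proposal more concrete, here is a minimal sketch of an optional enumeration capability. Everything in it is hypothetical: `EnumerableStorage`, `listDocumentIds`, `InMemoryEnumerableStorage` and `preloadAll` are not part of automerge-repo, and the key shape (document ID as the first key component, plus the adapter's own `["storage-adapter-id"]` entry) is assumed from the on-disk layout described above.

```typescript
// Hypothetical sketch only: automerge-repo's StorageAdapterInterface has no
// such method. StorageKey mirrors the idea of string-array storage keys.
type StorageKey = string[];

// Hypothetical optional capability an adapter could advertise.
interface EnumerableStorage {
  listDocumentIds?(): Promise<string[]>;
}

// Toy in-memory store: the first key component is treated as the document ID,
// except for the adapter's own ["storage-adapter-id"] entry (an assumption).
class InMemoryEnumerableStorage implements EnumerableStorage {
  private readonly data = new Map<string, Uint8Array>();

  async save(key: StorageKey, value: Uint8Array): Promise<void> {
    this.data.set(JSON.stringify(key), value);
  }

  async listDocumentIds(): Promise<string[]> {
    const ids = new Set<string>();
    for (const encoded of this.data.keys()) {
      const [first] = JSON.parse(encoded) as StorageKey;
      if (first !== "storage-adapter-id") ids.add(first);
    }
    return Array.from(ids);
  }
}

// Preloading becomes a loop over the advertised IDs; `open` stands in for
// whatever the repository would do to load a document. Returns how many
// documents were opened (0 if the storage cannot enumerate its contents).
async function preloadAll(
  storage: EnumerableStorage,
  open: (id: string) => Promise<void>,
): Promise<number> {
  if (!storage.listDocumentIds) return 0;
  const ids = await storage.listDocumentIds();
  for (const id of ids) await open(id);
  return ids.length;
}
```

Keeping the method optional means `preloadAll` simply does nothing for adapters that cannot enumerate their contents.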

Has anyone else faced a similar problem/found a better solution? If we reach an agreement on a solution, I can look into it and create a PR :).

@pvh
Member

pvh commented Nov 8, 2024

Hi @Daru13. It is by design that you cannot introspect the contents of a repository through a supported API. You should assume that other applications or users share a storage adapter. The recommended approach is to create some kind of directory document that acts as your "root folder" and to store that one ID somewhere convenient. From there, you can relatively easily load documents recursively.
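A minimal sketch of the "root folder" approach follows. The document shapes (`FolderDoc`, `LeafDoc`) and the `collectReachable` helper are hypothetical conventions, not automerge-repo APIs. The `load` callback stands in for finding a handle with `repo.find` and reading its document; it is injected because the exact calls differ between automerge-repo versions. The traversal uses an explicit stack instead of recursion, plus a `seen` set so that shared or cyclic references are visited only once.

```typescript
// Hypothetical document shapes: automerge-repo does not prescribe these;
// they are one possible convention for a "root folder" document.
type DocId = string;

interface FolderDoc {
  kind: "folder";
  children: DocId[]; // IDs of the documents listed in this folder
}

interface LeafDoc {
  kind: "leaf";
  content?: string;
}

type AppDoc = FolderDoc | LeafDoc;

// Stand-in for "find the handle and read its document" in automerge-repo.
type Loader = (id: DocId) => Promise<AppDoc | undefined>;

// Depth-first walk from the root ID. Returns every ID visited, including
// IDs that `load` could not resolve. `seen` guards against cycles.
async function collectReachable(rootId: DocId, load: Loader): Promise<Set<DocId>> {
  const seen = new Set<DocId>();
  const stack: DocId[] = [rootId];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (seen.has(id)) continue;
    seen.add(id);
    const doc = await load(id);
    if (doc !== undefined && doc.kind === "folder") {
      stack.push(...doc.children);
    }
  }
  return seen;
}
```

Documents that are not linked from the root (like any other application's data in a shared storage adapter) are never touched, which matches the design rationale below.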

This is an important design feature in automerge-repo, because in a world where applications are always changing and different elements of your system might introduce new document types, a design that assumes a homogeneous repository becomes a liability.

@Daru13
Author

Daru13 commented Nov 11, 2024

Hi @pvh, and thank you for the explanation of this design rationale.

I am currently using a dedicated storage directory on disk for each of the server's Automerge repositories, so I suppose I can safely assume that no other application or user will alter its content. That being said, I agree with you that my current solution is definitely not very future-proof.

I initially wanted to use Automerge to store all of the server's data, including some local configuration data, to avoid needing another form of storage, but I suppose I will need one if I have to store the ID of some "root" document, as you suggest.
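Storing that single root ID outside Automerge can be as small as a JSON file next to the storage directory. The sketch below assumes a hypothetical `server-config.json` with a `rootDocumentId` field; `createRoot` is injected and would, in real code, create the root folder document in the repository (for example via `repo.create(...)`) and return its handle's ID.

```typescript
import * as fs from "node:fs";

// Hypothetical config shape; the file name and field name are illustrative.
interface ServerConfig {
  rootDocumentId: string;
}

// Returns the root document ID stored in `configPath`. On first run (no file
// yet), it calls `createRoot` once and persists the returned ID.
function getOrCreateRootId(configPath: string, createRoot: () => string): string {
  if (fs.existsSync(configPath)) {
    const config = JSON.parse(fs.readFileSync(configPath, "utf8")) as ServerConfig;
    return config.rootDocumentId;
  }
  const rootDocumentId = createRoot();
  const config: ServerConfig = { rootDocumentId };
  fs.writeFileSync(configPath, JSON.stringify(config, null, 2));
  return rootDocumentId;
}
```

On every later start, the server reads the same ID back and can walk the folder tree from there instead of scanning the storage directory.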
