Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database schema #6

Closed
rgaudin opened this issue May 29, 2024 · 10 comments · Fixed by #13
Closed

Database schema #6

rgaudin opened this issue May 29, 2024 · 10 comments · Fixed by #13
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@rgaudin
Copy link
Member

rgaudin commented May 29, 2024

Here's a list of the information I think we'd want in the DB.

Given we may want to use Metabase to consume data, we must have all information in-DB as metabase will connect directly and we wont be able to leverage computed stuff in the backend.
This must be taken into consideration when designing the DB. Might have an impact on DB size but that's not an issue

Mirrors

  • ID
  • Base URL
  • Enabled
  • Region
  • ➡️ Country
  • ASN
  • Score
  • Latitude
  • Longitude
  • Country Only
  • Region Only
  • AS Only
  • Other Countries

ASN and below params are duplicates from the MB DB that would only be helpful once we intend to configure MB remotely. We may skip all this for now. Also, those information are not available on mirrors.html

Countries

  • Code
  • Name

Workers

  • ID
  • ➡️ Countries
  • Auth Info
  • Last Seen On

Tests

  • ID
  • ➡️ Mirror
  • ➡️ Worker*
  • ➡️ Country
  • RequestedOn
  • StartedOn
  • Status: Missed | Succeeded | Errored
  • Error*
  • IP
  • ASN
  • ISP
  • Location (City)
  • Latency
  • Download Size
  • Download duration
  • Download speed

@elfkuzco, use this as direction and not specification. You are invited to suggest otherwise. Also at this stage, we want a basis to build upon. No doubt this will evolve as we implement.

@rgaudin rgaudin added the enhancement New feature or request label May 29, 2024
@rgaudin rgaudin added this to the MVP milestone May 29, 2024
@elfkuzco
Copy link
Collaborator

elfkuzco commented May 30, 2024

Regarding the fields from MirrorBrain about a mirror, I guess they can be nullable fields, right?
Does the AuthInfo field for the Worker table mean authentication credentials? If so, what do you think of the worker manager generating tokens with a public key and having the server verify the token with a private key as opposed to storing this data on the database?

@rgaudin
Copy link
Member Author

rgaudin commented May 30, 2024

Regarding the fields from MirrorBrain about a mirror, I guess they can be nullable fields, right?

Right.

Does the AuthInfo field for the Worker table mean authentication credentials? If so, what do you think of the worker manager generating tokens with a public key and having the server verify the token with a private key as opposed to storing this data on the database?

Yes, that's what we do in Zimfarm 👍

@elfkuzco
Copy link
Collaborator

Yes, that's what we do in Zimfarm 👍

That means, we shouldn't store the AuthInfo on the database then as we can set the public key as environment variable or pass it as a parameter? Right?

@rgaudin
Copy link
Member Author

rgaudin commented May 30, 2024

No, we still want to maintain ability to have multiple workers. We'll most likely have one but it can change overtime and we want want to keep records separate. Another thing we may want to do is to have contrib workers run from time to time. My country is not in any VPN's list and will never be so I may start one from time to time to get insights.

So we want to independant worker records and we want to identify and authenticate them. We'll store the public key in the DB for that

@elfkuzco
Copy link
Collaborator

Okay

@elfkuzco
Copy link
Collaborator

What do you think about adding a version number to the models especially for the Test since it would be updated? The version would serve as a form of optimistic lock for concurrent updates ensuring that only one update request would go through.

Here's a snippet of it I have used in one of my personal projects.

async def update_model(
    session: AsyncSession,
    model_cls: type,
    primary_key_field: str,
    record_id: str | int,
    version: int,
    **update_values: Any,
) -> int:
    """
    Updates the fields of a database model using the version number
    to prevent race conditions where two processes update
    the same model at the same time
    - primary_key_field: the primary key field for this model
    - record_id: the primary_key value of the object in the database
    - version: the version number of the object to update
    - update_values: the fields to update in the model

    Raises EditConflict exception if the update wasn't successful either
    as a result of the version being passed being outdated or the model
    doesn't exist
    Returns the new version number if the update was successful.
    """
    pk = getattr(model_cls, primary_key_field)
    new_version = version + 1
    update_query = (
        update(model_cls)
        .where(pk == record_id)
        .where(model_cls.version == version)  # type: ignore
        .values(version=new_version, **update_values)
    )
    result = await session.execute(update_query)
    await session.commit()
    if result.rowcount != 1:
        vals = ""
        for k, v in update_values.items():
            vals += f"{k}={v!r},"
        vals.rstrip(",")

        raise EditConflict(
            (
                "EditConflict while updating {cls}("
                "{pk}={record_id}, version={version}, {vals})"
            ).format(
                cls=model_cls.__name__,
                pk=primary_key_field,
                record_id=record_id,
                version=version,
                vals=vals,
            )
        )
    return new_version

@rgaudin
Copy link
Member Author

rgaudin commented May 31, 2024

If I understand correctly, it would make some fields un-mutable and the Test a sort of State Machine preventing a worker that has already PATCHed one resource to PATCH it again?

That seems too sophisticated for our need. We can simply refuse the update should the data already be present in the DB but it's somewhat equivalent so do as you please.

Keep in mind that we'll update the resource multiple times to first change status then record the results (or the error).

@elfkuzco
Copy link
Collaborator

Actually, it makes sure a model can always be updated by one request at a time. I agree it's a bit complex for this use-case since only one PATCH request can come at a given time and will not consider it.

@elfkuzco
Copy link
Collaborator

elfkuzco commented Jun 2, 2024

Please, could you provide more insights on the purpose of the isp field on the Test model? I'm thinking it might not always be available when we want to get information about the details of a worker's network. I visited https://am.i.mullvad.net/json and it's not available. It could be gotten from other sites though but such sites might likely need to set a user-agent header or be rate-limited APIs

@rgaudin
Copy link
Member Author

rgaudin commented Jun 3, 2024

ISP is an important information ; both historically and should we get multiple workers at the same time.
Let's assume here (this is a DB ticket) that we can get this information. It's important the system is tolerant on missing information for all those stuff: IP, ASN, ISP, etc. The only mandatory stuff is the country (that we supply) and the test results. The rest is metadata that brings context.

How to get it is a worker issue. We'll look into what the GeoIP DB provide and if it's not there, we can query online services. It's true that over a VPN, such services might fail given the outbound IP is shared accross many users… but when we're on VPN, it's OK because the ISP is the VPN provider, so it can be part of the worker configuration. For non-VPN (that we wont support just now), those servives would likely work.
In any way, it's important that lacking this information doesn't prevent the test from being run and recorded.

@elfkuzco elfkuzco linked a pull request Jun 3, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants