MAPLE MGL API #1549

Open
2 tasks
Mephistic opened this issue Apr 24, 2024 · 0 comments

@Mephistic
Collaborator

Quick writeup based on our hack night discussion on 4/23. This needs more refinement before we're ready to go; it's mostly a braindump.

Problem

Our ML experts want to be able to reference the Massachusetts General Laws (MGL) to hydrate data for our upcoming LLM-driven bill summaries. If a bill references any sections of the MGL, the model will need to fetch those sections to build the prompt used to generate the summary (so that we aren't missing critical context).

They are currently doing so by scraping the HTML of the MGL website, but we want a less fragile solution going forward.
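To make the "bill references sections of the MGL" step concrete, here is a minimal sketch of pulling chapter + section pairs out of bill text. The citation phrasing it matches ("section 3 of chapter 23A") is an assumption about how bills are worded, and `MglRef` and `extractMglRefs` are hypothetical names, not existing MAPLE code. Real bills will likely need more patterns than this.

```typescript
// Hypothetical shape for one reference to an MGL section.
interface MglRef {
  chapter: string; // e.g. "23A"
  section: string; // e.g. "3"
}

// Finds references phrased like "section 3 of chapter 23A" (assumed phrasing).
// Returns each chapter + section pair once, in order of first appearance.
function extractMglRefs(billText: string): MglRef[] {
  // Built per call so the global regex's lastIndex state is never shared.
  const pattern = /\bsection\s+(\d+[A-Z]*)\s+of\s+(?:said\s+)?chapter\s+(\d+[A-Z]*)/gi;
  const seen = new Set<string>();
  const refs: MglRef[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(billText)) !== null) {
    const section = match[1].toUpperCase();
    const chapter = match[2].toUpperCase();
    const key = `${chapter}:${section}`;
    if (!seen.has(key)) {
      seen.add(key);
      refs.push({ chapter, section });
    }
  }
  return refs;
}
```

The deduplicated list is what a caller would then pass to whatever lookup replaces the current HTML scraping.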

Proposal

At the beginning of a session, let's scrape the MGL, store the text on our side, and expose it to our LLM wrapper service via a private API. We'll do this instead of relying on the MA Legislature API at runtime because <???> (missing context: are we looking for versioning on this, or just speed?).

Success Criteria

  • Worker that scrapes the MGL and stores it in a Firestore collection, accessible by chapter + section
    • Maybe this doesn't need to be in Firestore? Chapter + Section -> Blob text seems like the relevant bits, let's check in with Matt V and Nathan on what they need here and whether this can be simpler.
  • Private API endpoint that lets a caller query by Chapter + Section
    • One bill can reference 50+ sections, so a batch endpoint is likely useful.
    • The total section text can be pretty large, so we may need to chunk/stream the response.
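As a rough sketch of the criteria above, the batch lookup and chunking could look something like this. Everything here is an assumption for discussion: an in-memory `Map` stands in for whatever store we pick (Firestore or something simpler), and `SectionKey`, `getSections`, and `chunkBySize` are hypothetical names. Chunking is done by character count purely for illustration.

```typescript
interface SectionKey {
  chapter: string;
  section: string;
}

interface SectionText extends SectionKey {
  text: string;
}

// Stand-in for the stored MGL text, keyed by "chapter:section".
type SectionStore = Map<string, string>;

function storeKey(key: SectionKey): string {
  return `${key.chapter}:${key.section}`;
}

// Batch lookup: returns the sections we have plus the keys we could not find.
function getSections(
  store: SectionStore,
  keys: SectionKey[]
): { found: SectionText[]; missing: SectionKey[] } {
  const found: SectionText[] = [];
  const missing: SectionKey[] = [];
  for (const key of keys) {
    const text = store.get(storeKey(key));
    if (text === undefined) {
      missing.push(key);
    } else {
      found.push({ chapter: key.chapter, section: key.section, text });
    }
  }
  return { found, missing };
}

// Groups sections into pages whose combined text length stays within maxChars.
// A single section longer than maxChars still gets a page of its own.
function chunkBySize(sections: SectionText[], maxChars: number): SectionText[][] {
  const pages: SectionText[][] = [];
  let current: SectionText[] = [];
  let size = 0;
  for (const s of sections) {
    if (current.length > 0 && size + s.text.length > maxChars) {
      pages.push(current);
      current = [];
      size = 0;
    }
    current.push(s);
    size += s.text.length;
  }
  if (current.length > 0) {
    pages.push(current);
  }
  return pages;
}
```

Reporting `missing` keys separately lets the caller decide whether a summary can proceed without a section, rather than failing the whole batch.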

Open Questions

  • What is going on with the Journal of Session Laws? Do bills reference those? Do we need to care? What exactly is the timing/process for moving a bill from the Journal of Session Laws to the MGL?
  • If the MGL web page and the Mass Legislature API disagree, which is correct?
  • The MGL web page says that it isn't the official version of the MGL - is this a major problem, or are issues caused by this covered by the disclaimer we'll have on LLM summaries anyway?

Quick diagram of my understanding from our hack night discussion on 4/23:
[Image: Screenshot 2024-04-23 at 9.09.31 PM.png]

@Mephistic Mephistic changed the title WIP - MAPLE MGL API MAPLE MGL API Aug 7, 2024