Convert to Scripture Burrito Proposed Format #6

Open
wants to merge 2 commits into
base: main

Conversation


@themikejr themikejr commented Oct 3, 2023

This PR introduces a script that converts our existing alignment data to conform to the upcoming Scripture Burrito alignment data specification. Because the specification is still being finalized, this PR may need further changes. For now, it is a discussion tool for reviewing the results of a potential conversion.

Link to specification: https://docs.google.com/document/d/1zR5gsrm3gIoNiHVBlWz5_BBw3N-Ew1-4M5rMsFrPzSw/edit

@themikejr
Author

@jtauber One thing I noticed while writing this initial conversion is that we have been using an id field on alignment records, and I don't see an equivalent in the spec. Having an identifier for each record seems useful for:

  • referring to a specific record and tracking change within it
  • sorting records

Thoughts on including id on alignment records?
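
To make the two uses above concrete, here is a minimal sketch of what a per-record id enables. The record shape (`source`, `target`) and the id values are made up for illustration; only the `id` field itself comes from our existing data, and none of this is taken from the spec.

```python
def index_by_id(records):
    """Map each record's id to the record, rejecting duplicate ids."""
    index = {}
    for record in records:
        record_id = record["id"]
        if record_id in index:
            raise ValueError(f"duplicate alignment record id: {record_id}")
        index[record_id] = record
    return index


def sort_by_id(records):
    """Return the records in a stable, id-based order."""
    return sorted(records, key=lambda record: record["id"])


def changed_ids(old_records, new_records):
    """Return the sorted ids whose record content differs between two versions."""
    old, new = index_by_id(old_records), index_by_id(new_records)
    return sorted(rid for rid in old.keys() & new.keys() if old[rid] != new[rid])


# Hypothetical records: ids and token references are illustrative only.
version_1 = [
    {"id": "rec-002", "source": ["s2"], "target": ["t2"]},
    {"id": "rec-001", "source": ["s1"], "target": ["t1"]},
]
version_2 = [
    {"id": "rec-001", "source": ["s1"], "target": ["t1"]},
    {"id": "rec-002", "source": ["s2"], "target": ["t2", "t3"]},
]

ordered = [record["id"] for record in sort_by_id(version_1)]
changed = changed_ids(version_1, version_2)
```

Without an id, both the ordering and the change tracking would have to fall back on comparing the source and target lists themselves, which breaks as soon as a record is edited.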


@ryderwishart ryderwishart left a comment


Way to move quickly!


def create_sb_json_structure():
    sb_alignment = {}
    sb_alignment["type"] = "translation"


I wonder if this is the appropriate type. @jtauber can confirm, but I think the 'translation' type was meant to be for cases where we knew the source was indeed the source, not simply for cases where we assume a source for the sake of alignment. Perhaps type should be 'alignment' as a default?

A 'translation' example would be if we machine-translated a text, and we knew exactly what the source and target were.
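
One way to keep this decision open while the spec settles is to make the type a parameter instead of a hard-coded value. The sketch below is a hypothetical variant of `create_sb_json_structure`, not the code in this PR: the `alignment_type` and `creator` parameters, the `"alignment"` default, and the return value are all assumptions, and whether `"alignment"` is a valid type in the spec still needs confirming.

```python
def create_sb_json_structure(alignment_type="alignment", creator=None):
    """Build the top-level structure with a caller-supplied type (hypothetical variant)."""
    # "alignment" as the default is only the suggestion from this thread,
    # not a value confirmed by the specification.
    sb_alignment = {"type": alignment_type, "meta": {}}
    if creator is not None:
        sb_alignment["meta"]["creator"] = creator
    return sb_alignment


# Default case: a source is assumed only for the sake of alignment.
default_case = create_sb_json_structure(creator="GrapeCity")

# Known source and target, e.g. a machine-translated text.
known_source = create_sb_json_structure(alignment_type="translation")
```

With this shape, changing the default later is a one-line edit, and callers that really do know their source can opt in to `"translation"` explicitly.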

Author


Good question. I don't pretend to have the answer, but at least we now know what we need to find out.

So I interpret the question to be: when source-target affinity is dubious (as I expect it would be with nearly every Bible translation we work with), what is the correct type to use?

    sb_alignment = {}
    sb_alignment["type"] = "translation"
    sb_alignment["meta"] = {}
    sb_alignment["meta"]["creator"] = "GrapeCity"


Should we add a 'note' about the GrapeCity ones for posterity, perhaps something Randall uses to describe the provenance and alignment process—in case he's not available to answer questions about it at some point?



# These are not in the standard format.
ALIGNMENT_EXCEPTIONS = [
Contributor


Sorry I let these languish, @themikejr: they should be updated now in Alignments.

alignment_file_paths.append(os.path.join(root, file))
return alignment_file_paths


Contributor


@themikejr no problems with this, but at some point you might want to look at bible_alignments/config.py for dealing with the various pieces of alignment files.



def create_new_file_name(existing_path):
    old_path_parts = existing_path.split("/")
Contributor

@sboisen sboisen Oct 3, 2023


@themikejr you should use pathlib for working with paths (and eventually config.py for constructing filenames: see https://github.com/Clear-Bible/Alignments/blob/main/bible_alignments/config.py#L53).
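
For reference, a rough sketch of what the pathlib version could look like. The naming rule here (appending a `-sb` marker to the file stem) and the example path are assumptions for illustration, since the rest of the function isn't visible in this excerpt; this also does not use config.py.

```python
from pathlib import PurePosixPath


def create_new_file_name(existing_path, marker="-sb"):
    """Return a sibling path whose stem gains `marker` (hypothetical naming rule)."""
    path = PurePosixPath(existing_path)
    # with_name swaps only the final component, so the directory part is kept
    # without splitting on "/" by hand.
    return path.with_name(f"{path.stem}{marker}{path.suffix}")


# Hypothetical input path, for illustration only.
new_path = create_new_file_name("data/alignments/example.json")
```

`PurePosixPath` keeps the example independent of the operating system; in the script itself, `pathlib.Path` would be the usual choice because it also supports filesystem operations such as `rglob` and `open`.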

Author


I just got us started -- I'm not a python pro. I could maybe come back to this later, but I'm happy for anyone to push commits that make the code more idiomatic or make better use of existing utilities.

3 participants