Convert to Scripture Burrito Proposed Format #6

Open: wants to merge 2 commits into base: main
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.10
82 changes: 82 additions & 0 deletions converters/convert_to_scripture_burrito.py
@@ -0,0 +1,82 @@
import os
import json


# These are not in the standard format.
ALIGNMENT_EXCEPTIONS = [
Contributor:

Sorry I let these languish, @themikejr: they should be updated now in Alignments.

    "WLC-NET-manual.json",
    "WLC-CSBE-manual.json",
    "WLC-YLT-manual.json",
    "WLC-SGS-manual.json",
    "NA27-SGS-manual.json",
    "NA27-HSB-manual.json",
    "NA27-CUVMP-manual.json",
]


def find_alignment_file_paths_for_conversion():
    alignment_file_paths = []
    for root, dirs, files in os.walk("data/alignments"):
        for file in files:
            if file.endswith("-manual.json") and file not in ALIGNMENT_EXCEPTIONS:
                alignment_file_paths.append(os.path.join(root, file))
    return alignment_file_paths


Contributor:

@themikejr no problems with this, but at some point you might want to look at bible_alignments/config.py for dealing with the various pieces of alignment files.

def convert():
    alignment_file_paths = find_alignment_file_paths_for_conversion()

    for alignment_file_path in alignment_file_paths:
        print(alignment_file_path)
        sb_alignment = create_sb_json_structure()

        with open(alignment_file_path, "r") as file:
            alignment_data = json.load(file)

        for alignment_datum in alignment_data:
            try:
                sb_alignment_record = create_sb_alignment_record()
                sb_alignment_record["id"] = alignment_datum["id"]
                sb_alignment_record["source"].extend(alignment_datum["source_ids"])
                sb_alignment_record["target"].extend(alignment_datum["target_ids"])
                sb_alignment["records"].append(sb_alignment_record)
            except (KeyError, TypeError) as error:
                # Report the malformed record and keep converting the rest.
                print(f"Error in alignment_datum: {error!r}")
                print(f"\t{alignment_file_path}")
                print(f"\t{alignment_datum}")

        new_path = create_new_file_name(alignment_file_path)
        # A context manager makes sure the output file is flushed and closed.
        with open(new_path, "w") as out_file:
            json.dump(sb_alignment, out_file, indent=2)


def create_sb_json_structure():
    sb_alignment = {}
    sb_alignment["type"] = "translation"

I wonder if this is the appropriate type. @jtauber can confirm, but I think the 'translation' type was meant to be for cases where we knew the source was indeed the source, not simply for cases where we assume a source for the sake of alignment. Perhaps type should be 'alignment' as a default?

A 'translation' example would be if we machine-translated a text, and we knew exactly what the source and target were.

Author:

Good question. Generally, I don't pretend to have the answer, but at least we know what we need to know now.

So I interpret the question to be: when source-target affinity is dubious (as I expect it would be with nearly every Bible translation we work with), what is the correct type to use?

    sb_alignment["meta"] = {}
    sb_alignment["meta"]["creator"] = "GrapeCity"

Should we add a 'note' about the GrapeCity ones for posterity, perhaps something Randall uses to describe the provenance and alignment process—in case he's not available to answer questions about it at some point?
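Both open questions in this review (which `type` to use, and whether to record a provenance note) could be deferred to the caller rather than hard-coded. The following is only a sketch under assumptions: the `alignment_type`, `creator`, and `note` parameters are hypothetical, and the `"note"` key is an assumed name, not something confirmed by the proposed Scripture Burrito format.

```python
def create_sb_json_structure(alignment_type="translation", creator="GrapeCity", note=None):
    """Build the top-level structure, leaving the contested values to the caller."""
    meta = {"creator": creator}
    if note is not None:
        # "note" is a hypothetical key; the proposed format may name it differently.
        meta["note"] = note
    return {"type": alignment_type, "meta": meta, "records": []}
```

Called with no arguments, this returns the same structure as the version in the diff, so the default can be switched to `"alignment"` later without touching `convert()`.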

    sb_alignment["records"] = []
    return sb_alignment


def create_sb_alignment_record():
    sb_alignment_record = {}
    sb_alignment_record["id"] = ""
    sb_alignment_record["source"] = []
    sb_alignment_record["target"] = []
    return sb_alignment_record


def create_new_file_name(existing_path):
    old_path_parts = existing_path.split("/")
@sboisen (Contributor), Oct 3, 2023:

@themikejr you should use pathlib for working with paths (and eventually config.py for constructing filenames: see https://github.com/Clear-Bible/Alignments/blob/main/bible_alignments/config.py#L53).

Author:

I just got us started -- I'm not a Python pro. I could maybe come back to this later, but I'm happy for anyone to push commits that make the code more idiomatic or make better use of existing utilities.
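For anyone picking this up, here is a minimal sketch of what the pathlib suggestion might look like. It assumes file names always follow the `<source>-<target>-manual.json` pattern seen in this PR, and it does not use `config.py`, whose helpers are not shown here. Unlike the version in the diff, it returns a `Path` rather than a `str`, which `open()` accepts.

```python
from pathlib import Path


def create_new_file_name(existing_path):
    """Return the sibling '<source>-<target>-manual.sb.json' path."""
    path = Path(existing_path)
    # File names look like 'NA27-ESV-manual.json': source, target, then the rest.
    source, target = path.name.split("-")[:2]
    # with_name() swaps only the final component, so directory depth no longer matters.
    return path.with_name(f"{source}-{target}-manual.sb.json")
```

Because only the final path component is replaced, this avoids the hard-coded `old_path_parts[4]` index and the dependence on `/` as the separator.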

    old_name_parts = old_path_parts[4].split("-")
    new_filename = f"{old_name_parts[0]}-{old_name_parts[1]}-manual.sb.json"
    new_path = f"{old_path_parts[0]}/{old_path_parts[1]}/{old_path_parts[2]}/{old_path_parts[3]}/{new_filename}"
    return new_path


# Run the conversion only when executed as a script, not on import.
if __name__ == "__main__":
    convert()
3 changes: 3 additions & 0 deletions data/alignments/eng/CSBE/NA27-CSBE-manual.sb.json
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/alignments/eng/ESV/NA27-ESV-manual.sb.json
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/alignments/eng/ESV/WLC-ESV-manual.sb.json
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/alignments/eng/HCSB/NA27-HCSB-manual.sb.json
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/alignments/eng/HCSB/WLC-HCSB-manual.sb.json
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/alignments/eng/LEB/NA27-LEB-manual.sb.json
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/alignments/eng/LEB/WLC-LEB-manual.sb.json
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/alignments/eng/NET/NA27-NET-manual.sb.json
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/alignments/eng/YLT/NA27-YLT-manual.sb.json
Git LFS file not shown