The general philosophy of OSIS is to use XML elements for all the semantic markup.
Using the solidus within the text to separate morpheme segments within Hebrew words goes against this OSIS philosophy. One friend has described this as "bad, bad, very bad".
cf. The XML files for the CrossWire WLC module conform more closely to this principle: they use the XML `seg` element for this purpose. The original data was obtained from the website tanach.us, but further preprocessing was done before building the latest version of the module, which differs from its earliest version in this respect.
e.g. Taken from the mod2imp output of the CrossWire WLC module, the entries are generally like this:
NB. In this extract, the output was also converted to Word Per Line format afterwards.
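As a rough illustration of the alternative, here is a minimal Python sketch that turns a solidus-delimited word into a `w` element whose morpheme segments are `seg` children rather than `/` characters. The `type` value `x-morph` is an assumption (OSIS requires user-defined type values to begin with `x-`). Wrapping every segment in its own `seg` is also just one possible layout, and the CrossWire WLC module may structure its `seg` elements differently.

```python
import xml.etree.ElementTree as ET


def word_to_osis(word: str, seg_type: str = "x-morph") -> ET.Element:
    """Convert a solidus-delimited word (e.g. 'בְּ/רֵאשִׁ֖ית') into an OSIS <w>.

    Each morpheme segment becomes a <seg> child, so the '/' separator never
    appears in the text content. seg_type is an assumed value, not
    necessarily the one used by the CrossWire WLC module.
    """
    w = ET.Element("w")
    parts = word.split("/")
    if len(parts) == 1:
        # No morpheme boundary: keep the word as plain element text.
        w.text = word
        return w
    for part in parts:
        seg = ET.SubElement(w, "seg", {"type": seg_type})
        seg.text = part
    return w


if __name__ == "__main__":
    elem = word_to_osis("בְּ/רֵאשִׁ֖ית")
    # encoding="unicode" returns a str and leaves Hebrew unescaped.
    print(ET.tostring(elem, encoding="unicode"))
```

Keeping the segmentation in markup means a consumer that only wants the running text can concatenate the `seg` contents without having to strip a delimiter character.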
Aside: That is not to say that the WLC module is perfect.
Irrespective of any text-critical issues, at least the following mistakes were made when it was first built:
- The Hebrew text should not have been normalized to NFC.
- There should not be a space either before or after each MAQAF.
- The space between Hebrew words should be outside the `w` elements.
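These three points, and one likely reason why NFC is a problem, can be shown in a small Python sketch. Unicode normalization reorders combining marks by their canonical combining class, so NFC silently moves a patah (class 17) in front of a dagesh (class 21) that was encoded before it. The `build_verse` helper is hypothetical: it assumes tokens arrive already split, with each maqaf as its own token, and it places the maqaf directly between the two `w` elements it joins, which may not match where the module actually puts it.

```python
import unicodedata
from typing import Sequence
from xml.sax.saxutils import escape

MAQAF = "\u05BE"  # HEBREW PUNCTUATION MAQAF


def differs_under_nfc(text: str) -> bool:
    """Return True if NFC normalization would change the text,
    e.g. by reordering Hebrew points according to combining class."""
    return unicodedata.normalize("NFC", text) != text


def build_verse(tokens: Sequence[str]) -> str:
    """Serialize tokens as OSIS <w> elements (hypothetical helper).

    - The text is written exactly as given: no Unicode normalization.
    - A single space separates consecutive words and sits *between*
      the </w> and <w> tags, never inside an element.
    - No space is emitted before or after a maqaf token.
    """
    parts: list = []
    for tok in tokens:
        if tok == MAQAF:
            parts.append(MAQAF)
            continue
        # Space only between two words, not at the start or after a maqaf.
        if parts and parts[-1] != MAQAF:
            parts.append(" ")
        parts.append("<w>" + escape(tok.strip()) + "</w>")
    return "".join(parts)


if __name__ == "__main__":
    # Bet + dagesh + patah: NFC reorders this to bet + patah + dagesh.
    print(differs_under_nfc("\u05D1\u05BC\u05B7"))
    print(build_verse(["a", MAQAF, "b", "c"]))
```

The sketch never normalizes the text it serializes; it only uses NFC to detect whether normalization would have altered a string.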
These are not your responsibility. I mention them merely in passing.
Those defects were rectified in the WLC module after I created this issue in 2017.
Hi @DavidHaslam, I suspect many people agree with you on that, myself included. Making such a change to the text as it is now would certainly cause all sorts of backward-compatibility issues.
I'd be in favor of offering an alternate version of the files in the repo that has the fields separated according to the OSIS philosophy. If you want to put in a PR with the changes you suggest, I think we'd be willing to incorporate it.
Since I added this issue in 2017, the website tanach.us has changed its title. Instead of Westminster Leningrad Codex, it is now Unicode XML Leningrad Codex.
There are other significant changes, but the one relevant to this issue is that all the solidus (/) markers that used to separate morphological segments have been removed!