Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Virtualize DICOM database #1154

Merged
merged 10 commits into from
Dec 13, 2023
Merged

Conversation

pieper
Copy link
Member

@pieper pieper commented Nov 9, 2023

This generalizes the ctkDICOMDatabase code to ease hard-coded restrictions about DICOM files always being on disk. Now the schema includes a URL field of the SQLite database so certain file-specific operations are only performed on filePaths while URL are handled independently. Entries in the Images table of the database may now have either a URL or a fileName or both and new accessors are provided to get URLs at the series and instance level.

Also this contains a fix to the ctkDICOMTagCache so that tags are always stored internally as uppercase hex, so a string like "0010,000d" will always be turned into "0010,000D" for storage in the database. Both forms can be used to query tags. This avoids the situation where some cache entries were duplicated because different code used either upper or lower case to refer to the same tag. Using only upper case should improve storage use and improve performance by avoiding unneeded cache misses.

Allow the ctkDICOMDatabase to hold URLs for retrieving
instances in addition to traditional filenames (really
filepaths).

Ensure that any code that tries to calculate absolute file
paths from relative paths does not mangle the URLs.  Do this
by detecting the URL sceme (e.g. http in http://) and ignore
these when calculating abolute paths.
Previously, the TagCache database would store whatever hex string
was passed in, so depending on how the tags were specified duplicate
entries could end up being cached, such as "0010,000d" and '0010,000D".
This wasted space, but also caused inefficiencies.

Although usage varies, the DICOM standard uses upper case hex both
in the body of the standard text, and when defining the DICOM JSON
and XML models. (https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_f.2.html)

This change always converts tags to upper case hex when storing
and query of the tag cache.  Users of the library can still pass
lower case hex tags to methods, but they will be converted
to upper case internally, so most use cases will be unchanged.

The only externally visible method where the behavior will change is:
`Q_INVOKABLE void getCachedTags(const QString sopInstanceUID, QMap<QString, QString> &cachedTags);`
Previously tags could have been of mixed case depending on how
they were specified for caching.  Now they will be all upper case hex.

Currently, the only known use of `getCachedTags` is internnally
to `ctkDICOMDatabase` so this change of behavior is unlikely to cause
problems in practice.

Note also that while the DICOM standard models for JSON and XML omit
the comma between group and element, but `ctkDICOMTagCache` implementation
retains the comma in the API and the database.
Previously, if some data imported into the `ctkDICOMDatabase` had
incomplete information then the display field updates would abort.

With this change, a more informative message is generated to support
troubleshooting of any failed updates, but the rest of the process
completes so that any valid display field updates can be shown
to the user.
@pieper pieper marked this pull request as draft November 9, 2023 00:14
Copy link
Member

@lassoan lassoan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes look good, we should just clarify naming and what exactly we store in the database (url or url/localfile).

@@ -206,7 +206,7 @@ QStringList ctkDICOMDatabasePrivate::allFilesInDatabase()

while (allFilesQuery.next())
{
allFileNames << this->absolutePathFromInternal(allFilesQuery.value(0).toString());
allFileNames << this->absolutePathFromInternalIfFile(allFilesQuery.value(0).toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variables that can store urls should not be named as "fileName" (e.g., here "allFileNames") because it is important to make developers aware that the string may contain not just a local file path but it may be a url. Most method names fortunately just use "file" (instead of "filepath"), so they don't need to be renamed.

Suggested change
allFileNames << this->absolutePathFromInternalIfFile(allFilesQuery.value(0).toString());
allUrls << this->absoluteUrl(allFilesQuery.value(0).toString());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I agree with this, but since the term fileName is used in the schema I felt like keeping this consistency would lead to the most readable code. For the reasons you list in your next comment I'm not sure allUrls is correct either. Perhaps something generic like instanceLocators is more correct. We could make a more sweeping change, including the schema, if we want to generalize.

Part of the reason for this is that some protocols don't use actual URLs to identify the concept of an "DICOM instance". For example, DIMSE doesn't have URLs, but we might want to refer to an instance with something like cget://<pacs ip>/<instance uid>. Currently I have to do this for AHI since frame-level access is a REST API call with dataset and image frame IDs as parameters, so I have a synthetic "URL" string.

Libs/DICOM/Core/ctkDICOMDatabase.cpp Outdated Show resolved Hide resolved
Reverts commontk@3b8182e

Plan is to keep using fileName only for file paths and add
a URL field to be used explicitly for URLs so this logic is
no longer needed.
Adds a URL field to the Images schema and provides
accessor methods for it.
We should be checking the return value of QSqlQuery::exec
to catch problems.

Filed an issue to track this:

commontk#1155
Since neither of these are required fields, the NOT NULL constraint is not appropriate.
Schema now has URL column of size 2048 (which based on research
is suggested as the max size for URLs).  The schema now
allows either field to be NULL and does not require the
Filename column to be unique.

The api now icnludes a urlsForSeries option to get the list
of URLs for a given series instance uid.

The fileValue method now accepts either filePaths or URLs with
filePaths being given priority if both exist.
@pieper pieper marked this pull request as ready for review November 16, 2023 00:01
@pieper
Copy link
Member Author

pieper commented Nov 16, 2023

This should be merged by squashing so the commit message will be fixed.

@jcfr
Copy link
Member

jcfr commented Dec 12, 2023

Proposed commit message for squash & merge integration:

ENH: Update DICOM Database with URL Support and Tag Cache Optimization

This generalizes the ctkDICOMDatabase code to ease hard-coded restrictions about
DICOM files always being on disk. Now the schema includes a `URL` field of the
SQLite database so certain file-specific operations are only performed on filePaths
while URL are handled independently. Entries in the `Images` table of the database
may now have either a `URL` or a `fileName` or both and new accessors are provided
to get `URL`s at the series and instance level.

Schema version is updated from version `0.7.0` to `0.8.0`.

Also this contains a fix to the `ctkDICOMTagCache` so that tags are always stored
internally as uppercase hex, so a string like "0010,000d" will always be turned
into "0010,000D" for storage in the database. Both forms can be used to query
tags. This avoids the situation where some cache entries were duplicated because
different code used either upper or lower case to refer to the same tag. Using
only upper case should improve storage use and improve performance by avoiding
unneeded cache misses.

Co-authored-by: Andras Lasso <[email protected]>

@jcfr
Copy link
Member

jcfr commented Dec 12, 2023

@Punzo Do you have any suggestion related to naming and/or approach that would make rebasing of #1142 easier ?

@@ -84,7 +85,7 @@ CREATE TABLE 'Series' (
'DisplayedFieldsUpdatedTimestamp' DATETIME NULL ,
PRIMARY KEY ('SeriesInstanceUID') );

CREATE UNIQUE INDEX IF NOT EXISTS 'ImagesFilenameIndex' ON 'Images' ('Filename');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the URL be indexed as well ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, maybe, since that is used in the fileValue method to find the instanceUID so it can be used to look up values in the tag cache. Bypassing the tag cache is still probably better, but an index on this might have addressed some of the performance issues I was seeing.

@Punzo
Copy link
Contributor

Punzo commented Dec 13, 2023

@Punzo Do you have any suggestion related to naming and/or approach that would make rebasing of #1142 easier ?

I have already had a videocall with Steve and we agreed on this infrastructure. Once it will be merged, then I will rebase my PR.
The idea would be that the url variable values would be like:

protocol+infrastructure://information

examples can be:
A) how I will store my "url" for DIMSE in CTK: dimse+ctk://connectionName/studyInstanceUID/seriesInstanceUID/SOPInstanceUID
b) and for Steve would be: dicomweb+gcp://IP/studyInstanceUID/seriesInstanceUID/SOPInstanceUID
etc...

@pieper please confirm if this is accurate.

@Punzo
Copy link
Contributor

Punzo commented Dec 13, 2023

A) how I will store my "url" for DIMSE in CTK: dimse+ctk://connectionName/studyInstanceUID/seriesInstanceUID/SOPInstanceUID
b) and for Steve would be: dicomweb+gcp://IP/studyInstanceUID/seriesInstanceUID/SOPInstanceUID

maybe we should add this in somewhere as docs, maybe in the description of https://github.com/commontk/CTK/pull/1154/files#diff-47b3a2aabeea5b49d2da30b70b2dfc96ec78db4fcae06d5cfea45bd0bbe07e67R194 ?

@pieper
Copy link
Member Author

pieper commented Dec 13, 2023

Thanks @Punzo for doublechecking.

I agree on the URL scheme description but since we don't rely on it in the CTK code at this point it's more of an application level thing to define. Yesterday @jcfr , @lassoan , and I talked about moving what I have now in the DICOMLogic repo into the a pure-python package under CTK organization (with a final name TBD) and using that as a place to define higher level usage. We plan to hash out details at Project Week 40. I'll start a project page for this.

@lassoan
Copy link
Member

lassoan commented Dec 13, 2023

maybe we should add this in somewhere as docs, maybe in the description of

We can amend the documentation in your pull request (that will be the first use of this new database field in CTK). We could describe that dimse+ctk protocol is reserved for the CTK visual DICOM browser, while any other protocols are free to use. When other protocols emerge then we can add them to the documentation as well, to reduce the chance of unintended name clashes.

@lassoan
Copy link
Member

lassoan commented Dec 13, 2023

@pieper I think it is ready for integration, just need to squash the commits and update the commit message as @jcfr suggested. Maybe also add @Punzo as coauthor.

@pieper pieper merged commit 8418771 into commontk:master Dec 13, 2023
1 of 2 checks passed
@pieper pieper deleted the virtualize-database branch December 13, 2023 15:57
pieper added a commit to Slicer/Slicer that referenced this pull request Dec 13, 2023
jcfr pushed a commit to Slicer/Slicer that referenced this pull request Dec 13, 2023
The `ctkDICOMDatabase` update allows handling DICOM files with both file paths and URLs.
The SQLite database schema now includes a `URL` field. Entries in the `Images` table can
have a `URL`, a `fileName`, or both. The schema version is updated from `0.7.0` to `0.8.0`.

The `ctkDICOMTagCache` is also improved to store tags as uppercase hex, ensuring consistency
and optimizing storage and performance.

For more details, refer to commontk/CTK#1154

List of changes:

$ git shortlog f53820a4e..841877133 --no-merges 
Steve Pieper (1):
      ENH: Update DICOM Database with URL Support and Tag Cache Optimization
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants