ENH: Virtualize DICOM database #1154

pieper · 2023-11-09T00:14:14Z

This generalizes the ctkDICOMDatabase code to ease hard-coded restrictions about DICOM files always being on disk. Now the schema includes a URL field of the SQLite database so certain file-specific operations are only performed on filePaths while URL are handled independently. Entries in the Images table of the database may now have either a URL or a fileName or both and new accessors are provided to get URLs at the series and instance level.

Also this contains a fix to the ctkDICOMTagCache so that tags are always stored internally as uppercase hex, so a string like "0010,000d" will always be turned into "0010,000D" for storage in the database. Both forms can be used to query tags. This avoids the situation where some cache entries were duplicated because different code used either upper or lower case to refer to the same tag. Using only upper case should improve storage use and improve performance by avoiding unneeded cache misses.

Allow the ctkDICOMDatabase to hold URLs for retrieving instances in addition to traditional filenames (really filepaths). Ensure that any code that tries to calculate absolute file paths from relative paths does not mangle the URLs. Do this by detecting the URL sceme (e.g. http in http://) and ignore these when calculating abolute paths.

Previously, the TagCache database would store whatever hex string was passed in, so depending on how the tags were specified duplicate entries could end up being cached, such as "0010,000d" and '0010,000D". This wasted space, but also caused inefficiencies. Although usage varies, the DICOM standard uses upper case hex both in the body of the standard text, and when defining the DICOM JSON and XML models. (https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_f.2.html) This change always converts tags to upper case hex when storing and query of the tag cache. Users of the library can still pass lower case hex tags to methods, but they will be converted to upper case internally, so most use cases will be unchanged. The only externally visible method where the behavior will change is: `Q_INVOKABLE void getCachedTags(const QString sopInstanceUID, QMap<QString, QString> &cachedTags);` Previously tags could have been of mixed case depending on how they were specified for caching. Now they will be all upper case hex. Currently, the only known use of `getCachedTags` is internnally to `ctkDICOMDatabase` so this change of behavior is unlikely to cause problems in practice. Note also that while the DICOM standard models for JSON and XML omit the comma between group and element, but `ctkDICOMTagCache` implementation retains the comma in the API and the database.

Previously, if some data imported into the `ctkDICOMDatabase` had incomplete information then the display field updates would abort. With this change, a more informative message is generated to support troubleshooting of any failed updates, but the rest of the process completes so that any valid display field updates can be shown to the user.

lassoan

All changes look good, we should just clarify naming and what exactly we store in the database (url or url/localfile).

lassoan · 2023-11-09T01:08:18Z

Libs/DICOM/Core/ctkDICOMDatabase.cpp

@@ -206,7 +206,7 @@ QStringList ctkDICOMDatabasePrivate::allFilesInDatabase()

  while (allFilesQuery.next())
  {
-    allFileNames << this->absolutePathFromInternal(allFilesQuery.value(0).toString());
+    allFileNames << this->absolutePathFromInternalIfFile(allFilesQuery.value(0).toString());


Variables that can store urls should not be named as "fileName" (e.g., here "allFileNames") because it is important to make developers aware that the string may contain not just a local file path but it may be a url. Most method names fortunately just use "file" (instead of "filepath"), so they don't need to be renamed.

Suggested change

allFileNames << this->absolutePathFromInternalIfFile(allFilesQuery.value(0).toString());

allUrls << this->absoluteUrl(allFilesQuery.value(0).toString());

In general I agree with this, but since the term fileName is used in the schema I felt like keeping this consistency would lead to the most readable code. For the reasons you list in your next comment I'm not sure allUrls is correct either. Perhaps something generic like instanceLocators is more correct. We could make a more sweeping change, including the schema, if we want to generalize.

Part of the reason for this is that some protocols don't use actual URLs to identify the concept of an "DICOM instance". For example, DIMSE doesn't have URLs, but we might want to refer to an instance with something like cget://<pacs ip>/<instance uid>. Currently I have to do this for AHI since frame-level access is a REST API call with dataset and image frame IDs as parameters, so I have a synthetic "URL" string.

Libs/DICOM/Core/ctkDICOMDatabase.cpp

Reverts commontk@3b8182e Plan is to keep using fileName only for file paths and add a URL field to be used explicitly for URLs so this logic is no longer needed.

Adds a URL field to the Images schema and provides accessor methods for it.

We should be checking the return value of QSqlQuery::exec to catch problems. Filed an issue to track this: commontk#1155

Since neither of these are required fields, the NOT NULL constraint is not appropriate.

Schema now has URL column of size 2048 (which based on research is suggested as the max size for URLs). The schema now allows either field to be NULL and does not require the Filename column to be unique. The api now icnludes a urlsForSeries option to get the list of URLs for a given series instance uid. The fileValue method now accepts either filePaths or URLs with filePaths being given priority if both exist.

pieper · 2023-11-16T00:03:32Z

This should be merged by squashing so the commit message will be fixed.

jcfr · 2023-12-12T16:24:30Z

Proposed commit message for squash & merge integration:

ENH: Update DICOM Database with URL Support and Tag Cache Optimization

This generalizes the ctkDICOMDatabase code to ease hard-coded restrictions about
DICOM files always being on disk. Now the schema includes a `URL` field of the
SQLite database so certain file-specific operations are only performed on filePaths
while URL are handled independently. Entries in the `Images` table of the database
may now have either a `URL` or a `fileName` or both and new accessors are provided
to get `URL`s at the series and instance level.

Schema version is updated from version `0.7.0` to `0.8.0`.

Also this contains a fix to the `ctkDICOMTagCache` so that tags are always stored
internally as uppercase hex, so a string like "0010,000d" will always be turned
into "0010,000D" for storage in the database. Both forms can be used to query
tags. This avoids the situation where some cache entries were duplicated because
different code used either upper or lower case to refer to the same tag. Using
only upper case should improve storage use and improve performance by avoiding
unneeded cache misses.

Co-authored-by: Andras Lasso <[email protected]>

jcfr · 2023-12-12T16:43:47Z

@Punzo Do you have any suggestion related to naming and/or approach that would make rebasing of #1142 easier ?

jcfr · 2023-12-12T16:51:42Z

Libs/DICOM/Core/Resources/dicom-schema.sql

@@ -84,7 +85,7 @@ CREATE TABLE 'Series' (
  'DisplayedFieldsUpdatedTimestamp' DATETIME NULL ,
  PRIMARY KEY ('SeriesInstanceUID') );

-CREATE UNIQUE INDEX IF NOT EXISTS 'ImagesFilenameIndex' ON 'Images' ('Filename');


Should the URL be indexed as well ?

Hmmm, maybe, since that is used in the fileValue method to find the instanceUID so it can be used to look up values in the tag cache. Bypassing the tag cache is still probably better, but an index on this might have addressed some of the performance issues I was seeing.

Punzo · 2023-12-13T08:27:29Z

@Punzo Do you have any suggestion related to naming and/or approach that would make rebasing of #1142 easier ?

I have already had a videocall with Steve and we agreed on this infrastructure. Once it will be merged, then I will rebase my PR.
The idea would be that the url variable values would be like:

protocol+infrastructure://information

examples can be:
A) how I will store my "url" for DIMSE in CTK: dimse+ctk://connectionName/studyInstanceUID/seriesInstanceUID/SOPInstanceUID
b) and for Steve would be: dicomweb+gcp://IP/studyInstanceUID/seriesInstanceUID/SOPInstanceUID
etc...

@pieper please confirm if this is accurate.

Punzo · 2023-12-13T08:30:23Z

A) how I will store my "url" for DIMSE in CTK: dimse+ctk://connectionName/studyInstanceUID/seriesInstanceUID/SOPInstanceUID
b) and for Steve would be: dicomweb+gcp://IP/studyInstanceUID/seriesInstanceUID/SOPInstanceUID

maybe we should add this in somewhere as docs, maybe in the description of https://github.com/commontk/CTK/pull/1154/files#diff-47b3a2aabeea5b49d2da30b70b2dfc96ec78db4fcae06d5cfea45bd0bbe07e67R194 ?

@jcfr

Based on suggestion from @jcfr commontk#1154 (comment)

pieper · 2023-12-13T15:23:03Z

Thanks @Punzo for doublechecking.

I agree on the URL scheme description but since we don't rely on it in the CTK code at this point it's more of an application level thing to define. Yesterday @jcfr , @lassoan , and I talked about moving what I have now in the DICOMLogic repo into the a pure-python package under CTK organization (with a final name TBD) and using that as a place to define higher level usage. We plan to hash out details at Project Week 40. I'll start a project page for this.

lassoan · 2023-12-13T15:41:27Z

maybe we should add this in somewhere as docs, maybe in the description of

We can amend the documentation in your pull request (that will be the first use of this new database field in CTK). We could describe that dimse+ctk protocol is reserved for the CTK visual DICOM browser, while any other protocols are free to use. When other protocols emerge then we can add them to the documentation as well, to reduce the chance of unintended name clashes.

lassoan · 2023-12-13T15:42:59Z

@pieper I think it is ready for integration, just need to squash the commits and update the commit message as @jcfr suggested. Maybe also add @Punzo as coauthor.

As discussed here: commontk/CTK#1154

The `ctkDICOMDatabase` update allows handling DICOM files with both file paths and URLs. The SQLite database schema now includes a `URL` field. Entries in the `Images` table can have a `URL`, a `fileName`, or both. The schema version is updated from `0.7.0` to `0.8.0`. The `ctkDICOMTagCache` is also improved to store tags as uppercase hex, ensuring consistency and optimizing storage and performance. For more details, refer to commontk/CTK#1154 List of changes: $ git shortlog f53820a4e..841877133 --no-merges Steve Pieper (1): ENH: Update DICOM Database with URL Support and Tag Cache Optimization

pieper added 3 commits October 25, 2023 10:59

pieper marked this pull request as draft November 9, 2023 00:14

pieper mentioned this pull request Nov 9, 2023

EN: Virtualize DICOM database Slicer/Slicer#7365

Draft

lassoan reviewed Nov 9, 2023

View reviewed changes

pieper added 5 commits November 13, 2023 16:33

ENH: use fileName in dicom db only for files

ff515f9

Reverts commontk@3b8182e Plan is to keep using fileName only for file paths and add a URL field to be used explicitly for URLs so this logic is no longer needed.

ENH: Updates dicom db schema to support urls

5200de9

Adds a URL field to the Images schema and provides accessor methods for it.

BUG: Fix dicom db insert with URL field

8f9f14a

We should be checking the return value of QSqlQuery::exec to catch problems. Filed an issue to track this: commontk#1155

ENH: Allow file or url to be null in dicom-schema.sql

0c47c6a

Since neither of these are required fields, the NOT NULL constraint is not appropriate.

pieper marked this pull request as ready for review November 16, 2023 00:01

Merge branch 'master' into virtualize-database

2bfc42a

jcfr reviewed Dec 12, 2023

View reviewed changes

ENH: Add index for URL to dicom database schema

df63274

Based on suggestion from @jcfr commontk#1154 (comment)

pieper merged commit 8418771 into commontk:master Dec 13, 2023
1 of 2 checks passed

pieper deleted the virtualize-database branch December 13, 2023 15:57

pieper added a commit to Slicer/Slicer that referenced this pull request Dec 13, 2023

ENH: Update CTK DICOM Database version

5455fee

As discussed here: commontk/CTK#1154

pieper mentioned this pull request Dec 13, 2023

ENH: Update CTK DICOM Database version from 0.7.0 to 0.8.0 Slicer/Slicer#7472

Merged

Punzo mentioned this pull request May 10, 2024

DICOM tag string handling issues #1204

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Virtualize DICOM database #1154

ENH: Virtualize DICOM database #1154

pieper commented Nov 9, 2023 •

edited

Loading

lassoan left a comment

lassoan Nov 9, 2023

pieper Nov 9, 2023

pieper commented Nov 16, 2023

jcfr commented Dec 12, 2023 •

edited

Loading

jcfr commented Dec 12, 2023

jcfr Dec 12, 2023

pieper Dec 12, 2023

Punzo commented Dec 13, 2023 •

edited

Loading

Punzo commented Dec 13, 2023

pieper commented Dec 13, 2023

lassoan commented Dec 13, 2023

lassoan commented Dec 13, 2023

	allFileNames << this->absolutePathFromInternalIfFile(allFilesQuery.value(0).toString());
	allUrls << this->absoluteUrl(allFilesQuery.value(0).toString());

ENH: Virtualize DICOM database #1154

ENH: Virtualize DICOM database #1154

Conversation

pieper commented Nov 9, 2023 • edited Loading

lassoan left a comment

Choose a reason for hiding this comment

lassoan Nov 9, 2023

Choose a reason for hiding this comment

pieper Nov 9, 2023

Choose a reason for hiding this comment

pieper commented Nov 16, 2023

jcfr commented Dec 12, 2023 • edited Loading

jcfr commented Dec 12, 2023

jcfr Dec 12, 2023

Choose a reason for hiding this comment

pieper Dec 12, 2023

Choose a reason for hiding this comment

Punzo commented Dec 13, 2023 • edited Loading

Punzo commented Dec 13, 2023

pieper commented Dec 13, 2023

lassoan commented Dec 13, 2023

lassoan commented Dec 13, 2023

pieper commented Nov 9, 2023 •

edited

Loading

jcfr commented Dec 12, 2023 •

edited

Loading

Punzo commented Dec 13, 2023 •

edited

Loading