-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Virtualize DICOM database #1154
Conversation
Allow the ctkDICOMDatabase to hold URLs for retrieving instances in addition to traditional filenames (really filepaths). Ensure that any code that tries to calculate absolute file paths from relative paths does not mangle the URLs. Do this by detecting the URL sceme (e.g. http in http://) and ignore these when calculating abolute paths.
Previously, the TagCache database would store whatever hex string was passed in, so depending on how the tags were specified duplicate entries could end up being cached, such as "0010,000d" and '0010,000D". This wasted space, but also caused inefficiencies. Although usage varies, the DICOM standard uses upper case hex both in the body of the standard text, and when defining the DICOM JSON and XML models. (https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_f.2.html) This change always converts tags to upper case hex when storing and query of the tag cache. Users of the library can still pass lower case hex tags to methods, but they will be converted to upper case internally, so most use cases will be unchanged. The only externally visible method where the behavior will change is: `Q_INVOKABLE void getCachedTags(const QString sopInstanceUID, QMap<QString, QString> &cachedTags);` Previously tags could have been of mixed case depending on how they were specified for caching. Now they will be all upper case hex. Currently, the only known use of `getCachedTags` is internnally to `ctkDICOMDatabase` so this change of behavior is unlikely to cause problems in practice. Note also that while the DICOM standard models for JSON and XML omit the comma between group and element, but `ctkDICOMTagCache` implementation retains the comma in the API and the database.
Previously, if some data imported into the `ctkDICOMDatabase` had incomplete information then the display field updates would abort. With this change, a more informative message is generated to support troubleshooting of any failed updates, but the rest of the process completes so that any valid display field updates can be shown to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changes look good, we should just clarify naming and what exactly we store in the database (url or url/localfile).
Libs/DICOM/Core/ctkDICOMDatabase.cpp
Outdated
@@ -206,7 +206,7 @@ QStringList ctkDICOMDatabasePrivate::allFilesInDatabase() | |||
|
|||
while (allFilesQuery.next()) | |||
{ | |||
allFileNames << this->absolutePathFromInternal(allFilesQuery.value(0).toString()); | |||
allFileNames << this->absolutePathFromInternalIfFile(allFilesQuery.value(0).toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variables that can store urls should not be named as "fileName" (e.g., here "allFileNames") because it is important to make developers aware that the string may contain not just a local file path but it may be a url. Most method names fortunately just use "file" (instead of "filepath"), so they don't need to be renamed.
allFileNames << this->absolutePathFromInternalIfFile(allFilesQuery.value(0).toString()); | |
allUrls << this->absoluteUrl(allFilesQuery.value(0).toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I agree with this, but since the term fileName
is used in the schema I felt like keeping this consistency would lead to the most readable code. For the reasons you list in your next comment I'm not sure allUrls
is correct either. Perhaps something generic like instanceLocators
is more correct. We could make a more sweeping change, including the schema, if we want to generalize.
Part of the reason for this is that some protocols don't use actual URLs to identify the concept of an "DICOM instance". For example, DIMSE doesn't have URLs, but we might want to refer to an instance with something like cget://<pacs ip>/<instance uid>
. Currently I have to do this for AHI since frame-level access is a REST API call with dataset and image frame IDs as parameters, so I have a synthetic "URL" string.
Reverts commontk@3b8182e Plan is to keep using fileName only for file paths and add a URL field to be used explicitly for URLs so this logic is no longer needed.
Adds a URL field to the Images schema and provides accessor methods for it.
We should be checking the return value of QSqlQuery::exec to catch problems. Filed an issue to track this: commontk#1155
Since neither of these are required fields, the NOT NULL constraint is not appropriate.
Schema now has URL column of size 2048 (which based on research is suggested as the max size for URLs). The schema now allows either field to be NULL and does not require the Filename column to be unique. The api now icnludes a urlsForSeries option to get the list of URLs for a given series instance uid. The fileValue method now accepts either filePaths or URLs with filePaths being given priority if both exist.
This should be merged by squashing so the commit message will be fixed. |
Proposed commit message for
|
@@ -84,7 +85,7 @@ CREATE TABLE 'Series' ( | |||
'DisplayedFieldsUpdatedTimestamp' DATETIME NULL , | |||
PRIMARY KEY ('SeriesInstanceUID') ); | |||
|
|||
CREATE UNIQUE INDEX IF NOT EXISTS 'ImagesFilenameIndex' ON 'Images' ('Filename'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the URL be indexed as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, maybe, since that is used in the fileValue
method to find the instanceUID
so it can be used to look up values in the tag cache. Bypassing the tag cache is still probably better, but an index on this might have addressed some of the performance issues I was seeing.
I have already had a videocall with Steve and we agreed on this infrastructure. Once it will be merged, then I will rebase my PR.
examples can be: @pieper please confirm if this is accurate. |
maybe we should add this in somewhere as docs, maybe in the description of https://github.com/commontk/CTK/pull/1154/files#diff-47b3a2aabeea5b49d2da30b70b2dfc96ec78db4fcae06d5cfea45bd0bbe07e67R194 ? |
Based on suggestion from @jcfr commontk#1154 (comment)
Thanks @Punzo for doublechecking. I agree on the URL scheme description but since we don't rely on it in the CTK code at this point it's more of an application level thing to define. Yesterday @jcfr , @lassoan , and I talked about moving what I have now in the DICOMLogic repo into the a pure-python package under CTK organization (with a final name TBD) and using that as a place to define higher level usage. We plan to hash out details at Project Week 40. I'll start a project page for this. |
We can amend the documentation in your pull request (that will be the first use of this new database field in CTK). We could describe that |
As discussed here: commontk/CTK#1154
The `ctkDICOMDatabase` update allows handling DICOM files with both file paths and URLs. The SQLite database schema now includes a `URL` field. Entries in the `Images` table can have a `URL`, a `fileName`, or both. The schema version is updated from `0.7.0` to `0.8.0`. The `ctkDICOMTagCache` is also improved to store tags as uppercase hex, ensuring consistency and optimizing storage and performance. For more details, refer to commontk/CTK#1154 List of changes: $ git shortlog f53820a4e..841877133 --no-merges Steve Pieper (1): ENH: Update DICOM Database with URL Support and Tag Cache Optimization
This generalizes the ctkDICOMDatabase code to ease hard-coded restrictions about DICOM files always being on disk. Now the schema includes a
URL
field of the SQLite database so certain file-specific operations are only performed on filePaths while URL are handled independently. Entries in theImages
table of the database may now have either aURL
or afileName
or both and new accessors are provided to getURL
s at the series and instance level.Also this contains a fix to the ctkDICOMTagCache so that tags are always stored internally as uppercase hex, so a string like "0010,000d" will always be turned into "0010,000D" for storage in the database. Both forms can be used to query tags. This avoids the situation where some cache entries were duplicated because different code used either upper or lower case to refer to the same tag. Using only upper case should improve storage use and improve performance by avoiding unneeded cache misses.