Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for dimension labels #259

Merged
merged 13 commits into from
Oct 9, 2023
Merged

Conversation

shaunrd0
Copy link
Contributor

@shaunrd0 shaunrd0 commented May 1, 2023

This adds support for dimension labels. I needed this support to continue working on SC-25205 (See blocking SC-28383 for more detail).

I'm generally new to Go, so please point out any weirdness and I will address asap. Specifically looking for help or advice on this PR in the following areas:

  • The changes to query.go depend on new dimension label APIs, so builds without the experimental tag will fail. I'll do some digging for ideas online, but curious if there's already a preferred way to prevent this.
  • There doesn't seem to be a Subarray object in TileDB-Go so I could not implement dimension label APIs that operate on Subarray. Should we add one in this PR or create a story for follow up?
  • I struggled to implement tiledb_array_schema_set_dimension_label_tile_extent. I'll take another crack at it, but decided to move on for today so I could get working on adding tests for the other new APIs here.

@shaunrd0 shaunrd0 requested review from anastasop and snagles May 1, 2023 19:42
@shortcut-integration
Copy link

@shaunrd0 shaunrd0 force-pushed the smr/sc-25205/support-dimlabels branch from 48ae1fd to 0d661fe Compare May 24, 2023 17:54
@shaunrd0 shaunrd0 force-pushed the smr/sc-25205/support-dimlabels branch from 0d661fe to 3f6857f Compare July 21, 2023 13:03
@shaunrd0 shaunrd0 force-pushed the smr/sc-25205/support-dimlabels branch from 1849736 to ce55d2d Compare August 1, 2023 20:43
@anastasop
Copy link
Contributor

Rebased to latest master to get TileDB 2.17 and the Subarray APIs

// AttributeName returns the name of the attribute the label data is stored under.
func (d *DimensionLabel) AttributeName() (string, error) {
var labelAttrName *C.char
ret := C.tiledb_dimension_label_get_label_attr_name(d.context.tiledbContext, d.tiledbDimensionLabel, &labelAttrName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who owns the memory pointed to by labelAttrName after this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory is owned by core but GoString at return makes a safe copy in the go space

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment saying this? Just so that future readers know why this is safe. (Similarly in other cases where we get a reference to C-owned memory that we copy out of.)

func (a *ArraySchema) DimensionLabelsNum() (uint64, error) {
var labelNum uint64

ret := C.tiledb_array_schema_get_dimension_label_num(a.context.tiledbContext, a.tiledbArraySchema, (*C.uint64_t)(unsafe.Pointer(&labelNum)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here would recommend doing the same thing as you've done in other places with

var labelNum C.uint64_t

// ...

return uint64(labelNum)

which I believe should avoid the need for an unsafe.Pointer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines 212 to 214
switch dimType {
case TILEDB_INT8:
tmpExtent := extent.(int8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few things going on here which will be annoying to deal with:

  • Since Go is garbage-collected, memory becomes eligible for freeing (and thus should be considered as already freed) after the last time they are referred to.
  • unsafe.Pointer intentionally is opaque to the GC, so something referred to by an unsafe.Pointer does not count as actually there.

This means that in this case:

  1. tmpExtent gets space allocated (on the stack, in this case), and the value of extent is copied into it.
  2. We create an unsafe.Pointer to tmpExtent.
  3. tmpExtent is no longer referred to, thus making it eligible for collection.
  4. We pass that pointer out to TileDB.

(This might be something we do elsewhere in this package; if so we need to fix it there too.)

In this case we need to ensure tmpExtent stays around. So I think we need to do something like:

var extentPtr any
defer runtime.KeepAlive(extentPtr)

// Here we can also reduce the amount of repetition by using a type-switch.
switch extent := extent.(type) {
case int8:
  // This ensures that `extent` stays alive.
  extentPtr = &extent
  // and since cExtent is a pointer to the same thing as extentPtr, it's OK.
  cExtent := unsafe.Pointer(&extent)
// ...
}

(This is annoying and fiddly.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks! Similar pattern was also in dimension.go, I made this change there too.

return nil
}

// getDimensionLabelDataType Retrieve a dimension label Datatype from the schema using experimental APIs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*retrieves a dimension label

enums.go Outdated
}

// FromString Converts from a string to the equivalent DataOrder enum
func (d *DataOrder) FromString(name string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an odd API:

var myDO DataOrder
myDO.FromString("someStr")

I would expect something like:

order, err := tiledb.DataOrderFromString(someStr)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Made it a package method

// AttributeName returns the name of the attribute the label data is stored under.
func (d *DimensionLabel) AttributeName() (string, error) {
var labelAttrName *C.char
ret := C.tiledb_dimension_label_get_label_attr_name(d.context.tiledbContext, d.tiledbDimensionLabel, &labelAttrName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment saying this? Just so that future readers know why this is safe. (Similarly in other cases where we get a reference to C-owned memory that we copy out of.)

@@ -11,6 +11,11 @@ func checkError(err error) {
}
}

func checkedValue[T any](v T, err error) T {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a docstring here? a one-liner should suffice.

Comment on lines +216 to +248
case int8:
extentPtr = &tmpExtent
cExtent = unsafe.Pointer(&tmpExtent)
case int16:
extentPtr = &tmpExtent
cExtent = unsafe.Pointer(&tmpExtent)
case int32:
extentPtr = &tmpExtent
cExtent = unsafe.Pointer(&tmpExtent)
case int64:
extentPtr = &tmpExtent
cExtent = unsafe.Pointer(&tmpExtent)
case uint8:
extentPtr = &tmpExtent
cExtent = unsafe.Pointer(&tmpExtent)
case uint16:
extentPtr = &tmpExtent
cExtent = unsafe.Pointer(&tmpExtent)
case uint32:
extentPtr = &tmpExtent
cExtent = unsafe.Pointer(&tmpExtent)
case uint64:
extentPtr = &tmpExtent
cExtent = unsafe.Pointer(&tmpExtent)
case float32:
extentPtr = &tmpExtent
cExtent = unsafe.Pointer(&tmpExtent)
case float64:
extentPtr = &tmpExtent
cExtent = unsafe.Pointer(&tmpExtent)
case bool:
extentPtr = &tmpExtent
cExtent = unsafe.Pointer(&tmpExtent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i love go

@anastasop anastasop merged commit 1b1b761 into master Oct 9, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants