GODRIVER-2388 Improved Bulk Write API. #1884

Open
wants to merge 3 commits into
base: master

qingyang-hu
Collaborator

GODRIVER-2388
GODRIVER-3348
GODRIVER-3349
GODRIVER-3364

Summary

Improved Bulk Write API.

Background & Motivation

Refactor (Operation).createWireMessage() to support bulk write batching.

@mongodb-drivers-pr-bot mongodb-drivers-pr-bot bot added the priority-3-low Low Priority PR for Review label Nov 5, 2024
Contributor

mongodb-drivers-pr-bot bot commented Nov 5, 2024

API Change Report

./v2/mongo

compatible changes

(*Client).BulkWrite: added
ClientBulkWriteDeleteResult: added
ClientBulkWriteException: added
ClientBulkWriteInsertResult: added
ClientBulkWriteResult: added
ClientBulkWriteUpdateResult: added
ClientDeleteManyModel: added
ClientDeleteOneModel: added
ClientInsertOneModel: added
ClientReplaceOneModel: added
ClientUpdateManyModel: added
ClientUpdateOneModel: added
ClientWriteModels: added
NewClientDeleteManyModel: added
NewClientDeleteOneModel: added
NewClientInsertOneModel: added
NewClientReplaceOneModel: added
NewClientUpdateManyModel: added
NewClientUpdateOneModel: added

./v2/mongo/options

incompatible changes

(*DistinctOptionsBuilder).SetHint: removed
DistinctOptions.Hint: removed

compatible changes

ClientBulkWrite: added
ClientBulkWriteOptions: added
ClientBulkWriteOptionsBuilder: added

./v2/x/mongo/driver

incompatible changes

(*Batches).AdvanceBatch: removed
(*Batches).ClearBatch: removed
(*Batches).Valid: removed
Batches.Current: removed
NewCursorResponse: changed from func(ResponseInfo) (CursorResponse, error) to func(./v2/x/bsonx/bsoncore.Document, ResponseInfo) (CursorResponse, error)
Operation.Batches: changed from *Batches to interface{AdvanceBatches(n int); AppendBatchArray(dst []byte, maxCount int, maxDocSize int, totalSize int) (int, []byte, error); AppendBatchSequence(dst []byte, maxCount int, maxDocSize int, totalSize int) (int, []byte, error); IsOrdered() *bool; Size() int}
Operation.ProcessResponseFn: changed from func(ResponseInfo) error to func(context.Context, ./v2/x/bsonx/bsoncore.Document, ResponseInfo) error
ResponseInfo.ServerResponse: removed

compatible changes

(*Batches).AdvanceBatches: added
(*Batches).AppendBatchArray: added
(*Batches).AppendBatchSequence: added
(*Batches).IsOrdered: added
(*Batches).Size: added
ExtractCursorDocument: added
ResponseInfo.Error: added

./v2/x/mongo/driver/operation

incompatible changes

(*Distinct).Hint: removed

./v2/x/mongo/driver/session

incompatible changes

Client.RetryRead: removed
Client.RetryWrite: removed

./v2/x/mongo/driver/wiremessage

compatible changes

DocumentSequenceToArray: added

@@ -398,7 +398,7 @@ func TestClientSideEncryptionCustomCrypt(t *testing.T) {
"expected 0 calls to DecryptExplicit, got %v", cc.numDecryptExplicitCalls)
assert.Equal(mt, cc.numCloseCalls, 0,
"expected 0 calls to Close, got %v", cc.numCloseCalls)
-assert.Equal(mt, cc.numBypassAutoEncryptionCalls, 2,
+assert.Equal(mt, cc.numBypassAutoEncryptionCalls, 1,
Collaborator Author

We only call it once after the operation.go refactoring.

// A top-level error that occurred when attempting to communicate with the server
// or execute the bulk write. This value may not be populated if the exception was
// thrown due to errors occurring on individual writes.
TopLevelError *WriteError
Collaborator Author

We cannot use Error as the field name because it would conflict with the Error() method that implements the error interface.
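
To illustrate the conflict: Go does not allow a type to have a field and a method with the same name, so a struct that implements the error interface cannot also carry a field called Error. The following is a minimal sketch; writeError and bulkWriteException are hypothetical stand-ins, not the driver's real definitions.

```go
package main

import "fmt"

// writeError is a hypothetical stand-in for the driver's WriteError type.
type writeError struct {
	Code    int
	Message string
}

// bulkWriteException is a hypothetical stand-in for ClientBulkWriteException.
// Naming the field "Error" would not compile, because the type also declares
// an Error() method ("field and method with the same name Error").
type bulkWriteException struct {
	TopLevelError *writeError
}

// Error implements the error interface.
func (e bulkWriteException) Error() string {
	if e.TopLevelError == nil {
		return "bulk write exception"
	}
	return fmt.Sprintf("bulk write exception: (%d) %s", e.TopLevelError.Code, e.TopLevelError.Message)
}

func main() {
	var err error = bulkWriteException{
		TopLevelError: &writeError{Code: 11000, Message: "duplicate key"},
	}
	fmt.Println(err)
}
```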

@qingyang-hu qingyang-hu marked this pull request as ready for review November 5, 2024 20:18
@qingyang-hu qingyang-hu added priority-2-medium Medium Priority PR for Review and removed priority-3-low Low Priority PR for Review labels Nov 5, 2024
}

// AppendInsertOne appends ClientInsertOneModels.
func (m *ClientWriteModels) AppendInsertOne(database, collection string, models ...*ClientInsertOneModel) *ClientWriteModels {
Collaborator

Suggest abstracting the Append* methods:

type clientBulkWriteModel interface {
	ClientInsertOneModel
}

// appendModels is a helper function to append models to ClientWriteModels.
func appendModels[T clientBulkWriteModel](m *ClientWriteModels, database, collection string, models []*T) *ClientWriteModels {
	if m == nil {
		m = &ClientWriteModels{}
	}
	for _, model := range models {
		m.models = append(m.models, clientWriteModel{
			namespace: fmt.Sprintf("%s.%s", database, collection),
			model:     model,
		})
	}
	return m
}

}
type clientWriteModel struct {
namespace string
model interface{}
Collaborator

Should we add stronger type constraints to this?

type clientBulkWriteModel interface {
	ClientInsertOneModel // etc.
}

type clientWriteModel struct {
	namespace string
	model     clientBulkWriteModel
}

Collaborator Author

I don't think we need an additional abstraction for an un-exported struct.
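
For context on the trade-off: an interface that embeds a concrete type, as in the suggestion above, can only be used as a generic type constraint, not as the type of a struct field. If a stronger type were wanted on the field, one option is a "sealed" interface with an unexported marker method. The sketch below uses hypothetical, simplified types and is not taken from the PR.

```go
package main

import "fmt"

// insertOneModel and deleteOneModel are hypothetical, simplified stand-ins
// for the driver's client bulk write models.
type insertOneModel struct{ Document any }
type deleteOneModel struct{ Filter any }

// bulkWriteModel is a sealed interface: only types in this package can
// implement it, because the marker method is unexported.
type bulkWriteModel interface {
	isBulkWriteModel()
}

func (*insertOneModel) isBulkWriteModel() {}
func (*deleteOneModel) isBulkWriteModel() {}

// writeModel pairs a namespace with a model. The field type rejects
// arbitrary values at compile time, unlike interface{}.
type writeModel struct {
	namespace string
	model     bulkWriteModel
}

// describe reports the operation kind for a stored model.
func describe(m writeModel) string {
	switch m.model.(type) {
	case *insertOneModel:
		return "insert on " + m.namespace
	case *deleteOneModel:
		return "delete on " + m.namespace
	default:
		return "unknown on " + m.namespace
	}
}

func main() {
	m := writeModel{namespace: "db.coll", model: &insertOneModel{Document: map[string]any{"x": 1}}}
	fmt.Println(describe(m))
}
```

Whether the extra interface is worth it for an unexported struct is the judgment call discussed in this thread.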

}

// Error implements the error interface.
func (bwe ClientBulkWriteException) Error() string {
Collaborator

This function doesn't return an error if the write is unacknowledged. The specifications require that users be able to discern whether a BulkWriteResult contains acknowledged results. Either return an error indicating an unacknowledged result, or update ClientBulkWriteResult in the spirit of GODRIVER-2821.

Collaborator

You can test this with the following:

package main

import (
	"context"

	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
	"go.mongodb.org/mongo-driver/v2/mongo/writeconcern"
)

func main() {
	client, err := mongo.Connect()
	if err != nil {
		panic(err)
	}

	defer func() { _ = client.Disconnect(context.Background()) }()

	pairs := &mongo.ClientWriteModels{}

	insertOneModel := mongo.NewClientInsertOneModel().SetDocument(bson.D{{"x", 1}})

	opts := options.ClientBulkWrite().SetWriteConcern(writeconcern.Unacknowledged()).SetOrdered(false)

	pairs = pairs.AppendInsertOne("db", "k", insertOneModel)
	_, err = client.BulkWrite(context.Background(), pairs, opts) // Should not panic
	if err != nil {
		panic(err)
	}
}

Collaborator

Is there a unified spec test that covers this case? If not, we should add one or add an integration test.

Collaborator Author

Comment on lines +300 to +302
if filter == nil {
return nil, fmt.Errorf("%w: filter is required", err)
}
Collaborator Author

Update the error message returned when the filter is not provided.

Comment on lines +435 to +437
if doc.filter == nil {
return nil, fmt.Errorf("%w: filter is required", err)
}
Collaborator Author

Update the error message returned when the filter is not provided.

@qingyang-hu qingyang-hu force-pushed the godriver2388v2 branch 4 times, most recently from 4bc724e to dbd44c9 Compare November 15, 2024 23:14
Collaborator

@prestonvasquez prestonvasquez left a comment

@qingyang-hu There are still outstanding issues from the previous review.

@@ -13,6 +13,55 @@ import (
"go.mongodb.org/mongo-driver/v2/x/mongo/driver/operation"
)

// ClientBulkWriteResult is the result type returned by a client-level BulkWrite operation.
type ClientBulkWriteResult struct {
Collaborator

The specifications say that "Users MUST be able to discern whether a [result] contains verbose results without inspecting the value provided for verboseResults in [options]". Does this mean we should add a boolean field, HasVerboseResults, to ClientBulkWriteResult?

Collaborator Author

@qingyang-hu qingyang-hu Nov 18, 2024

My initial thought was to leave the result maps nil when verboseResults is false. However, I think you are right that an additional HasVerboseResults field is more explicit.

Collaborator

I see, either solution sounds good to me.
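
A rough sketch of the explicit-flag option discussed above. All type and field names other than HasVerboseResults are hypothetical and trimmed down for illustration; they are not necessarily what the PR defines.

```go
package main

import "fmt"

// insertResult is a hypothetical stand-in for a per-operation result.
type insertResult struct{ InsertedID any }

// bulkWriteResult is a hypothetical, trimmed-down result type.
type bulkWriteResult struct {
	InsertedCount     int64
	InsertResults     map[int]insertResult // keyed by operation index
	HasVerboseResults bool
}

// newResult populates the per-operation map only when verbose results were
// requested, and records that fact explicitly on the result.
func newResult(ids []any, verbose bool) bulkWriteResult {
	res := bulkWriteResult{InsertedCount: int64(len(ids)), HasVerboseResults: verbose}
	if verbose {
		res.InsertResults = make(map[int]insertResult, len(ids))
		for i, id := range ids {
			res.InsertResults[i] = insertResult{InsertedID: id}
		}
	}
	return res
}

func main() {
	res := newResult([]any{"a", "b"}, false)
	// With an explicit flag, callers do not have to infer verbosity from
	// whether the map happens to be nil.
	fmt.Println(res.HasVerboseResults, res.InsertResults == nil)
}
```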

},
}
var n int
n, _, err = batches.AppendBatchSequence(nil, 4, 16_000, 16_000)
Collaborator

What is the significance of 16_000 here?

Collaborator Author

It is just a number large enough that the size limits never cut off a document, so only maxCount limits the output. I will add a comment there.

var idx int32
dst = wiremessage.AppendMsgSectionType(dst, wiremessage.DocumentSequence)
idx, dst = bsoncore.ReserveLength(dst)
dst = append(dst, identifier...)
Collaborator

From the specifications:

The first entry in each document has the name of the operation as its key and the index in the nsInfo array of the namespace on which the operation should be performed as its value.

When I do command monitoring for client bulk write with multiple pairs, I get the following:

2024/11/18 14:39:47 started: &{Command:{"bulkWrite": {"$numberInt":"1"},"errorsOnly": false,"ordered": true,"lsid": {"id": {"$binary":{"base64":"XTDtLVGhTx6MEIcFDhf0qw==","subType":"04"}}},"txnNumber": {"$numberLong":"1"},"$clusterTime": {"clusterTime": {"$timestamp":{"t":1731965987,"i":1}},"signature": {"hash": {"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"00"}},"keyId": {"$numberLong":"0"}}},"$db": "admin","ops": [{"insert": {"$numberInt":"0"},"document": {"_id": {"$oid":"673bb423a86efe4126c4c585"},"x": {"$numberInt":"1"}}},{"insert": {"$numberInt":"0"},"document": {"_id": {"$oid":"673bb423a86efe4126c4c586"},"x": {"$numberInt":"2"}}}],"nsInfo": [{"ns": "db.coll"}]} DatabaseName:admin CommandName:bulkWrite RequestID:1 ConnectionID:localhost:27017[-4] ServerConnectionID:0x14000390250 ServiceID:<nil>}

Where the index value for each document in the sequence is {"$numberInt":"0"}. Shouldn't this be {"$numberInt":"0"}, then {"$numberInt":"1"}, etc?

Collaborator

The same thing occurs with the other client bulk write operations.

Collaborator Author

@qingyang-hu qingyang-hu Nov 18, 2024

Did you mean the Int32 value of the operation name such as "insert", "update", or "delete"?

...
"ops":[
    {
        "insert":{
            "$numberInt":"0"
        },
...

It is "the index in the nsInfo array of the namespace on which the operation should be performed as its value".

The specs also require:

When constructing the nsInfo array for a bulkWrite batch, drivers MUST only include the namespaces that are referenced in the ops array for that batch.

and:

Drivers MUST NOT include duplicate namespaces in this list.

Therefore, if both operations target the same namespace, the nsInfo array should contain only one item, and both operations carry index 0, pointing to "db.coll".
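
The two spec requirements quoted above can be sketched as follows. The op type and buildNsInfo function are hypothetical and exist only to show how namespaces are deduplicated and how each operation ends up pointing at an index in nsInfo; they are not the driver's code.

```go
package main

import "fmt"

// op is a hypothetical, simplified bulk write operation.
type op struct {
	Name      string // e.g. "insert", "update", "delete"
	Namespace string // "<database>.<collection>"
}

// buildNsInfo returns the deduplicated namespace list for a batch and, for
// each operation, the index of its namespace in that list.
func buildNsInfo(ops []op) (nsInfo []string, indexes []int) {
	seen := make(map[string]int)
	for _, o := range ops {
		idx, ok := seen[o.Namespace]
		if !ok {
			// First time this namespace appears in the batch.
			idx = len(nsInfo)
			seen[o.Namespace] = idx
			nsInfo = append(nsInfo, o.Namespace)
		}
		indexes = append(indexes, idx)
	}
	return nsInfo, indexes
}

func main() {
	nsInfo, indexes := buildNsInfo([]op{
		{Name: "insert", Namespace: "db.coll"},
		{Name: "insert", Namespace: "db.coll"},
		{Name: "delete", Namespace: "db.other"},
	})
	fmt.Println(nsInfo, indexes) // [db.coll db.other] [0 0 1]
}
```

With two inserts into db.coll, as in the command monitoring output earlier in this thread, both operations get index 0 and nsInfo has a single entry.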

Collaborator

I see, thanks for the explanation!
