
duplicate key error (E11000) #218

Open
gbrd opened this issue Feb 1, 2019 · 14 comments

Comments
@gbrd
Contributor

gbrd commented Feb 1, 2019

[Edited] The problem does not seem to be linked to the Scala driver.

I have "duplicate key error (E11000)" errors :

com.mongodb.MongoBulkWriteException: Bulk write operation error on server XXX:27017. Write errors: [BulkWriteError{index=0, code=11000, message='E11000 duplicate key error collection: akka-persistence.akka_persistence_journal_-foo-bar index: akka_persistence_journal_index dup key: { : "-foo-bar_toto_titi", : 55, : 55 }', details={ }}]

(built with OpenJDK 8, running on OpenJDK 11)
version 2.2.2
("com.github.scullxbones" %% "akka-persistence-mongo-scala" % "2.2.2")

I have not taken the time to investigate yet.

I guess the stack trace will not help, but just in case:

. 
	at com.mongodb.connection.BulkWriteBatchCombiner.getError(BulkWriteBatchCombiner.java:176)
	at com.mongodb.operation.BulkWriteBatch.getError(BulkWriteBatch.java:235)
	at com.mongodb.operation.MixedBulkWriteOperation.addBatchResult(MixedBulkWriteOperation.java:483)
	at com.mongodb.operation.MixedBulkWriteOperation.access$1500(MixedBulkWriteOperation.java:67)
	at com.mongodb.operation.MixedBulkWriteOperation$6.onResult(MixedBulkWriteOperation.java:468)
	at com.mongodb.operation.MixedBulkWriteOperation$6.onResult(MixedBulkWriteOperation.java:443)
	at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:49)
	at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor$2.onResult(DefaultServer.java:227)
	at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:49)
	at com.mongodb.internal.connection.CommandProtocolImpl$1.onResult(CommandProtocolImpl.java:85)
	at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection$1.onResult(DefaultConnectionPool.java:461)
	at com.mongodb.internal.connection.UsageTrackingInternalConnection$2.onResult(UsageTrackingInternalConnection.java:111)
	at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:49)
	at com.mongodb.internal.connection.InternalStreamConnection$2$1.onResult(InternalStreamConnection.java:379)
	at com.mongodb.internal.connection.InternalStreamConnection$2$1.onResult(InternalStreamConnection.java:356)
	at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback$MessageCallback.onResult(InternalStreamConnection.java:651)
	at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback$MessageCallback.onResult(InternalStreamConnection.java:618)
	at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:494)
	at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:491)
	at com.mongodb.internal.connection.AsynchronousSocketChannelStream$BasicCompletionHandler.completed(AsynchronousSocketChannelStream.java:237)
	at com.mongodb.internal.connection.AsynchronousSocketChannelStream$BasicCompletionHandler.completed(AsynchronousSocketChannelStream.java:220)
	at java.base/sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:127)
	at java.base/sun.nio.ch.Invoker.invokeDirect(Invoker.java:158)
	at java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.implRead(UnixAsynchronousSocketChannelImpl.java:560)
	at java.base/sun.nio.ch.AsynchronousSocketChannelImpl.read(AsynchronousSocketChannelImpl.java:277)
	at java.base/sun.nio.ch.AsynchronousSocketChannelImpl.read(AsynchronousSocketChannelImpl.java:298)
	at com.mongodb.internal.connection.AsynchronousSocketChannelStream.readAsync(AsynchronousSocketChannelStream.java:132)
	at com.mongodb.internal.connection.InternalStreamConnection.readAsync(InternalStreamConnection.java:491)
	at com.mongodb.internal.connection.InternalStreamConnection.access$1000(InternalStreamConnection.java:74)
	at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback.onResult(InternalStreamConnection.java:608)
	at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback.onResult(InternalStreamConnection.java:593)
	at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:494)
	at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:491)
	at com.mongodb.internal.connection.AsynchronousSocketChannelStream$BasicCompletionHandler.completed(AsynchronousSocketChannelStream.java:237)
	at com.mongodb.internal.connection.AsynchronousSocketChannelStream$BasicCompletionHandler.completed(AsynchronousSocketChannelStream.java:220)
	at java.base/sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:127)
	at java.base/sun.nio.ch.Invoker$2.run(Invoker.java:219)
	at java.base/sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecuto
@gbrd gbrd changed the title duplicate key error (E11000) with scala mongo driver duplicate key error (E11000) Feb 7, 2019
@gbrd
Contributor Author

gbrd commented Feb 7, 2019

Hypothesis (based on real observations):

  • Database load is high.
  • On a persistent actor, one "persist" call times out ...
  • ... but the write command remains queued somewhere (in a mongo buffer, in an incoming network card buffer, ...).
  • The persistent actor stops itself (because of the persist failure).
  • It is restarted (by the application).
  • It replays its journal but does not see the last persist, which is still "in the queue".
  • It tries to persist an event.
  • The write is enqueued after the earlier one (with the same sequence number) ...
  • ... and ends with a duplicate key error.

Is this really possible?
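
For illustration, a minimal standalone sketch of the index violation this scenario would produce. This is not the plugin's actual schema: the field names pid/from/to and the throwaway collection name are guesses based on the dup key shown in the error message; only the database name and the spirit of akka_persistence_journal_index are taken from it.

```scala
import org.mongodb.scala._
import org.mongodb.scala.model.{IndexOptions, Indexes}

import scala.concurrent.Await
import scala.concurrent.duration._

object DuplicateKeyRepro extends App {
  val client = MongoClient("mongodb://localhost:27017")
  // "journal_repro" is a throwaway collection, not the plugin's real one.
  val coll: MongoCollection[Document] =
    client.getDatabase("akka-persistence").getCollection("journal_repro")

  def await[T](obs: Observable[T]): Seq[T] = Await.result(obs.toFuture(), 10.seconds)

  // A unique compound index similar in spirit to akka_persistence_journal_index.
  await(coll.createIndex(Indexes.ascending("pid", "from", "to"), IndexOptions().unique(true)))

  val event = Document("pid" -> "-foo-bar_toto_titi", "from" -> 55, "to" -> 55)
  await(coll.insertOne(event)) // first write succeeds
  await(coll.insertOne(event)) // a reused sequence number means the same (pid, from, to): E11000
}
```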

I see another problem: on the first persist failure (due to a timeout) it is "normal" that the circuit breaker opens. But on the second one, we have an issue with one particular "resource", not with all database accesses, yet the circuit breaker will probably open and impact all database operations. Could we avoid that? Is it even possible with the akka-persistence plugin API?
Should I open a separate issue?
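
For reference, the journal circuit breaker can at least be tuned per plugin. A minimal sketch, assuming the usual plugin id akka-contrib-mongodb-persistence-journal and the standard Akka Persistence circuit-breaker keys (please verify both against your configuration):

```scala
import com.typesafe.config.ConfigFactory

object CircuitBreakerTuning {
  // Akka Persistence builds one circuit breaker per journal plugin from the
  // plugin's "circuit-breaker" section (defaults come from
  // akka.persistence.journal-plugin-fallback.circuit-breaker).
  val relaxedBreaker = ConfigFactory.parseString(
    """
      |akka-contrib-mongodb-persistence-journal {
      |  circuit-breaker {
      |    max-failures = 10    # failures tolerated before the breaker opens
      |    call-timeout = 20s   # how long one persist may take before it counts as a failure
      |    reset-timeout = 30s  # how long the breaker stays open before trying again
      |  }
      |}
    """.stripMargin)

  // usage: ActorSystem("app", relaxedBreaker.withFallback(ConfigFactory.load()))
}
```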

@gbrd
Contributor Author

gbrd commented Feb 8, 2019

Hmm, OK, in fact it's a well-known subject:
https://doc.akka.io/docs/akka/2.5/persistence-journals.html

* Please also note that requests for the highest sequence number may be made concurrently
 * to this call executing for the same `persistenceId`, in particular it is possible that
 * a restarting actor tries to recover before its outstanding writes have completed. In
 * the latter case it is highly desirable to defer reading the highest sequence number
 * until all outstanding writes have completed, otherwise the PersistentActor may reuse
 * sequence numbers.

Thus it may depend on the write/read concern, and on the MongoDB read preference, for the journal to be able to see its own writes...

@scullxbones
Owner

Which write concern are you using?

I've never seen this behavior, but I don't have a super high rate of messages or especially recovery. I do use journaled write concern.

@gbrd
Contributor Author

gbrd commented Feb 8, 2019

Acknowledged
MongoDB 3.0
Replica set: 1 primary and 3 secondaries (1 is hidden)

The read preference is to read from the primary by default. Is it overridden by the plugin?

@scullxbones
Owner

I don't think the plugin does anything to actively touch read preference.

Acknowledged write concern does have some consistency tradeoffs for performance, enough to scare me off from using that mode in production for whatever that's worth.
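
For what it's worth, a hedged sketch of how that switch might look in the plugin configuration; the key akka.contrib.persistence.mongodb.mongo.journal-write-concern and the value "Journaled" reflect my reading of the plugin docs, so verify them against the version you run:

```scala
import com.typesafe.config.ConfigFactory

object WriteConcernTuning {
  // Assumed setting name and value; check the plugin docs for your release.
  val journaledWrites = ConfigFactory.parseString(
    """
      |akka.contrib.persistence.mongodb.mongo {
      |  journal-write-concern = "Journaled"  # wait for the journal flush instead of a bare ack
      |}
    """.stripMargin)
}
```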

@gbrd
Contributor Author

gbrd commented Feb 8, 2019

Ok.
But we still have a problem, whatever the write concern, if the "persist" operation times out while the write is still queued on the mongo side (not yet applied). The persistent actor then restarts (maybe on another server, using another journal actor...), and so on (yes, it's a (very) narrow race condition).

Am I right?

The question then is: how to implement "defer reading the highest sequence number until all outstanding writes have completed"?

@scullxbones
Owner

I think the best path is to retry the persist in the PA with some kind of exponential backoff, both when the circuit breaker is open and when a duplicate key exception is received.

Since only the mongo server knows when all outstanding writes have completed, and the plugin does not control how/when the library (akka-persistence) invokes maxSequence logic, I'm not seeing how it can be handled by the plugin. It would be complicated (I think) but it's possible to put in a gate delaying maxSequence calls. It's not clear to me how to know when that gate should be open or closed, without knowing the state of the mongo write queue.

@gbrd
Contributor Author

gbrd commented Feb 11, 2019

I don't think we can retry persist: in the doc of onPersistFailure we can read: "The actor is always stopped after this method has been invoked."
Thus retrying would mean letting the actor crash and launching it again; that's what we currently do naturally, and it could be acceptable.
The thing that might be a problem is that the persistent actor could remain alive without writing data, and will only see the newer data when it restarts.

A solution I see would be to use write-concern=majority for all writes and read-concern=linearizable when reading the highest sequence number. This would cover the cases where the write was acknowledged; for the race condition where it times out, we could just overwrite the previous event on the second persist (after the duplicate-key error). This solution is certainly not ideal because it would mask some other errors.
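
A rough sketch of those two concerns expressed directly against the mongo-scala-driver (illustrative only: the plugin is configured through its own settings, the database name is just an example, and linearizable read concern needs MongoDB 3.4+, so it would not be available on the 3.0 replica set mentioned above):

```scala
import com.mongodb.{ReadConcern, WriteConcern}
import org.mongodb.scala.MongoClient

object ConcernsSketch {
  val client = MongoClient("mongodb://localhost:27017")

  // Journal writes wait for acknowledgement by a majority of the replica set...
  val journalDb = client.getDatabase("akka-persistence")
    .withWriteConcern(WriteConcern.MAJORITY)

  // ...and the highest-sequence-number read only returns majority-committed data
  // that reflects all writes acknowledged before the read started.
  val maxSeqNrDb = client.getDatabase("akka-persistence")
    .withReadConcern(ReadConcern.LINEARIZABLE)
}
```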

We could ask @patriknw how this is managed in the Cassandra plugin?
In the code here:
https://github.com/akka/akka-persistence-cassandra/blob/master/core/src/main/scala/akka/persistence/cassandra/journal/CassandraJournal.scala

I see that there is a data structure writeInProgress used to defer the read of maxSeqNum, but I guess it will not work if the PA is recreated on another server in an Akka cluster (which will have a separate writeInProgress collection).

@scullxbones
Owner

I don't think we can retry persist: in the doc of onPersistFailure we can read: "The actor is always stopped after this method has been invoked."
Thus retrying would mean letting the actor crash and launching it again; that's what we currently do naturally, and it could be acceptable.

Ah yes of course - good point. You could use exponential backoff if you had a request/reply command protocol on the outside, but not inside the PA. The advantage of this is that the maxSequenceNr would eventually increment and no longer trigger a unique index exception. That said, the events would need to be idempotent, since they could potentially be duplicated.
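
A minimal sketch of that "restart with backoff from the outside" idea, using Akka's stock BackoffSupervisor around the PA rather than retrying inside it (MyPersistentActor is a hypothetical placeholder for your own actor):

```scala
import scala.concurrent.duration._
import akka.actor.Props
import akka.pattern.{Backoff, BackoffSupervisor}

object BackoffRestart {
  // When the PA stops itself after a persist failure, the supervisor re-creates
  // it with an exponentially growing delay instead of immediately hammering the
  // journal again.
  def supervisedProps(childProps: Props): Props =
    BackoffSupervisor.props(
      Backoff.onStop(
        childProps,
        childName = "my-persistent-actor",
        minBackoff = 3.seconds,   // first restart waits at least this long
        maxBackoff = 30.seconds,  // cap on the delay between restarts
        randomFactor = 0.2))      // jitter so many PAs do not restart in lockstep

  // usage (MyPersistentActor is a placeholder):
  // system.actorOf(supervisedProps(Props(new MyPersistentActor)), "pa-supervisor")
}
```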

A solution I see would be to use write-concern=majority for all writes and read-concern=linearizable when reading the highest sequence number.

Is your replica set flapping / doing master re-elections frequently? If not, is there a way to figure out why the database is so slow as to cause the circuit breaker to time out? Perhaps the logs, if slow logging is enabled. I know if the mongo working set is bigger than available RAM it bogs down pretty seriously.

@gbrd
Contributor Author

gbrd commented Feb 14, 2019

We don't have frequent re-elections; I was just trying to find a solution where overwriting could be acceptable (and in all cases - even with master changes - if the first write was acknowledged, it is not acceptable to overwrite it).

I'm still working on this, but it seems that part of the persist failures is due to other causes (probably a problem with akka-cluster-sharding that ends up with 2 PAs with the same persistence id alive at the same time...). I'm still investigating, but this part probably has no link with the plugin.

@scullxbones
Owner

probably a problem with akka-cluster-sharding that ends up with 2 PAs with the same persistence id alive at the same time

Ouch - ok, so a network partition split-brain. That would definitely cause constraint failures all by itself. Are you using a cluster downing library to take down isolated nodes? There's the Lightbend subscription that provides one, and I think there are also a few OSS ones on GitHub.

@gbrd
Contributor Author

gbrd commented Feb 14, 2019

We have our own implementation of a split brain resolver (implemented before the OSS one was available).
...But the cluster itself is not split, and I've reproduced a complex situation where a shard is allocated on 2 servers at the same time. Still under investigation...

@patriknw

It's right that the writeInProgress in the Cassandra plugin protects against PA restarts while there are still writes in flight from the same PA, but it only covers restarts of the PA on the same node (same journal actor instance). For a restart on another node, the thinking is that such a failover would anyway take a long time (tens of seconds), and by that time any pending writes from the old node should have completed.
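
For readers of this thread, a rough sketch (not the Cassandra plugin's actual code) of that per-journal-instance gating idea: remember the in-flight write per persistenceId and let the highest-sequence-number read wait for it. As noted above, it only helps when the restarted PA talks to the same journal instance.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{ExecutionContext, Future}

// Stand-alone illustration of the "writeInProgress" idea, not the plugin's code.
class WriteInProgressGate(implicit ec: ExecutionContext) {
  private val writesInProgress = new ConcurrentHashMap[String, Future[Unit]]()

  /** Remember the in-flight write so later reads for this persistenceId can wait on it. */
  def register(persistenceId: String, write: Future[Unit]): Future[Unit] = {
    writesInProgress.put(persistenceId, write)
    write.andThen { case _ => writesInProgress.remove(persistenceId, write) }
  }

  /** Defer the highest-sequence-number read until the outstanding write (if any) finishes. */
  def readHighestSequenceNr(persistenceId: String)(read: () => Future[Long]): Future[Long] =
    Option(writesInProgress.get(persistenceId)) match {
      case Some(pending) => pending.recover { case _ => () }.flatMap(_ => read())
      case None          => read()
    }
}
```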

@gbrd
Contributor Author

gbrd commented Feb 25, 2019

OK. Our main issue was a bad configuration of serializers that caused journal events not to be replayed (only snapshots were) for internal akka-cluster-sharding data, which led to complex situations...

But the issue described initially here is still present and I will continue to work on it.
Thanks @patriknw for your answer; maybe a similar solution would fix this issue.
