Retry hangfire jobs
Codecov Report
Attention: Patch coverage is 49.26591% with 311 lines in your changes missing coverage. Please review.
Project coverage is 67.04%. Comparing base (
1011777) to head (e52be9d).
Additional details and impacted files
@@ Coverage Diff @@
## master #197 +/- ##
==========================================
- Coverage 67.30% 67.04% -0.26%
==========================================
Files 447 455 +8
Lines 35432 35925 +493
Branches 4736 4762 +26
==========================================
+ Hits 23848 24087 +239
- Misses 10496 10745 +249
- Partials 1088 1093 +5
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Is there a good way to? I couldn't think of a way to convert function calls to strings (or even binary data) that could be stored in a mongoDB. Unless there is some package that can auto-grab the GRPC calls and make them into an outbox, I don't have a good solution. I couldn't find one in my quick search.
src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs line 270 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
The naming convention for collections is a plural noun, so if we change the name of the doc to
OutboxMessage, it would beoutbox_messages.
Done.
src/SIL.Machine.AspNetCore/Models/PlatformMessage.cs line 3 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
We might want to use this for other messages in the future, so I would just call this
MessageorOutboxMessage.
Done.
src/SIL.Machine.AspNetCore/Models/PlatformMessage.cs line 7 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I think an enum would be better.
Done.
src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs line 5 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
You should separate the hosted service from the service that enqueues messages. The hosted service should only run on a single server (the engine server). Messages can be enqueued from any server. You won't be able to use the
_messageInOutboxflag. You also won't need to lock in theProcessMessagesAsyncmethod.
Ok - I'll subscribe to changes in the MongoDB instead to have a quick turn time.
src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs line 67 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Optimally, we would watch the collection for changes using the
SubscribeAsyncmethod. If we use polling, it might be more efficient to useExistsAsyncto check if the outbox is empty.
Switched to a subscribe - and checking first with Exists.
src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs line 88 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
You can drop the
Asyncfrom the names.
Ok - I see that Async is not relevant when we are queueing up the message to be sent.
src/SIL.Machine.AspNetCore/Services/PlatformMessageOutboxService.cs line 205 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
You can use the
Incoperator.
Done
src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs line 183 at r2 (raw file):
.UseRecommendedSerializerSettings() .UseMongoStorage(connectionString, GetMongoStorageOptions()) .UseFilter(new AutomaticRetryAttribute { Attempts = 3 })
Here is what is likely the issue: https://discuss.hangfire.io/t/recurring-jobs-do-not-automatically-get-retried-after-application-crash-net-core-service/9160/2
Ok fixed.
src/SIL.Machine.AspNetCore/Services/IMessageOutboxService.cs line 5 at r3 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
We need to ensure that messages are delivered in order for a particular build. I would pass in a
groupIdto this method that is used to ensure that all messages with the same group id are delivered in order. In the case of a build, thegroupIdwould be the build id.
Major updates.
src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 39 at r3 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
It could potentially fail 5 times very quickly. An expiration time would be better.
We should talk about this - what is the desired behavior if a message is unable to be sent and why? Why would we be retrying multiple times and what might resolve itself - and how we should respond?
src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 38 at r4 (raw file):
m => m.GroupId, m => m, (key, element) => element.OrderBy(m => m.Created).ToList()
From MongoDB:
Returns a new [ObjectId](https://www.mongodb.com/docs/manual/reference/bson-types/#std-label-objectid). The 12-byte [ObjectId](https://www.mongodb.com/docs/manual/reference/bson-types/#std-label-objectid) consists of:
- A 4-byte timestamp, representing the ObjectId's creation, measured in seconds since the Unix epoch.
- A 5-byte random value generated once per process. This random value is unique to the machine and process.
- A 3-byte incrementing counter, initialized to a random value.
Since we only have one process, it will be the seconds and the counter that determine the order, which should work. In contrast, UtcNow relies on the underlying system timer, which may be up to 15ms - potentially creating a conflict.
src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 39 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
We should talk about this - what is the desired behavior if a message is unable to be sent and why? Why would we be retrying multiple times and what might resolve itself - and how we should respond?
4 days - basically, how long could the Serval server conceivably be down?
src/SIL.Machine.AspNetCore/Services/ServalPlatformService.cs line 118 at r3 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I'm concerned about the size of the MongoDB document. This could potentially get very big and there is a size limit on a MongoDB document. Maybe we could keep the
pretranslate.trg.jsonfile until we have successfully sent the pretranslations to Serval.
Saved the large files to hard drive
src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 38 at r4 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This seems like a relevant quote from the documentation:
While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:
Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
Are generated by clients, which may have differing system clocks.
Unfortunately, this means we can't rely on the
ObjectId. This means we need to use a sequence number to guarantee the ordering. We have two options:
- Create an outbox collection that stores the current outbox state. The outbox state would contain the next sequence number. Use the
UpdateAsyncmethod to increment the sequence number and return the new value. The MongoDB update should be atomic.- Create a unique index on the sequence number field in the message collection. In a loop, query the database for the message with the max sequence number, increment the number, and attempt to insert the message with the new sequence number.
I think I would prefer option 1, since it doesn't require a loop and could be used to store other metadata about the outbox. It would also allow us to have multiple outboxes, which could be useful.
Since we only have one client generating these (that is, for one build or engine ID, which is what we are concerned about) I believe the current solution should be acceptable.
src/SIL.Machine.AspNetCore/Models/Sequence.cs line 3 at r8 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This should be named
MessageOutbox.
So the message index should be named MessageOutbox and the message itself should be OutboxMessage? That appears confusing. What about OutboxIndex?
src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 234 at r8 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I don't think that the lock will work properly inside of a transaction. The lock needs to be acquired and released outside of the transaction.
Done
src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 5 at r8 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
MessageOutboxDeliveryServiceis a clearer name.
Done.
src/SIL.Machine.AspNetCore/Services/MessageOutboxHandlerService.cs line 17 at r8 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This should be configurable.
Done.
src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 38 at r8 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I would prefer to always save the pretranslations to disk, so that the behavior is consistent and easier to debug. This would also allow us to avoid deserializing and serializing the pretranslations an extra time. The
RequestContentcould be the path to the pretranslations json file.
Done
src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 30 at r8 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
The order index and the message id should be separate properties.
As per our previous conversations, Since we only have one client generating these (that is, for one build or engine ID, which is what we are concerned about) I believe the current solution should be acceptable. I don't see a realistic case where there will be two separate clients updating the same build synchronously (at least for any plans for the next 2 years). Therefore, I don't see a need for the added complexity.
src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs line 267 at r8 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
The collection should be named
message_outboxesto be consistent with the collection naming conventions that we currently use.
As per the other comment, I believe that message_outbox and outbox_message could be confusing - one is really an index...
src/SIL.Machine.AspNetCore/Services/ClearMLMonitorService.cs line 239 at r9 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
The
UpdateTrainJobStatusand logger call don't need to happen in the lock or transaction.
Done
src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 30 at r8 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
All we have to do is add a separate order index property. I think that is actually much clearer. It also removes restrictions that are inherent in the design, such as supporting multiple outboxes.
Added - I see the value more now. It does make it more generic and make the properties a bit more consistent.
src/SIL.Machine.AspNetCore/Services/MessageOutboxService.cs line 38 at r8 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I don't think we need this code anymore if we just use the existing pretranslations JSON file.
But there could be other messages we may want to save as well, including assessments and word graphs. I think this format makes sense. The file on the S3 bucket to the file ready to be sent to Serval are logically different things. We would have to copy the file locally (to some other location), give the EnqueueMessages function a path and then update the OutboxMessage to accept either string content or a filename. Instead, I removed the ".json" at the end of the filename - it is just a string that is saved and recalled when needed.
src/SIL.Machine.AspNetCore/Services/ServalPlatformService.cs line 126 at r9 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
We are unnecessarily serializing and deserializing the pretranslations here. We can just pass in the path to the pretranslations JSON file as the request content.
The request content is a string. Should we make it just a byte array instead? We will need some way to serialize it in the message or on the
src/SIL.Machine.AspNetCore/Services/ServalPlatformService.cs line 126 at r9 (raw file):
Previously, johnml1135 (John Lambert) wrote…
The request content is a string. Should we make it just a byte array instead? We will need some way to serialize it in the message or on the
Also, a Pretranslation is not a InsertPretranslationRequest, namely that the later also has the engineId.
src/SIL.Machine.AspNetCore/Configuration/IMachineBuilderExtensions.cs line 267 at r8 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I would really like to stick with the convention. What about
outboxes?
sortable_indexes?