ejabberd

PubSub: storing items into a Database cluster

Open Emilio-Gonzalez opened this issue 7 years ago • 7 comments

What version of ejabberd are you using?

18.03

What operating system (version) are you using?

CentOS 7

How did you install ejabberd (source, package, distribution)?

RPM package

What did not work as expected? Are there error messages in the log? What was the unexpected behavior? What was the expected result?

I use an ejabberd cluster with 2 hosts (Ej-A and Ej-B) connected to a database cluster with 2 hosts (DB-X and DB-Y): Ej-A points to DB-X, and Ej-B points to DB-Y.

My use case is: one user logs into Ej-A and publishes an item, which is stored in DB-X. Then the same user logs into Ej-B and removes the same item, which should also be stored in DB-Y (replicated).

Sometimes, it works fine.

The unexpected behavior is: when there is a lot of pubsub usage (I suppose this is the key), the item is not found in DB-Y. I receive 'item-not-found', but the item is in the pubsub node.

I have tried with Mnesia and with MySQL, and I get the same result. It works fine only with MySQL and a single host, but I need a database cluster for redundancy.

Emilio-Gonzalez avatar Apr 25 '18 07:04 Emilio-Gonzalez

Such an architecture is out of scope for ejabberd itself. With your design, how do you expect DB-X and DB-Y to get the exact same data without a giant lock that waits until all replicas have the data? I guess your master-master setup just returns the query result as soon as the item is stored locally and performs the replication in the background. mod_pubsub does not even know about this. When you remove that item on DB-Y, it's not yet stored in that database.
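
A minimal sketch of the race described above, assuming a master-master setup that acknowledges a write as soon as it is stored locally and replicates in the background. It does not model ejabberd, mod_pubsub, or MySQL; the `Replica` class and its `publish`, `retract`, and `apply_replication` methods are hypothetical names used only for illustration.

```python
from collections import deque


class Replica:
    """One database host holding a local copy of the pubsub items (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.items = {}
        self.peer = None
        self.inbox = deque()  # replication events received but not yet applied

    def publish(self, item_id, payload):
        # The write is acknowledged as soon as it is stored locally;
        # the peer only gets a queued replication event.
        self.items[item_id] = payload
        self.peer.inbox.append(("put", item_id, payload))
        return "ok"

    def retract(self, item_id):
        # A retract only sees what has already been applied on this replica.
        if item_id not in self.items:
            return "item-not-found"
        del self.items[item_id]
        self.peer.inbox.append(("del", item_id, None))
        return "ok"

    def apply_replication(self):
        # Stands in for the background replication process.
        while self.inbox:
            op, item_id, payload = self.inbox.popleft()
            if op == "put":
                self.items[item_id] = payload
            else:
                self.items.pop(item_id, None)


db_x, db_y = Replica("DB-X"), Replica("DB-Y")
db_x.peer, db_y.peer = db_y, db_x

db_x.publish("item-1", "<entry/>")   # publish via Ej-A -> DB-X
early = db_y.retract("item-1")       # retract via Ej-B before replication is applied
db_y.apply_replication()             # replication catches up
late = db_y.retract("item-1")        # the same retract now finds the item
print(early, late)
```

In this sketch the early retract fails only while the replication event is still queued; once DB-Y has applied it, the retract succeeds.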

cromain avatar Apr 25 '18 08:04 cromain

There is a lot of time between the publish and the remove.

As far as I can see, the item is already in both database nodes when the user tries to remove it.

Emilio-Gonzalez avatar Apr 25 '18 08:04 Emilio-Gonzalez

In the use case I described previously, the user is the pubsub node owner. The user has several resources connected, one for each ejabberd host. So the user publishes the item on Ej-A using resource A and uses a different one, resource B, to remove the item from Ej-B. Could this affect the behavior?

Emilio-Gonzalez avatar Apr 25 '18 10:04 Emilio-Gonzalez

Interesting, I'll review this part.

cromain avatar Apr 25 '18 10:04 cromain

Once the unexpected behavior has happened and the item-not-found error has been returned (code: 404, type: cancel, message: item-not-found), nobody can remove the item, not even the publisher or the node owner (using the original resource). Every attempt returns the same code and message.

But the item is there, I can retrieve it.

Emilio-Gonzalez avatar Apr 26 '18 14:04 Emilio-Gonzalez

I have to delay this a bit, as I don't have time to review it by 18.06.

cromain avatar Jun 26 '18 10:06 cromain

@prefiks: What do you think?

Neustradamus avatar Apr 13 '21 13:04 Neustradamus