connectors icon indicating copy to clipboard operation
connectors copied to clipboard

No documents synced if source table does not have Primary Keys

Open lio-p opened this issue 3 years ago • 13 comments

Bug Description

Using MySQL Connector, if the source table does not have primary keys, no documents are being synced. No messages explain what happened.

To Reproduce

  • Create a table with no PK.
CREATE TABLE faq_no_pk (
    title TEXT,
    content TEXT
)
  • Ingest documents to the table.
  • In Kibana, create a new Elasticsearch index and choose MySQL Connector as a connector.
  • Sync contents with the database
  • No documents are being synced

Expected behavior

I understand that it's not working probably because Elasticsearch relies on the primary keys to track the synchronization state. But I think it would be interesting to have a message that explains how many documents have been omitted.

Environment

  • Version 8.6.1

lio-p avatar Feb 14 '23 11:02 lio-p

A primary key is mandatory, but missing user feedback seems to be a bug. According to the design we should log an warning/error mentioning which table we're skipping, ideally also a suggestion for a possible fix e.g. consider adding an auto-increment key. The warning / error should be looped back to Kibana. cc: @moxarth-elastic

danajuratoni avatar Feb 14 '23 14:02 danajuratoni

According to the design we should log an warning/error mentioning which table we're skipping,

@danajuratoni We have that logger message where we inform user that we have skipped that table because it has no primary key.

possible fix e.g. consider adding an auto-increment key.

You mean do you want us to add a random unique id to the elastic document when primary key is not attached, in this case all the document will be reindex in each sync. OR You mean it would done at the database level by end user.

parth-crest avatar Feb 15 '23 05:02 parth-crest

Cool, for connector clients, that log message you linked is exactly what we need. For the native connector we'd need a similar warning fed back to Kibana.

danajuratoni avatar Feb 15 '23 19:02 danajuratoni

This probably calls for adding a 'warnings' or 'messages' field to the connector protocol to communicate these kinds of things, maybe with a severity (log/warn/error)? Would definitely be useful in the jobs index, I'm not sure whether that would be as useful in the connector document itself.

Adding a possible 'warning' status to the connector and job would help, too.

sphilipse avatar Mar 02 '23 15:03 sphilipse

Agree, 'warning' status needs to be added at the framework level and the connector should be able to pass the messages via any method that appends the message to the 'warning' field.

@tarekziade what are your thoughts on this ?

parth-crest avatar Mar 03 '23 09:03 parth-crest

Excellent idea - I would use the standard levels though for the severity field, so info, warning error critical etc.

tarekziade avatar Mar 03 '23 15:03 tarekziade

We'd probably get a structure like this:

messages: [
  { severity: 'warning', message: 'Skipped table due to missing primary key'},
  { severity: 'info', message: 'Synced five tables with 120 entries}
]

sphilipse avatar Mar 03 '23 16:03 sphilipse

Marking this issue as blocked by the work needed to support additional severity warnings. Postponed to 8.8.

danajuratoni avatar Mar 10 '23 20:03 danajuratoni

Moving this to 8.9, as work for additional severity warning levels is postponed.

danajuratoni avatar May 04 '23 10:05 danajuratoni

@danajuratoni Is there any update on this considering the 8.9 release?

khushbu-elastic avatar Jul 31 '23 07:07 khushbu-elastic

@danajuratoni Please let us know if there is any update on this?

khushbu-elastic avatar Aug 07 '23 07:08 khushbu-elastic

Moving this to 8.9, as work for additional severity warning levels is postponed.

Is the intention to add support for additional indexes or simply improve the warning message? It would be helpful for us if we were able to specify alternative indexes as we are currently unable to index materialize views due to this constraint.

chantzlarge avatar Jan 02 '24 14:01 chantzlarge

I agree, it would be good for us to revisit these constraints, and find a way to allow the user to provide a column/function to be used as _id so that views and even arbitrary joins could be indexed. This relates to https://github.com/elastic/connectors/issues/2002

seanstory avatar Jan 15 '24 19:01 seanstory