mysql2sqlite support for extended inserts

#Hi. I was getting the "file is encrypted or is not a database" because I had the extended inserts in my SQL backup file (values for multiple rows were in one INSERT command). For example: INSERT INTO `table_name` VALUES (1,'value1'),(2,'values2'),(3,'value3');

So I did this patch, tested it and it worked for me. Please consider for inclusion, I don't know how to submit patches using GIT, sorry:

--- mysql2sqlite        2018-02-07 19:01:00.000000000 +0000
+++ mysql2sqlite_for_extended_insert.awk        2018-03-12 11:41:34.880073736 +0000
@@ -121,6 +121,15 @@
       NR ": WARN Hex number trimmed (length longer than 16 chars)." )
     hexIssue = 0
   }
+
+  # for extended-insert (INSERT command with multiple rows in the same line) - split into rows
+  if ( match ( $0, /^((INSERT|insert)  *(INTO|into)  *.* (VALUES|values)  *\()/, matches ) )
+  {
+#    print "start of INSERT command="matches[1];
+    gsub( /\),\(/, ");\n"matches[1] )  #replace (..),(..) with INSERT.. (..) and and with semicolon
+    gsub( /\)\); *$/, ");" )  #last tuple in the row - get rid of double parenthesis
+  }
+
   print
   next
 }

Mar 12 '18 12:03 PikMaster

@PikMaster would you have an original MySQL dump (or at least few lines of these extended insert statements)? The line you've posted seems not syntactically correct.

Also it's quite a dangerous patch - I'll include its POSIX variant, but comment it out by default as it might destroy the values.

Mar 12 '18 18:03 dumblob

Hi, I see where I got it wrong - there should be just one closing round bracket at the end, just before semi-colon. I corrected it in the post above, for future readers.

Here is the example MySQL dump:

--
-- Table structure for table `firmware_profiles_rules`
--

DROP TABLE IF EXISTS `firmware_profiles_rules`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `firmware_profiles_rules` (
  `fpr_id` int(11) NOT NULL AUTO_INCREMENT,
  `fp_id` int(11) NOT NULL,
  `shortname_rule` varchar(100) COLLATE utf8_bin DEFAULT NULL,
  `name_rule` varchar(100) COLLATE utf8_bin DEFAULT NULL,
  `type_rule` varchar(100) COLLATE utf8_bin DEFAULT NULL,
  `min_version` varchar(30) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`fpr_id`),
  UNIQUE KEY `multi_uniq_idx` (`min_version`,`fp_id`,`shortname_rule`,`name_rule`,`type_rule`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `firmware_profiles_rules`
--

LOCK TABLES `firmware_profiles_rules` WRITE;
/*!40000 ALTER TABLE `firmware_profiles_rules` DISABLE KEYS */;
INSERT INTO `firmware_profiles_rules` VALUES (3,0,NULL,NULL,NULL,''),(4,0,NULL,NULL,NULL,''),(5,0,NULL,NULL,NULL,''),(6,0,NULL,NULL,NULL,''),(7,0,NULL,NULL,NULL,''),(2,1,NULL,NULL,'BIOS','1.0.0'),(1,1,'%DRAC%',NULL,NULL,'1.06.07');
/*!40000 ALTER TABLE `firmware_profiles_rules` ENABLE KEYS */;
UNLOCK TABLES;

Mar 13 '18 13:03 PikMaster

Also, I don't quite understand what do you mean by "POSIX variant", do you have a separate tree or something for POSIX and non-POSIX compliant systems? To split these tuples correctly, you would need something like a fully blown parser, as that is hard to do via regular expressions alone. Although, inspired by your post, I did some research and found this - https://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491 So it seems to be possible, but not worth implementing it that way. However, I will try to come up with some better function for separating tuples, that will not destroy the data.

Mar 13 '18 13:03 PikMaster

Hi @dumblob here is the patch with working parser. I admit I don't fully understand your series of "gsub" replacements, but I just pulled it into my print tuple routine. Also, I don't have extended test cases, so I'm not sure if it's breaking.

--- mysql2sqlite        2018-03-13 15:32:25.662104698 +0000
+++ mysql2sqlite_for_extended_insert2.awk       2018-03-13 15:44:43.682329086 +0000
@@ -103,25 +103,81 @@
 # print all INSERT lines
 ( /^ *\(/ && /\) *[,;] *$/ ) || /^(INSERT|insert)/ {
   prev = ""
-  # single quotes are escaped by another single quote
-  gsub( /\\'/, "''" )
-  gsub( /\\'',/, "\\'," )
-  gsub( /\\n/, "\n" )
-  gsub( /\\r/, "\r" )
-  gsub( /\\"/, "\"" )
-  gsub( /\\\\/, "\\" )
-  gsub( /\\\032/, "\032" )  # substitute char
-  # sqlite3 is limited to 16 significant digits of precision
-  while( match( $0, /0x[0-9a-fA-F]{17}/ ) ){
-    hexIssue = 1
-    sub( /0x[0-9a-fA-F]+/, substr( $0, RSTART, RLENGTH-1 ), $0 )
-  }
-  if( hexIssue ){
-    printerr( \
-      NR ": WARN Hex number trimmed (length longer than 16 chars)." )
-    hexIssue = 0
+
+  # for extended-insert (INSERT command with multiple rows in the same line) - split into rows
+  if ( match ( $0, /^((INSERT|insert)  *(INTO|into)  *.* (VALUES|values)  *)(\(.*\));$/, matches ) )
+  {
+    parser_offset = RSTART -1  #offset in the line where the data to be inserted stsarts
+    tuples_to_split = matches[5]   #all multi-rows data in one variable
+    arr_len = split (tuples_to_split, arr, "")  #split the data into array, to analyze character-by-character
+    parser_round_brackets_open=0  #parser flag to track open round brackets, important from grammar point of view, ie. not quoted inside data
+    parser_single_quotes_open=0  #parser flag to track if we are currently in open single quotes
+    parser_next_char_is_escaped=0  #parser flag to track if next character is quoted, so brackets and single quotes can be ignored
+    parser_tuple_start_pos=0  #where does the currently analyzed tuple start
+    parser_tuple_end_pos=0  #where does the currently analyzed tuple ends
+    for (i=1; i<arr_len; i++)
+    {  #for every character in the rest of the line
+      c = arr[i];  #single character to be parsed
+
+      if (0==parser_next_char_is_escaped)
+      {  #character is not escaped, see if it has special meaning
+        if ("(" == c) {  #opening bracket
+          if (0==parser_single_quotes_open)
+          {  #if not inside quotes, it matters
+            if (0==parser_round_brackets_open) parser_tuple_start_pos=i;  #we see new tuple opening
+            parser_round_brackets_open++;
+          }
+        }
+        else if (")" == c) {  #closing bracket
+          if (0==parser_single_quotes_open) {  #not inside quote, then it matters
+            parser_round_brackets_open--;
+            if (0==parser_round_brackets_open) {  #if all nested brackets closed...
+              parser_tuple_end_pos=i;  #this is the end of tuple
+                # take the tuple as string
+              tuple = substr (tuples_to_split, parser_offset + parser_tuple_start_pos, parser_tuple_end_pos - parser_tuple_start_pos+1);
+              # print "tuple="tuple
+              # print "start of INSERT command="matches[1];
+
+
+              # single quotes are escaped by another single quote
+              gsub( /\\'/, "''", tuple )
+              gsub( /\\'',/, "\\',", tuple )
+              gsub( /\\n/, "\n", tuple )
+              gsub( /\\r/, "\r", tuple )
+              gsub( /\\"/, "\"", tuple )
+              gsub( /\\\\/, "\\", tuple )
+              gsub( /\\\032/, "\032", tuple )  # substitute char
+              # sqlite3 is limited to 16 significant digits of precision
+              while( match( tuple, /0x[0-9a-fA-F]{17}/ ) ){
+                hexIssue = 1
+                sub( /0x[0-9a-fA-F]+/, substr( tuple, RSTART, RLENGTH-1 ), tuple )
+              }
+              if( hexIssue ){
+                printerr( \
+                  NR ": WARN Hex number trimmed (length longer than 16 chars)." )
+                hexIssue = 0
+              }
+
+              print matches[1] tuple";"  #replace (..),(..) with INSERT.. (..) and and with semicolon
+            }
+          }
+        }
+        else if ("'" == c) {  #single quote
+          if (1==parser_single_quotes_open) parser_single_quotes_open--;  #closing the open quote
+          else parser_single_quotes_open=1  #start a single quote
+        }
+        else if ("\\" == c) parser_next_char_is_escaped=1;  #backslash escapes special meaning of bracket or single quote
+      }
+      else
+      {  #character is escaped, take it literally
+        parser_next_char_is_escaped=0  #next character is not escaped
+      }
+
+      # debug print for parser - prints every character processed and the parser flags
+#      print c" parser_round_brackets_open="parser_round_brackets_open" parser_single_quotes_open="parser_single_quotes_open" parser_next_char_is_escaped="parser_next_char_is_escaped;
+    }  #for every character in the rest of the line
   }
-  print
+
   next
 }

This is the test SQL dump file I was using:

LOCK TABLES `firmware_profiles_rules` WRITE;
/*!40000 ALTER TABLE `firmware_profiles_rules` DISABLE KEYS */;
INSERT INTO `firmware_profiles_rules` VALUES (3,0,NULL,NULL,NULL,'nothing'),(4,0,NULL,NULL,NULL,'kot\'ma),(ale'),(5,0,NULL,NULL,NULL,'(ala),(ma,(kota)'),(6,0,NULL,NULL,NULL,''),(7,0,NULL,NULL,NULL,''),(2,1,NULL,NULL,'BIOS','1.0.0'),(1,1,'%DRAC%',NULL,NULL,'1.06.07');
INSERT INTO `firmware_profiles_rules` VALUES (9,0,NULL,NULL,NULL,'');
/*!40000 ALTER TABLE `firmware_profiles_rules` ENABLE KEYS */;
UNLOCK TABLES;

Here is the output:

# ./mysql2sqlite_for_extended_insert2.awk example.backup.with.extended.inserts.sql
PRAGMA synchronous = OFF;
PRAGMA journal_mode = MEMORY;
BEGIN TRANSACTION;
CREATE TABLE `firmware_profiles_rules` (
  `fpr_id` integer NOT NULL PRIMARY KEY AUTOINCREMENT
,  `fp_id` integer NOT NULL
,  `shortname_rule` varchar(100) DEFAULT NULL
,  `name_rule` varchar(100) DEFAULT NULL
,  `type_rule` varchar(100) DEFAULT NULL
,  `min_version` varchar(30) NOT NULL
,  UNIQUE (`min_version`,`fp_id`,`shortname_rule`,`name_rule`,`type_rule`)
);
INSERT INTO `firmware_profiles_rules` VALUES (3,0,NULL,NULL,NULL,'nothing');
INSERT INTO `firmware_profiles_rules` VALUES (4,0,NULL,NULL,NULL,'kot''ma),(ale');
INSERT INTO `firmware_profiles_rules` VALUES (5,0,NULL,NULL,NULL,'(ala),(ma,(kota)');
INSERT INTO `firmware_profiles_rules` VALUES (6,0,NULL,NULL,NULL,'');
INSERT INTO `firmware_profiles_rules` VALUES (7,0,NULL,NULL,NULL,'');
INSERT INTO `firmware_profiles_rules` VALUES (2,1,NULL,NULL,'BIOS','1.0.0');
END TRANSACTION;

Mar 13 '18 15:03 PikMaster

Also, I don't quite understand what do you mean by "POSIX variant", do you have a separate tree or something for POSIX and non-POSIX compliant systems?

It's just that this script is POSIX-compliant and you're using match( $0, /ere/, x ), which is not POSIX (one needs to use match( $0, /ere/ ) and then substr( $0, RSTART, RLENGTH )).

Hi @dumblob here is the patch with working parser...

Wow, I'll need some time to review its edge cases. I think we'll need to rewrite it making functions like skip_to_next_dquote() out of it. This script is used by thousands of users so we can't afford to break something like an insert statement :wink:.

I admit I don't fully understand your series of "gsub" replacements, but I just pulled it into my print tuple routine.

Those are necessary as the SQlite syntax is different and also supports e.g. direct LF and CR characters and therefore it's safer to change escape sequences directly into these characters to avoid wrong parsing of escape sequences later.

Mar 13 '18 17:03 dumblob

Thanks for the patch. It contains a bug which means that the very last record is omitted from the sqlite INSERT. You can see this happening with Pikmaster's input/output example above.

Suggested fix is to change the for loop on line 121 into:

for (i=1; i<arr_len+1; i++)

Jun 22 '18 09:06 jnn83

This has direct conflict with PR #71.

Mar 26 '20 20:03 vdechef

I'm more and more inclined to explicitly not supporting extended inserts for now.

If anyone will provide a comprehensive set of test cases and a carefully carved patch of a manageable size (not bigger than the current script itself :wink:), I'll review it for merging though.

Mar 26 '20 21:03 dumblob