RMySQL icon indicating copy to clipboard operation
RMySQL copied to clipboard

dbWriteTable into utf-8 MySQL DB from windows

Open jfdesomzee opened this issue 8 years ago • 9 comments

Hello,

I'm trying to export a table from R into a MySQL database with encoding utf-8 using dbWriteTable (I'm working with R-Studio and RmySQL 0.10.9). But I get uncorrect results. Word having accent or other special character are truncated as the first accent is encounter. E.G.: Céline --> C It does the same thing as I would get importing in workbench when the file is in latin1 and I specify it as UTF-8.

Here is my code:

require(RMySQL) con = dbConnect(MySQL(), user=USER, password=PASSWORD_DEV, dbname=DBNAME, host=HOST)

Some text file with default encoding

accent_ANSI<-read.table("D:/5-Testing/Accent/Accent_ANSI.txt",header=TRUE,fileEncoding="latin1")

Test Céline Julie Jérôme

Some text file encoded in UTF-8

accent_UTF8<-read.table("D:/5-Testing/Accent/Accent_UTF-8.txt",header=TRUE,fileEncoding="UTF-8")

DBI::dbGetQuery(con, "show variables like 'character_set_%'")

         Variable_name                                     Value

1 character_set_client latin1 2 character_set_connection latin1 3 character_set_database utf8 4 character_set_filesystem binary 5 character_set_results latin1 6 character_set_server latin1 7 character_set_system utf8 8 character_sets_dir /rdsdbbin/mysql-5.6.27.R1/share/charsets/

dbWriteTable(con, value = accent_ANSI,name = "default_ANSI",row.names=FALSE,overwrite=TRUE ) dbWriteTable(con, value = accent_UTF8,name = "default_UTF8",row.names=FALSE,overwrite=TRUE )

In both case I end up with

Test C Julie J

So I guess R is working in latin1 under windows and when I'm sending the table using dbWriteTable I should specify somewhere that I'm sending latin1.

What I've tried so far:

  • Setting R so that it is working in UTF-8, but the system won't let me. From what I've read, changing this might not be so easy and so recommended.

Sys.getlocale() [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252" Sys.setlocale("LC_ALL", 'en_US.UTF-8') Warning message: In Sys.setlocale("LC_ALL", "en_US.UTF-8") : OS reports request to set locale to "en_US.UTF-8" cannot be honored

  • Change the connection properties

rs <- dbSendQuery(con, 'set character set utf8') DBI::dbGetQuery(con, "show variables like 'character_set_%'") Variable_name Value 1 character_set_client utf8 2 character_set_connection utf8 3 character_set_database utf8 4 character_set_filesystem binary 5 character_set_results utf8 6 character_set_server latin1 7 character_set_system utf8 8 character_sets_dir /rdsdbbin/mysql-5.6.27.R1/share/charsets/

It does not change the result but tend to crash SQL Error [1412] [HY000]: Table definition has changed, please retry transaction java.sql.SQLException: Table definition has changed, please retry transaction

It sounds similar to #57, #93

Thanks in advance,

Jef

jfdesomzee avatar Mar 01 '17 11:03 jfdesomzee

Converting encoring before the export does not seem to work either

accent_UTF8_enc<-mutate(accent_UTF8,newchar=enc2utf8(Test)) dbWriteTable(con, value = accent_UTF8_enc,name = "enc_UTF8")

image

jfdesomzee avatar Mar 21 '17 14:03 jfdesomzee

Got the same issue. Trying to write UTF-8 data to MySQL with "utf8_unicode_ci" collation and after "ö, ä,ü" its cuts off the words.

results <- dbWriteTable(con, name = "Mysqltable", value = finalTable, overwrite = FALSE, row.names = FALSE, append = T, fileEncoding="UTF-8")

some month ago I had the same problem, but its magically disappeared. Maybe through an fixing update which was forgotten to keep in new versions?

al13nus avatar Mar 23 '17 20:03 al13nus

LOL, found how to fix. I just updated the "DBI" package and restartet R. Then it works. Maybe RMySQL is using an older version of DBI or I dont know, but it can work.

Try this, load DBI before RMySQL:

list.of.packages <- c("DBI","RMySQL") new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])] if(length(new.packages)) install.packages(new.packages)

library(DBI, quietly = T) library(RMySQL, quietly = T)

al13nus avatar Mar 23 '17 20:03 al13nus

Does not change anything for me... It worked briefly installing the dev version of RMySQL but now it is not working anymore.

jfdesomzee avatar Mar 24 '17 16:03 jfdesomzee

It worked on Windows 10 for me and still works. After trying on Ubuntu, hat still the same issue.

al13nus avatar Mar 25 '17 17:03 al13nus

I got the same issue. Have you found the solutions to this problem? I would very much appreciate it if you could share it with me

JacobZHANG2015 avatar Aug 31 '17 15:08 JacobZHANG2015

Not sure anymore, try creating data tables with "utf8_general_ci" encoding. Since then I never had problems with writing UTF-8 to MySQL.

al13nus avatar Aug 31 '17 15:08 al13nus

I have searched on the internet and found one way to bypass the encoding issue. We can use write.csv() to convert the desired data.frame to UTF-8 encoding and then use dbWritetable() to import the csv file into MySQL directly. For example,

dbWriteTable(conn, name='test', value = "tmp.csv", fileEncoding = "UTF-8", overwrite = T, sep = ",")

However, I get another issue and have no idea how to solve it.

Error in read.table(value, sep = sep, header = header, skip = skip, nrows = nrows, : more columns than column names

Could you please give me some advice to solve this issue?

JacobZHANG2015 avatar Aug 31 '17 15:08 JacobZHANG2015

first of all: start using fread(). much faster and there you fread("link.csv", encode = "UTF-8").

for writing to database I'm doing it like this:

dbWriteTable(con, name = "Database", value = data, overwrite = FALSE, row.names = FALSE, append = TRUE).

like I sad months ago, update all packages and maybe delete the old ones. Had problems with Rcpp because some packages used the old version instead of the new one. hmm

al13nus avatar Aug 31 '17 19:08 al13nus