
DBD::Pg::db do failed: ERROR:

lpossamai opened this issue on Mar 03 '16 · 63 comments

I'm using PostgreSQL 9.2.15-1PGDG.rhel6 on CentOS 6.7, 64-bit.

When executing the command: ./pg_sample dbname --data-only --file=test.sql --limit="ja_admins = 1000"

I got the following error:

Creating table "_pg_sample"."pg_catalog_pg_statistic" DBD::Pg::db do failed: ERROR:  column "stavalues1" has pseudo-type "anyarray" at ./pg_sample line 296
    main::__ANON__('DBD::Pg::db do failed: ERROR:  column "stavalues1" has pseudo...', 'DBI::db=HASH(0xfdec08)', undef) called at ./pg_sample line 538
Dropping sample schema _pg_sample
Done.

Using the --trace option:

)= undef at pg_sample line 538
    -> HandleError on DBI::db=HASH(0x289b660) via CODE(0x2858218) (undef)
DBD::Pg::db do failed: ERROR:  column "stavalues1" has pseudo-type "anyarray" at ./pg_sample line 296
    main::__ANON__('DBD::Pg::db do failed: ERROR:  column "stavalues1" has pseudo...', 'DBI::db=HASH(0x289b738)', undef) called at ./pg_sample line 538
       ERROR: 7 'ERROR:  column "stavalues1" has pseudo-type "anyarray"' (err#0)
    <- DESTROY(DBI::st=HASH(0x289bb70))= undef at Carp.pm line 45
Dropping sample schema _pg_sample
    !! The ERROR '7' was CLEARED by call to do method
    <- do('DROP SCHEMA _pg_sample CASCADE')= '0E0' at pg_sample line 713
Done.
!   <- DESTROY(DBI::db=HASH(0x289b738))= undef at pg_sample line 45 during global destruction

What can I do to solve this issue?

lpossamai avatar Mar 03 '16 03:03 lpossamai

Hi! Can you give me the part of your schema that involves the pseudo-type? I've never used pseudo-types before, so I'm not sure what's happening.

If you have a dump of the schema, can you check where stavalues1 is referenced?

mla avatar Mar 03 '16 05:03 mla

Hi! Yep.. I'd give it to you if I could find it....

I've got the schema dump but was unable to find the reference...

the error says:

Creating table "_pg_sample"."pg_catalog_pg_shseclabel" 0
Creating table "_pg_sample"."pg_catalog_pg_statistic" DBD::Pg::db do failed: ERROR:  column "stavalues1" has pseudo-type "anyarray" at ./pg_sample line 296
    main::__ANON__('DBD::Pg::db do failed: ERROR:  column "stavalues1" has pseudo...', 'DBI::db=HASH(0x24b77c8)', undef) called at ./pg_sample line 538
Dropping sample schema _pg_sample
Done.

Couldn't find anything related to "stavalues1" or "anyarray"

Do you have any tips?

lpossamai avatar Mar 03 '16 08:03 lpossamai

I can see it through DbVisualizer:

Table pg_catalog.pg_statistic:

CREATE TABLE
    pg_statistic
    (
        starelid "OID" NOT NULL,
        staattnum SMALLINT NOT NULL,
        stainherit BOOLEAN NOT NULL,
        stanullfrac REAL NOT NULL,
        stawidth INTEGER NOT NULL,
        stadistinct REAL NOT NULL,
        stakind1 SMALLINT NOT NULL,
        stakind2 SMALLINT NOT NULL,
        stakind3 SMALLINT NOT NULL,
        stakind4 SMALLINT NOT NULL,
        stakind5 SMALLINT NOT NULL,
        staop1 "OID" NOT NULL,
        staop2 "OID" NOT NULL,
        staop3 "OID" NOT NULL,
        staop4 "OID" NOT NULL,
        staop5 "OID" NOT NULL,
        stanumbers1 REAL[],
        stanumbers2 REAL[],
        stanumbers3 REAL[],
        stanumbers4 REAL[],
        stanumbers5 REAL[],
        stavalues1 "ANYARRAY",
        stavalues2 "ANYARRAY",
        stavalues3 "ANYARRAY",
        stavalues4 "ANYARRAY",
        stavalues5 "ANYARRAY"
    );

lpossamai avatar Mar 03 '16 08:03 lpossamai

I don't know why it's trying to sample pg_catalog.* stuff. Try this patch. Make sure you use a test db. This is completely untested.

--- pg_sample.orig  2016-03-03 00:21:46.932012688 -0800
+++ pg_sample   2016-03-03 00:22:29.460012847 -0800
@@ -504,6 +504,8 @@
   my $sname = $row->{pg_schema} || unquote_identifier($row->{TABLE_SCHEM})
     or die "no pg_schema or TABLE_SCHEM value?!";

+  next if $sname eq 'pg_catalog';
+
   my $tname = $row->{pg_table} || unquote_identifier($row->{TABLE_NAME})
     or die "no pg_table or TABLE_NAME value?!";

mla avatar Mar 03 '16 08:03 mla

@mla thanks.. The patch worked. But I got another error now =\

Error: Copying "dm"."invoices" ("invoice_id") rows referenced from "_pg_sample"."dm_bill_items" ("invoice_id")... DBD::Pg::db do failed: ERROR: could not identify an equality operator for type "json"

Table dm.invoices:

CREATE TABLE
    invoices
    (
        invoice_id BIGINT DEFAULT "nextval"('"dm"."invoices_invoice_id_seq"'::"regclass") NOT NULL,
        format_version CHARACTER VARYING(10) NOT NULL,
        ts_issue TIMESTAMP(6) WITH TIME ZONE NOT NULL,
        ts_ack TIMESTAMP(6) WITH TIME ZONE,
        customer_id BIGINT NOT NULL,
        code CHARACTER VARYING(32),
        tag CHARACTER VARYING(256),
        account_data "JSON" NOT NULL,
        customer_data "JSON" NOT NULL,
        invoice_data "JSON" NOT NULL,
        invoice_items_data "JSON"[],
        CONSTRAINT pk_invoices PRIMARY KEY (invoice_id),
        CONSTRAINT fk_customer_must_exist FOREIGN KEY (customer_id)
            REFERENCES geoop_uat_old.public.ja_customers (id),
        CONSTRAINT cc_ack_must_be_after_issue CHECK (COALESCE("ts_ack", "ts_issue") >= "ts_issue"),
        CONSTRAINT cc_ack_and_code_populated_together CHECK (("ts_ack" IS NULL) = ("code" IS NULL))
    );

lpossamai avatar Mar 03 '16 08:03 lpossamai

Must be an issue with the json columns. I'll have to try to reproduce it. It's late now, sorry, but I'll investigate tomorrow.

mla avatar Mar 03 '16 08:03 mla

@mla you've helped a lot! It would be great if you could have a look at that issue for me (when you have time, of course).

Thank you!

lpossamai avatar Mar 03 '16 09:03 lpossamai

@mla I was also looking for a --exclude-table-data= option.. is there one using pg_sample? Cheers

lpossamai avatar Mar 03 '16 19:03 lpossamai

There's no equality operator for json. The usual trick is to cast the json fields to jsonb if available, or text:

# select '{}'::json = '{}'::json;
ERROR:  operator does not exist: json = json
LINE 1: select '{}'::json = '{}'::json;

# select '{}'::json::text = '{}'::json::text;
 ?column? 
══════════
 t
(1 row)

Unfortunately, you can get wrong results with json data cast to text, depending on how it was inserted:

# select '{"b": 1, "a": 2}'::json::text = '{"a": 2, "b":1}'::json::text;
 ?column? 
══════════
 f
(1 row)

# select '{"b": 1, "a": 2}'::jsonb = '{"a": 2, "b":1}'::jsonb;
 ?column? 
══════════
 t
(1 row)

Hope that helps.

rjuju avatar Mar 03 '16 22:03 rjuju

@mla I was also looking for a --exclude-table-data= option.. is there one using pg_sample?

What would that do? Sorry, not following.

mla avatar Mar 04 '16 00:03 mla

Looks like the JSON issue is related to the "SELECT DISTINCT" around line 620. It can't do the DISTINCT because there's no equality operator for the json columns. @lpossamai, try just removing that DISTINCT clause.

@@ -615,14 +617,14 @@
     # satisfy the fk table, except those already present.
     my $query = qq{
       INSERT INTO $target_sample_table
-           SELECT DISTINCT t1.*
+           SELECT t1.*
              FROM $target_table t1
                   JOIN $sample_fk_table f1 ON ($join1)
                   LEFT JOIN $target_sample_table s1 ON ($join2)
             WHERE $where
     };
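
For the record, the underlying failure is easy to reproduce in psql (a minimal sketch):

# select distinct '{}'::json;
ERROR:  could not identify an equality operator for type "json"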

mla avatar Mar 04 '16 00:03 mla

@mla yep.. removing "DISTINCT" worked... Thanks for that!

about my other question:

on pg_dump docs:

--exclude-table=table Do not dump any tables matching the table pattern. The pattern is interpreted according to the same rules as for -t. -T can be given more than once to exclude tables matching any of several patterns.

source: http://www.postgresql.org/docs/9.3/static/app-pgdump.html

Is there an option in pg_sample that does the same thing? I couldn't find one...

lpossamai avatar Mar 04 '16 00:03 lpossamai

well.. using the --limit="junk.* = 0" option works fine. Thanks for your help @mla. You can close this issue.

lpossamai avatar Mar 04 '16 00:03 lpossamai

@mla

The dump has been successfully created.

Command: ./pg_sample dbname --data-only --file=all_data.sql --limit="public.* = 1000" --limit="dm.* = 1000" --limit="integrations.* = *" --limit="gorfs.* = 100" --limit="junk.* = 0" --verbose

I've just got the error below:

psql:all_data_second.sql:3404263: ERROR:  duplicate key value violates unique constraint "pk_billables"
DETAIL:  Key ("billable_id")=(17) already exists.
CONTEXT:  COPY billables, line 513
psql:all_data_second.sql:10586440: ERROR:  duplicate key value violates unique constraint "pk_bills"
DETAIL:  Key ("bill_id")=(764047) already exists.
CONTEXT:  COPY bills, line 525
psql:all_data_second.sql:10938107: ERROR:  duplicate key value violates unique constraint "pk_invoices"
DETAIL:  Key ("invoice_id")=(340990) already exists.
CONTEXT:  COPY invoices, line 239084
psql:all_data_second.sql:12336660: ERROR:  duplicate key value violates unique constraint "pk_tax_aggregates"
DETAIL:  Key ("tax_aggregate_id")=(634898) already exists.
CONTEXT:  COPY tax_aggregates, line 595
psql:all_data_second.sql:26545982: ERROR:  duplicate key value violates unique constraint "ja_clients_pkey"
DETAIL:  Key ("id")=(458) already exists.
CONTEXT:  COPY ja_clients, line 550
psql:all_data_second.sql:28810421: ERROR:  duplicate key value violates unique constraint "ja_customers_pkey"
DETAIL:  Key ("id")=(338401) already exists.
CONTEXT:  COPY ja_customers, line 505
psql:all_data_second.sql:28818328: ERROR:  duplicate key value violates unique constraint "ja_mobiusers_pkey"
DETAIL:  Key ("id")=(683341) already exists.
CONTEXT:  COPY ja_mobiusers, line 520
psql:all_data_second.sql:28821661: ERROR:  duplicate key value violates unique constraint "pk_role"
DETAIL:  Key ("id")=(1731) already exists.
CONTEXT:  COPY ja_role, line 514
psql:all_data_second.sql:28828033: ERROR:  duplicate key value violates unique constraint "spatial_ref_sys_pkey"
DETAIL:  Key ("srid")=(3819) already exists.
CONTEXT:  COPY spatial_ref_sys, line 1

So when I do the import, I don't get the data from those tables.. which is bad.

Any idea why? Cheers

lpossamai avatar Mar 04 '16 03:03 lpossamai

I bet it's because we removed the DISTINCT. I'll investigate how to do DISTINCT with JSON columns.

mla avatar Mar 04 '16 03:03 mla

I think we need to convert the DISTINCT to be DISTINCT ON (<PRIMARY_KEY>). I'll try to work on it tomorrow. Thanks for working through this.
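
For illustration, DISTINCT ON compares only the listed expressions, so the json columns are never tested for equality (a sketch against the dm.invoices table above):

SELECT DISTINCT ON (invoice_id) *
  FROM dm.invoices
 ORDER BY invoice_id;
-- dedupes on the primary key alone; no json equality operator needed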

mla avatar Mar 04 '16 05:03 mla

That's great @mla - Thanks for your time and help...

lpossamai avatar Mar 04 '16 11:03 lpossamai

@lpossamai give this patch a try, please:

@@ -615,7 +617,8 @@ while ($num_rows) {
     # satisfy the fk table, except those already present.
     my $query = qq{
       INSERT INTO $target_sample_table
-           SELECT DISTINCT t1.*
+           SELECT DISTINCT ON ($target_cols)
+                  t1.*
              FROM $target_table t1
                   JOIN $sample_fk_table f1 ON ($join1)
                   LEFT JOIN $target_sample_table s1 ON ($join2)

mla avatar Mar 05 '16 05:03 mla

@mla - Yep. That patch worked. No more errors.

I'm just trying to understand something:

I've used this command: ./pg_sample dbtest --file=all_data_3.sql --limit="public.* = 500" --limit="dm.* = 500" --limit="integrations.* = *" --limit="gorfs.* = 50" --limit="junk.* = 0" --verbose

This means I'm trying to get 500 rows from each table in the "public" and "dm" schemas, all rows from "integrations", 50 rows from each table in "gorfs", and 0 rows from "junk" (right?).

But after exporting and importing it into a new DB, when I run:

select count(*) from public.ja_clients;

I get: 
count 
-------
  6249

and when I run the same query against another table in the public schema:

select count(*) from public.ja_customers;
count 
-------
     0

Why does that happen? I should have 500 each.. shouldn't I?

The same happens in other schemas:

select count(*) from dm.bills;
  count  
---------
 1191268
(1 row)

Thanks for your prompt reply and help!

UPDATE - related to ja_customers:

I've just seen an error here (I got these errors when importing the exported data: psql dbname -f filefromPG_SAMPLE.sql):

psql:all_data_schema.sql:21171292: ERROR:  value too long for type character varying(255)
CONTEXT:  COPY ja_customers, line 428054, column company: "NEW BLINDS: Supply & Delivery (46mm Wooden blinds x 4) Colour: TBC *Blinds Online–Home Brand* *..."
psql:all_data_schema.sql:21188979: ERROR:  duplicate key value violates unique constraint "spatial_ref_sys_pkey"
DETAIL:  Key ("srid")=(3819) already exists.
CONTEXT:  COPY spatial_ref_sys, line 1

table ja_customers:

CREATE TABLE
    ja_customers
    (
        id BIGINT DEFAULT "nextval"('"ja_customers_id_seq"'::"regclass") NOT NULL,
        clientid BIGINT DEFAULT 0 NOT NULL,
        name_first CHARACTER VARYING(80) DEFAULT ''::CHARACTER VARYING,
        name_last CHARACTER VARYING(80) DEFAULT ''::CHARACTER VARYING,
        company CHARACTER VARYING(255) DEFAULT ''::CHARACTER VARYING,
        businesstype CHARACTER VARYING(80) DEFAULT ''::CHARACTER VARYING,
        position CHARACTER VARYING(80) DEFAULT ''::CHARACTER VARYING,
        email CHARACTER VARYING(50) DEFAULT NULL::CHARACTER VARYING,
        address CHARACTER VARYING(250),
        suburb CHARACTER VARYING(250),
        city CHARACTER VARYING(80),
        phone CHARACTER VARYING(50) DEFAULT NULL::CHARACTER VARYING,
        mobile CHARACTER VARYING(40) DEFAULT NULL::CHARACTER VARYING,
        fax CHARACTER VARYING(40) DEFAULT NULL::CHARACTER VARYING,
        mailing_address CHARACTER VARYING(250),
        mailing_suburb CHARACTER VARYING(250),
        mailing_city CHARACTER VARYING(80),
        regdate BIGINT DEFAULT 0 NOT NULL,
        notes "TEXT" DEFAULT ''::"text",
        gps_lat NUMERIC(15,10),
        gps_long NUMERIC(15,10),
        hourlyrate NUMERIC(6,2) DEFAULT 0 NOT NULL,
        onaccount SMALLINT DEFAULT 0 NOT NULL,
        calc_mobile CHARACTER VARYING(20),
        country_code NUMERIC(10,1),
        client_code CHARACTER VARYING(50),
        geocode_accuracy SMALLINT DEFAULT 0,
        crm_accountid CHARACTER VARYING(20),
        crm_update BOOLEAN,
        deleted BOOLEAN DEFAULT false NOT NULL,
        address_bldg CHARACTER VARYING(150),
        schedule_key CHARACTER VARYING(255),
        schedule_key_timestamp INTEGER,
        login CHARACTER VARYING(30) DEFAULT ''::CHARACTER VARYING NOT NULL,
        pass CHARACTER VARYING(50) DEFAULT ''::CHARACTER VARYING NOT NULL,
        default_bill_customer BIGINT,
        url CHARACTER VARYING(64),
        postcode CHARACTER VARYING(20),
        c_invoice_count INTEGER,
        mailing_postcode CHARACTER VARYING(20),
        created_date TIMESTAMP(6) WITHOUT TIME ZONE DEFAULT "statement_timestamp"(),
        modified_date TIMESTAMP(6) WITHOUT TIME ZONE DEFAULT "statement_timestamp"(),
        created_userid INTEGER,
        modified_userid INTEGER,
        stpr CHARACTER VARYING(80),
        mailing_stpr CHARACTER VARYING(80),
        is_demo BOOLEAN DEFAULT false NOT NULL,
        tax_aggregate_id_cost BIGINT,
        tax_aggregate_id_price BIGINT,
        job_share_mode SMALLINT DEFAULT 0 NOT NULL,
        PRIMARY KEY (id),
        CONSTRAINT fk_tax_aggregate_cost_must_exist FOREIGN KEY (tax_aggregate_id_cost)
            REFERENCES dbname_uat.dm.tax_aggregates (tax_aggregate_id),
        CONSTRAINT fk_tax_aggregate_price_must_exist FOREIGN KEY (tax_aggregate_id_price)
            REFERENCES dbname_uat.dm.tax_aggregates (tax_aggregate_id),
        CONSTRAINT fk_account_must_exist FOREIGN KEY (clientid) REFERENCES ja_clients (id),
        CONSTRAINT fk_default_bill_customer_must_exist FOREIGN KEY (default_bill_customer)
            REFERENCES ja_customers (id),
        CONSTRAINT cc_gps_lat_and_long_must_come_together CHECK (("gps_long" IS NULL) = ("gps_lat" IS NULL))
    );

Some other information: -bash-4.1$ ./pg_sample dbname_uat --file=all_data_schema.sql --limit="public.* = 500" --limit="dm.* = 500" --limit="integrations.* = *" --limit="gorfs.* = 50" --limit="junk.* = 0" --verbose

Server encoding is UTF8
Client encoding is UTF8
Exporting schema
Creating sample schema _pg_sample
[limit] public.* = 500
[limit] dm.* = 500
[limit] integrations.* = *
[limit] gorfs.* = 50
[limit] junk.* = 0
[limit] .* = 100

[...]

Creating table "_pg_sample"."public_ja_customers" 500

[...]

Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."public_ja_customers" ("default_bill_customer")... 119 rows
Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."dm_billables_links" ("customer_id")... 183 rows
Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."integrations_customers" ("customer_id")... 2007308 rows
Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."dm_invoices" ("customer_id")... 18133 rows
Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."dm_bills" ("customer_id")... 55745 rows
Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."public_ja_jobs" ("customerid")... 304 rows
Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."public_ja_jobs" ("bill_customer")... 11 rows

[...]

lpossamai avatar Mar 06 '16 01:03 lpossamai

@lpossamai, you should never get an empty table, unless you request 0 rows.

Getting more rows than requested is definitely not uncommon though. That will be dictated by the foreign key constraints. e.g., you may request only 500 rows from table A, but if you include 10k rows from table B, and each row from table B has a FK to table A, you'll end up with however many rows from table A were needed to satisfy table B's FKs.
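
As an illustration (table and column names hypothetical): if sampled rows of table_b reference table_a, every referenced table_a row has to land in the sample too, or the FK would fail on import:

-- lower bound on the table_a sample, regardless of table_a's own --limit
SELECT count(DISTINCT a_id) FROM _pg_sample.sample_table_b;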

Regarding the "psql:all_data_schema.sql:21171292: ERROR: value too long for type character varying(255)" error, how did you create the new DB that you're loading into? I see that the existing DB is using UTF8. Is the new DB using UTF8 too? Looking at the error message, that string looks messed up. Like it's treating the UTF8 value as Latin1. And that could cause that error... the multi-byte UTF8 characters are being interpreted as single byte characters, causing the string to end up longer and exceeding the column width.
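
A quick way to see that effect in psql (a sketch; the en dash in "Online–Home" is 3 bytes in UTF8):

# select length(convert_from(convert_to('–', 'UTF8'), 'LATIN1'));
 length 
════════
      3
(1 row)

One character decoded as Latin1 becomes three, which is how a string that fits in varchar(255) can overflow it on reload.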

mla avatar Mar 06 '16 04:03 mla

@lpossamai try this patch, please:

--- a/pg_sample
+++ b/pg_sample
@@ -192,6 +192,8 @@ use Getopt::Long qw/ GetOptions :config no_ignore_case /;
 use DBI;
 use DBD::Pg 2.0.0;

+binmode STDOUT, ':utf8';
+
 $SIG{TERM} = $SIG{INT} = $SIG{QUIT} = $SIG{HUP} = sub {
   my $signal = shift;
   die "Received signal '$signal'; cleaning up and terminating.\n";

It's not a real fix, but I'm wondering if we need to explicitly set STDOUT to use UTF8 encoding. Try running it again and see if you still get that "value too long" error.

mla avatar Mar 06 '16 06:03 mla

Hrm. I see we already set the encoding of STDOUT, so I doubt that change will do anything. What value of LANG do you have in your shell?

mla avatar Mar 06 '16 07:03 mla

Hi @mla

My new DB is UTF-8 as well...

Shell = LANG=en_US.UTF-8

I'm currently testing the STDOUT change you suggested...

It's copying now.. once it's done I'll let you know.. thanks!

lpossamai avatar Mar 06 '16 07:03 lpossamai

When you view that row containing "NEW BLINDS: Supply & Delivery..." in psql, what does it look like? Do you see those messed up characters or something else?

mla avatar Mar 06 '16 07:03 mla

I don't remember if I saw that during pg_sample or during the import (using the psql command).

So.. I'm doing it again now and will pay attention to see if I get any errors.. shouldn't take too long..

lpossamai avatar Mar 06 '16 07:03 lpossamai

@mla Yep.. still getting the errors when importing using psql:

psql:all_data_3.sql:21171222: ERROR:  value too long for type character varying(255)
CONTEXT:  COPY ja_customers, line 428054, column company: "NEW BLINDS: Supply & Delivery (46mm Wooden blinds x 4) Colour: TBC *Blinds Online–Home Brand* *..."
psql:all_data_3.sql:21188908: ERROR:  duplicate key value violates unique constraint "spatial_ref_sys_pkey"
DETAIL:  Key ("srid")=(3819) already exists.
CONTEXT:  COPY spatial_ref_sys, line 1

lpossamai avatar Mar 06 '16 07:03 lpossamai

k, thanks. So in the source db, what does that row look like? Does it show up as "Blinds Online�Home Brand* *" there too?

mla avatar Mar 06 '16 07:03 mla

If I could find it.. But I can't hahahaha =\

lpossamai avatar Mar 06 '16 07:03 lpossamai

SELECT company FROM ja_customers WHERE company LIKE 'NEW BLINDS%'

?

mla avatar Mar 06 '16 07:03 mla

yep... thanks!

NEW BLINDS: Supply & Delivery (46mm Wooden blinds x 4) Colour: TBC Blinds Online–Home Brand * Doesn't includes check measure & professional install visits. * Please allow 15–18 working days for blind manufacture and del. * Freight included * 3 YEAR WA

lpossamai avatar Mar 06 '16 07:03 lpossamai

ok, good. So it seems like there's some encoding issue going on.

Are we sure the encoding of the source DB is UTF8? What does "psql -l" show for the source db?

mla avatar Mar 06 '16 07:03 mla

And that string is exactly 255 chars. So the theory about the multibyte being interpreted as individual chars still makes sense then.

mla avatar Mar 06 '16 08:03 mla

Source DB:

source_uat | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres + | | | | | postgres=CTc/postgres

Dst DB:

test_lucas2 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8

lpossamai avatar Mar 06 '16 08:03 lpossamai

And if you grep the pg_sample output file for that string, you see the screwed up version, yes?

grep "NEW BLINDS" all_data_3.sql

mla avatar Mar 06 '16 08:03 mla

yep....

2221550 8948    NB–Supply & Delivery   NEW BLINDS: Supply & Delivery\n(25mm Venetian blinds x ?)\n(50mm Venetian blinds x ?)\n(??mm Vertical blinds x ?)\nDay/Night Dual Blind Systems x ?\n(Blockout Rollerblinds x ?)\n(Thermal Blockout Rollerblinds x ?)\n(5% Sunscreen blinds x ?)\n(46mm Wooden blinds x ?)\n(46mm Cedar blinds x ?)\n(50mm Cedar blind  \NUnspecified Units 0.0000  \N  0.0000  t   151029  2015-07-11 14:00:42.914421+00   2015-07-11 14:00:42.914421+00   \N  f
2221570 8948    NB–Supply & Install    NEW BLINDS: Supply & Install\n(25mm Venetian blinds x ?)\n(50mm Venetian blinds x ?)\n(??mm Vertical blinds x ?)\nDay/Night Dual Blind Systems x ?\n(Blockout Roller blinds x ?)\n(Thermal Blockout Roller blinds x ?)\n(5% Sunscreen blinds x ?)\n(46mm Wooden blinds x ?)\n(46mm Cedar blinds x ?)\n(50mm Cedar blin  \NUnspecified Units 0.0000  \N  0.0000  t   151027  2015-07-11 14:00:42.914421+00   2015-07-11 14:00:42.914421+00   \N  f
2221572 8948    SUP/DEL NEW BLINDS >\nFor supply and delivery of...\n\nMeasurements by:\n(width x drop)(mm)(control side)\n\n\n(25mm Venetian blinds x ?)\n(50mm Venetian blinds x ?)\n(??mm Vertical blinds x ?)\n(Blockout Roller blinds x ?)\n(5% Sunscreen blinds x ?)\n(46mm Cedar blinds x ?)\n(50mm Cedar blinds x ?)\n(50mm Wood Vision bl  \N  Unspecified Units   0.0000  \N  0.0000  t   151027  2015-07-11 14:00:42.914421+00   2015-07-11 14:00:42.914421+00   \N  f

lpossamai avatar Mar 06 '16 08:03 lpossamai

What perl is being used?

/usr/bin/env perl -v

mla avatar Mar 06 '16 08:03 mla

This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi

lpossamai avatar Mar 06 '16 08:03 lpossamai

You might try a more recent perl. I'm using 5.18 for this testing, which is itself fairly old now.

You can use the "plenv" tool to install local versions of perl. See: https://github.com/tokuhirom/plenv

mla avatar Mar 06 '16 08:03 mla

What version of DBD::Pg do you have installed?

perl -MDBD::Pg -le 'print DBD::Pg->VERSION'

mla avatar Mar 06 '16 08:03 mla

ok so.. Perl has been updated to 5.18

and DBD::Pg to 3.5.3

I'll do the process again now... I'll let you know once it's done

lpossamai avatar Mar 06 '16 08:03 lpossamai

Crashing for tonight. I was reading that perl 5.10 was pretty buggy, so maybe there were some unicode issues with it.

mla avatar Mar 06 '16 09:03 mla

@mla after updating Perl I don't get that error anymore.. so.. PASS

I just got this error when importing the file:

psql:all_data_3.sql:21188918: ERROR:  duplicate key value violates unique constraint "spatial_ref_sys_pkey"
DETAIL:  Key ("srid")=(3819) already exists.
CONTEXT:  COPY spatial_ref_sys, line 1

Any idea?

lpossamai avatar Mar 06 '16 09:03 lpossamai

Excellent. Can you show me the table definition for spatial_ref_sys, including all constraints?

mla avatar Mar 06 '16 09:03 mla

Sure

CREATE TABLE
    spatial_ref_sys
    (
        srid INTEGER NOT NULL,
        auth_name CHARACTER VARYING(256),
        auth_srid INTEGER,
        srtext CHARACTER VARYING(2048),
        proj4text CHARACTER VARYING(2048),
        PRIMARY KEY (srid),
        CONSTRAINT spatial_ref_sys_srid_check CHECK (("srid" > 0) AND ("srid" <= 998999))
    );

lpossamai avatar Mar 06 '16 09:03 lpossamai

How many tables have foreign keys to that table? Wondering if it's the DISTINCT issue again.

mla avatar Mar 06 '16 09:03 mla

Sorry @mla - any idea how I can test that? I'm not a Postgres expert...

lpossamai avatar Mar 06 '16 20:03 lpossamai

@lpossamai pull my latest version from the 1.10 branch, if you would: https://github.com/mla/pg_sample/tree/release/1.10

I added additional logging to --verbose mode to list all the foreign keys.

If you could, run it again with the --verbose option and upload the stderr output here (or email it to me at [email protected]).

mla avatar Mar 06 '16 20:03 mla

@mla - yep.. doing it now. - Thanks for your help!

lpossamai avatar Mar 06 '16 21:03 lpossamai

Sure thing :) Thanks for your patience. We're finding a lot of interesting issues that haven't come up before.

Do you know if any of your tables use inheritance? That's an area I haven't tested before.
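
A quick way to check, as a sketch: pg_inherits records every child/parent pair, so zero rows here would rule inheritance out:

SELECT c.relname AS child, p.relname AS parent
  FROM pg_inherits i
  JOIN pg_class c ON c.oid = i.inhrelid
  JOIN pg_class p ON p.oid = i.inhparent;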

I have to run out for a few hours but will check in later tonight and see if we can figure out what's going on here. I think we're close.

mla avatar Mar 06 '16 22:03 mla

Yeah! That's great @mla - You've been patient as well :P

I don't think we use inheritance - but there are lots of tables.. If there's a way to check all of them, I can test....

I've imported the file created by pg_sample and it works fine.... So there's no rush to solve that last error...

I have deleted the table descriptions for security reasons.

lpossamai avatar Mar 06 '16 22:03 lpossamai

That's strange.... I see no reference to "spatial" or "srid" in any of the foreign keys. So how are we getting the duplicate row?

Just as a sanity check, could you run this query?

SELECT srid, count(*) AS num_rows FROM spatial_ref_sys GROUP BY srid HAVING count(*) > 1;

We're expecting that to return nothing, since srid is the primary key.

mla avatar Mar 07 '16 01:03 mla

yep... 0 rows .. lol

lpossamai avatar Mar 07 '16 01:03 lpossamai

Could you "grep -i inherits" the pg_sample output file? That's the only other sort of thing I could see causing this, if in fact the spatial_ref_sys table isn't participating in any foreign keys.

mla avatar Mar 07 '16 02:03 mla

Yeah.. got nothing.. only comments...

I think I'll leave it as is and see if I get any issues after the DB is imported...

Thanks @mla

lpossamai avatar Mar 07 '16 02:03 lpossamai

@lpossamai ok. If you are comfortable emailing me your full schema, I'll see if I can discover anything.

You could also try grepping the pg_sample output file for that partial PK value, 3819, just to prove it actually is being duplicated. Although that value is pretty short so you may get a lot of false positives.

mla avatar Mar 07 '16 02:03 mla

awesome.. thank you so much @mla !

lpossamai avatar Mar 07 '16 02:03 lpossamai

Are you dropping the destination db prior to retrying the import?

mla avatar Mar 07 '16 02:03 mla

yep... @mla

lpossamai avatar Mar 07 '16 02:03 lpossamai

Aha! It's from postgis: http://postgis.refractions.net/documentation/manual-1.4/ch04.html#spatial_ref_sys

So, I'm not sure if pg_sample should even be sampling those tables? I haven't worked with postgis...

mla avatar Mar 07 '16 04:03 mla

k, so I think that's it. There's this line from the schema definition:

CREATE EXTENSION IF NOT EXISTS postgis WITH SCHEMA public;

That adds the postgis support, and as part of that, it creates and populates the spatial_ref_sys table. So when our sampled content from that table is loaded, you get this duplicate key error.

So now I need to investigate if there's a way to differentiate between tables created via extensions vs. normal tables.
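
One possibility, as a sketch: extension membership is recorded in pg_depend with deptype 'e', so extension-owned tables could be listed (and skipped) like this:

SELECT e.extname, c.relname
  FROM pg_depend d
  JOIN pg_extension e ON e.oid = d.refobjid
  JOIN pg_class c ON c.oid = d.objid
 WHERE d.classid = 'pg_class'::regclass
   AND d.refclassid = 'pg_extension'::regclass
   AND d.deptype = 'e';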

mla avatar Mar 07 '16 04:03 mla