pdns icon indicating copy to clipboard operation
pdns copied to clipboard

Records duplication when changing them concurrently via web API

Open pracj3am opened this issue 7 years ago • 13 comments

  • Program: Authoritative
  • Issue type: Bug report

Short description

Concurrent web API requests PATCH /servers/{server_id}/zones/{zone_id}, results in records duplication. Both updated record and SOA record is duplicated.

Environment

  • Operating system: Linux Debian Stretch
  • Software version: 4.0.3-1+deb9u2
  • Software source: Debian repository

Steps to reproduce

  1. pdns authoritative server with web API enabled and gpgsql module enabled, otherwise the default pdns configuration.
  2. postgresql server runs on the same host machine
for i in {1..5}; do
  curl -X PATCH "http://127.1/api/v1/servers/localhost/zones/test.com" \
    -H "accept: application/json" -H "X-API-Key: secret" -H "Content-Type: application/json" \
    -d "{ \"rrsets\": [ { \"name\": \"a.test.com.\", \"type\": \"CNAME\", \"ttl\": 3600, \"changetype\": \"REPLACE\", \"records\": [ { \"content\": \"test.com.\", \"disabled\": false, \"set-ptr\": false} ] } ] }" &
done

Expected behaviour

Expected results of postresql queries

pdns=# select type, content from records where domain_id = 1 and name = 'a.test.com' and type='CNAME';
 type  |  content   
-------+-----------
 CNAME | test.com
(1 rows)
pdns=# select type, content from records where domain_id = 1 and type='SOA';
type |                          content                          
-----+-----------------------------------------------------------
 SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
(1 rows)

Actual behaviour

pdns=# select type, content from records where domain_id = 1 and name = 'a.test.com' and type='CNAME';
 type  |  content   
-------+-----------
 CNAME | test.com
 CNAME | test.com
 CNAME | test.com
 CNAME | test.com
(4 rows)
pdns=# select type, content from records where domain_id = 1 and type='SOA';
type |                          content                          
-----+-----------------------------------------------------------
 SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
 SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
 SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
 SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
(4 rows)

Other information

There should be a transaction in PacketHandler::performUpdate

BEGIN;
SELECT * FROM records WHERE type='CNAME' AND name='a.test.com' AND domain_id=1;
DELETE FROM records WHERE type='CNAME' AND name='a.test.com' AND domain_id=1;
INSERT ...;
COMMIT;

with repeatable read isolation level, but I found no transaction in PacketHandler::performUpdate or GSQLBackend::replaceRRSet.

pracj3am avatar Nov 01 '18 11:11 pracj3am

Thanks for this, do you have the possibility/chance to test this in 4.1?

pieterlexis avatar Nov 01 '18 12:11 pieterlexis

Yes, I am able to reproduce it also with version 4.1.4-1.

pracj3am avatar Nov 01 '18 12:11 pracj3am

I found no transaction in PacketHandler::performUpdate or GSQLBackend::replaceRRSet

In case anyone else is wondering, PacketHandler::performUpdate() is called from PacketHandler::processUpdate() after a transaction has been started and before it's committed or rollback-ed.

rgacogne avatar Nov 02 '18 09:11 rgacogne

The patchZone() function also starts a transaction before doing anything else, weird..

rgacogne avatar Nov 02 '18 09:11 rgacogne

For PostgeSQL you need to set repeatable read isolation level for this transaction, otherwise the delete queries are just ignored, if a concurrent transaction deletes the same rows.

pracj3am avatar Nov 02 '18 11:11 pracj3am

Note this important bit on repeatable read: "Applications using this level must be prepared to retry transactions due to serialization failures"

zeha avatar Feb 15 '19 20:02 zeha

4.2.0-rc3 has much more correct transaction handling. Can you try to reproduce your issue on it?

Habbie avatar Aug 13 '19 16:08 Habbie

Closing this in favour of #8374 which seems identical and has more recent activity.

Habbie avatar Oct 03 '19 09:10 Habbie

Sorry for the late reply. I am still able to reproduce it on 4.2.0-rc3 with postgres 11.

pracj3am avatar Oct 03 '19 12:10 pracj3am

So, we're clearly not using enough "locking" here, even with startTransaction etc. As said by @pracj3am the gpgsqlbackend could try with repeatable read; gmysqlbackend should maybe take a lock on the domain record (not sure about the exact semantics); gsqlite might be implicitly okay by sqlite's write rules?

zeha avatar Oct 03 '19 12:10 zeha

Wouldn't it be better to find a solution whick works with all backends? I.e. having the locking inside PDNS? Ie if there is a modification on a zone's RRs, create a shared lock based on zone+label+type, then delete/insert, then unlock. I think there are already plenty of locks in PDNS itself.

klaus3000 avatar Oct 03 '19 12:10 klaus3000

Wouldn't it be better to find a solution whick works with all backends? I.e. having the locking inside PDNS?

That doesn't help pdns_server vs. pdnsutil, or two pdns_server instances talking to the same database, etc.

Habbie avatar Oct 03 '19 12:10 Habbie

The following dockerfile reproduces the issue.

FROM debian:buster

RUN apt update && apt upgrade -y && apt install --no-install-recommends -y \
        ca-certificates \
        git \
        libpq-dev \
        build-essential \
        postgresql-11 \
        autoconf \
        automake \
        libtool \
        pkg-config \
        ragel \
        bison \
        flex \
        virtualenv \
        libboost-all-dev \
        libssl-dev \
        curl \
        jq \
        && apt clean && rm -rf /var/lib/apt/lists/*

RUN git clone -b rel/auth-4.2.x https://github.com/PowerDNS/pdns && \
    cd pdns && \
    autoreconf -vi && \
    ./configure --with-modules=gpgsql --without-lua --disable-lua-records && \
    make && make install

RUN \
    echo '\n\
api=yes\n\
api-key=1\n\
webserver=yes\n\
webserver-address=0.0.0.0\n\
webserver-port=80\n\
launch=gpgsql\n\
gpgsql-host=127.0.0.1\n\
gpgsql-dbname=dns\n\
gpgsql-user=pdns\n\
gpgsql-password=2\n' \
 >/usr/local/etc/pdns.conf 

RUN \
    echo 'curl -s -X PATCH "http://127.1/api/v1/servers/localhost/zones/example.com" -H "accept: application/json" -H "X-API-Key: 1" -H "Content-Type: application/json" -d "{ \"rrsets\": [ { \"name\": \"a.example.com.\", \"type\": \"CNAME\", \"ttl\": 3600, \"changetype\": \"REPLACE\", \"records\": [ { \"content\": \"example.com.\", \"disabled\": false, \"set-ptr\": false} ] } ] }"' >/patch && \
    chmod +x /patch && \
    echo 'curl -s "http://127.1/api/v1/servers/localhost/zones/example.com" -H "accept: application/json" -H "X-API-Key: 1" -H "Content-Type: application/json" | jq ".rrsets | map(select(.name==\"a.example.com.\"))[0].records"' >/test && \
    chmod +x /test && \
    echo 'curl -s "http://127.1/api/v1/servers/localhost/zones/example.com" -H "accept: application/json" -H "X-API-Key: 1" -H "Content-Type: application/json" | jq ".rrsets | map(select(.type==\"SOA\"))[0].records"' >/test2 && \
    chmod +x /test2

RUN \
    /etc/init.d/postgresql start && \
    su postgres -c 'psql <<<"create database dns"' && \
    su postgres -c "psql <<<\"create role pdns with login password '2'\"" && \
    su postgres -c "psql dns </pdns/modules/gpgsqlbackend/schema.pgsql.sql" && \
    su postgres -c "psql dns <<<\"grant all on all tables in schema public to pdns\"" && \
    su postgres -c "psql dns <<<\"grant all on all sequences in schema public to pdns\"" && \
    /usr/local/sbin/pdns_server --daemon && \
    curl -s -X POST "http://127.1/api/v1/servers/localhost/zones" -H "accept: application/json" -H "X-API-Key: 1" -H "Content-Type: application/json" -d '{ "name": "example.com.", "kind": "Native", "nameservers": ["ns.example.com."]}' && \
    /patch && \
    pkill pdns_server && \
    /etc/init.d/postgresql stop


ENTRYPOINT /etc/init.d/postgresql start && \
    /usr/local/sbin/pdns_server >/var/log/pdns.log 2>&1 & sleep 5 && \ 
    /bin/bash -c "for i in {1..5}; do for j in {1..10}; do /patch & done; sleep 1; done" && \
    /test && /test2

pracj3am avatar Oct 03 '19 16:10 pracj3am