Records duplication when changing them concurrently via web API
- Program: Authoritative
- Issue type: Bug report
Short description
Concurrent web API requests PATCH /servers/{server_id}/zones/{zone_id}, results in records duplication. Both updated record and SOA record is duplicated.
Environment
- Operating system: Linux Debian Stretch
- Software version: 4.0.3-1+deb9u2
- Software source: Debian repository
Steps to reproduce
- pdns authoritative server with web API enabled and gpgsql module enabled, otherwise the default pdns configuration.
- postgresql server runs on the same host machine
for i in {1..5}; do
curl -X PATCH "http://127.1/api/v1/servers/localhost/zones/test.com" \
-H "accept: application/json" -H "X-API-Key: secret" -H "Content-Type: application/json" \
-d "{ \"rrsets\": [ { \"name\": \"a.test.com.\", \"type\": \"CNAME\", \"ttl\": 3600, \"changetype\": \"REPLACE\", \"records\": [ { \"content\": \"test.com.\", \"disabled\": false, \"set-ptr\": false} ] } ] }" &
done
Expected behaviour
Expected results of postresql queries
pdns=# select type, content from records where domain_id = 1 and name = 'a.test.com' and type='CNAME';
type | content
-------+-----------
CNAME | test.com
(1 rows)
pdns=# select type, content from records where domain_id = 1 and type='SOA';
type | content
-----+-----------------------------------------------------------
SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
(1 rows)
Actual behaviour
pdns=# select type, content from records where domain_id = 1 and name = 'a.test.com' and type='CNAME';
type | content
-------+-----------
CNAME | test.com
CNAME | test.com
CNAME | test.com
CNAME | test.com
(4 rows)
pdns=# select type, content from records where domain_id = 1 and type='SOA';
type | content
-----+-----------------------------------------------------------
SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
(4 rows)
Other information
There should be a transaction in PacketHandler::performUpdate
BEGIN;
SELECT * FROM records WHERE type='CNAME' AND name='a.test.com' AND domain_id=1;
DELETE FROM records WHERE type='CNAME' AND name='a.test.com' AND domain_id=1;
INSERT ...;
COMMIT;
with repeatable read isolation level, but I found no transaction in PacketHandler::performUpdate or GSQLBackend::replaceRRSet.
Thanks for this, do you have the possibility/chance to test this in 4.1?
Yes, I am able to reproduce it also with version 4.1.4-1.
I found no transaction in PacketHandler::performUpdate or GSQLBackend::replaceRRSet
In case anyone else is wondering, PacketHandler::performUpdate() is called from PacketHandler::processUpdate() after a transaction has been started and before it's committed or rollback-ed.
The patchZone() function also starts a transaction before doing anything else, weird..
For PostgeSQL you need to set repeatable read isolation level for this transaction, otherwise the delete queries are just ignored, if a concurrent transaction deletes the same rows.
Note this important bit on repeatable read: "Applications using this level must be prepared to retry transactions due to serialization failures"
4.2.0-rc3 has much more correct transaction handling. Can you try to reproduce your issue on it?
Closing this in favour of #8374 which seems identical and has more recent activity.
Sorry for the late reply. I am still able to reproduce it on 4.2.0-rc3 with postgres 11.
So, we're clearly not using enough "locking" here, even with startTransaction etc.
As said by @pracj3am the gpgsqlbackend could try with repeatable read; gmysqlbackend should maybe take a lock on the domain record (not sure about the exact semantics); gsqlite might be implicitly okay by sqlite's write rules?
Wouldn't it be better to find a solution whick works with all backends? I.e. having the locking inside PDNS? Ie if there is a modification on a zone's RRs, create a shared lock based on zone+label+type, then delete/insert, then unlock. I think there are already plenty of locks in PDNS itself.
Wouldn't it be better to find a solution whick works with all backends? I.e. having the locking inside PDNS?
That doesn't help pdns_server vs. pdnsutil, or two pdns_server instances talking to the same database, etc.
The following dockerfile reproduces the issue.
FROM debian:buster
RUN apt update && apt upgrade -y && apt install --no-install-recommends -y \
ca-certificates \
git \
libpq-dev \
build-essential \
postgresql-11 \
autoconf \
automake \
libtool \
pkg-config \
ragel \
bison \
flex \
virtualenv \
libboost-all-dev \
libssl-dev \
curl \
jq \
&& apt clean && rm -rf /var/lib/apt/lists/*
RUN git clone -b rel/auth-4.2.x https://github.com/PowerDNS/pdns && \
cd pdns && \
autoreconf -vi && \
./configure --with-modules=gpgsql --without-lua --disable-lua-records && \
make && make install
RUN \
echo '\n\
api=yes\n\
api-key=1\n\
webserver=yes\n\
webserver-address=0.0.0.0\n\
webserver-port=80\n\
launch=gpgsql\n\
gpgsql-host=127.0.0.1\n\
gpgsql-dbname=dns\n\
gpgsql-user=pdns\n\
gpgsql-password=2\n' \
>/usr/local/etc/pdns.conf
RUN \
echo 'curl -s -X PATCH "http://127.1/api/v1/servers/localhost/zones/example.com" -H "accept: application/json" -H "X-API-Key: 1" -H "Content-Type: application/json" -d "{ \"rrsets\": [ { \"name\": \"a.example.com.\", \"type\": \"CNAME\", \"ttl\": 3600, \"changetype\": \"REPLACE\", \"records\": [ { \"content\": \"example.com.\", \"disabled\": false, \"set-ptr\": false} ] } ] }"' >/patch && \
chmod +x /patch && \
echo 'curl -s "http://127.1/api/v1/servers/localhost/zones/example.com" -H "accept: application/json" -H "X-API-Key: 1" -H "Content-Type: application/json" | jq ".rrsets | map(select(.name==\"a.example.com.\"))[0].records"' >/test && \
chmod +x /test && \
echo 'curl -s "http://127.1/api/v1/servers/localhost/zones/example.com" -H "accept: application/json" -H "X-API-Key: 1" -H "Content-Type: application/json" | jq ".rrsets | map(select(.type==\"SOA\"))[0].records"' >/test2 && \
chmod +x /test2
RUN \
/etc/init.d/postgresql start && \
su postgres -c 'psql <<<"create database dns"' && \
su postgres -c "psql <<<\"create role pdns with login password '2'\"" && \
su postgres -c "psql dns </pdns/modules/gpgsqlbackend/schema.pgsql.sql" && \
su postgres -c "psql dns <<<\"grant all on all tables in schema public to pdns\"" && \
su postgres -c "psql dns <<<\"grant all on all sequences in schema public to pdns\"" && \
/usr/local/sbin/pdns_server --daemon && \
curl -s -X POST "http://127.1/api/v1/servers/localhost/zones" -H "accept: application/json" -H "X-API-Key: 1" -H "Content-Type: application/json" -d '{ "name": "example.com.", "kind": "Native", "nameservers": ["ns.example.com."]}' && \
/patch && \
pkill pdns_server && \
/etc/init.d/postgresql stop
ENTRYPOINT /etc/init.d/postgresql start && \
/usr/local/sbin/pdns_server >/var/log/pdns.log 2>&1 & sleep 5 && \
/bin/bash -c "for i in {1..5}; do for j in {1..10}; do /patch & done; sleep 1; done" && \
/test && /test2