couchdb

Replication - url_parsing_failed/nx_domain on IPv6

Open • JakeHillion opened this issue 4 years ago • 3 comments

Description

I am attempting to schedule replication between two instances that only have IPv6 connectivity between them. The failures in the logs suggest that the URL parser used by the replicator cannot accept IPv6 addresses.

Steps to Reproduce

  • curl -X PUT 'http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a'
  • curl -X PUT 'http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a/a' -d '{}'
  • curl -X PUT 'http://[fd19:ca4f:1891:0:5d59:4c52:1d67:4f85]:5984/b'

Replicate Endpoint

  • curl -X POST -H 'Content-Type: application/json' 'http://[fd19:ca4f:1891:0:5d59:4c52:1d67:4f85]:5984/_replicate' -d '{"source":"http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a","target":"http://[fd19:ca4f:1891:0:5d59:4c52:1d67:4f85]:5984/b"}'
    • {"error":"nxdomain","reason":"could not resolve http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a/"}

Fauxton

This case is harder to reproduce with curl, in my opinion, so instead I've included an image of the configuration, followed by the output logs.

[Image: Fauxton replication configuration]

Logs:

  • [error] 2022-01-09T23:52:44.193423Z [email protected] <0.9118.0> -------- couch_replicator_httpc: auth plugin initialization failed "http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a/" {session_request_failed,"http://fd19:ca4f:1891:0:c77d:21c3:80b1:47d5/_session","admin",{url_parsing_failed,{error,invalid_uri}}}
  • [error] 2022-01-09T23:52:44.193636Z [email protected] <0.9118.0> -------- throw:{replication_auth_error,{session_request_failed,"http://fd19:ca4f:1891:0:c77d:21c3:80b1:47d5/_session","admin",{url_parsing_failed,{error,invalid_uri}}}}: Replication 0d172815abcfddb7ae70200408d7f6b8 failed to start "http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a/" -> "http://[fd19:ca4f:1891:0:5d59:4c52:1d67:4f85]:5984/b/" doc <<"shards/00000000-7fffffff/_replicator.1641770003">>:<<"52b362d49d9b3f9b57b03fed76000352">> stack:[{couch_replicator_httpc,setup,1,[{file,"src/couch_replicator_httpc.erl"},{line,59}]},{couch_replicator_api_wrap,db_open,3,[{file,"src/couch_replicator_api_wrap.erl"},{line,74}]}]
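As the log above shows, the Fauxton replication is backed by a document in the `_replicator` database, so the same setup can also be created from the command line. A minimal sketch, assuming the two test instances from this report and a hypothetical document id `a-to-b`. Credentials are omitted, as in the commands above, and must be added for a real server. By default the script only prints the JSON body; set `SUBMIT=1` to send it with curl:

```shell
#!/usr/bin/env bash
# Sketch: build (and optionally submit) a _replicator document that mirrors
# the Fauxton setup. The document id "a-to-b" is hypothetical.
set -euo pipefail

SOURCE='http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a'
TARGET='http://[fd19:ca4f:1891:0:5d59:4c52:1d67:4f85]:5984/b'
NODE='http://[fd19:ca4f:1891:0:5d59:4c52:1d67:4f85]:5984'

# Print the JSON body; IPv6 literals keep their [brackets] and port.
replicator_doc() {
  printf '{"source":"%s","target":"%s"}\n' "$1" "$2"
}

# Only contact the server when SUBMIT=1, so the script is safe to run as-is.
if [ "${SUBMIT:-0}" = 1 ]; then
  curl -X PUT -H 'Content-Type: application/json' \
    "$NODE/_replicator/a-to-b" -d "$(replicator_doc "$SOURCE" "$TARGET")"
else
  replicator_doc "$SOURCE" "$TARGET"
fi
```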

Expected Behaviour

  • Replication to occur as with IPv4.
  • More specifically, the URLs are (I believe) in the correct format for a directly specified IPv6 address, yet the _replicate endpoint seems to parse them as domain names, and the _replicator database treats them as invalid addresses.

Your Environment

CouchdbTest1 (fd19:ca4f:1891:0:c77d:21c3:80b1:47d5/64): {"couchdb":"Welcome","version":"3.2.1","git_sha":"244d428af","uuid":"7e9872dd6737bc29302be23caf5294c1","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}

CouchdbTest2 (fd19:ca4f:1891:0:5d59:4c52:1d67:4f85/64): {"couchdb":"Welcome","version":"3.2.1","git_sha":"244d428af","uuid":"b6a91ada70259942daf27bc5fbad8f4a","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}

  • CouchDB version used: 3.2.1 (both instances, per the responses above)
  • curl 7.74.0
  • Debian 11 Bullseye.

Additional Context

I found https://issues.apache.org/jira/browse/COUCHDB-665 when searching, which suggests that this bug may be a regression. I used commands similar to theirs, except that I created two separate virtual machines for testing rather than using a single host.

I haven't reproduced this in this bug report, but on some other systems I also tried allocating a DNS name with an AAAA record pointing to the same IPv6 addresses. This did not immediately solve the problem either, but I have left it out of this bug report as it might be a confounding variable.

JakeHillion, Jan 09 '22

This looks like a bug in URL parsing. Specifically, it looks like it's in the replicator's _session (cookie) auth plugin.
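The first error log in the report shows the symptom: the database URL `http://[fd19:...]:5984/a/` turned into the session URL `http://fd19:.../_session`, losing both the brackets and the `:5984` port, which makes it an invalid URI. As an illustration only (a shell sketch, not CouchDB's Erlang implementation), deriving the session URL while keeping the whole `[address]:port` authority looks like this:

```shell
#!/usr/bin/env bash
# Illustration only (not CouchDB's code): derive the _session URL from a
# database URL while keeping the authority ("[ipv6]:port") intact.
set -euo pipefail

session_url() {
  local url="$1"
  local scheme="${url%%://*}"    # "http"
  local rest="${url#*://}"       # "[fd19:...]:5984/a/"
  local authority="${rest%%/*}"  # "[fd19:...]:5984" (brackets and port kept)
  printf '%s://%s/_session\n' "$scheme" "$authority"
}

session_url 'http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a/'
```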

If you're willing to try an experiment, see if disabling the session auth plugin would make it behave differently. Try this config setting:

[replicator]
auth_plugins = couch_replicator_auth_noop

This bypasses the session plugin and switches to sending basic auth credentials with every request.
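The same setting can also be applied without editing files, through the node configuration API (`PUT /_node/_local/_config/{section}/{key}`, whose request body is a JSON string). A sketch, assuming admin credentials embedded in the hypothetical `COUCH_URL`; it only prints the request unless `DRY_RUN=0` is set:

```shell
#!/usr/bin/env bash
# Sketch: set [replicator] auth_plugins through the HTTP config API.
# COUCH_URL and its admin:password credentials are placeholders.
set -euo pipefail

COUCH_URL="${COUCH_URL:-http://admin:password@[fd19:ca4f:1891:0:5d59:4c52:1d67:4f85]:5984}"

config_put() {
  # $1 = section, $2 = key, $3 = value (sent as a JSON string)
  local url="$COUCH_URL/_node/_local/_config/$1/$2"
  local body
  body=$(printf '"%s"' "$3")
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf 'PUT %s %s\n' "$url" "$body"   # dry run: show the request only
  else
    curl -X PUT "$url" -d "$body"
  fi
}

config_put replicator auth_plugins couch_replicator_auth_noop
```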

nickva, Jan 11 '22

Happy to try extra tests now that I've got the machines set up.

My Environment

Added to each machine:

cat > /opt/couchdb/etc/local.d/20-replicator.ini <<'EOF'
[replicator]
auth_plugins = couch_replicator_auth_noop
EOF

Otherwise same as initial post.

Replicate Endpoint

  • curl -X POST -H 'Content-Type: application/json' 'http://[fd19:ca4f:1891:0:5d59:4c52:1d67:4f85]:5984/_replicate' -d '{"source":"http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a","target":"http://[fd19:ca4f:1891:0:5d59:4c52:1d67:4f85]:5984/b"}'
    • {"error":"nxdomain","reason":"could not resolve http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a/"}

Fauxton

Configured in the UI as above.

Logs:

  • [notice] 2022-01-11T11:58:34.347861Z [email protected] <0.3873.0> -------- Retrying GET request to http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a/ in 4.0 seconds due to error {conn_failed,{error,nxdomain}}
  • [error] 2022-01-11T11:58:34.348737Z [email protected] <0.3873.0> -------- Replicator, request GET to "http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a/" failed due to error {error,{conn_failed,{error,nxdomain}}}
  • [error] 2022-01-11T11:58:34.349091Z [email protected] <0.3873.0> -------- exit:{nxdomain,<<"could not resolve http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a/">>}: Replication 0d172815abcfddb7ae70200408d7f6b8 failed to start "http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a/" -> "http://[fd19:ca4f:1891:0:5d59:4c52:1d67:4f85]:5984/b/" doc <<"shards/00000000-7fffffff/_replicator.1641770003">>:<<"16c76a702836504eee392149ab000b38">> stack:[{couch_replicator_api_wrap,db_open,3,[{file,"src/couch_replicator_api_wrap.erl"},{line,120}]},{couch_replicator_scheduler_job,init_state,1,[{file,"src/couch_replicator_scheduler_job.erl"},{line,571}]}]

Summary

This seems to align the two methods of replication: before, they produced different errors; now they fail with the same nxdomain error, but replication still doesn't work in my testing.

JakeHillion, Jan 11 '22

@JakeHillion thank you for checking.

nxdomain is the error returned when resolving the host with DNS fails. In this case the host is already an IP address, just in IPv6 format, so no DNS lookup should be needed. So this looks like a bug.
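To make the distinction concrete: a host written in square brackets is an IPv6 literal and should never need DNS, whereas a name (like the fly.io hosts later in this thread) does need resolving. An illustrative shell sketch of that classification, not CouchDB's actual logic:

```shell
#!/usr/bin/env bash
# Illustration: a bracketed URL host is an IPv6 literal (no DNS needed);
# anything else is a name that has to be resolved.
set -euo pipefail

host_kind() {
  local rest="${1#*://}"
  local authority="${rest%%/*}"
  authority="${authority##*@}"        # drop any user:password@ prefix
  case "$authority" in
    \[*\]*) echo "ipv6-literal" ;;    # e.g. [fd19:...]:5984
    *)      echo "hostname" ;;
  esac
}

host_kind 'http://[fd19:ca4f:1891:0:c77d:21c3:80b1:47d5]:5984/a/'
host_kind 'http://5683d554a54dd8.vm.kitty-couchdb.internal:5984/kitty/'
```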

nickva, Jan 13 '22

@nickva Not sure that's actually the bug, as I'm getting the same error when using fly.io internal domains that have AAAA records. I can curl the same URL just fine, but CouchDB fails to replicate.

2023-10-10T15:37:03.329 app[5683d554a54dd8] iad [info] [error] 2023-10-10T15:37:03.329220Z nonode@nohost <0.2198.1> -------- Replicator, request GET to "http://5683d554a54dd8.vm.kitty-couchdb.internal:5984/kitty/" failed due to error {error,{conn_failed,{error,nxdomain}}}

2023-10-10T15:37:03.330 app[5683d554a54dd8] iad [info] [error] 2023-10-10T15:37:03.329678Z nonode@nohost <0.2198.1> -------- exit:{nxdomain,<<"could not resolve http://5683d554a54dd8.vm.kitty-couchdb.internal:5984/kitty/">>}: Replication c69137545c9630b0fc5f26941816e28b+continuous+create_target failed to start "http://5683d554a54dd8.vm.kitty-couchdb.internal:5984/kitty/" -> "http://e784936a2e2768.vm.kitty-couchdb.internal:5984/kitty/" doc <<"shards/80000000-ffffffff/_replicator.1696915972">>:<<"f0283eeea4248b169847e01f30001169">> stack:[{couch_replicator_api_wrap,db_open,3,[{file,"src/couch_replicator_api_wrap.erl"},{line,122}]},{couch_replicator_scheduler_job,init_state,1,[{file,"src/couch_replicator_scheduler_job.erl"},{line,634}]}]
root@5683d554a54dd8:/# curl http://5683d554a54dd8.vm.kitty-couchdb.internal:5984/_up
{"status":"ok","seeds":{}}

(5683d554a54dd8 is the machine with the local database, e784936a2e2768 is the machine being replicated to)

catgirlinspace, Oct 10 '23

Is it fixed by 882e7161acb692d3e721c7653d8cdc2e5e65d2ef?

rnewson, Oct 10 '23

Good find @rnewson

I think that patch hasn't been released yet. @catgirlinspace, would you be able to build Apache CouchDB main from source?

If you can, then you can try it out with this option:

[replicator]
ibrowse_options = [{prefer_ipv6, true}]
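One way to persist that option is a `local.d` override file, following the same pattern as the `20-replicator.ini` file earlier in this thread. A sketch, assuming the `/opt/couchdb/etc/local.d` path used there (override with `LOCAL_D`) and a hypothetical file name `30-ipv6.ini`. Per the later comments, the option only helps on a build that includes the fix (main at the time, later reported working with 3.3.3):

```shell
#!/usr/bin/env bash
# Sketch: write the ibrowse prefer_ipv6 option into a local.d override.
# The file name 30-ipv6.ini is hypothetical.
set -euo pipefail

LOCAL_D="${LOCAL_D:-/opt/couchdb/etc/local.d}"

write_ipv6_override() {
  # Heredoc body and terminator stay unindented so the file has no leading spaces.
  cat > "$LOCAL_D/30-ipv6.ini" <<'EOF'
[replicator]
ibrowse_options = [{prefer_ipv6, true}]
EOF
}
```

Call `write_ipv6_override` (as a user that can write to `local.d`), then restart CouchDB so the file is read.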

nickva, Oct 10 '23

Also, it's implied that the nodes are IPv6-only, so if this is a cluster I don't think they can connect without a separate change in vm.args to switch the distribution protocol module to `inet6_tcp` (https://www.erlang.org/doc/man/erl#proto_dist).
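A sketch of that vm.args change, assuming vm.args sits next to the `local.d` directory used earlier in this thread (`/opt/couchdb/etc/vm.args`; override with `VM_ARGS`). `-proto_dist inet6_tcp` is the erl flag for Erlang distribution over TCP/IPv6; each node would need it and a restart:

```shell
#!/usr/bin/env bash
# Sketch: switch Erlang distribution to IPv6 by adding "-proto_dist inet6_tcp"
# to vm.args. The default path is an assumption based on this thread.
set -euo pipefail

VM_ARGS="${VM_ARGS:-/opt/couchdb/etc/vm.args}"

enable_inet6_dist() {
  # Append the flag only once, so re-running is harmless.
  if ! grep -qxF -- '-proto_dist inet6_tcp' "$VM_ARGS"; then
    echo '-proto_dist inet6_tcp' >> "$VM_ARGS"
  fi
}
```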

rnewson, Oct 10 '23

Not at home to try that at the moment, but I'll try a build of main tonight!

Unrelated, but this feels like an OK opportunity to ask since it was mentioned: is there much of a difference between replication and clustering? Can I write to any node and it replicates to all the others with both methods or does that only work with one of them?

catgirlinspace, Oct 10 '23

Clustering is a way to connect multiple Apache CouchDB nodes using Erlang's native clustering mechanism over TCP connections. At the HTTP API level it looks like a single Apache CouchDB instance. With clustering you get 1) better performance, by having nodes process requests in parallel, and 2) reliability, by having 2 extra document copies on separate nodes: if one or two nodes go down, you can still access your data. However, there is more setup involved, including some firewall rules. See https://docs.couchdb.org/en/stable/cluster/index.html for more info. Clustering works best on reliable, low-latency network connections.

Replication is a way to replicate data between any Apache CouchDB compatible endpoints. It's a peer-to-peer setup, which is kind of a unique feature amongst databases. All the data is replicated via the HTTP API interface. You can run replication jobs on any Apache CouchDB instance and it can replicate between any two other CouchDB instances. Replication can be bi-directional (A->B, B->A), or you can set up any other topology you like: a star, with a central instance, or a circle, etc. Replication is suitable even for high latency or unreliable networks, with intermittent connectivity. There are checkpoints and retries, which are all configurable.

Can I write to any node and it replicates to all the others with both methods or does that only work with one of them?

With replication, you can achieve that with bi-directional replication. If you have CouchDB instances A and B, you can replicate from A to B and also from B to A. Those two replication jobs can then run on A, B, or even a third cluster C! You can read more about it in https://docs.couchdb.org/en/stable/replication/index.html. But you'd also have to be aware of conflicts, since the same document could be updated concurrently on A and B. You can read more about that here: https://docs.couchdb.org/en/stable/replication/conflicts.html
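A sketch of the two documents behind an A <-> B setup, assuming hypothetical hosts `a.example` and `b.example` and a database named `mydb`. `"continuous": true` keeps each job running so that new changes keep flowing in both directions:

```shell
#!/usr/bin/env bash
# Sketch: the two replication documents for bi-directional A <-> B sync.
# Host names and the database name are hypothetical placeholders.
set -euo pipefail

A='http://a.example:5984/mydb'
B='http://b.example:5984/mydb'

# One continuous replication document per direction.
replication_doc() {
  printf '{"source":"%s","target":"%s","continuous":true}\n' "$1" "$2"
}

replication_doc "$A" "$B"   # A -> B
replication_doc "$B" "$A"   # B -> A
# Each body would be PUT into /_replicator/<doc-id> on A, B, or a third node C.
```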

nickva, Oct 10 '23

Ah, that makes a lot more sense, thank you!! It would be good to put that somewhere in the docs. I didn't see an explanation like this anywhere when I was looking.

For having A, B, and C nodes, how would I set up replication? Would I want like a triangle, where each vertex is a node and each side has bi-directional replication?

catgirlinspace, Oct 11 '23

@nickva good news-- building CouchDB from main and using the config option you provided fixed it!!

catgirlinspace, Oct 11 '23

good news-- building CouchDB from main and using the config option you provided fixed it!!

That's great to hear. Credit also goes to @rnewson for both fixing the IPv6 setting and remembering that it was the correct answer for this issue!

For having A, B, and C nodes, how would I set up replication? Would I want like a triangle, where each vertex is a node and each side has bi-directional replication?

A triangle would work with 3 bi-directional links (a total of 6 links):

  • On A run: A -> B, A -> C
  • On B run: B -> A, B -> C
  • On C run: C -> A, C -> B
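The same six links can be enumerated mechanically, which helps when adding a fourth node later. A small sketch with placeholder node names, where each link runs on its source node as in the list above:

```shell
#!/usr/bin/env bash
# Sketch: enumerate the directed links of a full-mesh (triangle) topology.
# Node names A, B, C are placeholders.
set -euo pipefail

NODES=(A B C)

triangle_links() {
  local src dst
  for src in "${NODES[@]}"; do
    for dst in "${NODES[@]}"; do
      [ "$src" = "$dst" ] && continue   # a node never replicates to itself
      echo "on $src: $src -> $dst"
    done
  done
}

triangle_links
```

With n nodes this yields n × (n − 1) links, so three nodes give the six listed above.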

nickva, Oct 11 '23

Ah, I'll try that, thanks! Might also make a PR to the docs on replication to add this stuff (unless you want to?). Feels like this is information that should be there but I couldn't find it.

catgirlinspace, Oct 11 '23

Might also make a PR to the docs on replication to add this stuff

That would be great, please do. Contributions are always welcome!

nickva, Oct 11 '23

I also faced the same issue while migrating to the IPv6 cluster. When can we expect a release with this fix? Thanks, everyone🙏

ajithcnambiar, Oct 23 '23

Replication works well in the IPv6 cluster with CouchDB 3.3.3. Thanks again!

ajithcnambiar, Dec 16 '23

Cheers, @ajithcnambiar!

I'll close the issue then.

nickva, Dec 16 '23