PESDLC-1113 Copy logs from agent and RP pods

Open savex opened this issue 1 year ago • 9 comments

Take advantage of the call to the logs property and simulate the DT log copy using a streaming approach, so that logs are never held in memory in full and memory consumption stays low.
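The streaming idea can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the helper name `stream_command_output`, its parameters, and the 64 KiB chunk size are all assumptions made for the example. It runs a command and writes its stdout to a file one chunk at a time, so only a single chunk is held in memory regardless of how large the log is.

```python
import subprocess


def stream_command_output(cmd, dest_path, chunk_size=64 * 1024):
    """Run cmd and copy its stdout to dest_path chunk by chunk.

    Hypothetical helper for illustration only. At most chunk_size bytes
    of output are held in memory at any time. Returns the number of
    bytes written.
    """
    written = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc, \
            open(dest_path, "wb") as out:
        while True:
            # read() returns b"" once the command closes its stdout
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    # Leaving the with-block waits for the process, so returncode is set
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return written
```

A caller would pass whatever command produces the pod log (for example a `kubectl logs` invocation, assumed here) and a destination path. Reading the whole output into one string first, by contrast, needs memory proportional to the log size.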

Backports Required

  • [ ] none - not a bug fix
  • [ ] none - this is a backport
  • [x] none - issue does not exist in previous branches
  • [ ] none - papercut/not impactful enough to backport
  • [ ] v23.3.x
  • [ ] v23.2.x

Release Notes

  • none

savex avatar Apr 10 '24 21:04 savex

EC2 test run:

ubuntu@ip-172-31-4-5:~/tests$  cd /home/ubuntu/tests ; /usr/bin/env /opt/.ducktape-venv/bin/python3 /home/ubuntu/.vscode-server/extensions/ms-python.debugpy-2024.5.11001012-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher 48537 -- -m ducktape --compress --cluster=ducktape.cluster.json.JsonCluster --cluster-file=cluster.json --globals=globals.json --max-parallel=1 --repeat=1 --test-runner-timeout=86400000 rptest/redpanda_cloud_tests/cloud_self_test.py::SelfRedpandaCloudTest.test_healthy 
[INFO:2024-04-22 22:40:13,692]: starting test run with session id 2024-04-22--006...
[INFO:2024-04-22 22:40:13,692]: running 1 tests...
[INFO:2024-04-22 22:40:13,693]: Triggering test 1 of 1...
[INFO:2024-04-22 22:40:14,642]: RunnerClient: Loading test {'directory': '/home/ubuntu/redpanda/tests/rptest/redpanda_cloud_tests', 'file_name': 'cloud_self_test.py', 'cls_name': 'SelfRedpandaCloudTest', 'method_name': 'test_healthy', 'injected_args': None}
[INFO:2024-04-22 22:40:14,646]: RunnerClient: rptest.redpanda_cloud_tests.cloud_self_test.SelfRedpandaCloudTest.test_healthy: on run 1/1
[INFO:2024-04-22 22:40:14,968]: RunnerClient: rptest.redpanda_cloud_tests.cloud_self_test.SelfRedpandaCloudTest.test_healthy: Setting up...
[WARNING - 2024-04-22 22:40:18,650 - redpanda_cloud - create - lineno:899]: will not create cluster; already have cluster_id cojcnl1ai2dm17j4nqng
WARN [TBOT]      CLI parameters are overriding onboarding config from  config/config.go:472
INFO [TBOT]      Created directory "/tmp/tbot-data" config/destination_directory.go:132
INFO [TBOT]      Anonymous telemetry is not enabled. Find out more about Machine ID's anonymous telemetry at https://goteleport.com/docs/machine-id/reference/telemetry/ tbot/anonymous_telemetry.go:83
INFO [TBOT]      Attempting to generate new identity from token tbot/renew.go:510
INFO [AUTH]      Attempting registration via proxy server. auth/register.go:283
INFO [AUTH]      Attempting to register Bot with IAM method using regional STS endpoint auth/register.go:629
INFO [AUTH]      Successfully registered Bot with IAM method using regional STS endpoint auth/register.go:662
INFO [AUTH]      Successfully registered via proxy server. auth/register.go:290
INFO [TBOT]      Successfully generated new bot identity, valid: after=2024-04-22T22:39:20Z, before=2024-04-23T04:40:19Z, duration=6h0m59s | kind=tls, renewable=false, disallow-reissue=false, roles=[bot-buildkite-robot], principals=[-teleport-internal-join], generation=0 tbot/tbot.go:446
INFO [TBOT]      Beginning renewal loop: ttl=6h0m0s interval=6h0m0s tbot/renew.go:726
INFO [TBOT]      Started watching for CA rotations tbot/ca_rotation.go:173
INFO [TBOT]      Attempting to generate new identity from token tbot/renew.go:510
INFO [AUTH]      Attempting registration via proxy server. auth/register.go:283
INFO [AUTH]      Attempting to register Bot with IAM method using regional STS endpoint auth/register.go:629
INFO [AUTH]      Successfully registered Bot with IAM method using regional STS endpoint auth/register.go:662
INFO [AUTH]      Successfully registered via proxy server. auth/register.go:290
INFO [TBOT]      Successfully renewed bot certificates, valid: after=2024-04-22T22:39:22Z, before=2024-04-23T04:40:21Z, duration=6h0m59s | kind=tls, renewable=false, disallow-reissue=false, roles=[bot-buildkite-robot], principals=[-teleport-internal-join], generation=0 tbot/renew.go:634
INFO [TBOT]      Successfully renewed impersonated certificates for directory /tmp/machine-id, valid: after=2024-04-22T22:39:23Z, before=2024-04-23T04:40:23Z, duration=6h1m0s | kind=tls, renewable=false, disallow-reissue=true, roles=[teleport-admin], principals=[root redpanda {{internal.logins}} -teleport-internal-join], generation=0 tbot/renew.go:697
INFO [TBOT]      Persisted certificates successfully. One-shot mode enabled so exiting. tbot/renew.go:780
WARN [TBOT]      Context canceled during backoff for CA rotation watcher. Aborting. tbot/ca_rotation.go:139
=== kubectl already installed
=== k alias already created
=== k9s already installed
=== AWS VM detected. Running AWS specific configurations
=== k8s context already configured
=== grpcurl already installed
=== unzip already installed
=== rpk already installed
=== stern already installed
[INFO:2024-04-22 22:40:39,041]: RunnerClient: rptest.redpanda_cloud_tests.cloud_self_test.SelfRedpandaCloudTest.test_healthy: Running...
[WARNING - 2024-04-22 22:42:55,050 - utils - _check_oversized_allocations - lineno:120]: Ignoring oversized allocation, 217088 is less than the max allowable allocation size of 409600 bytes
[WARNING - 2024-04-22 22:42:55,050 - utils - _check_oversized_allocations - lineno:120]: Ignoring oversized allocation, 217088 is less than the max allowable allocation size of 409600 bytes
[WARNING - 2024-04-22 22:43:19,950 - utils - _check_oversized_allocations - lineno:120]: Ignoring oversized allocation, 217088 is less than the max allowable allocation size of 409600 bytes
[WARNING - 2024-04-22 22:43:19,950 - utils - _check_oversized_allocations - lineno:120]: Ignoring oversized allocation, 217088 is less than the max allowable allocation size of 409600 bytes
[WARNING - 2024-04-22 22:43:43,433 - utils - _check_oversized_allocations - lineno:120]: Ignoring oversized allocation, 217088 is less than the max allowable allocation size of 409600 bytes
[WARNING - 2024-04-22 22:43:43,434 - utils - _check_oversized_allocations - lineno:120]: Ignoring oversized allocation, 217088 is less than the max allowable allocation size of 409600 bytes
[INFO:2024-04-22 22:43:43,435]: RunnerClient: rptest.redpanda_cloud_tests.cloud_self_test.SelfRedpandaCloudTest.test_healthy: Tearing down...
[INFO:2024-04-22 22:43:43,437]: RunnerClient: rptest.redpanda_cloud_tests.cloud_self_test.SelfRedpandaCloudTest.test_healthy: PASS
[INFO:2024-04-22 22:43:43,438]: RunnerClient: rptest.redpanda_cloud_tests.cloud_self_test.SelfRedpandaCloudTest.test_healthy: Data: None
test_id:    rptest.redpanda_cloud_tests.cloud_self_test.SelfRedpandaCloudTest.test_healthy
status:     PASS
run time:   3 minutes 28.791 seconds
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
========================================================================================================================================================================================================================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.8.18
session_id:       2024-04-22--006
run time:         3 minutes 30.361 seconds
tests run:        1
passed:           1
flaky:            0
failed:           0
ignored:          0
opassed:          0
ofailed:          0
opassedfips:      0
ofailedfips:      0
========================================================================================================================================================================================================================================================================================
ubuntu@ip-172-31-4-5:~/tests$

savex avatar Apr 22 '24 22:04 savex

/ci-repeat 1

savex avatar Apr 22 '24 22:04 savex

new failures in https://buildkite.com/redpanda/redpanda/builds/48123#018f085c-7474-4b3e-b2ac-9d1bd0d8dae4:

"rptest.tests.full_disk_test.FullDiskReclaimTest.test_full_disk_triggers_gc"

new failures in https://buildkite.com/redpanda/redpanda/builds/48230#018f119a-6ef6-4e46-8a17-b7cbef6e856e:

"rptest.tests.rbac_upgrade_test.UpgradeMigrationCreatingDefaultRole.test_rbac_migration"

vbotbuildovich avatar Apr 23 '24 01:04 vbotbuildovich

/ci-repeat 1

savex avatar Apr 24 '24 18:04 savex

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48230#018f11b9-e027-4032-843c-412a6f75cd07

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48709#018f40f3-a1d2-4340-9aaa-98448686dddb

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48709#018f40e3-aea2-40b8-8a37-2b638952338f

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48856#018f5aa3-ebf1-4884-951e-bd3e87bc614b

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48856#018f5df0-c8dd-447e-906a-0e826719a6a4

vbotbuildovich avatar Apr 24 '24 21:04 vbotbuildovich

/ci-repeat 1

savex avatar Apr 25 '24 22:04 savex

The linter errors come from a separate module and are not related to this change:

Run shfmt -i 2 -ci -s -d .
--- tests/docker/ducktape-deps/tinygo-wasi-transforms.orig
+++ tests/docker/ducktape-deps/tinygo-wasi-transforms
@@ -10,7 +10,7 @@
   local count=0
   until "$@"; do
     exit=$?
-    count=$(($count + 1))
+    count=$((count + 1))
     if [ $count -lt $retries ]; then
       echo "Retry $count/$retries exited $exit, retrying..."
     else
Error: Process completed with exit code 1.

savex avatar Apr 25 '24 23:04 savex

/ci-repeat 1

savex avatar Apr 25 '24 23:04 savex

Rebased onto the latest changes from @travisdowns for capturing errors from cmd output

savex avatar May 03 '24 20:05 savex