tidb icon indicating copy to clipboard operation
tidb copied to clipboard

config, sysvar: move configs tmp-storage-* to instance sysvars

Open CbcWestwolf opened this issue 3 years ago • 6 comments

What problem does this PR solve?

Issue Number: ref #34960

Problem Summary:

Before, if we want to change the temporary storage path/quota in a node/instance, we have to change the configuration and restart the node.

What is changed and how it works?

In this PR we introduce two instance scope variables tmpdir and tidb_tmp_storage_quota, and the corresponding config items in [instance] section.

Check List

Tests

  • [x] Unit test (existing UT)
  • [ ] Integration test
  • [x] Manual test (add detailed scripts or steps below)
  • [ ] No code

Manual Test

Data preparation

Setup t1 and t2:

#!/bin/bash
mysql -u root -h 127.0.0.1 -D test -P 4000 -e "DROP TABLE IF EXISTS t1, t2"
mysql -u root -h 127.0.0.1 -D test -P 4000 -e "CREATE TABLE t1 (c INT); CREATE TABLE t2 (c VARCHAR(100));"
for ((i=1; i<=9999; i++)); do
mysql -u root -h 127.0.0.1 -D test -P 4000 -e "INSERT INTO t1 VALUES ($i); INSERT INTO t2 VALUES ($i);"
done
Test

There are several functions using tmpdir:

  1. util/disk/tempDir.go: InitializeTempDir() would be called each time tmpdir is modified, and create a _dir.lock file. We expect to see this file after we modify tmpdir.

    You can just switch different tmpdir like TestSwitchMultipleTmpDir, and check that both $tmpdir/record/ and $tmpdir/_dir.lock exist.

  2. executor/explain.go: getHeapProfile() would create a temp file named like $tmpdir/record/heapGC2006-01-02T15:04:05Z07:001. We can use the sql explain analyze select ... to trigger a long-time query and modify [instance].tmpdir in another session. We expect that the modification is successful and the origin heap file names are recorded in memoryDebugModeHandler.infoField and shown on logs.

    • connect to session 1 and execute:
    set tidb_memory_debug_mode_min_heap_inuse = 1024;
    set tidb_memory_debug_mode_alarm_ratio = 100;
    -- The following query costs about 3 seconds to execute.
    explain analyze select t1.c as c1, t2.c as c2 from t1 join t2 on t1.c = t2.c - t1.c order by c2 desc;
    
    • connect to session 2 and execute set global tmpdir = 'another_tmpdir';
    • waiting the explain analyze in session 1 to finish. The log would show a line containing ["Memory Debug Mode"] [sql=finished]. In this line you could see heap profile following the tmpdir. In most cases it has been set to another_tmpdir, but it is still reasonable if it keeps the origin value.
  3. util/misc.go: LoadTLSCertificates() creates a key file and certificate in tmpdir. We can set auto-tls to true in config.toml and use the sql alter instance reload tls to call LoadTLSCertificates(). By checking the log, we expect that the paths of cert and key file keep the same as the original.

    • create a config.toml as below and start the tidb:
    [security]
    auto-tls = true
    
    • connect to tidb and check the ssl key and cert by select @@ssl_key, @@ssl_cert;
    • execute alter instance reload tls;, and the log ["execute reload tls"] shows the ssl_key and ssl_path
    • select @@ssl_key, @@ssl_cert; and we expect to get the origin result
  4. util/chunk/disk.go: initDiskFile() uses disk.TmpDirMutex to prevent possible conflicts since it has several statements reading tmpdir. We can perform a large query with sort or join operator under limited memory to call this method. Since the block of mutex, the modification of tmpdir would wait for the query finished.

    • do the query in session 1. Two temporary files chunk.ListInDisk-xxxxx and chunk.ListInDiskOffset-xxxxx exists on tmpdir:
    set @@tidb_mem_quota_query = 100 << 10; -- 100KB
    set global tidb_mem_oom_action = 'log';
    -- The following query costs hundreds of seconds to execute.
    select t1.c as c1, t2.c as c2 from t1 join t2 on t1.c = t2.c - t1.c order by c2 desc;
    
    • when the query in session 1 is going on, run set global tmpdir = "another_tmpdir"; in session 2
    • when the query finished in session 1, the origin 2 temp chunk files are deleted, and select @@global.tmpdir; returns another_tmpdir in session 1.
  5. type memoryUsageAlarm struct binds to each domain, and owns a tmpDir field for dumping information about OOM. We can modify tidb_memory_usage_alarm_ratio to trigger the alarm dumping.

    • create a config.toml as below and start the tidb:
    [performance]
    # you can set some other suitable values to trigger the memory alarm
    server-memory-quota = 1048576 # 1MB
    memory-usage-alarm-ratio = 0.01
    
    • query select t1.c as c1, t2.c as c2 from t1 join t2 on t1.c = t2.c - t1.c order by c2 desc;
  6. ddl/ingest/env.go: genLightningDataDir() is called each time a ddl is starting. We expect to see a directory named $tmpdir/tmp_ddl-<port> after we modify the tmpdir and do another ddl.

Side effects

  • [ ] Performance regression: Consumes more CPU
  • [ ] Performance regression: Consumes more Memory
  • [ ] Breaking backward compatibility

Documentation

  • [ ] Affects user behaviors
  • [ ] Contains syntax changes
  • [x] Contains variable changes
  • [ ] Contains experimental features
  • [ ] Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Move configs `tmp-storage-path` and `tmp-storage-quota` to `[instance]` section, and introduce corresponding instance system variables `tmpdir` and `tidb_tmp_storage_quota`.

CbcWestwolf avatar Aug 03 '22 09:08 CbcWestwolf

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • morgo
  • xhebox

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment. After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review. Reviewer can cancel approval by submitting a request changes review.

ti-chi-bot avatar Aug 03 '22 09:08 ti-chi-bot

Code Coverage Details: https://codecov.io/github/pingcap/tidb/commit/3984f5887b9e6e83c0f0d97b0ff385034b44645a

sre-bot avatar Aug 03 '22 10:08 sre-bot

/cc wshwsh12

CbcWestwolf avatar Aug 09 '22 13:08 CbcWestwolf

/run-mysql-test

CbcWestwolf avatar Aug 10 '22 03:08 CbcWestwolf

/cc xhebox

CbcWestwolf avatar Aug 10 '22 10:08 CbcWestwolf

Any manual test instruction?

A manual test plan is proposed in the description. I'll give more concrete instructions later :-)

CbcWestwolf avatar Sep 21 '22 09:09 CbcWestwolf

@CbcWestwolf: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ti-chi-bot avatar Nov 05 '22 09:11 ti-chi-bot

@CbcWestwolf: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
tiprow_fast_test 9a6e68995a303cba85b4b9b2c0cdb790a257c95e link true /test fast_test_tiprow

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

tiprow[bot] avatar Feb 27 '24 13:02 tiprow[bot]

@CbcWestwolf: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
idc-jenkins-ci-tidb/mysql-test 9a6e68995a303cba85b4b9b2c0cdb790a257c95e link true /test mysql-test
idc-jenkins-ci-tidb/unit-test 9a6e68995a303cba85b4b9b2c0cdb790a257c95e link true /test unit-test
idc-jenkins-ci-tidb/build 9a6e68995a303cba85b4b9b2c0cdb790a257c95e link true /test build
idc-jenkins-ci-tidb/check_dev_2 9a6e68995a303cba85b4b9b2c0cdb790a257c95e link true /test check-dev2
pull-mysql-client-test 9a6e68995a303cba85b4b9b2c0cdb790a257c95e link true /test pull-mysql-client-test
pull-integration-ddl-test 9a6e68995a303cba85b4b9b2c0cdb790a257c95e link true /test pull-integration-ddl-test
idc-jenkins-ci-tidb/check_dev 9a6e68995a303cba85b4b9b2c0cdb790a257c95e link true /test check-dev
pull-br-integration-test 9a6e68995a303cba85b4b9b2c0cdb790a257c95e link true /test pull-br-integration-test
pull-lightning-integration-test 9a6e68995a303cba85b4b9b2c0cdb790a257c95e link true /test pull-lightning-integration-test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

ti-chi-bot[bot] avatar Apr 10 '24 10:04 ti-chi-bot[bot]