[Doc][guide/installation/pseudo-cluster.md] Documentation improvement

Open cncws opened this issue 2 years ago • 0 comments

Search before asking

[X] I had searched in the issues and found no similar feature requirement.

Description

TL;DR

guide/installation/pseudo-cluster.md

Replace this:

# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-postgresql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:postgresql://127.0.0.1:5432/dolphinscheduler"
export SPRING_DATASOURCE_USERNAME={user}
export SPRING_DATASOURCE_PASSWORD={password}

with this:

# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-postgresql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:postgresql://127.0.0.1:5432/dolphinscheduler?stringtype=unspecified"
export SPRING_DATASOURCE_USERNAME={user}
# An empty password is not recommended; some tasks will fail
export SPRING_DATASOURCE_PASSWORD={password}

Why

I deployed DS on a CentOS virtual machine and ran into problems caused by the database-related configuration. Here are my settings:

export SPRING_DATASOURCE_URL="jdbc:postgresql://localhost:5432/dolphinscheduler"
export SPRING_DATASOURCE_USERNAME="postgres"
export SPRING_DATASOURCE_PASSWORD=""

The password is empty since I only use the database locally. However, this causes a problem: DS cannot run Data Quality tasks.

When a Data Quality task runs, the master server prints a log line like "Task <task_name> is submitted to priority queue error". Reading the source code, I found that DS connects to the database configured above to look up the data source used by the task, and that this step fails with an NPE. The relevant code is in dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/runner/task/BaseTaskProcessor.java:

public DataSource getDefaultDataSource() {
    DataSource dataSource = new DataSource();

    HikariDataSource hikariDataSource = (HikariDataSource) defaultDataSource;
    dataSource.setUserName(hikariDataSource.getUsername());
    JdbcInfo jdbcInfo = JdbcUrlParser.getJdbcInfo(hikariDataSource.getJdbcUrl());
    if (jdbcInfo != null) {
        Properties properties = new Properties();
        properties.setProperty(USER, hikariDataSource.getUsername());
        properties.setProperty(PASSWORD, hikariDataSource.getPassword());    // this line throws the NPE
        properties.setProperty(DATABASE, jdbcInfo.getDatabase());
        properties.setProperty(ADDRESS, jdbcInfo.getAddress());
        properties.setProperty(OTHER, jdbcInfo.getParams());
        properties.setProperty(JDBC_URL, jdbcInfo.getAddress() + SINGLE_SLASH + jdbcInfo.getDatabase());
        dataSource.setType(DbType.of(JdbcUrlParser.getDbType(jdbcInfo.getDriverName()).getCode()));
        dataSource.setConnectionParams(JSONUtils.toJsonString(properties));
    }

    return dataSource;
}

The solution is simple: use a database user that has a non-empty password. I would like to add a hint to the doc to help others avoid this problem. It would be even better to make Data Quality tasks work with an empty password.
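To illustrate the failure mode and one possible way to tolerate an empty password, here is a minimal standalone sketch. It relies on the documented behaviour that java.util.Properties.setProperty rejects a null value with a NullPointerException. It assumes (not verified against the DS or HikariCP source) that getPassword() returns null when the configured password is empty. The class EmptyPasswordSketch and the helper setNullSafe are hypothetical names for illustration and are not part of DolphinScheduler.

```java
import java.util.Properties;

// Standalone illustration; not DolphinScheduler code.
final class EmptyPasswordSketch {

    // Hypothetical helper: store an empty string instead of null,
    // so that Properties never sees a null value.
    static void setNullSafe(Properties props, String key, String value) {
        props.setProperty(key, value == null ? "" : value);
    }

    // Returns true if Properties.setProperty rejects the value with an NPE.
    static boolean throwsNpe(String value) {
        try {
            new Properties().setProperty("password", value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumption: a null password is what reaches setProperty in the issue.
        System.out.println("null value throws NPE: " + throwsNpe(null));
        System.out.println("empty string throws NPE: " + throwsNpe(""));

        Properties props = new Properties();
        setNullSafe(props, "password", null);
        System.out.println("stored password is empty: "
                + props.getProperty("password").isEmpty());
    }
}
```

If the assumption holds, a guard of this kind around the PASSWORD property would be one candidate fix on the DS side; the actual change would need to be checked against how the connection parameters are consumed later.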

Another suggestion: many tables use timestamp columns, but DS inserts string values such as "2023-12-12 12:00:00". PostgreSQL then rejects the statement with an error like this: column "data_time" is of type timestamp without time zone but expression is of type character varying. Appending ?stringtype=unspecified to SPRING_DATASOURCE_URL avoids this problem.
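With stringtype=unspecified, the PostgreSQL JDBC driver sends string parameters as untyped values, so the server can infer the timestamp type from the target column. As a small sketch of adding the parameter safely to an existing URL (one that may already carry other query parameters), the following hypothetical helper could be used; JdbcUrlSketch and withUnspecifiedStringType are illustrative names, not part of DolphinScheduler or the driver.

```java
// Standalone illustration; not DolphinScheduler code.
final class JdbcUrlSketch {

    // Hypothetical helper: append stringtype=unspecified unless the URL
    // already sets a stringtype parameter.
    static String withUnspecifiedStringType(String jdbcUrl) {
        if (jdbcUrl.contains("stringtype=")) {
            return jdbcUrl; // respect an explicit setting
        }
        // Use '&' when a query string already exists, '?' otherwise.
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "stringtype=unspecified";
    }

    public static void main(String[] args) {
        String url = "jdbc:postgresql://127.0.0.1:5432/dolphinscheduler";
        System.out.println(withUnspecifiedStringType(url));
    }
}
```

For the pseudo-cluster guide itself, writing the parameter directly into the exported SPRING_DATASOURCE_URL, as in the proposed replacement above, is enough.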

Documentation Links

https://dolphinscheduler.apache.org/zh-cn/docs/3.2.0/guide/installation/pseudo-cluster

Are you willing to submit a PR?

[X] Yes I am willing to submit a PR!

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Dec 12 '23 08:12 cncws