
HDDS-11405. Implement setrep command for Ozone FS.

Open sadanand48 opened this issue 1 year ago • 7 comments

What changes were proposed in this pull request?

Currently this command does not work, because Ozone does not support changing a key's replication on demand. However, the new atomic rewriteKey API makes it possible to rewrite a key with a new replication config, and setrep can be implemented on top of it. This applies only to RATIS keys, not to EC keys, since a replication factor has no meaning for EC keys.
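The idea can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `Bucket`, `Key`, `rewrite` and `set_replication` are hypothetical names invented for illustration, not the Ozone client API, and the generation check stands in for whatever atomicity guarantee rewriteKey actually provides.

```python
from dataclasses import dataclass


@dataclass
class Key:
    data: bytes
    replication_type: str  # "RATIS" or "EC"
    replication: str       # e.g. "ONE" or "THREE"
    generation: int        # changes on every successful write


class Bucket:
    """Toy in-memory bucket (hypothetical, not the Ozone client API)."""

    def __init__(self):
        self.keys = {}
        self._next_generation = 1

    def put(self, name, data, replication_type, replication):
        self.keys[name] = Key(data, replication_type, replication,
                              self._next_generation)
        self._next_generation += 1

    def rewrite(self, name, data, replication, expected_generation):
        # Atomic in spirit: refuse the rewrite if the key was modified
        # after the caller read it.
        current = self.keys[name]
        if current.generation != expected_generation:
            raise RuntimeError("key changed since it was read")
        self.keys[name] = Key(data, current.replication_type, replication,
                              self._next_generation)
        self._next_generation += 1


def set_replication(bucket, name, replication):
    """Return True if the key ends up with the requested replication."""
    key = bucket.keys[name]
    if key.replication_type != "RATIS":
        return False  # a replication factor has no meaning for EC keys
    if key.replication != replication:
        # The client reads the data and writes it back with the new config.
        bucket.rewrite(name, key.data, replication, key.generation)
    return True


bucket = Bucket()
bucket.put("key1", b"payload", "RATIS", "ONE")
bucket.put("key2", b"payload", "EC", "rs-3-2-1024k")
changed = set_replication(bucket, "key1", "THREE")
skipped = set_replication(bucket, "key2", "THREE")
```

Note that in this model the client carries the data through the rewrite, which is the cost discussed in the comments below.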

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-11405

How was this patch tested?

Unit tests added. Also tested the ozone fs shell

bash-4.4$ ozone sh key info  vol1/buck1/key1 | grep "replicationFactor"
 "replicationFactor" : "ONE",
bash-4.4$ ozone fs -ls ofs://om/vol1/buck1/key1
-rw-rw-rw-   1 hadoop hadoop       4068 2024-09-05 08:36 ofs://om/vol1/buck1/key1
bash-4.4$ ozone fs -setrep -w 3 ofs://om/vol1/buck1/key1
Replication 3 set: ofs://om/vol1/buck1/key1
Waiting for ofs://om/vol1/buck1/key1 ... done
bash-4.4$ ozone sh key info  vol1/buck1/key1   | grep "replicationFactor"
replicationFactor : THREE,
bash-4.4$ ozone fs -setrep -w 2 ofs://om/vol1/buck1/key1
setrep: Replication factor of 2 not supported
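The rejection of factor 2 in the transcript above is consistent with RATIS replication offering only the factors ONE and THREE. A minimal sketch of such a check follows; `to_ratis_factor` is a hypothetical helper, not code from the patch, and the error text simply mirrors the shell output shown above.

```python
# Numeric setrep argument -> RATIS factor name (only these two are assumed valid).
SUPPORTED_RATIS_FACTORS = {1: "ONE", 3: "THREE"}


def to_ratis_factor(rep):
    """Map a numeric replication argument to a RATIS factor name."""
    try:
        return SUPPORTED_RATIS_FACTORS[rep]
    except KeyError:
        # Any other value is rejected before a rewrite is attempted.
        raise ValueError(f"Replication factor of {rep} not supported") from None


factor = to_ratis_factor(3)
```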

setrep with and without -w

bash-4.4$ ozone fs -setrep  3  ofs://om/s3v/buck/key2
-setrep: Asynchronous set rep is not supported,Please use -w arg
Usage: ozone fs [generic options]
	[-appendToFile [-n] <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum [-v] <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-concat <target path> <src path> <src path> ...]
	[-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] [-q <thread pool queue size>] <localsrc> ... <dst>]
	[-copyToLocal [-f] [-p] [-crc] [-ignoreCrc] [-t <thread count>] [-q <thread pool queue size>] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] [-s] <path> ...]
	[-cp [-f] [-p | -p[topax]] [-d] [-t <thread count>] [-q <thread pool queue size>] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-v] [-x] <path> ...]
	[-expunge [-immediate] [-fs <path>]]
	[-find <path> ... <expression> ...]
	[-get [-f] [-p] [-crc] [-ignoreCrc] [-t <thread count>] [-q <thread pool queue size>] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
	[-head <file>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] [-d] [-t <thread count>] [-q <thread pool queue size>] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] [-s <sleep interval>] <file>]
	[-test -[defswrz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touch [-a] [-m] [-t TIMESTAMP (yyyyMMdd:HHmmss) ] [-c] <path> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]

Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]

Usage: ozone fs [generic options] -setrep [-R] [-w] <rep> <path> ...
bash-4.4$ ozone fs -setrep -w  3  ofs://om/s3v/buck/key2
Replication 3 set: ofs://om/s3v/buck/key2
Waiting for ofs://om/s3v/buck/key2 ... done

sadanand48 • Sep 05 '24 10:09

Does ozone fs -setrep work recursively on vol/bucket/dir at the moment? If not, do we have plans to support it later?

Recursive operation is handled in SetReplication (the implementation of -setrep), so I guess we get that for free. FileSystem.setReplication, which is implemented here for Ozone, only handles files.
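The split of responsibilities described here can be sketched as follows. The shell side walks directories and the file system side only ever sees files. The `tree` layout and the function names are hypothetical, chosen for illustration; this is not the Hadoop SetReplication code.

```python
def set_replication_recursive(tree, path, apply_to_file):
    """Walk `tree` from `path`, calling `apply_to_file` on files only.

    `tree` maps a path to a list of child paths (directory) or None (file).
    """
    children = tree[path]
    if children is None:
        apply_to_file(path)  # the per-file call, as FileSystem.setReplication would be
        return
    for child in children:
        set_replication_recursive(tree, child, apply_to_file)


tree = {
    "/vol1/buck1": ["/vol1/buck1/dir", "/vol1/buck1/key1"],
    "/vol1/buck1/dir": ["/vol1/buck1/dir/key2"],
    "/vol1/buck1/dir/key2": None,
    "/vol1/buck1/key1": None,
}
visited = []
set_replication_recursive(tree, "/vol1/buck1", visited.append)
```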

On the other hand, it looks like -setrep is async by default (-w makes it wait for replication to complete), which this implementation does not support. I guess that's part of what @jojochuang referred to here:

requires client to read from source and write to the destination. I think that wouldn't be expected for a user coming from Hadoop land

adoroszlai • Oct 18 '24 07:10

On the other hand, it looks like -setrep is async by default

Yes, it is async in Hadoop: the client only sends the setrep request to the NN, and the blocks are then replicated or deleted according to the new replication factor. Using rewrite is a hack here. If needed, we could do it in a separate thread to make it asynchronous, but the client would still do the work of replicating, not the server as in HDFS.
If this is not desired, we could probably close this and leave the current behaviour as is.
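To make the "separate thread" option concrete, here is a rough sketch. All names are hypothetical and the rewrite is simulated with a sleep. The call returns immediately unless the caller asks to wait, but the copying still runs inside the client process, unlike HDFS where the server does it.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def rewrite_with_replication(store, name, replication):
    """Stand-in for the client-side rewrite; the sleep simulates the copy."""
    time.sleep(0.01)
    store[name] = replication
    return name


def set_replication(executor, store, name, replication, wait):
    """Submit the rewrite to a worker thread; block only if `wait` is set."""
    future = executor.submit(rewrite_with_replication, store, name, replication)
    if wait:
        future.result()  # behaves like -w: return after the rewrite is done
    return future


store = {"key1": "ONE", "key2": "ONE"}
with ThreadPoolExecutor(max_workers=1) as executor:
    waited = set_replication(executor, store, "key1", "THREE", wait=True)
    pending = set_replication(executor, store, "key2", "THREE", wait=False)
    # Leaving the `with` block joins the worker, so `pending` completes here.
```

One consequence of this design is that the asynchronous rewrite is lost if the client process exits before the worker finishes.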

sadanand48 • Oct 18 '24 09:10

Using rewrite is a hack here. If needed, we could do it in a separate thread to make it asynchronous, but the client would still do the work of replicating, not the server as in HDFS. If this is not desired, we could probably close this and leave the current behaviour as is.

Alternatively, we could override the shell command from Hadoop, rejecting invocation without -w as "not implemented". This would make behavior consistent (since -w is forced), and let us add async implementation in the future. (Let me know if more details are needed.)
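A sketch of what such an override would check, using the `[-R] [-w] <rep> <path> ...` syntax from the usage text. `parse_setrep` and its error message are hypothetical and written for illustration; the actual override would live in the Ozone shell code.

```python
import argparse


def parse_setrep(argv):
    """Parse setrep arguments and reject invocations without -w."""
    parser = argparse.ArgumentParser(prog="setrep", add_help=False)
    parser.add_argument("-R", dest="recursive", action="store_true")
    parser.add_argument("-w", dest="wait", action="store_true")
    parser.add_argument("rep", type=int)
    parser.add_argument("paths", nargs="+")
    args = parser.parse_args(argv)
    if not args.wait:
        # Async mode is left unimplemented so it can be added later
        # without changing the meaning of existing invocations.
        raise NotImplementedError("asynchronous setrep is not implemented; use -w")
    return args


args = parse_setrep(["-w", "3", "ofs://om/s3v/buck/key2"])
```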

adoroszlai • Oct 24 '24 16:10

We need more robot tests. Please include every positive and negative test case for robot cli tests.

kerneltime • Oct 24 '24 17:10

we could override the shell command from Hadoop, rejecting invocation without -w as "not implemented". This would make behavior consistent (since -w is forced), and let us add async implementation in the future.

Thanks @adoroszlai for the comment, I have made this change.

However, this is at the command level, i.e. it only makes sense if the user is using the ozone fs shell (it also won't work for the hadoop fs shell, since the override is in Ozone). At the API level the behaviour is unchanged, i.e. non-async. Hadoop use cases generally involve the Hadoop FS APIs rather than the FS shell; if that is acceptable there, we could go ahead with this patch.

sadanand48 • Oct 25 '24 05:10

Hadoop use cases generally involve the Hadoop FS APIs rather than the FS shell

Thanks, I didn't know that.

adoroszlai • Oct 25 '24 05:10

For example, the mapreduce.robot file that this patch touches, when executed, calls this code, which in turn calls fs.setReplication().

sadanand48 • Oct 25 '24 06:10

Thanks again @sadanand48 for the patch. Given that:

  • the same functionality is available via ozone sh CLI, which even supports rewriting EC keys
  • usage via FileSystem API has different behavior compared to Hadoop's async implementation

I suggest abandoning this.

adoroszlai • Nov 19 '24 12:11

Sure, closing this.

sadanand48 • Nov 19 '24 14:11