support for -custom-uri on import
Please add support for designating a custom URI when importing compressed manuals, or enable a proper override of the default URI labeling behavior.
I have followed: https://docs.marklogic.com/guide/mlcp-guide/en/importing-content-into-marklogic-server/controlling-database-uris-during-ingestion/transforming-the-default-uri.html
Various configurations do not work. Perhaps I have the syntax wrong; it is not obvious from the documentation which flavor of regular expressions is used (Perl, PCRE?).
I need to import compressed manuals (all .zip for now, possibly gzip later) and make minor alterations to the URI, because it includes the .zip extension by default.
java.lang.IllegalArgumentException: Invalid option argument for output_uri_replace :Boeing 777 Test Manual.zip,TESTPATH43/Boeing_777_Test_Manual
My filename is Boeing 777 Test Manual.zip and it needs to become /USER_INPUT_ROOT/MANUAL_NAME/<files>.
I have a Python API that acts as a wrapper for MLCP, and it works without issue except for this behavior.
def import_data(database, root_path, files, marklogic_connection):
    for file in files:
        # convert spaces in file.filename to underscores
        file_name = file.filename.replace(" ", "_")
        # remove file extension from the end of string
        file_name = file_name.split(".")[0]
        # if root_path has a trailing slash, do nothing, else add a trailing slash
        root_path = root_path if root_path.endswith("/") else f"{root_path}/"
        # if root_path has starting slash, remove it
        root_path = root_path[1:] if root_path.startswith("/") else root_path
        file_uri = f"{root_path}{file_name}"
        print(f"Importing {file.filename} to {file_uri}")
        cmd = [
            MLCP,
            "import",
            f"-host {marklogic_connection['host']}",
            f"-port {marklogic_connection['port']}",
            f"-database {database}",
            f"-username {marklogic_connection['username']}",
            f"-password {marklogic_connection['password']}",
            "-input_compressed true",
            "-mode local",
            "-base_path /",
            "-input_compression_codec zip",
            "-ssl false",
            f"-input_file_path '/tmp/{file.filename}'",
            f"-output_uri_replace '{file.filename},{file_uri}'"
        ]
        invoke_mlcp(cmd)
        # remove the file from the /tmp directory
        subprocess.run(["rm", f"/tmp/{file.filename}"])
invoke_mlcp() simply invokes the provided bash script in a subprocess.
Thank you
A temporary solution for us, and maybe for other users, could be to use the Python client library to update our URIs after uploading.
My apologies, you clearly state the syntax here, but it is difficult to find: https://docs.marklogic.com/guide/mlcp-guide/en/introduction-to-marklogic-content-pump/understanding-the-mlcp-command-line/regular-expression-syntax.html
Will continue trying to get my desired functionality.
The solution to my issue was described here: https://stackoverflow.com/a/72952214
The output_uri_replace argument needs to be enclosed in double quotes.
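For anyone else wrapping MLCP from Python, here is a minimal sketch of how the option could be built. The helper name build_uri_replace_args is hypothetical. It assumes the command is passed to subprocess as a list with no shell involved, so the option name and its value are separate list elements and the list boundary does the job the outer double quotes do in a shell. It also assumes, from my reading of the MLCP docs, that the replacement string is wrapped in single quotes inside the value and that the pattern is a Java-style regular expression. Please verify both against your MLCP version.

```python
import re
from typing import List


def build_uri_replace_args(archive_name: str, target_uri: str) -> List[str]:
    """Return the -output_uri_replace option as two separate argv tokens.

    Hypothetical helper, not part of MLCP or any MarkLogic library.
    """
    # Escape regex metacharacters (such as the '.' in '.zip') so the
    # archive name is matched literally rather than as a pattern.
    pattern = re.escape(archive_name)
    # Assumption: MLCP expects the form  pattern,'replacement'  with the
    # replacement string wrapped in single quotes.
    value = f"{pattern},'{target_uri}'"
    # Keeping the option and its value as separate list elements means no
    # shell quoting is needed when the list is handed to subprocess.run().
    return ["-output_uri_replace", value]


if __name__ == "__main__":
    print(build_uri_replace_args(
        "Boeing 777 Test Manual.zip",
        "TESTPATH43/Boeing_777_Test_Manual",
    ))
```

If your wrapper joins the list into a single string and runs it through a shell instead, the whole value has to be wrapped in double quotes at that point, as described in the linked answer.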
My request is now to expand your documentation on MLCP regarding this functionality as I've wasted a day of development time on this effort.
Hi @starhound, Thanks for filing the issue, I will take a look and triage it for documentation.
@abika5 Please create a ticket in Jira and address it in the next sprint.
Thanks, Vanessa