
make load-data apparently not loading data to cluster

arminus opened this issue on Apr 16, 2021 · 0 comments

This is the output when running make load-data (I ran make before that, there is a 384.4 MB sansa-examples-spark.jar present in examples/jars, and my setup appears to be running fine):

```
make load-data
docker run -it --rm -v /home/www/bde/SANSA-Notebooks/sansa-notebooks/examples/data:/data --net spark-net -e "CORE_CONF_fs_defaultFS=hdfs://namenode:8020" bde2020/hadoop-namenode:1.1.0-hadoop2.8-java8 hdfs dfs -copyFromLocal /data /data
Configuring core
 - Setting fs.defaultFS=hdfs://namenode:8020
Configuring hdfs
 - Setting dfs.namenode.name.dir=file:///hadoop/dfs/name
Configuring yarn
Configuring httpfs
Configuring kms
Configuring for multihomed network
docker exec -it namenode hdfs dfs -ls /data
Found 1 items
drwxr-xr-x   - root supergroup          0 2021-04-16 13:45 /data/data
```

This was the second time I ran make load-data, so besides apparently not uploading any data, the second run also created a nested data dir inside /data (the /data/data entry above).
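If I understand the HDFS shell semantics correctly (my assumption), -copyFromLocal copies the source directory *into* the target when the target directory already exists, which would explain the nesting: the first run creates /data, the second copies the local /data into it as /data/data. A sketch of a variant that should be safe to re-run, using the same image and mount as the Makefile command above:

```
# Copy the *contents* of the local dir (not the dir itself) so that
# repeated runs overwrite files (-f) instead of nesting directories.
docker run -it --rm \
  -v /home/www/bde/SANSA-Notebooks/sansa-notebooks/examples/data:/data \
  --net spark-net \
  -e "CORE_CONF_fs_defaultFS=hdfs://namenode:8020" \
  bde2020/hadoop-namenode:1.1.0-hadoop2.8-java8 \
  bash -c 'hdfs dfs -mkdir -p /data && hdfs dfs -copyFromLocal -f /data/* /data'
```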

Navigating to http://localhost:8088/filebrowser/#/data I can see the nested data dir but nothing else.
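The same can be verified from the CLI with a recursive listing (reusing the namenode exec from the Makefile output above):

```
docker exec -it namenode hdfs dfs -ls -R /data
# only the empty nested /data/data directory shows up
```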

-> As a workaround, I un-jarred sansa-examples-spark.jar into examples/data so that the data gets picked up by make load-data, but that extraction step seems to be missing from one of the build targets.
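Roughly what I did, for reference (a jar is just a zip archive; whether all of the sample data actually lives at the top level of the jar is my assumption):

```
cd examples/data
# extract the jar contents in place so make load-data picks them up
jar xf ../jars/sansa-examples-spark.jar
```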

Related to that, the Zeppelin RDF notebook references the file hdfs://namenode:8020/data/rdf.nt, but that file is not present in sansa-examples-spark.jar, so I wonder if there is some other issue in play here?
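This is how I checked for the file (just listing the jar's table of contents):

```
jar tf examples/jars/sansa-examples-spark.jar | grep -i rdf.nt
# no output, i.e. rdf.nt is not packaged in the jar
```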

As a side note, copying the data now seems to run forever (on a reasonably fast Linux box).
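To see whether anything is actually arriving, I am polling the HDFS usage (a sketch; the 5-second interval is arbitrary):

```
watch -n 5 'docker exec namenode hdfs dfs -du -s -h /data'
```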

arminus · Apr 16 '21 14:04