spark-perf
Input Data File Location
Hello,
I am running the k-means algorithm on a Spark-on-YARN setup. Where is the input data file generated by spark-perf stored, or is the data kept in memory only?
Thanks
Hi, I have the same question. It seems the data is read from and written to the HDFS location specified in config.py, but I didn't see any files created in HDFS during the test. Is the input dataset created on the fly, or do we need to populate the datasets in HDFS before running the test? If it is the latter, does anyone know where the test datasets are? Thanks!