spark-perf
scale_factor setting question
I'm confused about the SCALE_FACTOR setting in config.py.template:
```python
# The default values configured below are appropriate for approximately 20 m1.xlarge nodes,
# in which each node has 15 GB of memory. Use this variable to scale the values (e.g.
# number of records in a generated dataset) if you are running the tests with more
# or fewer nodes. When developing new test suites, you might want to set this to a small
# value suitable for a single machine, such as 0.001.
SCALE_FACTOR = 1.0
```
If SCALE_FACTOR = 1.0 is meant for 20 m1.xlarge nodes (15 GB of memory each), why is 0.001 suggested for a single machine? And what value should I use for c3.xlarge (7.5 GB memory) nodes or c3.2xlarge (4 vCPU) nodes?
Thanks!
I'm also confused about this. Something like a formula based on CPU cores and RAM per node would make this easier to understand and use, I believe.
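For what it's worth, here is one possible heuristic, sketched under an assumption that is not stated anywhere in the spark-perf docs: since the defaults target 20 m1.xlarge nodes with 15 GB of memory each (300 GB total), you could scale linearly with total cluster memory. The function `suggested_scale_factor` below is hypothetical, not part of spark-perf.

```python
# Assumption (not documented by spark-perf): SCALE_FACTOR scales linearly
# with total cluster memory, relative to the baseline described in the
# config.py.template comment: 20 m1.xlarge nodes x 15 GB = 300 GB.

BASELINE_TOTAL_MEMORY_GB = 20 * 15  # 20 m1.xlarge nodes, 15 GB each

def suggested_scale_factor(num_nodes: int, memory_gb_per_node: float) -> float:
    """Scale test sizes linearly with total cluster memory (an assumption)."""
    return (num_nodes * memory_gb_per_node) / BASELINE_TOTAL_MEMORY_GB

# Examples:
print(suggested_scale_factor(20, 7.5))  # 20 c3.xlarge -> 150/300 = 0.5
print(suggested_scale_factor(1, 15))    # one 15 GB machine -> 15/300 = 0.05
```

Note that a single 15 GB machine comes out to 0.05 under this heuristic, while the template suggests 0.001 for development. That suggests the 0.001 value is chosen to make single-machine test runs fast while developing new suites, not to match the machine's capacity.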