Are the EMR-related constants working?
Expected Behavior
Expected behavior when the following environment variables are set:
FEAST_SPARK_STAGING_LOCATION=s3://...
FEAST_SPARK_LAUNCHER="emr"
FEAST_EMR_CLUSTER_ID=""
FEAST_EMR_LOG_LOCATION=""
- Successfully connect to the specified EMR cluster.
- Initialize a SparkSession within the EMR environment.
- Execute the defined Spark job on the cluster.
- Upon completion of the Spark job, terminate the SparkSession and release cluster resources.
- Store the results of the Spark job in the online store.
Current Behavior
No Spark job is submitted to the EMR cluster. The code exits silently without any indication of success or failure of the materialization.
Steps to reproduce
FEAST_SPARK_STAGING_LOCATION=s3://...
FEAST_SPARK_LAUNCHER="emr"
FEAST_EMR_CLUSTER_ID=""
FEAST_EMR_LOG_LOCATION=""
RepoConfig:
{
  "project": "",
  "registry": "s3://",
  "provider": "aws",
  "entity_key_serialization_version": 2,
  "online_store":
  {
    "type": "dynamodb",
    "region": "us-west-2"
  },
  "offline_store":
  {
    "type": "spark",
    "region": "us-west-2",
    "staging_location": "s3://",
    "spark_conf":
    {
      "spark.master": "",
      "spark.ui.enabled": "true",
      "spark.eventLog.enabled": "false",
      "spark.sql.catalogImplementation": "hive",
      "spark.sql.parser.quotedRegexColumnNames": "true",
      "spark.sql.session.timeZone": "UTC",
      "spark.jars.packages": "org.apache.hadoop:hadoop-aws:3.3.1"
    }
  },
  "batch_engine":
  {
    "type": "spark.engine"
  }
}
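One thing worth noting is that nothing in this RepoConfig mentions EMR, so the environment variables would be the only link to the cluster. The sketch below is only a diagnostic aid, not part of Feast: `find_mentions` is a hypothetical helper that walks a config dictionary and lists every key or string value containing a given substring. The config passed to it is a trimmed copy of the one above.

```python
import json
from typing import Any, List


def find_mentions(node: Any, needle: str, path: str = "") -> List[str]:
    """Return paths of keys or string values that contain `needle` (case-insensitive)."""
    hits: List[str] = []
    needle = needle.lower()
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path} > {key}" if path else str(key)
            if needle in str(key).lower():
                hits.append(child)
            hits.extend(find_mentions(value, needle, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_mentions(item, needle, f"{path}[{index}]"))
    elif isinstance(node, str) and needle in node.lower():
        hits.append(path)
    return hits


# Trimmed copy of the RepoConfig from this issue (values left as redacted).
repo_config = json.loads("""
{
  "provider": "aws",
  "offline_store": {"type": "spark", "spark_conf": {"spark.master": ""}},
  "batch_engine": {"type": "spark.engine"}
}
""")

print(find_mentions(repo_config, "emr"))    # → []
print(find_mentions(repo_config, "spark"))  # lists every Spark-related entry
```

If the first call returns an empty list for the full config as well, the configuration on its own gives the Spark engine no way to know about the EMR cluster.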
Specifications
- Version: feast = { extras = ["aws", "gcp", "spark"], version = "==0.42.0" }
- Platform: macOS
Possible Solution
The only code reference found for the constant FEAST_SPARK_LAUNCHER dates back four years and is located here: https://github.com/feast-dev/feast/blob/dc2c1dc67c8e191c29155ced3334244351f312c7/infra/terraform/aws/helm.tf#L93
Are the EMR-related constants below still working in the current Feast version (0.42.0)?
# EMR cluster to run Feast Spark Jobs in
EMR_CLUSTER_ID: Optional[str] = None
# Region of EMR cluster
EMR_REGION: Optional[str] = None
# Template path of EMR cluster
EMR_CLUSTER_TEMPLATE_PATH: Optional[str] = None
# Log path of EMR cluster
EMR_LOG_LOCATION: Optional[str] = None
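To make the question concrete, here is a minimal sketch of how these constants would relate to the environment variables used in this issue. It assumes each option is read from a variable named `FEAST_` plus the option name. That prefix is inferred from the variable names above and is not confirmed for Feast 0.42.0. `resolve_emr_options` is a hypothetical helper, not a Feast API.

```python
import os
from typing import Dict, Mapping, Optional

# Option names copied from the constants listed in this issue.
EMR_OPTIONS = (
    "EMR_CLUSTER_ID",
    "EMR_REGION",
    "EMR_CLUSTER_TEMPLATE_PATH",
    "EMR_LOG_LOCATION",
)

# Assumption: options are looked up as "FEAST_" + option name.
# Inferred from the variable names in this issue, not verified against Feast 0.42.0.
ENV_PREFIX = "FEAST_"


def resolve_emr_options(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Return each EMR option's value, treating unset or empty variables as None."""
    env = os.environ if env is None else env
    resolved: Dict[str, Optional[str]] = {}
    for name in EMR_OPTIONS:
        value = env.get(ENV_PREFIX + name, "")
        resolved[name] = value.strip() or None
    return resolved


if __name__ == "__main__":
    # Report which of the expected variables are visible to this process.
    for name, value in resolve_emr_options().items():
        print(f"{ENV_PREFIX}{name}: {'set' if value else 'NOT SET'}")
```

Running this in the same shell as the materialization only shows whether the variables are visible to the process. It does not show whether Feast 0.42.0 reads them, which is the open question here.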
References:
- https://feast-spark.readthedocs.io/en/stable/_modules/feast_spark/constants.html
- https://docs.feast.dev/v0.11-branch/feast-on-kubernetes/reference-1/feast-and-spark#option-3.-use-aws-and-emr