Are the EMR-related constants working?

Open • fcas opened this issue 1 year ago • 1 comment

Expected Behavior

With the following environment variables set:

FEAST_SPARK_STAGING_LOCATION=s3://...
FEAST_SPARK_LAUNCHER="emr"
FEAST_EMR_CLUSTER_ID=""
FEAST_EMR_LOG_LOCATION=""

materialization is expected to:

  1. Successfully connect to the specified EMR cluster.
  2. Initialize a SparkSession within the EMR environment.
  3. Execute the defined Spark job on the cluster.
  4. Upon completion of the Spark job, terminate the SparkSession and release cluster resources.
  5. Store the results of the Spark job in the online store.
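The preconditions above can be checked before materialization is even attempted. This is a minimal sketch using only the standard library; `check_emr_env` is a hypothetical helper, not part of Feast, and it only validates the shell environment. It cannot tell whether Feast 0.42.0 reads these variables at all, which is the question this issue raises. The values shown above for `FEAST_EMR_CLUSTER_ID` and `FEAST_EMR_LOG_LOCATION` are empty (presumably redacted); if they are also empty in the real run, the check flags them.

```python
import os
from typing import List, Mapping

# Variable names taken from this issue. Whether current Feast reads them is
# exactly what is in question; this check only inspects the environment.
REQUIRED_VARS = (
    "FEAST_SPARK_STAGING_LOCATION",
    "FEAST_SPARK_LAUNCHER",
    "FEAST_EMR_CLUSTER_ID",
    "FEAST_EMR_LOG_LOCATION",
)


def check_emr_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return human-readable problems with the EMR-related variables (empty list if none)."""
    problems: List[str] = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if value is None:
            problems.append(f"{name} is not set")
        elif not value.strip():
            problems.append(f"{name} is set but empty")

    # Only compare the launcher when it has a value; missing/empty is reported above.
    launcher = env.get("FEAST_SPARK_LAUNCHER", "")
    if launcher and launcher != "emr":
        problems.append(f"FEAST_SPARK_LAUNCHER is {launcher!r}, expected 'emr'")

    # The issue stages to S3, so anything else is likely a misconfiguration.
    staging = env.get("FEAST_SPARK_STAGING_LOCATION", "")
    if staging and not staging.startswith("s3://"):
        problems.append("FEAST_SPARK_STAGING_LOCATION should be an s3:// URI")
    return problems
```

Calling `check_emr_env()` with no arguments inspects the current process environment; passing a plain dict makes it easy to test a candidate configuration.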

Current Behavior

No Spark job is submitted to the EMR cluster. Materialization exits silently, with no indication of success or failure.
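Because the run produces no output, one low-cost diagnostic is to raise the log level before materializing. This sketch assumes Feast's modules log through standard `logging` loggers named after their module path (the usual `logging.getLogger(__name__)` convention), so records propagate to a parent logger named `feast`; the helper `enable_feast_debug_logging` is hypothetical and not part of Feast.

```python
import logging
import sys
from typing import TextIO


def enable_feast_debug_logging(stream: TextIO = sys.stderr) -> logging.Logger:
    """Send DEBUG-level records from the 'feast' logger hierarchy to `stream`.

    Assumes Feast modules use loggers such as 'feast.infra....', which
    propagate to the 'feast' logger configured here. Each call adds another
    handler, so call it once per process.
    """
    logger = logging.getLogger("feast")
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Calling it once at the top of the script, before `FeatureStore.materialize(...)` or `materialize_incremental(...)`, should show whatever Feast logs during the run. If nothing mentioning Spark or EMR appears, that would be consistent with the launcher variables being ignored, though it would not prove it.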

Steps to reproduce

Set the following environment variables:

FEAST_SPARK_STAGING_LOCATION=s3://...
FEAST_SPARK_LAUNCHER="emr"
FEAST_EMR_CLUSTER_ID=""
FEAST_EMR_LOG_LOCATION=""

Use the following RepoConfig:

{
    "project": "",
    "registry": "s3://",
    "provider": "aws",
    "entity_key_serialization_version": 2,
    "online_store":
    {
        "type": "dynamodb",
        "region": "us-west-2"
    },
    "offline_store":
    {
        "type": "spark",
        "region": "us-west-2",
        "staging_location": "s3://",
        "spark_conf":
        {
            "spark.master": "",
            "spark.ui.enabled": "true",
            "spark.eventLog.enabled": "false",
            "spark.sql.catalogImplementation": "hive",
            "spark.sql.parser.quotedRegexColumnNames": "true",
            "spark.sql.session.timeZone": "UTC",
            "spark.jars.packages": "org.apache.hadoop:hadoop-aws:3.3.1"
        }
    },
    "batch_engine":
    {
        "type": "spark.engine"
    }
}
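Several values in this config are empty strings or a bare `s3://` (presumably redacted for the report). Note also that no key in it refers to EMR: the cluster ID and log location are supplied only through the environment variables. Before re-running, it may help to confirm that none of the placeholders survived into the real file. The sketch below walks the parsed JSON and returns the key path of each such value; the helper names and the placeholder set are assumptions for illustration.

```python
import json
from typing import Any, List, Tuple

# Values treated as "not filled in"; an assumption about what a redaction looks like.
PLACEHOLDERS = {"", "s3://"}

Path = Tuple[str, ...]


def find_placeholders(node: Any, path: Path = ()) -> List[Path]:
    """Return key paths of string values that are empty or a bare 's3://' scheme."""
    found: List[Path] = []
    if isinstance(node, dict):
        for key, value in node.items():
            found.extend(find_placeholders(value, path + (key,)))
    elif isinstance(node, list):
        # Not needed for the config above, but keeps the walk general.
        for index, item in enumerate(node):
            found.extend(find_placeholders(item, path + (str(index),)))
    elif isinstance(node, str) and node.strip() in PLACEHOLDERS:
        found.append(path)
    return found


def check_config_text(text: str) -> List[Path]:
    """Parse a JSON RepoConfig and report placeholder values."""
    return find_placeholders(json.loads(text))
```

Paths are tuples rather than dotted strings because Spark option names such as `spark.master` already contain dots, so a joined string would be ambiguous.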

Specifications

  • Version: feast = { extras = ["aws", "gcp", "spark"], version = "==0.42.0" }
  • Platform: macOS

Possible Solution

The only reference to FEAST_SPARK_LAUNCHER I could find in the Feast repository is four years old, in the AWS Terraform configuration: https://github.com/feast-dev/feast/blob/dc2c1dc67c8e191c29155ced3334244351f312c7/infra/terraform/aws/helm.tf#L93

Are the EMR-related constants below, defined in the feast-spark package, still honored by the current Feast version (0.42.0)?

from typing import Optional

# EMR cluster to run Feast Spark Jobs in
EMR_CLUSTER_ID: Optional[str] = None

# Region of EMR cluster
EMR_REGION: Optional[str] = None

# Template path of EMR cluster
EMR_CLUSTER_TEMPLATE_PATH: Optional[str] = None

# Log path of EMR cluster
EMR_LOG_LOCATION: Optional[str] = None
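The naming suggests each legacy constant was paired with an environment variable carrying a `FEAST_` prefix (`EMR_CLUSTER_ID` read from `FEAST_EMR_CLUSTER_ID`). The sketch below makes that assumed mapping explicit, so you can see what a loader following this convention would pick up from the environment used in this issue. It is a hypothetical helper, not feast-spark's actual loader. One detail it surfaces: the issue sets `FEAST_EMR_CLUSTER_ID` and `FEAST_EMR_LOG_LOCATION` but not `FEAST_EMR_REGION`.

```python
import os
from typing import Dict, Mapping, Optional

# Constant names copied from the feast-spark snippet quoted above.
LEGACY_EMR_OPTIONS = (
    "EMR_CLUSTER_ID",
    "EMR_REGION",
    "EMR_CLUSTER_TEMPLATE_PATH",
    "EMR_LOG_LOCATION",
)


def resolve_legacy_options(
    env: Mapping[str, str] = os.environ, prefix: str = "FEAST_"
) -> Dict[str, Optional[str]]:
    """Map each legacy option to the value of its prefixed environment variable.

    Hypothetical: mirrors the FEAST_<NAME> naming seen in this issue. This
    sketch treats unset and empty variables alike, resolving both to None to
    match the `Optional[str] = None` defaults.
    """
    resolved: Dict[str, Optional[str]] = {}
    for name in LEGACY_EMR_OPTIONS:
        value = env.get(prefix + name)
        resolved[name] = value if value else None
    return resolved
```

With the variables exactly as written in this issue (both set to `""`), every option would resolve to `None` under this sketch, so even a loader that did honor the convention would have no cluster to submit to.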

References:

  • https://feast-spark.readthedocs.io/en/stable/_modules/feast_spark/constants.html
  • https://docs.feast.dev/v0.11-branch/feast-on-kubernetes/reference-1/feast-and-spark#option-3.-use-aws-and-emr

fcas, Jan 25 '25 01:01

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot], Jun 27 '25 01:06