spark-examples

[ARCHIVED] Moved to github.com/NVIDIA/spark-xgboost-examples

Please note that this repo is archived; its contents now live in the new repo, spark-xgboost-examples.

This repo provides docs and example applications that demonstrate the RAPIDS.ai GPU-accelerated XGBoost-Spark project.

Examples

  • Mortgage: Scala, Python
  • Taxi: Scala, Python
  • Agaricus: Scala, Python

Getting Started Guides

Try one of the Getting Started guides below. Please note that they target the Mortgage dataset as written, but with a few changes to EXAMPLE_CLASS, trainDataPath, and evalDataPath, they can be easily adapted to the Taxi or Agaricus datasets.
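As a rough illustration of that adaptation, the sketch below keeps the three values named above in one small config object and derives a Taxi variant from the Mortgage one. The class names, file paths, jar name, and the `-trainDataPath=` / `-evalDataPath=` argument form are all hypothetical placeholders chosen for this sketch; take the real values from the guide you are following.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ExampleConfig:
    """The three values the guides ask you to change per dataset."""
    example_class: str    # EXAMPLE_CLASS in the guides
    train_data_path: str  # trainDataPath
    eval_data_path: str   # evalDataPath


def submit_command(cfg: ExampleConfig, jar_path: str) -> List[str]:
    """Build a spark-submit argument list for one example.

    The application argument format below is an assumption made for
    illustration, not the repo's documented interface.
    """
    return [
        "spark-submit",
        "--class", cfg.example_class,
        jar_path,
        f"-trainDataPath={cfg.train_data_path}",
        f"-evalDataPath={cfg.eval_data_path}",
    ]


# Hypothetical Mortgage settings (placeholders, not the real class or paths).
mortgage = ExampleConfig(
    example_class="com.example.mortgage.Main",
    train_data_path="/data/mortgage/train.csv",
    eval_data_path="/data/mortgage/eval.csv",
)

# Adapting to Taxi means swapping the same three values.
taxi = replace(
    mortgage,
    example_class="com.example.taxi.Main",
    train_data_path="/data/taxi/train.csv",
    eval_data_path="/data/taxi/eval.csv",
)

if __name__ == "__main__":
    print(" ".join(submit_command(taxi, "sample_xgboost_apps.jar")))
```

Keeping the per-dataset values in one place makes it harder to switch the class to Taxi while still pointing at Mortgage data.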

A small dataset for each example is available in the datasets folder. These datasets are provided for convenience only. To test performance, please prepare a larger dataset by following Preparing Datasets. We also provide a larger dataset, the Mortgage Dataset (1 GB uncompressed), which is used in the guides below.

  • Building applications
    • Scala
    • Python
  • Getting started on on-prem clusters
    • Standalone cluster for Scala
    • Standalone cluster for Python
    • YARN for Scala
    • YARN for Python
    • Kubernetes
  • Getting started on cloud service providers
    • Amazon AWS
      • EMR
      • SageMaker
    • Databricks
    • Google Cloud Platform
  • Getting started for Jupyter Notebook applications
    • Apache Toree Notebook for Scala
    • Jupyter Notebook for Python

These examples use default parameters for demo purposes. For a full list of parameters, please see Supported XGBoost Parameters for Scala or Python.

XGBoost-Spark API

  • Scala API
  • Python API

Advanced Topics

  • Multi-GPU configuration
  • Performance tuning

Contact Us

Please see the RAPIDS website for contact information.

License

This content is licensed under the Apache License 2.0.