spark-examples
[ARCHIVED] Moved to github.com/NVIDIA/spark-xgboost-examples
Please note that this repo has been moved to the new repo spark-xgboost-examples.
This repo provides docs and example applications that demonstrate the RAPIDS.ai GPU-accelerated XGBoost-Spark project.
Examples
- Mortgage: Scala, Python
- Taxi: Scala, Python
- Agaricus: Scala, Python
Getting Started Guides
Try one of the Getting Started guides below. Please note that they target the Mortgage dataset as written, but with a few changes to EXAMPLE_CLASS, trainDataPath, and evalDataPath, they can be easily adapted to the Taxi or Agaricus datasets.
A small dataset for each example is available in the datasets folder. These datasets are provided for convenience only. To test performance, please prepare a larger dataset by following Preparing Datasets. We also provide a larger dataset, the Mortgage Dataset (1 GB uncompressed), which is used in the guides below.
- Building applications
- Scala
- Python
- Getting started on on-prem clusters
- Standalone cluster for Scala
- Standalone cluster for Python
- YARN for Scala
- YARN for Python
- Kubernetes
- Getting started on cloud service providers
- Amazon AWS
- EMR
- SageMaker
- Databricks
- Google Cloud Platform
- Getting started for Jupyter Notebook applications
- Apache Toree Notebook for Scala
- Jupyter Notebook for Python
These examples use default parameters for demonstration purposes. For a full list, please see Supported XGBoost Parameters for Scala or Python.
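As a rough illustration of the adaptation mentioned above (changing EXAMPLE_CLASS, trainDataPath, and evalDataPath to switch datasets), the sketch below keeps those three values in one table and builds a `spark-submit` argument list from them. The class names, paths, jar name, and the `-name=value` argument style are all placeholders assumed for illustration; take the real values from the guide you are following.

```python
from typing import Dict, List

# Hypothetical per-dataset settings. The class names and paths are
# placeholders, not values taken from this repo.
DATASETS: Dict[str, Dict[str, str]] = {
    "mortgage": {
        "EXAMPLE_CLASS": "<mortgage-example-main-class>",
        "trainDataPath": "/data/mortgage/train/",
        "evalDataPath": "/data/mortgage/eval/",
    },
    "taxi": {
        "EXAMPLE_CLASS": "<taxi-example-main-class>",
        "trainDataPath": "/data/taxi/train/",
        "evalDataPath": "/data/taxi/eval/",
    },
    "agaricus": {
        "EXAMPLE_CLASS": "<agaricus-example-main-class>",
        "trainDataPath": "/data/agaricus/train/",
        "evalDataPath": "/data/agaricus/eval/",
    },
}


def build_submit_args(dataset: str, jar: str) -> List[str]:
    """Return a spark-submit argument list for the chosen dataset.

    Only the three dataset-specific values change between datasets;
    the argument style used here is an assumption.
    """
    cfg = DATASETS[dataset]
    return [
        "spark-submit",
        "--class", cfg["EXAMPLE_CLASS"],
        jar,
        f"-trainDataPath={cfg['trainDataPath']}",
        f"-evalDataPath={cfg['evalDataPath']}",
    ]


if __name__ == "__main__":
    # Print the command for the Taxi dataset; the jar name is a placeholder.
    print(" ".join(build_submit_args("taxi", "<sample-apps>.jar")))
```

Keeping the dataset-specific values in a single table makes it easier to see that nothing else in the submission needs to change when moving from Mortgage to Taxi or Agaricus.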
XGBoost-Spark API
- Scala API
- Python API
Advanced Topics
- Multi-GPU configuration
- Performance tuning
Contact Us
Please see the RAPIDS website for contact information.
License
This content is licensed under the Apache License 2.0.