Nvidia RAPIDS Tutorial

Requirements: an NVIDIA GPU of the Pascal generation or newer (compute capability 6.0+)
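If you're not sure what your GPU supports, you can query its compute capability with numba (a minimal sketch; numba ships in the RAPIDS containers we build below, or can be pip-installed locally):

from numba import cuda

# Reports the compute capability of the active GPU, e.g. (6, 0) for Pascal.
major, minor = cuda.get_current_device().compute_capability
print("Compute capability: {}.{}".format(major, minor))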

RAPIDS is a suite of software libraries for doing data science on GPUs. It implements interfaces similar to those of pandas, scikit-learn, and others, so you can convert preprocessing and machine learning code to run orders of magnitude faster with only minor code changes. Why limit your GPUs to deep learning?
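For example, a groupby aggregation written for pandas runs unchanged on a cuDF DataFrame. A minimal sketch, assuming cudf is installed (e.g. inside the RAPIDS container we build below):

import pandas as pd
import cudf

pdf = pd.DataFrame({'key': ['a', 'b', 'a', 'b'], 'val': [1, 2, 3, 4]})
gdf = cudf.from_pandas(pdf)  # copy the data to the GPU

# The identical groupby-aggregate call works on both DataFrames.
print(pdf.groupby('key')['val'].sum())
print(gdf.groupby('key')['val'].sum())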

Test a baseline

Create a file named preprocessing_test.py:

import pandas as pd
import numpy as np
import time
import foundations

DF1_SIZE = int(2e5)
DF2_SIZE = int(1e4)
DF3_SIZE = int(1e6)

# Build a DataFrame of random categorical and small-integer columns.
def random_dataframe(num_rows):
    df = pd.DataFrame()

    print("Creating {}-row dataframe".format(num_rows))
    df['col_a'] = np.random.choice(['a', 'b', 'c', 'd'], num_rows)
    df['col_b'] = np.random.randint(0, 10, num_rows)
    df['col_c'] = np.random.randint(0, 5, num_rows)
    df['col_d'] = np.random.randint(0, 3, num_rows)

    return df

df1 = random_dataframe(DF1_SIZE)
df2 = random_dataframe(DF2_SIZE)
df3 = pd.DataFrame(np.random.random((DF3_SIZE, 3)))

print("df1 df2 merging")
start_time = time.time()
df1.merge(df2, on='col_b', how='inner')
foundations.log_metric("join time", "{:.4f}".format(time.time() - start_time))

print("df3 sorting")
start_time = time.time()
df3.sort_values(by=list(df3))
foundations.log_metric("sort time", "{:.4f}".format(time.time() - start_time))

Run our script

python preprocessing_test.py

Create a RAPIDS Docker image for Atlas

First, create a requirements.txt

wheel
requests
jsonschema
dill==0.2.8.2
redis==2.10.6
pandas==0.23.3
google-api-python-client==1.7.3
google-auth-httplib2==0.0.3
google-cloud-storage==1.10.0
PyYAML==5.1.2
pysftp==0.2.8
paramiko==2.4.1
mock==2.0.0
freezegun==0.3.8
boto3==1.9.86
boto==2.49.0
flask-restful==0.3.6
Flask==1.1.0
Werkzeug==0.15.4
Flask-Cors==3.0.6
mkdocs==1.0.4
promise==2.2.1
pyarmor==5.5.6
slackclient==1.3.0
scikit-learn==0.21.3
xgboost==0.90

Once we have that, create a new Dockerfile:

FROM rapidsai/rapidsai:cuda10.0-runtime-ubuntu18.04

COPY requirements.txt /tmp
RUN /opt/conda/envs/rapids/bin/pip install --no-cache-dir -r /tmp/requirements.txt \
        && rm /tmp/requirements.txt

ENTRYPOINT ["/opt/conda/envs/rapids/bin/python"]

Save this as a plain text file named Dockerfile. This image defines the environment in which our jobs will run. If you're new to Docker, see the official Docker documentation to learn more.

(If you need a different CUDA version, you can find an alternative parent image on the rapidsai Docker Hub page and replace the first line of the Dockerfile accordingly.)

Now we just run

$ docker build . --tag rapidsai-atlas:latest

(use sudo only if your Docker setup requires it)
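Before wiring the image into Atlas, you can sanity-check it by importing cuDF through the image's Python entrypoint. This assumes your Docker setup can expose GPUs via the --gpus flag (Docker 19.03+ with the NVIDIA container toolkit):

$ docker run --rm --gpus all rapidsai-atlas:latest -c "import cudf; print(cudf.__version__)"

Because the Dockerfile's ENTRYPOINT is the RAPIDS Python interpreter, the trailing arguments are passed straight to python.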

Create or modify a job.config.yaml

Edit your job.config.yaml file if you have one, or create a new one. Add (or modify if appropriate) the following lines:

num_gpus: 1

and

worker:
  image: rapidsai-atlas:latest
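Putting the two settings together, the relevant portion of the final job.config.yaml looks like this:

num_gpus: 1

worker:
  image: rapidsai-atlas:latest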

Modify our baseline code to use cuDF

Open preprocessing_test.py in an editor and make the following changes:

Under

import pandas as pd
import numpy as np
import time
import foundations

add

import cudf

foundations.log_param('cuDF version', cudf.__version__)

Under

df1 = random_dataframe(DF1_SIZE)
df2 = random_dataframe(DF2_SIZE)
df3 = pd.DataFrame(np.random.random((DF3_SIZE, 3)))

add

df1 = cudf.from_pandas(df1)
df2 = cudf.from_pandas(df2)
df3 = cudf.from_pandas(df3)

Done! All we had to do was convert our pandas DataFrames to cuDF DataFrames. The standard interfaces are mostly the same.
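For reference, here is the complete preprocessing_test.py after both changes:

import pandas as pd
import numpy as np
import time
import foundations

import cudf

foundations.log_param('cuDF version', cudf.__version__)

DF1_SIZE = int(2e5)
DF2_SIZE = int(1e4)
DF3_SIZE = int(1e6)

# Build a DataFrame of random categorical and small-integer columns.
def random_dataframe(num_rows):
    df = pd.DataFrame()

    print("Creating {}-row dataframe".format(num_rows))
    df['col_a'] = np.random.choice(['a', 'b', 'c', 'd'], num_rows)
    df['col_b'] = np.random.randint(0, 10, num_rows)
    df['col_c'] = np.random.randint(0, 5, num_rows)
    df['col_d'] = np.random.randint(0, 3, num_rows)

    return df

df1 = random_dataframe(DF1_SIZE)
df2 = random_dataframe(DF2_SIZE)
df3 = pd.DataFrame(np.random.random((DF3_SIZE, 3)))

# Move the data to the GPU; everything below now runs on cuDF.
df1 = cudf.from_pandas(df1)
df2 = cudf.from_pandas(df2)
df3 = cudf.from_pandas(df3)

# Time an inner join on the shared integer column and log it as an Atlas metric.
print("df1 df2 merging")
start_time = time.time()
df1.merge(df2, on='col_b', how='inner')
foundations.log_metric("join time", "{:.4f}".format(time.time() - start_time))

# Time a sort over all columns; list(df3) yields the column names.
print("df3 sorting")
start_time = time.time()
df3.sort_values(by=list(df3))
foundations.log_metric("sort time", "{:.4f}".format(time.time() - start_time))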

Now run the following to submit your code to the scheduler using the custom Docker image that we created above.

foundations submit scheduler . preprocessing_test.py

Go back to the Atlas dashboard. Because we logged the cuDF version as a parameter, you can tell at a glance which job used cuDF. Compare the metrics and runtimes!

The cuDF job's logged join and sort times (and its overall runtime) should be dramatically lower!