Upload Dataset

This is a simple guide to uploading a dataset.

Create or Get a Dataset

Create a new dataset:

from graviti import Workspace

# Replace the placeholder with your own AccessKey.
ws = Workspace("YOUR_ACCESSKEY")
dataset = ws.datasets.create("Graviti-dataset-demo")

Or get an existing dataset:

dataset = ws.datasets.get("Graviti-dataset-demo")
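Rather than pasting the AccessKey into the script, it can be read from the environment. The following is a minimal sketch that uses only the standard library. The helper `load_access_key` and the variable name `GRAVITI_ACCESSKEY` are hypothetical names chosen for illustration and are not part of the SDK.

```python
import os


def load_access_key(env_var: str = "GRAVITI_ACCESSKEY") -> str:
    """Return the AccessKey stored in an environment variable.

    Both this helper and the default variable name are assumptions made
    for this example; the SDK itself does not define them.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your AccessKey")
    return key


# Intended usage (requires the graviti package and a valid key):
#     ws = Workspace(load_access_key())
```

Failing early with a clear message keeps a missing key from surfacing later as an authentication error.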

DataFrame Preparation

The data must be organized as a DataFrame with an accompanying schema. The SDK also supports uploading binary files such as images and audio.

from graviti import DataFrame
from graviti.file import Image
import graviti.portex as pt

std = pt.build_package("https://github.com/Project-OpenBytes/portex-standard", "main")
box2ds = std.label.Box2D(
    categories=["boat", "car"],
    attributes={
        "difficult": pt.boolean(),
        "occluded": pt.boolean(),
    },
)
schema = pt.record(
    {
        "filename": pt.string(),
        "image": std.file.Image(),
        "box2ds": pt.array(box2ds),
    }
)

filenames = ["a.jpg", "b.jpg", "c.jpg"]
data = []
for filename in filenames:
    row_data = {
        "filename": filename,
        "image": Image(f"PATH/TO/{filename}"),
        "box2ds": [
            {
                "xmin": 1,
                "ymin": 1,
                "xmax": 4,
                "ymax": 5,
                "category": "boat",
                "attribute": {
                    "difficult": False,
                    "occluded": False,
                },
            },
        ],
    }
    data.append(row_data)
df = DataFrame(data=data, schema=schema)
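Before building the DataFrame, it can help to sanity-check the row dictionaries locally. The sketch below is plain Python and does not call the SDK. The helper `check_rows` is hypothetical, and the allowed categories and box keys are copied from the example above rather than read from the schema.

```python
from typing import Any, Dict, List

# Values taken from the example schema above; adjust them to your own schema.
ALLOWED_CATEGORIES = {"boat", "car"}
BOX_KEYS = {"xmin", "ymin", "xmax", "ymax", "category", "attribute"}


def check_rows(rows: List[Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems found in the box2ds entries."""
    problems = []
    for i, row in enumerate(rows):
        for j, box in enumerate(row.get("box2ds", [])):
            where = f"row {i}, box {j}"
            missing = BOX_KEYS - box.keys()
            if missing:
                problems.append(f"{where}: missing keys {sorted(missing)}")
                continue
            # A box needs a positive width and height.
            if box["xmin"] >= box["xmax"] or box["ymin"] >= box["ymax"]:
                problems.append(f"{where}: empty or inverted box")
            if box["category"] not in ALLOWED_CATEGORIES:
                problems.append(f"{where}: unknown category {box['category']!r}")
    return problems
```

Calling `check_rows(data)` on the list built above and confirming that it returns an empty list is a cheap check to run before constructing the DataFrame.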

Upload and Commit

Create or modify a sheet by uploading the DataFrame. See Sheet Management for more details about sheets.

dataset["train"] = df
dataset.commit("Commit-1")

The commit() method creates a draft, uploads the data to it, and commits the draft in a single call.
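The three steps that commit() bundles can be sketched as an explicit sequence. The helper `upload_with_draft` is hypothetical, and the calls it makes (`dataset.drafts.create`, sheet assignment on the draft, `draft.upload`, `draft.commit`) are assumptions inferred from the description above; confirm the exact names and signatures in Version Control before relying on them.

```python
def upload_with_draft(dataset, sheet_name, df, draft_title, commit_title):
    """Create a draft, upload one sheet to it, then commit the draft.

    Every method called on `dataset` and `draft` here is an assumption
    about the SDK, not a confirmed API.
    """
    draft = dataset.drafts.create(draft_title)  # step 1: create the draft
    draft[sheet_name] = df                      # attach the sheet to the draft
    draft.upload()                              # step 2: upload the local changes
    draft.commit(commit_title)                  # step 3: commit the draft
    return draft


# Intended usage, with `dataset` and `df` from the snippets above:
#     upload_with_draft(dataset, "train", df, "Draft-1", "Commit-1")
```

Splitting the steps this way makes it easier to see which stage failed, at the cost of more code than the one-line dataset.commit("Commit-1").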

Users interested in dataset version management can learn more in Version Control, which can also help with troubleshooting when an upload fails.