Search in DataFrame

The Graviti SDK can trigger a Graviti search directly from a DataFrame through two methods: query() and apply().

Dataset Preparation

Take the following DataFrame as an example:

from graviti import DataFrame
import graviti.portex as pt

std = pt.build_package("https://github.com/Project-OpenBytes/portex-standard", "main")
schema = pt.record(
    {
        "filename": pt.string(),
        "box2ds": pt.array(
            std.label.Box2D(
                categories=["boat", "car"],
            )
        )
    }
)

data = []
for filename in ("a.jpg", "b.jpg", "c.jpg"):
    data.append(
        {
            "filename": filename,
            "box2ds": [
                {
                    "xmin": 10,
                    "ymin": 10,
                    "xmax": 100,
                    "ymax": 100,
                    "category": "boat",
                },
                {
                    "xmin": 20,
                    "ymin": 20,
                    "xmax": 200,
                    "ymax": 200,
                    "category": "car" if filename == "a.jpg" else "boat",
                },
            ],
        }
    )

df = DataFrame(data, schema)
>>> df
   filename  box2ds
0  a.jpg     DataFrame(2, 5)
1  b.jpg     DataFrame(2, 5)
2  c.jpg     DataFrame(2, 5)

Upload the DataFrame:

from graviti import Workspace

ws = Workspace(f"{YOUR_ACCESSKEY}")

dataset = ws.datasets.create("search_demo")
dataset["train"] = df
dataset.commit("initial commit")

Get the uploaded DataFrame:

df = dataset["train"]

Query

The query operation evaluates a lambda function on each row and returns the rows for which it evaluates to True. The lambda function must return a boolean value.

The SDK uses the engine.online() context manager to run the search online. For example, to search for all rows whose filename is “a.jpg”:

>>> from graviti import engine
>>> with engine.online():
...     result = df.query(lambda x: x["filename"] == "a.jpg")
>>> result
   filename  box2ds
0  a.jpg     DataFrame(2, 5)
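
The row-filtering behaviour described above can be pictured with a small plain-Python sketch. This is only an analogy built on lists and dicts, not the Graviti SDK implementation; the helper name query_rows and the TypeError check are assumptions made for illustration.

```python
def query_rows(rows, predicate):
    """Hypothetical helper: keep the rows for which predicate(row) is True."""
    kept = []
    for row in rows:
        verdict = predicate(row)
        # Mirror the rule that the lambda must return a boolean value.
        if not isinstance(verdict, bool):
            raise TypeError("the predicate must return a boolean value")
        if verdict:
            kept.append(row)
    return kept


# Plain-dict stand-in for the three sample rows (filenames only).
rows = [{"filename": name} for name in ("a.jpg", "b.jpg", "c.jpg")]

result = query_rows(rows, lambda x: x["filename"] == "a.jpg")
print(result)
```

Only the row for “a.jpg” survives the filter, which matches the single-row result shown above.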

Use any() to match rows in which at least one box in box2ds has the category “boat”:

>>> from graviti import engine
>>> with engine.online():
...     result = df.query(lambda x: (x["box2ds"]["category"] == "boat").any())
>>> result
   filename  box2ds
0  a.jpg     DataFrame(2, 5)
1  b.jpg     DataFrame(2, 5)
2  c.jpg     DataFrame(2, 5)

Use all() to match rows in which every box in box2ds has the category “boat”:

>>> from graviti import engine
>>> with engine.online():
...     result = df.query(lambda x: (x["box2ds"]["category"] == "boat").all())
>>> result
   filename  box2ds
0  b.jpg     DataFrame(2, 5)
1  c.jpg     DataFrame(2, 5)
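
To see why “a.jpg” matches with any() but not with all(), the two conditions can be replayed on the sample categories using Python's built-in any() and all(). This is a plain-Python sketch for intuition only; the categories dictionary is a hand-written stand-in for the box2ds column, not something produced by the SDK.

```python
# Categories per file, copied from the sample data: only a.jpg contains a "car".
categories = {
    "a.jpg": ["boat", "car"],
    "b.jpg": ["boat", "boat"],
    "c.jpg": ["boat", "boat"],
}

# Rows where at least one box is a boat.
any_boat = [
    name for name, cats in categories.items()
    if any(cat == "boat" for cat in cats)
]

# Rows where every box is a boat.
all_boat = [
    name for name, cats in categories.items()
    if all(cat == "boat" for cat in cats)
]

print(any_boat)
print(all_boat)
```

The second box of “a.jpg” is a “car”, so that row satisfies the any() condition but fails the all() condition.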

Apply

The apply operation applies the lambda function to the DataFrame row by row and returns one result per row.

For example, to extract from each row only the boxes whose category is “car”:

>>> from graviti import engine
>>> with engine.online():
...     result = df.apply(lambda x: x["box2ds"].query(lambda y: y["category"] == "car"))
>>> result
0  DataFrame(1, 5)
1  DataFrame(0, 5)
2  DataFrame(0, 5)
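
The shape of this result (one entry per row, some of them empty) can be reproduced with a plain-Python sketch. It assumes a list-of-dicts stand-in for the DataFrame; the helper apply_rows is hypothetical and does not call the Graviti SDK.

```python
def apply_rows(rows, func):
    """Hypothetical helper: call func on every row and collect the results."""
    return [func(row) for row in rows]


# Rebuild the sample rows, keeping only the category of each box.
rows = []
for filename in ("a.jpg", "b.jpg", "c.jpg"):
    second = "car" if filename == "a.jpg" else "boat"
    rows.append(
        {
            "filename": filename,
            "box2ds": [{"category": "boat"}, {"category": second}],
        }
    )

# For each row, keep only the boxes whose category is "car".
cars = apply_rows(
    rows,
    lambda x: [box for box in x["box2ds"] if box["category"] == "car"],
)
counts = [len(boxes) for boxes in cars]
print(counts)
```

Every input row produces an output entry, so rows without a “car” box yield an empty list, just as the second and third rows above yield DataFrame(0, 5).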

Query & Apply

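The SDK also supports chaining apply() after query().

For example, keep only the rows that contain at least one “car” box, then extract those boxes. Because the query drops rows without a match first, the result contains no empty entries: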

>>> from graviti import engine
>>> with engine.online():
...     result = df.query(lambda x: (x["box2ds"]["category"] == "car").any()).apply(
...         lambda x: x["box2ds"].query(lambda y: y["category"] == "car")
...     )
>>> result
0  DataFrame(1, 5)
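
The effect of filtering first and transforming second can be sketched in plain Python. As before, this is an analogy on lists and dicts under the assumption that query() filters rows and apply() maps over them; the variable names are illustrative and nothing here calls the Graviti SDK.

```python
# Rebuild the sample rows, keeping only the category of each box.
rows = []
for filename in ("a.jpg", "b.jpg", "c.jpg"):
    second = "car" if filename == "a.jpg" else "boat"
    rows.append(
        {
            "filename": filename,
            "box2ds": [{"category": "boat"}, {"category": second}],
        }
    )

# Step 1 (query): keep rows that contain at least one "car" box.
with_car = [
    row for row in rows
    if any(box["category"] == "car" for box in row["box2ds"])
]

# Step 2 (apply): from each remaining row, extract only the "car" boxes.
car_boxes = [
    [box for box in row["box2ds"] if box["category"] == "car"]
    for row in with_car
]

print([row["filename"] for row in with_car])
print([len(boxes) for boxes in car_boxes])
```

Compared with apply() alone, the preceding filter removes the rows that would otherwise produce empty results, leaving a single entry for “a.jpg”.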