DataFrame#
DataFrame is an integrated data structure with an easy-to-use API
that simplifies data processing in a Dataset. A Graviti DataFrame contains 2-dimensional tabular data
and a Portex schema describing the name and type of each column.
Initialize a DataFrame#
Initialize a DataFrame from a list of dicts:
>>> from graviti import DataFrame
>>> data = [
... {"filename": "a.jpg"},
... {"filename": "b.jpg"},
... {"filename": "c.jpg"},
... ]
>>> df = DataFrame(data)
>>> df
filename
0 a.jpg
1 b.jpg
2 c.jpg
Initialize a DataFrame with multi-level column names:
>>> from graviti import DataFrame
>>> data = [
... {"attribute": {"weather": "sunny", "color": "red"}},
... {"attribute": {"weather": "rainy", "color": "black"}},
... {"attribute": {"weather": "sunny", "color": "white"}},
... ]
>>> df = DataFrame(data)
>>> df
   attribute
   color  weather
0  red    sunny
1  black  rainy
2  white  sunny
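The way nested dict keys turn into multi-level column names can be pictured in plain Python. This is only a sketch of the idea, not graviti's implementation: `flatten_columns` is a hypothetical helper, and it yields column paths in insertion order, which need not match the display order shown above.

```python
def flatten_columns(row, prefix=()):
    """Yield (column_path, value) pairs for a possibly nested dict row.

    Hypothetical helper for illustration; not part of graviti.
    """
    for key, value in row.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # A nested dict adds one more level to the column name.
            yield from flatten_columns(value, path)
        else:
            yield path, value


row = {"attribute": {"weather": "sunny", "color": "red"}}
flat = dict(flatten_columns(row))
print(flat)
```

Each key of `flat` is a tuple such as `("attribute", "weather")`, one element per column level.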
Initialize a DataFrame with nested DataFrame construction:
>>> from graviti import DataFrame
>>> data = [
... {"points": [{"xmin": 1, "ymin": 3}, {"xmin": 5, "ymin": 8}]},
... {"points": [{"xmin": 6, "ymin": 10}]},
... {"points": [{"xmin": 1, "ymin": 3}, {"xmin": 5, "ymin": 8}, {"xmin": 1, "ymin": 9}]},
... ]
>>> df = DataFrame(data)
>>> df
points
0 DataFrame(2, 2)
1 DataFrame(1, 2)
2 DataFrame(3, 2)
>>> df["points"][0]
xmin ymin
0 1 3
1 5 8
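The `DataFrame(2, 2)` entries above appear to summarize each nested table as (rows, columns); that reading is an assumption based on the printed `df["points"][0]`, which has two rows and two columns. A plain-Python sketch that derives the same pairs from the input data, with `nested_shape` as a hypothetical helper:

```python
def nested_shape(records):
    """Return (row_count, column_count) for a list of dict records.

    Hypothetical helper for illustration; not part of graviti.
    """
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    return len(records), len(columns)


data = [
    {"points": [{"xmin": 1, "ymin": 3}, {"xmin": 5, "ymin": 8}]},
    {"points": [{"xmin": 6, "ymin": 10}]},
    {"points": [{"xmin": 1, "ymin": 3}, {"xmin": 5, "ymin": 8}, {"xmin": 1, "ymin": 9}]},
]
shapes = [nested_shape(row["points"]) for row in data]
print(shapes)  # [(2, 2), (1, 2), (3, 2)]
```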
Read the DataFrame#
Read data by row:
df.loc[0]
Read data by column:
df[f"{COLUMN_NAME}"]
Read a DataFrame cell:
df.loc[0][f"{COLUMN_NAME}"]
df[f"{COLUMN_NAME}"][0]
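The two cell-access forms above read the same value, once row-first and once column-first. The sketch below mirrors that with plain Python containers (a list of row dicts and a dict of column lists); it is an analogy only and does not use graviti.

```python
# Row-oriented view: one dict per row.
rows = [{"filename": "a.jpg"}, {"filename": "b.jpg"}, {"filename": "c.jpg"}]

# Column-oriented view of the same data: one list per column.
columns = {"filename": [row["filename"] for row in rows]}

by_row = rows[0]["filename"]        # analogous to df.loc[0]["filename"]
by_column = columns["filename"][0]  # analogous to df["filename"][0]
print(by_row, by_column)
```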
Edit the DataFrame#
Edit Rows#
Edit one row:
df.loc[0] = {"filename": "d.jpg"}
Edit multiple rows:
df.loc[0:2] = [{"filename": "d.jpg"}, {"filename": "e.jpg"}]
Edit the Items of Column#
Edit one item:
df[f"{COLUMN_NAME}"][0] = "d.jpg"
Edit multiple items:
df[f"{COLUMN_NAME}"][0:2] = ["d.jpg", "e.jpg"]
Delete Rows#
Delete one row:
del df.loc[0]
Delete multiple rows:
del df.loc[0:2]
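The multi-row edit above assigns two rows to `df.loc[0:2]`, which suggests that the slice is half-open like a Python list slice (rows 0 and 1), unlike pandas `.loc`, where label slices include the end. This is an inference from the example rather than something the text states. A plain-list analogue of the edit and delete operations under that assumption:

```python
rows = [{"filename": "a.jpg"}, {"filename": "b.jpg"}, {"filename": "c.jpg"}]

# Replace rows 0 and 1; index 2 is excluded by the half-open slice.
rows[0:2] = [{"filename": "d.jpg"}, {"filename": "e.jpg"}]
after_edit = [row["filename"] for row in rows]

# Delete the same two rows; only the last row remains.
del rows[0:2]
after_delete = [row["filename"] for row in rows]

print(after_edit)    # ['d.jpg', 'e.jpg', 'c.jpg']
print(after_delete)  # ['c.jpg']
```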
Extend Rows#
DataFrame supports the extend() method.
Extend rows to the end of the DataFrame:
df.extend([{"filename": "a.jpg"}])
Append another DataFrame to the end of the DataFrame:
df1 = DataFrame([{"filename": "a.jpg"}])
df.extend(df1)
Add Columns#
DataFrame supports adding columns by setitem:
>>> from graviti import DataFrame
>>> data = [
... {"filename": "a.jpg"},
... {"filename": "b.jpg"},
... {"filename": "c.jpg"},
... ]
>>> df = DataFrame(data)
>>> df
filename
0 a.jpg
1 b.jpg
2 c.jpg
>>> df["caption"] = ["a", "b", "c"]
>>> df
filename caption
0 a.jpg a
1 b.jpg b
2 c.jpg c
>>> df.schema
record(
fields={
'filename': string(),
'caption': string(),
},
)
The above example adds a column of data without specifying a type, so the schema of the column is inferred. In this case, the column schema can only be a Portex primitive type.
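Type inference of this kind can be sketched as a lookup from Python value types to primitive type names. The helper `infer_primitive` and its mapping are hypothetical: only `str` mapping to `string` is confirmed by the schema output above, and the names chosen for `bool`, `int`, and `float` are assumptions, not graviti's documented behavior.

```python
# Hypothetical mapping; only str -> "string" is confirmed by the example above.
PRIMITIVE_NAMES = {
    str: "string",
    bool: "boolean",   # assumption
    int: "int64",      # assumption: the inferred integer width is not stated
    float: "float64",  # assumption: the inferred float width is not stated
}


def infer_primitive(values):
    """Return a primitive type name for a column of same-typed values."""
    kinds = {type(value) for value in values}
    if len(kinds) != 1:
        raise TypeError("cannot infer a single primitive type for mixed values")
    kind = kinds.pop()
    if kind not in PRIMITIVE_NAMES:
        raise TypeError(f"no primitive type known for {kind.__name__}")
    return PRIMITIVE_NAMES[kind]


caption_type = infer_primitive(["a", "b", "c"])
print(caption_type)  # string
```

A restricted type such as an enum cannot be recovered this way, because the values alone do not say which other values are allowed.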
If a specific Portex type is required, add a Series as the column of the DataFrame:
>>> from graviti import DataFrame, Series
>>> import graviti.portex as pt
>>> data = [
... {"filename": "a.jpg"},
... {"filename": "b.jpg"},
... {"filename": "c.jpg"},
... ]
>>> df = DataFrame(data)
>>> df
filename
0 a.jpg
1 b.jpg
2 c.jpg
>>> df["category"] = Series(["cat", "dog", "cat"], pt.enum(["cat", "dog"]))
>>> df
filename category
0 a.jpg cat
1 b.jpg dog
2 c.jpg cat
>>> df.schema
record(
fields={
'filename': string(),
'category': enum(
values=['cat', 'dog'],
),
},
)
Note that not every DataFrame can be modified. A DataFrame can be changed only if the fields of its
schema come from the given arguments, as in the example above. If the fields are defined in a
template, the DataFrame cannot be changed, and a TypeError is raised:
>>> from graviti import DataFrame
>>> import graviti.portex as pt
>>> std = pt.build_package("https://github.com/Project-OpenBytes/portex-standard", "main")
>>> box2ds = std.label.Box2D(
... categories=["boat", "car"],
... attributes={
... "difficult": pt.boolean(),
... "occluded": pt.boolean(),
... },
... )
>>> df = DataFrame(
... [
... {
... "xmin": 1,
... "ymin": 1,
... "xmax": 4,
... "ymax": 5,
... "category": "boat",
... "attribute": {
... "difficult": False,
... "occluded": False,
... },
... }
... ],
... schema=box2ds
... )
>>> df
   xmin  ymin  xmax  ymax  category  attribute
                                     difficult  occluded
0  1.0   1.0   4.0   5.0   boat      False      False
>>> df["caption"] = ["a"]
TypeError: Cannot set item 'caption' in ImmutableFields
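The failure mode above can be modeled with a small read-only container. `FrozenFields` below is a hypothetical stand-in written for illustration, not graviti's `ImmutableFields` class; it only shows the pattern of rejecting item assignment with a TypeError and how calling code might guard against it.

```python
class FrozenFields:
    """Read-only field mapping (hypothetical model, not graviti's class)."""

    def __init__(self, fields):
        self._fields = dict(fields)

    def __getitem__(self, name):
        return self._fields[name]

    def __contains__(self, name):
        return name in self._fields

    def __setitem__(self, name, value):
        # Template-defined fields are fixed, so any assignment is rejected.
        raise TypeError(f"Cannot set item {name!r} in {type(self).__name__}")


fields = FrozenFields({"xmin": "float32", "ymin": "float32"})

try:
    fields["caption"] = "string"
except TypeError as error:
    message = str(error)

print(message)  # Cannot set item 'caption' in FrozenFields
```

Code that adds columns to a DataFrame whose schema may come from a template can wrap the assignment in the same `try`/`except TypeError` guard.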