graviti.dataframe.frame

The implementation of the Graviti DataFrame.

Module Contents

Classes

DataFrame

Two-dimensional, size-mutable, potentially heterogeneous tabular data.

Attributes

graviti.dataframe.frame.APPLY_KEY = apply_result
class graviti.dataframe.frame.DataFrame

Bases: graviti.dataframe.container.Container

Two-dimensional, size-mutable, potentially heterogeneous tabular data.

Parameters
  • data – The data that needs to be stored in DataFrame.

  • schema – The schema of the DataFrame. If None, it is inferred from data.

  • columns – Column labels to use for the resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data already contains column labels, column selection is performed instead.

Examples

Constructing a DataFrame from a list.

>>> df = DataFrame(
...     [
...         {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...         {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
...         {"filename": "c.jpg", "box2ds": {"x": 3, "y": 3}},
...     ]
... )
>>> df
    filename box2ds
             x      y
0   a.jpg    1      1
1   b.jpg    2      2
2   c.jpg    3      3
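
The printed frame above shows the nested box2ds record as two sub-columns, x and y. As a rough illustration of that flattening, here is a plain-Python sketch. The helper name flatten_record is hypothetical and is not part of graviti; the real library derives columns from the schema rather than from dict keys.

```python
def flatten_record(record, prefix=()):
    """Map a nested record to {column_path: value} (illustrative only)."""
    flat = {}
    for key, value in record.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # A nested dict becomes a group of sub-columns under `key`.
            flat.update(flatten_record(value, path))
        else:
            flat[path] = value
    return flat


row = {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}}
columns = flatten_record(row)
```

Each key of columns is a tuple such as ("box2ds", "x"), which mirrors the two-level header in the printed output.
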
classmethod from_pyarrow(cls, array, schema=None)

Create a DataFrame from a pyarrow struct array.

Parameters
  • array (pyarrow.StructArray) – The input pyarrow struct array.

  • schema (Optional[graviti.portex.PortexRecordBase]) – The schema of the DataFrame.


Raises

TypeError – When the given schema does not match the pyarrow array type.

Returns

The loaded DataFrame instance.

Return type

_T

property iloc(self)

Purely integer-location based indexing for selection by position.

Allowed inputs are:

  • An integer, e.g. 5.

  • A tuple, e.g. (5, "COLUMN_NAME")

Returns

The instance of the ILocIndexer.

Return type

graviti.dataframe.indexing.DataFrameILocIndexer

Examples

>>> df = DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.iloc[0]
col1    1
col2    3
Name: 0, dtype: int64
>>> df.iloc[0, "col1"]
1
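
The two allowed input forms (a bare integer, or a (row, column) tuple) can be handled in a single `__getitem__`. The class below is a hypothetical stand-in written for illustration, not the actual DataFrameILocIndexer, and it assumes column-wise storage in plain lists.

```python
class ILocSketch:
    """Hypothetical integer-location indexer over column lists (not graviti code)."""

    def __init__(self, columns):
        self._columns = columns

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # (row, column) form: return a single cell.
            row, column = key
            return self._columns[column][row]
        # Integer form: return the whole row as a dict.
        return {name: values[key] for name, values in self._columns.items()}


iloc = ILocSketch({"col1": [1, 2], "col2": [3, 4]})
first_row = iloc[0]
cell = iloc[0, "col1"]
```
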
property loc(self)

Access rows by index.

Allowed inputs are:

  • A single index, e.g. 5.

  • A tuple, e.g. (5, "COLUMN_NAME")

Returns

The instance of the LocIndexer.

Return type

graviti.dataframe.indexing.DataFrameLocIndexer

Examples

>>> df = DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.loc[0]
col1    1
col2    3
Name: 0, dtype: int64
>>> df.loc[0, "col1"]
1
property shape(self)

Return a tuple representing the dimensionality of the DataFrame.

Returns

Shape of the DataFrame.

Return type

Tuple[int, int]

Examples

>>> df = DataFrame(
...     [
...         {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...         {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
...         {"filename": "c.jpg", "box2ds": {"x": 3, "y": 3}},
...     ]
... )
>>> df
    filename box2ds
             x      y
0   a.jpg    1      1
1   b.jpg    2      2
2   c.jpg    3      3
>>> df.shape
(3, 2)
head(self, n=5)

Return the first n rows.

Parameters
  • n (int) – Number of rows to select.


Returns

The first n rows.

Return type

_T

Examples

>>> df = DataFrame(
...     [
...         {"animal": "alligator"},
...         {"animal": "bee"},
...         {"animal": "falcon"},
...         {"animal": "lion"},
...         {"animal": "monkey"},
...         {"animal": "parrot"},
...         {"animal": "shark"},
...         {"animal": "whale"},
...         {"animal": "zebra"},
...     ]
... )
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra

Viewing the first n rows (three in this case)

>>> df.head(3)
      animal
0  alligator
1        bee
2     falcon

For negative values of n, all rows except the last |n| rows are returned

>>> df.head(-3)
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
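
The behaviour shown in the head examples matches ordinary Python slicing. The sketch below models rows as a plain list; the free function head is a stand-in for illustration and is not the graviti implementation.

```python
def head(rows, n=5):
    """Return the first n rows; a negative n drops the last |n| rows."""
    return rows[:n]


animals = [
    "alligator", "bee", "falcon", "lion", "monkey",
    "parrot", "shark", "whale", "zebra",
]
first_three = head(animals, 3)
all_but_last_three = head(animals, -3)
```
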
tail(self, n=5)

Return the last n rows.

Parameters
  • n (int) – Number of rows to select.


Returns

The last n rows.

Return type

_T

Examples

>>> df = DataFrame(
...     [
...         {"animal": "alligator"},
...         {"animal": "bee"},
...         {"animal": "falcon"},
...         {"animal": "lion"},
...         {"animal": "monkey"},
...         {"animal": "parrot"},
...         {"animal": "shark"},
...         {"animal": "whale"},
...         {"animal": "zebra"},
...     ]
... )
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra

Viewing the last 5 rows

>>> df.tail()
   animal
0  monkey
1  parrot
2   shark
3   whale
4   zebra

Viewing the last n rows (three in this case)

>>> df.tail(3)
  animal
0  shark
1  whale
2  zebra
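
In the tail outputs above, the returned rows are numbered from 0 again rather than keeping their original positions. A minimal list-based sketch of that behaviour follows. The function tail here is illustrative only, and it handles only n >= 0 because negative n is not described for tail in this section.

```python
def tail(rows, n=5):
    """Return the last n rows of a list (sketch; n must be >= 0)."""
    if n < 0:
        raise ValueError("negative n is not covered by this sketch")
    if n == 0:
        # rows[-0:] would return the whole list, so handle zero explicitly.
        return []
    return rows[-n:]


animals = [
    "alligator", "bee", "falcon", "lion", "monkey",
    "parrot", "shark", "whale", "zebra",
]
# enumerate() restarts the row labels at 0, as in the printed output.
last_three = list(enumerate(tail(animals, 3)))
```
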
extend(self, values)

Extend the DataFrame in place, row by row, with a sequence of records or another DataFrame.

Parameters

values (Union[Iterable[Dict[str, Any]], DataFrame]) – A sequence object or DataFrame.

Raises
  • TypeError – When this DataFrame is a member of another DataFrame.

  • TypeError – When the schema of the given DataFrame does not match the schema of this DataFrame.

Return type

None

Examples

>>> df = DataFrame([
...     {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...     {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
... ])

Extended by another list.

>>> df.extend([{"filename": "c.jpg", "box2ds": {"x": 3, "y": 3}}])
>>> df
    filename box2ds
             x      y
0   a.jpg    1      1
1   b.jpg    2      2
2   c.jpg    3      3

Extended by another DataFrame.

>>> df2 = DataFrame([{"filename": "d.jpg", "box2ds": {"x": 4, "y": 4}}])
>>> df.extend(df2)
>>> df
    filename box2ds
             x      y
0   a.jpg    1      1
1   b.jpg    2      2
2   c.jpg    3      3
3   d.jpg    4      4
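
The schema-mismatch TypeError can be pictured as a check that runs before any row is appended. The sketch below is a plain-Python approximation: it compares nested key sets instead of real schemas, and the helpers column_paths and extend_rows are hypothetical names, not graviti functions.

```python
def column_paths(record, prefix=()):
    """Collect the nested key paths of a record as a set of tuples."""
    paths = set()
    for key, value in record.items():
        if isinstance(value, dict):
            paths |= column_paths(value, prefix + (key,))
        else:
            paths.add(prefix + (key,))
    return paths


def extend_rows(rows, values):
    """Append values to rows in place, rejecting records with other columns."""
    values = list(values)
    expected = column_paths(rows[0]) if rows else None
    # Validate everything first so a failure leaves `rows` unchanged.
    for record in values:
        if expected is not None and column_paths(record) != expected:
            raise TypeError(f"record does not match existing columns: {record!r}")
    rows.extend(values)


rows = [
    {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
    {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
]
extend_rows(rows, [{"filename": "c.jpg", "box2ds": {"x": 3, "y": 3}}])
```

Like the documented method, extend_rows returns None and mutates its first argument.
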
to_pylist(self)

Convert the DataFrame to a python list.

Returns

The python list representing the DataFrame.

Return type

List[Dict[str, Any]]
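
The return type List[Dict[str, Any]] has one dict per row. Assuming column-wise storage (an assumption suggested by the pyarrow-based constructor, not stated in this section), the conversion can be sketched as below. The helper columns_to_pylist is hypothetical.

```python
def columns_to_pylist(columns):
    """Turn {column: [values...]} into a list with one dict per row."""
    names = list(columns)
    # zip(*...) walks all columns in lockstep, yielding one tuple per row.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


columns = {
    "filename": ["a.jpg", "b.jpg"],
    "box2ds": [{"x": 1, "y": 1}, {"x": 2, "y": 2}],
}
records = columns_to_pylist(columns)
```
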

query(self, func)

Query the columns of a DataFrame with a lambda function.

Parameters

func (Callable[[Any], Any]) – The query function.

Returns

The query result DataFrame.

Raises

TypeError – When the DataFrame is not in a Commit.

Return type

DataFrame

Examples

>>> df = DataFrame([
...     {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...     {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
... ])
>>> df.query(lambda x: x["filename"] == "a.jpg")
    filename box2ds
             x      y
0   a.jpg    1      1
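
Conceptually, the query keeps only the rows for which func is true. The sketch below is a local, list-based stand-in for that idea. It says nothing about how graviti actually evaluates the function, and the name query_rows is hypothetical.

```python
def query_rows(rows, func):
    """Keep the rows for which func(row) is truthy (illustrative stand-in)."""
    return [row for row in rows if func(row)]


rows = [
    {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
    {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
]
matches = query_rows(rows, lambda x: x["filename"] == "a.jpg")
```
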
apply(self, func)

Apply a function to the DataFrame row by row.

Parameters

func (Callable[[Any], Any]) – Function to apply to each row.

Returns

The apply result DataFrame or Series.

Raises

TypeError – When the DataFrame is not in a Commit.

Return type

graviti.dataframe.container.Container

Examples

>>> df = DataFrame([
...     {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...     {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
... ])
>>> df.apply(lambda x: x["box2ds"]["x"] + 1)
    filename box2ds
             x      y
0   a.jpg    2      1
1   b.jpg    3      2
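
Row-by-row application amounts to mapping func over the rows. The sketch below shows only that mapping on plain dicts; the helper apply_rows is a hypothetical name, and how graviti wraps or renders the result is not modelled here.

```python
def apply_rows(rows, func):
    """Call func once per row and collect the results in order."""
    return [func(row) for row in rows]


rows = [
    {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
    {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
]
incremented_x = apply_rows(rows, lambda x: x["box2ds"]["x"] + 1)
```

The input rows are left unchanged, since func only reads from each row.
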