`graviti.dataframe.frame`#

The implementation of the Graviti DataFrame.

Module Contents#

Classes#

DataFrame

Two-dimensional, size-mutable, potentially heterogeneous tabular data.

Attributes#

`pd`
`APPLY_KEY`

graviti.dataframe.frame.pd[source]#

graviti.dataframe.frame.APPLY_KEY = apply_result[source]#

class graviti.dataframe.frame.DataFrame[source]#

Bases: graviti.dataframe.container.Container

Two-dimensional, size-mutable, potentially heterogeneous tabular data.

Parameters

data – The data that needs to be stored in DataFrame.
schema – The schema of the DataFrame. If None, will be inferred from data.
columns – Column labels to use for resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels, will perform column selection instead.

Examples

Constructing a DataFrame from a list of row dictionaries.

>>> df = DataFrame(
...     [
...         {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...         {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
...         {"filename": "c.jpg", "box2ds": {"x": 3, "y": 3}},
...     ]
... )
>>> df
    filename box2ds
             x      y
0   a.jpg    1      1
1   b.jpg    2      2
2   c.jpg    3      3

classmethod from_pyarrow(cls, array)[source]#

Create a DataFrame from a pyarrow struct array.

Parameters

array (pyarrow.StructArray) – The input pyarrow struct array.
cls (Type[_T]) –

Returns

The loaded DataFrame instance.

Return type

_T

classmethod from_pandas(cls, df)[source]#

Create a DataFrame from a pandas DataFrame.

Parameters

df (pandas.DataFrame) – The input pandas DataFrame.
cls (Type[_T]) –

Raises

NotImplementedError – When the column index of input DataFrame is MultiIndex.

Returns

The loaded DataFrame instance.

Return type

_T

property iloc(self)[source]#

Purely integer-location based indexing for selection by position.

Allowed inputs are:

An integer, e.g. 5.
A tuple of a row position and a column name, e.g. (5, "COLUMN_NAME").

Returns: The instance of the ILocIndexer.
Return type: graviti.dataframe.indexing.DataFrameILocIndexer

Examples

>>> df = DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.iloc[0]
col1    1
col2    3
Name: 0, dtype: int64
>>> df.iloc[0, "col1"]
1
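
The two accepted key forms can be pictured with a small stand-in class. This is only a sketch of the key dispatch, assuming rows are held as a list of dictionaries; `ILocSketch` is a hypothetical name and does not reflect how `DataFrameILocIndexer` is actually implemented.

```python
from typing import Any, Dict, List, Tuple, Union


class ILocSketch:
    """Hypothetical stand-in for a positional indexer (not graviti's class)."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def __getitem__(self, key: Union[int, Tuple[int, str]]) -> Any:
        # A tuple key selects one cell: (row position, column name).
        if isinstance(key, tuple):
            position, column = key
            return self._rows[position][column]
        # A bare integer selects the whole row at that position.
        return self._rows[key]


rows = [{"col1": 1, "col2": 3}, {"col1": 2, "col2": 4}]
iloc = ILocSketch(rows)

whole_row = iloc[0]            # the full first row
single_cell = iloc[0, "col1"]  # one value from the first row
```

Python passes `iloc[0, "col1"]` to `__getitem__` as the tuple `(0, "col1")`, which is why a single method can serve both forms.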

property loc(self)[source]#

Access rows by index.

Allowed inputs are:

A single index, e.g. 5.
A tuple of an index and a column name, e.g. (5, "COLUMN_NAME").

Returns: The instance of the LocIndexer.
Return type: graviti.dataframe.indexing.DataFrameLocIndexer

Examples

>>> df = DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.loc[0]
col1    1
col2    3
Name: 0, dtype: int64
>>> df.loc[0, "col1"]
1

property shape(self)[source]#

Return a tuple representing the dimensionality of the DataFrame.

Returns: Shape of the DataFrame.
Return type: Tuple[int, int]

Examples

>>> df = DataFrame(
...     [
...         {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...         {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
...         {"filename": "c.jpg", "box2ds": {"x": 3, "y": 3}},
...     ]
... )
>>> df
    filename box2ds
             x      y
0   a.jpg    1      1
1   b.jpg    2      2
2   c.jpg    3      3
>>> df.shape
(3, 2)
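
The `(3, 2)` result above counts `box2ds` as a single column even though it holds two nested fields. A minimal sketch of that counting rule on row-oriented input follows; `shape_of` is a hypothetical helper written for illustration, not graviti's implementation.

```python
from typing import Any, Dict, List, Tuple


def shape_of(rows: List[Dict[str, Any]]) -> Tuple[int, int]:
    """Return (row count, top-level column count) for row-oriented data."""
    if not rows:
        return (0, 0)
    # Only top-level keys are counted; nested fields such as "x" and "y"
    # under "box2ds" do not add columns of their own.
    return (len(rows), len(rows[0]))


rows = [
    {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
    {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
    {"filename": "c.jpg", "box2ds": {"x": 3, "y": 3}},
]
shape = shape_of(rows)
```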

property size(self)[source]#

Return an int representing the number of elements in this object.

Returns: Size of the DataFrame.
Return type: int

Examples

>>> df = DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.size
4

keys(self)[source]#

Return an iterator of the column names in the DataFrame.

Returns: The column name iterator.
Return type: Iterator[str]

items(self)[source]#

Return an iterator of the column names and the columns in the DataFrame.

Yields: The column name and the column.
Return type: Iterator[Tuple[str, graviti.dataframe.container.Container]]

head(self, n=5)[source]#

Return the first n rows.

Parameters

n (int) – Number of rows to select.
self (_T) –

Returns

The first n rows.

Return type

_T

Examples

>>> df = DataFrame(
...     [
...         {"animal": "alligator"},
...         {"animal": "bee"},
...         {"animal": "falcon"},
...         {"animal": "lion"},
...         {"animal": "monkey"},
...         {"animal": "parrot"},
...         {"animal": "shark"},
...         {"animal": "whale"},
...         {"animal": "zebra"},
...     ]
... )
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra

Viewing the first n lines (three in this case)

>>> df.head(3)
      animal
0  alligator
1        bee
2     falcon

For negative values of n, all rows except the last |n| are returned

>>> df.head(-3)
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
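
The positive and negative cases shown above behave like Python slicing from the front. The following sketches that behavior on a plain list; the `head` function here is a hypothetical helper, and nothing in it is taken from graviti's source.

```python
from typing import List, TypeVar

T = TypeVar("T")


def head(rows: List[T], n: int = 5) -> List[T]:
    """Return the first n rows; a negative n drops the last |n| rows."""
    # rows[:3] keeps three rows, rows[:-3] keeps all but the last three.
    return rows[:n]


animals = [
    "alligator", "bee", "falcon", "lion", "monkey",
    "parrot", "shark", "whale", "zebra",
]

first_three = head(animals, 3)
all_but_last_three = head(animals, -3)
```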

tail(self, n=5)[source]#

Return the last n rows.

Parameters

n (int) – Number of rows to select.
self (_T) –

Returns

The last n rows.

Return type

_T

Examples

>>> df = DataFrame(
...     [
...         {"animal": "alligator"},
...         {"animal": "bee"},
...         {"animal": "falcon"},
...         {"animal": "lion"},
...         {"animal": "monkey"},
...         {"animal": "parrot"},
...         {"animal": "shark"},
...         {"animal": "whale"},
...         {"animal": "zebra"},
...     ]
... )
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra

Viewing the last 5 lines

>>> df.tail()
   animal
0  monkey
1  parrot
2   shark
3   whale
4   zebra

Viewing the last n lines (three in this case)

>>> df.tail(3)
   animal
0   shark
1   whale
2   zebra
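
In the `df.tail(3)` output above, the returned rows are labeled from 0 again rather than keeping positions 6 to 8. A sketch of that behavior on a plain list follows. The `tail` function is a hypothetical helper, and it only handles non-negative n because the source does not describe negative values.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def tail(rows: List[T], n: int = 5) -> List[Tuple[int, T]]:
    """Return the last n rows paired with fresh positions starting at 0."""
    if n < 0:
        raise ValueError("this sketch only covers non-negative n")
    # Compute the start explicitly: rows[-0:] would return every row.
    start = max(len(rows) - n, 0)
    return list(enumerate(rows[start:]))


animals = [
    "alligator", "bee", "falcon", "lion", "monkey",
    "parrot", "shark", "whale", "zebra",
]

last_three = tail(animals, 3)
```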

extend(self, values)[source]#

Extend this DataFrame row by row with a sequence object or another DataFrame.

Parameters

values (Union[Iterable[Dict[str, Any]], DataFrame]) – A sequence object or DataFrame.

Raises

TypeError – When this DataFrame is a member of another DataFrame.
TypeError – When the schema of the given DataFrame does not match the schema of this DataFrame.

Return type

None

Examples

>>> df = DataFrame([
...     {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...     {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
... ])

Extended by another list.

>>> df.extend([{"filename": "c.jpg", "box2ds": {"x": 3, "y": 3}}])
>>> df
    filename box2ds
             x      y
0   a.jpg    1      1
1   b.jpg    2      2
2   c.jpg    3      3

Extended by another DataFrame.

>>> df2 = DataFrame([{"filename": "d.jpg", "box2ds": {"x": 4, "y": 4}}])
>>> df.extend(df2)
>>> df
    filename box2ds
             x      y
0   a.jpg    1      1
1   b.jpg    2      2
2   c.jpg    3      3
3   d.jpg    4      4
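
The idea behind the schema-mismatch `TypeError` can be illustrated on plain dictionaries. In this sketch the "schema" is simplified to the nested key names of a row, which is an assumption made for illustration; `key_structure` and `extend_rows` are hypothetical helpers, not graviti functions.

```python
from typing import Any, Dict, Iterable, List


def key_structure(value: Any) -> Any:
    """Reduce a row to its nested key names, ignoring the stored values."""
    if isinstance(value, dict):
        return {key: key_structure(item) for key, item in value.items()}
    return None


def extend_rows(rows: List[Dict[str, Any]], values: Iterable[Dict[str, Any]]) -> None:
    """Append values to rows in place, rejecting rows with different keys."""
    new_rows = list(values)
    if rows:
        expected = key_structure(rows[0])
        # Validate everything first so a failure leaves rows unchanged.
        for row in new_rows:
            if key_structure(row) != expected:
                raise TypeError("row does not match the existing key structure")
    rows.extend(new_rows)


rows = [
    {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
    {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
]
extend_rows(rows, [{"filename": "c.jpg", "box2ds": {"x": 3, "y": 3}}])

try:
    extend_rows(rows, [{"filename": "d.jpg"}])  # "box2ds" is missing
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
```

Like `extend`, the helper returns `None` and modifies the target in place.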

to_pylist(self, *, _to_backend=False)[source]#

Convert the DataFrame to a python list.

Returns: The python list representing the DataFrame.
Parameters: _to_backend (bool) –
Return type: List[Dict[str, Any]]
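
The return type `List[Dict[str, Any]]` is row-oriented: one dictionary per row. As an illustration of that layout, the sketch below converts column-oriented data (the form passed to the constructor in the `size` example) into a row list using only the standard library. `columns_to_rows` is a hypothetical helper and not part of graviti.

```python
from typing import Any, Dict, List


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn {"name": [values...]} into one dictionary per row."""
    names = list(columns)
    # zip(*...) walks the columns in parallel, yielding one tuple per row.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


row_list = columns_to_rows({"col1": [1, 2], "col2": [3, 4]})
```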

to_pandas(self)[source]#

Convert the graviti DataFrame to a pandas DataFrame.

Returns: The converted pandas DataFrame.
Return type: pandas.DataFrame

query(self, func)[source]#

Query the columns of a DataFrame with a lambda function.

Parameters: func (Callable[[Any], Any]) – The query function.
Returns: The query result DataFrame.
Raises: TypeError – When the DataFrame is not in a Commit.
Return type: DataFrame

Examples

>>> df = DataFrame([
...     {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...     {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
... ])
>>> df.query(lambda x: x["filename"] == "a.jpg")
    filename box2ds
             x      y
0   a.jpg    1      1
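
As a rough local analogy for the example above, the query function can be thought of as a predicate that decides which rows are kept. The sketch below applies such a predicate to a list of dictionaries. `filter_rows` is a hypothetical helper, and since the real method requires the DataFrame to be in a Commit, its evaluation may work quite differently.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def filter_rows(rows: List[Row], func: Callable[[Row], Any]) -> List[Row]:
    """Keep the rows for which func returns a truthy value."""
    return [row for row in rows if func(row)]


rows = [
    {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
    {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
]
matches = filter_rows(rows, lambda x: x["filename"] == "a.jpg")
```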

apply(self, func)[source]#

Apply a function to the DataFrame row by row.

Parameters: func (Callable[[Any], Any]) – Function to apply to each row.
Returns: The apply result DataFrame or Series.
Raises: TypeError – When the DataFrame is not in a Commit.
Return type: graviti.dataframe.container.Container

Examples

>>> df = DataFrame([
...     {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...     {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
... ])
>>> df.apply(lambda x: x["box2ds"]["x"] + 1)
    filename box2ds
             x      y
0   a.jpg    2      1
1   b.jpg    3      2
