graviti.dataframe.frame
The implementation of the Graviti DataFrame.
Module Contents
Classes
DataFrame – Two-dimensional, size-mutable, potentially heterogeneous tabular data.
Attributes
- class graviti.dataframe.frame.DataFrame
Bases: graviti.dataframe.container.Container
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
- Parameters
data – The data that needs to be stored in DataFrame.
schema – The schema of the DataFrame. If None, will be inferred from data.
columns – Column labels to use for resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels, will perform column selection instead.
Examples
Constructing DataFrame from list.
>>> df = DataFrame(
...     [
...         {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...         {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
...         {"filename": "c.jpg", "box2ds": {"x": 3, "y": 3}},
...     ]
... )
>>> df
    filename box2ds
             x      y
0   a.jpg    1      1
1   b.jpg    2      2
2   c.jpg    3      3
- classmethod from_pyarrow(cls, array)
Create a DataFrame from a pyarrow struct array.
- Parameters
array (pyarrow.StructArray) – The input pyarrow struct array.
cls (Type[_T]) –
- Returns
The loaded DataFrame instance.
- Return type
_T
- classmethod from_pandas(cls, df)
Create a DataFrame from a pandas DataFrame.
- Parameters
df (pandas.DataFrame) – The input pandas DataFrame.
cls (Type[_T]) –
- Raises
NotImplementedError – When the column index of the input DataFrame is a MultiIndex.
- Returns
The loaded DataFrame instance.
- Return type
_T
- property iloc(self)
Purely integer-location based indexing for selection by position.
Allowed inputs are:
An integer, e.g. 5.
A tuple, e.g. (5, "COLUMN_NAME").
- Returns
The instance of the ILocIndexer.
- Return type
Examples
>>> df = DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.iloc[0]
col1    1
col2    3
Name: 0, dtype: int64
>>> df.iloc[0, "col1"]
1
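The two allowed key forms can be pictured with a small stand-in class. This is a plain-Python sketch, not graviti's ILocIndexer: the class name `RowIndexerSketch` and its dict-of-lists storage are assumptions made only for illustration.

```python
from typing import Any, Dict, List, Tuple, Union


class RowIndexerSketch:
    """Hypothetical stand-in mimicking `df.iloc[row]` and `df.iloc[row, column]`."""

    def __init__(self, columns: Dict[str, List[Any]]) -> None:
        self._columns = columns

    def __getitem__(self, key: Union[int, Tuple[int, str]]) -> Any:
        if isinstance(key, tuple):
            # A (row position, column name) tuple selects a single value.
            row, name = key
            return self._columns[name][row]
        # A bare integer selects the whole row as a name -> value mapping.
        return {name: values[key] for name, values in self._columns.items()}


indexer = RowIndexerSketch({"col1": [1, 2], "col2": [3, 4]})
first_row = indexer[0]
single_value = indexer[0, "col1"]
```

Dispatching on whether the key is a tuple keeps both access forms behind one `__getitem__`, which is the usual way such indexers are built in Python.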
- property loc(self)
Access rows by index.
Allowed inputs are:
A single index, e.g. 5.
A tuple, e.g. (5, "COLUMN_NAME").
- Returns
The instance of the LocIndexer.
- Return type
Examples
>>> df = DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.loc[0]
col1    1
col2    3
Name: 0, dtype: int64
>>> df.loc[0, "col1"]
1
- property shape(self)
Return a tuple representing the dimensionality of the DataFrame.
- Returns
Shape of the DataFrame.
- Return type
Tuple[int, int]
Examples
>>> df = DataFrame(
...     [
...         {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...         {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
...         {"filename": "c.jpg", "box2ds": {"x": 3, "y": 3}},
...     ]
... )
>>> df
    filename box2ds
             x      y
0   a.jpg    1      1
1   b.jpg    2      2
2   c.jpg    3      3
>>> df.shape
(3, 2)
- property size(self)
Return an int representing the number of elements in this object.
- Returns
Size of the DataFrame.
- Return type
int
Examples
>>> df = DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.size
4
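For a flat frame, the relationship between shape and size in the examples above can be sketched in plain Python. The helpers `shape_of` and `size_of` are hypothetical, and the sketch assumes that size equals rows times columns, which matches the 2 x 2 example returning 4; how nested columns such as box2ds are counted is not stated in this section.

```python
from typing import Any, Dict, List, Tuple


def shape_of(columns: Dict[str, List[Any]]) -> Tuple[int, int]:
    """Return (row count, column count) for equal-length columns."""
    n_columns = len(columns)
    # All columns are assumed to have the same length, so the first one decides.
    n_rows = len(next(iter(columns.values()))) if columns else 0
    return n_rows, n_columns


def size_of(columns: Dict[str, List[Any]]) -> int:
    """Return the number of elements, assumed here to be rows * columns."""
    n_rows, n_columns = shape_of(columns)
    return n_rows * n_columns


flat = {"col1": [1, 2], "col2": [3, 4]}
flat_shape = shape_of(flat)
flat_size = size_of(flat)
```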
- keys(self)
Return an iterator of the column names in the DataFrame.
- Returns
The column name iterator.
- Return type
Iterator[str]
- items(self)
Return an iterator of the column names and the columns in the DataFrame.
- Yields
The column name and the column.
- Return type
Iterator[Tuple[str, graviti.dataframe.container.Container]]
- head(self, n=5)
Return the first n rows.
- Parameters
n (int) – Number of rows to select.
self (_T) –
- Returns
The first n rows.
- Return type
_T
Examples
>>> df = DataFrame(
...     [
...         {"animal": "alligator"},
...         {"animal": "bee"},
...         {"animal": "falcon"},
...         {"animal": "lion"},
...         {"animal": "monkey"},
...         {"animal": "parrot"},
...         {"animal": "shark"},
...         {"animal": "whale"},
...         {"animal": "zebra"},
...     ]
... )
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra
Viewing the first n lines (three in this case)
>>> df.head(3)
      animal
0  alligator
1        bee
2     falcon
For negative values of n, all rows except the last |n| are returned
>>> df.head(-3)
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
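The positive and negative cases shown above line up with ordinary Python slicing. The following is a minimal sketch on a plain list, not graviti's implementation; `head_sketch` is a hypothetical helper name.

```python
from typing import List


def head_sketch(rows: List[str], n: int = 5) -> List[str]:
    """Mimic head(n) on a plain list of rows."""
    # rows[:n] keeps the first n rows; for negative n it drops the last |n| rows.
    return rows[:n]


animals = [
    "alligator", "bee", "falcon", "lion", "monkey",
    "parrot", "shark", "whale", "zebra",
]
first_three = head_sketch(animals, 3)
all_but_last_three = head_sketch(animals, -3)
```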
- tail(self, n=5)
Return the last n rows.
- Parameters
n (int) – Number of rows to select.
self (_T) –
- Returns
The last n rows.
- Return type
_T
Examples
>>> df = DataFrame(
...     [
...         {"animal": "alligator"},
...         {"animal": "bee"},
...         {"animal": "falcon"},
...         {"animal": "lion"},
...         {"animal": "monkey"},
...         {"animal": "parrot"},
...         {"animal": "shark"},
...         {"animal": "whale"},
...         {"animal": "zebra"},
...     ]
... )
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra
Viewing the last 5 lines
>>> df.tail()
   animal
0  monkey
1  parrot
2   shark
3   whale
4   zebra
Viewing the last n lines (three in this case)
>>> df.tail(3)
  animal
0  shark
1  whale
2  zebra
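In the outputs above the selected rows are numbered from 0 again rather than keeping their original positions. A plain-list sketch of that behaviour follows; `tail_sketch` is a hypothetical helper, and it only covers non-negative n because this section does not show a negative case for tail.

```python
from typing import List, Tuple


def tail_sketch(rows: List[str], n: int = 5) -> List[Tuple[int, str]]:
    """Return the last n rows paired with a fresh 0-based index."""
    if n < 0:
        raise ValueError("this sketch only covers non-negative n")
    # Computing the start explicitly avoids the rows[-0:] pitfall when n == 0.
    start = max(len(rows) - n, 0)
    return list(enumerate(rows[start:]))


animals = [
    "alligator", "bee", "falcon", "lion", "monkey",
    "parrot", "shark", "whale", "zebra",
]
last_five = tail_sketch(animals)
last_three = tail_sketch(animals, 3)
```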
- extend(self, values)
Extend this DataFrame row by row with a sequence object or another DataFrame.
- Parameters
values (Union[Iterable[Dict[str, Any]], DataFrame]) – A sequence object or DataFrame.
- Raises
TypeError – When this DataFrame is a member of another DataFrame.
TypeError – When the schema of the given DataFrame does not match the schema of this DataFrame.
- Return type
None
Examples
>>> df = DataFrame([
...     {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...     {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
... ])
Extended by another list.
>>> df.extend([{"filename": "c.jpg", "box2ds": {"x": 3, "y": 3}}])
>>> df
    filename box2ds
             x      y
0   a.jpg    1      1
1   b.jpg    2      2
2   c.jpg    3      3
Extended by another DataFrame (starting again from the original two-row df).
>>> df2 = DataFrame([{"filename": "d.jpg", "box2ds": {"x": 4, "y": 4}}])
>>> df.extend(df2)
>>> df
    filename box2ds
             x      y
0   a.jpg    1      1
1   b.jpg    2      2
2   d.jpg    4      4
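The in-place, row-by-row behaviour and the schema-mismatch TypeError can be approximated on plain lists of dicts. This is only a sketch: `extend_rows` is a hypothetical helper, and comparing top-level keys is an assumed stand-in for graviti's real schema comparison.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def extend_rows(rows: List[Row], values: Iterable[Row]) -> None:
    """Append rows in place, rejecting rows whose top-level keys differ."""
    expected = set(rows[0]) if rows else None
    new_rows = list(values)
    # Validate everything first so a mismatch leaves `rows` unchanged.
    for row in new_rows:
        if expected is not None and set(row) != expected:
            raise TypeError(
                f"row keys {sorted(row)} do not match {sorted(expected)}"
            )
    rows.extend(new_rows)


rows = [
    {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
    {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
]
extend_rows(rows, [{"filename": "c.jpg", "box2ds": {"x": 3, "y": 3}}])

try:
    extend_rows(rows, [{"name": "d.jpg"}])
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
```

Like `extend` itself, the helper returns None and mutates its first argument.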
- to_pylist(self, *, _to_backend=False)
Convert the DataFrame to a python list.
- Returns
The python list representing the DataFrame.
- Parameters
_to_backend (bool) –
- Return type
List[Dict[str, Any]]
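The List[Dict[str, Any]] return type means one dict per row, keyed by column name. A plain-Python sketch of that column-to-row conversion follows; `columns_to_pylist` is a hypothetical helper, and the handling of nested columns is not covered because this section does not specify it.

```python
from typing import Any, Dict, List


def columns_to_pylist(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn a name -> values mapping into a list with one dict per row."""
    names = list(columns)
    # zip(*values) walks all columns in lockstep, yielding one tuple per row.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


records = columns_to_pylist({"col1": [1, 2], "col2": [3, 4]})
```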
- to_pandas(self)
Convert the graviti DataFrame to a pandas DataFrame.
- Returns
The converted pandas DataFrame.
- Return type
pandas.DataFrame
- query(self, func)
Query the columns of a DataFrame with a lambda function.
- Parameters
func – The function used to query the DataFrame.
- Returns
The query result DataFrame.
- Raises
TypeError – When the DataFrame is not in a Commit.
- Return type
Examples
>>> df = DataFrame([
...     {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...     {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
... ])
>>> df.query(lambda x: x["filename"] == "a.jpg")
    filename box2ds
             x      y
0   a.jpg    1      1
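The effect of the query in the example above, keeping only rows for which the function is true, can be sketched with a list comprehension. This is an assumption-laden illustration: `query_rows` is a hypothetical helper that evaluates the function row by row, whereas the real method may evaluate it over columns and raises TypeError unless the DataFrame is in a Commit.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def query_rows(rows: List[Row], func: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows for which func returns a truthy value."""
    return [row for row in rows if func(row)]


rows = [
    {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
    {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
]
matches = query_rows(rows, lambda x: x["filename"] == "a.jpg")
```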
- apply(self, func)
Apply a function to the DataFrame row by row.
- Parameters
func (Callable[[Any], Any]) – Function to apply to each row.
- Returns
The apply result DataFrame or Series.
- Raises
TypeError – When the DataFrame is not in a Commit.
- Return type
Examples
>>> df = DataFrame([
...     {"filename": "a.jpg", "box2ds": {"x": 1, "y": 1}},
...     {"filename": "b.jpg", "box2ds": {"x": 2, "y": 2}},
... ])
>>> df.apply(lambda x: x["box2ds"]["x"] + 1)
    filename box2ds
             x      y
0   a.jpg    2      1
1   b.jpg    3      2