graviti.utility.paging
Classes related to paging lists.
Module Contents
Classes
Offsets | The offsets manager of the paging list.
Page | Page is an array wrapper and represents a page in paging list.
SlicedPage | SlicedPage is an array wrapper and represents a sliced page in paging list.
LazyPage | LazyPage is a placeholder when the paging list page is not loaded yet.
LazySlicedPage | LazySlicedPage is a placeholder when the sliced paging list page is not loaded yet.
PagingList | PagingList is a list composed of multiple lists (pages).
LazyFactory | LazyFactory is a factory for creating paging lists.
Attributes
- class graviti.utility.paging.Offsets(total_count, limit)
The offsets manager of the paging list.
- Parameters
total_count (int) – The total count of the elements in the paging list.
limit (int) – The size of each page.
- update(self, start, stop, lengths)
Update the offsets when setting or deleting paging list items.
- Parameters
start (int) – The start index.
stop (int) – The stop index.
lengths (Iterable[int]) – The length of the set values.
- Return type
None
- get_coordinate(self, index)
Get the page coordinate of an element.
- Parameters
index (int) – The index of the element in the paging list.
- Returns
The page number and the index of the element within that page.
- Return type
Tuple[int, int]
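To make the idea of a page coordinate concrete, here is a minimal sketch, not graviti's implementation, that maps a global index to a (page number, index within the page) pair. The helper name `locate` and the list of per-page lengths are assumptions introduced for illustration only.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def locate(page_lengths: List[int], index: int) -> Tuple[int, int]:
    """Map a global index to (page number, index within that page).

    Hypothetical helper: `page_lengths` holds the length of each page.
    """
    if not 0 <= index < sum(page_lengths):
        raise IndexError("index out of range")
    # Cumulative end offsets, e.g. [128, 256, 300] for pages of 128, 128, 44.
    ends = list(accumulate(page_lengths))
    # The first page whose end offset is greater than `index` contains it.
    page = bisect_right(ends, index)
    page_start = ends[page - 1] if page else 0
    return page, index - page_start
```

For pages of lengths 128, 128, and 44, index 128 falls on the first element of the second page, so `locate([128, 128, 44], 128)` gives `(1, 0)`.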
- class graviti.utility.paging.Page(array)
Page is an array wrapper and represents a page in paging list.
- Parameters
array (pyarrow.Array) – The pyarrow array.
- get_item(self, index)
Return the item at the given index.
- Parameters
index (int) – The position of the item in the page.
- Returns
The item at the given index.
- Return type
- get_slice(self, start=None, stop=None, step=None)
Return a sliced page according to the given start and stop index.
- Parameters
start (Optional[int]) – The start index.
stop (Optional[int]) – The stop index.
step (Optional[int]) – The slice step.
- Returns
A sliced page according to the given start and stop index.
- Return type
- class graviti.utility.paging.SlicedPage(ranging, array)
Bases:
Page
SlicedPage is an array wrapper and represents a sliced page in paging list.
- Parameters
ranging (range) – The range instance of this page.
array (pyarrow.Array) – The pyarrow array.
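The `ranging` parameter is a `range`. One plausible way to turn `start`, `stop`, and `step` into such a range is Python's built-in `slice.indices`. The sketch below is an assumption about the approach, uses a plain list in place of a pyarrow array, and `to_range` is a hypothetical helper that is not part of graviti.

```python
from typing import Optional


def to_range(
    length: int,
    start: Optional[int] = None,
    stop: Optional[int] = None,
    step: Optional[int] = None,
) -> range:
    """Normalize slice arguments into a concrete range over `length` items."""
    # slice.indices clamps the bounds and fills in defaults for None values.
    return range(*slice(start, stop, step).indices(length))


page = ["a", "b", "c", "d", "e"]      # stand-in for the wrapped array
ranging = to_range(len(page), 1, None, 2)
sliced = [page[i] for i in ranging]   # picks the items at positions 1 and 3
```

Storing a `range` rather than copying the selected items means a sliced page can keep referring to the same underlying array.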
- class graviti.utility.paging.LazyPage(ranging, pos, array_getter)
Bases:
Page
LazyPage is a placeholder when the paging list page is not loaded yet.
- Parameters
pos (int) – The page number.
ranging (range) – The range instance of this page.
parent – The parent paging list.
array_getter (Callable[[int], pyarrow.Array]) –
- get_slice(self, start=None, stop=None, step=None)
Return a lazy sliced page according to the given start and stop index.
- Parameters
start (Optional[int]) – The start index.
stop (Optional[int]) – The stop index.
step (Optional[int]) – The slice step.
- Returns
A lazy sliced page according to the given start and stop index.
- Return type
- class graviti.utility.paging.LazySlicedPage(ranging, pos, array_getter)
Bases:
LazyPage
LazySlicedPage is a placeholder when the sliced paging list page is not loaded yet.
- Parameters
ranging (range) –
pos (int) –
array_getter (Callable[[int], pyarrow.Array]) –
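The placeholder idea behind the lazy pages can be illustrated with a small sketch. It is not graviti's code: `LazyPageSketch` and `fake_getter` are hypothetical names, plain lists stand in for pyarrow arrays, and the caching behavior shown is an assumption about how a lazy page might work.

```python
from typing import Callable, List, Optional


class LazyPageSketch:
    """Placeholder that fetches its data on first access (illustrative only)."""

    def __init__(
        self, ranging: range, pos: int, array_getter: Callable[[int], List[int]]
    ) -> None:
        self._ranging = ranging
        self._pos = pos
        self._array_getter = array_getter
        self._array: Optional[List[int]] = None  # nothing loaded yet

    def __len__(self) -> int:
        # The length is known from the range without loading any data.
        return len(self._ranging)

    def get_item(self, index: int) -> int:
        if self._array is None:
            # First access: ask the getter for the page at position `pos`.
            self._array = self._array_getter(self._pos)
        return self._array[self._ranging[index]]


calls: List[int] = []  # records which page numbers were actually fetched


def fake_getter(pos: int) -> List[int]:
    calls.append(pos)
    return [pos * 10 + i for i in range(4)]


page = LazyPageSketch(range(0, 4, 2), pos=3, array_getter=fake_getter)
```

In this sketch, calling `len(page)` does not trigger a fetch, and repeated `get_item` calls reuse the array loaded on the first access.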
- class graviti.utility.paging.PagingList(array)
PagingList is a list composed of multiple lists (pages).
- Parameters
array (pyarrow.Array) – The input pyarrow array.
- classmethod from_factory(cls, factory, keys, patype)
Create PagingList from LazyFactory.
- Parameters
factory (LazyFactory) – The parent LazyFactory instance.
keys (Tuple[str, Ellipsis]) – The keys to access the array from factory.
patype (pyarrow.DataType) – The pyarrow DataType of the elements in the list.
cls (Type[_P]) –
- Returns
The PagingList instance created from given factory.
- Return type
_P
- set_item(self, index, value)
Update the element value in PagingList at the given index.
- Parameters
index (int) – The element index.
value (Any) – The value to set in the PagingList.
- Return type
None
- set_slice(self, index, values)
Update the element values in PagingList at the given slice with another PagingList.
- Parameters
index (slice) – The element slice.
values (PagingList) – The PagingList which contains the elements to be set.
- Raises
ArrowTypeError – When the two pyarrow types do not match.
ValueError – When step != 1 and the input size does not match the slice size.
- Return type
None
- set_slice_iterable(self, index, values)
Update the element values in PagingList at the given slice with an iterable object.
- Parameters
index (slice) – The element slice.
values (Iterable[Any]) – The iterable object which contains the elements to be set.
- Raises
ValueError – When step != 1 and the input size does not match the slice size.
- Return type
None
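The size rule for slices with step != 1 is the same rule that Python's built-in `list` enforces, so it can be demonstrated without graviti. The sketch below uses only a plain list; whether PagingList matches `list` in every other respect is not stated on this page.

```python
values = list(range(10))

# With step == 1 the replacement may have a different length than the slice.
values[2:4] = [20, 30, 40]

# With step != 1 the number of new values must equal the slice size.
try:
    values[::2] = [0, 1]
    mismatch_error = None
except ValueError as error:
    mismatch_error = error
```

After the first assignment the list has grown by one element, while the second assignment is rejected and leaves the list unchanged.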
- extend(self, values)
Extend PagingList by appending elements from another PagingList.
- Parameters
values (PagingList) – The PagingList which contains the elements to append.
- Raises
ArrowTypeError – When the two pyarrow types do not match.
- Return type
None
- extend_iterable(self, values)
Extend PagingList by appending elements from the iterable.
- Parameters
values (Iterable[Any]) – The elements to append to the PagingList.
- Return type
None
- class graviti.utility.paging.LazyFactory(total_count, limit, getter, patype)
LazyFactory is a factory for creating paging lists.
- Parameters
total_count (int) – The total count of the elements in the paging lists.
limit (int) – The size of each lazy load page.
getter (Callable[[int, int], Any]) – A callable object to get the source data.
patype (pyarrow.DataType) – The pyarrow DataType of the data in the factory.
Examples
>>> from typing import Any, Dict, List
>>> import pyarrow as pa
>>> from graviti.utility.paging import LazyFactory
>>> patype = pa.struct(
...     {
...         "remotePath": pa.string(),
...         "label": pa.struct({"CLASSIFICATION": pa.struct({"category": pa.string()})}),
...     }
... )
>>> TOTAL_COUNT = 1000
>>> def getter(offset: int, limit: int) -> List[Dict[str, Any]]:
...     stop = min(offset + limit, TOTAL_COUNT)
...     return [
...         {
...             "remotePath": f"{i:06}.jpg",
...             "label": {"CLASSIFICATION": {"category": "cat" if i % 2 else "dog"}},
...         }
...         for i in range(offset, stop)
...     ]
...
>>> factory = LazyFactory(TOTAL_COUNT, 128, getter, patype)
>>> paths = factory.create_list(("remotePath",))
>>> categories = factory.create_list(("label", "CLASSIFICATION", "category"))
>>> len(paths)
1000
>>> list(paths)
[<pyarrow.StringScalar: '000000.jpg'>,
 <pyarrow.StringScalar: '000001.jpg'>,
 <pyarrow.StringScalar: '000002.jpg'>,
 <pyarrow.StringScalar: '000003.jpg'>,
 <pyarrow.StringScalar: '000004.jpg'>,
 <pyarrow.StringScalar: '000005.jpg'>,
 ...
 ...
 <pyarrow.StringScalar: '000999.jpg'>]
>>> len(categories)
1000
>>> list(categories)
[<pyarrow.StringScalar: 'dog'>,
 <pyarrow.StringScalar: 'cat'>,
 <pyarrow.StringScalar: 'dog'>,
 <pyarrow.StringScalar: 'cat'>,
 <pyarrow.StringScalar: 'dog'>,
 ...
 ...
 <pyarrow.StringScalar: 'cat'>]
- get_array(self, pos, keys)
Get the array from the factory.
- Parameters
pos (int) – The page number.
keys (Tuple[str, Ellipsis]) – The keys to access the array from factory.
- Returns
The requested pyarrow array.
- Return type
pyarrow.Array
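As a rough illustration of how a page number might relate to the `getter(offset, limit)` callable, the sketch below assumes that page `pos` starts at element `pos * limit`. That mapping is an assumption, not something this page confirms, and `fetch_page` is a hypothetical helper. The getter is a simplified version of the one in the LazyFactory example.

```python
from math import ceil
from typing import Any, Callable, Dict, List

TOTAL_COUNT = 1000
LIMIT = 128


def getter(offset: int, limit: int) -> List[Dict[str, Any]]:
    # Return at most `limit` records, never reading past TOTAL_COUNT.
    stop = min(offset + limit, TOTAL_COUNT)
    return [{"remotePath": f"{i:06}.jpg"} for i in range(offset, stop)]


def fetch_page(source: Callable[[int, int], Any], pos: int, limit: int) -> Any:
    # Assumption: page `pos` begins at element `pos * limit`.
    return source(pos * limit, limit)


page_count = ceil(TOTAL_COUNT / LIMIT)
last_page = fetch_page(getter, page_count - 1, LIMIT)
```

Under this assumption, 1000 elements with a page size of 128 need 8 pages, and the last page is shorter than the others because the getter stops at `TOTAL_COUNT`.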
- create_list(self, keys)
Create a paging list from the factory.
- Parameters
keys (Tuple[str, Ellipsis]) – The keys to access the array from factory.
- Returns
A paging list created by the given keys.
- Return type