graviti.utility.paging#

Paging list related classes.

Module Contents#

Classes#

Offsets

The offsets manager of the paging list.

Page

Page is an array wrapper and represents a page in a paging list.

SlicedPage

SlicedPage is an array wrapper and represents a sliced page in a paging list.

LazyPage

LazyPage is a placeholder when the paging list page is not loaded yet.

LazySlicedPage

LazySlicedPage is a placeholder when the sliced paging list page is not loaded yet.

PagingList

PagingList is a list composed of multiple lists (pages).

LazyFactory

LazyFactory is a factory for creating paging lists.

Attributes#

class graviti.utility.paging.Offsets(total_count, limit)[source]#

The offsets manager of the paging list.

Parameters
  • total_count (int) – The total count of the elements in the paging list.

  • limit (int) – The size of each page.
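The relation between total_count and limit can be illustrated with a minimal sketch. It assumes every page holds limit elements except possibly the last one; page_lengths is a hypothetical helper for illustration, not part of graviti.

```python
from typing import List


def page_lengths(total_count: int, limit: int) -> List[int]:
    """Return the length of each page (hypothetical helper, not part of graviti)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Assumption: every page is full except possibly the last one.
    full_pages, remainder = divmod(total_count, limit)
    lengths = [limit] * full_pages
    if remainder:
        lengths.append(remainder)
    return lengths


lengths = page_lengths(1000, 128)
print(len(lengths), lengths[-1])  # 8 104
```

With 1000 elements and a page size of 128, this gives seven full pages and a final page of 104 elements.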

update(self, start, stop, lengths)[source]#

Update the offsets when setting or deleting paging list items.

Parameters
  • start (int) – The start index.

  • stop (int) – The stop index.

  • lengths (Iterable[int]) – The lengths of the set values.

Return type

None

get_coordinate(self, index)[source]#

Get the page coordinate of the element.

Parameters

index (int) – The index of the element in the paging list.

Returns

The page number and the index of the element within that page.

Return type

Tuple[int, int]
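One common way to map a flat index to a page coordinate is a binary search over cumulative page lengths. The sketch below shows that technique only; it is not taken from the graviti source, and locate is a hypothetical name.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def locate(page_lengths: List[int], index: int) -> Tuple[int, int]:
    """Map a flat index to (page number, index within the page).

    Hypothetical helper for illustration, not the graviti implementation.
    """
    # stops[i] is the flat index one past the last element of page i.
    stops = list(accumulate(page_lengths))
    total = stops[-1] if stops else 0
    if not 0 <= index < total:
        raise IndexError("index out of range")
    # The first page whose stop is greater than index contains the element.
    page_number = bisect_right(stops, index)
    page_start = stops[page_number - 1] if page_number else 0
    return page_number, index - page_start


print(locate([128, 128, 104], 200))  # (1, 72)
```

Because the search skips any stop equal to the index, empty pages are never returned as a coordinate.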

extend(self, lengths)[source]#

Update the offsets when extending the paging list.

Parameters

lengths (Iterable[int]) – The lengths of the extended pages.

Return type

None

copy(self)[source]#

Return a copy of the Offsets.

Returns

A copy of the Offsets.

Parameters

self (_O) –

Return type

_O

class graviti.utility.paging.Page(array)[source]#

Page is an array wrapper and represents a page in a paging list.

Parameters

array (pyarrow.Array) – The pyarrow array.

get_item(self, index)[source]#

Return the item at the given index.

Parameters

index (int) – The position of the item in the page.

Returns

The item at the given index.

Return type

Any

get_slice(self, start=None, stop=None, step=None)[source]#

Return a sliced page according to the given start and stop index.

Parameters
  • start (Optional[int]) – The start index.

  • stop (Optional[int]) – The stop index.

  • step (Optional[int]) – The slice step.

Returns

A sliced page according to the given start and stop index.

Return type

Page

get_array(self)[source]#

Get the array inside the page.

Returns

The array inside the page.

Return type

pyarrow.Array

class graviti.utility.paging.SlicedPage(ranging, array)[source]#

Bases: Page

SlicedPage is an array wrapper and represents a sliced page in a paging list.

Parameters
  • ranging (range) – The range instance of this page.

  • array (pyarrow.Array) – The pyarrow array.
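The pairing of a range with an array can be pictured with plain Python objects. This is a minimal sketch that uses a list in place of a pyarrow array; slice_range and take are hypothetical helpers, and the actual SlicedPage internals may differ.

```python
from typing import Any, List, Optional, Sequence


def slice_range(
    length: int,
    start: Optional[int] = None,
    stop: Optional[int] = None,
    step: Optional[int] = None,
) -> range:
    """Normalize optional slice bounds into a concrete range (hypothetical helper)."""
    return range(length)[start:stop:step]


def take(array: Sequence[Any], ranging: range) -> List[Any]:
    """Pick the elements of array selected by ranging (hypothetical helper)."""
    return [array[i] for i in ranging]


data = ["a", "b", "c", "d", "e", "f"]  # stands in for a pyarrow array
ranging = slice_range(len(data), 1, None, 2)
print(take(data, ranging))  # ['b', 'd', 'f']

# A range can be sliced again without touching the underlying data.
print(list(ranging[1:]))  # [3, 5]
```

Keeping only a range next to the original array means repeated slicing does not copy any data until the elements are actually needed.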

get_array(self)[source]#

Get the array inside the page.

Returns

The array inside the page.

Return type

pyarrow.Array

class graviti.utility.paging.LazyPage(ranging, pos, array_getter)[source]#

Bases: Page

LazyPage is a placeholder when the paging list page is not loaded yet.

Parameters
  • ranging (range) – The range instance of this page.

  • pos (int) – The page number.

  • array_getter (Callable[[int], pyarrow.Array]) – A callable which takes the page number and returns the pyarrow array of that page.
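The placeholder idea can be sketched as a small class that defers the call to array_getter until the data is first requested. This is an illustration only: LazyPageSketch and fake_getter are hypothetical, a list stands in for the pyarrow array, and caching the loaded result is an assumption rather than documented behavior.

```python
from typing import Any, Callable, List, Optional


class LazyPageSketch:
    """Placeholder that loads its data on first access (illustration only)."""

    def __init__(self, pos: int, array_getter: Callable[[int], List[Any]]) -> None:
        self._pos = pos
        self._array_getter = array_getter
        self._array: Optional[List[Any]] = None

    def get_array(self) -> List[Any]:
        # Assumption: fetch the page once, then reuse the cached result.
        if self._array is None:
            self._array = self._array_getter(self._pos)
        return self._array


calls: List[int] = []


def fake_getter(pos: int) -> List[int]:
    """Hypothetical data source returning four integers per page."""
    calls.append(pos)
    return list(range(pos * 4, pos * 4 + 4))


page = LazyPageSketch(2, fake_getter)
print(calls)             # [] (nothing loaded yet)
print(page.get_array())  # [8, 9, 10, 11]
page.get_array()
print(calls)             # [2] (the getter ran only once)
```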

get_slice(self, start=None, stop=None, step=None)[source]#

Return a lazy sliced page according to the given start and stop index.

Parameters
  • start (Optional[int]) – The start index.

  • stop (Optional[int]) – The stop index.

  • step (Optional[int]) – The slice step.

Returns

A lazy sliced page according to the given start and stop index.

Return type

LazySlicedPage

get_array(self)[source]#

Get the array inside the page.

Returns

The array inside the page.

Return type

pyarrow.Array

class graviti.utility.paging.LazySlicedPage(ranging, pos, array_getter)[source]#

Bases: LazyPage

LazySlicedPage is a placeholder when the sliced paging list page is not loaded yet.

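Parameters
  • ranging (range) – The range instance of this page.

  • pos (int) – The page number.

  • array_getter (Callable[[int], pyarrow.Array]) – A callable which takes the page number and returns the pyarrow array of that page.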

get_array(self)[source]#

Get the array inside the page.

Returns

The array inside the page.

Return type

pyarrow.Array

class graviti.utility.paging.PagingList(array)[source]#

PagingList is a list composed of multiple lists (pages).

Parameters

array (pyarrow.Array) – The input pyarrow array.

classmethod from_factory(cls, factory, keys, patype)[source]#

Create PagingList from LazyFactory.

Parameters
  • factory (LazyFactory) – The parent LazyFactory instance.

  • keys (Tuple[str, Ellipsis]) – The keys to access the array from factory.

  • patype (pyarrow.DataType) – The pyarrow DataType of the elements in the list.

  • cls (Type[_P]) –

Returns

The PagingList instance created from given factory.

Return type

_P

set_item(self, index, value)[source]#

Update the element value in PagingList at the given index.

Parameters
  • index (int) – The element index.

  • value (Any) – The value to be set into the PagingList.

Return type

None

set_slice(self, index, values)[source]#

Update the element values in PagingList at the given slice with another PagingList.

Parameters
  • index (slice) – The element slice.

  • values (PagingList) – The PagingList which contains the elements to be set.

Raises
  • ArrowTypeError – When two pyarrow types mismatch.

  • ValueError – When the input size does not match the slice size (when step != 1).

Return type

None
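The step != 1 condition in the Raises section matches the rule for built-in Python lists, where an extended slice can only be assigned a sequence of the same size. The sketch below shows that rule on a plain list; that PagingList follows it exactly is an assumption based on the description above.

```python
values = list(range(10))

# With step == 1 the replacement may have a different length.
values[2:5] = ["a", "b"]
print(values)  # [0, 1, 'a', 'b', 5, 6, 7, 8, 9]

# With step != 1 the sizes must match, otherwise ValueError is raised.
raised = False
try:
    values[::2] = ["x"]  # the slice selects 5 elements, but only 1 is given
except ValueError as error:
    raised = True
    print(f"ValueError: {error}")
```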

set_slice_iterable(self, index, values)[source]#

Update the element values in PagingList at the given slice with iterable object.

Parameters
  • index (slice) – The element slice.

  • values (Iterable[Any]) – The iterable object which contains the elements to be set.

Raises

ValueError – When the input size does not match the slice size (when step != 1).

Return type

None

extend(self, values)[source]#

Extend PagingList by appending elements from another PagingList.

Parameters

values (PagingList) – The PagingList which contains the elements to be extended.

Raises

ArrowTypeError – When two pyarrow types mismatch.

Return type

None

extend_iterable(self, values)[source]#

Extend PagingList by appending elements from the iterable.

Parameters

values (Iterable[Any]) – Elements to be extended into the PagingList.

Return type

None

copy(self)[source]#

Return a copy of the paging list.

Returns

A copy of the paging list.

Parameters

self (_P) –

Return type

_P

to_pyarrow(self)[source]#

Convert the paging list to pyarrow ChunkedArray.

Returns

The pyarrow ChunkedArray.

Return type

pyarrow.ChunkedArray

graviti.utility.paging.PagingLists[source]#

class graviti.utility.paging.LazyFactory(total_count, limit, getter, patype)[source]#

LazyFactory is a factory for creating paging lists.

Parameters
  • total_count (int) – The total count of the elements in the paging lists.

  • limit (int) – The size of each lazy load page.

  • getter (Callable[[int, int], Any]) – A callable object to get the source data.

  • patype (pyarrow.DataType) – The pyarrow DataType of the data in the factory.

Examples

>>> from typing import Any, Dict, List
>>> import pyarrow as pa
>>> from graviti.utility.paging import LazyFactory
>>> patype = pa.struct(
...     {
...         "remotePath": pa.string(),
...         "label": pa.struct({"CLASSIFICATION": pa.struct({"category": pa.string()})}),
...     }
... )
>>> TOTAL_COUNT = 1000
>>> def getter(offset: int, limit: int) -> List[Dict[str, Any]]:
...     stop = min(offset + limit, TOTAL_COUNT)
...     return [
...         {
...             "remotePath": f"{i:06}.jpg",
...             "label": {"CLASSIFICATION": {"category": "cat" if i % 2 else "dog"}},
...         }
...         for i in range(offset, stop)
...     ]
...
>>> factory = LazyFactory(TOTAL_COUNT, 128, getter, patype)
>>> paths = factory.create_list(("remotePath",))
>>> categories = factory.create_list(("label", "CLASSIFICATION", "category"))
>>> len(paths)
1000
>>> list(paths)
[<pyarrow.StringScalar: '000000.jpg'>,
 <pyarrow.StringScalar: '000001.jpg'>,
 <pyarrow.StringScalar: '000002.jpg'>,
 <pyarrow.StringScalar: '000003.jpg'>,
 <pyarrow.StringScalar: '000004.jpg'>,
 <pyarrow.StringScalar: '000005.jpg'>,
 ...
 ...
 <pyarrow.StringScalar: '000999.jpg'>]
>>> len(categories)
1000
>>> list(categories)
[<pyarrow.StringScalar: 'dog'>,
 <pyarrow.StringScalar: 'cat'>,
 <pyarrow.StringScalar: 'dog'>,
 <pyarrow.StringScalar: 'cat'>,
 <pyarrow.StringScalar: 'dog'>,
 ...
 ...
 <pyarrow.StringScalar: 'cat'>]

get_array(self, pos, keys)[source]#

Get the array from the factory.

Parameters
  • pos (int) – The page number.

  • keys (Tuple[str, Ellipsis]) – The keys to access the array from factory.

Returns

The requested pyarrow array.

Return type

pyarrow.Array

create_list(self, keys)[source]#

Create a paging list from the factory.

Parameters

keys (Tuple[str, Ellipsis]) – The keys to access the array from factory.

Returns

A paging list created by the given keys.

Return type

PagingList
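The keys tuple can be read as a path through nested records, as in the ("label", "CLASSIFICATION", "category") example above. The following sketch shows that lookup on plain dicts; extract_column is a hypothetical helper, and the factory itself works on pyarrow data rather than Python dicts.

```python
from typing import Any, Dict, List, Tuple


def extract_column(records: List[Dict[str, Any]], keys: Tuple[str, ...]) -> List[Any]:
    """Follow keys through each nested record (hypothetical helper)."""
    column = []
    for record in records:
        value: Any = record
        # Descend one nesting level per key.
        for key in keys:
            value = value[key]
        column.append(value)
    return column


records = [
    {"remotePath": "000000.jpg", "label": {"CLASSIFICATION": {"category": "dog"}}},
    {"remotePath": "000001.jpg", "label": {"CLASSIFICATION": {"category": "cat"}}},
]
print(extract_column(records, ("remotePath",)))  # ['000000.jpg', '000001.jpg']
print(extract_column(records, ("label", "CLASSIFICATION", "category")))  # ['dog', 'cat']
```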

create_lists(self, keys)[source]#

Create a dict of PagingList from the given keys.

Parameters

keys (List[Tuple[str, Ellipsis]]) – A list of keys to create the paging lists.

Returns

The created paging lists.

Return type

PagingLists

get_page_ranges(self)[source]#

Generate the ranges of the pages in the factory.

Yields

The page ranges.

Return type

Iterator[range]
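A generator of page ranges could look like the sketch below. It assumes pages of size limit with a shorter final page; page_ranges is a hypothetical stand-alone function, not the method itself.

```python
from typing import Iterator


def page_ranges(total_count: int, limit: int) -> Iterator[range]:
    """Yield the index range covered by each page (hypothetical helper)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for start in range(0, total_count, limit):
        # The last page stops at total_count instead of start + limit.
        yield range(start, min(start + limit, total_count))


print(list(page_ranges(10, 4)))  # [range(0, 4), range(4, 8), range(8, 10)]
```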