
Data lake series

DataLakeSeries

Bases: Resource

Implementation of a resource for data lake series. This resource defines the data model used by its resource container (model.container.DataLakeMeasures). It inherits from Pydantic's BaseModel to get all its superpowers, which are used to parse and validate the API response and to easily switch between the Python representation (both serialized and deserialized) and the Java representation (serialized only).
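Switching between the Python and the Java representation largely comes down to renaming fields: Python uses snake_case, while the JSON produced on the Java side uses camelCase (for example "allDataSeries"). Pydantic handles this through field aliases; the stdlib sketch below only mimics that renaming step. The helpers to_camel_case and to_java_representation are hypothetical illustrations, not part of the client, and the sketch renames top-level keys only.

```python
from typing import Any, Dict


def to_camel_case(snake: str) -> str:
    """Convert a Python-style snake_case field name to Java-style camelCase."""
    head, *tail = snake.split("_")
    return head + "".join(part.capitalize() for part in tail)


def to_java_representation(python_repr: Dict[str, Any]) -> Dict[str, Any]:
    """Rename the top-level keys of a Python-side dict to camelCase (shallow only)."""
    return {to_camel_case(key): value for key, value in python_repr.items()}


print(to_java_representation({"all_data_series": [], "total": 1}))
# {'allDataSeries': [], 'total': 1}
```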

Notes

This class will only exist temporarily in its current form, since there are some inconsistencies in the StreamPipes API.

convert_to_pandas_representation()

Returns the dictionary representation of a data lake series to be used when creating a pandas DataFrame.

It contains only the "headers" (the column names) and the "rows" that hold the actual data.

RETURNS DESCRIPTION
pandas_repr

Dictionary with the keys "headers" and "rows"

TYPE: dict[str, Any]
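The following sketch shows what this reduction might look like on a plain dict. It assumes, as a guess about the response layout, that the parsed series has a top-level "headers" list and an "allDataSeries" list whose single entry carries the "rows"; the standalone function is a hypothetical stand-in for the method, and the example values are made up.

```python
from typing import Any, Dict, List


def convert_to_pandas_representation(series: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the column names and the data rows of a single-series response.

    Assumed layout: top-level "headers" plus "allDataSeries"[0]["rows"].
    """
    headers: List[str] = series["headers"]
    rows: List[List[Any]] = series["allDataSeries"][0]["rows"]
    return {"headers": headers, "rows": rows}


example = {
    "total": 1,
    "headers": ["time", "temperature"],
    "allDataSeries": [{"total": 2, "rows": [[1700000000000, 21.5], [1700000001000, 21.7]]}],
}
result = convert_to_pandas_representation(example)
print(result)
# {'headers': ['time', 'temperature'], 'rows': [[1700000000000, 21.5], [1700000001000, 21.7]]}
```

Everything else in the response (totals, query status, and so on) is dropped, since a DataFrame only needs column names and row values.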

from_json(json_string) classmethod

Creates an instance of DataLakeSeries from a given JSON string.

This method is used by the resource container to parse the JSON response of the StreamPipes API. Currently, it only supports data lake series that consist of exactly one series of data.

PARAMETER DESCRIPTION
json_string

The JSON string from which the data lake series is created.

TYPE: str

RETURNS DESCRIPTION
DataLakeSeries

Instance of DataLakeSeries that is created based on the given JSON string.

RAISES DESCRIPTION
StreamPipesUnsupportedDataLakeSeries

If the data lake series returned by the StreamPipes API cannot be parsed with the current version of the Python client.
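The "exactly one series" restriction can be sketched with the standard json module. This is not the client's implementation: parse_data_lake_series is a hypothetical helper that returns a plain dict rather than a DataLakeSeries instance, and the "allDataSeries" key is an assumption about the response layout. Only the exception name is taken from this page.

```python
import json
from typing import Any, Dict


class StreamPipesUnsupportedDataLakeSeries(Exception):
    """Raised when a data lake series cannot be parsed by this sketch."""


def parse_data_lake_series(json_string: str) -> Dict[str, Any]:
    """Parse a JSON response and accept it only if it holds exactly one series."""
    parsed = json.loads(json_string)
    # Assumed key: the list of series in the API response.
    series_list = parsed.get("allDataSeries", [])
    if len(series_list) != 1:
        raise StreamPipesUnsupportedDataLakeSeries(
            f"expected exactly one data series, got {len(series_list)}"
        )
    return parsed


ok = parse_data_lake_series('{"headers": ["time"], "allDataSeries": [{"rows": [[1]]}]}')

try:
    parse_data_lake_series('{"headers": ["time"], "allDataSeries": []}')
except StreamPipesUnsupportedDataLakeSeries as err:
    error_message = str(err)
    print(f"cannot parse: {err}")  # cannot parse: expected exactly one data series, got 0
```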

to_pandas()

Returns the data lake series as a pandas DataFrame.

RETURNS DESCRIPTION
pd

The data lake series in the form of a pandas DataFrame

TYPE: pd.DataFrame
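Given the dictionary returned by convert_to_pandas_representation (keys "headers" and "rows"), building the DataFrame is a single constructor call. The sketch below is a hypothetical free function, not the client's method, and it requires pandas to be installed.

```python
import pandas as pd


def to_pandas(pandas_repr: dict) -> pd.DataFrame:
    """Build a DataFrame from a {"headers": [...], "rows": [[...], ...]} dict."""
    # Each inner list of "rows" becomes one DataFrame row; "headers" names the columns.
    return pd.DataFrame(data=pandas_repr["rows"], columns=pandas_repr["headers"])


df = to_pandas({"headers": ["time", "temperature"], "rows": [[1, 21.5], [2, 21.7]]})
print(df.shape)  # (2, 2)
```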

StreamPipesUnsupportedDataLakeSeries()

Bases: Exception

Exception to be raised when the returned data lake series cannot be parsed with the current implementation of the resource.