Streaming sources
Module for defining data sources for streaming and batch data.
This module provides data source implementations (EventHubSource, StreamingTableSource, and TableSource) that simplify reading streaming and batch data.
EventHubSource
dataclass
Bases: DataSource
Represents a data source for consuming messages from Azure Event Hub.
Attributes:
| Name | Type | Description |
|---|---|---|
| topic | str | The name of the Event Hub topic to subscribe to. |
| namespace | str | The Event Hub namespace. |
| connection_string | str | The connection string for authenticating with Event Hub. |
| context | Context | The Spark context for managing the Spark session. |
| starting_offsets | str | The starting offsets for consuming messages. Defaults to "latest". |
| max_offsets_per_trigger | int | The maximum number of offsets to process per trigger. Defaults to 72000. |
| schema | Optional[StructType] | The schema of the data, if available. If no schema is set, the target table is created using the raw schema from Event Hub. In both cases, a column named 'value' containing the raw data from the topic is added to the target table. Defaults to None. |
Source code in blipdataforge/streaming/sources.py
server
property
Returns the Event Hub server address.
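As a rough illustration of the attributes and defaults listed above, the following stand-in dataclass mirrors the documented fields. It is not the blipdataforge implementation: the class name `EventHubSourceSketch`, the omission of `context` and `schema`, the sample values, and the `server` format (the namespace's Kafka-compatible endpoint on port 9093) are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class EventHubSourceSketch:
    """Hypothetical stand-in that mirrors the documented attributes."""

    topic: str
    namespace: str
    connection_string: str
    starting_offsets: str = "latest"      # documented default
    max_offsets_per_trigger: int = 72000  # documented default

    @property
    def server(self) -> str:
        # Assumption: the address is the namespace's Kafka-compatible
        # endpoint; the real property may build it differently.
        return f"{self.namespace}.servicebus.windows.net:9093"


# Placeholder values, for illustration only.
source = EventHubSourceSketch(
    topic="orders",
    namespace="my-namespace",
    connection_string="<your-connection-string>",
)
print(source.server)
```

Only the three required fields are passed here, so the offsets settings fall back to the defaults described in the attributes table.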
load()
Loads data from the Event Hub as a streaming DataFrame.
Returns:
| Type | Description |
|---|---|
| DataFrame | A DataFrame representing the data stream. |
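One plausible way to obtain such a stream is through Spark's Kafka source pointed at the Event Hubs Kafka-compatible endpoint. The sketch below only assembles the reader options; whether `load()` actually uses the Kafka source is an assumption, and the helper name `build_kafka_options` is hypothetical.

```python
def build_kafka_options(server, topic, connection_string,
                        starting_offsets="latest",
                        max_offsets_per_trigger=72000):
    """Assemble Spark Kafka-source options for an Event Hubs namespace.

    Hypothetical helper: the library may configure its reader differently.
    """
    # Event Hubs accepts SASL PLAIN with the literal user name
    # "$ConnectionString" and the connection string as the password.
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": server,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
        "maxOffsetsPerTrigger": str(max_offsets_per_trigger),
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
    }


options = build_kafka_options(
    server="my-namespace.servicebus.windows.net:9093",
    topic="orders",
    connection_string="<your-connection-string>",
)
```

Given a SparkSession named `spark`, `spark.readStream.format("kafka").options(**options).load()` would then return a streaming DataFrame whose `value` column holds the raw message payload.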
StreamingTableSource
dataclass
Bases: DataSource
Represents a streaming data source from a Delta Live Table.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | The name of the table. |
| database | str | The database where the table resides. Defaults to an empty string. |
| catalog | str | The catalog where the table resides. Defaults to an empty string. |
load()
Loads data from the streaming table as a DataFrame.
Returns:
| Type | Description |
|---|---|
| DataFrame | A DataFrame representing the streaming data. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the catalog is specified but the database is not. |
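The validation rule above can be sketched as a small name-building helper. The function name `qualified_table_name` and the error message are hypothetical; only the rule itself (a catalog requires a database) and the empty-string defaults come from the documentation.

```python
def qualified_table_name(name, database="", catalog=""):
    """Join catalog, database, and table name, skipping empty parts.

    Hypothetical helper illustrating the documented ValueError rule.
    """
    if catalog and not database:
        # A catalog without a database would produce an ambiguous
        # two-part name, so it is rejected.
        raise ValueError("A catalog was given without a database.")
    parts = [part for part in (catalog, database, name) if part]
    return ".".join(parts)


print(qualified_table_name("events"))
print(qualified_table_name("events", database="bronze"))
print(qualified_table_name("events", database="bronze", catalog="main"))
```

Calling `qualified_table_name("events", catalog="main")` raises `ValueError`, matching the behavior described for `load()`.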
TableSource
dataclass
Bases: DataSource
Represents a batch data source from a Delta table.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | The name of the table. |
| database | str | The database where the table resides. |
| catalog | str | The catalog where the table resides. |
load()
Loads data from the table as a DataFrame.
Returns:
| Type | Description |
|---|---|
| DataFrame | A DataFrame representing the table data. |
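In PySpark, the difference between a batch table read and a streaming table read comes down to which reader is used: `spark.read.table(...)` or `spark.readStream.table(...)`. The sketch below shows that contrast; the helper `load_table` is hypothetical, and it is an assumption that TableSource and StreamingTableSource map onto these two calls.

```python
def load_table(spark, full_name, streaming=False):
    """Read a table in batch or streaming mode.

    Hypothetical helper: `spark` is expected to be a SparkSession
    (or any object exposing `read` and `readStream`).
    """
    # readStream yields an unbounded, incrementally processed DataFrame;
    # read yields a static snapshot of the table.
    reader = spark.readStream if streaming else spark.read
    return reader.table(full_name)
```

With a live session, `load_table(spark, "main.bronze.events")` would return a batch DataFrame, while passing `streaming=True` would return a streaming one.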