Share files with your client
You can use the DataPlatform.share_files() function to share files with your
client via email. This function allows you to send your client an email containing
a list of links to download these files.
You just provide the recipient's email address and a list of paths to the files you want to send, and the function does the rest: it creates a download link for each file, adds the links to the body of the email, and sends the email to the recipient.
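To give a rough intuition of the flow described above, the sketch below builds one download link per file and places the links in an email body. Everything here is illustrative: the link format, base URL, and file names are placeholders, not the real service used by share_files().

```python
# Hypothetical sketch of the flow described above: one download link per file,
# all links listed in the email body. The base URL is a placeholder only.
files = ["part-00000.csv", "part-00001.csv"]
base_url = "https://example.com/download/"  # not the real download service

# Build one link per file, then assemble the email body
links = [base_url + name for name in files]
body = "Hello! Here are your files:\n" + "\n".join(f"- {link}" for link in links)
print(body)
```

The actual link creation and email delivery are handled internally by the function; you never build these links yourself.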
Only download links are supported!
Some people might ask you to attach the files to the email as real attachments, like when
you attach a PDF file to an email. But the DataPlatform.share_files() function
doesn't attach any files to the email. Instead, it adds download links to the body of
the email, so that your client can download the files they want via these links.
This is a much more secure way of sharing files.
Attaching files with personal/private/secret data to an email is prohibited, not only because the Information Security department of Blip doesn't allow it, but also because this practice goes against Brazilian law (more specifically, the LGPD - Lei Geral de Proteção de Dados Pessoais).
Creating the files
To demonstrate the use of the DataPlatform.share_files() function, we first need to create
some files. The source code below creates a sample Spark DataFrame and then writes
the data from this DataFrame to CSV files.
from pyspark.sql import SparkSession, Row
from datetime import date

spark = SparkSession.builder.getOrCreate()

data = [
    Row(id=1, value=28.3, date=date(2021, 1, 1)),
    Row(id=2, value=15.8, date=date(2021, 1, 1)),
    Row(id=3, value=20.1, date=date(2021, 1, 2)),
    Row(id=4, value=12.6, date=date(2021, 1, 3))
]
df = spark.createDataFrame(data)

file_path = "/local_disk0/tmp/sales/"
df.write.mode("overwrite").csv(file_path)
If we take a look at the files that were created by writing this DataFrame to CSV, we
can see a lot of "commit files" (i.e. _committed...) and "status files" (i.e. _started...)
inside the folder that was created.
But we also have the CSV files that contain the data from each partition of the Spark
DataFrame (i.e. part-...).
Now, suppose you want to share these CSV files with your client. This is where the
DataPlatform.share_files() function comes in.
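A listing like the one below can be obtained with Python's standard os module. Here is a minimal, self-contained sketch; the folder and file names are simulated in this snippet, since the real names produced by Spark contain long commit IDs:

```python
import os
import tempfile

# Simulate the folder layout that df.write.csv() produces
# (file names here are illustrative, not real Spark output)
file_path = tempfile.mkdtemp()
for name in ["_committed_123", "_started_123", "part-00000.csv", "part-00001.csv"]:
    open(os.path.join(file_path, name), "w").close()

# List everything in the output folder, commit/status markers included
print(sorted(os.listdir(file_path)))
```

In the actual notebook, you would simply call os.listdir() on the folder you wrote the DataFrame to.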
['_committed_2013327814026063441',
'_committed_2369586526975359895',
'_committed_4627923378166736463',
'_committed_8475885235137697843',
'_committed_8990975055876416605',
'_committed_vacuum1974657704178089845',
'_started_2013327814026063441',
'_started_4627923378166736463',
'part-00000.csv',
'part-00001.csv',
'part-00002.csv',
'part-00003.csv']
Save the files into a volume!
In order to share the files you've just created, you must first
save them into an external volume. You can use the DataPlatform.persist_files()
function to do that. If you are unfamiliar with this function, you can check out
its user guide.
To make things easier, I'm going to store the names of these CSV files in a separate Python object, so that I can easily reference them when I need to:
import os

csvs = list()
partitions = os.listdir(file_path)
for partition in partitions:
    # Keep only the CSV part files, skipping the commit/status marker files
    if partition.endswith(".csv"):
        csvs.append(partition)

print(csvs)
Then, I'm going to use the DataPlatform.persist_files() function to persist these
CSV files into a new external volume at the address clients_sandbox.blipdataforge.sales_files.
More precisely, these CSV files are going to be persisted into the ingest/sales folder inside this volume.
from blipdataforge import DataPlatform

dp = DataPlatform()

dp.persist_files(
    database="blipdataforge",
    volume_name="sales_files",
    destination_folder="/ingest/sales/",
    # file_path already ends with "/", so just append each file name
    files=[file_path + name for name in csvs]
)
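If the platform follows the Databricks Unity Catalog volume path convention (an assumption on my part; this guide does not state it), the persisted files would end up under a path assembled along these lines:

```python
# Assumption: Unity Catalog-style volume paths (/Volumes/<catalog>/<schema>/<volume>/...)
catalog = "clients_sandbox"
database = "blipdataforge"
volume_name = "sales_files"
destination_folder = "ingest/sales"

# Build the full path where the persisted files would live
volume_path = f"/Volumes/{catalog}/{database}/{volume_name}/{destination_folder}"
print(volume_path)
```

You don't need to build this path yourself; it is only shown here to clarify where the files are stored.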
Sharing the files with your client
Now that we have created and persisted the files, we can share them.
If we want to send the email to pedro.duarte@blip.ai, we can do that with
the following command:
dp.share_files(
    receiver_email="pedro.duarte@blip.ai",
    cc_email="victor.fernandes@blip.ai",
    catalog="clients_sandbox",
    database="blipdataforge",
    volume_name="sales_files",
    files=csvs,
    folder="ingest/sales"
)
Changing the email title
You can use the email_title_opts argument of share_files() to change the title
of the email that is sent by the system. This argument receives an object of
type EmailTitleOpts as input.
You might want to use this argument especially to make the title of the email behave like a unique identifier: something that identifies the exact file and data being shared in the email, which also makes the email easy to locate through the search bar in Gmail.
To create such an object, you can use the DataPlatform.create_email_title_opts() function.
You can read the "help" of this function (with help(DataPlatform.create_email_title_opts)) to get a complete description of the options and arguments
you can use, and how they change the email title created by the system.
Take the snippet below as an example. Here, we are using the include_filename argument to automatically
include the filename of the file being sent in the email title. We are also using
the start_date and end_date arguments to include a "time range"/"period" in the title.
from datetime import date
from blipdataforge import DataPlatform

dp = DataPlatform()

email_title = dp.create_email_title_opts(
    include_filename=True,
    start_date=date(2024, 12, 9),
    end_date=date(2024, 12, 10)
)

dp.share_files(
    receiver_email="pedro.duarte@blip.ai",
    cc_email="pedro.duarte@blip.ai",
    catalog="dataplatform_sandbox",
    database="blipdataforge",
    volume_name="sales_files",
    files=csvs,
    folder="ingest/sales",
    email_title_opts=email_title
)
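As a rough intuition of what these two options do, the sketch below assembles a title from a filename and a date range. This is purely hypothetical: the exact title format is defined by the library, not by this snippet.

```python
from datetime import date

# Hypothetical illustration only: the real title format produced by
# create_email_title_opts() / share_files() may differ.
filename = "part-00000.csv"
start_date = date(2024, 12, 9)
end_date = date(2024, 12, 10)

title = f"{filename} ({start_date.isoformat()} - {end_date.isoformat()})"
print(title)
```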
By using the email_title object shown in the example above, the title of the email
created by the share_files() function will be something like this: