Share files with your client

You can use the DataPlatform.share_files() function to share files with your client via email. The function sends your client an email that contains a list of download links for these files.

You only need to provide the email address of the recipient and a list of paths to the files you want to send. The function does the rest: it creates a download link for each file, adds the links to the body of the email, and sends the email to the recipient.

Some people might ask you to attach the files to the email as real attachments, the way you would attach a PDF file. The DataPlatform.share_files() function does not attach any file to the email. Instead, it adds download links to the body of the email, so that your client can download whichever files they want. This is a much more secure way of sharing files.

Attaching files with personal, private, or secret data to an email is prohibited, not only because the Information Security department of Blip doesn't allow it, but also because this practice goes against Brazilian law (more specifically, the LGPD, Lei Geral de Proteção de Dados Pessoais).
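To make the difference between links and attachments concrete, here is a minimal sketch that uses only Python's standard library. It is not how share_files() is implemented. The build_link_email() helper, the example.com URL, and the recipient address are hypothetical and exist only for illustration.

```python
from email.message import EmailMessage

def build_link_email(receiver, links):
    """Build a plain-text email whose body lists download links (no attachments)."""
    msg = EmailMessage()
    msg["To"] = receiver
    msg["Subject"] = "Your data is ready for download"
    lines = ["The following files are available for download:", ""]
    lines += [f"- {name}: {url}" for name, url in links.items()]
    msg.set_content("\n".join(lines))
    return msg

# Hypothetical link: a real download URL would be generated by the platform.
links = {"part-00000.csv": "https://example.com/download/part-00000.csv"}
msg = build_link_email("client@example.com", links)
print(msg.get_content())
```

The message stays a single plain-text part: the data itself never travels inside the email, only a reference to it.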

Creating the files

To demonstrate the use of the DataPlatform.share_files() function, we first need to create some files. The source code below creates a sample Spark DataFrame and then writes the data from this DataFrame as CSV files.

from pyspark.sql import SparkSession
from datetime import date
from pyspark.sql import Row

spark = SparkSession.builder.getOrCreate()
data = [
  Row(id = 1, value = 28.3, date = date(2021,1,1)),
  Row(id = 2, value = 15.8, date = date(2021,1,1)),
  Row(id = 3, value = 20.1, date = date(2021,1,2)),
  Row(id = 4, value = 12.6, date = date(2021,1,3))
]

df = spark.createDataFrame(data)
file_path = "/local_disk0/tmp/sales/"
df.write.mode("overwrite").csv(file_path)

If we take a look at the folder that was created by writing this DataFrame as CSV, we can see several "commit files" (i.e. _committed...) and "status files" (i.e. _started...). We also have the CSV files that contain the data from each partition of the Spark DataFrame (i.e. part-...).

Now suppose you want to share the CSV files that contain the data from the Spark DataFrame with your client. This is where the DataPlatform.share_files() function comes in.

import os
print(list(os.listdir(file_path)))
['_committed_2013327814026063441',
 '_committed_2369586526975359895',
 '_committed_4627923378166736463',
 '_committed_8475885235137697843',
 '_committed_8990975055876416605',
 '_committed_vacuum1974657704178089845',
 '_started_2013327814026063441',
 '_started_4627923378166736463',
 'part-00000.csv',
 'part-00001.csv',
 'part-00002.csv',
 'part-00003.csv']

Save the files into a volume!

To share the files that you've just created, you must first save them into an external volume. You can use the DataPlatform.persist_files() function to do that. If you are unfamiliar with this function, you can check out its user guide.

To make things easier, I'm going to store the names of these CSV files in a separate Python object, so that I can easily reference these files when I need to:

csvs = list()
partitions = os.listdir(file_path)
for partition in partitions:
    if partition.endswith(".csv"):
        csvs.append(partition)

print(csvs)
['part-00000.csv',
 'part-00001.csv',
 'part-00002.csv',
 'part-00003.csv']
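The same filter can be written more compactly as a small reusable helper. This is only a sketch: list_csv_parts() is a hypothetical name, not part of blipdataforge, and the demo runs on a throwaway folder that mimics the layout Spark produced above.

```python
import os
import tempfile

def list_csv_parts(folder):
    """Return the sorted names of the CSV part files inside `folder`."""
    return sorted(
        name for name in os.listdir(folder)
        if name.startswith("part-") and name.endswith(".csv")
    )

# Demo on a temporary folder that imitates Spark's output layout.
with tempfile.TemporaryDirectory() as folder:
    for name in ["_committed_123", "_started_123", "part-00001.csv", "part-00000.csv"]:
        open(os.path.join(folder, name), "w").close()
    demo_csvs = list_csv_parts(folder)

print(demo_csvs)
# ['part-00000.csv', 'part-00001.csv']
```

Sorting the names makes the result deterministic, since os.listdir() returns entries in arbitrary order.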

Then, I'm going to use the DataPlatform.persist_files() function to persist these CSV files into a new external volume at the address clients_sandbox.blipdataforge.sales_files. More precisely, the CSV files are going to be persisted into the folder ingest/sales inside this volume.

from blipdataforge import DataPlatform
dp = DataPlatform()
dp.persist_files(
    database="blipdataforge",
    volume_name="sales_files",
    destination_folder="/ingest/sales/",
    # file_path already ends with "/", so os.path.join avoids a doubled slash
    files=[os.path.join(file_path, name) for name in csvs]
)
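Before persisting, it can help to build the full local paths in one place and fail early if a file is missing. The sketch below assumes nothing about persist_files() itself. build_file_paths() is a hypothetical helper, and the demo uses a temporary folder instead of /local_disk0/tmp/sales/.

```python
import os
import tempfile

def build_file_paths(folder, names):
    """Join `folder` with each name and fail early if any file is missing."""
    paths = [os.path.join(folder, name) for name in names]
    missing = [path for path in paths if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError(f"Files not found: {missing}")
    return paths

with tempfile.TemporaryDirectory() as tmp:
    folder = tmp + "/"  # trailing slash, like file_path in this guide
    open(os.path.join(folder, "part-00000.csv"), "w").close()

    demo_paths = build_file_paths(folder, ["part-00000.csv"])
    has_double_slash = "//" in demo_paths[0]

    try:
        build_file_paths(folder, ["part-99999.csv"])
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True

print(has_double_slash, missing_detected)
```

The resulting list can then be passed as the files argument, in place of the list comprehension used above.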

Sharing the files with your client

Now that the files we want to share are persisted in the volume, we can share them. To send the email to pedro.duarte@blip.ai, use the following command:

dp.share_files(
    receiver_email="pedro.duarte@blip.ai",
    cc_email="victor.fernandes@blip.ai",
    catalog="clients_sandbox",
    database="blipdataforge",
    volume_name="sales_files",
    files=csvs,
    folder="ingest/sales"
)
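Judging only from the two calls in this guide, persist_files() receives full local paths, while share_files() receives bare file names together with the folder inside the volume. This is an inference from the examples, not a statement of the documented API. If you only kept the full paths, a minimal sketch for recovering the names looks like this:

```python
import os

# Full local paths, as passed to persist_files() earlier in this guide.
local_paths = [
    "/local_disk0/tmp/sales/part-00000.csv",
    "/local_disk0/tmp/sales/part-00001.csv",
]

# Bare file names, in the shape that the share_files() example uses.
file_names = [os.path.basename(path) for path in local_paths]
print(file_names)
# ['part-00000.csv', 'part-00001.csv']
```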

Changing the email title

You can use the email_title_opts argument of share_files() to customize the title of the email that is sent by the system. This argument should receive an object of type EmailTitleOpts as input.

You might want to use this argument especially to make the title of the email behave like a unique identifier: something that identifies the exact file and data being shared in the email, and that also makes the email easy to locate through the search bar in Gmail.

To create such an object, you can use the DataPlatform.create_email_title_opts() function. You can read the "help" of this function (with help(DataPlatform.create_email_title_opts)) to get a complete description of the options and arguments that you can use, and how they change the email title that is created by the system.

Take the snippet below as an example. Here, we are using the include_filename argument to automatically include in the email title the name of the file that is being sent. We are also using the start_date and end_date arguments to include a time range (or period) in the email title.

from datetime import date
from blipdataforge import DataPlatform
dp = DataPlatform()
email_title = dp.create_email_title_opts(
  include_filename=True,
  start_date=date(2024,12,9),
  end_date=date(2024,12,10)
)

dp.share_files(
    receiver_email="pedro.duarte@blip.ai",
    cc_email="pedro.duarte@blip.ai",
    catalog="clients_sandbox",
    database="blipdataforge",
    volume_name="sales_files",
    files=csvs,
    folder="ingest/sales",
    email_title_opts=email_title
)

With the email_title object shown in the example above, the title of the email created by the share_files() function will be something like this:

Your data is ready for download - sales.csv - From 2024-12-09 to 2024-12-10
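To see how the pieces of this title fit together, here is a minimal sketch that reproduces the format of the example above. It is not the library's implementation: sketch_email_title() is a hypothetical function, and the separator and wording are simply copied from the sample title.

```python
from datetime import date

def sketch_email_title(filename, start_date, end_date):
    """Compose a title in the same shape as the sample shown in this guide."""
    base = "Your data is ready for download"
    period = f"From {start_date.isoformat()} to {end_date.isoformat()}"
    return f"{base} - {filename} - {period}"

title = sketch_email_title("sales.csv", date(2024, 12, 9), date(2024, 12, 10))
print(title)
# Your data is ready for download - sales.csv - From 2024-12-09 to 2024-12-10
```

Because the file name and the period are both part of the title, searching the inbox for either one is enough to find the email.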