felipe volpone
Jan 31, 2022

> Why not ask DataLake team to create one more Kafka/Kinesis topic for your team?

That, plus aggregation and some processing, is pretty much what an FS is.

The DataLake is more like raw data storage, and the FS can provide better (and more contextualized) info on top of that.

As an example: let's say we have all 5 orders from a given customer in the DL. In the FS, we can have an attribute/feature that counts only the orders placed during lunch (filtering the orders by their created_at timestamp). That way, the FS is the source of truth, and it's clear to the whole company which rules define the total_orders_lunch feature.
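A minimal sketch of that feature in Python, just to make the idea concrete. It assumes FS means a feature store. The lunch window (11:00 to 14:00), the order dict layout, and the sample data are all made up for illustration; the real rule is whatever the FS definition says.

```python
from datetime import datetime, time

# Assumed lunch window for this sketch: 11:00 (inclusive) to 14:00 (exclusive).
LUNCH_START = time(11, 0)
LUNCH_END = time(14, 0)


def total_orders_lunch(orders):
    """Count the orders whose created_at time of day falls in the lunch window."""
    return sum(
        1
        for order in orders
        if LUNCH_START <= order["created_at"].time() < LUNCH_END
    )


# Hypothetical raw records, as they might sit in the DL for one customer.
orders = [
    {"order_id": 1, "created_at": datetime(2022, 1, 10, 12, 15)},
    {"order_id": 2, "created_at": datetime(2022, 1, 11, 19, 40)},
    {"order_id": 3, "created_at": datetime(2022, 1, 12, 13, 5)},
    {"order_id": 4, "created_at": datetime(2022, 1, 13, 8, 30)},
    {"order_id": 5, "created_at": datetime(2022, 1, 14, 14, 0)},
]

lunch_count = total_orders_lunch(orders)
```

The point is that the rule lives in one place: anyone consuming total_orders_lunch gets the same definition of "lunch" instead of re-deriving it from the raw orders.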


Written by felipe volpone

I’m into distributed systems and how we can make them easier to develop and maintain. Writing code to scale to millions of users @ Ifood, formerly Red Hat.
