DataHub job stories ...


Focused on the individual data engineer and data scientist who may work in an enterprise ...


Power users ...

👩‍🔬 Data Scientist

👨‍🔬 Data Engineer

I want somewhere to keep my data ...

I want to keep track of changes and go back to previous changes ...

I want to share with other data scientists on my team and have them contribute in a structured way so that we can collaborate ...

I want to easily (repeatedly) share with clients / stakeholders in a way they can understand so that I can get feedback and deliver to my clients

I want clients to be able to send me data in an easy way

I want to keep my data in sync with my code so that I don't get errors

I want to close the gap between storage and deployment ...

I want to check my data so that I avoid errors and delays later on in costly steps (e.g. running 5 hours of machine learning on data that's wrong)

I want to store workflows and code for processing my data with my data so that they keep in sync ...

I want to automate those workflows, running them on every data change or when I trigger them, so that I reduce effort and can build more complex flows ...

I want to unit test my data (and have CI for data) so that I have reliable data and don't get bugged by data science teams ...

I want to pull data from other systems ...

I want to use tools I'm already familiar with ...

I want to store my data in cloud storage ...

I want to be able to present data in human-viewable form so that collaborators, especially downstream ones (e.g. data scientists), can inspect what I'm giving them ...

I want to publish datasets like packages ... so that I can manage dependencies ...

I want to provide documentation and annotations for my data so that I don't have to answer questions and can respond with RTFM

I want to have an inventory of schemas around for validating data so that I know what I need to provide and what I'm getting ...

I want to add simple visualizations quickly so that I don't need to start learning some viz lib or other stuff ...
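
To make the stories about checking data, unit testing data, and keeping schemas around for validation more concrete, here is a minimal sketch in Python. It is an illustration only, not DataHub's API: the `validate_rows` helper and the schema shape (a mapping of field name to expected Python type) are assumptions made for this example.

```python
from typing import Any

# Hypothetical schema shape: field name -> expected Python type.
Schema = dict[str, type]


def validate_rows(rows: list[dict[str, Any]], schema: Schema) -> list[str]:
    """Check each row against the schema.

    Returns a list of human-readable error messages; an empty list
    means every row has every field with the expected type.
    """
    errors: list[str] = []
    for i, row in enumerate(rows):
        for field, expected in schema.items():
            if field not in row:
                errors.append(f"row {i}: missing field '{field}'")
            elif not isinstance(row[field], expected):
                actual = type(row[field]).__name__
                errors.append(
                    f"row {i}: field '{field}' expected "
                    f"{expected.__name__}, got {actual}"
                )
    return errors


# A data "unit test": fail fast before any costly step runs.
def assert_valid(rows: list[dict[str, Any]], schema: Schema) -> None:
    errors = validate_rows(rows, schema)
    if errors:
        raise ValueError("data check failed:\n" + "\n".join(errors))
```

A check like this could run in CI on every data change, so that bad data is caught before an expensive step such as a long training run.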