Amundsen Lyft
Augmented data graph: Amundsen uses a graph database under the hood to store relationships between various data assets (tables, dashboards, protobuf events, etc.).
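To make the idea of storing relationships between data assets concrete, here is a minimal in-memory sketch. It is not Amundsen's schema or its graph database: the `MetadataGraph` class, the node ids, and the relation names `USED_BY` and `DERIVED_FROM` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class MetadataGraph:
    """Toy in-memory stand-in for a graph database of data assets (illustrative only)."""

    def __init__(self):
        # Maps a source node id to a list of (relation, target node id) pairs.
        self._edges = defaultdict(list)

    def add_edge(self, source, relation, target):
        """Record a directed, labeled relationship between two assets."""
        self._edges[source].append((relation, target))

    def neighbors(self, source, relation=None):
        """Return targets linked from `source`, optionally filtered by relation."""
        return [
            target
            for rel, target in self._edges.get(source, [])
            if relation is None or rel == relation
        ]


# Hypothetical assets: two tables, one dashboard, one protobuf event.
graph = MetadataGraph()
graph.add_edge("table:rides", "USED_BY", "dashboard:weekly_rides")
graph.add_edge("table:rides", "DERIVED_FROM", "event:ride_completed")
graph.add_edge("table:drivers", "USED_BY", "dashboard:weekly_rides")

print(graph.neighbors("table:rides", "USED_BY"))
```

A real graph database adds persistence, indexing, and a query language on top of this, but the shape of the data (nodes for assets, labeled edges for relationships) is the same.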
Search for data within your organization with a simple text search. The popular tables API leverages the popularityScore attribute of the Table super type to enable a custom sorting strategy. Please visit the roadmap if you are interested in Amundsen's upcoming roadmap items. Amundsen: Lyft's data discovery and metadata engine (April 2019); Software Engineering Daily podcast on Amundsen (April 2019); How Lyft Drives Data Discovery (July 2019); Data Engineering Podcast on Solving Data Discovery at Lyft (Aug 2019).
Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists, and engineers when interacting with data. We will introduce Amundsen, an open source data discovery platform from Lyft. Adding more kinds of data resources: data sets, dashboards, and people; streams, schemas, and workflows. Phase 1 is complete, phase 2 is in development, and phase 3 is in scoping.
Amundsen does this today by indexing data resources (tables, dashboards, streams, etc.) and powering a PageRank-style search based on usage patterns. Amundsen is built on 3 key pillars. Now let's talk technical. Started at Lyft, Amundsen has made data engineers, data analysts, and data scientists 20% more productive.
How does it work? Amundsen has a concept of popular tables, which is the default entry point of the application for now. Lyft's data warehouse is on Hive, and all physical partitions are stored in S3. Highly queried tables show up earlier than less frequently queried ones.
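The popular tables idea can be sketched as a top-N selection by a popularity score. This is only an illustration under stated assumptions: the `Table` dataclass, its `popularity_score` field, and the sample values are hypothetical, and how Amundsen actually computes or stores the score is not shown here.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    # Hypothetical score, e.g. derived from how often the table is queried.
    popularity_score: float


def popular_tables(tables, limit=3):
    """Return the `limit` tables with the highest popularity score, highest first."""
    return heapq.nlargest(limit, tables, key=lambda t: t.popularity_score)


# Made-up sample data for illustration.
tables = [
    Table("rides", 120.0),
    Table("drivers", 45.0),
    Table("test_tmp", 1.0),
    Table("payments", 80.0),
]

top_names = [t.name for t in popular_tables(tables, limit=3)]
print(top_names)
```

Using `heapq.nlargest` avoids sorting the full list when only a handful of entries are shown on a landing page; for small catalogs a plain `sorted(..., reverse=True)` would work just as well.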
At the same time, there's a lot of value a metadata-driven solution can provide in the space of compliance, in tracking personal data across the entire data infrastructure. Impact: Amundsen at Lyft (figure labels: beta release, internal, generally available (GA) release, alpha release). Amundsen has been running in production at Lyft for over a year now, with about 1,000 weekly active users at Lyft internally.
We have an extremely high penetration rate (90%) for technical roles like data scientists, research scientists, and data engineers, while Amundsen is also used by business users like marketing and community associates. A PageRank-inspired search algorithm recommends results based on names, descriptions, tags, and querying/viewing activity on the…
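One way to picture a search that blends text matches with usage activity is the sketch below. It is a simple usage-weighted keyword match, not PageRank and not Amundsen's actual algorithm: the `score` and `search` functions, the document fields, and the weighting formula are all assumptions made for illustration.

```python
import math


def score(query, doc):
    """Score a document by matched query terms, boosted by usage activity."""
    terms = query.lower().split()
    # Search over the name, description, and tags together.
    haystack = " ".join([doc["name"], doc["description"], " ".join(doc["tags"])]).lower()
    matches = sum(1 for term in terms if term in haystack)
    if matches == 0:
        return 0.0
    # Hypothetical usage signal: combined query and view counts, log-damped
    # so that very popular resources do not drown out everything else.
    usage = doc["query_count"] + doc["view_count"]
    return matches * (1.0 + math.log1p(usage))


def search(query, docs):
    """Return names of matching documents, best score first."""
    scored = [(score(query, d), d["name"]) for d in docs]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [name for s, name in ranked if s > 0]


# Made-up sample documents for illustration.
docs = [
    {"name": "rides_tmp", "description": "scratch copy of rides", "tags": [],
     "query_count": 2, "view_count": 1},
    {"name": "drivers", "description": "driver profiles", "tags": ["core"],
     "query_count": 300, "view_count": 100},
    {"name": "rides", "description": "completed rides per city", "tags": ["core"],
     "query_count": 500, "view_count": 200},
]

results = search("rides", docs)
print(results)
```

Both `rides` and `rides_tmp` match the query text, but the heavily used table ranks first because of its usage boost, which is the behavior the passage describes.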