Amundsen Data Catalog
Online interview and live Q&A with the co-creator of Amundsen, the OSS data catalog built at Lyft.
Amundsen is a data discovery and metadata engine for improving the productivity of data analysts, data scientists, and engineers when interacting with data. It does that today by indexing data resources (tables, dashboards, streams, etc.) and powering a PageRank-style search based on usage patterns. On the Data Discovery & Classification page, the Overview tab includes a summary of the current classification state of the database. And Google Cloud Platform's Data Catalog, though I'm only including this because it seems like a response to the current trend of metadata hubs in the market.
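To make the idea of usage-based search ranking concrete, here is a minimal sketch in Python. It is not Amundsen's implementation and it is not a true PageRank computation: it simply filters resources by a search term and sorts the matches by a query count. The `TableResource` class, the `query_count` field, and the sample catalog are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TableResource:
    """A hypothetical catalog entry; not an Amundsen data structure."""
    name: str
    description: str
    query_count: int  # assumed usage signal, e.g. number of recent queries


def search(resources, term):
    """Return resources matching `term`, most heavily queried first."""
    term = term.lower()
    matches = [
        r for r in resources
        if term in r.name.lower() or term in r.description.lower()
    ]
    # Sort by usage so that highly queried tables show up earlier.
    return sorted(matches, key=lambda r: r.query_count, reverse=True)


# Made-up sample data for illustration only.
catalog = [
    TableResource("rides_daily", "Daily ride aggregates", 120),
    TableResource("rides_raw", "Raw ride events", 950),
    TableResource("drivers", "Driver profiles", 300),
]

results = [r.name for r in search(catalog, "ride")]
print(results)
```

In this toy catalog, both ride tables match the term, and the one with the higher query count is listed first. A real search service would combine text relevance with usage signals rather than sorting on usage alone.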
A graph data model represents the relationships between entities (vertices) and the relations among them. Other tools such as Dataportal (Airbnb), Databook (Uber), Amundsen (Lyft), DataHub (LinkedIn), Marquez (WeWork), and Data Catalog (Google) were all developed internally to reduce friction of… Cloud data catalogs (Azure Data Catalog, AWS Glue, GCP Data Catalog): considerations.
The summary includes a detailed list of all classified… Otherwise you end up with metadata silos. Amundsen is built on three key pillars. To achieve end-to-end governance, all data has to flow through embedded solutions.
In this article we will focus on Lyft's very own data discovery and metadata platform. In response to Uber's success, the ride-sharing market saw a major wave of competitors arrive, and among them is Lyft. We will introduce Amundsen, an open source data discovery platform from Lyft. Go to Advanced Data Security under the Security heading in your Azure SQL Database pane.
As a result, you might use the tools in a wider scope than you planned and increase lock-in. Select Advanced Data Security and then select the Data Discovery & Classification card. Go to the Azure portal. A graph data model is not a common choice for most applications, but we believe it is a great fit for Amundsen, as it deals with a lot of relationships among entities.
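As a rough illustration of why a graph model suits metadata with many relationships, the sketch below stores entities as vertices and relationships as labelled, directed edges in plain Python. It does not use a graph database and does not reflect Amundsen's actual schema: the `MetadataGraph` class, the vertex keys, and the relation names `READS_FROM` and `OWNS` are all assumptions made for this example.

```python
from collections import defaultdict


class MetadataGraph:
    """A tiny in-memory graph; a hypothetical stand-in for a graph database."""

    def __init__(self):
        self.vertices = {}              # vertex key -> entity type label
        self.edges = defaultdict(list)  # source key -> [(relation, target key)]

    def add_vertex(self, key, label):
        self.vertices[key] = label

    def add_edge(self, source, relation, target):
        # Both endpoints must already exist as vertices.
        for key in (source, target):
            if key not in self.vertices:
                raise KeyError(f"unknown vertex: {key}")
        self.edges[source].append((relation, target))

    def neighbors(self, source, relation=None):
        """Targets reachable from `source`, optionally filtered by relation."""
        return [
            target
            for rel, target in self.edges.get(source, [])
            if relation is None or rel == relation
        ]


# Made-up entities and relations for illustration only.
graph = MetadataGraph()
graph.add_vertex("table:rides", "Table")
graph.add_vertex("dashboard:weekly_rides", "Dashboard")
graph.add_vertex("user:alice", "User")
graph.add_edge("dashboard:weekly_rides", "READS_FROM", "table:rides")
graph.add_edge("user:alice", "OWNS", "table:rides")
graph.add_edge("user:alice", "OWNS", "dashboard:weekly_rides")

print(graph.neighbors("user:alice", "OWNS"))
```

Questions such as "what does this user own?" or "which tables feed this dashboard?" become simple edge traversals, which is the property that makes a graph a natural fit when most of the data is about how entities relate to each other.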
The goal of this blog post is to quickly summarise the problem these products are created to solve, give a quick overview of each of the… This way we build a comprehensive data catalog containing lineage information to identify, trace, and secure the data we have, and which can be consumed through integration with Amundsen in a… In our last article we spoke of Uber's Databook, an in-house platform designed by their very own engineers with the aim of turning data into contextualized assets. Highly queried tables show up earlier.
Augmented data graph: Amundsen uses a graph database under the hood to store relationships between various data assets (tables, dashboards, protobuf events, etc.). About this event: Mark Grover is a product manager at Lyft and the co-creator of Amundsen.