Batch processing is one of the sought-after requirements for data engineering roles, and getting a data engineering project running batch processing on the cloud can be time consuming. This tutorial covers some core components of batch processing that you can apply to your projects in no time.

Let's assume you work for a bank as a data engineer and you have been tasked with creating a batch processing workflow that tracks the number of loans in each loan category of the bank every day. Using your experience as a data engineer, you need to make the loan tracker metric data available to both decision-makers and end users.

What is Metabase

Metabase is an easy-to-use, open source business intelligence tool that lets you analyse data from a variety of data sources and destinations.

Custom destinations let you choose what happens when people click on charts in your dashboard. You can set up dashboard cards to send people to dashboards, saved questions, and URLs, and use values from the card to update filters at the destination or to parameterize links to external sites.

If your primary need is data visualization and exploration, Metabase is the way to go. However, if you're looking for a powerful data catalog solution with robust data discovery and metadata management features, Magda is the better choice. Feel free to try out both tools and determine which one suits your needs best.

Setup Prerequisite

Clone the project repository with git clone and cd into the project directory.

The project uses the following AWS infrastructure:

3 m4.xlarge type nodes for our AWS EMR cluster.
1 dc2.large for our AWS Redshift cluster.
1 IAM role to allow Redshift access to S3.

To set up our AWS infrastructure, we run the setup_infra.sh script in our project:

./setup_infra.sh

We use Airflow for our workflow orchestration to orchestrate the following:

Classify our loan tracker metrics using Apache Spark.
Load the loan tracker metrics into our data warehouse.

Lastly, we use Metabase for visualization of our loan tracker metrics.

Congratulations, using AWS tools, Airflow and Metabase you have been able to run a batch processing job flow :).

This tutorial has covered Airflow, PySpark, AWS EMR, AWS S3, AWS Redshift, AWS EC2, and Metabase for visualization of our data, which are frequently sought-after tools for most data engineering job roles.

This tutorial was made possible by startdataengineering, a great place to start your data engineering journey if you are eager to get into the field. If you have any questions, don't hesitate to contact me on twitter :).
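To make the loan tracker metric concrete, here is a minimal sketch of the daily count of loans per loan category. It is written in plain Python as a stand-in for the Apache Spark job, not as the project's actual code. The Loan record, its field names, and the sample data are all assumptions made for illustration. In PySpark the same aggregation would typically be a groupBy on the date and category columns followed by count().

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, NamedTuple, Tuple


class Loan(NamedTuple):
    """Hypothetical loan record; the field names are assumptions."""
    loan_id: str
    category: str
    issued_on: date


def loans_per_category(loans: Iterable[Loan]) -> Dict[Tuple[date, str], int]:
    """Count the loans for each (day, category) pair."""
    counts = Counter((loan.issued_on, loan.category) for loan in loans)
    return dict(counts)


# Made-up sample data, only to show the shape of the input.
SAMPLE_LOANS = [
    Loan("L-001", "mortgage", date(2021, 1, 1)),
    Loan("L-002", "mortgage", date(2021, 1, 1)),
    Loan("L-003", "auto", date(2021, 1, 1)),
    Loan("L-004", "auto", date(2021, 1, 2)),
]

# Print one row per day and category, sorted for readability.
for (day, category), total in sorted(loans_per_category(SAMPLE_LOANS).items()):
    print(day.isoformat(), category, total)
```

Each resulting row (day, category, count) is the kind of record that would be loaded into the data warehouse and charted in Metabase.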