ELT Data Engineering Project - Trip Analysis

End-to-end ELT pipeline using the Taxi Trips dataset to orchestrate ingestion, transformation, and analytics. GCP • Prefect • dbt • BigQuery

Stack: GCS, Prefect, dbt, BigQuery, Looker Studio
Data: Taxi Trips dataset
Goal: Build a reproducible ELT pipeline and analytics dashboard

Technology Stack

This project is built with the following technologies: GCS, Prefect, dbt, BigQuery, and Looker Studio.

Data Pipeline Architecture

Data pipeline architecture

Dashboard

Live Dashboard

Reproduce it yourself

  1. First clone this repo to your local machine.

git clone https://github.com/Khunmi/ELT_Project_DZC

  2. Set up your Google Cloud environment

export GOOGLE_APPLICATION_CREDENTIALS=<path_to_your_credentials>.json
gcloud auth activate-service-account --key-file $GOOGLE_APPLICATION_CREDENTIALS
gcloud auth application-default login

Check out this link for a video walkthrough.
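
Before moving on, it can help to confirm that the credentials variable really points at a usable key file. The following is a minimal sketch, not part of this repo: the helper name `check_credentials` is hypothetical, and it only assumes the standard layout of a Google service-account key file (a JSON object with a `type` field of `service_account` and a `client_email`).

```python
import json
import os
from pathlib import Path


def check_credentials(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> dict:
    """Return the parsed service-account key file, or raise with a clear message.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    raw_path = os.environ.get(env_var)
    if not raw_path:
        raise RuntimeError(f"{env_var} is not set")

    key_path = Path(raw_path).expanduser()
    if not key_path.is_file():
        raise FileNotFoundError(f"{env_var} points to a missing file: {key_path}")

    with key_path.open(encoding="utf-8") as handle:
        key = json.load(handle)

    # Service-account key files carry a "type" field set to "service_account".
    if key.get("type") != "service_account":
        raise ValueError(f"{key_path} does not look like a service-account key")
    return key
```

Calling `check_credentials()["client_email"]` in a Python shell shows which service account the later steps would run as.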

  3. Set up your orchestration

python prefect/block/make_gcp_blocks.py
prefect agent start -q 'default'
python prefect docs/parameterized_flow.py
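
For readers new to parameterized flows, the idea is that one parent flow fans out over a set of run parameters and calls the same ingestion step for each. The sketch below illustrates that pattern in plain Python. It is an assumption-laden illustration, not the contents of `parameterized_flow.py`: the names `RunParams`, `expand_params`, `ingest_one`, and `parent_flow`, the (year, month) parameters, and the object-name layout are all hypothetical. In a Prefect project, `parent_flow` would typically carry the `@flow` decorator and `ingest_one` the `@task` decorator.

```python
from itertools import product
from typing import Iterator, NamedTuple


class RunParams(NamedTuple):
    """One unit of work for the ingestion step (hypothetical fields)."""
    year: int
    month: int


def expand_params(years: list[int], months: list[int]) -> Iterator[RunParams]:
    """Yield one RunParams per (year, month) pair, in a stable sorted order."""
    for year, month in product(sorted(years), sorted(months)):
        if not 1 <= month <= 12:
            raise ValueError(f"month out of range: {month}")
        yield RunParams(year, month)


def ingest_one(params: RunParams) -> str:
    """Placeholder for extracting and loading a single month of data."""
    # A real task would download the file and write it to GCS; this sketch
    # only builds an illustrative object name so the fan-out is visible.
    return f"trips/{params.year}-{params.month:02d}.parquet"


def parent_flow(years: list[int], months: list[int]) -> list[str]:
    """Run the per-month step for every requested combination."""
    return [ingest_one(p) for p in expand_params(years, months)]
```

Keeping the parameter expansion separate from the per-month step makes it easy to rerun a single month without touching the rest.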

  4. Run data transformation and modeling with dbt

dbt build --vars 'is_test_run: false'

This produces 4 tables in the Citibike_data_dbt dataset.
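
If the dbt step is triggered from a script rather than typed by hand, shell quoting of the variables argument is easy to get wrong. Here is a small sketch, assuming the documented `--vars` flag, which takes a YAML dictionary (JSON is valid YAML). The helper name `dbt_build_command` is hypothetical and not part of this repo.

```python
import json
import shlex


def dbt_build_command(is_test_run: bool) -> list[str]:
    """Build the argument list for `dbt build` with is_test_run set.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    # Serialize the variables as JSON, which dbt parses as a YAML dictionary.
    variables = json.dumps({"is_test_run": is_test_run})
    # Returning a list avoids shell quoting issues when passed to subprocess.
    return ["dbt", "build", "--vars", variables]


# shlex.join renders the list as a copy-pasteable shell command.
print(shlex.join(dbt_build_command(False)))
# prints: dbt build --vars '{"is_test_run": false}'
```

The returned list can be passed directly to `subprocess.run(..., check=True)` from the dbt project directory.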

Data Visualization and Dashboarding

Future work