Data Engineering projects

How to build a DAG based Task Scheduling tool for Multiprocessor systems using python: Scheduling Big Data Workloads in the Cloud with pyDag
Design, Development and Deployment of a simple Data Pipeline: Data Engineering technical challenge
How to Build a Lossless Data Compression and Data Decompression Pipeline: A parallel implementation of the bzip2 high-quality data compressor tool in Python.
Introduction to Apache Spark: Big data with Apache Spark
A Text Analysis of Andres Manuel Lopez Obrador’s Speeches: Text analytics with python
Building an Amazon Prime content-based Movie Recommender System: TF-IDF, Cosine similarity, BM25, BERT
Understanding Similarity Measures for Text Analysis: Distance Metric of similarity
Dockerizing an Apache Spark Standalone Cluster: Small Big Data ecosystem with In-Memory Processing
Build a Big Data Pipeline with PySpark and AWS EMR on EC2: How to Build a Big Data Pipeline with PySpark and AWS EMR on EC2 Spot Fleets and On-Demand Instances
Building an ETL pipelin...