15535 Sand Canyon
Job Category: Big Data
Job Number: 20329
- Build robust, scalable real-time and batch data pipelines using services such as AWS Glue, Amazon EMR, SSIS, Python, PySpark, and many others.
- Provide input on choosing and implementing cost-effective, scalable, and fault-tolerant solutions
- Stay well versed in modern data platforms and up to date with emerging trends in the data engineering field.
- Work directly with business teams to extract, transform, and load data from a wide variety of data sources using SQL and other technologies
- Be familiar with data virtualization technologies
- Apply knowledge of Master Data Management (MDM) as it relates to ETL processes.
- Improve ongoing reporting and analysis processes through automation, scaling, and best practices
- Implement data structures using best practices in data modeling, ETL/ELT processes, and SQL, as well as other technologies
- Bring a passion for data in all shapes and forms
Knowledge, Skills and Abilities
- At least 7 years of experience building data pipelines on cloud platforms such as AWS and/or GCP
- Hands-on experience with Python, Spark, and Kafka to build our next-generation streaming data platform, ingesting terabytes of data monthly.
- Knowledge of Snowflake, Redshift, Athena, batch and streaming data, time series data, and Kubernetes
- Experience building large data lakes in the cloud.
- Understanding of ETL and ELT tools such as Informatica, AWS Glue, Google Cloud Dataprep, and Amazon EMR, and the ability to choose the right tool for the job.
- Attending daily sprint meetings
- Working closely with the PM and other upstream source-system teams to address production issues.
- Experience with SQL Server and SSIS
- Understanding of eCommerce, global-scale networks, cloud architecture, and other big data technologies
- 7-10+ years of Data Platform expertise
- Some form of formal education (e.g., boot camp) in Computer Science, Software Development, databases, or related technologies
- Bachelor's degree in Computer Science is highly preferred.