DESCRIPTION
External job description
Amazon Music is an immersive audio entertainment service that deepens connections between fans, artists, and creators. From personalized music playlists to exclusive podcasts, concert livestreams to artist merch, Amazon Music is innovating at some of the most exciting intersections of music and culture. We offer experiences that serve all listeners with our different tiers of service: Prime members get access to all the music in shuffle mode, and top ad-free podcasts, included with their membership; customers can upgrade to Amazon Music Unlimited for unlimited, on-demand access to 100 million songs, including millions in HD, Ultra HD, and spatial audio; and anyone can listen for free by downloading the Amazon Music app or via Alexa-enabled devices. Join us for the opportunity to influence how Amazon Music engages fans, artists, and creators on a global scale.
If you love the challenges that come with big data, then this role is for you. We collect billions of events a day, manage petabyte-scale data on Redshift and S3, and develop data pipelines using Spark/Scala EMR, SQL-based ETL, and Java services.
You are a talented, enthusiastic, and detail-oriented Data Engineer, Data Scientist, Business Intelligence Engineer, or Software Development Engineer who knows how to take on big data challenges in an agile way. Duties include big data design and analysis, data modeling, and the development, deployment, and operation of big data pipelines. You will also help hire, mentor, and develop peers in the Music Data Experience team, including Data Scientists, Data Engineers, and Software Engineers. You’ll help build Amazon Music’s most important data pipelines and data sets, and expand self-service data knowledge and capabilities through an Amazon Music data university.
This role requires you to live at the intersection of data and engineering. You have a deep understanding of data, analytical techniques, and how to connect insights to the business, and you have practical experience insisting on the highest standards for operations in ETL and big data pipelines. With our Amazon Music Unlimited and Prime Music services, and our top music provider spot on the Alexa platform, providing high-quality, high-availability data to our internal customers is critical to our customer experiences.
The Music Data Experience team develops data specifically for a set of key business domains, such as personalization and marketing, and provides and protects a robust self-service core data experience for all internal customers. We deal in AWS technologies like Redshift, S3, EMR, EC2, DynamoDB, Kinesis Firehose, and Lambda. In 2020 your team will migrate Amazon Music’s information model and data pipelines to a data exchange store (Data Lake) and EMR/Spark processing layer. You’ll build our data university and partner with Product, Marketing, BI, and ML teams to build new behavioral events, pipelines, datasets, models, and reporting to support their initiatives. You’ll also continue to develop big data pipelines.
Key job responsibilities
Build Data Platform and Data Lake solutions
Build Data Engineering tools
Build real time and micro batch data pipelines
About the team
The Music Data eXperience (MDX) team is responsible for the definition, design, production, and quality of foundational datasets consumed by the whole org, data management tools, and the self-service data lake and warehouse platforms on which these datasets are published, stored, shared, and consumed for analytics and science modeling. MDX is split into two sub-teams: *PARAM* (Platform Architecture Research and AutoMation) and *IDEA* (Intelligence, Data Engineering & Analytics). The Data Platform (PARAM) team owns the self-service data lake Data EXchange Store (DEX) and Data Warehouse platforms, builds tools and frameworks for efficient data management, and owns the orchestration and configuration platform for data pipelines. The Data Engineering (IDEA) team owns the foundational data model and datasets, the Spark and Datanet ETL jobs and business logic to build them, away team support for datasets, org-wide launch support (when required), the Executive Daily Summary (EDS), and future batch dataset data quality frameworks.
BASIC QUALIFICATIONS
– 1+ years of data engineering experience
– Experience with SQL
– Experience with data modeling, warehousing and building ETL pipelines
– Experience with one or more query languages (e.g., SQL, PL/SQL, DDL, MDX, HiveQL, SparkSQL, Scala)
– Experience with one or more scripting languages (e.g., Python, KornShell)
PREFERRED QUALIFICATIONS
– Experience with big data technologies such as Hadoop, Hive, Spark, and EMR
– Experience with an ETL tool such as Informatica, ODI, SSIS, BODI, or DataStage