Data Engineer, Baseball Operations - New York Yankees (Bronx · NY)
The New York Yankees Baseball Operations department is accepting applications for an experienced data engineer with a focus on data quality analysis. The role reports to our senior Baseball Operations executives and will assist in the development and maintenance of our data processing pipelines. This position is based in the Bronx, NY.
- Prepare, clean, and format analytical datasets for processing by data scientists
- Become an expert in our datasets, including their strengths and weaknesses, and write code to pull and verify data in response to data scientists' requests
- Use R to visualize complex, multi-source data and pinpoint data quality issues
- Build automated pipelines for processing and cleaning data
- Conduct database feature engineering to support ongoing quantitative research
- Work with developers to create and deploy systems for anomaly detection
- Interface with data scientists, software developers, and other baseball operations staff as needed
- Design department-wide principles and workflows for data quality management
- Serve as the main point of contact for questions about data structures, definitions, and quality
- Bachelor’s degree in Computer Science or related field
- 3+ years of experience developing in SQL (preferably T-SQL)
- 2+ years of experience with data profiling, data modeling, and data pipeline development
- 2+ years of experience developing in R (or a similar statistical programming language), including experience with data manipulation and visualization in that language
- Ability to write concise, simple code that performs well
- Excellent communication and problem-solving skills: must be able to break down a complex task and develop an execution strategy with little guidance
- Understanding of typical baseball data structures and of basic and advanced baseball metrics, plus knowledge of current baseball research areas
Describe your experience writing in T-SQL.
Describe your experience writing in R. What packages do you use most?
Describe your experience with data engineering and the specific techniques you’ve used.
At a high level, briefly describe the steps you would take to identify biases or inconsistencies in an unfamiliar dataset.
Have you ever worked with baseball datasets before? If so, please describe which ones and how you used them.
How did you hear about this job?