We are seeking a highly skilled and passionate Data Engineer to join our growing team focused on building and deploying cutting-edge AI/ML solutions. As a Data Engineer, you will play a crucial role in designing, building, and maintaining the data infrastructure powering the AI models for Rocket Copilot, our AI legal assistant. You will work closely with Machine Learning Engineers, Data Scientists, and Product Managers to ensure the availability of high-quality data for training, fine-tuning, and evaluating generative models. This role requires a strong understanding of data engineering principles, experience with large-scale data processing, and a passion for pushing the boundaries of AI.
Responsibilities:
- Design, develop, and maintain robust, scalable, and efficient data pipelines for ingesting, processing, transforming, and storing large datasets used for training and evaluating generative AI models.
- Perform data cleaning, normalization, transformation, and feature engineering to prepare data for model training. This includes handling unstructured data like text, images, and audio.
- Build and manage the data infrastructure, including data lakes, data warehouses, and databases, optimized for AI workloads.
- Implement data quality checks and monitoring systems to ensure data accuracy, completeness, and consistency.
- Contribute to the development and implementation of MLOps best practices for data management and model deployment.
- Work with GCP and Snowflake and their data and AI offerings.
- Optimize data pipelines and infrastructure for performance, scalability, and cost-effectiveness.
What you'll need:
- 5+ years of Python and SQL experience.
- 3+ years of experience with technologies such as Airflow and Apache Spark.
- Experience working with large language models (LLMs), diffusion models, or other generative models.
- Experience with MLOps tools and practices.
- Strong understanding of data architectures and patterns.
- Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Contributions to open-source projects.
- Experience in DataOps implementation and support.
- Experience in MLOps implementation and support.
- Experience building and supporting AI/ML platforms.
Benefits & Perks:
- Comprehensive health plans (including Medical, Dental and Vision insurance for full-time employees)
- Unlimited PTO
- Competitive salary packages
- Life insurance
- Disability benefits
- Supplemental Optional Life Insurance Benefits
- Optional FSA options
- HSA with Company Match
- 401k program with Company Match
- Fertility Assistance and Planning options
- Wellhub & ClassPass fitness platforms
- Comprehensive Pet Insurance options
- Financial Wellbeing & Student Loan Program access
- Access to additional Mental Health & Wellbeing resources