APIs, storage, distribution, distributed computation, consumption and infrastructure management.
Collaborate with product management and other engineering teams to understand and define the analytics requirements of our customers, and design and
build features, data-source integrations, and data-platform pipelines.
Build internal applications to administer, monitor, and troubleshoot the data pipelines and data integrations.
Collaborate with the cloud-infrastructure team on infrastructure automation, cloud engineering, and security design.
Implement technical best practices for optimized data flow and data quality.
Design production deployments and provide technical support for our large-scale data pipelines.
Create user documentation to maintain both development and operations continuity.
Work together with your agile team to improve process and delivery through collaborative problem-solving.
Cultivate and enhance a culture built around standard methodologies for CI/CD, alerting, and monitoring.
Proficiency in Python, with strong working knowledge of data-analysis libraries such as pandas and NumPy.
Experience with, or familiarity with, database modeling and the data-management ecosystem.
Experience with DevOps tools such as Git, Maven, Gradle, Jenkins, and Docker.
Experience with PySpark on the Apache Spark distributed computing platform.
Excellent written and verbal communication skills.
Passion for learning and implementing new technologies.
Ability to operate in a fast-paced environment.
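To give a flavor of the pandas/NumPy data-quality work the role involves, here is a minimal sketch; the dataset, column names, and `summarize_quality` helper are hypothetical, not part of any actual pipeline described above.

```python
import numpy as np
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> dict:
    """Compute simple data-quality metrics for one pipeline batch.

    Returns the row count, per-column null counts, and the mean of the
    (hypothetical) "amount" column, ignoring missing values.
    """
    return {
        "rows": len(df),
        "null_counts": df.isna().sum().to_dict(),
        "amount_mean": float(np.nanmean(df["amount"])),
    }


# Hypothetical batch of event records with some missing values.
events = pd.DataFrame(
    {
        "user_id": [1, 2, 2, None],
        "amount": [10.0, np.nan, 5.0, 2.5],
    }
)
report = summarize_quality(events)
```

A check like this would typically run as a validation step between ingestion and downstream consumption, with alerts raised when null counts exceed a threshold.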
A top US multinational corporation.