Job Description
Job Summary
Hiring for top industry employers on direct payroll.
We are looking for an experienced Deep Learning Engineering Manager to lead the development and deployment of scalable AI solutions. This role requires deep expertise in Deep Learning, Generative AI, and production AI systems, along with strong leadership capabilities to manage teams, drive architecture decisions, and deliver business-impacting AI initiatives.
Job Description
- Lead the design, development, and deployment of deep learning and Generative AI solutions
- Architect end-to-end AI systems (data pipelines, model training, deployment, monitoring)
- Drive development of LLM-based applications (RAG, fine-tuning, prompt engineering)
- Establish and scale MLOps practices (CI/CD, model versioning, monitoring, automation)
- Build and manage cloud-native AI platforms (AWS/GCP/Azure)
- Ensure AI systems meet performance targets for latency, scalability, and reliability
- Collaborate with Product, Data, and Engineering teams to define AI strategy and roadmap
- Lead technical architecture reviews and decision-making
- Ensure adherence to Responsible AI, security, and compliance standards
- Mentor and guide teams to deliver high-quality AI solutions
Roles & Responsibilities
- Lead and manage a team of AI/ML engineers and tech leads
- Own delivery of multiple AI/ML projects and programs
- Design scalable deep learning architectures (Transformers, CNNs, sequence models)
- Develop and oversee NLP, Computer Vision, and Generative AI solutions
- Implement distributed training and large-scale data processing systems
- Drive continuous model improvement and performance optimization
- Establish model monitoring, evaluation, and feedback loops
- Manage stakeholder communication and align AI initiatives with business goals
- Promote best practices in coding, system design, and AI engineering
- Stay current with advancements in Deep Learning and AI technologies
Deep Learning & AI Expertise
- Deep Learning (Transformers, CNNs, RNNs, Attention Mechanisms)
- Generative AI & Large Language Models (LLMs)
- Natural Language Processing (NLP)
- Computer Vision
- Retrieval-Augmented Generation (RAG), Fine-tuning, Prompt Engineering
- Model Evaluation, Optimization, and Tuning
Programming & Frameworks
- Python (Expert level)
- PyTorch (Preferred) / TensorFlow
- Hugging Face Transformers
- FastAPI / Flask (API integration)
MLOps & Engineering
- CI/CD Pipelines (GitHub Actions, Jenkins)
- Docker, Kubernetes
- MLflow, Kubeflow
- Model Monitoring, Logging, A/B Testing
Cloud & Distributed Systems
- AWS (SageMaker, S3, EC2, Lambda)
- GCP (Vertex AI, BigQuery)
- Azure ML
- Distributed Computing (Spark, Ray – good to have)
Data & Platform Engineering
- Data Pipelines (Airflow, Spark)
- SQL / NoSQL Databases
- Vector Databases (FAISS, Pinecone, Weaviate)