Job Details
Title: SRE
Duration: Long term Contract
Top Skills' Details
Strong database design expertise (Cassandra / vector DB); Weaviate database administration experience is ideal.
Hands-on experience with installation, configuration, and scaling.
Kubernetes deployment experience.
Python (code commits) and GitHub exposure.
Familiarity with Weaviate (data vectorization) and Memgraph (graph representation).
Admin / SRE capabilities: monitoring, RAG, and applying security best practices.
Experience Range: 5-7 years, with a minimum of 2-3 quarters of relevant project experience.
Job Description:
Site Reliability Engineers are responsible for, and take ownership of, the reliability, scalability, automation, and other aspects of uptime and availability of our database services. You will need strong skills in the following areas:
Understanding of GenAI Foundation Models and Vector DB: Leveraging foundational AI models and Vector database technologies for advanced AI capabilities.
Evaluate different DB technologies for AI and RAG capabilities.
Install, configure, and maintain vector and graph databases such as Weaviate, Memgraph, Pinecone, Milvus, etc. on hybrid infrastructure platforms.
Design and manage embedding storage architectures optimized for high-dimensional vector search.
Monitor and improve the performance and scalability of vector databases for large-scale deployments.
Manage data ingestion, indexing strategies (e.g. HNSW, IVF, Annoy), and re-indexing tasks.
Ensure data consistency, replication, backups, and disaster recovery plans are in place.
Implement security best practices including access controls, encryption, and audit logging.
Collaborate with AI/ML and data engineering teams to integrate vector databases with NLP, CV, and recommendation system pipelines; assess large and varied data sources and help development teams design RAG applications.
Document configurations, architecture decisions, and operational procedures.
Knowledge of, and ability to thrive in, an Agile DevOps environment, with responsibility for managing database availability, scalability, and reliability through automation.