AI Engineer
Charlotte, NC or Dallas, TX
Long Term
Contract
AI System Requirements: Embedding Models + Knowledge Graphs + Advanced RAG + Ontology Extraction
Objectives
- Enable semantic search and reasoning over domain-specific data.
- Integrate embeddings with knowledge graphs for hybrid retrieval.
- Support advanced RAG pipelines for contextualized generation.
- Automate ontology extraction to enrich structured knowledge bases.
Core Components
- Python Frameworks: PyTorch, TensorFlow, HuggingFace Transformers.
- Embedding Models: Sentence-BERT, OpenAI embeddings, domain-specific fine-tuned models.
- Knowledge Graphs: Neo4j, RDF/SPARQL, graph-based reasoning engines.
- Advanced RAG: Hybrid retrievers (vector + symbolic), context re-ranking, multi-hop reasoning.
- Ontology Extraction: NLP pipelines for entity/relation extraction, schema induction, ontology alignment.
Functional Requirements
- Data ingestion and preprocessing (structured + unstructured).
- Embedding generation and storage in vector DB (e.g., Pinecone, FAISS, Weaviate).
- Knowledge graph construction and querying.
- Retrieval pipeline combining embeddings + KG queries.
- Ontology extraction from text corpora to update KG schema.
- RAG pipeline for contextualized text generation with grounding.
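As a rough illustration of the retrieval requirement above, the following sketch combines embedding similarity with a knowledge graph lookup. Everything in it is an assumption made for illustration: the 3-dimensional vectors, the `KG_MENTIONS` dictionary standing in for a graph DB, the `hybrid_retrieve` function, and the fixed `kg_boost` weight. A real system would use an embedding model and a vector DB and graph DB of the kind listed above.

```python
import math

# Toy document embeddings (hypothetical 3-d vectors; a real system would
# use a model such as Sentence-BERT and a vector DB).
DOC_VECTORS = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.7, 0.3, 0.1],
}

# Toy knowledge graph: entity -> documents that mention it (hypothetical).
KG_MENTIONS = {
    "neo4j": {"doc_b", "doc_c"},
    "faiss": {"doc_a"},
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec, query_entities, top_k=2, kg_boost=0.2):
    """Rank documents by cosine similarity plus a fixed boost for KG matches."""
    kg_hits = set()
    for entity in query_entities:
        kg_hits |= KG_MENTIONS.get(entity.lower(), set())
    scored = []
    for doc_id, vec in DOC_VECTORS.items():
        score = cosine(query_vec, vec) + (kg_boost if doc_id in kg_hits else 0.0)
        scored.append((doc_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# The KG match lifts doc_c above doc_a, which wins on similarity alone.
results = hybrid_retrieve([1.0, 0.0, 0.0], ["Neo4j"])
```

The additive boost is only one possible fusion rule; rank fusion or a learned re-ranker would fit the same interface.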
Non-Functional Requirements
- Scalability: handle millions of documents and graph nodes.
- Accuracy: embeddings must achieve 90% semantic similarity on benchmark tasks.
- Latency: retrieval + generation under 2 seconds per query.
- Extensibility: modular design for plugging in new models or KG schemas.
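One way to read the extensibility and latency items together is a small retriever interface with a timing wrapper around it. The sketch below is hypothetical: `Retriever`, `KeywordRetriever`, and `timed_retrieve` are illustrative names, and only the retrieval step is timed here, whereas the 2-second budget above covers retrieval plus generation.

```python
import time
from typing import Protocol


class Retriever(Protocol):
    """Hypothetical interface any retrieval backend would implement."""

    def retrieve(self, query: str) -> list[str]: ...


class KeywordRetriever:
    """Stand-in backend; a vector or KG retriever would expose the same method."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def retrieve(self, query: str) -> list[str]:
        terms = query.lower().split()
        return [d for d in self.docs if any(t in d.lower() for t in terms)]


def timed_retrieve(retriever: Retriever, query: str, budget_s: float = 2.0):
    """Run a retriever and report whether it stayed within the latency budget."""
    start = time.perf_counter()
    hits = retriever.retrieve(query)
    elapsed = time.perf_counter() - start
    return hits, elapsed, elapsed <= budget_s


docs = ["Neo4j stores graph nodes", "FAISS indexes dense vectors"]
hits, elapsed, within_budget = timed_retrieve(KeywordRetriever(docs), "graph")
```

Because `timed_retrieve` depends only on the `retrieve` method, a new model or KG backend can be swapped in without changing the calling code.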
Integration Architecture
- Frontend: API endpoints for query and generation.
- Backend: Python services orchestrating embeddings, KG queries, and RAG.
- Storage: Vector DB + Graph DB.
- Pipeline: Ontology extraction → KG enrichment → Hybrid retrieval → RAG generation.
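The four pipeline stages named above can be sketched as plain functions chained in order. This is a toy under stated assumptions: the regex-based `extract_triples`, the dict-based graph, and the `build_prompt` template are all hypothetical stand-ins for an NLP extraction pipeline, a graph DB, and an LLM call.

```python
import re

# Hypothetical pattern: "<Subject> uses <Object>" sentences become triples.
TRIPLE_PATTERN = re.compile(r"(\w+) uses (\w+)")


def extract_triples(text):
    """Ontology extraction stand-in: pull (subject, 'uses', object) triples."""
    return [(s, "uses", o) for s, o in TRIPLE_PATTERN.findall(text)]


def enrich_graph(graph, triples):
    """KG enrichment: add each triple to an adjacency-style dict."""
    for subject, relation, obj in triples:
        graph.setdefault(subject, []).append((relation, obj))
    return graph


def retrieve_facts(graph, question):
    """Retrieval stand-in: return facts whose subject appears in the question."""
    words = set(re.findall(r"\w+", question))
    return [
        f"{subject} {relation} {obj}"
        for subject, edges in graph.items()
        if subject in words
        for relation, obj in edges
    ]


def build_prompt(question, facts):
    """RAG generation stand-in: assemble a grounded prompt for an LLM call."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Context:\n{context}\n\nQuestion: {question}"


corpus = "Retriever uses FAISS. Reasoner uses Neo4j."
graph = enrich_graph({}, extract_triples(corpus))
question = "What does Retriever rely on?"
prompt = build_prompt(question, retrieve_facts(graph, question))
```

Keeping each stage as a separate function mirrors the modular design asked for elsewhere in the requirements, since any stage can be replaced without touching the others.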
Munesh, CYBER SPHERE LLC