Location: On-site 5 days/week — local candidates only
Clearance: Must be able to obtain and maintain a Public Trust clearance
Position Summary
We are seeking a Senior Data Scientist with deep, hands-on expertise in Natural Language Processing (NLP) and Generative AI/LLMs to support a federal data science initiative. The ideal candidate is a true self-starter who can operate independently, translate complex analytic problems into automated data solutions, and communicate findings clearly to both technical teams and executive leadership.
Key Responsibilities
• Apply hands-on experience in Python, NLP frameworks, SQL, Pandas, NLTK, and spaCy to solve real-world data challenges
• Analyze trends and transactional data using strong SQL skills
• Develop, test, and deploy new techniques for natural language understanding
• Build scalable ML and Generative AI solutions, including Large Language Models (LLMs)
• Train and optimize NLP/LLM models and build Python-based data pipelines
• Build cloud-native solutions on AWS
• Determine the nature of analytic problems, evaluate options, and recommend resolutions
• Advise on methods and data needed to evaluate complex data problems
• Collaborate with data collectors and analysts to close gaps on complex monitoring problems
• Deliver accurate, timely, and sophisticated data analysis
Basic Qualifications
• Bachelor's degree in Statistics, Applied Mathematics, Computer Science, or Information Science, with industry experience in Python, NLP frameworks, SQL, Pandas, NLTK, spaCy, data science, and AI/ML/LLM engineering
• 10+ years overall IT industry experience
• Education/experience combinations accepted: Master's + 10 years; Bachelor's + 12 years; or 18 years in lieu of a degree
Required Skills
• Solid experience with NLP, Python, NLP frameworks, SQL, Pandas, NLTK, and spaCy
• Experience with Generative AI and LLMs
• Demonstrated self-starter, able to operate independently
• Fluency in Python, version control/Git, standard Python packages (Pandas, NumPy, Matplotlib), and ML frameworks
• Knowledge of TensorFlow, PyTorch, Pandas, scikit-learn, NLTK, AWS EC2 (Azure ML a plus)
• Experience with scalable data engineering frameworks (e.g., Apache Spark) and orchestration frameworks (e.g., Airflow), and/or semantic search
• Expert-level skill in data analysis and in applying advanced statistical/ML methods to build, train, test, and evaluate supervised and unsupervised models
• Experience with ML model deployment and operations (DevOps, MLOps, LLMOps)
• Experience with NLP/Generative AI libraries (e.g., spaCy, LangChain), text annotation tools, and semantic frameworks
• Ability to clean and process large volumes of real-world data
• Experience retrieving/manipulating data from varied sources (DB2, Oracle, SQL Server, Hadoop, flat files)
• Experience with database management systems (PostgreSQL, MySQL, SQLite, SQL, etc.)
• Excellent analytical and problem-solving skills; ability to identify risks and propose solutions
• Excellent written and verbal communication skills across audiences, including executive leadership
Desired Skills
• Prior experience on federal or state government IT projects
• Industry experience strongly preferred
• Experience with, or willingness to learn, the Hadoop ecosystem (Spark, Impala, Hive)
• Experience in an analytical research environment
• Experience in parallel/GPU processing (CUDA)
• Experience with Mathematica
• Experience with markup languages (LaTeX, HTML)
• Experience with NLP for anomaly detection