About the Company
Blackstraw.ai is an end-to-end technology services company specializing in Artificial Intelligence (AI) and Engineering solutions across Data Science, Data Engineering, LLM/GenAI and LLMOps. Founded in 2018, we help global enterprises across North America, Europe and Asia build and operationalize AI systems that create measurable business impact. Our mission is to make AI adoption simpler, faster and more scalable through a blend of deep domain expertise, reusable accelerators and proven engineering practices.
With a 400+ strong team of engineers, data scientists and AI specialists, we partner with organizations to deliver real-world outcomes in areas such as predictive analytics, computer vision, natural language processing and Generative AI. Headquartered in Florida (USA) with operations in Canada and India, Blackstraw.ai continues to empower global enterprises to unlock the true potential of AI.
About the Role
We are building advanced systems to extract, structure, and operationalize data from a wide range of external digital sources. As the Lead for Data Extraction Engineering, you will architect and drive the development of robust, scalable, and resilient extraction frameworks that operate reliably across dynamic and evolving environments. This role requires deep technical expertise, strong problem‑solving skills, and the ability to guide engineering teams toward high‑quality delivery.
Location: USA / Canada (remote)
Experience: 5 to 10 years
Employment Type: Full-time only
Key Responsibilities
Technical Leadership
Lead the design and development of scalable data extraction frameworks and reusable components.
Define best practices, coding standards, and architectural patterns for extraction systems.
Evaluate and integrate new tools, libraries, and platforms to enhance extraction capabilities.
Data Extraction Engineering
Develop strategies to extract structured and semi‑structured data from dynamic, complex, and frequently changing web sources.
Build resilient pipelines that handle challenges such as:
Dynamic web applications
Rate limits and throttling
Anti‑bot and blocking mechanisms
Frequent DOM or API structure changes
Optimize extraction workflows for accuracy, efficiency, fault tolerance, and long‑term maintainability.
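By way of illustration, the rate-limit and throttling handling described above is often implemented as a retry loop with exponential backoff and jitter. The sketch below is a minimal, hypothetical example (the `fetch` callable and its `status` attribute are assumptions standing in for any HTTP client), not a prescribed implementation:

```python
import random
import time


def fetch_with_backoff(fetch, url, max_retries=5, base_delay=1.0):
    """Retry a rate-limited fetch with exponential backoff and jitter.

    `fetch` is any callable returning an object with a `status` attribute
    (hypothetical; stands in for a real HTTP client call).
    """
    for attempt in range(max_retries):
        response = fetch(url)
        if response.status != 429:  # not rate-limited: return the response
            return response
        # Exponential backoff with random jitter to avoid synchronized retries
        delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
        time.sleep(delay)
    raise RuntimeError(f"Exceeded {max_retries} retries for {url}")
```

In production pipelines this pattern is typically combined with framework-level settings (e.g., Scrapy's AutoThrottle) rather than hand-rolled per request.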
Collaboration & Integration
Work closely with platform, data engineering, and DevOps teams to integrate extraction workflows into broader data pipelines.
Ensure extraction systems scale effectively across distributed environments.
Provide technical mentorship to junior engineers and contribute to team capability building.
Must‑Have Experience
5 to 10+ years of software engineering experience with strong expertise in Python.
Hands‑on experience with:
Scraping frameworks (e.g., Scrapy)
Browser automation tools (e.g., Playwright, Selenium)
Platforms/tools such as Zyte or equivalent ecosystems
Proven experience working with:
Dynamic and JavaScript‑heavy web applications
Rate limits, blocking, and anti‑bot mechanisms
Strong understanding of efficient data extraction patterns, architectural trade‑offs, and performance optimization.
Nice to Have
Experience building distributed or large‑scale data extraction systems.
Familiarity with proxy management, rotation strategies, or similar techniques.
Exposure to cloud‑based data pipelines (AWS, Google Cloud Platform, Azure).
Understanding of CI/CD, containerization (Docker), and orchestration (Kubernetes).
Blackstraw provides equal employment opportunities to applicants and employees without regard to race, color, religion, age, sex, sexual orientation, gender identity/expression, national origin, marital status, protected veteran status, disability status, or any other basis protected by federal, state, or local law.