About the Company
Blackstraw.ai is an end-to-end technology services company specializing in Artificial Intelligence (AI) and Engineering solutions across Data Science, Data Engineering, LLM/GenAI and LLMOps. Founded in 2018, we help global enterprises across North America, Europe and Asia build and operationalize AI systems that create measurable business impact. Our mission is to make AI adoption simpler, faster and more scalable through a blend of deep domain expertise, reusable accelerators and proven engineering practices.
With a 400+ strong team of engineers, data scientists and AI specialists, we partner with organizations to deliver real-world outcomes in areas such as predictive analytics, computer vision, natural language processing and Generative AI. Headquartered in Florida (USA) with operations in Canada and India, Blackstraw.ai continues to empower global enterprises to unlock the true potential of AI.
About the Role
We are building advanced systems to extract, structure, and operationalize data from a wide range of external digital sources. As the Lead for Data Extraction Engineering, you will architect and drive the development of robust, scalable, and resilient extraction frameworks that operate reliably across dynamic and evolving environments. This role requires deep technical expertise, strong problem‑solving skills, and the ability to guide engineering teams toward high‑quality delivery.
Location: USA / Canada (remote)
Experience: 5 to 10 years
Employment Type: Full-time only
Key Responsibilities
Technical Leadership
Lead the design and development of scalable data extraction frameworks and reusable components.
Define best practices, coding standards, and architectural patterns for extraction systems.
Evaluate and integrate new tools, libraries, and platforms to enhance extraction capabilities.
Data Extraction Engineering
Develop strategies to extract structured and semi‑structured data from dynamic, complex, and frequently changing web sources.
Build resilient pipelines that handle challenges such as:
Dynamic web applications
Rate limits and throttling
Anti‑bot and blocking mechanisms
Frequent DOM or API structure changes
Optimize extraction workflows for accuracy, efficiency, fault tolerance, and long‑term maintainability.
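By way of illustration, the rate-limit and throttling handling described above is often implemented as a retry loop with exponential backoff and jitter. The sketch below is a minimal, hypothetical example (the `fetch` callable and its `status` attribute are assumptions standing in for any HTTP client), not a prescribed implementation:

```python
import random
import time


def fetch_with_backoff(fetch, url, max_retries=5, base_delay=1.0):
    """Retry a rate-limited fetch with exponential backoff and jitter.

    `fetch` is any callable returning an object with a `status` attribute
    (hypothetical; stands in for a real HTTP client call).
    """
    for attempt in range(max_retries):
        response = fetch(url)
        if response.status != 429:  # not rate-limited: return the response
            return response
        # Exponential backoff with random jitter to avoid synchronized retries
        delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
        time.sleep(delay)
    raise RuntimeError(f"Exceeded {max_retries} retries for {url}")
```

In production pipelines this pattern is typically combined with framework-level settings (e.g., Scrapy's AutoThrottle) rather than hand-rolled per request.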
Collaboration & Integration
Work closely with platform, data engineering, and DevOps teams to integrate extraction workflows into broader data pipelines.
Ensure extraction systems scale effectively across distributed environments.
Provide technical mentorship to junior engineers and contribute to team capability building.
Must‑Have Experience
5 to 10+ years of software engineering experience with strong expertise in Python.
Hands‑on experience with:
Scraping frameworks (e.g., Scrapy)
Browser automation tools (e.g., Playwright, Selenium)
Platforms/tools such as Zyte or equivalent ecosystems
Proven experience working with:
Dynamic and JavaScript‑heavy web applications
Rate limits, blocking, and anti‑bot mechanisms
Strong understanding of efficient data extraction patterns, architectural trade‑offs, and performance optimization.
Nice to Have
Experience building distributed or large‑scale data extraction systems.
Familiarity with proxy management, rotation strategies, or similar techniques.
Exposure to cloud‑based data pipelines (AWS, Google Cloud Platform, Azure).
Understanding of CI/CD, containerization (Docker), and orchestration (Kubernetes).
Blackstraw provides equal employment opportunities to applicants and employees without regard to race, color, religion, age, sex, sexual orientation, gender identity/expression, national origin, marital status, protected veteran status, disability status, or any other basis protected by federal, state, or local law.