Company Profile
Blackstraw.ai is an end-to-end technology services company specializing in Artificial Intelligence (AI)
and Engineering solutions across Data Science, Data Engineering, LLM/GenAI and LLMOps. Founded
in 2018, we help global enterprises across North America, Europe and Asia to build and
operationalize AI systems that create measurable business impact. Our mission is to make AI
adoption simpler, faster and more scalable through a blend of deep domain expertise, reusable
accelerators and proven engineering practices.
With a team of 400+ engineers, data scientists and AI specialists, we partner with
organizations to deliver real-world outcomes in areas such as predictive analytics, computer vision,
natural language processing and Generative AI.
Headquartered in Florida (USA) with operations in the USA, Canada and India, Blackstraw.ai continues to
empower global enterprises to unlock the true potential of AI.
Location: USA / Canada
Experience: 6 to 10 years
Employment Type: Full-time
Role Overview: This role is hands-on and code-focused. The engineer will build ingestion pipelines from sources such as SAP, Salesforce, and relational databases (RDBMS) into the Google Cloud Platform landing zone and transform the landed data into high-quality data products.
Key Responsibilities
- Pipeline Development: Build robust ETL/ELT pipelines using Delta Live Tables (DLT) / Spark Declarative Pipelines and PySpark on Google Cloud Platform (a minimal sketch follows this list).
- Streaming & Ingestion: Implement real-time data ingestion using Cloud Pub/Sub and Databricks Structured Streaming.
- Advanced Transformations: Use PySpark expertly for complex unstructured data, and Spark SQL or BigQuery SQL for structured business logic.
- Data Quality: Develop automated data validation frameworks using Great Expectations or DLT expectations.
- DevOps/CI-CD: Automate environment provisioning and code deployment using Terraform and Google Cloud Build (or GitHub Actions).
- Cost Management: Monitor and optimize DBU (Databricks Unit) consumption and Google Cloud Platform compute costs.
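To make the pipeline, streaming, and data-quality responsibilities above concrete, here is a minimal sketch of the kind of flow this role builds: a two-table DLT / Spark Declarative Pipelines job that ingests Cloud Pub/Sub events via Structured Streaming and enforces expectations. The project, subscription, topic, and column names are illustrative placeholders, not an actual Blackstraw pipeline.

```python
# A minimal sketch, not production code: a DLT pipeline ingesting from
# Cloud Pub/Sub (Databricks' native "pubsub" streaming source) and
# applying expectations. All resource and column names are hypothetical,
# and connector auth options are omitted for brevity.
import dlt
from pyspark.sql import functions as F

@dlt.table(name="orders_bronze", comment="Raw events landed from Cloud Pub/Sub.")
def orders_bronze():
    return (
        spark.readStream.format("pubsub")         # `spark` is provided by the DLT runtime
        .option("projectId", "my-gcp-project")    # hypothetical GCP project
        .option("subscriptionId", "orders-sub")   # hypothetical subscription
        .option("topicId", "orders")              # hypothetical topic
        .load()
    )

@dlt.table(name="orders_silver", comment="Validated, typed order records.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop rows failing the check
@dlt.expect_or_drop("positive_amount", "amount > 0")
def orders_silver():
    payload = F.col("payload").cast("string")     # Pub/Sub payloads arrive as bytes
    return dlt.read_stream("orders_bronze").select(
        F.get_json_object(payload, "$.order_id").alias("order_id"),
        F.get_json_object(payload, "$.amount").cast("double").alias("amount"),
    )
```

Run as a DLT pipeline, this yields bronze/silver lineage and per-expectation quality metrics; for jobs outside DLT, the same checks could instead be expressed with Great Expectations.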
Technical Requirements
- Languages: Expert-level PySpark, SQL, and Python.
- Google Cloud Platform Ecosystem: Hands-on experience with Cloud Functions, Cloud Run, and BigQuery (see the BigQuery example after this list).
- Frameworks: Experience with Spark Declarative Pipelines (formerly Delta Live Tables) for Spark-centric flows.
- Tools: Proficiency with the Databricks CLI, Asset Bundles (DABs), and version control via Databricks Repos.
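As a small illustration of PySpark working against BigQuery (one plausible pattern, not a prescribed one), the sketch below reads a BigQuery table through the open-source spark-bigquery-connector, applies an aggregate, and writes the derived table back. All project, dataset, table and bucket names are placeholders.

```python
# A brief, hedged example of a Spark-to-BigQuery round trip using the
# spark-bigquery-connector (preinstalled on Databricks on Google Cloud).
# Every resource name below is a hypothetical placeholder.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bq-roundtrip").getOrCreate()

# Read structured business data directly from BigQuery.
sales = (
    spark.read.format("bigquery")
    .option("table", "my-gcp-project.analytics.sales")
    .load()
)

# Express the business logic in PySpark: a simple daily revenue rollup.
daily = (
    sales.groupBy(F.to_date("sold_at").alias("sale_date"))
    .agg(F.sum("amount").alias("total_amount"))
)

# Write the derived data product back; the connector stages files in GCS.
(
    daily.write.format("bigquery")
    .option("table", "my-gcp-project.analytics.daily_sales")
    .option("temporaryGcsBucket", "my-staging-bucket")
    .mode("overwrite")
    .save()
)
```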
Soft Skills
- Ability to translate complex technical concepts into actionable insights.
- Strong problem-solving mindset with a bias for experimentation and innovation.
- Collaborative, proactive, and comfortable working in fast-paced environments.
We are an equal opportunity employer. Employment decisions are based on qualifications, merit, and business needs. We do not discriminate on any basis protected by applicable laws in the countries where we operate.