ONLY CANDIDATES WITH GOOD FINANCIAL AND HEDGE FUND EXPERIENCE
This role is in the Research Department
Title -- Data Ingestion Expert / Data Engineer
PYTHON -- good in Python
Good in SQL
Prior experience with a wide variety of alternative data sets <---- THIS IS KEY
Must have financial services experience
What are Alternative Data Sets --
Non-market data from a vendor, such as consumer spending behaviour or web traffic behaviour, plus various kinds of data you get from web scrapes.
Alternative-data hedge funds use non-traditional data, like satellite imagery, social media sentiment, geolocation, and web scraping, to find investment signals missed by conventional research. This offers early insight into market trends, consumer behaviour, and company performance, giving them an edge over rivals. The data is diverse, often large and complex (big data), and comes from outside a company; it helps funds spot opportunities and risks before they appear in standard reports, and most hedge funds now integrate some form of it.
Common Sources of Alternative Data
- Geolocation Data: Device movement and foot-traffic data used to gauge retail activity.
- Social Media & Web Data: Sentiment analysis from Twitter, Reddit, and review sites; web scraping for trends.
- Satellite Imagery: Tracking parking lots, oil storage, and construction to predict retail/commodity activity.
- Credit Card & Transaction Data: Aggregated purchase data to see consumer spending patterns.
- Web Traffic & App Downloads: Real-time indicators of product adoption and company health.
How Hedge Funds Use It
- Early Signals : Spotting shifts in demand or sentiment before earnings reports.
- Competitive Edge : Gaining insights into consumer perception and competitor performance.
- Quantitative Models : Feeding large datasets into complex algorithms for predictive power.
Key Providers & Tools
Companies like YipitData, Quandl (now part of Nasdaq), and others gather and process this data.
Services like AlphaSense and AlternativeSoft help manage and analyze this information.
Challenges
- Data Quality: Ensuring data is accurate, relevant, and not misleading.
- Volume & Complexity: Handling massive, unstructured datasets requires advanced tech.
Integrate with APIs and bring the data into Snowflake <--- Must have API and Snowflake experience
Write ETL processes; ingest from APIs, S3 buckets, SFTP servers, and Snowflake vendor shares. <--- Must have ETL, APIs, S3 buckets, SFTP servers, Snowflake vendor shares.
The candidate is required to write the code that lands this data in Snowflake; as a starting point, the data engineering work is to get it into Snowflake.
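To make the ingestion requirement concrete, here is a minimal sketch of the kind of code involved -- pulling an extract from a hypothetical vendor API and landing it in a raw Snowflake table via the Python Snowflake connector. The endpoint, credentials, table, and column names are placeholders, not the fund's actual setup.

```python
# Minimal sketch: pull a file from a vendor API and land it in a raw Snowflake table.
# The endpoint, credentials, and table names below are placeholders, not the fund's actual setup.
import io

import pandas as pd
import requests
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

VENDOR_URL = "https://api.example-vendor.com/v1/consumer-spend"  # hypothetical endpoint


def fetch_vendor_data(api_key: str, as_of: str) -> pd.DataFrame:
    """Download one day's extract from the vendor API and parse it into a DataFrame."""
    resp = requests.get(
        VENDOR_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        params={"date": as_of},
        timeout=60,
    )
    resp.raise_for_status()
    return pd.read_csv(io.BytesIO(resp.content))


def load_to_snowflake(df: pd.DataFrame, table: str) -> int:
    """Write the DataFrame into a raw landing table in Snowflake."""
    conn = snowflake.connector.connect(
        account="MY_ACCOUNT",   # placeholder
        user="SVC_INGEST",      # placeholder
        password="***",         # use a secrets manager in practice
        warehouse="INGEST_WH",
        database="RAW",
        schema="VENDOR_DATA",
    )
    try:
        _, _, nrows, _ = write_pandas(conn, df, table_name=table, auto_create_table=True)
        return nrows
    finally:
        conn.close()


if __name__ == "__main__":
    frame = fetch_vendor_data(api_key="...", as_of="2024-01-31")
    print(load_to_snowflake(frame, "CONSUMER_SPEND_DAILY"))
```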
The next part of the role is to --
Integrate the raw data they've ingested into their analytics systems, which are internally built Python tools and SQL processes they've developed. They have internally built Python libraries. Prefect (prefect.io) is their orchestration tool.
dbt, a data transformation tool. THIS IS THE TOOLKIT.
The candidate must be a user of the existing tools and processes, with enough experience to suggest improvements. The system is not perfect, and they want someone who can contribute ideas and do the necessary work to improve their data processes: building data quality and alerting tools, leveraging dbt *** for workflows, and adding features to things they've already built. Help out in every way possible.
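For illustration only (this does not use the fund's internal Python libraries), a rough Prefect flow showing the shape of this work: ingest a data set, run the related dbt models and tests, and alert on failure. The task names, dbt selectors, and alerting hook are assumptions.

```python
# Rough sketch of an orchestrated load: ingest, transform with dbt, alert on failure.
# Illustrative only -- it does not reflect the fund's internal libraries or conventions.
import subprocess

from prefect import flow, task


@task(retries=2, retry_delay_seconds=300)
def ingest_dataset(dataset: str) -> None:
    """Placeholder for the ingestion step (API / S3 / SFTP -> Snowflake)."""
    print(f"Ingesting {dataset} into the raw schema...")


@task
def run_dbt(args: list[str]) -> None:
    """Shell out to the dbt CLI; dbt handles the in-warehouse transformations."""
    subprocess.run(["dbt", *args], check=True)


@task
def send_alert(message: str) -> None:
    """Placeholder alerting hook (email, Slack, PagerDuty, etc.)."""
    print(f"ALERT: {message}")


@flow(name="daily-vendor-load")
def daily_vendor_load(dataset: str = "consumer_spend") -> None:
    try:
        ingest_dataset(dataset)
        run_dbt(["run", "--select", f"staging.{dataset}+"])   # model selector is an assumption
        run_dbt(["test", "--select", f"staging.{dataset}+"])  # data quality checks via dbt tests
    except Exception as exc:
        send_alert(f"{dataset} load failed: {exc}")
        raise


if __name__ == "__main__":
    daily_vendor_load()
```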
PYTHON SKILLS THEY MUST HAVE -
NumPy, SciPy, Pandas -- they have a fairly typical data manipulation stack in Python. But they're also open to other libraries that they don't currently use,
like Polars, Dask, or PySpark.
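As a small, made-up example of that day-to-day data manipulation work in Pandas (the file path and column names are invented):

```python
# Typical data-manipulation work: clean a vendor extract and aggregate it for the analysts.
# File path and column names are made up for illustration.
import pandas as pd

df = pd.read_csv("consumer_spend_daily.csv", parse_dates=["transaction_date"])

# Basic cleaning: drop obviously bad rows and normalise ticker symbols.
df = df.dropna(subset=["ticker", "spend_usd"])
df["ticker"] = df["ticker"].str.upper()

# Weekly spend per ticker -- the kind of panel a research team might consume.
weekly = (
    df.set_index("transaction_date")
      .groupby("ticker")["spend_usd"]
      .resample("W")
      .sum()
      .reset_index()
)
print(weekly.head())
```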
WORK FAST, WITH EXPERIENCE IN THE FINANCIAL INDUSTRY (maybe another hedge fund)
Ideally from the financial industry, where they have worked with a variety of data sets in a time-pressured environment. (What they are not going to get is 2 months to solve this problem. What you will get -- here are the 10 data sets to process in the next 2 weeks; let's make this happen and streamline the process as new data comes in and you get more efficient.)
There is an ongoing queue of data set ingestion projects.
ALSO
Mixed in with the above are improvement projects, e.g. a new feature in the data loading project to give stakeholders more visibility.
Python on Windows. They work in a virtual-machine-based environment -- a slightly less modern setup -- and this influences how they think about problem solving.
Normal hours are 9-6. (If someone builds a data process and it breaks, they are expected to take some ownership.) They may need to start at 6 in the morning to fix a problem in a system they built themselves (own what you built). 5 days a week, 4 days on site, and must be willing to get up at 6 am to fix something when needed.
Excellent communication: talk to the other engineers, to the data scientists, and to the business. Understand the downstream intent of the data <--- What is the business purpose of this data? Be smart.
8-10 daily rate (due to the support element)
***
dbt (data build tool) is an analytics engineering framework that helps data teams transform data in their warehouses using SQL. It brings software development best practices (like version control, testing, and documentation) to data pipelines, enabling analysts and engineers to build scalable, reliable, and collaborative data models for analytics and BI. It focuses on the "T" (transformation) in ELT, allowing teams to build modular, testable, and documented data transformations directly within cloud data warehouses.
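Since dbt models are written as SQL (with Jinja templating), a minimal, hypothetical staging model over a raw vendor table might look like the following; the model, source, and column names are invented.

```sql
-- models/staging/stg_consumer_spend.sql (hypothetical model and column names)
-- A dbt model is just a SELECT; dbt materialises it in the warehouse and lets you
-- version, test, and document it alongside the rest of the project.
select
    transaction_date,
    upper(ticker)   as ticker,
    sum(spend_usd)  as spend_usd
from {{ source('vendor_data', 'consumer_spend_daily') }}
group by 1, 2
```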
Job Type: Contract
Work Location: In person