Senior Data & AI Engineer
About the role
Contract: 6 months, with strong likelihood of extension and/or conversion to FTE
Daily Responsibilities:
Triage overnight pipeline failures (Airflow DAGs, ETL jobs) and resolve data quality issues
Write and optimize complex SQL across Trino, Snowflake, and SQL Server for IAM reporting
Build and iterate on MCP server tools / agentic AI workflows: prompt tuning, RAG testing, LLM output validation
Develop and maintain ETL/ELT pipelines (Airflow, Spark, DBT, Iceberg) for new and existing data feeds
Translate IAM business rules (PAM, entitlements, password compliance) into queryable data models
Collaborate with IAM analysts and stakeholders to clarify requirements and deliver data solutions
Maintain Snowflake semantic layer views and Cortex AI features for dashboards and AI agents
Run QA test cases: validate pipeline outputs, AI agent responses, and source-to-target accuracy
Monitor AI governance controls: cost logs, guardrails, audit trails
Review and merge GitHub PRs; maintain CI/CD workflows via GitHub Actions
Update technical documentation (data dictionaries, runbooks, READMEs)

Design, build, and optimize data models and semantic views in Snowflake, including Snowflake Cortex for AI-powered natural language querying and ML-assisted analytics.
Build and maintain semantic layer definitions that expose IAM data domains in a consistent, reusable format for AI agents and dashboards.
Design and implement production-ready agentic AI systems using MCP server architecture and LLM frameworks such as LangChain, Claude/Anthropic APIs, or equivalent tooling.
Build RAG (Retrieval-Augmented Generation) pipelines with vector database integration (e.g. Pinecone, Chroma) and embedding-based retrieval strategies.
Write and iterate on prompt engineering strategies: chain-of-thought patterns, tool use, structured output formatting, and context management at scale.
Enforce AI governance: LLM cost controls, output monitoring, logging, guardrails, and audit trails to ensure responsible AI use within IAM.
Integrate MCP servers and AI agents with existing Trino, SQL Server, and Snowflake data sources.
Rapidly onboard to the IAM domain: PAM (Privileged Access Management), entitlement profiling, PAI datamart structures, password compliance, and access certification workflows.
Translate complex IAM business rules and access governance logic into queryable data models and automated pipelines.
Collaborate with IAM analysts and stakeholders to capture and formalize business requirements into technical specifications.
Build and maintain QA frameworks and structured test cases for data pipelines, AI agent outputs, and SQL transformations.
Validate data accuracy, completeness, and business rule compliance across source-to-target data flows; perform regression testing on pipeline or logic changes.
Write complex analytical SQL across Trino, Snowflake, and SQL Server.
Build and maintain ETL/ELT pipelines using Apache Airflow (DAG authoring, scheduling, dependency management), Apache Spark, and DBT.
Work with Apache Iceberg and Hadoop-based data sources; understand distributed data processing patterns in hybrid on-prem and cloud environments (AWS/Azure).
Use GitHub for version control, branch management, pull requests, and peer code review; author and maintain GitHub Actions CI/CD workflows.
Produce and maintain clear technical documentation (README files, runbooks, data dictionaries).
What program/technology/software knowledge is essential for this role? Describe in what capacity the selected candidate will be using it:
Python: Used as the baseline language for all AI and data tooling: building pipelines, AI agents, automation scripts, and IAM business logic transformations.
Advanced SQL (Snowflake, Trino, SQL Server): Used daily to write complex analytical queries across multiple IAM data sources and platforms; translating business rules into queryable data models.
Snowflake + Cortex AI: Used to design and build semantic views and data models; Cortex specifically for AI-powered natural language querying and ML-assisted analytics on IAM data.
MCP Server Architecture: Used to build and deploy production-grade agentic AI tools that connect LLMs to IAM data sources (Trino, Snowflake, SQL Server).
LLM APIs (Claude/Anthropic, GPT): Used to design, implement, and govern AI agent systems; includes prompt engineering, cost monitoring, output validation, and audit logging.
LangChain + RAG Pipelines + Vector DBs (Pinecone, Chroma): Used to build retrieval-augmented generation systems that allow AI agents to reason over IAM domain knowledge and documentation.
Apache Airflow: Used to author, schedule, and manage ETL/ELT pipeline DAGs across IAM data domains.
Apache Spark: Used for large-scale distributed data processing within ETL/ELT pipelines.
DBT: Used for SQL-based data transformations and maintaining data models in the lakehouse layer.
Apache Iceberg / Hadoop: Used to work with lakehouse-format data sources in hybrid on-prem and cloud environments.
AWS / Azure: Used to deploy and manage data pipelines and AI workloads across hybrid cloud infrastructure.
GitHub + GitHub Actions: Used for version control, branch/PR workflows, peer code review, and CI/CD automation of pipeline and AI tool deployments.
Must-have Skills/Experiences and/or Education, certifications, qualifications, designations:
Proficiency in Python (baseline for all AI/data tooling) and advanced SQL across at least two platforms (Snowflake, Trino, SQL Server).
Hands-on experience with Snowflake SQL, Cortex AI features, and semantic view design.
Practical experience building MCP servers or agentic AI frameworks, including LLM API integration (Claude/Anthropic, GPT, or equivalent).
Experience building RAG pipelines with LangChain or similar orchestration frameworks and vector database integration (Pinecone, Chroma, or similar).
Demonstrated ability to design effective prompts with context management, chain-of-thought patterns, and governance controls (cost monitoring, logging, guardrails).
Familiarity with IAM/PAM domain concepts (privileged access, entitlements, access certifications).
Experience writing structured test plans and executing test cases for data or software systems.
Experience authoring Apache Airflow DAGs and Apache Spark jobs, and managing pipeline dependencies.
Knowledge of modern data lakehouse tooling: Apache Iceberg, DBT, and/or Hadoop ecosystem (HDFS, Hive).
Experience with hybrid cloud deployments (AWS and/or Azure) alongside on-prem infrastructure.
Proficient with Git workflows, GitHub Actions CI/CD pipeline configuration, and code review practices.
Ability to produce clear technical documentation and data dictionaries.
Strong problem-solving skills and ability to deliver under critical timelines with minimal oversight.
Excellent communication skills; comfortable engaging with both technical teams and IAM business stakeholders.
Nice-to-have Skills/Experience and/or Education, certifications, qualifications, designations:
Experience with Trino (distributed SQL query engine), which is actively used in this environment.
Exposure to SailPoint IdentityIQ or other IGA platforms.
Familiarity with PAM tools such as CyberArk or BeyondTrust.
Real-time streaming experience with Apache Kafka.
Knowledge of SharePoint integration for reporting outputs.
Prior work in a regulated financial services environment.
GitHub portfolio of shipped AI/LLM projects or production RAG systems.
Experience with additional orchestration tools: Kubeflow, Dagster, or Temporal.
QA-related certification (e.g. ISTQB, Agile Testing).
Computer Engineering, Computer Science, or related technical degree/diploma or equivalent experience.
Any experience with tools like Tableau, Power BI, or Airflow UI will be an added advantage.
Soft skills:
Strong Problem-Solving Skills: Ability to work through complex IAM data and AI challenges independently, delivering solutions under critical timelines with minimal oversight.
Excellent Communication Skills: Must be comfortable engaging with both technical teams (engineers, platform teams) and non-technical IAM business stakeholders, and translating business requirements into technical specifications.
Self-Starter / Fast Learner: Expected to rapidly onboard to the IAM domain (PAM, entitlements, PAI datamart, access certifications) without hand-holding.
Collaboration: Works cross-functionally with IAM analysts, platform teams, and business stakeholders across on-prem and cloud environments.
FP Inc. is committed to creating an inclusive environment where all team members and clients feel like they belong. In accordance with the requirements set out in the Employment Standards Act, FP Inc. hereby declares that AI is utilized in the screening process for this position. The hourly compensation range for this role is $60/hr to $75/hr. We seek applicants with a wide range of abilities, and we provide an accessible candidate experience. We advocate for you and welcome anyone regardless of race, colour, religion, national origin, sex, physical or mental disability, or age.