Hands-on experience with LLMs (commercial and open source), LangChain, LlamaIndex, Retrieval-Augmented Generation, and vector databases
3-5 years of experience with graph databases (Neo4j and Cypher would be a big plus)
7+ years of professional experience in data science, with a strong foundation in statistical analysis, machine learning, and data visualization
Proficiency in programming languages such as Python and Java, and familiarity with libraries/frameworks such as TensorFlow, PyTorch, scikit-learn, SciPy, pandas, etc.
Solid understanding of experimental design, hypothesis testing, and causal inference methods
Experience with SQL and relational databases for data manipulation and querying
Excellent communication and collaboration skills, with the ability to explain complex technical concepts to non-technical stakeholders
Strong problem-solving and critical-thinking abilities, with a demonstrated ability to tackle open-ended problems and drive projects to completion
Knowledge of software engineering best practices, including version control, agile SDLC, testing, and deployment pipelines
Good knowledge of Kubernetes, Kafka, and RESTful design would be a plus
Development experience with at least one public cloud provider would be a plus
Responsibilities:
Drive AI/machine learning development and implementation, especially around generative AI and large language models for OTC derivatives affirmations/confirmations
Work independently or collaboratively, as circumstances require, to help business stakeholders and the Bank’s Director-level IT staff identify analytical and technical opportunities to improve existing processes or address existing pain points
Must have solid knowledge of, and implementation experience with, the large language model pipeline: pre-trained LLMs (such as Llama and Mistral), GPT, Gemini, RAG, model fine-tuning, prompt engineering, vector databases, LangChain, LlamaIndex, etc.
Integrate generative LLM techniques, which could include chat, Retrieval-Augmented Generation, and automated scenario simulation and optimization