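ColPali is a vision-based approach to RAG (Retrieval-Augmented Generation). What sets it apart from a standard RAG pipeline is that it retrieves over images of the document pages, so it can capture visual information in the document (layout, figures, tables) instead of relying on extracted text alone.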
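
As a rough illustration of how retrieval over page images can work, here is a minimal sketch of ColBERT-style late-interaction ("MaxSim") scoring, the kind of scoring ColPali uses to match a query against pages. It is not the code from this repo. The function names (`maxsim_score`, `rank_pages`) and the tiny 2-dimensional vectors are made up for illustration; in practice the embeddings would come from the ColPali model, with one vector per query token and one per image patch.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], page_patches: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token, take its best-matching
    page patch (max dot product), then sum those maxima over all query tokens."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(query_tokens: Sequence[Vector], pages: Sequence[Sequence[Vector]]) -> List[int]:
    """Return page indices sorted from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_tokens, patches) for patches in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy data (hypothetical): two query-token embeddings and two "pages",
# each page represented by two patch embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0]]   # each query token has an exact match
page_b = [[0.5, 0.5], [0.5, 0.5]]   # only partial matches

best_first = rank_pages(query, [page_b, page_a])
```

With these toy vectors, `page_a` scores higher than `page_b`, so it is ranked first. The top-ranked page images would then be passed to a vision-language model to generate the answer.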
Thanks to these sources, I was able to create this repo:
https://www.youtube.com/watch?v=DI9Q60T_054
https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb