This GitHub repository contains a regularly updated list of Large Language Model (LLM) papers, current as of February 24, 2025.
- The resources are collected from various sources, including arXiv, NeurIPS, ICML, ICLR, ACL, EMNLP, AAAI, IJCAI, KDD, CVPR, ICCV, ECCV, IEEE, ACM, Springer, ScienceDirect, Wiley, Nature, Science, and other top AI/ML conferences and journals.
- For a better reading experience, visit the Shinyapps website.
Explore additional research papers on the following topics:
- For Large Language Models papers, please visit the LLM Repository.
- For Backdoor Learning papers, please visit the Backdoor Learning Repository.
- For Federated Learning papers, please visit the Federated Learning Repository.
- For Machine Unlearning papers, please visit the Machine Unlearning Repository.
For contributions, inquiries, or suggestions, feel free to reach out via email.
If you find this application helpful and would like to support its development, you can buy me a coffee using one of the following methods:
- Techcombank (Vietnam): 5877 5555 55 (Nguyen Thi Lan Phuong)
- PayPal or Credit/Debit Card: https://ko-fi.com/miutheladycat
Due to GitHub repository limitations, this section lists only the papers that provide accompanying code, sorted by publish date with the most recent first. For the full list of papers, please visit the Shinyapps website.
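As a rough illustration only (not the repository's actual tooling), the filtering and sorting described above could be reproduced with a short script like the one below; the `papers.csv` file name and its `title`, `publish_date`, `code`, and `url` columns are hypothetical.

```python
# Illustrative sketch: keep only papers that link to code, then sort them
# by publish date (most recent first), mirroring the table in this section.
# Assumes a hypothetical papers.csv with columns: title, publish_date, code, url.
import csv

with open("papers.csv", newline="", encoding="utf-8") as f:
    papers = list(csv.DictReader(f))

# Drop entries whose code field is empty or a placeholder.
with_code = [p for p in papers if p["code"].strip() not in ("", "-")]

# ISO dates (YYYY-MM-DD) sort correctly as plain strings.
with_code.sort(key=lambda p: p["publish_date"], reverse=True)

for i, p in enumerate(with_code, start=1):
    print(f"{i} | {p['title']} | {p['publish_date']} | {p['code']} | {p['url']}")
```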
No. | Title | Authors | Publish Date | Venue | Code | URL |
---|---|---|---|---|---|---|
1 | CER: Confidence Enhanced Reasoning in LLMs | Ali Razghandi, Seyed Mohammad Hadi Hosseini, Mahdieh Soleymani Baghshah | 2025-02-22 | arXiv …, 2025 | https://github.com/ | http://arxiv.org/abs/2502.14634v1 |
2 | Dynamic Low-Rank Sparse Adaptation for Large Language Models | Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Yang Liu, Jing Lin, Yiwu Yao, Rongrong Ji | 2025-02-22 | arXiv …, 2025 | https://github.com/wzhuang-xmu/LoSA | http://arxiv.org/abs/2502.14816v1 |
3 | A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation | Shilong Hou, Ruilin Shang, Zi Long, Xianghua Fu, Yin Chen | 2025-02-21 | arXiv | https://github.com/Mebymeby/Pseudonymization-Framework | http://arxiv.org/abs/2502.15233v1 |
4 | On the logical skills of large language models: evaluations using arbitrarily complex first-order logic problems | Shokhrukh Ibragimov, Arnulf Jentzen, Benno Kuckuck | 2025-02-21 | arXiv:2502.14180, 2025 | https://github.com/bkuckuck/logical-skills-of-llms | http://arxiv.org/abs/2502.14180v1 |
5 | Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization | Yupeng Chang, Yi Chang, Yuan Wu | 2025-02-21 | arXiv:2502.14211, 2025 | https://github.com/llm172/Transfer-Prompting | http://arxiv.org/abs/2502.14211v1 |
6 | Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models | Ya Wang, Zhijian Zhuo, Yutao Zeng, Xun Zhou, Jian Yang, Xiaoqing Li | 2025-02-21 | arXiv | https://github.com/kaihemo/SDD | http://arxiv.org/abs/2502.15499v1 |
7 | STeCa: Step-level Trajectory Calibration for LLM Agent Learning | Hanlin Wang, Jian Wang, Chak Tou Leong, Wenjie Li | 2025-02-21 | arXiv:2502.14276, 2025 | https://github.com/WangHanLinHenry/STeCa | http://arxiv.org/abs/2502.14276v1 |
8 | Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing | Qi Le, Enmao Diao, Ziyan Wang, Xinran Wang, Jie Ding, Li Yang, Ali Anwar | 2025-02-21 | arXiv | https://github.com/Qi-Le1/Probe_Pruning | http://arxiv.org/abs/2502.15618v1 |
9 | PredictaBoard: Benchmarking LLM Score Predictability | Lorenzo Pacchiardi, Konstantinos Voudouris, Ben Slater, Fernando Martínez-Plumed, José Hernández-Orallo, Lexin Zhou, Wout Schellaert | 2025-02-21 | arXiv …, 2025 | https://github.com/Kinds-of-Intelligence-CFI/PredictaBoard | http://arxiv.org/abs/2502.14445v1 |
10 | Plan-over-Graph: Towards Parallelable LLM Agent Schedule | Shiqi Zhang, Xinbei Ma, Zouying Cao, Zhuosheng Zhang, Hai Zhao | 2025-02-21 | arXiv:2502.14563, 2025 | https://github.com/zsq259/Plan-over-Graph | http://arxiv.org/abs/2502.14563v1 |
11 | Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Giulio Zizzo, Giandomenico Cornacchia, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Beat Buesser, Mark Purcell, Pin-Yu Chen, Prasanna Sattigeri, Kush Varshney | 2025-02-21 | arXiv | https://github.com/IBM/Adversarial-Prompt-Evaluation | http://arxiv.org/abs/2502.15427v1 |
12 | Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs | Danni Liu, Jan Niehues | 2025-02-21 | arXiv:2502.14830, 2025 | https://github.com/dannigt/mid-align | http://arxiv.org/abs/2502.14830v1 |
13 | LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention | Shang Yang, Junxian Guo, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang, Yujun Lin, Zhijian Liu, Yao Lu, Song Han | 2025-02-21 | arXiv …, 2025 | https://github.com/mit-han-lab/omniserve | http://arxiv.org/abs/2502.14866v1 |
14 | Investigating the Adaptive Robustness with Knowledge Conflicts in LLM-based Multi-Agent Systems | Tianjie Ju, Bowen Wang, Hao Fei, Mong-Li Lee, Wynne Hsu, Yun Li, Qianren Wang, Pengzhou Cheng, Zongru Wu, Zhuosheng Zhang, Gongshen Liu | 2025-02-21 | arXiv | https://github.com/wbw625/MultiAgentRobustness | http://arxiv.org/abs/2502.15153v1 |
15 | From RAG to Memory: Non-Parametric Continual Learning for Large Language Models | Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, Yu Su | 2025-02-21 | arXiv:2502.14802, 2025 | https://github.com/OSU-NLP-Group/HippoRAG | http://arxiv.org/abs/2502.14802v1 |
16 | FormalSpecCpp: A Dataset of C++ Formal Specifications created using LLMs | Madhurima Chakraborty, Peter Pirkelbauer, Qing Yi | 2025-02-21 | arXiv | https://github.com/MadhuNimmo/FormalSpecCpp | http://arxiv.org/abs/2502.15217v1 |
17 | CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models | Zhenhong Zhou, Zherui Li, Jie Zhang, Yuanhe Zhang, Kun Wang, Yang Liu, Qing Guo | 2025-02-21 | arXiv …, 2025 | https://github.com/zhrli324/Corba | http://arxiv.org/abs/2502.14529v1 |
18 | MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models | Shrey Pandit, Jiawei Xu, Junyuan Hong, Zhangyang Wang, Tianlong Chen, Kaidi Xu, Ying Ding | 2025-02-21 | arXiv …, 2025 | https://medhallu.github.io/ | http://arxiv.org/abs/2502.14302v1 |
19 | Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models | Yeonjun In, Wonjoong Kim, Kanghoon Yoon, Sungchul Kim, Mehrab Tanjim, Kibum Kim, Chanyoung Park | 2025-02-20 | arXiv | https://github.com/yeonjun-in/U-SafeBench | http://arxiv.org/abs/2502.15086v1 |
20 | DataSciBench: An LLM Agent Benchmark for Data Science | Dan Zhang, Sining Zhoubian, Min Cai, Fengzu Li, Lekang Yang, Wei Wang, Tianjiao Dong, Ziniu Hu, Jie Tang, Yisong Yue | 2025-02-19 | arXiv | https://github.com/THUDM/DataSciBench | http://arxiv.org/abs/2502.13897v1 |
21 | SIFT: Grounding LLM Reasoning in Contexts via Stickers | Zihao Zeng, Xuyao Huang, Boxiu Li, Zhijie Deng | 2025-02-19 | arXiv | https://github.com/zhijie-group/SIFT | http://arxiv.org/abs/2502.14922v1 |
22 | Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning | Zenan Li, Zhaoyu Li, Wen Tang, Xian Zhang, Yuan Yao, Xujie Si, Fan Yang, Kaiyu Yang, Xiaoxing Ma | 2025-02-19 | arXiv | https://github.com/Lizn-zn/NeqLIPS/ | http://arxiv.org/abs/2502.13834v1 |
23 | PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models | Guangwei Li, Yuansen Zhang, Yinggui Wang, Shoumeng Yan, Lei Wang, Tao Wei | 2025-02-19 | arXiv | https://github.com/ligw1998/PRIV-QA | http://arxiv.org/abs/2502.13564v1 |
24 | LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization | Guanzheng Chen, Xin Li, Michael Qizhe Shieh, Lidong Bing | 2025-02-19 | arXiv | https://github.com/DAMO-NLP-SG/LongPO | http://arxiv.org/abs/2502.13922v2 |
25 | Judging the Judges: A Collection of LLM-Generated Relevance Judgements | Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, Emine Yilmaz | 2025-02-19 | arXiv | https://llm4eval.github.io/LLMJudge-benchmark/ | http://arxiv.org/abs/2502.13908v1 |
26 | Lost in Sequence: Do Large Language Models Understand Sequential Recommendation? | Sein Kim, Hongseok Kang, Kibum Kim, Jiwan Kim, Donghyun Kim, Minchul Yang, Kwangjin Oh, Julian McAuley, Chanyoung Park | 2025-02-19 | arXiv | https://github.com/Sein-Kim/LLM-SRec | http://arxiv.org/abs/2502.13909v2 |
27 | Craw4LLM: Efficient Web Crawling for LLM Pretraining | Shi Yu, Zhiyuan Liu, Chenyan Xiong | 2025-02-19 | arXiv | https://github.com/cxcscmu/Crawl4LLM | http://arxiv.org/abs/2502.13347v1 |
28 | Collaborative Retrieval for Large Language Model-based Conversational Recommender Systems | Yaochen Zhu, Chao Wan, Harald Steck, Dawen Liang, Yesu Feng, Nathan Kallus, Jundong Li | 2025-02-19 | arXiv | https://github.com/yaochenzhu/CRAG | http://arxiv.org/abs/2502.14137v1 |
29 | | Vishal Dey, Xiao Hu, Xia Ning | 2025-02-19 | arXiv | https://github.com/ninglab/GeLLMO | http://arxiv.org/abs/2502.13398v1 |
30 | Benchmarking LLMs for Political Science: A United Nations Perspective | Yueqing Liang, Liangwei Yang, Chen Wang, Congying Xia, Rui Meng, Xiongxiao Xu, Haoran Wang, Ali Payani, Kai Shu | 2025-02-19 | arXiv | https://github.com/yueqingliang1/UNBench | http://arxiv.org/abs/2502.14122v1 |
31 | ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities | Chanjin Zheng, Zengyi Yu, Yilin Jiang, Mingzi Zhang, Xunuo Lu, Jing Jin, Liteng Gao | 2025-02-19 | arXiv | https://artmentor.github.io/ | http://arxiv.org/abs/2502.13832v1 |
32 | AI-Empowered Catalyst Discovery: A Survey from Classical Machine Learning Approaches to Large Language Models | Yuanyuan Xu, Hanchen Wang, Wenjie Zhang, Lexing Xie, Yin Chen, Flora Salim, Ying Zhang, Justin Gooding, Toby Walsh | 2025-02-19 | arXiv | https://github.com/LuckyGirl-XU/Awesome-Artificial-Intelligence-Empowered-Catalyst-Discovery | http://arxiv.org/abs/2502.13626v1 |
33 | Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs | Adi Simhi, Itay Itzhak, Fazl Barez, Gabriel Stanovsky, Yonatan Belinkov | 2025-02-18 | arXiv | https://github.com/technion-cs-nlp/Trust_me_Im_wrong | http://arxiv.org/abs/2502.12964v1 |
34 | Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | Mengkang Hu, Tianxing Chen, Yude Zou, Yuheng Lei, Qiguang Chen, Ming Li, Hongyuan Zhang, Wenqi Shao, Ping Luo | 2025-02-18 | arXiv | https://text-to-world.github.io/ | http://arxiv.org/abs/2502.13092v1 |
35 | SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs | Ahmed F. AbouElhamayed, Jordan Dotzel, Yash Akhauri, Chi-Chih Chang, Sameh Gobriel, J. Pablo Muñoz, Vui Seng Chua, Nilesh Jain, Mohamed S. Abdelfattah | 2025-02-18 | arXiv | https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/SparAMX | http://arxiv.org/abs/2502.12444v1 |
36 | Soundwave: Less is More for Speech-Text Alignment in LLMs | Yuhao Zhang, Zhiheng Liu, Fan Bu, Ruiyu Zhang, Benyou Wang, Haizhou Li | 2025-02-18 | arXiv | https://github.com/FreedomIntelligence/Soundwave | http://arxiv.org/abs/2502.12900v1 |
37 | SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings | Weikai Lu, Hao Peng, Huiping Zhuang, Cen Chen, Ziqian Zeng | 2025-02-18 | arXiv | https://github.com/ZeroNLP/SEA | http://arxiv.org/abs/2502.12562v1 |
38 | PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models | Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, Yaowei Wang, Min Zhang | 2025-02-18 | arXiv | https://github.com/zjq0455/PTQ1.61 | http://arxiv.org/abs/2502.13179v1 |
39 | MoBA: Mixture of Block Attention for Long-Context LLMs | Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, Neo Y. Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, Jiezhong Qiu | 2025-02-18 | arXiv | https://github.com/MoonshotAI/MoBA | http://arxiv.org/abs/2502.13189v1 |
40 | Investigating and Extending Homans' Social Exchange Theory with Large Language Model based Agents | Lei Wang, Zheqing Zhang, Xu Chen | 2025-02-18 | arXiv | https://github.com/Paitesanshi/SET | http://arxiv.org/abs/2502.12450v1 |
41 | G-Refer: Graph Retrieval-Augmented Large Language Model for Explainable Recommendation | Yuhan Li, Xinni Zhang, Linhao Luo, Heng Chang, Yuxiang Ren, Irwin King, Jia Li | 2025-02-18 | arXiv | https://github.com/Yuhan1i/G-Refer | http://arxiv.org/abs/2502.12586v1 |
42 | Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? | Leyi Pan, Aiwei Liu, Shiyu Huang, Yijian Lu, Xuming Hu, Lijie Wen, Irwin King, Philip S. Yu | 2025-02-17 | arXiv | https://github.com/THU-BPM/Watermark-Radioactivity-Attack | http://arxiv.org/abs/2502.11598v1 |
43 | VRoPE: Rotary Position Embedding for Video Large Language Models | Zikang Liu, Longteng Guo, Yepeng Tang, Junxian Cai, Kai Ma, Xi Chen, Jing Liu | 2025-02-17 | arXiv | https://github.com/johncaged/VRoPE | http://arxiv.org/abs/2502.11664v1 |
44 | Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning | Yuqi Pang, Bowen Yang, Haoqin Tu, Yun Cao, Zeyu Zhang | 2025-02-17 | arXiv | https://github.com/Pbhgit/MVCD | http://arxiv.org/abs/2502.11751v1 |
45 | Idiosyncrasies in Large Language Models | Mingjie Sun, Yida Yin, Zhiqiu Xu, J. Zico Kolter, Zhuang Liu | 2025-02-17 | arXiv | https://eric-mingjie.github.io/llm-idiosyncrasies/index.html | http://arxiv.org/abs/2502.12150v1 |
46 | Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Hanbin Wang, Xiaoxuan Zhou, Zhipeng Xu, Keyuan Cheng, Yuxin Zuo, Kai Tian, Jingwei Song, Junting Lu, Wenhui Hu, Xueyang Liu | 2025-02-17 | arXiv | https://github.com/wanghanbinpanda/CodeVision | http://arxiv.org/abs/2502.11829v1 |
47 | A-MEM: Agentic Memory for LLM Agents | Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, Yongfeng Zhang | 2025-02-17 | arXiv | https://github.com/WujiangXu/AgenticMemory | http://arxiv.org/abs/2502.12110v1 |
48 | Bitnet.cpp: Efficient Edge Inference for Ternary LLMs | Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei | 2025-02-17 | arXiv | https://github.com/microsoft/BitNet/tree/paper | http://arxiv.org/abs/2502.11880v1 |
49 | RIDE: Enhancing Large Language Model Alignment through Restyled In-Context Learning Demonstration Exemplars | Yuncheng Hua, Lizhen Qu, Zhuang Li, Hao Xue, Flora D. Salim, Gholamreza Haffari | 2025-02-17 | arXiv | https://github.com/AnonymousCode-ComputerScience/RIDE | http://arxiv.org/abs/2502.11681v1 |
50 | A Survey of Personalized Large Language Models: Progress and Future Directions | Jiahong Liu, Zexuan Qiu, Zhongyang Li, Quanyu Dai, Jieming Zhu, Minda Hu, Menglin Yang, Irwin King | 2025-02-17 | arXiv | https://github.com/JiahongLiu21/Awesome-Personalized-Large-Language-Models | http://arxiv.org/abs/2502.11528v1 |
51 | "Nuclear Deployed!": Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents | Rongwu Xu, Xiaojian Li, Shuo Chen, Wei Xu | 2025-02-17 | arXiv | https://github.com/pillowsofwind/LLM-CBRN-Risks | http://arxiv.org/abs/2502.11355v1 |
52 | Atom of Thoughts for Markov LLM Test-Time Scaling | Fengwei Teng, Zhaoyang Yu, Quan Shi, Jiayi Zhang, Chenglin Wu, Yuyu Luo | 2025-02-17 | arXiv | https://github.com/qixucen/atom | http://arxiv.org/abs/2502.12018v1 |
53 | How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training | Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen | 2025-02-16 | arXiv | https://github.com/zjunlp/DynamicKnowledgeCircuits | http://arxiv.org/abs/2502.11196v1 |
54 | SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors | Bohan Lyu, Siqiao Huang, Zichen Liang | 2025-02-16 | arXiv | https://github.com/Imbernoulli/SURGE | http://arxiv.org/abs/2502.11167v1 |
55 | ReLearn: Unlearning via Learning for Large Language Models | Haoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, Huajun Chen, Ningyu Zhang | 2025-02-16 | arXiv | https://github.com/zjunlp/unlearn | http://arxiv.org/abs/2502.11190v1 |
56 | Ramp Up NTT in Record Time using GPU-Accelerated Algorithms and LLM-based Code Generation | Yu Cui, Hang Fu, Licheng Wang, Haibin Zhang | 2025-02-16 | arXiv | https://github.com/LMPC-Lab/GenGPUCrypto | http://arxiv.org/abs/2502.11110v1 |
57 | MasRouter: Learning to Route LLMs for Multi-Agent Systems | Yanwei Yue, Guibin Zhang, Boyang Liu, Guancheng Wan, Kun Wang, Dawei Cheng, Yiyan Qi | 2025-02-16 | arXiv | https://github.com/yanweiyue/masrouter | http://arxiv.org/abs/2502.11133v1 |
58 | G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems | Shilong Wang, Guibin Zhang, Miao Yu, Guancheng Wan, Fanci Meng, Chongye Guo, Kun Wang, Yang Wang | 2025-02-16 | arXiv | https://github.com/wslong20/G-safeguard | http://arxiv.org/abs/2502.11127v1 |
59 | Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models | Haoyang Li, Xuejia Chen, Zhanchao XU, Darian Li, Nicole Hu, Fei Teng, Yiming Li, Luyu Qiu, Chen Jason Zhang, Qing Li, Lei Chen | 2025-02-16 | arXiv | https://github.com/TreeAI-Lab/NumericBench | http://arxiv.org/abs/2502.11075v1 |
60 | CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships? | Aashish Anantha Ramakrishnan, Aadarsh Anantha Ramakrishnan, Dongwon Lee | 2025-02-16 | arXiv | https://github.com/aashish2000/CORDIAL | http://arxiv.org/abs/2502.11300v1 |
61 | Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models | Zonghao Ying, Deyue Zhang, Zonglei Jing, Yisong Xiao, Quanchen Zou, Aishan Liu, Siyuan Liang, Xiangzheng Zhang, Xianglong Liu, Dacheng Tao | 2025-02-16 | arXiv | https://github.com/NY1024/RACE | http://arxiv.org/abs/2502.11054v1 |
62 | BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack | Zihao Zhu, Hongbao Zhang, Mingda Zhang, Ruotong Wang, Guanzong Wu, Ke Xu, Baoyuan Wu | 2025-02-16 | arXiv | https://github.com/zihao-ai/BoT | http://arxiv.org/abs/2502.12202v1 |
63 | Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey | Zirui Song, Bin Yan, Yuhan Liu, Miao Fang, Mingzhe Li, Rui Yan, Xiuying Chen | 2025-02-15 | arXiv | https://github.com/abilliyb/Knowledge_Injection_Survey_Papers | http://arxiv.org/abs/2502.10708v1 |
64 | SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models | Daniel Fleischer, Moshe Berchansky, Gad Markovits, Moshe Wasserblat | 2025-02-15 | arXiv …, 2025 | https://github.com/IntelLabs/RAG-FiT/tree/square | http://arxiv.org/abs/2502.09390v1 |
65 | LintLLM: An Open-Source Verilog Linting Framework Based on Large Language Models | Zhigang Fang, Renzhi Chen, Zhijie Yang, Yang Guo, Huadong Dai, Lei Wang | 2025-02-15 | arXiv | https://github.com/fangzhigang32/Static-Verilog-Analysis | http://arxiv.org/abs/2502.10815v1 |
66 | An Empirical Analysis of Uncertainty in Large Language Model Evaluations | Qiujie Xie, Qingqiu Li, Zhuohao Yu, Yuejie Zhang, Yue Zhang, Linyi Yang | 2025-02-15 | arXiv | https://github.com/hasakiXie123/LLM-Evaluator-Uncertainty | http://arxiv.org/abs/2502.10709v1 |
67 | EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents | Rui Yang, Hanyang Chen, Junyu Zhang, Mark Zhao, Cheng Qian, Kangrui Wang, Qineng Wang, Teja Venkat Koripella, Marziyeh Movahedi, Manling Li, Heng Ji, Huan Zhang, Tong Zhang | 2025-02-15 | arXiv …, 2025 | https://embodiedbench.github.io | http://arxiv.org/abs/2502.09560v1 |
68 | Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs | Siyan Zhao, Mingyi Hong, Yang Liu, Devamanyu Hazarika, Kaixiang Lin | 2025-02-15 | arXiv …, 2025 | https://prefeval.github.io/ | http://arxiv.org/abs/2502.09597v1 |
69 | KKA: Improving Vision Anomaly Detection through Anomaly-related Knowledge from Large Language Models | Dong Chen, Zhengqing Hu, Peiguang Fan, Yueting Zhuang, Yafei Li, Qidong Liu, Xiaoheng Jiang, Mingliang Xu | 2025-02-14 | arXiv | https://github.com/Anfeather/KKA | http://arxiv.org/abs/2502.14880v1 |
70 | Large Language Diffusion Models | Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, Chongxuan Li | 2025-02-14 | arXiv | https://ml-gsai.github.io/LLaDA-demo/ | http://arxiv.org/abs/2502.09992v1 |
71 | V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models | Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Stephen F. Smith, Yu-Chiang Frank Wang, Min-Hung Chen | 2025-02-14 | arXiv | https://eddyhkchiu.github.io/v2vllm.github.io/ | http://arxiv.org/abs/2502.09980v1 |
72 | LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing | Kuan Li, Liwen Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Shuai Wang, Minhao Cheng | 2025-02-14 | arXiv | https://github.com/likuanppd/LaRA | http://arxiv.org/abs/2502.09977v1 |
73 | MM-RLHF: The Next Step Forward in Multimodal LLM Alignment | Yi-Fan Zhang, Tao Yu, Haochen Tian, Chaoyou Fu, Peiyan Li, Jianshu Zeng, Wulin Xie, Yang Shi, Huanyu Zhang, Junkang Wu, Xue Wang, Yibo Hu, Bin Wen, Fan Yang, Zhang Zhang, Tingting Gao, Di Zhang, Liang Wang, Rong Jin, Tieniu Tan | 2025-02-14 | arXiv | https://mm-rlhf.github.io/ | http://arxiv.org/abs/2502.10391v1 |
74 | Bag of Tricks for Inference-time Computation of LLM Reasoning | Fan Liu, Wenshuo Chao, Naiqiang Tan, Hao Liu | 2025-02-13 | arXiv:2502.07191, 2025 | https://github.com/usail-hkust/benchmark_inference_time_computation_LL | http://arxiv.org/abs/2502.07191v2 |
75 | FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents | Mostapha Benhenda | 2025-02-13 | arXiv:2502.07393, 2025 | https://github.com/benstaf/FinRL_DeepSeek | http://arxiv.org/abs/2502.07393v1 |
76 | Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning | Jiayuan Zhu, Junde Wu | 2025-02-13 | arXiv:2502.07143, 2025 | https://github.com/SuperMedIntel/AskPatients | http://arxiv.org/abs/2502.07143v1 |
77 | LLM-Generated Microservice Implementations from RESTful API Definitions | Saurabh Chauhan, Zeeshan Rasheed, Abdul Malik Sami, Zheying Zhang, Jussi Rasku, Kai-Kristian Kemell, Pekka Abrahamsson | 2025-02-13 | arXiv | https://github.com/sirbh/code-gen | http://arxiv.org/abs/2502.09766v1 |
78 | DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization | Xuefeng Liu, Songhao Jiang, Siyu Chen, Zhuoran Yang, Yuxin Chen, Ian Foster, Rick Stevens | 2025-02-13 | arXiv …, 2025 | https://github.com/xuefeng-cs/DrugImproverGPT | http://arxiv.org/abs/2502.07237v1 |
79 | The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis | Wenbo Pan, Zhichao Liu, Qiguang Chen, Xiangyang Zhou, Haining Yu, Xiaohua Jia | 2025-02-13 | arXiv | https://github.com/BMPixel/safety-residual-space | http://arxiv.org/abs/2502.09674v1 |
80 | LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation | Zican Dong, Junyi Li, Jinhao Jiang, Mingyu Xu, Wayne Xin Zhao, Bingning Wang, Weipeng Chen | 2025-02-13 | arXiv …, 2025 | https://github.com/RUCAIBox/LongReD | http://arxiv.org/abs/2502.07365v1 |
81 | DarwinLM: Evolutionary Structured Pruning of Large Language Models | Shengkun Tang, Oliver Sieberling, Eldar Kurtic, Zhiqiang Shen, Dan Alistarh | 2025-02-13 | arXiv …, 2025 | https://github.com/IST-DASLab/DarwinLM | http://arxiv.org/abs/2502.07780v1 |
82 | LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! | Dacheng Li, Shiyi Cao, Tyler Griggs, Shu Liu, Xiangxi Mo, Eric Tang, Sumanth Hegde, Kourosh Hakhamaneshi, Shishir G. Patil, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica | 2025-02-13 | arXiv …, 2025 | https://github.com/NovaSky-AI/SkyThought | http://arxiv.org/abs/2502.07374v2 |
83 | Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models | Qingsong Zou, Jingyu Xiao, Qing Li, Zhi Yan, Yuhang Wang, Li Xu, Wenxuan Wang, Kuofeng Gao, Ruoyu Li, Yong Jiang | 2025-02-13 | arXiv | https://github.com/horizonsinzqs/QueryAttack | http://arxiv.org/abs/2502.09723v1 |
84 | LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM | Zhi Zhou, Kun-Yang Yu, Shi-Yu Tian, Xiao-Wen Yang, Jiang-Xin Shi, Pengxiao Song, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li | 2025-02-12 | arXiv …, 2025 | https://github.com/LAMDASZ-ML/Knowledge-Guide-Data-Generation | http://arxiv.org/abs/2502.06572v2 |
85 | Systematic Outliers in Large Language Models | Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang | 2025-02-12 | arXiv:2502.06415, 2025 | https://github.com/an-yongqi/systematic-outliers | http://arxiv.org/abs/2502.06415v1 |
86 | RALLRec: Improving Retrieval Augmented Large Language Model Recommendation with Representation Learning | Jian Xu, Sichun Luo, Xiangyu Chen, Haoming Huang, Hanxu Hou, Linqi Song | 2025-02-12 | arXiv …, 2025 | https://github.com/JianXu95/RALLRec | http://arxiv.org/abs/2502.06101v2 |
87 | Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models | Jiacong Xu, Shao-Yuan Lo, Bardia Safaei, Vishal M. Patel, Isht Dwivedi | 2025-02-12 | arXiv …, 2025 | https://xujiacong.github.io/Anomaly-OV/ | http://arxiv.org/abs/2502.07601v1 |
88 | Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation | Chengwen Qi, Ren Ma, Bowen Li, He Du, Binyuan Hui, Jinwang Wu, Yuanjun Laili, Conghui He | 2025-02-12 | arXiv …, 2025 | https://github.com/opendatalab/ProverGen | http://arxiv.org/abs/2502.06563v1 |
89 | Data Augmentation to Improve Large Language Models in Food Hazard and Product Detection | Areeg Fahad Rasheed, M. Zarkoosh, Shimam Amer Chasib, Safa F. Abbas | 2025-02-12 | arXiv | https://github.com/AREEG94FAHAD/food-hazard-prdouct-cls | http://arxiv.org/abs/2502.08687v1 |
90 | Calibrating LLMs with Information-Theoretic Evidential Deep Learning | Yawei Li, David Rügamer, Bernd Bischl, Mina Rezaei | 2025-02-12 | arXiv:2502.06351, 2025 | https://github.com/sandylaker/ib-edl | http://arxiv.org/abs/2502.06351v2 |
91 | The Foundational Capabilities of Large Language Models in Predicting Postoperative Risks Using Clinical Notes | Charles Alba, Bing Xue, Joanna Abraham, Thomas Kannampallil, Chenyang Lu | 2025-02-11 | npj Digital Medicine | https://github.com/cja5553/LLMs_in_perioperative_care | http://arxiv.org/abs/2402.17493v5 |
92 | Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining | Daouda Sow, Herbert Woisetschläger, Saikiran Bulusu, Shiqiang Wang, Hans-Arno Jacobsen, Yingbin Liang | 2025-02-10 | arXiv | https://github.com/sowmaster/Sample-Level-Loss-Reweighting-ICLR-2025 | http://arxiv.org/abs/2502.06733v1 |
93 | LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights | Ze Sheng, Zhicheng Chen, Shuning Gu, Heqing Huang, Guofei Gu, Jeff Huang | 2025-02-10 | arXiv | https://github.com/OwenSanzas/LLM-For-Vulnerability-Detection | http://arxiv.org/abs/2502.07049v2 |
94 | AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents | Jiabin Tang, Tianyu Fan, Chao Huang | 2025-02-09 | arXiv | https://github.com/HKUDS/AutoAgent | http://arxiv.org/abs/2502.05957v2 |
95 | MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents | Jiabin Tang, Tianyu Fan, Chao Huang | 2025-02-09 | arXiv | https://github.com/HKUDS/MetaChain | http://arxiv.org/abs/2502.05957v1 |
96 | LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning | Hanqing Yang, Jingdi Chen, Marie Siew, Tania Lorido-Botran, Carlee Joe-Wong | 2025-02-08 | arXiv | https://happyeureka.github.io/damcs | http://arxiv.org/abs/2502.05453v1 |
97 | Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models | Sina Tayebati, Divake Kumar, Nastaran Darabi, Dinithi Jayasuriya, Ranganath Krishnan, Amit Ranjan Trivedi | 2025-02-08 | arXiv | https://github.com/sinatayebati/vlm-uncertainty | http://arxiv.org/abs/2502.06884v1 |
98 | OntoTune: Ontology-Driven Self-training for Aligning Large Language Models | Zhiqiang Liu, Chengtao Gan, Junjie Wang, Yichi Zhang, Zhongpu Bo, Mengshu Sun, Huajun Chen, Wen Zhang | 2025-02-08 | arXiv | https://github.com/zjukg/OntoTune | http://arxiv.org/abs/2502.05478v1 |
99 | DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails | Yihe Deng, Yu Yang, Junkai Zhang, Wei Wang, Bo Li | 2025-02-07 | arXiv | https://github.com/yihedeng9/DuoGuard | http://arxiv.org/abs/2502.05163v1 |
100 | QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | Andrei Panferov, Jiale Chen, Soroush Tabesh, Roberto L. Castro, Mahdi Nikdan, Dan Alistarh | 2025-02-07 | arXiv | https://github.com/IST-DASLab/QuEST | http://arxiv.org/abs/2502.05003v1 |
101 | LLM-Supported Natural Language to Bash Translation | Finnian Westenfelder, Erik Hemberg, Miguel Tulla, Stephen Moskal, Una-May O'Reilly, Silviu Chiricescu | 2025-02-07 | arXiv | https://github.com/westenfelder/NL2SH | http://arxiv.org/abs/2502.06858v1 |
102 | Confidence Elicitation: A New Attack Vector for Large Language Models | Brian Formento, Chuan Sheng Foo, See-Kiong Ng | 2025-02-07 | arXiv | https://github.com/Aniloid2/Confidence_Elicitation_Attacks | http://arxiv.org/abs/2502.04643v2 |
103 | Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | Junde Wu, Jiayuan Zhu, Yuyuan Liu | 2025-02-07 | arXiv | https://github.com/theworldofagents/Agentic-Reasoning | http://arxiv.org/abs/2502.04644v1 |
104 | ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning | Yuwei Yin, Giuseppe Carenini | 2025-02-07 | arXiv | https://github.com/YuweiYin/ARR | http://arxiv.org/abs/2502.04689v2 |
105 | ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | Yinjie Wang, Ling Yang, Guohao Li, Mengdi Wang, Bryon Aragam | 2025-02-06 | arXiv | https://github.com/Gen-Verse/ScoreFlow | http://arxiv.org/abs/2502.04306v1 |
106 | "Short-length" Adversarial Training Helps LLMs Defend "Long-length" Jailbreak Attacks: Theoretical and Empirical Evidence | Shaopeng Fu, Liang Ding, Di Wang | 2025-02-06 | arXiv | https://github.com/fshp971/adv-icl | http://arxiv.org/abs/2502.04204v1 |
107 | Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers | Daniel Beaglehole, Adityanarayanan Radhakrishnan, Enric Boix-Adserà, Mikhail Belkin | 2025-02-06 | arXiv | https://github.com/dmbeaglehole/neural_controllers | http://arxiv.org/abs/2502.03708v1 |
108 | Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization | Yuanye Liu, Jiahang Xu, Li Lyna Zhang, Qi Chen, Xuan Feng, Yang Chen, Zhongxin Guo, Yuqing Yang, Peng Cheng | 2025-02-06 | arXiv | https://github.com/HenryLau7/CFPO | http://arxiv.org/abs/2502.04295v2 |
109 | CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Zehua Pei, Lancheng Zou, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu | 2025-02-06 | arXiv | https://github.com/JarvisPei/CMoE | http://arxiv.org/abs/2502.04416v1 |
110 | EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models | He Hu, Yucheng Zhou, Lianzhong You, Hongbo Xu, Qianning Wang, Zheng Lian, Fei Richard Yu, Fei Ma, Laizhong Cui | 2025-02-06 | arXiv | https://emo-gml.github.io/ | http://arxiv.org/abs/2502.04424v1 |
111 | My LLM might Mimic AAE -- But When Should it? | Sandra C. Sandoval, Christabel Acquaye, Kwesi Cobbina, Mohammad Nayeem Teli, Hal Daumé III | 2025-02-06 | arXiv | https://github.com/smelliecat/AAEMime | http://arxiv.org/abs/2502.04564v2 |
112 | Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training | Changhao Jiang, Ming Zhang, Junjie Ye, Xiaoran Fan, Yifei Cao, Jiajun Sun, Zhiheng Xi, Shihan Dou, Yi Dong, Yujiong Shen, Jingqi Tong, Zhen Wang, Tao Liang, Zhihui Fei, Mingyang Wan, Guojun Ma, Qi Zhang, Tao Gui, Xuanjing Huang | 2025-02-06 | arXiv | https://github.com/yuhui1038/SMI | http://arxiv.org/abs/2502.04066v1 |
113 | Robotouille: An Asynchronous Planning Benchmark for LLM Agents | Gonzalo Gonzalez-Pumariega, Leong Su Yean, Neha Sunkara, Sanjiban Choudhury | 2025-02-06 | arXiv | https://github.com/portal-cornell/robotouille | http://arxiv.org/abs/2502.05227v1 |
114 | Preference Leakage: A Contamination Problem in LLM-as-a-judge | Dawei Li, Renliang Sun, Yue Huang, Ming Zhong, Bohan Jiang, Jiawei Han, Xiangliang Zhang, Wei Wang, Huan Liu | 2025-02-05 | arXiv …, 2025 | https://github.com/David-Li0406/Preference-Leakage | http://arxiv.org/abs/2502.01534v1 |
115 | PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs | Mauricio Soroco, Jialin Song, Mengzhou Xia, Kye Emond, Weiran Sun, Wuyang Chen | 2025-02-05 | arXiv …, 2025 | https://pde-controller.github.io/ | http://arxiv.org/abs/2502.00963v1 |
116 | PICBench: Benchmarking LLMs for Photonic Integrated Circuits Design | Yuchao Wu, Xiaofei Yu, Hao Chen, Yang Luo, Yeyu Tong, Yuzhe Ma | 2025-02-05 | arXiv | https://github.com/PICDA/PICBench | http://arxiv.org/abs/2502.03159v1 |
117 | Picky LLMs and Unreliable RMs: An Empirical Study on Safety Alignment after Instruction Tuning | Guanlin Li, Kangjie Chen, Shangwei Guo, Jie Zhang, Han Qiu, Chao Zhang, Guoyin Wang, Tianwei Zhang, Jiwei Li | 2025-02-05 | arXiv …, 2025 | https://github.com/GuanlinLee/llm_instruction_tuning | http://arxiv.org/abs/2502.01116v1 |
118 | Knowledge Distillation from Large Language Models for Household Energy Modeling | Mohannad Takrouri, Nicolás M. Cuadrado, Martin Takáč | 2025-02-05 | arXiv | https://github.com/Singularity-AI-Lab/LLM-Energy-Knowledge-Distillation | http://arxiv.org/abs/2502.03034v1 |
119 | Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models | Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Khan, Salman Khan | 2025-02-05 | arXiv …, 2025 | https://github.com/HashmatShadab/Robust-LLaVA | http://arxiv.org/abs/2502.01576v1 |
120 | SPRI: Aligning Large Language Models with Context-Situated Principles | Hongli Zhan, Muneeza Azmat, Raya Horesh, Junyi Jessy Li, Mikhail Yurochkin | 2025-02-05 | arXiv | https://github.com/honglizhan/SPRI-public | http://arxiv.org/abs/2502.03397v1 |
121 | Tool Unlearning for Tool-Augmented LLMs | Jiali Cheng, Hadi Amiri | 2025-02-05 | arXiv:2502.01083, 2025 | https://clu-uml.github.io/MU-Bench-Project-Page/ | http://arxiv.org/abs/2502.01083v1 |
122 | LLM-TA: An LLM-Enhanced Thematic Analysis Pipeline for Transcripts from Parents of Children with Congenital Heart Disease | Muhammad Zain Raza, Jiawei Xu, Terence Lim, Lily Boddy, Carlos M. Mery, Andrew Well, Ying Ding | 2025-02-05 | arXiv …, 2025 | https://github.com/jiaweixu98/LLM-TA | http://arxiv.org/abs/2502.01620v1 |
123 | Overcoming Vision Language Model Challenges in Diagram Understanding: A Proof-of-Concept with XML-Driven Large Language Models Solutions | Shue Shiinoki, Ryo Koshihara, Hayato Motegi, Masumi Morishige | 2025-02-05 | arXiv | https://github.com/galirage/spreadsheet-intelligence | http://arxiv.org/abs/2502.04389v1 |
124 | Intent Representation Learning with Large Language Model for Recommendation | Yu Wang, Lei Sang, Yi Zhang, Yiwen Zhang | 2025-02-05 | arXiv | https://github.com/wangyu0627/IRLLRec | http://arxiv.org/abs/2502.03307v1 |
125 | Demystifying Long Chain-of-Thought Reasoning in LLMs | Edward Yeo, Yuxuan Tong, Morry Niu, Graham Neubig, Xiang Yue | 2025-02-05 | arXiv | https://github.com/eddycmu/demystify-long-cot | http://arxiv.org/abs/2502.03373v1 |
126 | Breaking Focus: Contextual Distraction Curse in Large Language Models | Yue Huang, Yanbo Wang, Zixiang Xu, Chujie Gao, Siyuan Wu, Jiayi Ye, Xiuying Chen, Pin-Yu Chen, Xiangliang Zhang | 2025-02-05 | arXiv …, 2025 | https://github.com/wyf23187/LLM_CDV | http://arxiv.org/abs/2502.01609v1 |
127 | AtmosSci-Bench: Evaluating the Recent Advance of Large Language Model for Atmospheric Science | Chenyue Li, Wen Deng, Mengqian Lu, Binhang Yuan | 2025-02-05 | arXiv:2502.01159, 2025 | https://github.com/Relaxed-System-Lab/AtmosSci-Bench | http://arxiv.org/abs/2502.01159v1 |
128 | AdaSVD: Adaptive Singular Value Decomposition for Large Language Models | Zhiteng Li, Mingyuan Xia, Jingyuan Zhang, Zheng Hui, Linghe Kong, Yulun Zhang, Xiaokang Yang | 2025-02-05 | arXiv …, 2025 | https://github.com/ZHITENGLI/AdaSVD | http://arxiv.org/abs/2502.01403v2 |
129 | A Benchmark for the Detection of Metalinguistic Disagreements between LLMs and Knowledge Graphs | Bradley P. Allen, Paul T. Groth | 2025-02-05 | arXiv | https://github.com/bradleypallen/trex-metalinguistic-disagreement | http://arxiv.org/abs/2502.02896v1 |
130 | CTR-Driven Advertising Image Generation with Multimodal Large Language Models | Xingye Chen, Wei Feng, Zhenbang Du, Weizhen Wang, Yanyin Chen, Haohan Wang, Linkai Liu, Yaoyu Li, Jinyuan Zhao, Yu Li, Zheng Zhang, Jingjing Lv, Junjie Shen, Zhangang Lin, Jingping Shao, Yuanjie Shao, Xinge You, Changxin Gao, Nong Sang | 2025-02-05 | THE WEB … | https://github.com/Chenguoz/CAIG | http://arxiv.org/abs/2502.06823v1 |
131 | A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, Akash Srivastava | 2025-02-05 | arXiv …, 2025 | https://probabilistic-inference-scaling.github.io | http://arxiv.org/abs/2502.01618v2 |
132 | Do Large Language Model Benchmarks Test Reliability? | Joshua Vendrow, Edward Vendrow, Sara Beery, Aleksander Madry | 2025-02-05 | arXiv | https://github.com/MadryLab/platinum-benchmarks | http://arxiv.org/abs/2502.03461v1 |
133 | A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI) | Yan Li, Tianyi Zhang, Zechuan Li, Soyeon Caren Han | 2025-02-04 | arXiv | https://github.com/AcademyCityL/GALI | http://arxiv.org/abs/2502.02659v1 |
134 | SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency | Qianhao Yuan, Yanjiang Liu, Yaojie Lu, Hongyu Lin, Ben He, Xianpei Han, Le Sun | 2025-02-04 | arXiv | https://github.com/icip-cas/SAISA | http://arxiv.org/abs/2502.02458v1 |
135 | AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs | Hongxin Li, Jingfan Chen, Jingran Su, Yuntao Chen, Qing Li, Zhaoxiang Zhang | 2025-02-04 | arXiv | https://autogui-project.github.io/ | http://arxiv.org/abs/2502.01977v1 |
136 | Risk-Aware Driving Scenario Analysis with Large Language Models | Yuan Gao, Mattia Piccinini, Johannes Betz | 2025-02-04 | arXiv | https://github.com/yuangao-tum/Riskaware-Scenario-analyse | http://arxiv.org/abs/2502.02145v1 |
137 | Multi-Lingual Cyber Threat Detection in Tweets/X Using ML, DL, and LLM: A Comparative Analysis | Saydul Akbar Murad, Ashim Dahal, Nick Rahimi | 2025-02-04 | arXiv | https://github.com/Mmurrad/Tweet-Data-Classification | http://arxiv.org/abs/2502.04346v1 |
138 | CognArtive: Large Language Models for Automating Art Analysis and Decoding Aesthetic Elements | Afshin Khadangi, Amir Sartipi, Igor Tchappi, Gilbert Fridgen | 2025-02-04 | arXiv | https://cognartive.github.io/ | http://arxiv.org/abs/2502.04353v1 |
139 | CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing | Wenhao Zheng, Yixiao Chen, Weitong Zhang, Souvik Kundu, Yun Li, Zhengzhong Liu, Eric P. Xing, Hongyi Wang, Huaxiu Yao | 2025-02-04 | arXiv | https://github.com/aiming-lab/CITER | http://arxiv.org/abs/2502.01976v1 |
140 | Progressive Binarization with Semi-Structured Pruning for LLMs | Xianglong Yan, Tianao Zhang, Zhiteng Li, Yulun Zhang | 2025-02-03 | arXiv | https://github.com/XIANGLONGYAN/PBS2P | http://arxiv.org/abs/2502.01705v1 |
141 | A Comprehensive Analysis on LLM-based Node Classification Algorithms | Xixi Wu, Yifei Shen, Fangzhou Ge, Caihua Shan, Yizhu Jiao, Xiangguo Sun, Hong Cheng | 2025-02-03 | arXiv …, 2025 | https://llmnodebed.github.io/ | http://arxiv.org/abs/2502.00829v1 |
142 | MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies | Ehsaneddin Asgari, Yassine El Kheir, Mohammad Ali Sadraei Javaheri | 2025-02-03 | arXiv:2502.00894, 2025 | https://github.com/llm-lab-org/MorphBPE | http://arxiv.org/abs/2502.00894v1 |
143 | RTBAgent: A LLM-based Agent System for Real-Time Bidding | Leng Cai, Junxuan He, Yikai Li, Junjie Liang, Yuanping Lin, Ziming Quan, Yawen Zeng, Jin Xu | 2025-02-03 | arXiv …, 2025 | https://github.com/CaiLeng/RTBAgent | http://arxiv.org/abs/2502.00792v1 |
144 | RankFlow: A Multi-Role Collaborative Reranking Workflow Utilizing Large Language Models | Can Jin, Hongwu Peng, Anxiang Zhang, Nuo Chen, Jiahui Zhao, Xi Xie, Kuangzheng Li, Shuya Feng, Kai Zhong, Caiwen Ding, Dimitris N. Metaxas | 2025-02-03 | arXiv …, 2025 | https://github.com/jincan333/RankFlow | http://arxiv.org/abs/2502.00709v2 |
145 | MetaOpenFOAM 2.0: Large Language Model Driven Chain of Thought for Automating CFD Simulation and Post-Processing | Yuxuan Chen, Xu Zhu, Hua Zhou, Zhuyin Ren | 2025-02-02 | arXiv:2502.00498, 2025 | https://github.com/Terry-cyx/MetaOpenFOAM | http://arxiv.org/abs/2502.00498v1 |
146 | UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models | Xin Xu, Qiyun Xu, Tong Xiao, Tianhao Chen, Yuchen Yan, Jiaxin Zhang, Shizhe Diao, Can Yang, Yang Wang | 2025-02-02 | arXiv …, 2025 | https://github.com/YangLabHKUST/UGPhysics | http://arxiv.org/abs/2502.00334v1 |
147 | UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs | Yizhe Xiong, Wei Huang, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Zhenpeng Su, Jungong Han, Guiguang Ding | 2025-02-02 | arXiv …, 2025 | https://github.com/Bostoncake/UniAttn | http://arxiv.org/abs/2502.00439v1 |
148 | LIBRA: Measuring Bias of Large Language Model from a Local Context | Bo Pang, Tingrui Qiao, Caroline Walker, Chris Cunningham, Yun Sing Koh | 2025-02-02 | arXiv | https://github.com/ipangbo/LIBRA | http://arxiv.org/abs/2502.01679v1 |
149 | Speculative Ensemble: Fast Large Language Model Ensemble via Speculation | Jiale Fu, Yuchu Jiang, Junkai Chen, Jiaming Fan, Xin Geng, Xu Yang | 2025-02-01 | arXiv | https://github.com/Kamichanw/Speculative-Ensemble/ | http://arxiv.org/abs/2502.01662v1 |
150 | Differentially Private Steering for Large Language Model Alignment | Anmol Goel, Yaxi Hu, Iryna Gurevych, Amartya Sanyal | 2025-02-01 | arXiv:2501.18532, 2025 | https://github.com/UKPLab/iclr2025-psa | http://arxiv.org/abs/2501.18532v1 |
151 | LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models | Shenghao Fu, Qize Yang, Qijie Mo, Junkai Yan, Xihan Wei, Jingke Meng, Xiaohua Xie, Wei-Shi Zheng | 2025-01-31 | arXiv | https://github.com/iSEE-Laboratory/LLMDet | http://arxiv.org/abs/2501.18954v1 |
152 | Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation | Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu | 2025-01-31 | arXiv:2501.17433, 2025 | https://github.com/git-disl/Virus | http://arxiv.org/abs/2501.17433v1 |
153 | 2SSP: A Two-Stage Framework for Structured Pruning of LLMs | Fabrizio Sandri, Elia Cunegatti, Giovanni Iacca | 2025-01-31 | arXiv:2501.17771, 2025 | https://github.com/FabrizioSandri/2SSP | http://arxiv.org/abs/2501.17771v1 |
154 | Reward-Guided Speculative Decoding for Efficient LLM Reasoning | Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong | 2025-01-31 | arXiv | https://github.com/BaohaoLiao/RSD | http://arxiv.org/abs/2501.19324v1 |
155 | ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation | Minghua He, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang | 2025-01-30 | arXiv | https://execoder4trans.github.io/ | http://arxiv.org/abs/2501.18460v2 |
156 | Uncertainty Quantification and Decomposition for LLM-based Recommendation | Wonbin Kweon, Sanghwan Jang, SeongKu Kang, Hwanjo Yu | 2025-01-30 | arXiv:2501.17630, 2025 | https://github.com/WonbinKweon/UNC_LLM_REC_WWW2025 | http://arxiv.org/abs/2501.17630v1 |
157 | CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs | Jinlan Fu, Shenzhen Huangfu, Hao Fei, Xiaoyu Shen, Bryan Hooi, Xipeng Qiu, See-Kiong Ng | 2025-01-28 | arXiv | https://github.com/LVUGAI/CHiP | http://arxiv.org/abs/2501.16629v1 |
158 | SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model | Xun Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Hanyu Wang, Feiyu Xiong, Jason Zhaoxin Fan, Bo Tang, Shichao Song, Mengwei Wang, Jiawei Yang | 2025-01-28 | arXiv | https://github.com/IAAR-Shanghai/SafeRAG | http://arxiv.org/abs/2501.18636v1 |
159 | xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking | Sunbowen Lee, Shiwen Ni, Chi Wei, Shuaimin Li, Liyang Fan, Ahmadreza Argha, Hamid Alinejad-Rokny, Ruifeng Xu, Yicheng Gong, Min Yang | 2025-01-28 | arXiv | https://github.com/Aegis1863/xJailbreak | http://arxiv.org/abs/2501.16727v2 |
160 | Large Language Model Critics for Execution-Free Evaluation of Code Changes | Aashish Yadavally, Hoan Nguyen, Laurent Callot, Gauthier Guinet | 2025-01-28 | arXiv | https://github.com/amazon-science/code-agent-eval | http://arxiv.org/abs/2501.16655v1 |
161 | Towards Evaluating and Building Versatile Large Language Models for Medicine | Chaoyi Wu, Pengcheng Qiu, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie | 2025-01-27 | arXiv | https://henrychur.github.io/MedS-Bench/ | https://doi.org/10.48550/arXiv.2408.12547 |
162 | LCTG Bench: LLM Controlled Text Generation Benchmark | Kentaro Kurihara, Masato Mita, Peinan Zhang, Shota Sasaki, Ryosuke Ishigami, Naoaki Okazaki | 2025-01-27 | arXiv | https://github.com/CyberAgentAILab/LCTG-Bench | http://arxiv.org/abs/2501.15875v1 |
163 | Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models | Hulingxiao He, Geng Li, Zijun Geng, Jinglin Xu, Yuxin Peng | 2025-01-25 | arXiv | https://github.com/PKU-ICST-MIPL/Finedefics_ICLR2025 | http://arxiv.org/abs/2501.15140v1 |
164 | A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models | Zhongzhan Huang, Shanshan Zhong, Pan Zhou, Shanghua Gao, Marinka Zitnik, Liang Lin | 2025-01-25 | arXiv | https://lotbench.github.io | http://arxiv.org/abs/2501.15147v1 |
165 | MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models | Zhongpu Chen, Yinfeng Liu, Long Shi, Zhi-Jie Wang, Xingyan Chen, Yu Zhao, Fuji Ren | 2025-01-25 | arXiv | https://github.com/SWUFE-DB-Group/MDEval-Benchmark | http://arxiv.org/abs/2501.15000v1 |
166 | PIP: Perturbation-based Iterative Pruning for Large Language Models | Yi Cao, Wei-Jie Xu, Yucheng Shen, Weijie Shi, Chi-Min Chan, Jiajie Xu | 2025-01-25 | arXiv | https://github.com/caoyiiiiii/PIP | http://arxiv.org/abs/2501.15278v1 |
167 | DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing | Xinyu Ma, Yifeng Xu, Yang Lin, Tianlong Wang, Xu Chu, Xin Gao, Junfeng Zhao, Yasha Wang | 2025-01-24 | arXiv | https://github.com/ArthurLeoM/DRESS-LLM | http://arxiv.org/abs/2501.14371v1 |
168 | Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models | Bo Gao, Michael W. Spratling | 2025-01-24 | arXiv:2501.13428, 2025 | https://github.com/iminfine/freeatten | http://arxiv.org/abs/2501.13428v2 |
169 | MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications | Yixing Jiang, Kameron C. Black, Gloria Geng, Danny Park, Andrew Y. Ng, Jonathan H. Chen | 2025-01-24 | arXiv | https://github.com/stanfordmlgroup/MedAgentBench | http://arxiv.org/abs/2501.14654v1 |
170 | Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation | Sadegh Mahdavi, Muchen Li, Kaiwen Liu, Christos Thrampoulidis, Leonid Sigal, Renjie Liao | 2025-01-24 | arXiv | https://github.com/DSL-Lab/aops | http://arxiv.org/abs/2501.14275v1 |
171 | JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models | Michael K. Chen, Xikun Zhang, Dacheng Tao | 2025-01-24 | arXiv | https://github.com/michaelchen-lab/JustLogic | http://arxiv.org/abs/2501.14851v1 |
172 | FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration | Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu | 2025-01-24 | arXiv | https://github.com/FireRedTeam/FireRedASR | http://arxiv.org/abs/2501.14350v1 |
173 | Can Large Language Models Understand Preferences in Personalized Recommendation? | Zhaoxuan Tan, Zinan Zeng, Qingkai Zeng, Zhenyu Wu, Zheyuan Liu, Fengran Mo, Meng Jiang | 2025-01-24 | arXiv …, 2025 | https://github.com/TamSiuhin/PerRecBench | http://arxiv.org/abs/2501.13391v1 |
174 | MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents | Yixing Jiang, Kameron C. Black, Gloria Geng, Danny Park, James Zou, Andrew Y. Ng, Jonathan H. Chen | 2025-01-24 | arXiv | https://github.com/stanfordmlgroup/MedAgentBench | http://arxiv.org/abs/2501.14654v2 |
175 | Do as We Do, Not as You Think: the Conformity of Large Language Models | Zhiyuan Weng, Guikun Chen, Wenguan Wang | 2025-01-24 | arXiv:2501.13381, 2025 | https://github.com/Zhiyuan-Weng/BenchForm | http://arxiv.org/abs/2501.13381v1 |
176 | Evaluating and Improving Graph to Text Generation with Large Language Models | Jie He, Yijun Yang, Wanqiu Long, Deyi Xiong, Victor Gutierrez Basulto, Jeff Z. Pan | 2025-01-24 | arXiv | https://github.com/probe2/kg_text | http://arxiv.org/abs/2501.14497v1 |
177 | Distillation Quantification for Large Language Models | Sunbowen Lee, Junting Zhou, Chang Ao, Kaige Li, Xinrun Du, Sirui He, Jiaheng Liu, Min Yang, Zhoufutu Wen, Shiwen Ni | 2025-01-23 | arXiv …, 2025 | https://github.com/Aegis1863/LLMs-Distillation-Quantification | http://arxiv.org/abs/2501.12619v1 |
178 | Low-Rank Adapters Meet Neural Architecture Search for LLM Compression | J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain | 2025-01-23 | arXiv | https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning | http://arxiv.org/abs/2501.16372v1 |
179 | LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps | Andrey Palaev, Adil Khan, Syed M. Ahsan Kazmi | 2025-01-23 | arXiv | https://github.com/Palandr123/DiffusionU-NetLLM | http://arxiv.org/abs/2501.14046v1 |
180 | OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting | Xing Hu, Yuan Cheng, Dawei Yang, Zukang Xu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang, Sifan Zhou | 2025-01-23 | arXiv | https://github.com/BrotherHappy/OSTQuant | http://arxiv.org/abs/2501.13987v1 |
181 | Quantification of Large Language Model Distillation | Sunbowen Lee, Junting Zhou, Chang Ao, Kaige Li, Xinrun Du, Sirui He, Haihong Wu, Tianci Liu, Jiaheng Liu, Hamid Alinejad-Rokny, Min Yang, Yitao Liang, Zhoufutu Wen, Shiwen Ni | 2025-01-22 | arXiv | https://github.com/Aegis1863/LLMs-Distillation-Quantification | http://arxiv.org/abs/2501.12619v3 |
182 | A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models | Qinggang Zhang, Shengyuan Chen, Yuanchen Bei, Zheng Yuan, Huachi Zhou, Zijin Hong, Junnan Dong, Hao Chen, Yi Chang, Xiao Huang | 2025-01-21 | arXiv | https://github.com/DEEP-PolyU/Awesome-GraphRAG | http://arxiv.org/abs/2501.13958v1 |
183 | Can open source large language models be used for tumor documentation in Germany? -- An evaluation on urological doctors' notes | Stefan Lenz, Arsenij Ustjanzew, Marco Jeray, Torsten Panholzer | 2025-01-21 | arXiv | https://github.com/stefan-m-lenz/UroLlmEval | http://arxiv.org/abs/2501.12106v1 |
184 | EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | Zhili Cheng, Yuge Tu, Ran Li, Shiqi Dai, Jinyi Hu, Shengding Hu, Jiahao Li, Yang Shi, Tianyu Yu, Weize Chen, Lei Shi, Maosong Sun | 2025-01-21 | arXiv | https://github.com/thunlp/EmbodiedEval | http://arxiv.org/abs/2501.11858v1 |
185 | VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model | Xianwei Zhuang, Yuxin Xie, Yufan Deng, Liming Liang, Jinghan Ru, Yuguo Yin, Yuexian Zou | 2025-01-21 | arXiv | https://vargpt-1.github.io/ | http://arxiv.org/abs/2501.12327v1 |
186 | Glinthawk: A Two-Tiered Architecture for High-Throughput LLM Inference | Pouya Hamadanian, Sadjad Fouladi | 2025-01-20 | arXiv | https://github.com/microsoft/glinthawk | http://arxiv.org/abs/2501.11779v1 |
187 | Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy | Saeid Asgari Taghanaki, Joao Monteiro | 2025-01-20 | arXiv | https://github.com/asgsaeid/EQT | http://arxiv.org/abs/2501.11721v1 |
188 | Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution | Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, Chao Dong | 2025-01-20 | arXiv | https://depictqa.github.io/deqa-score/ | http://arxiv.org/abs/2501.11561v1 |
189 | InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models | Jing Ding, Kai Feng, Binbin Lin, Jiarui Cai, Qiushi Wang, Yu Xie, Xiaojin Zhang, Zhongyu Wei, Wei Chen | 2025-01-19 | arXiv | https://github.com/HaileyFamo/InsQABench | http://arxiv.org/abs/2501.10943v1 |
190 | Control LLM: Controlled Evolution for Intelligence Retention in LLM | Haichao Wei, Yunxiang Ren, Zhoutong Fu, Aman Lunia, Yi-Lin Chen, Alice Leung, Ya Xu | 2025-01-19 | arXiv | https://github.com/linkedin/ControlLLM | http://arxiv.org/abs/2501.10979v1 |
191 | ChaosEater: Fully Automating Chaos Engineering with Large Language Models | Daisuke Kikuta, Hiroki Ikeuchi, Kengo Tajiri, Yuusuke Nakano | 2025-01-19 | arXiv | https://ntt-dkiku.github.io/chaos-eater | http://arxiv.org/abs/2501.11107v1 |
192 | LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport | Kyeongha Rho, Hyeongkeun Lee, Valentio Iverson, Joon Son Chung | 2025-01-18 | arXiv:2501.09291, 2025 | https://github.com/NAVER-INTEL-Co-Lab/gaudi-lavcap | http://arxiv.org/abs/2501.09291v1 |
193 | Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design | Zhi Zheng, Zhuoliang Xie, Zhenkun Wang, Bryan Hooi | 2025-01-17 | arXiv:2501.08603, 2025 | https://github.com/zz1358m/MCTS-AHD-master | http://arxiv.org/abs/2501.08603v2 |
194 | When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis | Ruixuan Zhang, Beichen Wang, Juexiao Zhang, Zilin Bian, Chen Feng, Kaan Ozbay | 2025-01-17 | arXiv | https://github.com/ai4ce/SeeUnsafe | http://arxiv.org/abs/2501.10604v1 |
195 | FaceXBench: Evaluating Multimodal LLMs on Face Understanding | Kartik Narayan, Vibashan VS, Vishal M. Patel | 2025-01-17 | arXiv | https://kartik-3004.github.io/facexbench/ | http://arxiv.org/abs/2501.10360v1 |
196 | PaSa: An LLM Agent for Comprehensive Academic Paper Search | Yichen He, Guanhua Huang, Peiyuan Feng, Yuan Lin, Yuchen Zhang, Hang Li, Weinan E | 2025-01-17 | arXiv | https://github.com/bytedance/pasa | http://arxiv.org/abs/2501.10120v1 |
197 | Gandalf the Red: Adaptive Security for LLMs | Niklas Pfister, Václav Volhejn, Manuel Knott, Santiago Arias, Julia Bazińska, Mykhailo Bichurin, Alan Commike, Janet Darling, Peter Dienes, Matthew Fiedler, David Haber, Matthias Kraft, Marco Lancini, Max Mathys, Damián Pascual-Ortiz, Jakub Podolak, Adrià Romero-López, Kyriacos Shiarlis, Andreas Signer, Zsolt Terek, Athanasios Theocharis, Daniel Timbrell, Samuel Trautwein, Samuel Watts, Natalie Wu, Mateo Rojas-Carulla | 2025-01-16 | arXiv …, 2025 | https://github.com/lakeraai/dsec-gandalf | http://arxiv.org/abs/2501.07927v1 |
198 | PokerBench: Training Large Language Models to become Professional Poker Players | Richard Zhuang, Akshat Gupta, Richard Yang, Aniket Rahane, Zhengyu Li, Gopala Anumanchipalli | 2025-01-16 | arXiv …, 2025 | https://github.com/pokerllm/pokerbench | http://arxiv.org/abs/2501.08328v1 |
199 | CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation | Jinjun Peng, Leyi Cui, Kele Huang, Junfeng Yang, Baishakhi Ray | 2025-01-16 | arXiv:2501.08200, 2025 | https://github.com/Co1lin/CWEval | http://arxiv.org/abs/2501.08200v1 |
200 | LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Hongyu Li, Jinyu Chen, Ziyu Wei, Shaofei Huang, Tianrui Hui, Jialin Gao, Xiaoming Wei, Si Liu | 2025-01-16 | arXiv …, 2025 | https://github.com/appletea233/LLaVA-ST | http://arxiv.org/abs/2501.08282v1 |
201 | Multilingual LLMs Struggle to Link Orthography and Semantics in Bilingual Word Processing | Eshaan Tanwar, Gayatri Oke, Tanmoy Chakraborty | 2025-01-16 | arXiv:2501.09127, 2025 | https://github.com/EshaanT/Bilingual_processing_LLMs | http://arxiv.org/abs/2501.09127v1 |
202 | OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training | Yijiong Yu, Ziyun Dai, Zekun Wang, Wei Wang, Ran Chen, Ji Pei | 2025-01-16 | arXiv …, 2025 | https://github.com/yuyijiong/fineweb-edu-chinese | http://arxiv.org/abs/2501.08197v1 |
203 | LAMS: LLM-Driven Automatic Mode Switching for Assistive Teleoperation | Yiran Tao, Jehan Yang, Dan Ding, Zackory Erickson | 2025-01-15 | arXiv | https://lams-assistance.github.io/ | http://arxiv.org/abs/2501.08558v1 |
204 | The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities | Irina Bigoulaeva, Harish Tayyar Madabushi, Iryna Gurevych | 2025-01-15 | arXiv | https://github.com/UKPLab/arxiv2025-inherent-limits-plms | http://arxiv.org/abs/2501.08716v1 |
205 | A Roadmap to Guide the Integration of LLMs in Hierarchical Planning | Israel Puerta-Merino, Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares | 2025-01-14 | arXiv | https://llmforplanning.github.io | http://arxiv.org/abs/2501.08068v1 |
206 | Lifelong Learning of Large Language Model based Agents: A Roadmap | Junhao Zheng, Chengming Shi, Xidi Cai, Qiuke Li, Duzhen Zhang, Chenxing Li, Dong Yu, Qianli Ma | 2025-01-13 | arXiv | https://github.com/qianlima-lab/awesome-lifelong-llm-agent | https://doi.org/10.48550/arXiv.2501.07278 |
207 | SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training | Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zhangyang Wang, Shiwei Liu | 2025-01-12 | arXiv | https://github.com/TianjinYellow/SPAM-Optimizer | http://arxiv.org/abs/2501.06842v1 |
208 | ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Wanxiang Che, Zhiyuan Liu, Maosong Sun | 2025-01-11 | arXiv | https://github.com/thunlp/ChartCoder | https://doi.org/10.48550/arXiv.2501.06598 |
209 | ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning | Xiangru Tang, Tianyu Hu, Muyang Ye, Yanjun Shao, Xunjian Yin, Siru Ouyang, Wangchunshu Zhou, Pan Lu, Zhuosheng Zhang, Yilun Zhao, Arman Cohan, Mark Gerstein | 2025-01-11 | arXiv | https://github.com/gersteinlab/chemagent | https://doi.org/10.48550/arXiv.2501.06590 |
210 | Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models | Qingyu Ren, Jie Zeng, Qianyu He, Jiaqing Liang, Yanghua Xiao, Weikang Zhou, Zeye Sun, Fei Yu | 2025-01-11 | arXiv | https://github.com/Rainier-rq/FollowSoftConstraints | https://doi.org/10.48550/arXiv.2501.04945 |
211 | Demystifying Domain-adaptive Post-training for Financial LLMs | Zixuan Ke, Yifei Ming, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty | 2025-01-11 | arXiv …, 2025 | https://github.com/SalesforceAIResearch/FinDap | http://arxiv.org/abs/2501.04961v1 |
212 | FairCode: Evaluating Social Bias of LLMs in Code Generation | Yongkang Du, Jen-tse Huang, Jieyu Zhao, Lu Lin | 2025-01-11 | arXiv:2501.05396, 2025 | https://github.com/YongkDu/FairCode | http://arxiv.org/abs/2501.05396v1 |
213 | HaVen: Hallucination-Mitigated LLM for Verilog Code Generation Aligned with HDL Engineers | Yiyao Yang, Fu Teng, Pengju Liu, Mengnan Qi, Chenyang Lv, Ji Li, Xuhong Zhang, Zhezhi He | 2025-01-11 | arXiv …, 2025 | https://github.com/Intelligent-Computing-Research-Group/HaVen | http://arxiv.org/abs/2501.04908v1 |
214 | SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution | Chengxing Xie, Bowen Li, Chang Gao, He Du, Wai Lam, Difan Zou, Kai Chen | 2025-01-11 | arXiv …, 2025 | https://github.com/InternLM/SWE-Fixer | http://arxiv.org/abs/2501.05040v1 |
215 | ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events | Duygu Sezen Islakoglu, Jan-Christoph Kalo | 2025-01-10 | arXiv | https://github.com/duyguislakoglu/chronosense | https://doi.org/10.48550/arXiv.2501.03040 |
216 | Environmental large language model Evaluation (ELLE) dataset: A Benchmark for Evaluating Generative AI applications in Eco-environment Domain | Jing Guo, Nan Li, Ming Xu | 2025-01-10 | arXiv | https://github.com/CEEAI/elle | https://doi.org/10.48550/arXiv.2501.06277 |
217 | LLM4SR: A Survey on Large Language Models for Scientific Research | Ziming Luo, Zonglin Yang, Zexin Xu, Wei Yang, Xinya Du | 2025-01-10 | arXiv | https://github.com/du-nlp-lab/LLM4SR | https://doi.org/10.48550/arXiv.2501.04306 |
218 | Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models | You Li, Heyu Huang, Chi Chen, Kaiyu Huang, Chao Huang, Zonghao Guo, Zhiyuan Liu, Jinan Xu, Yuhua Li, Ruixuan Li, Maosong Sun | 2025-01-10 | arXiv | https://migician-vg.github.io/ | https://doi.org/10.48550/arXiv.2501.05767 |
219 | MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | Qian Chen, Yafeng Chen, Yanni Chen, Mengzhe Chen, Yingda Chen, Chong Deng, Zhihao Du, Ruize Gao, Changfeng Gao, Zhifu Gao, Yabin Li, Xiang Lv, Jiaqing Liu, Haoneng Luo, Bin Ma, Chongjia Ni, Xian Shi, Jialong Tang, Hui Wang, Hao Wang, Wen Wang, Yuxuan Wang, Yunlan Xu, Fan Yu, Zhijie Yan, Yexin Yang, Baosong Yang, Xian Yang, Guanrou Yang, Tianyu Zhao, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Pei Zhang, Chong Zhang, Jinren Zhou | 2025-01-10 | arXiv | https://funaudiollm.github.io/minmo | https://doi.org/10.48550/arXiv.2501.06282 |
220 | FlairGPT: Repurposing LLMs for Interior Designs | Gabrielle Littlefair, Niladri Shekhar Dutt, Niloy J. Mitra | 2025-01-10 | arXiv:2501.04648, 2025 | https://flairgpt.github.io/ | http://arxiv.org/abs/2501.04648v1 |
221 | Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation | Xiao Wang, Fuling Wang, Haowen Wang, Bo Jiang, Chuanfu Li, Yaowei Wang, Yonghong Tian, Jin Tang | 2025-01-09 | arXiv …, 2025 | https://github.com/Event-AHU/Medical_Image_Analysis | http://arxiv.org/abs/2501.03458v1 |
222 | Visual Large Language Models for Generalized and Specialized Applications | Yifan Li, Zhixin Lai, Wentao Bao, Zhen Tan, Anh Dao, Kewei Sui, Jiayi Shen, Dong Liu, Huan Liu, Yu Kong | 2025-01-06 | arXiv | https://github.com/JackYFL/awesome-VLLMs | https://doi.org/10.48550/arXiv.2501.02765 |
223 | CALM: Curiosity-Driven Auditing for Large Language Models | Xiang Zheng, Longxiang Wang, Yi Liu, Xingjun Ma, Chao Shen, Cong Wang | 2025-01-06 | arXiv | https://github.com/x-zheng16/CALM | https://doi.org/10.48550/arXiv.2501.02997 |
224 | BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning | Beichen Zhang, Yuhong Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Haodong Duan, Yuhang Cao, Dahua Lin, Jiaqi Wang | 2025-01-06 | arXiv | https://github.com/beichenzbc/BoostStep | https://doi.org/10.48550/arXiv.2501.03226 |
225 | LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases | Dylan Bouchard, Mohit Singh Chauhan, David Skarbrevik, Viren Bajaj, Zeya Ahmad | 2025-01-06 | arXiv | https://github.com/cvs-health/langfair | https://doi.org/10.48550/arXiv.2501.03112 |
226 | Multi-LLM Collaborative Caption Generation in Scientific Documents | Jaeyoung Kim, Jongho Lee, Hong-Jun Choi, Ting-Yao Hsu, Chieh-Yang Huang, Sungchul Kim, Ryan Rossi, Tong Yu, Clyde Lee Giles, Ting-Hao 'Kenneth' Huang, Sungchul Choi | 2025-01-05 | arXiv | https://github.com/teamreboott/MLBCAP | http://arxiv.org/abs/2501.02552v1 |
227 | HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs | Saleh Ashkboos, Mahdi Nikdan, Soroush Tabesh, Roberto L. Castro, Torsten Hoefler, Dan Alistarh | 2025-01-05 | arXiv | https://github.com/IST-DASLab/HALO | http://arxiv.org/abs/2501.02625v2 |
228 | REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models | Jian Hu | 2025-01-04 | arXiv | https://github.com/OpenRLHF/OpenRLHF | https://doi.org/10.48550/arXiv.2501.03262 |
229 | Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap | Weizhi Zhang, Yuanchen Bei, Liangwei Yang, Henry Peng Zou, Peilin Zhou, Aiwei Liu, Yinghui Li, Hao Chen, Jianling Wang, Yu Wang, Feiran Huang, Sheng Zhou, Jiajun Bu, Allen Lin, James Caverlee, Fakhri Karray, Irwin King, Philip S. Yu | 2025-01-04 | arXiv | https://github.com/YuanchenBei/Awesome-Cold-Start-Recommendation | https://doi.org/10.48550/arXiv.2501.01945 |
230 | Aligning Large Language Models for Faithful Integrity Against Opposing Argument | Yong Zhao, Yang Deng, See-Kiong Ng, Tat-Seng Chua | 2025-01-04 | arXiv | https://github.com/zhaoy777/AFICE | https://doi.org/10.48550/arXiv.2501.01336 |
231 | MIRAGE: Exploring How Large Language Models Perform in Complex Social Interactive Environments | Cai Yin, Zhouhong Gu, Du Zhaohan, Ye Zheyu, Cao Shaosheng, Xu Yiqian, Feng Hongwei, Chen Ping | 2025-01-04 | arXiv | https://github.com/lime728/MIRAGE | https://doi.org/10.48550/arXiv.2501.01652 |
232 | Text Clustering as Classification with LLMs | Chen Huang, Guoxiu He | 2025-01-04 | Available at SSRN 5081002 | https://github.com/ECNU-Text-Computing/Text-Clustering-via-LLM | http://arxiv.org/abs/2410.00927v2 |
233 | UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility | Yonglin Tian, Fei Lin, Yiduo Li, Tengchao Zhang, Qiyao Zhang, Xuan Fu, Jun Huang, Xingyuan Dai, Yutong Wang, Chunwei Tian, Bai Li, Yisheng Lv, Levente Kovács, Fei-Yue Wang | 2025-01-04 | arXiv | https://github.com/Hub-Tian/UAVs_Meet_LLMs | http://arxiv.org/abs/2501.02341v1 |
234 | FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, Stephanie Wang, Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze | 2025-01-03 | arXiv …, 2025 | https://github.com/flashinfer-ai/flashinfer | http://arxiv.org/abs/2501.01005v1 |
235 | Instruction-Following Evaluation for Large Language Models | Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, Le Hou | 2025-01-03 | arXiv | https://github.com/google-research/google-research/tree/master/instruction_following_eval | https://doi.org/10.48550/arXiv.2311.07911 |
236 | Labels Generated by Large Language Model Helps Measuring People's Empathy in Vitro | Md. Rakibul Hasan, Yue Yao, Md. Zakir Hossain, Aneesh Krishna, Imre Rudas, Shafin Rahman, Tom Gedeon | 2025-01-02 | arXiv | https://github.com/hasan-rakibul/LLMPathy | https://doi.org/10.48550/arXiv.2501.00691 |
237 | Aligning LLMs with Domain Invariant Reward Models | David Wu, Sanjiban Choudhury | 2025-01-02 | arXiv:2501.00911, 2025 | https://github.com/portal-cornell/dial | http://arxiv.org/abs/2501.00911v1 |
238 | Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models | Xinyu Zhou, Delong Chen, Samuel Cahyawijaya, Xufeng Duan, Zhenguang G. Cai | 2025 | arXiv | https://github.com/ChenDelong1999/Linguistic-Similarity | https://doi.org/10.48550/arXiv.2409.12435 |
239 | Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching | Tianshu Wang, Xiaoyang Chen, Hongyu Lin, Xuanang Chen, Xianpei Han, Le Sun, Hao Wang, Zhenyu Zeng | 2025 | arXiv | https://github.com/tshu-w/ComEM | https://doi.org/10.48550/arXiv.2405.16884 |
240 | Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study | Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen | 2025 | COLING | https://github.com/open-compass/DevEval | https://aclanthology.org/2025.coling-main.502/ |
241 | The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models | Zihui Wu, Haichang Gao, Jianping He, Ping Wang | 2025 | arXiv | https://github.com/wooozihui/jailbreakfunction | https://doi.org/10.48550/arXiv.2407.17915 |
242 | Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models | Taiqiang Wu, Chaofan Tao, Jiahao Wang, Runming Yang, Zhe Zhao, Ngai Wong | 2025 | COLING | https://github.com/wutaiqiang/LLM_KD_AKL | https://aclanthology.org/2025.coling-main.383/ |
243 | Retrieval Augmented Instruction Tuning for Open NER with Large Language Models | Tingyu Xie, Jian Zhang, Yan Zhang, Yuanyuan Liang, Qi Li, Hongwei Wang | 2025 | arXiv | https://github.com/Emma1066/Retrieval-Augmented-IT-OpenNER | https://doi.org/10.48550/arXiv.2406.17305 |
244 | Towards Efficient and Effective Adaptation of Large Language Models for Sequential Recommendation | Hangyu Wang, Jianghao Lin, Bo Chen, Yang Yang, Ruiming Tang, Weinan Zhang, Yong Yu | 2025 | arXiv | https://github.com/justarter/E2URec | https://doi.org/10.48550/arXiv.2310.01612 |
245 | The Only Way is Ethics: A Guide to Ethical Research with Large Language Models | Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Björn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch | 2025 | COLING | https://github.com/MxEddie/Ethics-Whitepaper | https://aclanthology.org/2025.coling-main.603/ |
246 | Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges | Vinay Samuel, Yue Zhou, Henry Peng Zou | 2025 | arXiv | https://github.com/vsamuel2003/data-contamination | https://doi.org/10.48550/arXiv.2409.09927 |
247 | Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models | Zijun Chen, Wenbo Hu, Guande He, Zhijie Deng, Zheng Zhang, Richang Hong | 2025 | COLING | https://github.com/hfutml/Calibration-MLLM | https://aclanthology.org/2025.coling-main.208/ |
248 | Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning | Xingchen Zeng, Haichuan Lin, Yilin Ye, Wei Zeng | 2025 | arXiv | https://github.com/zengxingchen/ChartQA-MLLM | https://doi.org/10.48550/arXiv.2407.20174 |
249 | Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: a data-driven approach for improved classification | Ricardo Bigolin Lanfredi, Pritam Mukherjee, Ronald M. Summers | 2025 | arXiv | https://github.com/rsummers11/CADLab/tree/master/MAPLEZ_LLM_report_labeler/ | https://doi.org/10.48550/arXiv.2403.04024 |
250 | KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting | Thilini Wijesiriwardene, Ruwan Wickramarachchi, Sreeram Reddy Vennam, Vinija Jain, Aman Chadha, Amitava Das, Ponnurangam Kumaraguru, Amit P. Sheth | 2025 | COLING | https://github.com/Thiliniiw/KnowledgePrompts/ | https://aclanthology.org/2025.coling-main.268/ |
251 | LLMTreeRec: Unleashing the Power of Large Language Models for Cold-Start Recommendations | Wenlin Zhang, Chuhan Wu, Xiangyang Li, Yuhao Wang, Kuicai Dong, Yichao Wang, Xinyi Dai, Xiangyu Zhao, Huifeng Guo, Ruiming Tang | 2025 | COLING | https://github.com/Applied-Machine-Learning-Lab/LLMTreeRec | https://aclanthology.org/2025.coling-main.59/ |
252 | QuickLLaMA: Query-aware Inference Acceleration for Large Language Models | Jingyao Li, Han Shi, Sitong Wu, Chuanyang Zheng, Zhenguo Li, Xin Jiang, Hong Xu, Jiaya Jia | 2025 | COLING | https://github.com/dvlab-research/Q-LLM | https://aclanthology.org/2025.coling-main.34/ |
253 | Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation | Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang | 2025 | arXiv | https://github.com/RUCAIBox/LLM-Knowledge-Boundary | https://doi.org/10.48550/arXiv.2307.11019 |
254 | EarthMarker: A Visual Prompting Multimodal Large Language Model for Remote Sensing | Wei Zhang, Miaoxin Cai, Tong Zhang, Yin Zhuang, Jun Li, Xuerui Mao | 2025 | IEEE Trans. Geosci. Remote. Sens. | https://github.com/wivizhang/EarthMarker | https://doi.org/10.1109/TGRS.2024.3523505 |
255 | Surveillance Video-and-Language Understanding: from Small to Large Multimodal Models | Tongtong Yuan, Xuange Zhang, Bo Liu, Kun Liu, Jian Jin, Zhenzhen Jiao | 2025 | IEEE Transactions on Circuits and Systems for Video Technology | https://xuange923.github.io/Surveillance-Video-Understanding | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10681489 |
256 | Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models | Anmol Reddy Mekala, Vineeth Dorna, Shreya Dubey, Abhishek Lalwani, David Koleczek, Mukund Rungta, Sadid A. Hasan, Elita A. Lobo | 2025 | arXiv | https://github.com/molereddy/Alternate-Preference-Optimization | https://doi.org/10.48550/arXiv.2409.13474 |
257 | Awakening Augmented Generation: Learning to Awaken Internal Knowledge of Large Language Models for Question Answering | Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Shengping Liu, Kang Liu, Jun Zhao | 2025 | COLING | https://github.com/Xnhyacinth/IAG | https://aclanthology.org/2025.coling-main.89/ |
258 | CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding? | Yuwei Zhao, Ziyang Luo, Yuchen Tian, Hongzhan Lin, Weixiang Yan, Annan Li, Jing Ma | 2025 | arXiv | https://github.com/CodeLLM-Research/CodeJudge-Eval | https://doi.org/10.48550/arXiv.2408.10718 |
259 | InternLM-Law: An Open Source Chinese Legal Large Language Model | Zhiwei Fei, Songyang Zhang, Xiaoyu Shen, Dawei Zhu, Xiao Wang, Maosong Cao, Fengzhe Zhou, Yining Li, Wenwei Zhang, Dahua Lin, Kai Chen, Jidong Ge | 2025 | arXiv | https://github.com/InternLM/InternLM-Law | https://doi.org/10.48550/arXiv.2406.14887 |
260 | Distilling Rule-based Knowledge into Large Language Models | Wenkai Yang, Yankai Lin, Jie Zhou, Ji-Rong Wen | 2025 | COLING | https://github.com/RUCBM/rule-distillation | https://aclanthology.org/2025.coling-main.61/ |
261 | Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models | Jiahui Li, Yongchang Hao, Haoyu Xu, Xing Wang, Yu Hong | 2025 | COLING | https://github.com/jiah-li/magic | https://aclanthology.org/2025.coling-main.305/ |
262 | Exploring Concept Depth: How Large Language Models Acquire Knowledge and Concept at Different Layers? | Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang | 2025 | COLING | https://github.com/Luckfort/CD | https://aclanthology.org/2025.coling-main.37/ |
263 | Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion | Ben Liu, Jihai Zhang, Fangquan Lin, Cheng Yang, Min Peng | 2025 | COLING | https://github.com/LB0828/FtG | https://aclanthology.org/2025.coling-main.740/ |
264 | GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models | Zike Yuan, Ming Liu, Hui Wang, Bing Qin | 2025 | arXiv | https://github.com/ZIKEYUAN/GraCoRe | https://doi.org/10.48550/arXiv.2407.02936 |
265 | Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining | Zongru Wu, Pengzhou Cheng, Lingyong Fang, Zhuosheng Zhang, Gongshen Liu | 2025 | COLING | https://github.com/ZrW00/GraceFul | https://aclanthology.org/2025.coling-main.220/ |
266 | ICLEval: Evaluating In-Context Learning Ability of Large Language Models | Wentong Chen, Yankai Lin, ZhenHao Zhou, HongYun Huang, Yantao Jia, Zhao Cao, Ji-Rong Wen | 2025 | arXiv | https://github.com/yiye3/ICLEval | https://doi.org/10.48550/arXiv.2406.14955 |
267 | Distributed Mixture-of-Agents for Edge Inference with Large Language Models | Purbesh Mitra, Priyanka Kaswan, Sennur Ulukus | 2024-12-30 | arXiv | https://github.com/purbeshmitra/distributed_moa | http://arxiv.org/abs/2412.21200v1 |
268 | Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study | Yulin Fei, Yuhui Gao, Xingyuan Xian, Xiaojin Zhang, Tao Wu, Wei Chen | 2024-12-29 | arXiv | https://github.com/YuHuiGao/FG-Bench | http://arxiv.org/abs/2412.20613v1 |
269 | Mind the Data Gap: Bridging LLMs to Enterprise Data Integration | Moe Kayali, Fabian Wenz, Nesime Tatbul, Çağatay Demiralp | 2024-12-29 | arXiv | https://goby-benchmark.github.io/ | http://arxiv.org/abs/2412.20331v1 |
270 | TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication | Zongwu Wang, Fangxin Liu, Mingshuai Li, Li Jiang | 2024-12-29 | arXiv | https://github.com/ACA-Lab-SJTU/token-ring | http://arxiv.org/abs/2412.20501v1 |
271 | On the Compositional Generalization of Multimodal LLMs for Medical Imaging | Zhenyang Cai, Junying Chen, Rongsheng Wang, Weihong Wang, Yonglin Deng, Dingjie Song, Yize Chen, Zixu Zhang, Benyou Wang | 2024-12-28 | arXiv | https://github.com/FreedomIntelligence/Med-MAT | http://arxiv.org/abs/2412.20070v1 |
272 | Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Sijia Chen, Baochun Li | 2024-12-27 | ICML | https://github.com/iQua/llmpebase/tree/main/examples/ThoughtRollback | https://openreview.net/forum?id=aoAPOOtN9E |
273 | An Engorgio Prompt Makes Large Language Model Babble on | Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang, Han Qiu, Tianwei Zhang, Hao Wang, Hewu Li, Qi Li, Chao Zhang, Ke Xu | 2024-12-27 | arXiv | https://github.com/jianshuod/Engorgio-prompt | http://arxiv.org/abs/2412.19394v1 |
274 | Gradient Weight-normalized Low-rank Projection for Efficient LLM Training | Jia-Hong Huang, Yixian Shen, Hongyi Zhu, Stevan Rudinac, Evangelos Kanoulas | 2024-12-27 | arXiv | https://github.com/Jhhuangkay/Gradient-Weight-normalized-Low-rank-Projection-for-Efficient-LLM-Training | http://arxiv.org/abs/2412.19616v1 |
275 | MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Jiaqi Fan, Jianhua Wu, Jincheng Gao, Jianhao Yu, Yafei Wang, Hongqing Chu, Bingzhao Gao | 2024-12-27 | arXiv | https://github.com/fjq-tongji/MLLM-SUL | http://arxiv.org/abs/2412.19406v1 |
276 | A Survey on Large Language Model Acceleration based on KV Cache Management | Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen | 2024-12-27 | arXiv | https://github.com/TreeAI-Lab/Awesome-KV-Cache-Management | http://arxiv.org/abs/2412.19442v2 |
277 | Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment | Ziang Yan, Zhilin Li, Yinan He, Chenting Wang, Kunchang Li, Xinhao Li, Xiangyu Zeng, Zilei Wang, Yali Wang, Yu Qiao, Limin Wang, Yi Wang | 2024-12-26 | arXiv | https://github.com/OpenGVLab/TPO | http://arxiv.org/abs/2412.19326v1 |
278 | CoEvo: Continual Evolution of Symbolic Solutions Using Large Language Models | Ping Guo, Qingfu Zhang, Xi Lin | 2024-12-25 | arXiv | https://github.com/pgg3/CoEvo | http://arxiv.org/abs/2412.18890v1 |
279 | Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving | Hao Pang, Zhenpo Wang, Guoqiang Li | 2024-12-24 | arXiv | https://bitmobility.github.io/LGDRL/ | http://arxiv.org/abs/2412.18511v1 |
280 | Token-Budget-Aware LLM Reasoning | Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen | 2024-12-24 | arXiv | https://github.com/GeniusHTX/TALE | http://arxiv.org/abs/2412.18547v3 |
281 | Property Enhanced Instruction Tuning for Multi-task Molecule Generation with Large Language Models | Xuan Lin, Long Chen, Yile Wang, Xiangxiang Zeng, Philip S. Yu | 2024-12-24 | arXiv | https://github.com/chenlong164/PEIT | http://arxiv.org/abs/2412.18084v1 |
282 | ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation | Mengyang Wu, Yuzhi Zhao, Jialun Cao, Mingjie Xu, Zhongming Jiang, Xuehui Wang, Qinbin Li, Guangneng Hu, Shengchao Qin, Chi-Wing Fu | 2024-12-24 | arXiv | https://github.com/zhaoyuzhi/ICM-Assistant | http://arxiv.org/abs/2412.18216v1 |
283 | Distilling Fine-grained Sentiment Understanding from Large Language Models | Yice Zhang, Guangyu Xie, Hongling Xu, Kaiheng Hou, Jianzhu Bao, Qianlong Wang, Shiwei Chen, Ruifeng Xu | 2024-12-24 | arXiv | https://github.com/HITSZ-HLT/FSA-Distillation | http://arxiv.org/abs/2412.18552v2 |
284 | 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding | Tatiana Zemskova, Dmitry Yudin | 2024-12-24 | arXiv | https://github.com/CognitiveAISystems/3DGraphLLM | http://arxiv.org/abs/2412.18450v2 |
285 | Assessing Human Editing Effort on LLM-Generated Texts via Compression-Based Edit Distance | Nicolas Devatine, Louis Abraham | 2024-12-23 | arXiv | https://github.com/NDV-tiime/CompressionDistance | http://arxiv.org/abs/2412.17321v1 |
286 | Large Language Model Safety: A Holistic Survey | Dan Shi, Tianhao Shen, Yufei Huang, Zhigen Li, Yongqi Leng, Renren Jin, Chuang Liu, Xinwei Wu, Zishan Guo, Linhao Yu, Ling Shi, Bojian Jiang, Deyi Xiong | 2024-12-23 | arXiv | https://github.com/tjunlp-lab/Awesome-LLM-Safety-Papers | http://arxiv.org/abs/2412.17686v1 |
287 | CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models | Yeyuan Wang, Dehong Gao, Bin Li, Rujiao Long, Lei Yi, Xiaoyan Cai, Libin Yang, Jinxia Zhang, Shanqing Yu, Qi Xuan | 2024-12-22 | arXiv | https://github.com/Gavin001201/CoF | http://arxiv.org/abs/2412.16869v1 |
288 | MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge | Jie He, Nan Hu, Wanqiu Long, Jiaoyan Chen, Jeff Z. Pan | 2024-12-22 | arXiv | https://github.com/probe2/multi-hop/ | http://arxiv.org/abs/2412.17032v1 |
289 | Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval | Luo Ji, Feixiang Guo, Teng Chen, Qingqing Gu, Xiaoyu Wang, Ningyuan Xi, Yihong Wang, Peng Yu, Yue Zhao, Hongyang Lei, Zhonglin Jiang, Yong Chen | 2024-12-21 | arXiv | https://github.com/flyfree5/LaHoRe | http://arxiv.org/abs/2412.16615v1 |
290 | Template-Driven LLM-Paraphrased Framework for Tabular Math Word Problem Generation | Xiaoqiang Kang, Zimu Wang, Xiaobo Jin, Wei Wang, Kaizhu Huang, Qiufeng Wang | 2024-12-20 | arXiv | https://github.com/Jason8Kang/TELL | http://arxiv.org/abs/2412.15594v1 |
291 | WebLLM: A High-Performance In-Browser LLM Inference Engine | Charlie F. Ruan, Yucheng Qin, Xun Zhou, Ruihang Lai, Hongyi Jin, Yixin Dong, Bohan Hou, Meng-Shiun Yu, Yiyan Zhai, Sudeep Agarwal, Hangrui Cao, Siyuan Feng, Tianqi Chen | 2024-12-20 | arXiv | https://github.com/mlc-ai/web-llm | http://arxiv.org/abs/2412.15803v1 |
292 | TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use | Junjie Ye, Yilong Wu, Sixian Li, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan, Zhengyin Du | 2024-12-20 | arXiv | https://github.com/Junjie-Ye/TL-Training | http://arxiv.org/abs/2412.15495v1 |
293 | PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Xiaohu Huang, Hao Zhou, Kai Han | 2024-12-20 | arXiv | https://github.com/Visual-AI/PruneVid | http://arxiv.org/abs/2412.16117v1 |
294 | Mitigating Social Bias in Large Language Models: A Multi-Objective Approach within a Multi-Agent Framework | Zhenjie Xu, Wenqing Chen, Yi Tang, Xuanying Li, Cheng Hu, Zhixuan Chu, Kui Ren, Zibin Zheng, Zhichao Lu | 2024-12-20 | arXiv | https://github.com/Cortantse/MOMA | http://arxiv.org/abs/2412.15504v1 |
295 | Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution | Wentao Tan, Qiong Cao, Yibing Zhan, Chao Xue, Changxing Ding | 2024-12-20 | arXiv | https://github.com/WentaoTan/SENA | http://arxiv.org/abs/2412.15650v1 |
296 | On Verbalized Confidence Scores for LLMs | Daniel Yang, Yao-Hung Hubert Tsai, Makoto Yamada | 2024-12-19 | arXiv | https://github.com/danielyxyang/llm-verbalized-uq | http://arxiv.org/abs/2412.14737v1 |
297 | Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models | Wenhan Liu, Xinyu Ma, Yutao Zhu, Ziliang Zhao, Shuaiqiang Wang, Dawei Yin, Zhicheng Dou | 2024-12-19 | arXiv | https://github.com/8421BCD/fullrank | http://arxiv.org/abs/2412.14574v1 |
298 | ORBIT: Cost-Effective Dataset Curation for Large Language Model Domain Adaptation with an Astronomy Case Study | Eric Modesitt, Ke Yang, Spencer Hulsey, Chengxiang Zhai, Volodymyr Kindratenko | 2024-12-19 | arXiv | https://github.com/ModeEric/ORBIT-Llama | http://arxiv.org/abs/2412.14436v1 |
299 | Agent-SafetyBench: Evaluating the Safety of LLM Agents | Zhexin Zhang, Shiyao Cui, Yida Lu, Jingzhuo Zhou, Junxiao Yang, Hongning Wang, Minlie Huang | 2024-12-19 | arXiv | https://github.com/thu-coai/Agent-SafetyBench | http://arxiv.org/abs/2412.14470v1 |
300 | InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Cong Wei, Yujie Zhong, Haoxian Tan, Yingsen Zeng, Yong Liu, Zheng Zhao, Yujiu Yang | 2024-12-18 | arXiv | https://github.com/congvvc/InstructSeg | http://arxiv.org/abs/2412.14006v1 |
301 | Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces | Jihan Yang, Shusheng Yang, Anjali W. Gupta, Rilyn Han, Li Fei-Fei, Saining Xie | 2024-12-18 | arXiv | https://vision-x-nyu.github.io/thinking-in-space.github.io/ | http://arxiv.org/abs/2412.14171v1 |
302 | ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals | Utkarsh Saxena, Sayeh Sharify, Kaushik Roy, Xin Wang | 2024-12-18 | arXiv | https://github.com/utkarsh-dmx/project-resq | http://arxiv.org/abs/2412.14363v1 |
303 | Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games | Wenye Lin, Jonathan Roberts, Yunhan Yang, Samuel Albanie, Zongqing Lu, Kai Han | 2024-12-18 | arXiv | https://visual-ai.github.io/gamebot | http://arxiv.org/abs/2412.13602v1 |
304 | Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes | Katarzyna Kobalczyk, Claudio Fanconi, Hao Sun, Mihaela van der Schaar | 2024-12-18 | arXiv | https://github.com/kasia-kobalczyk/few-shot-steerable-alignment | http://arxiv.org/abs/2412.13998v1 |
305 | Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings | Yuanhe Zhang, Zhenhong Zhou, Wei Zhang, Xinyue Wang, Xiaojun Jia, Yang Liu, Sen Su | 2024-12-18 | arXiv | https://github.com/shuita2333/AutoDoS | http://arxiv.org/abs/2412.13879v1 |
306 | Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting | Vijay Goyal, Mustafa Khan, Aprameya Tirupati, Harveer Saini, Michael Lam, Kevin Zhu | 2024-12-18 | arXiv | https://github.com/alonso130r/knowledge-distillation | http://arxiv.org/abs/2412.17846v1 |
307 | Are Your LLMs Capable of Stable Reasoning? | Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen | 2024-12-17 | arXiv | https://github.com/open-compass/GPassK | http://arxiv.org/abs/2412.13147v2 |
308 | Benchmarking and Understanding Compositional Relational Reasoning of LLMs | Ruikang Ni, Da Xiao, Qingye Meng, Xiangyu Li, Shihui Zheng, Hongliang Liang | 2024-12-17 | arXiv | https://github.com/Caiyun-AI/GAR | http://arxiv.org/abs/2412.12841v1 |
309 | Graph Learning in the Era of LLMs: A Survey from the Perspective of Data, Models, and Tasks | Xunkai Li, Zhengyu Wu, Jiayi Wu, Hanwen Cui, Jishuo Jia, Rong-Hua Li, Guoren Wang | 2024-12-17 | arXiv | https://github.com/xkLi-Allen/Awesome-GNN-in-LLMs-Papers | http://arxiv.org/abs/2412.12456v1 |
310 | NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning | Xin Yi, Shunfan Zheng, Linlin Wang, Gerard de Melo, Xiaoling Wang, Liang He | 2024-12-17 | arXiv | https://github.com/xinykou/NLSR | http://arxiv.org/abs/2412.12497v1 |
311 | SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents | Sheng Yin, Xianghe Pang, Yuanzhuo Ding, Menglan Chen, Yutong Bi, Yichen Xiong, Wenhao Huang, Zhen Xiang, Jing Shao, Siheng Chen | 2024-12-17 | arXiv | https://github.com/shengyin1224/SafeAgentBench | http://arxiv.org/abs/2412.13178v2 |
312 | SafeDrive: Knowledge- and Data-Driven Risk-Sensitive Decision-Making for Autonomous Vehicles with Large Language Models | Zhiyuan Zhou, Heye Huang, Boqi Li, Shiyue Zhao, Yao Mu, Jianqiang Wang | 2024-12-17 | arXiv | https://mezzi33.github.io/SafeDrive/ | http://arxiv.org/abs/2412.13238v2 |
313 | Assessing the Limitations of Large Language Models in Clinical Fact Decomposition | Monica Munnangi, Akshay Swaminathan, Jason Alan Fries, Jenelle Jindal, Sanjana Narayanan, Ivan Lopez, Lucia Tu, Philip Chung, Jesutofunmi A. Omiye, Mehr Kashyap, Nigam Shah | 2024-12-17 | arXiv | https://github.com/som-shahlab/factehr | http://arxiv.org/abs/2412.12422v1 |
314 | LLMs Can Simulate Standardized Patients via Agent Coevolution | Zhuoyun Du, Lujie Zheng, Renjun Hu, Yuyang Xu, Xiawei Li, Ying Sun, Wei Chen, Jian Wu, Haolei Cai, Haohao Ying | 2024-12-16 | arXiv | https://github.com/ZJUMAI/EvoPatient | http://arxiv.org/abs/2412.11716v1 |
315 | RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation | Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yongkang Wu, Zhonghua Li, Qi Ye, Zhicheng Dou | 2024-12-16 | arXiv | https://github.com/sunnynexus/RetroLLM | http://arxiv.org/abs/2412.11919v1 |
316 | RL-LLM-DT: An Automatic Decision Tree Generation Method Based on RL Evaluation and LLM Enhancement | Junjie Lin, Jian Zhao, Lin Liu, Yue Deng, Youpeng Zhao, Lanxiao Huang, Xia Lin, Wengang Zhou, Houqiang Li | 2024-12-16 | arXiv | https://github.com/Linjunjie99/RL-LLM-DT | http://arxiv.org/abs/2412.11417v2 |
317 | Does VLM Classification Benefit from LLM Description Semantics? | Pingchuan Ma, Lennart Rietdorf, Dmytro Kotovenko, Vincent Tao Hu, Björn Ommer | 2024-12-16 | arXiv | https://github.com/CompVis/DisCLIP | http://arxiv.org/abs/2412.11917v3 |
318 | BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement | Yuhao Du, Shunian Chen, Wenbo Zan, Peizhao Li, Mingxuan Wang, Dingjie Song, Bo Li, Yan Hu, Benyou Wang | 2024-12-16 | arXiv | https://github.com/FreedomIntelligence/BlenderLLM | http://arxiv.org/abs/2412.14203v1 |
319 | Analyzing Images of Legal Documents: Toward Multi-Modal LLMs for Access to Justice | Hannes Westermann, Jaromir Savelka | 2024-12-16 | arXiv | https://github.com/hwestermann/AI4A2J_analyzing_images_of_legal_documents | http://arxiv.org/abs/2412.15260v1 |
320 | NITRO: LLM Inference on Intel Laptop NPUs | Anthony Fei, Mohamed S. Abdelfattah | 2024-12-15 | arXiv | https://github.com/abdelfattah-lab/nitro | http://arxiv.org/abs/2412.11053v1 |
321 | Empowering LLMs to Understand and Generate Complex Vector Graphics | Ximing Xing, Juncheng Hu, Guotao Liang, Jing Zhang, Dong Xu, Qian Yu | 2024-12-15 | arXiv | https://ximinng.github.io/LLM4SVGProject/ | http://arxiv.org/abs/2412.11102v1 |
322 | Learning to Verify Summary Facts with Fine-Grained LLM Feedback | Jihwan Oh, Jeonghwan Choi, Nicole Hee-Yeon Kim, Taewon Yun, Hwanjun Song | 2024-12-14 | arXiv | https://github.com/DISL-Lab/FineSumFact | http://arxiv.org/abs/2412.10689v1 |
323 | B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens | Zhuqiang Lu, Zhenfei Yin, Mengwei He, Zhihui Wang, Zicheng Liu, Zhiyong Wang, Kun Hu | 2024-12-13 | arXiv | https://github.com/zhuqiangLu/B-VLLM | http://arxiv.org/abs/2412.09919v1 |
324 | Can LLMs Convert Graphs to Text-Attributed Graphs? | Zehong Wang, Sidney Liu, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye | 2024-12-13 | arXiv | https://github.com/Zehong-Wang/TANS | http://arxiv.org/abs/2412.10136v1 |
325 | ChainStream: An LLM-based Framework for Unified Synthetic Sensing | Jiacheng Liu, Yuanchun Li, Liangyan Li, Yi Sun, Hao Wen, Xiangyu Li, Yao Guo, Yunxin Liu | 2024-12-13 | arXiv | https://github.com/MobileLLM/ChainStream | http://arxiv.org/abs/2412.15240v1 |
326 | CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | Zhihao Du, Yuxuan Wang, Qian Chen, Xian Shi, Xiang Lv, Tianyu Zhao, Zhifu Gao, Yexin Yang, Changfeng Gao, Hui Wang, Fan Yu, Huadai Liu, Zhengyan Sheng, Yue Gu, Chong Deng, Wen Wang, Shiliang Zhang, Zhijie Yan, Jingren Zhou | 2024-12-13 | arXiv | https://funaudiollm.github.io/cosyvoice2 | http://arxiv.org/abs/2412.10117v3 |
327 | Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation | Xiaofeng Zhang, Fanshuo Zeng, Yihao Quan, Zheng Hui, Jiawei Yao | 2024-12-13 | arXiv | https://github.com/FanshuoZeng/Simignore | http://arxiv.org/abs/2412.09817v1 |
328 | Can Modern LLMs Act as Agent Cores in Radiology Environments? | Qiaoyu Zheng, Chaoyi Wu, Pengcheng Qiu, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie | 2024-12-12 | arXiv | https://github.com/MAGIC-AI4Med/RadABench | http://arxiv.org/abs/2412.09529v2 |
329 | RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios | Ruiwen Zhou, Wenyue Hua, Liangming Pan, Sitao Cheng, Xiaobao Wu, En Yu, William Yang Wang | 2024-12-12 | arXiv | https://github.com/skyriver-2000/RuleArena | http://arxiv.org/abs/2412.08972v1 |
330 | Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Xiaoshuang Huang, Lingdong Shen, Jia Liu, Fangxin Shang, Hongxiang Li, Haifeng Huang, Yehui Yang | 2024-12-12 | arXiv | https://github.com/ShawnHuang497/MedPLIB | http://arxiv.org/abs/2412.09278v1 |
331 | What Makes Cryptic Crosswords Challenging for LLMs? | Abdelrahman Sadallah, Daria Kotova, Ekaterina Kochmar | 2024-12-12 | COLING 2025 | https://github.com/bodasadallah/decrypting-crosswords | http://arxiv.org/abs/2412.09012v1 |
332 | Multi-GraspLLM: A Multimodal LLM for Multi-Hand Semantic Guided Grasp Generation | Haosheng Li, Weixin Mao, Weipeng Deng, Chenyu Meng, Haoqiang Fan, Tiancai Wang, Ping Tan, Hongan Wang, Xiaoming Deng | 2024-12-11 | arXiv | https://multi-graspllm.github.io | http://arxiv.org/abs/2412.08468v1 |
333 | Concept Bottleneck Large Language Models | Chung-En Sun, Tuomas Oikarinen, Berk Ustun, Tsui-Wei Weng | 2024-12-11 | arXiv | https://github.com/Trustworthy-ML-Lab/CB-LLMs | http://arxiv.org/abs/2412.07992v1 |
334 | Autoformalizing and Simulating Game-Theoretic Scenarios using LLM-augmented Agents | Agnieszka Mensfelt, Kostas Stathis, Vince Trencsenyi | 2024-12-11 | arXiv | https://github.com/dicelab-rhul/autoformalizing-agents | http://arxiv.org/abs/2412.08805v1 |
335 | IntellectSeeker: A Personalized Literature Management System with the Probabilistic Model and Large Language Model | Weizhen Bian, Siyan Liu, Yubo Zhou, Dezhi Chen, Yijie Liao, Zhenzhen Fan, Aobo Wang | 2024-12-10 | KSEM | https://github.com/LuckyBian/ISY5001 | https://doi.org/10.1007/978-981-97-5489-2_24 |
336 | DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong | 2024-12-10 | arXiv | https://jianzongwu.github.io/projects/diffsensei/ | http://arxiv.org/abs/2412.07589v1 |
337 | Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation | Pedro H. V. Valois, Lincon S. Souza, Erica K. Shimomoto, Kazuhiro Fukui | 2024-12-10 | arXiv | https://github.com/phvv-me/frame-representation-hypothesis | http://arxiv.org/abs/2412.07334v2 |
338 | LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation | Eunsu Kim, Juyoung Suk, Seungone Kim, Niklas Muennighoff, Dongkwan Kim, Alice Oh | 2024-12-10 | arXiv | https://github.com/interview-eval/ | http://arxiv.org/abs/2412.10424v2 |
339 | Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study | Ehsan Shareghi, Jiuzhou Han, Paul Burgess | 2024-12-09 | arXiv | https://auslawbench.github.io | http://arxiv.org/abs/2412.06272v1 |
340 | PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models | Qian Zhang, Panfeng Chen, Jiali Li, Linkun Feng, Shuyu Liu, Heng Zhao, Mei Chen, Hui Li, Yanhao Wang | 2024-12-09 | arXiv | https://github.com/ACMISLab/PediaBench | http://arxiv.org/abs/2412.06287v2 |
341 | Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models | Xiao Xu, Tianhao Niu, Yuxi Xie, Libo Qin, Wanxiang Che, Min-Yen Kan | 2024-12-08 | arXiv | https://github.com/LooperXX/MMGiC | http://arxiv.org/abs/2412.05939v1 |
342 | KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models | Fan Wang, Juyong Jiang, Chansung Park, Sunghun Kim, Jing Tang | 2024-12-08 | arXiv | https://github.com/juyongjiang/KaSA | http://arxiv.org/abs/2412.06071v1 |
343 | Training-Free Bayesianization for Low-Rank Adapters of Large Language Models | Haizhou Shi, Yibin Wang, Ligong Han, Huan Zhang, Hao Wang | 2024-12-07 | arXiv | https://github.com/Wang-ML-Lab/bayesian-peft | http://arxiv.org/abs/2412.05723v1 |
344 | LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods | Haitao Li, Qian Dong, Junjie Chen, Huixue Su, Yujia Zhou, Qingyao Ai, Ziyi Ye, Yiqun Liu | 2024-12-07 | arXiv | https://github.com/CSHaitao/Awesome-LLMs-as-Judges | http://arxiv.org/abs/2412.05579v2 |
345 | Towards Learning to Reason: Comparing LLMs with Neuro-Symbolic on Arithmetic Relations in Abstract Reasoning | Michael Hersche, Giacomo Camposampiero, Roger Wattenhofer, Abu Sebastian, Abbas Rahimi | 2024-12-07 | arXiv | https://github.com/IBM/raven-large-language-models | http://arxiv.org/abs/2412.05586v1 |
346 | EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | Lu Qiu, Yuying Ge, Yi Chen, Yixiao Ge, Ying Shan, Xihui Liu | 2024-12-05 | arXiv | https://qiulu66.github.io/egoplanbench2/ | http://arxiv.org/abs/2412.04447v1 |
347 | LossAgent: Towards Any Optimization Objectives for Image Processing with LLM Agents | Bingchen Li, Xin Li, Yiting Lu, Zhibo Chen | 2024-12-05 | arXiv | https://github.com/lbc12345/LossAgent | http://arxiv.org/abs/2412.04090v1 |
348 | Reinforcement Learning Enhanced LLMs: A Survey | Shuhe Wang, Shengyu Zhang, Jie Zhang, Runyi Hu, Xiaoya Li, Tianwei Zhang, Jiwei Li, Fei Wu, Guoyin Wang, Eduard Hovy | 2024-12-05 | arXiv | https://github.com/ShuheWang1998/Reinforcement-Learning-Enhanced-LLMs-A-Survey | http://arxiv.org/abs/2412.10400v2 |
349 | VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding | Chaoyu Li, Eun Woo Im, Pooyan Fazli | 2024-12-04 | arXiv | https://vid-halluc.github.io/ | http://arxiv.org/abs/2412.03735v1 |
350 | From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | Xinyi Mou, Xuanwen Ding, Qi He, Liang Wang, Jingcong Liang, Xinnong Zhang, Libo Sun, Jiayu Lin, Jie Zhou, Xuanjing Huang, Zhongyu Wei | 2024-12-04 | arXiv | https://github.com/FudanDISC/SocialAgent | http://arxiv.org/abs/2412.03563v1 |
351 | Improving Linguistic Diversity of Large Language Models with Possibility Exploration Fine-Tuning | Long Mai, Julie Carson-Berndsen | 2024-12-04 | arXiv | https://github.com/mailong25/peft_diversity | http://arxiv.org/abs/2412.03343v1 |
352 | Alignment at Pre-training! Towards Native Alignment for Arabic LLMs | Juhao Liang, Zhenyang Cai, Jianqing Zhu, Huang Huang, Kewei Zong, Bang An, Mosen Alharthi, Juncai He, Lian Zhang, Haizhou Li, Benyou Wang, Jinchao Xu | 2024-12-04 | arXiv | https://github.com/FreedomIntelligence/AceGPT-v2 | http://arxiv.org/abs/2412.03253v1 |
353 | Fine-Grained Behavior Simulation with Role-Playing Large Language Model on Social Media | Kun Li, Chenwei Dai, Wei Zhou, Songlin Hu | 2024-12-04 | arXiv | https://github.com/linkseed18612254945/FineRob | http://arxiv.org/abs/2412.03148v1 |
354 | AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning | Yiwu Zhong, Zhuoming Liu, Yin Li, Liwei Wang | 2024-12-04 | arXiv | https://github.com/LaVi-Lab/AIM | http://arxiv.org/abs/2412.03248v1 |
355 | AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | Kaixiong Gong, Kaituo Feng, Bohao Li, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang, Yutong Bai, Zhuoran Yang, Xiangyu Yue | 2024-12-03 | arXiv | https://av-odyssey.github.io/ | http://arxiv.org/abs/2412.02611v1 |
356 | CNNSum: Exploring Long-Context Summarization with Large Language Models in Chinese Novels | Lingxiao Wei, He Yan, Xiangju Lu, Junmin Zhu, Jun Wang, Wei Zhang | 2024-12-03 | arXiv | https://github.com/CxsGhost/CNNSum | http://arxiv.org/abs/2412.02819v4 |
357 | Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code | Timur Galimzyanov, Sergey Titov, Yaroslav Golubev, Egor Bogomolov | 2024-12-03 | arXiv | https://github.com/JetBrains-Research/PandasPlotBench | http://arxiv.org/abs/2412.02764v1 |
358 | Unleashing GHOST: An LLM-Powered Framework for Automated Hardware Trojan Design | Md Omar Faruque, Peter Jamieson, Ahmad Patooghy, Abdel-Hameed A. Badawy | 2024-12-03 | arXiv | https://github.com/HSTRG1/GHOSTbenchmarks | http://arxiv.org/abs/2412.02816v1 |
359 | DaDu-E: Rethinking the Role of Large Language Model in Robotic Computing Pipeline | Wenhao Sun, Sai Hou, Zixuan Wang, Bo Yu, Shaoshan Liu, Xu Yang, Shuai Liang, Yiming Gan, Yinhe Han | 2024-12-02 | arXiv | https://rlc-lab.github.io/dadu-e/ | http://arxiv.org/abs/2412.01663v1 |
360 | DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation | Jingyang Xiang, Sai Qian Zhang | 2024-12-01 | arXiv | https://github.com/JingyangXiang/DFRot | http://arxiv.org/abs/2412.00648v2 |
361 | Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification | Wenxuan Huang, Zijie Zhai, Yunhang Shen, Shaosheng Cao, Fei Zhao, Xiangfeng Xu, Zheyu Ye, Shaohui Lin | 2024-12-01 | arXiv | https://github.com/Osilly/dynamic_llava | http://arxiv.org/abs/2412.00876v3 |
362 | GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models | Kunsheng Tang, Wenbo Zhou, Jie Zhang, Aishan Liu, Gelei Deng, Shuai Li, Peigui Qi, Weiming Zhang, Tianwei Zhang, Nenghai Yu | 2024-12 | CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security | https://github.com/kstanghere/GenderCARE-ccs24 | https://dl.acm.org/doi/10.1145/3658644.3670284 |
363 | Mitigating Entity-Level Hallucination in Large Language Models | Weihang Su, Yichen Tang, Qingyao Ai, Changyue Wang, Zhijing Wu, Yiqun Liu | 2024-12 | SIGIR-AP 2024: Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region | https://github.com/oneal2000/EntityHallucination | https://dl.acm.org/doi/10.1145/3673791.3698403 |
364 | Optimization-based Prompt Injection Attack to LLM-as-a-Judge | Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong | 2024-12 | CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security | https://github.com/ShiJiawenwen/JudgeDeceiver | https://dl.acm.org/doi/10.1145/3658644.3690291 |
365 | PLeak: Prompt Leaking Attacks against Large Language Model Applications | Bo Hui, Haolin Yuan, Neil Zhenqiang Gong, Philippe Burlina, Yinzhi Cao | 2024-12 | CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security | https://github.com/BHui97/PLeak | https://dl.acm.org/doi/10.1145/3658644.3670370 |
366 | Node Importance Estimation Leveraging LLMs for Semantic Augmentation in Knowledge Graphs | Xinyu Lin, Tianyu Zhang, Chengbin Hou, Jinbao Wang, Jianye Xue, Hairong Lv | 2024-11-30 | arXiv | https://github.com/XinyuLin-FZ/LENIE | http://arxiv.org/abs/2412.00478v1 |
367 | AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models | Yutong Zhou, Masahiro Ryo | 2024-11-30 | arXiv | https://github.com/Yutong-Zhou-cv/AgriBench | http://arxiv.org/abs/2412.00465v2 |
368 | DroidCall: A Dataset for LLM-powered Android Intent Invocation | Weikai Xie, Li Zhang, Shihe Wang, Rongjie Yi, Mengwei Xu | 2024-11-30 | arXiv | https://github.com/UbiquitousLearning/DroidCall | http://arxiv.org/abs/2412.00402v1 |
369 | T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Shukang Yin, Chaoyou Fu, Sirui Zhao, Yunhang Shen, Chunjiang Ge, Yan Yang, Zuwei Long, Yuhan Dai, Tong Xu, Xing Sun, Ran He, Caifeng Shan, Enhong Chen | 2024-11-29 | arXiv | https://github.com/xjtupanda/T2Vid | http://arxiv.org/abs/2411.19951v2 |
370 | TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension | Zipeng Qiu, You Peng, Guangxin He, Binhang Yuan, Chen Wang | 2024-11-29 | arXiv | https://github.com/Relaxed-System-Lab/TQA-Bench | http://arxiv.org/abs/2411.19504v1 |
371 | Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings | Qiong Wu, Wenhao Lin, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji | 2024-11-29 | arXiv | https://github.com/DoubtedSteam/DyVTE | http://arxiv.org/abs/2411.19628v1 |
372 | Ensemble Watermarks for Large Language Models | Georg Niess, Roman Kern | 2024-11-29 | arXiv | https://github.com/CommodoreEU/master-generation | http://arxiv.org/abs/2411.19563v1 |
373 | Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | Tian Yu, Shaolei Zhang, Yang Feng | 2024-11-29 | arXiv | https://github.com/ictnlp/Auto-RAG | http://arxiv.org/abs/2411.19443v1 |
374 | Personalized Federated Fine-Tuning for LLMs via Data-Driven Heterogeneous Model Architectures | Yicheng Zhang, Zhen Qin, Zhaomin Wu, Shuiguang Deng | 2024-11-28 | arXiv | https://github.com/zyc140345/FedAMoLE | http://arxiv.org/abs/2411.19128v1 |
375 | Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models | Jingming Liu, Yumeng Li, Boyuan Xiao, Yichang Jian, Ziang Qin, Tianjia Shao, Yao-Xiang Ding, Kun Zhou | 2024-11-27 | arXiv | https://future-item.github.io/autoimagine-site | http://arxiv.org/abs/2411.18142v1 |
376 | TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability | Shimin Chen, Xiaohan Lan, Yitian Yuan, Zequn Jie, Lin Ma | 2024-11-27 | arXiv | https://github.com/TimeMarker-LLM/TimeMarker/ | http://arxiv.org/abs/2411.18211v1 |
377 | ChatRex: Taming Multimodal LLM for Joint Perception and Understanding | Qing Jiang, Gen Luo, Yuqin Yang, Yuda Xiong, Yihao Chen, Zhaoyang Zeng, Tianhe Ren, Lei Zhang | 2024-11-27 | arXiv | https://github.com/IDEA-Research/ChatRex | http://arxiv.org/abs/2411.18363v2 |
378 | Can LLMs be Good Graph Judger for Knowledge Graph Construction? | Haoyu Huang, Chong Chen, Conghui He, Yang Li, Jiawei Jiang, Wentao Zhang | 2024-11-26 | arXiv | https://github.com/hhy-huang/GraphJudger | http://arxiv.org/abs/2411.17388v1 |
379 | Leveraging Large Language Models and Topic Modeling for Toxicity Classification | Haniyeh Ehsani Oskouie, Christina Chance, Claire Huang, Margaret Capetz, Elizabeth Eyeson, Majid Sarrafzadeh | 2024-11-26 | arXiv | https://github.com/aheldis/Toxicity-Classification | http://arxiv.org/abs/2411.17876v1 |
380 | Star Attention: Efficient LLM Inference over Long Sequences | Shantanu Acharya, Fei Jia, Boris Ginsburg | 2024-11-26 | arXiv | https://github.com/NVIDIA/Star-Attention | http://arxiv.org/abs/2411.17116v1 |
381 | Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models | Ronghuan Wu, Wanchao Su, Jing Liao | 2024-11-25 | arXiv | https://chat2svg.github.io/ | http://arxiv.org/abs/2411.16602v1 |
382 | From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge | Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, Zhen Tan, Amrita Bhattacharjee, Yuxuan Jiang, Canyu Chen, Tianhao Wu, Kai Shu, Lu Cheng, Huan Liu | 2024-11-25 | arXiv | https://github.com/llm-as-a-judge/Awesome-LLM-as-a-judge | http://arxiv.org/abs/2411.16594v4 |
383 | Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision | Zhiheng Xi, Dingwen Yang, Jixuan Huang, Jiafu Tang, Guanyu Li, Yiwen Ding, Wei He, Boyang Hong, Shihan Do, Wenyu Zhan, Xiao Wang, Rui Zheng, Tao Ji, Xiaowei Shi, Yitao Zhai, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Zuxuan Wu, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Yu-Gang Jiang | 2024-11-25 | arXiv | https://mathcritique.github.io/ | http://arxiv.org/abs/2411.16579v1 |
384 | ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration | Haozhan Shen, Kangjia Zhao, Tiancheng Zhao, Ruochen Xu, Zilun Zhang, Mingwei Zhu, Jianwei Yin | 2024-11-25 | arXiv | https://github.com/om-ai-lab/ZoomEye | http://arxiv.org/abs/2411.16044v1 |
385 | CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity | Zhengmin Yu, Jiutian Zeng, Siyi Chen, Wenhan Xu, Dandan Xu, Xiangyu Liu, Zonghao Ying, Nan Wang, Yuan Zhang, Min Yang | 2024-11-25 | arXiv | https://github.com/CS-EVAL/CS-Eval | http://arxiv.org/abs/2411.16239v2 |
386 | BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment | Shaolei Zhang, Kehao Zhang, Qingkai Fang, Shoutao Guo, Yan Zhou, Xiaodong Liu, Yang Feng | 2024-11-25 | arXiv | https://github.com/ictnlp/BayLing | https://doi.org/10.48550/arXiv.2411.16300 |
387 | Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering | Federico Cocchi, Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara | 2024-11-25 | arXiv | https://github.com/aimagelab/ReflectiVA | http://arxiv.org/abs/2411.16863v1 |
388 | VidHal: Benchmarking Temporal Hallucinations in Vision LLMs | Wey Yeh Choong, Yangyang Guo, Mohan Kankanhalli | 2024-11-25 | arXiv | https://github.com/Lookuz/VidHal | http://arxiv.org/abs/2411.16771v1 |
389 | Multi-label Sequential Sentence Classification via Large Language Model | Mengfei Lan, Lecheng Zheng, Shufan Ming, Halil Kilicoglu | 2024-11-23 | EMNLP | https://github.com/ScienceNLP-Lab/LLM-SSC | https://aclanthology.org/2024.findings-emnlp.944 |
390 | ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain | Haochen Zhao, Xiangru Tang, Ziran Yang, Xiao Han, Xuanzhi Feng, Yueqing Fan, Senhao Cheng, Di Jin, Yilun Zhao, Arman Cohan, Mark Gerstein | 2024-11-23 | arXiv | https://github.com/HaochenZhao/SafeAgent4Chem | http://arxiv.org/abs/2411.16736v1 |
391 | Seed-Free Synthetic Data Generation Framework for Instruction-Tuning LLMs: A Case Study in Thai | Parinthapat Pengpun, Can Udomcharoenchaikit, Weerayut Buaphet, Peerat Limkonchotiwat | 2024-11-23 | arXiv | https://github.com/parinzee/seed-free-synthetic-instruct | http://arxiv.org/abs/2411.15484v1 |
392 | MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs | Chaoyou Fu, Yi-Fan Zhang, Shukang Yin, Bo Li, Xinyu Fang, Sirui Zhao, Haodong Duan, Xing Sun, Ziwei Liu, Liang Wang, Caifeng Shan, Ran He | 2024-11-22 | arXiv | https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Benchmarks | http://arxiv.org/abs/2411.15296v2 |
393 | SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model | Christopher Nguyen, William Nguyen, Atsushi Suzuki, Daisuke Oku, Hong An Phan, Sang Dinh, Zooey Nguyen, Anh Ha, Shruti Raghavan, Huy Vo, Thang Nguyen, Lan Nguyen, Yoshikuni Hirayama | 2024-11-21 | arXiv | https://github.com/aitomatic/semikong | http://arxiv.org/abs/2411.13802v2 |
394 | UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages | Bethel Melesse Tessema, Akhil Kedia, Tae-Sun Chung | 2024-11-21 | arXiv | https://github.com/bethelmelesse/unifiedcrawl | http://arxiv.org/abs/2411.14343v1 |
395 | DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization | Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu | 2024-11-21 | arXiv | https://github.com/hexuandeng/DRPruning | http://arxiv.org/abs/2411.14055v1 |
396 | Disentangling Memory and Reasoning Ability in Large Language Models | Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang | 2024-11-20 | arXiv | https://github.com/MingyuJ666/Disentangling-Memory-and-Reasoning | http://arxiv.org/abs/2411.13504v2 |
397 | DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving | Xianda Guo, Ruijun Zhang, Yiqun Duan, Yuhang He, Chenming Zhang, Shuai Liu, Long Chen | 2024-11-20 | arXiv | https://github.com/XiandaGuo/Drive-MLLM | http://arxiv.org/abs/2411.13112v2 |
398 | On the Consistency of Video Large Language Models in Temporal Comprehension | Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang, Angela Yao | 2024-11-20 | arXiv | https://github.com/minjoong507/Consistency-of-Video-LLM | http://arxiv.org/abs/2411.12951v1 |
399 | Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods | Jai Doshi, Asa Cooper Stickland | 2024-11-18 | arXiv | https://github.com/JaiDoshi/Knowledge-Erasure | http://arxiv.org/abs/2411.12103v2 |
400 | FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training | Anjia Cao, Xing Wei, Zhiheng Ma | 2024-11-18 | arXiv | https://github.com/MIV-XJTU/FLAME | http://arxiv.org/abs/2411.11927v2 |
401 | Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering | Zeping Yu, Sophia Ananiadou | 2024-11-17 | arXiv | https://github.com/zepingyu0512/llava-mechanism | http://arxiv.org/abs/2411.10950v1 |
402 | TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | Tingyu Qu, Mingxiao Li, Tinne Tuytelaars, Marie-Francine Moens | 2024-11-17 | arXiv | https://github.com/tingyu215/TS-LLaVA | http://arxiv.org/abs/2411.11066v1 |
403 | BianCang: A Traditional Chinese Medicine Large Language Model | Sibo Wei, Xueping Peng, Yi-fei Wang, Jiasheng Si, Weiyu Zhang, Wenpeng Lu, Xiaoming Wu, Yinglong Wang | 2024-11-17 | arXiv | https://github.com/QLU-NLP/BianCang | http://arxiv.org/abs/2411.11027v1 |
404 | Multilingual Large Language Models: A Systematic Survey | Shaolin Zhu, Supryadi, Shaoyang Xu, Haoran Sun, Leiyu Pan, Menglong Cui, Jiangcun Du, Renren Jin, António Branco, Deyi Xiong | 2024-11-17 | arXiv | https://github.com/tjunlp-lab/Awesome-Multilingual-LLMs-Papers | http://arxiv.org/abs/2411.11072v2 |
405 | Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model | Ting Liu, Liangtao Shi, Richang Hong, Yue Hu, Quanjun Yin, Linfeng Zhang | 2024-11-16 | arXiv | https://github.com/liuting20/MustDrop | http://arxiv.org/abs/2411.10803v1 |
406 | Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash | Parsa Hejabi, Elnaz Rahmati, Alireza S. Ziabari, Preni Golazizian, Jesse Thomason, Morteza Dehghani | 2024-11-15 | arXiv | https://github.com/ParsaHejabi/Simulation-Framework-for-Multi-Agent-Balderdash | http://arxiv.org/abs/2411.10422v1 |
407 | Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era | Thanh Tam Nguyen, Zhao Ren, Trinh Pham, Thanh Trung Huynh, Phi Le Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen | 2024-11-15 | arXiv | https://github.com/tamlhp/awesome-instruction-editing | http://arxiv.org/abs/2411.09955v2 |
408 | Orca: Enhancing Role-Playing Abilities of Large Language Models by Integrating Personality Traits | Yuxuan Huang | 2024-11-15 | arXiv | https://github.com/Aipura/Orca | http://arxiv.org/abs/2411.10006v1 |
409 | Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination | Haojie Zheng, Tianyang Xu, Hanchi Sun, Shu Pu, Ruoxi Chen, Lichao Sun | 2024-11-15 | arXiv | https://github.com/Terry-Xu-666/visual_inference_chain | http://arxiv.org/abs/2411.12591v1 |
410 | DROJ: A Prompt-Driven Attack against Large Language Models | Leyang Hu, Boran Wang | 2024-11-14 | arXiv | https://github.com/Leon-Leyang/LLM-Safeguard | http://arxiv.org/abs/2411.09125v1 |
411 | MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs | Mengyuan Zhang, Ruihui Wang, Bo Xia, Yuan Sun, Xiaobing Zhao | 2024-11-14 | arXiv | https://github.com/joenahm/MM-Eval | http://arxiv.org/abs/2411.09492v1 |
412 | LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Zhenshi Li, Dilxat Muhtar, Feng Gu, Xueliang Zhang, Pengfeng Xiao, Guangjun He, Xiaoxiang Zhu | 2024-11-14 | arXiv | https://github.com/NJU-LHRS/LHRS-Bot | https://doi.org/10.48550/arXiv.2411.09301 |
413 | CorrectBench: Automatic Testbench Generation with Functional Self-Correction using LLMs for HDL Design | Ruidi Qiu, Grace Li Zhang, Rolf Drechsler, Ulf Schlichtmann, Bing Li | 2024-11-13 | arXiv | https://github.com/AutoBench/CorrectBench | http://arxiv.org/abs/2411.08510v1 |
414 | DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models | Yongdong Wang, Runze Xiao, Jun Younes Louhi Kasahara, Ryosuke Yajima, Keiji Nagatani, Atsushi Yamashita, Hajime Asama | 2024-11-13 | arXiv | https://wyd0817.github.io/project-dart-llm/ | http://arxiv.org/abs/2411.09022v1 |
415 | Large Language Models Can Self-Improve in Long-context Reasoning | Siheng Li, Cheng Yang, Zesen Cheng, Lemao Liu, Mo Yu, Yujiu Yang, Wai Lam | 2024-11-12 | arXiv | https://github.com/SihengLi99/SEALONG | http://arxiv.org/abs/2411.08147v1 |
416 | Verbosity ≠ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models | Yusen Zhang, Sarkar Snigdha Sarathi Das, Rui Zhang | 2024-11-12 | arXiv | https://github.com/psunlpgroup/VerbosityLLM | http://arxiv.org/abs/2411.07858v2 |
417 | ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? | Canyu Chen, Jian Yu, Shan Chen, Che Liu, Zhongwei Wan, Danielle Bitterman, Fei Wang, Kai Shu | 2024-11-10 | arXiv | https://clinicalbench.github.io | http://arxiv.org/abs/2411.06469v1 |
418 | Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models | Xiaojun Wu, Junxi Liu, Huanyi Su, Zhouchi Lin, Yiyan Qi, Chengjin Xu, Jiajun Su, Jiajie Zhong, Fuwei Wang, Saizhuo Wang, Fengrui Hua, Jia Li, Jian Guo | 2024-11-09 | arXiv | https://github.com/IDEA-FinAI/Golden-Touchstone | http://arxiv.org/abs/2411.06272v1 |
419 | TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering | Yungeng Liu, Zan Chen, Yu Guang Wang, Yiqing Shen | 2024-11-09 | arXiv | https://github.com/tsynbio/Toursynbio-Search | http://arxiv.org/abs/2411.06024v1 |
420 | WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Shengda Fan, Xin Cong, Yuepeng Fu, Zhong Zhang, Shuyan Zhang, Yuanwei Liu, Yesai Wu, Yankai Lin, Zhiyuan Liu, Maosong Sun | 2024-11-08 | arXiv | https://github.com/OpenBMB/WorkflowLLM | http://arxiv.org/abs/2411.05451v1 |
421 | Game-theoretic LLM: Agent Workflow for Negotiation Games | Wenyue Hua, Ollie Liu, Lingyao Li, Alfonso Amayuelas, Julie Chen, Lucas Jiang, Mingyu Jin, Lizhou Fan, Fei Sun, William Wang, Xintong Wang, Yongfeng Zhang | 2024-11-08 | arXiv | https://github.com/Wenyueh/game_theory | http://arxiv.org/abs/2411.05990v2 |
422 | Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation | Dong Shu, Bingbing Duan, Kai Guo, Kaixiong Zhou, Jiliang Tang, Mengnan Du | 2024-11-08 | arXiv | https://github.com/Tizzzzy/LLM-GDM-alignment | http://arxiv.org/abs/2411.05316v1 |
423 | AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering | Yungeng Liu, Zan Chen, Yu Guang Wang, Yiqing Shen | 2024-11-07 | arXiv | https://github.com/tsynbio/AutoPE | http://arxiv.org/abs/2411.04440v1 |
424 | FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs? | Eric Wu, Kevin Wu, James Zou | 2024-11-07 | arXiv | https://github.com/kevinwu23/StanfordFineTuneBench | http://arxiv.org/abs/2411.05059v2 |
425 | Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation | Ayan Sengupta, Vaibhav Seth, Arinjay Pathak, Natraj Raman, Sriram Gopalakrishnan, Tanmoy Chakraborty | 2024-11-07 | arXiv | https://github.com/LCS2-IIITD/MonteCLoRA | http://arxiv.org/abs/2411.04358v2 |
426 | Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model | Young-Jun Lee, Dokyong Lee, Junyoung Youn, Kyeongjin Oh, Ho-Jin Choi | 2024-11-07 | arXiv | https://github.com/passing2961/Thanos | http://arxiv.org/abs/2411.04496v1 |
427 | Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities | Shengzhi Li, Kittipat Kampa, Rongyu Lin, Bohang Li, Shichao Pei | 2024-11-07 | arXiv | https://github.com/findalexli/Abstract2Appendix | http://arxiv.org/abs/2411.05232v1 |
428 | Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models | Zhijian Zhuo, Ya Wang, Yutao Zeng, Xiaoqing Li, Xun Zhou, Jinwen Ma | 2024-11-06 | arXiv | https://github.com/BryceZhuo/PolyCom | http://arxiv.org/abs/2411.03884v1 |
429 | QUILL: Quotation Generation Enhancement of Large Language Models | Jin Xiao, Bowei Zhang, Qianyu He, Jiaqing Liang, Feng Wei, Jinglei Chen, Zujie Liang, Deqing Yang, Yanghua Xiao | 2024-11-06 | arXiv | https://github.com/GraceXiaoo/QUILL | http://arxiv.org/abs/2411.03675v1 |
430 | SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents | Dawei Li, Zhen Tan, Peijia Qian, Yifan Li, Kumar Satvik Chaudhary, Lijie Hu, Jiayi Shen | 2024-11-05 | arXiv | https://github.com/David-Li0406/SMoA | http://arxiv.org/abs/2411.03284v1 |
431 | Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment | Jason Vega, Junsheng Huang, Gaokai Zhang, Hangoo Kang, Minjia Zhang, Gagandeep Singh | 2024-11-05 | arXiv | https://github.com/uiuc-focal-lab/stochastic-monkeys/ | http://arxiv.org/abs/2411.02785v2 |
432 | Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy | Razvan-Gabriel Dumitru, Paul-Ioan Clotan, Vikas Yadav, Darius Peteleaza, Mihai Surdeanu | 2024-11-05 | arXiv | https://github.com/RazvanDu/DynamicSlicing | http://arxiv.org/abs/2411.03513v1 |
433 | Leveraging Large Language Models in Code Question Answering: Baselines and Issues | Georgy Andryushchenko, Vladimir Ivanov, Vladimir Makharev, Elizaveta Tukhtina, Aidar Valeev | 2024-11-05 | arXiv | https://github.com/IU-AES-AI4Code/CodeQuestionAnswering | http://arxiv.org/abs/2411.03012v1 |
434 | FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models | Zhanwei Zhang, Shizhao Sun, Wenxiao Wang, Deng Cai, Jiang Bian | 2024-11-05 | arXiv | https://github.com/microsoft/CADGeneration/FlexCAD | http://arxiv.org/abs/2411.05823v1 |
435 | Culinary Class Wars: Evaluating LLMs using ASH in Cuisine Transfer Task | Hoonick Lee, Mogan Gim, Donghyeon Park, Donghee Choi, Jaewoo Kang | 2024-11-04 | arXiv | http://github.com/dmis-lab/CulinaryASH | http://arxiv.org/abs/2411.01996v1 |
436 | Eurekaverse: Environment Curriculum Generation via Large Language Models | William Liang, Sam Wang, Hung-Ju Wang, Osbert Bastani, Dinesh Jayaraman, Yecheng Jason Ma | 2024-11-04 | arXiv | https://eureka-research.github.io/eurekaverse | http://arxiv.org/abs/2411.01775v1 |
437 | SQL Injection Jailbreak: a structural disaster of large language models | Jiawei Zhao, Kejiang Chen, Weiming Zhang, Nenghai Yu | 2024-11-03 | arXiv | https://github.com/weiyezhimeng/SQL-Injection-Jailbreak | http://arxiv.org/abs/2411.01565v3 |
438 | TODO: Enhancing LLM Alignment with Ternary Preferences | Yuxiang Guo, Lu Yin, Bo Jiang, Jiaqi Zhang | 2024-11-02 | arXiv | https://github.com/XXares/TODO | http://arxiv.org/abs/2411.02442v1 |
439 | Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis | Shijia Liao, Yuxuan Wang, Tianyu Li, Yifan Cheng, Ruoyi Zhang, Rongzhi Zhou, Yijin Xing | 2024-11-02 | arXiv | https://github.com/fishaudio/fish-speech | http://arxiv.org/abs/2411.01156v2 |
440 | Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection | Han Yin, Yang Xiao, Jisheng Bai, Rohan Kumar Das | 2024-11-02 | arXiv | https://github.com/apple-yinhan/Noise-robust-SED | http://arxiv.org/abs/2411.01174v1 |
441 | Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM | Xiong Wang, Yangze Li, Chaoyou Fu, Yunhang Shen, Lei Xie, Ke Li, Xing Sun, Long Ma | 2024-11-01 | arXiv | https://freeze-omni.github.io/ | http://arxiv.org/abs/2411.00774v5 |
442 | LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Nam V. Nguyen, Thong T. Doan, Luong Tran, Van Nguyen, Quang Pham | 2024-11-01 | arXiv | https://fsoft-aic.github.io/fsoft-LibMoE.github.io | http://arxiv.org/abs/2411.00918v1 |
443 | Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling | Yiwen Ding, Zhiheng Xi, Wei He, Zhuoyuan Li, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang | 2024-11-01 | arXiv | https://github.com/Yiwen-Ding/Guided-Self-Improvement | http://arxiv.org/abs/2411.00750v1 |
444 | MoD: A Distribution-Based Approach for Merging Large Language Models | Quy-Anh Dang, Chris Ngo | 2024-11-01 | arXiv | https://github.com/knovel-eng/mod | http://arxiv.org/abs/2411.00406v1 |
445 | SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models | Jianyi Zhang, Da-Cheng Juan, Cyrus Rashtchian, Chun-Sung Ferng, Heinrich Jiang, Yiran Chen | 2024-11-01 | arXiv | https://jayzhang42.github.io/sled_page/ | http://arxiv.org/abs/2411.02433v2 |
446 | Beyond Utility: Evaluating LLM as Recommender | Chumeng Jiang, Jiayin Wang, Weizhi Ma, Charles L. A. Clarke, Shuai Wang, Chuhan Wu, Min Zhang | 2024-11-01 | arXiv | https://github.com/JiangDeccc/EvaLLMasRecommender | http://arxiv.org/abs/2411.00331v1 |
447 | EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Unified Compression and Adaptive Layer Voting | Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reddy Bommu, Yang Katie Zhao, Yingyan Celine Lin | 2024-11 | DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference | https://github.com/GATECH-EIC/Edge-LLM | https://dl.acm.org/doi/10.1145/3649329.3658473 |
448 | Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging | Tianshuo Cong, Delong Ran, Zesen Liu, Xinlei He, Jinyuan Liu, Yichen Gong, Qi Li, Anyu Wang, Xiaoyun Wang | 2024-11 | LAMPS '24: Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis | https://github.com/ThuCCSLab/MergeGuard | https://dl.acm.org/doi/10.1145/3689217.3690614 |
449 | Large Language Models for Anomaly Detection in Computational Workflows: From Supervised Fine-Tuning to In-Context Learning | Hongwei Jin, George Papadimitriou, Krishnan Raghavan, Pawel Zuk, Prasanna Balaprakash, Cong Wang, Anirban Mandal, Ewa Deelman | 2024-11 | SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis | https://github.com/PoSeiDon-Workflows/LLM_AD | https://dl.acm.org/doi/10.1109/SC41406.2024.00098 |
450 | LLaMo: Large Language Model-based Molecular Graph Assistant | Jinyoung Park, Minseong Bae, Dohwan Ko, Hyunwoo J. Kim | 2024-10-31 | arXiv | https://github.com/mlvlab/LLaMo | http://arxiv.org/abs/2411.00871v1 |
451 | What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective | Ming Li, Yanhong Li, Tianyi Zhou | 2024-10-31 | arXiv | https://github.com/MingLiiii/Layer_Gradient | http://arxiv.org/abs/2410.23743v1 |
452 | Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models | Haritz Puerto, Martin Gubri, Sangdoo Yun, Seong Joon Oh | 2024-10-31 | arXiv | https://github.com/parameterlab/mia-scaling | http://arxiv.org/abs/2411.00154v1 |
453 | BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments | Xinghao Wang, Pengyu Wang, Bo Wang, Dong Zhang, Yunhua Zhou, Xipeng Qiu | 2024-10-31 | arXiv | https://github.com/xinghaow99/BitStack | http://arxiv.org/abs/2410.23918v1 |
454 | LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction | Andre Niyongabo Rubungo, Kangming Li, Jason Hattrick-Simpers, Adji Bousso Dieng | 2024-10-31 | arXiv | https://github.com/vertaix/LLM4Mat-Bench | http://arxiv.org/abs/2411.00177v3 |
455 | End-to-End Ontology Learning with Large Language Models | Andy Lo, Albert Q. Jiang, Wenda Li, Mateja Jamnik | 2024-10-31 | arXiv | https://github.com/andylolu2/ollm | http://arxiv.org/abs/2410.23584v1 |
456 | DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios | Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xinyi Yang, Yulin Yuan, Lidia S. Chao | 2024-10-31 | arXiv | https://github.com/NLP2CT/DetectRL | http://arxiv.org/abs/2410.23746v1 |
457 | SciPIP: An LLM-based Scientific Paper Idea Proposer | Wenxiao Wang, Lihui Gu, Liye Zhang, Yunxiang Luo, Yi Dai, Chen Shen, Liang Xie, Binbin Lin, Xiaofei He, Jieping Ye | 2024-10-30 | arXiv | https://github.com/cheerss/SciPIP | http://arxiv.org/abs/2410.23166v1 |
458 | Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback | Qinqing Zheng, Mikael Henaff, Amy Zhang, Aditya Grover, Brandon Amos | 2024-10-30 | arXiv | https://github.com/facebookresearch/oni | http://arxiv.org/abs/2410.23022v2 |
459 | ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning | Millennium Bismay, Xiangjue Dong, James Caverlee | 2024-10-30 | arXiv | https://github.com/millenniumbismay/reasoningrec | http://arxiv.org/abs/2410.23180v1 |
460 | Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning | Keqin Bao, Ming Yan, Yang Zhang, Jizhi Zhang, Wenjie Wang, Fuli Feng, Xiangnan He | 2024-10-30 | arXiv | https://github.com/ym689/rec_icl | http://arxiv.org/abs/2410.23136v1 |
461 | Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation | Yang Zhang, Juntao You, Yimeng Bai, Jizhi Zhang, Keqin Bao, Wenjie Wang, Tat-Seng Chua | 2024-10-30 | arXiv | https://github.com/itsmeyjt/CFT | http://arxiv.org/abs/2410.22809v1 |
462 | On Memorization of Large Language Models in Logical Reasoning | Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar | 2024-10-30 | arXiv | https://memkklogic.github.io | http://arxiv.org/abs/2410.23123v1 |
463 | Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning | Dong Shu, Mengnan Du | 2024-10-30 | arXiv | https://github.com/Tizzzzy/Demonstration_Selection_Overview | http://arxiv.org/abs/2410.23099v1 |
464 | BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference | Junqi Zhao, Zhijin Fang, Shu Li, Shaohui Yang, Shichao He | 2024-10-30 | arXiv | https://github.com/JunqiZhao888/buzz-llm | http://arxiv.org/abs/2410.23079v1 |
465 | Distinguishing Ignorance from Error in LLM Hallucinations | Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov | 2024-10-29 | arXiv | https://github.com/technion-cs-nlp/hallucination-mitigation | http://arxiv.org/abs/2410.22071v1 |
466 | Leveraging LLMs for Hypothetical Deduction in Logical Inference: A Neuro-Symbolic Approach | Qingchuan Li, Jiatong Li, Tongxuan Liu, Yuting Zeng, Mingyue Cheng, Weizhe Huang, Qi Liu | 2024-10-29 | arXiv | https://github.com/wufeiwuwoshihua/nshy | http://arxiv.org/abs/2410.21779v1 |
467 | Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance | Dongmin Park, Sebin Kim, Taehong Moon, Minkyu Kim, Kangwook Lee, Jaewoong Cho | 2024-10-29 | arXiv | https://github.com/krafton-ai/Rare2Frequent | http://arxiv.org/abs/2410.22376v1 |
468 | Scaling LLM Inference with Optimized Sample Compute Allocation | Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li | 2024-10-29 | arXiv | https://github.com/LeiLiLab/OSCA | http://arxiv.org/abs/2410.22480v1 |
469 | Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks | Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese | 2024-10-28 | arXiv | https://github.com/pasquini-dario/project_mantis | http://arxiv.org/abs/2410.20911v2 |
470 | Instruction-Tuned LLMs Succeed in Document-Level MT Without Fine-Tuning -- But BLEU Turns a Blind Eye | Yirong Sun, Dawei Zhu, Yanjun Chen, Erjia Xiao, Xinghao Chen, Xiaoyu Shen | 2024-10-28 | arXiv | https://github.com/EIT-NLP/BLEUless_DocMT | http://arxiv.org/abs/2410.20941v2 |
471 | LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment | Ge Yang, Changyi He, Jinyang Guo, Jianyu Wu, Yifu Ding, Aishan Liu, Haotong Qin, Pengliang Ji, Xianglong Liu | 2024-10-28 | arXiv | https://github.com/AboveParadise/LLMCBench | http://arxiv.org/abs/2410.21352v2 |
472 | NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates | Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu | 2024-10-28 | arXiv | https://github.com/hexuandeng/NewTerm | http://arxiv.org/abs/2410.20814v1 |
473 | ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen | 2024-10-28 | arXiv | https://github.com/bytedance/ShadowKV | http://arxiv.org/abs/2410.21465v1 |
474 | Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models | Yilun Jin, Zheng Li, Chenwei Zhang, Tianyu Cao, Yifan Gao, Pratik Jayarao, Mao Li, Xin Liu, Ritesh Sarkhel, Xianfeng Tang, Haodong Wang, Zhengyang Wang, Wenju Xu, Jingfeng Yang, Qingyu Yin, Xian Li, Priyanka Nigam, Yi Xu, Kai Chen, Qiang Yang, Meng Jiang, Bing Yin | 2024-10-28 | arXiv | https://github.com/KL4805/ShoppingMMLU | http://arxiv.org/abs/2410.20745v2 |
475 | Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation | Mufei Li, Siqi Miao, Pan Li | 2024-10-28 | arXiv | https://github.com/Graph-COM/SubgraphRAG | http://arxiv.org/abs/2410.20724v2 |
476 | SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization | Wanhua Li, Zibin Meng, Jiawei Zhou, Donglai Wei, Chuang Gan, Hanspeter Pfister | 2024-10-28 | arXiv | https://mengzibin.github.io/SocialGPT.github.io/ | http://arxiv.org/abs/2410.21411v1 |
477 | Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data | Xinhong Xie, Tao Li, Quanyan Zhu | 2024-10-27 | arXiv | https://github.com/XXXinhong/Detoxification_LLM | http://arxiv.org/abs/2410.20298v1 |
478 | Enhancing Inflation Nowcasting with LLM: Sentiment Analysis on News | Marc-Antoine Allard, Paul Teiletche, Adam Zinebi | 2024-10-26 | arXiv | https://github.com/paultltc/InflaBERT | http://arxiv.org/abs/2410.20198v1 |
479 | LLMs Can Evolve Continually on Modality for X-Modal Reasoning | Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, Long Chen | 2024-10-26 | arXiv | https://github.com/JiazuoYu/PathWeave | http://arxiv.org/abs/2410.20178v2 |
480 | APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs | Huaxiaoyue Wang, Nathaniel Chin, Gonzalo Gonzalez-Pumariega, Xiangwan Sun, Neha Sunkara, Maximus Adrian Pace, Jeannette Bohg, Sanjiban Choudhury | 2024-10-25 | arXiv | https://portal-cornell.github.io/apricot/ | http://arxiv.org/abs/2410.19656v1 |
481 | Language Agents Meet Causality -- Bridging LLMs and Causal World Models | John Gkountouras, Matthias Lindemann, Phillip Lippe, Efstratios Gavves, Ivan Titov | 2024-10-25 | arXiv | https://j0hngou.github.io/LLMCWM/ | http://arxiv.org/abs/2410.19923v1 |
482 | Delving into the Reversal Curse: How Far Can Large Language Models Generalize? | Zhengkai Lin, Zhihang Fu, Kai Liu, Liang Xie, Binbin Lin, Wenxiao Wang, Deng Cai, Yue Wu, Jieping Ye | 2024-10-24 | arXiv | https://github.com/alibaba/thinking_bias | http://arxiv.org/abs/2410.18808v2 |
483 | Distill Visual Chart Reasoning Ability from LLMs to MLLMs | Wei He, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Qi Zhang, Xuanjing Huang | 2024-10-24 | arXiv | https://github.com/hewei2001/ReachQA | http://arxiv.org/abs/2410.18798v1 |
484 | GCoder: Improving Large Language Model for Generalized Graph Problem Solving | Qifan Zhang, Xiaobin Hong, Jianheng Tang, Nuo Chen, Yuhan Li, Wenzhong Li, Jing Tang, Jia Li | 2024-10-24 | arXiv | https://github.com/Bklight999/WWW25-GCoder/tree/master | http://arxiv.org/abs/2410.19084v1 |
485 | Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Zhangyang Wang | 2024-10-24 | arXiv | https://github.com/VITA-Group/READ-ME | http://arxiv.org/abs/2410.19123v1 |
486 | AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models | Kim Sung-Bin, Oh Hyun-Bin, JungMok Lee, Arda Senocak, Joon Son Chung, Tae-Hyun Oh | 2024-10-23 | arXiv | https://github.com/AVHBench/AVHBench | http://arxiv.org/abs/2410.18325v1 |
487 | CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation | Qinsi Wang, Saeed Vahidian, Hancheng Ye, Jianyang Gu, Jianyi Zhang, Yiran Chen | 2024-10-23 | arXiv | https://wangqinsi1.github.io/coreinfer_page/ | http://arxiv.org/abs/2410.18311v1 |
488 | Cross-model Control: Improving Multiple Large Language Models in One-time Training | Jiayi Wu, Hao Sun, Hengyi Cai, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiang Li, Ming Gao | 2024-10-23 | arXiv | https://github.com/wujwyi/CMC | http://arxiv.org/abs/2410.17599v1 |
489 | ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage | Taewhoo Lee, Chanwoong Yoon, Kyochul Jang, Donghyeon Lee, Minju Song, Hyunjae Kim, Jaewoo Kang | 2024-10-22 | arXiv | https://github.com/dmis-lab/ETHIC | http://arxiv.org/abs/2410.16848v1 |
490 | Large Language Models Empowered Personalized Web Agents | Hongru Cai, Yongqi Li, Wenjie Wang, Fengbin Zhu, Xiaoyu Shen, Wenjie Li, Tat-Seng Chua | 2024-10-22 | arXiv | https://hongrucai.github.io/PersonalWAB/ | http://arxiv.org/abs/2410.17236v1 |
491 | Improving Causal Reasoning in Large Language Models: A Survey | Longxuan Yu, Delin Chen, Siheng Xiong, Qingyang Wu, Qingzhen Liu, Dawei Li, Zhikai Chen, Xiaoze Liu, Liangming Pan | 2024-10-22 | arXiv | https://github.com/chendl02/Awesome-LLM-causal-reasoning | http://arxiv.org/abs/2410.16676v3 |
492 | VoiceBench: Benchmarking LLM-Based Voice Assistants | Yiming Chen, Xianghu Yue, Chen Zhang, Xiaoxue Gao, Robby T. Tan, Haizhou Li | 2024-10-22 | arXiv | https://github.com/MatthewCYM/VoiceBench | http://arxiv.org/abs/2410.17196v3 |
493 | DEAN: Deactivating the Coupled Neurons to Mitigate Fairness-Privacy Conflicts in Large Language Models | Chen Qian, Dongrui Liu, Jie Zhang, Yong Liu, Jing Shao | 2024-10-22 | arXiv | https://github.com/ChnQ/DEAN | http://arxiv.org/abs/2410.16672v1 |
494 | AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration | Bradley McDanel | 2024-10-22 | arXiv | https://github.com/BradMcDanel/AMUSD/ | http://arxiv.org/abs/2410.17375v1 |
495 | Automated Spinal MRI Labelling from Reports Using a Large Language Model | Robin Y. Park, Rhydian Windsor, Amir Jamaludin, Andrew Zisserman | 2024-10-22 | MICCAI | https://github.com/robinyjpark/AutoLabelClassifier | https://doi.org/10.1007/978-3-031-72086-4_10 |
496 | CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing | Chen Yang, Chenyang Zhao, Quanquan Gu, Dongruo Zhou | 2024-10-22 | arXiv | https://github.com/uclaml/COPS | http://arxiv.org/abs/2410.16670v1 |
497 | LLaVA-KD: A Framework of Distilling Multimodal Large Language Models | Yuxuan Cai, Jiangning Zhang, Haoyang He, Xinwei He, Ao Tong, Zhenye Gan, Chengjie Wang, Xiang Bai | 2024-10-21 | arXiv | https://github.com/Fantasyele/LLaVA-KD | http://arxiv.org/abs/2410.16236v2 |
498 | Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs | Xin Ma, Yang Liu, Jingjing Liu, Xiaoxu Ma | 2024-10-21 | arXiv | https://github.com/soacker/Mesa-Extrapolation | http://arxiv.org/abs/2410.15859v3 |
499 | MagicPIG: LSH Sampling for Efficient LLM Generation | Zhuoming Chen, Ranajoy Sadhukhan, Zihao Ye, Yang Zhou, Jianyu Zhang, Niklas Nolte, Yuandong Tian, Matthijs Douze, Leon Bottou, Zhihao Jia, Beidi Chen | 2024-10-21 | arXiv | https://github.com/Infini-AI-Lab/MagicPIG | http://arxiv.org/abs/2410.16179v4 |
500 | RAC: Efficient LLM Factuality Correction with Retrieval Augmentation | Changmao Li, Jeffrey Flanigan | 2024-10-21 | arXiv | https://github.com/jlab-nlp/Retrieval-Augmented-Correction | http://arxiv.org/abs/2410.15667v1 |
501 | Developing Retrieval Augmented Generation (RAG) based LLM Systems from PDFs: An Experience Report | Ayman Asad Khan, Md Toufique Hasan, Kai Kristian Kemell, Jussi Rasku, Pekka Abrahamsson | 2024-10-21 | arXiv | https://github.com/GPT-Laboratory/RAG-LLM-Development-Guidebook-from-PDFs | http://arxiv.org/abs/2410.15944v1 |
502 | CausalGraph2LLM: Evaluating LLMs for Causal Queries | Ivaxi Sheth, Bahare Fatemi, Mario Fritz | 2024-10-21 | arXiv | https://github.com/ivaxi0s/CausalGraph2LLM | http://arxiv.org/abs/2410.15939v1 |
503 | Boosting Jailbreak Transferability for Large Language Models | Hanqing Liu, Lifeng Zhou, Huanqian Yan | 2024-10-21 | arXiv | https://github.com/HqingLiu/SI-GCG | http://arxiv.org/abs/2410.15645v2 |
504 | A Comprehensive Evaluation of Cognitive Biases in LLMs | Simon Malberg, Roman Poletukhin, Carolin M. Schuster, Georg Groh | 2024-10-20 | arXiv | https://github.com/simonmalberg/cognitive-biases-in-llms | http://arxiv.org/abs/2410.15413v1 |
505 | Are LLMs Good Zero-Shot Fallacy Classifiers? | Fengjun Pan, Xiaobao Wu, Zongrui Li, Anh Tuan Luu | 2024-10-19 | arXiv | https://github.com/panFJCharlotte98/Fallacy_Detection | http://arxiv.org/abs/2410.15050v1 |
506 | Evaluating Deep Unlearning in Large Language Models | Ruihan Wu, Chhavi Yadav, Russ Salakhutdinov, Kamalika Chaudhuri | 2024-10-19 | arXiv | https://github.com/wrh14/deep_unlearning | http://arxiv.org/abs/2410.15153v3 |
507 | Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction | Yinhan He, Zaiyi Zheng, Patrick Soga, Yaozhen Zhu, yushun Dong, Jundong Li | 2024-10-19 | EMNLP 2024 (Findings) | https://github.com/YinhanHe123/new_LLM4GNNExplanation | http://arxiv.org/abs/2410.15165v1 |
508 | GlitchMiner: Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization | Zihui Wu, Haichang Gao, Ping Wang, Shudong Zhang, Zhaoxiang Liu, Shiguo Lian | 2024-10-19 | arXiv | https://github.com/wooozihui/GlitchMiner | http://arxiv.org/abs/2410.15052v4 |
509 | Imprompter: Tricking LLM Agents into Improper Tool Use | Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K. Gupta, Taylor Berg-Kirkpatrick, Earlence Fernandes | 2024-10-19 | arXiv | https://github.com/Reapor-Yurnero/imprompter | http://arxiv.org/abs/2410.14923v2 |
510 | MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification | Yin Li, Liangwei Wang, Shiyuan Piao, Boo-Ho Yang, Ziyue Li, Wei Zeng, Fugee Tsung | 2024-10-19 | arXiv | https://github.com/MCCodeAI/MCCoder | http://arxiv.org/abs/2410.15154v1 |
511 | REEF: Representation Encoding Fingerprints for Large Language Models | Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, Jing Shao | 2024-10-18 | arXiv | https://github.com/tmylla/REEF | http://arxiv.org/abs/2410.14273v1 |
512 | Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation | Shuo Tang, Xianghe Pang, Zexi Liu, Bohan Tang, Rui Ye, Xiaowen Dong, Yanfeng Wang, Siheng Chen | 2024-10-18 | arXiv | https://github.com/ShuoTang123/MATRIX-Gen | http://arxiv.org/abs/2410.14251v1 |
513 | SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent | Jiarui Ji, Yang Li, Hongtao Liu, Zhicheng Du, Zhewei Wei, Weiran Shen, Qi Qi, Yankai Lin | 2024-10-18 | arXiv | https://github.com/jijiarui-cather/SRAPAgent_Framework | http://arxiv.org/abs/2410.14152v1 |
514 | Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models | Wei Jie Yeo, Ranjan Satapathy, Erik Cambria | 2024-10-18 | arXiv | https://github.com/wj210/Causal-Faithfulness | https://doi.org/10.48550/arXiv.2410.14155 |
515 | Enabling Scalable Evaluation of Bias Patterns in Medical LLMs | Hamed Fayyaz, Raphael Poulain, Rahmatollah Beheshti | 2024-10-18 | arXiv | https://github.com/healthylaife/autofair | http://arxiv.org/abs/2410.14763v1 |
516 | CoMAL: Collaborative Multi-Agent Large Language Models for Mixed-Autonomy Traffic | Huaiyuan Yao, Longchao Da, Vishnu Nandam, Justin Turnau, Zhiwei Liu, Linsey Pang, Hua Wei | 2024-10-18 | arXiv | https://github.com/Hyan-Yao/CoMAL | http://arxiv.org/abs/2410.14368v1 |
517 | Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models | Yu Yuan, Lili Zhao, Kai Zhang, Guangting Zheng, Qi Liu | 2024-10-17 | EMNLP | https://github.com/yyhappier/ShortcutSuite | https://aclanthology.org/2024.emnlp-main.679 |
518 | Data Defenses Against Large Language Models | William Agnew, Harry H. Jiang, Cella Sum, Maarten Sap, Sauvik Das | 2024-10-17 | arXiv | https://github.com/wagnew3/LLMDataDefenses | http://arxiv.org/abs/2410.13138v1 |
519 | FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs | Forrest Sheng Bao, Miaoran Li, Renyi Qu, Ge Luo, Erana Wan, Yujia Tang, Weisi Fan, Manveer Singh Tamber, Suleman Kazi, Vivek Sourabh, Mike Qi, Ruixuan Tu, Chenyu Xu, Matthew Gonzales, Ofer Mendelevitch, Amin Ahmad | 2024-10-17 | arXiv | https://github.com/vectara/FaithBench | http://arxiv.org/abs/2410.13210v1 |
520 | LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models | David Hoffmann, Kailash Budhathoki, Matthaeus Kleindessner | 2024-10-17 | arXiv | https://github.com/amazon-science/llm-rank-pruning | http://arxiv.org/abs/2410.13299v2 |
521 | Retrieval-Augmented Personalization for Multimodal Large Language Models | Haoran Hao, Jiaming Han, Changsheng Li, Yu-Feng Li, Xiangyu Yue | 2024-10-17 | arXiv | https://github.com/Hoar012/RAP-MLLM | http://arxiv.org/abs/2410.13360v2 |
522 | SLM-Mod: Small Language Models Surpass LLMs at Content Moderation | Xianyang Zhan, Agam Goyal, Yilun Chen, Eshwar Chandrasekharan, Koustuv Saha | 2024-10-17 | arXiv | https://github.com/AGoyal0512/SLM-Mod | http://arxiv.org/abs/2410.13155v1 |
523 | aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Completion | Siyuan Jiang, Jia Li, He Zong, Huanyu Liu, Hao Zhu, Shukai Hu, Erlu Li, Jiazheng Ding, Yu Han, Wei Ning, Gen Wang, Yihong Dong, Kechi Zhang, Ge Li | 2024-10-17 | arXiv | https://github.com/aixcoder-plugin/aiXcoder-7B | http://arxiv.org/abs/2410.13187v2 |
524 | POROver: Improving Safety and Reducing Overrefusal in Large Language Models with Overgeneration and Preference Optimization | Batuhan K. Karaman, Ishmam Zabir, Alon Benhaim, Vishrav Chaudhary, Mert R. Sabuncu, Xia Song | 2024-10-16 | arXiv | https://github.com/batuhankmkaraman/POROver | http://arxiv.org/abs/2410.12999v1 |
525 | Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors | Weixuan Wang, Jingyuan Yang, Wei Peng | 2024-10-16 | arXiv | https://github.com/weixuan-wang123/SADI | http://arxiv.org/abs/2410.12299v1 |
526 | Self-Pluralising Culture Alignment for Large Language Models | Shaoyang Xu, Yongqi Leng, Linhao Yu, Deyi Xiong | 2024-10-16 | arXiv | https://github.com/shaoyangxu/CultureSPA | http://arxiv.org/abs/2410.12971v1 |
527 | Qtok: A Comprehensive Framework for Evaluating Multilingual Tokenizer Quality in Large Language Models | Iaroslav Chelombitko, Egor Safronov, Aleksey Komissarov | 2024-10-16 | arXiv | https://github.com/nup-csai/Qtok/ | http://arxiv.org/abs/2410.12989v1 |
528 | ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs | Jingming Zhuo, Songyang Zhang, Xinyu Fang, Haodong Duan, Dahua Lin, Kai Chen | 2024-10-16 | arXiv | https://github.com/open-compass/ProSA | http://arxiv.org/abs/2410.12405v1 |
529 | Hypothesis Testing the Circuit Hypothesis in LLMs | Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei | 2024-10-16 | arXiv | https://github.com/blei-lab/circuitry | http://arxiv.org/abs/2410.13032v1 |
530 | DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs | Yingsong Luo, Ling Chen | 2024-10-16 | arXiv | https://github.com/LuoYingSong/DAQ | http://arxiv.org/abs/2410.12187v2 |
531 | Codellm-Devkit: A Framework for Contextualizing Code LLMs with Program Analysis Insights | Rahul Krishna, Rangeet Pan, Raju Pavuluri, Srikanth Tamilselvam, Maja Vukovic, Saurabh Sinha | 2024-10-16 | arXiv | https://github.com/IBM/codellm-devkit | http://arxiv.org/abs/2410.13007v1 |
532 | Neuron-based Personality Trait Induction in Large Language Models | Jia Deng, Tianyi Tang, Yanbin Yin, Wenhao Yang, Wayne Xin Zhao, Ji-Rong Wen | 2024-10-16 | arXiv | https://github.com/RUCAIBox/NPTI | https://doi.org/10.48550/arXiv.2410.12327 |
533 | HerO at AVeriTeC: The Herd of Open Large Language Models for Verifying Real-World Claims | Yejun Yoon, Jaeyoon Jung, Seunghyun Yoon, Kunwoo Park | 2024-10-16 | arXiv | https://github.com/ssu-humane/HerO | https://doi.org/10.48550/arXiv.2410.12377 |
534 | Exploring Model Kinship for Merging Large Language Models | Yedi Hu, Yunzhi Yao, Ningyu Zhang, Shumin Deng, Huajun Chen | 2024-10-16 | arXiv | https://github.com/zjunlp/ModelKinship | https://doi.org/10.48550/arXiv.2410.12613 |
535 | Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention | Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch | 2024-10-16 | arXiv | https://github.com/weixuan-wang123/INCLINE | https://doi.org/10.48550/arXiv.2410.12462 |
536 | Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models | Zhongye Liu, Hongbin Liu, Yuepeng Hu, Zedian Shao, Neil Zhenqiang Gong | 2024-10-15 | arXiv | https://github.com/lycheeefish/VHExpansion | https://doi.org/10.48550/arXiv.2410.11242 |
537 | Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models | Kai Yao, Penglei Gao, Lichun Li, Yuan Zhao, Xiaofeng Wang, Wei Wang, Jianke Zhu | 2024-10-15 | EMNLP | https://github.com/Kaiseem/IST | https://aclanthology.org/2024.findings-emnlp.109 |
538 | Subspace Optimization for Large Language Models with Convergence Guarantees | Yutong He, Pengrui Li, Yipeng Hu, Chuyan Chen, Kun Yuan | 2024-10-15 | arXiv | https://github.com/pkumelon/Golore | https://doi.org/10.48550/arXiv.2410.11289 |
539 | Zero-shot Model-based Reinforcement Learning using Large Language Models | Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat, Oussama Zekri, Albert Thomas, Giuseppe Paolo, Maurizio Filippone, Ievgen Redko, Balázs Kégl | 2024-10-15 | arXiv | https://github.com/abenechehab/dicl | https://doi.org/10.48550/arXiv.2410.11711 |
540 | LLM2Swarm: Robot Swarms that Responsively Reason, Plan, and Collaborate through LLMs | Volker Strobel, Marco Dorigo, Mario Fritz | 2024-10-15 | arXiv | https://github.com/Pold87/LLM2Swarm/ | http://arxiv.org/abs/2410.11387v3 |
541 | SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing | Zhiyuan Zhang, DongDong Chen, Jing Liao | 2024-10-15 | arXiv | https://bestzzhang.github.io/SGEdit | http://arxiv.org/abs/2410.11815v1 |
542 | Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues | Qibing Ren, Hao Li, Dongrui Liu, Zhanxu Xie, Xiaoya Lu, Yu Qiao, Lei Sha, Junchi Yan, Lizhuang Ma, Jing Shao | 2024-10-14 | arXiv | https://github.com/renqibing/ActorAttack | http://arxiv.org/abs/2410.10700v1 |
543 | Locking Down the Finetuned LLMs Safety | Minjun Zhu, Linyi Yang, Yifan Wei, Ningyu Zhang, Yue Zhang | 2024-10-14 | arXiv | https://github.com/zhu-minjun/SafetyLock | http://arxiv.org/abs/2410.10343v1 |
544 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Guangxuan Xiao, Jiaming Tang, Jingwei Zuo, Junxian Guo, Shang Yang, Haotian Tang, Yao Fu, Song Han | 2024-10-14 | arXiv | https://github.com/mit-han-lab/duo-attention | http://arxiv.org/abs/2410.10819v1 |
545 | Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free | Ziyue Li, Tianyi Zhou | 2024-10-14 | arXiv | https://github.com/tianyi-lab/MoE-Embedding | http://arxiv.org/abs/2410.10814v2 |
546 | One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Fangru Lin, Shaoguang Mao, Emanuele La Malfa, Valentin Hofmann, Adrian de Wynter, Jing Yao, Si-Qing Chen, Michael J. Wooldridge, Furu Wei | 2024-10-14 | arXiv | https://github.com/fangru-lin/redial_dialect_robustness_fairness | https://doi.org/10.48550/arXiv.2410.11005 |
547 | Large Language Model Evaluation via Matrix Nuclear-Norm | Yahan Li, Tingyu Xia, Yi Chang, Yuan Wu | 2024-10-14 | arXiv | https://github.com/MLGroupJLU/MatrixNuclearNorm | https://doi.org/10.48550/arXiv.2410.10672 |
548 | AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models | Haiquan Lu, Yefan Zhou, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, Yaoqing Yang | 2024-10-14 | arXiv | https://github.com/haiquanlu/AlphaPruning | https://doi.org/10.48550/arXiv.2410.10912 |
549 | MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media | Wei Zhai, Nan Bai, Qing Zhao, Jianqiang Li, Fan Wang, Hongzhi Qi, Meng Jiang, Xiaoqin Wang, Bing Xiang Yang, Guanghui Fu | 2024-10-14 | arXiv | https://github.com/zwzzzQAQ/MentalGLM | https://doi.org/10.48550/arXiv.2410.10323 |
550 | LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Han Qiu, Jiaxing Huang, Peng Gao, Qin Qi, Xiaoqin Zhang, Ling Shao, Shijian Lu | 2024-10-13 | arXiv | https://github.com/hanqiu-hq/LongHalQA | https://doi.org/10.48550/arXiv.2410.09962 |
551 | RMB: Comprehensively Benchmarking Reward Models in LLM Alignment | Enyu Zhou, Guodong Zheng, Binghai Wang, Zhiheng Xi, Shihan Dou, Rong Bao, Wei Shen, Limao Xiong, Jessica Fan, Yurong Mou, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang | 2024-10-13 | arXiv | https://github.com/Zhou-Zoey/RMB-Reward-Model-Benchmark | http://arxiv.org/abs/2410.09893v1 |
552 | FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback | Youquan Li, Miao Zheng, Fan Yang, Guosheng Dong, Bin Cui, Weipeng Chen, Zenan Zhou, Wentao Zhang | 2024-10-12 | arXiv | https://github.com/PKU-Baichuan-MLSystemLab/FB-Bench | http://arxiv.org/abs/2410.09412v1 |
553 | Skipping Computations in Multimodal LLMs | Mustafa Shukor, Matthieu Cord | 2024-10-12 | arXiv | https://github.com/mshukor/ima-lmms | http://arxiv.org/abs/2410.09454v1 |
554 | LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models | Zihan Zhou, Chong Li, Xinyi Chen, Shuo Wang, Yu Chao, Zhili Li, Haoyu Wang, Rongqiao An, Qi Shi, Zhixing Tan, Xu Han, Xiaodong Shi, Zhiyuan Liu, Maosong Sun | 2024-10-12 | arXiv | https://github.com/thunlp/LLMxMapReduce | http://arxiv.org/abs/2410.09342v1 |
555 | FlatQuant: Flatness Matters for LLM Quantization | Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, Yuening Li, Jiaxin Hu, Xianzhi Yu, Lu Hou, Chun Yuan, Xin Jiang, Wulong Liu, Jun Yao | 2024-10-12 | arXiv | https://github.com/ruikangliu/FlatQuant | http://arxiv.org/abs/2410.09426v1 |
556 | ELICIT: LLM Augmentation via External In-Context Capability | Futing Wang, Jianhao Yan, Yue Zhang, Tao Lin | 2024-10-12 | arXiv | https://github.com/LINs-lab/ELICIT | http://arxiv.org/abs/2410.09343v1 |
557 | ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models | Nandan Kumar Jha, Brandon Reagen | 2024-10-12 | arXiv | https://github.com/Nandan91/relu-revival-normfree | https://doi.org/10.48550/arXiv.2410.09637 |
558 | OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models | Jun Wang, Meng Fang, Ziyu Wan, Muning Wen, Jiachen Zhu, Anjie Liu, Ziqin Gong, Yan Song, Lei Chen, Lionel M. Ni, Linyi Yang, Ying Wen, Weinan Zhang | 2024-10-12 | arXiv | https://openreasoner.github.io | https://doi.org/10.48550/arXiv.2410.09671 |
559 | MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection | Xi Jiang, Jian Li, Hanqiu Deng, Yong Liu, Bin-Bin Gao, Yifeng Zhou, Jialin Li, Chengjie Wang, Feng Zheng | 2024-10-12 | arXiv | https://github.com/jam-cc/MMAD | https://doi.org/10.48550/arXiv.2410.09453 |
560 | Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking | Wei Zhang, Pengfei Li, Junli Wang, Bingchuan Sun, Qihao Jin, Guangjun Bao, Shibo Rui, Yang Yu, Wenchao Ding, Peng Li, Yilun Chen | 2024-10-11 | arXiv | https://github.com/ChipsICU/Dual-AEB | https://doi.org/10.48550/arXiv.2410.08616 |
561 | AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation | Zijun Wang, Haoqin Tu, Jieru Mei, Bingchen Zhao, Yisen Wang, Cihang Xie | 2024-10-11 | arXiv | https://github.com/UCSC-VLAA/AttnGCG-attack | http://arxiv.org/abs/2410.09040v1 |
562 | QEFT: Quantization for Efficient Fine-Tuning of LLMs | Changhun Lee, Jun-gyu Jin, Younghyun Cho, Eunhyeok Park | 2024-10-11 | arXiv | https://github.com/xvyaward/qeft | http://arxiv.org/abs/2410.08661v1 |
563 | Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System | Weize Chen, Jiarui Yuan, Chen Qian, Cheng Yang, Zhiyuan Liu, Maosong Sun | 2024-10-10 | arXiv | https://chenweize1998.github.io/optima-project-page | http://arxiv.org/abs/2410.08115v1 |
564 | VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models | Lisa Dunlap, Krishna Mandal, Trevor Darrell, Jacob Steinhardt, Joseph E Gonzalez | 2024-10-10 | arXiv | https://github.com/lisadunlap/VibeCheck | http://arxiv.org/abs/2410.12851v5 |
565 | Towards Next-Generation LLM-based Recommender Systems: A Survey and Beyond | Qi Wang, Jindong Li, Shiqi Wang, Qianli Xing, Runliang Niu, He Kong, Rui Li, Guodong Long, Yi Chang, Chengqi Zhang | 2024-10-10 | arXiv | https://github.com/jindongli-Ai/Next-Generation-LLM-based-Recommender-Systems-Survey | http://arxiv.org/abs/2410.19744v1 |
566 | StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs | Yuanqing Yu, Zhefan Wang, Weizhi Ma, Zhicheng Guo, Jingtao Zhan, Shuai Wang, Chuhan Wu, Zhiqiang Guo, Min Zhang | 2024-10-10 | arXiv | https://github.com/yuyq18/StepTool | http://arxiv.org/abs/2410.07745v2 |
567 | Reward-Augmented Data Enhances Direct Preference Alignment of LLMs | Shenao Zhang, Zhihan Liu, Zhaoran Wang | 2024-10-10 | arXiv | https://github.com/shenao-zhang/reward-augmented-preference | http://arxiv.org/abs/2410.08067v2 |
568 | Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models | Zhipeng Chen, Liang Song, Kun Zhou, Wayne Xin Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen | 2024-10-10 | arXiv | https://github.com/RUCAIBox/MAET | https://doi.org/10.48550/arXiv.2410.07825 |
569 | Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models | Sitao Cheng, Liangming Pan, Xunjian Yin, Xinyi Wang, William Yang Wang | 2024-10-10 | arXiv | https://github.com/sitaocheng/Knowledge_Interplay | https://doi.org/10.48550/arXiv.2410.08414 |
570 | Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Wenting Tan, Dongxiao Chen, Jieting Xue, Zihao Wang, Taijie Chen | 2024-10-10 | arXiv | https://github.com/SallyTan13/Teaching-Inspired-Prompting | https://doi.org/10.48550/arXiv.2410.08068 |
571 | Privately Learning from Graphs with Applications in Fine-tuning Large Language Models | Haoteng Yin, Rongzhe Wei, Eli Chien, Pan Li | 2024-10-10 | arXiv | https://github.com/Graph-COM/PvGaLM | https://doi.org/10.48550/arXiv.2410.08299 |
572 | GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps | Muhammad Umair Nasir, Steven James, Julian Togelius | 2024-10-10 | arXiv | https://github.com/umair-nasir14/Game-Traversal-Benchmark | https://doi.org/10.48550/arXiv.2410.07765 |
573 | A Closer Look at Machine Unlearning for Large Language Models | Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, Min Lin | 2024-10-10 | arXiv | https://github.com/sail-sg/closer-look-LLM-unlearning | https://doi.org/10.48550/arXiv.2410.08109 |
574 | CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models | Zi Gong, Hang Yu, Cong Liao, Bingchang Liu, Chaoyu Chen, Jianguo Li | 2024-10-09 | EMNLP | https://github.com/codefuse-ai/MFTCoder | https://aclanthology.org/2024.emnlp-main.459 |
575 | Dissecting Fine-Tuning Unlearning in Large Language Models | Yihuai Hong, Yuelin Zou, Lijie Hu, Ziqian Zeng, Di Wang, Haiqin Yang | 2024-10-09 | EMNLP | https://github.com/yihuaihong/Dissecting-FT-Unlearning | https://aclanthology.org/2024.emnlp-main.228 |
576 | Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Changli Tang, Yixuan Li, Yudong Yang, Jimin Zhuang, Guangzhi Sun, Wei Li, Zujun Ma, Chao Zhang | 2024-10-09 | arXiv | https://video-salmonn-2.github.io | http://arxiv.org/abs/2410.06682v2 |
577 | IterGen: Iterative Structured LLM Generation | Shubham Ugare, Rohan Gumaste, Tarun Suresh, Gagandeep Singh, Sasa Misailovic | 2024-10-09 | arXiv | https://github.com/uiuc-arc/itergen | http://arxiv.org/abs/2410.07295v1 |
578 | Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning | Chongyu Fan, Jiancheng Liu, Licong Lin, Jinghan Jia, Ruiqi Zhang, Song Mei, Sijia Liu | 2024-10-09 | arXiv | https://github.com/OPTML-Group/Unlearn-Simple | http://arxiv.org/abs/2410.07163v2 |
579 | WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents | Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang | 2024-10-09 | arXiv | https://github.com/elated-sawyer/WALL-E | http://arxiv.org/abs/2410.07484v2 |
580 | Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles | Qi Chen, Bowen Zhang, Gang Wang, Qi Wu | 2024-10-09 | arXiv | https://github.com/chenqi008/LateralThinking | http://arxiv.org/abs/2410.06733v1 |
581 | Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing | Hao Fei, Shengqiong Wu, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan | 2024-10-08 | arXiv | https://vitron-llm.github.io/ | http://arxiv.org/abs/2412.19806v1 |
582 | Enhancing Temporal Modeling of Video LLMs via Time Gating | Zi-Yuan Hu, Yiwu Zhong, Shijia Huang, Michael R. Lyu, Liwei Wang | 2024-10-08 | arXiv | https://github.com/LaVi-Lab/TG-Vid | http://arxiv.org/abs/2410.05714v1 |
583 | ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities | Zhenchao Jin, Mengchen Liu, Dongdong Chen, Lingting Zhu, Yunsheng Li, Lequan Yu | 2024-10-08 | arXiv | https://github.com/CharlesPikachu/ToolBridge | http://arxiv.org/abs/2410.10872v1 |
584 | MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment | Amir Hossein Kargaran, Ali Modarressi, Nafiseh Nikeghbal, Jana Diesner, François Yvon, Hinrich Schütze | 2024-10-08 | arXiv | https://github.com/cisnlp/Mexa | http://arxiv.org/abs/2410.05873v1 |
585 | GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models | Muhammad Jehanzeb Mirza, Mengjie Zhao, Zhuoyuan Mao, Sivan Doveh, Wei Lin, Paul Gavrikov, Michael Dorkenwald, Shiqi Yang, Saurav Jha, Hiromi Wakaki, Yuki Mitsufuji, Horst Possegger, Rogério Feris, Leonid Karlinsky, James R. Glass | 2024-10-08 | arXiv | https://github.com/jmiemirza/GLOV | https://doi.org/10.48550/arXiv.2410.06154 |
586 | AgentSquare: Automatic LLM Agent Search in Modular Design Space | Yu Shang, Yu Li, Keyu Zhao, Likai Ma, Jiahe Liu, Fengli Xu, Yong Li | 2024-10-08 | arXiv | https://github.com/tsinghua-fib-lab/AgentSquare | http://arxiv.org/abs/2410.06153v2 |
587 | Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models | Fei Wang, Ninareh Mehrabi, Palash Goyal, Rahul Gupta, Kai-Wei Chang, Aram Galstyan | 2024-10-07 | EMNLP | https://feiwang96.github.io/DataAdvisor/ | https://aclanthology.org/2024.emnlp-main.461 |
588 | Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback | Sanjiban Choudhury, Paloma Sodhi | 2024-10-07 | arXiv | https://leap-llm.github.io | http://arxiv.org/abs/2410.05434v1 |
589 | PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs | Mengzhao Chen, Yi Liu, Jiahao Wang, Yi Bin, Wenqi Shao, Ping Luo | 2024-10-07 | arXiv | https://github.com/ChenMnZ/PrefixQuant | http://arxiv.org/abs/2410.05265v1 |
590 | Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild | Xinyu Zhao, Guoheng Sun, Ruisi Cai, Yukun Zhou, Pingzhi Li, Peihao Wang, Bowen Tan, Yexiao He, Li Chen, Yi Liang, Beidi Chen, Binhang Yuan, Hongyi Wang, Ang Li, Zhangyang Wang, Tianlong Chen | 2024-10-07 | arXiv | https://github.com/Model-GLUE/Model-GLUE | http://arxiv.org/abs/2410.05357v2 |
591 | Can LLMs Understand Time Series Anomalies? | Zihao Zhou, Rose Yu | 2024-10-07 | arXiv | https://github.com/Rose-STL-Lab/AnomLLM | http://arxiv.org/abs/2410.05440v2 |
592 | Intriguing Properties of Large Language and Vision Models | Young-Jun Lee, Byungsoo Ko, Han-Gyu Kim, Yechan Hwang, Ho-Jin Choi | 2024-10-07 | arXiv | https://github.com/passing2961/IP-LLVM | https://doi.org/10.48550/arXiv.2410.04751 |
593 | Aligning LLMs to Be Robust Against Prompt Injection | Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, Chuan Guo | 2024-10-07 | arXiv | https://github.com/facebookresearch/SecAlign | http://arxiv.org/abs/2410.05451v1 |
594 | Narrative-of-Thought: Improving Temporal Reasoning of Large Language Models via Recounted Narratives | Xinliang Frederick Zhang, Nicholas Beauchamp, Lu Wang | 2024-10-07 | EMNLP | https://github.com/launchnlp/NoT | https://aclanthology.org/2024.findings-emnlp.963 |
595 | Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Guanyu Zhou, Yibo Yan, Xin Zou, Kun Wang, Aiwei Liu, Xuming Hu | 2024-10-07 | arXiv | https://github.com/The-Martyr/CausalMM | https://doi.org/10.48550/arXiv.2410.04780 |
596 | Synthesizing Interpretable Control Policies through Large Language Model Guided Search | Carlo Bosio, Mark W. Mueller | 2024-10-07 | arXiv | https://github.com/muellerlab/synthesizing_interpretable_control_policies | https://doi.org/10.48550/arXiv.2410.05406 |
597 | CogDevelop2K: Reversed Cognitive Development in Multimodal Large Language Models | Yijiang Li, Qingying Gao, Haoran Sun, Haiyun Lyu, Dezhi Luo, Hokin Deng | 2024-10-06 | arXiv | https://growing-ai-like-a-child.github.io/ | https://doi.org/10.48550/arXiv.2410.10855 |
598 | Leveraging Large Language Models for Suicide Detection on Social Media with Limited Labels | Vy Nguyen, Chau Pham | 2024-10-06 | arXiv | https://github.com/khanhvynguyen/Suicide_Detection_LLMs | https://doi.org/10.48550/arXiv.2410.04501 |
599 | MindScope: Exploring Cognitive Biases in Large Language Models Through Multi-Agent Systems | Zhentao Xie, Jiabao Zhao, Yilei Wang, Jinxin Shi, Yanhong Bai, Xingjiao Wu, Liang He | 2024-10-06 | ECAI | https://github.com/2279072142/MindScope | https://doi.org/10.3233/FAIA240879 |
600 | CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints | Anirudh Atmakuru, Jatin Nainani, Rohith Siddhartha Reddy Bheemreddy, Anirudh Lakkaraju, Zonghai Yao, Hamed Zamani, Haw-Shiuan Chang | 2024-10-05 | arXiv | https://github.com/anirudhlakkaraju/cs4_benchmark | https://doi.org/10.48550/arXiv.2410.04197 |
601 | Neuron-Level Sequential Editing for Large Language Models | Houcheng Jiang, Junfeng Fang, Tianyu Zhang, An Zhang, Ruipeng Wang, Tao Liang, Xiang Wang | 2024-10-05 | arXiv | https://github.com/jianghoucheng/NSE | https://doi.org/10.48550/arXiv.2410.04045 |
602 | Steering Large Language Models between Code Execution and Textual Reasoning | Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma, Chuchu Fan, Chi Wang | 2024-10-04 | arXiv | https://yongchao98.github.io/CodeSteer/ | https://doi.org/10.48550/arXiv.2410.03524 |
603 | Self-Powered LLM Modality Expansion for Large Speech-Text Models | Tengfei Yu, Xuebo Liu, Zhiyi Hou, Liang Ding, Dacheng Tao, Min Zhang | 2024-10-04 | arXiv | https://github.com/ytf-philp/Self-powered-LSM | http://arxiv.org/abs/2410.03798v2 |
604 | Leveraging Social Determinants of Health in Alzheimer's Research Using LLM-Augmented Literature Mining and Knowledge Graphs | Tianqi Shang, Shu Yang, Weiqing He, Tianhua Zhai, Dawei Li, Bojian Hou, Tianlong Chen, Jason H. Moore, Marylyn D. Ritchie, Li Shen | 2024-10-04 | arXiv | https://github.com/hwq0726/SDoHenPKG | http://arxiv.org/abs/2410.09080v1 |
605 | LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity | Selim Furkan Tekin, Fatih Ilhan, Tiansheng Huang, Sihao Hu, Ling Liu | 2024-10-04 | arXiv | https://github.com/git-disl/llm-topla | http://arxiv.org/abs/2410.03953v1 |
606 | GraphRouter: A Graph-based Router for LLM Selections | Tao Feng, Yanzhen Shen, Jiaxuan You | 2024-10-04 | arXiv | https://github.com/ulab-uiuc/GraphRouter | http://arxiv.org/abs/2410.03834v1 |
607 | Aligning LLMs with Individual Preferences via Interaction | Shujin Wu, May Fung, Cheng Qian, Jeonghwan Kim, Dilek Hakkani-Tur, Heng Ji | 2024-10-04 | arXiv | https://github.com/ShujinWu-0814/ALOE | http://arxiv.org/abs/2410.03642v2 |
608 | PersonalSum: A User-Subjective Guided Personalized Summarization Dataset for Large Language Models | Lemei Zhang, Peng Liu, Marcus Tiedemann Oekland Henriksboe, Even W. Lauvrak, Jon Atle Gulla, Heri Ramampiaro | 2024-10-04 | arXiv | https://github.com/SmartmediaAI/PersonalSum | https://doi.org/10.48550/arXiv.2410.03905 |
609 | PersoBench: Benchmarking Personalized Response Generation in Large Language Models | Saleh Afzoon, Usman Naseem, Amin Beheshti, Zahra Jamali | 2024-10-04 | arXiv | https://github.com/salehafzoon/PersoBench | https://doi.org/10.48550/arXiv.2410.03198 |
610 | Output Scouting: Auditing Large Language Models for Catastrophic Responses | Andrew Bell, João Fonseca | 2024-10-04 | arXiv | https://github.com/joaopfonseca/outputscouting | https://doi.org/10.48550/arXiv.2410.05305 |
611 | ARB-LLM: Alternating Refined Binarizations for Large Language Models | Zhiteng Li, Xianglong Yan, Tianao Zhang, Haotong Qin, Dong Xie, Jiang Tian, Zhongchao Shi, Linghe Kong, Yulun Zhang, Xiaokang Yang | 2024-10-04 | arXiv | https://github.com/ZHITENGLI/ARB-LLM | https://doi.org/10.48550/arXiv.2410.03129 |
612 | A Probabilistic Perspective on Unlearning and Alignment for Large Language Models | Yan Scholten, Stephan Günnemann, Leo Schwinn | 2024-10-04 | arXiv | https://github.com/yascho/probabilistic-unlearning | https://doi.org/10.48550/arXiv.2410.03523 |
613 | CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions | Jun Rao, Xuebo Liu, Lian Lian, Shengjun Cheng, Yunjie Liao, Min Zhang | 2024-10-04 | EMNLP | https://github.com/raojay7/CommonIT | https://aclanthology.org/2024.emnlp-main.561 |
614 | POSIX: A Prompt Sensitivity Index For Large Language Models | Anwoy Chatterjee, H. S. V. N. S. Kowndinya Renduchintala, Sumit Bhatia, Tanmoy Chakraborty | 2024-10-03 | EMNLP | https://github.com/kowndinya-renduchintala/POSIX | https://aclanthology.org/2024.findings-emnlp.852 |
615 | Traffic Light or Light Traffic? Investigating Phrasal Semantics in Large Language Models | Rui Meng, Ye Liu, Lifu Tu, Daqing He, Yingbo Zhou, Semih Yavuz | 2024-10-03 | EMNLP | https://github.com/memray/llm_phrase_semantics | https://aclanthology.org/2024.findings-emnlp.503 |
616 | Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents | Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang | 2024-10-03 | arXiv | https://github.com/agiresearch/ASB | http://arxiv.org/abs/2410.02644v1 |
617 | Meta-Models: An Architecture for Decoding LLM Behaviors Through Interpreted Embeddings and Natural Language | Anthony Costarelli, Mat Allen, Severin Field | 2024-10-03 | arXiv | https://github.com/acostarelli/meta-models-public | http://arxiv.org/abs/2410.02472v3 |
618 | StringLLM: Understanding the String Processing Capability of Large Language Models | Xilong Wang, Hao Fu, Jindong Wang, Neil Zhenqiang Gong | 2024-10-02 | arXiv | https://github.com/wxl-lxw/StringLLM | https://doi.org/10.48550/arXiv.2410.01208 |
619 | Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective | Zeyu Gan, Yong Liu | 2024-10-02 | arXiv | https://github.com/ZyGan1999/Towards-a-Theoretical-Understanding-of-Synthetic-Data-in-LLM-Post-Training | http://arxiv.org/abs/2410.01720v2 |
620 | Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? | Xi Chen, Kaituo Feng, Changsheng Li, Xunhao Lai, Xiangyu Yue, Ye Yuan, Guoren Wang | 2024-10-02 | arXiv | https://github.com/xichen-fy/Fira | http://arxiv.org/abs/2410.01623v2 |
621 | EMMA: Efficient Visual Alignment in Multi-Modal LLMs | Sara Ghazanfari, Alexandre Araujo, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami | 2024-10-02 | arXiv | https://github.com/SaraGhazanfari/EMMA | http://arxiv.org/abs/2410.02080v1 |
622 | TypedThinker: Typed Thinking Improves Large Language Model Reasoning | Danqing Wang, Jianxin Ma, Fei Fang, Lei Li | 2024-10-02 | arXiv | https://github.com/dqwang122/ThinkHub | https://doi.org/10.48550/arXiv.2410.01952 |
623 | Open-RAG: Enhanced Retrieval Augmented Reasoning with Open-Source Large Language Models | Shayekh Bin Islam, Md Asib Rahman, K. S. M. Tozammel Hossain, Enamul Hoque, Shafiq Joty, Md. Rizwan Parvez | 2024-10-02 | EMNLP | https://openragmoe.github.io/ | https://aclanthology.org/2024.findings-emnlp.831 |
624 | DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models | Yuxuan Zhang, Ruizhe Li | 2024-10-02 | arXiv | https://github.com/MeCuping/DLP-LoRA | https://doi.org/10.48550/arXiv.2410.01497 |
625 | Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression | Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, Grace Li Zhang | 2024-10-02 | arXiv | https://github.com/TUDa-HWAI/Basis_Sharing | https://doi.org/10.48550/arXiv.2410.03765 |
626 | FactAlign: Long-form Factuality Alignment of Large Language Models | Chao-Wei Huang, Yun-Nung Chen | 2024-10-02 | EMNLP | https://github.com/MiuLab/FactAlign | https://aclanthology.org/2024.findings-emnlp.955 |
627 | Dynamic Planning for LLM-based Graphical User Interface Automation | Shaoqing Zhang, Zhuosheng Zhang, Kehai Chen, Xinbei Ma, Muyun Yang, Tiejun Zhao, Min Zhang | 2024-10-01 | OpenReview | https://github.com/sqzhang-lazy/D-PoT | http://arxiv.org/abs/2410.00467v3 |
628 | Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis | Chun-Hsiao Yeh, Jiayun Wang, Andrew D. Graham, Andrea J. Liu, Bo Tan, Yubei Chen, Yi Ma, Meng C. Lin | 2024-10-01 | arXiv | https://danielchyeh.github.io/MDPipe/ | http://arxiv.org/abs/2410.00292v1 |
629 | Style-Specific Neurons for Steering LLMs in Text Style Transfer | Wen Lai, Viktor Hangya, Alexander Fraser | 2024-10-01 | arXiv | https://github.com/wenlai-lavine/sNeuron-TST | http://arxiv.org/abs/2410.00593v1 |
630 | Unleashing the Unseen: Harnessing Benign Datasets for Jailbreaking Large Language Models | Wei Zhao, Zhe Li, Yige Li, Jun Sun | 2024-10-01 | arXiv | https://github.com/suffix-maybe-feature/adver-suffix-maybe-features | http://arxiv.org/abs/2410.00451v3 |
631 | Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval | Yabing Wang, Le Wang, Qiang Zhou, Zhibin Wang, Hao Li, Gang Hua, Wei Tang | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/LiJiaBei-7/leccr | https://dl.acm.org/doi/10.1145/3664647.3680886 |
632 | mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model | Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/X-PLUG/mPLUG-DocOwl/tree/main/PaperOwl | https://dl.acm.org/doi/10.1145/3664647.3681294 |
633 | WorldGPT: Empowering LLM as Multimodal World Model | Zhiqi Ge, Hongzhe Huang, Mingze Zhou, Juncheng Li, Guoming Wang, Siliang Tang, Yueting Zhuang | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/DCDmllm/WorldGPT | https://dl.acm.org/doi/10.1145/3664647.3681488 |
634 | Semantic Alignment for Multimodal Large Language Models | Tao Wu, Mengze Li, Jingyuan Chen, Wei Ji, Wang Lin, Jinyang Gao, Kun Kuang, Zhou Zhao, Fei Wu | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://mccartney01.github.io/SAM | https://dl.acm.org/doi/10.1145/3664647.3681014 |
635 | Preliminary Study on Incremental Learning for Large Language Model-based Recommender Systems | Tianhao Shi, Yang Zhang, Zhijian Xu, Chong Chen, Fuli Feng, Xiangnan He, Qi Tian | 2024-10 | CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management | https://github.com/TianhaoShi2001/LSAT | https://dl.acm.org/doi/10.1145/3627673.3679922 |
636 | Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Minghui Wu, Chenxu Zhao, Anyang Su, Donglin Di, Tianyu Fu, Da An, Min He, Ya Gao, Meng Ma, Kun Yan, Ping Wang | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/mininglamp-MLLM/HMLLM | https://dl.acm.org/doi/10.1145/3664647.3680810 |
637 | MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors | Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Yixue Hao, Long Hu, Min Chen | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/TangYuan96/MiniGPT-3D | https://dl.acm.org/doi/10.1145/3664647.3681257 |
638 | MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models | Haoxuan Li, Zhengmao Yang, Yunshan Ma, Yi Bin, Yang Yang, Tat-Seng Chua | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/LuminosityX/MM-Forecast | https://dl.acm.org/doi/10.1145/3664647.3681593 |
639 | Fairness in Large Language Models in Three Hours | Thang Viet Doan, Zichong Wang, Nhat Nguyen Minh Hoang, Wenbin Zhang | 2024-10 | CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management | https://github.com/LavinWong/Fairness-in-Large-Language-Models | https://dl.acm.org/doi/10.1145/3627673.3679090 |
640 | Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation | Jingjing Xie, Yuxin Zhang, Mingbao Lin, Liujuan Cao, Rongrong Ji | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/xjjxmu/QSLAW | https://dl.acm.org/doi/10.1145/3664647.3680838 |
641 | LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation | Ziyao Zhang, Yanlin Wang, Chong Wang, Jiachi Chen, Zibin Zheng | 2024-09-30 | arXiv | https://github.com/DeepSoftwareAnalytics/LLMCodingHallucination | http://arxiv.org/abs/2409.20550v1 |
642 | VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Ruotong Liao, Max Erler, Huiyu Wang, Guangyao Zhai, Gengyuan Zhang, Yunpu Ma, Volker Tresp | 2024-09-30 | arXiv | https://github.com/mayhugotong/VideoINSTA | http://arxiv.org/abs/2409.20365v2 |
643 | LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation | Qidong Liu, Xian Wu, Wanyu Wang, Yejing Wang, Yuanshao Zhu, Xiangyu Zhao, Feng Tian, Yefeng Zheng | 2024-09-30 | arXiv | https://github.com/Applied-Machine-Learning-Lab/LLMEmb | http://arxiv.org/abs/2409.19925v2 |
644 | LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models | Haitao Li, You Chen, Qingyao Ai, Yueyue Wu, Ruizhe Zhang, Yiqun Liu | 2024-09-30 | arXiv | https://github.com/CSHaitao/LexEval | https://doi.org/10.48550/arXiv.2409.20288 |
645 | RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models | Shuhao Chen, Weisen Jiang, Baijiong Lin, James T. Kwok, Yu Zhang | 2024-09-30 | arXiv | https://github.com/shuhao02/RouterDC | https://doi.org/10.48550/arXiv.2409.19886 |
646 | Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models | Luohe Shi, Yao Yao, Zuchao Li, Lefei Zhang, Hai Zhao | 2024-09-30 | arXiv | https://github.com/ShiLuohe/ReferenceTrustableDecoding | https://doi.org/10.48550/arXiv.2409.20181 |
647 | BuildingView: Constructing Urban Building Exteriors Databases with Street View Imagery and Multimodal Large Language Model | Zongrong Li, Yunlei Su, Chenyuan Zhu, Wufan Zhao | 2024-09-29 | arXiv | https://github.com/Jasper0122/BuildingView | https://doi.org/10.48550/arXiv.2409.19527 |
648 | Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models | Xin Li, Weize Chen, Qizhi Chu, Haopeng Li, Zhaojun Sun, Ran Li, Chen Qian, Yiwei Wei, Zhiyuan Liu, Chuan Shi, Maosong Sun, Cheng Yang | 2024-09-29 | arXiv | https://github.com/BUPT-GAMMA/ProGraph | https://doi.org/10.48550/arXiv.2409.19667 |
649 | Identifying Knowledge Editing Types in Large Language Models | Xiaopeng Li, Shangwen Wang, Shezheng Song, Bin Ji, Huijun Liu, Shasha Li, Jun Ma, Jie Yu | 2024-09-29 | arXiv | https://github.com/xpq-tech/KETI | https://doi.org/10.48550/arXiv.2409.19663 |
650 | A multimodal LLM for the non-invasive decoding of spoken text from brain recordings | Youssef Hmamouche, Ismail Chihab, Lahoucine Kdouri, Amal El Fallah Seghrouchni | 2024-09-29 | arXiv | https://github.com/Hmamouche/brain_decode | http://arxiv.org/abs/2409.19710v1 |
651 | OpenSep: Leveraging Large Language Models with Textual Inversion for Open World Audio Separation | Tanvir Mahmud, Diana Marculescu | 2024-09-28 | EMNLP | https://github.com/tanvir-utexas/OpenSep | https://aclanthology.org/2024.emnlp-main.735 |
652 | A Survey on the Honesty of Large Language Models | Siheng Li, Cheng Yang, Taiqiang Wu, Chufan Shi, Yuji Zhang, Xinyu Zhu, Zesen Cheng, Deng Cai, Mo Yu, Lemao Liu, Jie Zhou, Yujiu Yang, Ngai Wong, Xixin Wu, Wai Lam | 2024-09-27 | arXiv | https://github.com/SihengLi99/LLM-Honesty-Survey | https://doi.org/10.48550/arXiv.2409.18786 |
653 | CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models | Kanghyun Ryu, Qiayuan Liao, Zhongyu Li, Koushil Sreenath, Negar Mehr | 2024-09-27 | arXiv | https://github.com/labicon/CurricuLLM | https://doi.org/10.48550/arXiv.2409.18382 |
654 | Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models | Jiaming Li, Lei Zhang, Yunshui Li, Ziqiang Liu, Yuelin Bai, Run Luo, Longze Chen, Min Yang | 2024-09-27 | EMNLP | https://github.com/Geaming2002/Ruler | https://aclanthology.org/2024.findings-emnlp.172 |
655 | Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation | Hongzhe Huang, Jiang Liu, Zhewen Yu, Li Cai, Dian Jiao, Wenqiao Zhang, Siliang Tang, Juncheng Li, Hao Jiang, Haoyuan Li, Yueting Zhuang | 2024-09-27 | arXiv | https://github.com/DCDmllm/Align2LLaVA | http://arxiv.org/abs/2409.18541v2 |
656 | HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection | Xuefeng Du, Chaowei Xiao, Yixuan Li | 2024-09-26 | arXiv | https://github.com/deeplearningwisc/haloscope | http://arxiv.org/abs/2409.17504v1 |
657 | From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection | Xinlei Wang, Maike Feng, Jing Qiu, Jinjin Gu, Junhua Zhao | 2024-09-26 | arXiv | https://github.com/ameliawong1996/From_News_to_Forecast | http://arxiv.org/abs/2409.17515v3 |
658 | Extracting Affect Aggregates from Longitudinal Social Media Data with Temporal Adapters for Large Language Models | Georg Ahnert, Max Pellert, David Garcia, Markus Strohmaier | 2024-09-26 | arXiv | https://github.com/dess-mannheim/temporal-adapters | http://arxiv.org/abs/2409.17990v1 |
659 | AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment | Nan Sun, Bo Mao, Yongchang Li, Lumeng Ma, Di Guo, Huaping Liu | 2024-09-26 | arXiv | https://assistantx-agent.github.io/AssistantX/ | http://arxiv.org/abs/2409.17655v1 |
660 | RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking | Yifan Jiang, Kriti Aggarwal, Tanmay Laud, Kashif Munir, Jay Pujara, Subhabrata Mukherjee | 2024-09-26 | arXiv | https://github.com/kriti-hippo/red_queen | https://doi.org/10.48550/arXiv.2409.17458 |
661 | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | Gongfan Fang, Hongxu Yin, Saurav Muralidharan, Greg Heinrich, Jeff Pool, Jan Kautz, Pavlo Molchanov, Xinchao Wang | 2024-09-26 | arXiv | https://github.com/NVlabs/MaskLLM | https://doi.org/10.48550/arXiv.2409.17481 |
662 | Search for Efficient Large Language Models | Xuan Shen, Pu Zhao, Yifan Gong, Zhenglun Kong, Zheng Zhan, Yushu Wu, Ming Lin, Chao Wu, Xue Lin, Yanzhi Wang | 2024-09-25 | arXiv | https://github.com/shawnricecake/search-llm | https://doi.org/10.48550/arXiv.2409.17372 |
663 | AutoLLM-CARD: Towards a Description and Landscape of Large Language Models | Shengwei Tian, Lifeng Han, Goran Nenadic | 2024-09-25 | arXiv | https://github.com/shengwei-tian/dependency-parser-visualization | http://arxiv.org/abs/2409.17011v3 |
664 | DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling | Kyuheon Jung, Yongdeuk Seo, Seongwoo Cho, Jaeyoung Kim, Hyun-seok Min, Sungchul Choi | 2024-09-25 | arXiv | https://github.com/kkyuhun94/dalda | http://arxiv.org/abs/2409.16949v1 |
665 | Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, Yingyu Liang, Shafiq Joty | 2024-09-25 | arXiv | https://github.com/SalesforceAIResearch/GemFilter | http://arxiv.org/abs/2409.17422v1 |
666 | EventHallusion: Diagnosing Event Hallucinations in Video LLMs | Jiacheng Zhang, Yang Jiao, Shaoxiang Chen, Jingjing Chen, Yu-Gang Jiang | 2024-09-25 | arXiv | https://github.com/Stevetich/EventHallusion | http://arxiv.org/abs/2409.16597v1 |
667 | HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows | Wenlin Yao, Haitao Mi, Dong Yu | 2024-09-25 | arXiv | https://github.com/wenlinyao/HDFlow | http://arxiv.org/abs/2409.17433v1 |
668 | Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness | Shixuan Ma, Quan Wang | 2024-09-25 | arXiv | https://github.com/Shixuan-Ma/TOCSIN | http://arxiv.org/abs/2409.16914v1 |
669 | CHBench: A Chinese Dataset for Evaluating Health in Large Language Models | Chenlu Guo, Nuo Xu, Yi Chang, Yuan Wu | 2024-09-24 | arXiv | https://github.com/TracyGuo2001/CHBench | https://doi.org/10.48550/arXiv.2409.15766 |
670 | HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models | Haoran Que, Feiyu Duan, Liqun He, Yutao Mou, Wangchunshu Zhou, Jiaheng Liu, Wenge Rong, Zekun Moore Wang, Jian Yang, Ge Zhang, Junran Peng, Zhaoxiang Zhang, Songyang Zhang, Kai Chen | 2024-09-24 | arXiv | https://github.com/Quehry/HelloBench | https://doi.org/10.48550/arXiv.2409.16191 |
671 | XTRUST: On the Multilingual Trustworthiness of Large Language Models | Yahan Li, Yi Wang, Yi Chang, Yuan Wu | 2024-09-24 | arXiv | https://github.com/LluckyYH/XTRUST | https://doi.org/10.48550/arXiv.2409.15762 |
672 | COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models | Kehui Liu, Zixin Tang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li | 2024-09-23 | arXiv | https://github.com/MrKeee/COHERENT | https://doi.org/10.48550/arXiv.2409.15146 |
673 | Phantom of Latent for Large Language and Vision Models | Byung-Kwan Lee, Sangyun Chung, Chae Won Kim, Beomchan Park, Yong Man Ro | 2024-09-23 | arXiv | https://github.com/ByungKwanLee/Phantom | https://doi.org/10.48550/arXiv.2409.14713 |
674 | Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method | Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng | 2024-09-23 | EMNLP | https://github.com/zhang-wei-chao/DC-PDD | https://aclanthology.org/2024.emnlp-main.300 |
675 | Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses | Hung-Ting Su, Ya-Ching Hsu, Xudong Lin, Xiang Qian Shi, Yulei Niu, Han-Yuan Hsu, Hung-yi Lee, Winston H. Hsu | 2024-09-22 | EMNLP | https://github.com/Shelley1214/Trope | https://aclanthology.org/2024.findings-emnlp.872 |
676 | PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL | Ruilin Luo, Liyuan Wang, Binghuai Lin, Zicheng Lin, Yujiu Yang | 2024-09-21 | arXiv | https://github.com/lrlbbzl/PTD-SQL | http://arxiv.org/abs/2409.14082v1 |
677 | StateAct: State Tracking and Reasoning for Acting and Planning with Large Language Models | Nikolai Rozanov, Marek Rei | 2024-09-21 | arXiv | https://github.com/ai-nikolai/StateAct | https://doi.org/10.48550/arXiv.2410.02810 |
678 | ShizishanGPT: An Agricultural Large Language Model Integrating Tools and Resources | Shuting Yang, Zehui Liu, Wolfgang Mayer, Ningpei Ding, Ying Wang, Yu Huang, Pengfei Wu, Wanli Li, Lin Li, Hong-Yu Zhang, Zaiwen Feng | 2024-09-20 | arXiv | https://github.com/Zaiwen/CropGPT | https://doi.org/10.48550/arXiv.2409.13537 |
679 | CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information | Yuxin Wang, Minghua Ma, Zekun Wang, Jingchang Chen, Huiming Fan, Liping Shan, Qing Yang, Dongliang Xu, Ming Liu, Bing Qin | 2024-09-20 | arXiv | https://github.com/wyxscir/CFSP | http://arxiv.org/abs/2409.13199v2 |
680 | CLAIR-A: Leveraging Large Language Models to Judge Audio Captions | Tsung-Han Wu, Joseph E. Gonzalez, Trevor Darrell, David M. Chan | 2024-09-19 | arXiv | https://github.com/DavidMChan/clair-a | https://doi.org/10.48550/arXiv.2409.12962 |
681 | Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models | Peiyi Zhang, Yazhou Zhang, Bo Wang, Lu Rong, Jing Qin | 2024-09-19 | arXiv | https://github.com/zhangpeii/Edu-Values | https://doi.org/10.48550/arXiv.2409.12739 |
682 | HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling | Junyi Chen, Lu Chi, Bingyue Peng, Zehuan Yuan | 2024-09-19 | arXiv | https://github.com/bytedance/HLLM | https://doi.org/10.48550/arXiv.2409.12740 |
683 | Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources | Issey Sukeda | 2024-09-18 | arXiv | https://github.com/stardust-coder/japanese-lm-med-harness | https://doi.org/10.48550/arXiv.2409.11783 |
684 | Large Language Models Are Strong Audio-Visual Speech Recognition Learners | Umberto Cappellazzo, Minsu Kim, Honglie Chen, Pingchuan Ma, Stavros Petridis, Daniele Falavigna, Alessio Brutti, Maja Pantic | 2024-09-18 | arXiv | https://github.com/umbertocappellazzo/AVSR-LLMs | https://doi.org/10.48550/arXiv.2409.12319 |
685 | BodyShapeGPT: SMPL Body Shape Manipulation with LLMs | Baldomero R. Árbol, Dan Casas | 2024-09-18 | arXiv | https://github.com/baldoarbol/BodyShapeGPT | http://arxiv.org/abs/2410.03556v1 |
686 | Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent | Fatemeh Haji, Mazal Bethany, Maryam Tabar, Jason Chiang, Anthony Rios, Peyman Najafirad | 2024-09-17 | arXiv | https://github.com/SecureAIAutonomyLab/MA-ToT | http://arxiv.org/abs/2409.11527v2 |
687 | Benchmarking Large Language Model Uncertainty for Prompt Optimization | Pei-Fu Guo, Yun-Da Tsai, Shou-De Lin | 2024-09-16 | arXiv | https://github.com/0Frett/PO-Uncertainty-Benchmarking | https://doi.org/10.48550/arXiv.2409.10044 |
688 | Do Large Language Models Need a Content Delivery Network? | Yihua Cheng, Kuntai Du, Jiayi Yao, Junchen Jiang | 2024-09-16 | arXiv | https://github.com/LMCache/LMCache | https://doi.org/10.48550/arXiv.2409.13761 |
689 | Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models | Weihao Ye, Qiong Wu, Wenhao Lin, Yiyi Zhou | 2024-09-16 | arXiv | https://github.com/ywh187/FitPrune | https://doi.org/10.48550/arXiv.2409.10197 |
690 | The Two Word Test: A Semantic Benchmark for Large Language Models | Nicholas Riccardi, Xuan Yang, Rutvik H. Desai | 2024-09-16 | arXiv | https://github.com/NickRiccardi/two-word-test | https://doi.org/10.48550/arXiv.2306.04610 |
691 | HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making | Sumera Anjum, Hanzhi Zhang, Wenjun Zhou, Eun Jin Paek, Xiaopeng Zhao, Yunhe Feng | 2024-09-16 | arXiv | https://github.com/ResponsibleAILab/HALO | http://arxiv.org/abs/2409.10011v2 |
692 | Can Large Language Models Grasp Event Signals? Exploring Pure Zero-Shot Event-based Recognition | Zongyou Yu, Qiang Qu, Xiaoming Chen, Chen Wang | 2024-09-15 | arXiv | https://github.com/ChrisYu-Zz/Pure-event-based-recognition-based-LLM | https://doi.org/10.48550/arXiv.2409.09628 |
693 | Traffic Scene Generation from Natural Language Description for Autonomous Vehicles with Large Language Model | Bo-Kai Ruan, Hao-Tang Tsui, Yung-Hui Li, Hong-Han Shuai | 2024-09-15 | arXiv | https://basiclab.github.io/TTSG | https://doi.org/10.48550/arXiv.2409.09575 |
694 | AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs | Madhusudan Ghosh, Shrimon Mukherjee, Asmit Ganguly, Partha Basuchowdhuri, Sudip Kumar Naskar, Debasis Ganguly | 2024-09-15 | arXiv | https://github.com/shrimonmuke0202/AlpaPICO | http://arxiv.org/abs/2409.09704v1 |
695 | PeriGuru: A Peripheral Robotic Mobile App Operation Assistant based on GUI Image Understanding and Prompting with LLM | Kelin Fu, Yang Tian, Kaigui Bian | 2024-09-14 | arXiv | https://github.com/Z2sJ4t/PeriGuru | http://arxiv.org/abs/2409.09354v1 |
696 | LLM-Powered Ensemble Learning for Paper Source Tracing: A GPU-Free Approach | Kunlong Chen, Junjun Wang, Zhaoqun Chen, Kunjin Chen, Yitian Chen | 2024-09-14 | arXiv | https://github.com/Cklwanfifa/KDDCUP2024-PST | http://arxiv.org/abs/2409.09383v2 |
697 | Intelligent LiDAR Navigation: Leveraging External Information and Semantic Maps with LLM as Copilot | Fujing Xie, Jiajie Zhang, Sören Schwertfeger | 2024-09-13 | arXiv | https://github.com/xiexiexiaoxiexie/Intelligent-LiDAR-Navigation-LLM-as-Copilot | http://arxiv.org/abs/2409.08493v1 |
698 | L3Cube-IndicQuest: A Benchmark Question Answering Dataset for Evaluating Knowledge of LLMs in Indic Context | Pritika Rohera, Chaitrali Ginimav, Akanksha Salunke, Gayatri Sawant, Raviraj Joshi | 2024-09-13 | arXiv | https://github.com/l3cube-pune/indic-nlp | http://arxiv.org/abs/2409.08706v2 |
699 | ProcessTBench: An LLM Plan Generation Dataset for Process Mining | Andrei Cosmin Redis, Mohammadreza Fani Sani, Bahram Zarrin, Andrea Burattin | 2024-09-13 | arXiv | https://github.com/microsoft/ProcessTBench | http://arxiv.org/abs/2409.09191v2 |
700 | FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition | Zhenhua Xu, Wenpeng Xing, Zhebo Wang, Chang Hu, Chen Jie, Meng Han | 2024-09-13 | arXiv | https://fingerprintvector.github.io | https://doi.org/10.48550/arXiv.2409.08846 |
701 | Fine-tuning Large Language Models for Entity Matching | Aaron Steiner, Ralph Peeters, Christian Bizer | 2024-09-12 | arXiv | https://github.com/wbsg-uni-mannheim/TailorMatch | https://doi.org/10.48550/arXiv.2409.08185 |
702 | DrLLM: Prompt-Enhanced Distributed Denial-of-Service Resistance Method with Large Language Models | Zhenyu Yin, Shang Liu, Guangyuan Xu | 2024-09-11 | arXiv | https://github.com/liuup/DrLLM | https://doi.org/10.48550/arXiv.2409.10561 |
703 | AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs | Lijia Lv, Weigang Zhang, Xuehai Tang, Jie Wen, Feng Liu, Jizhong Han, Songlin Hu | 2024-09-11 | arXiv | https://github.com/Yummy416/AdaPPA | http://arxiv.org/abs/2409.07503v1 |
704 | Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation | SeongYeub Chu, JongWoo Kim, MunYong Yi | 2024-09-11 | arXiv | https://github.com/BBeeChu/InteractEval | http://arxiv.org/abs/2409.07355v1 |
705 | Understanding Knowledge Drift in LLMs through Misinformation | Alina Fastowski, Gjergji Kasneci | 2024-09-11 | arXiv | https://github.com/afastowski/knowledge_drift | http://arxiv.org/abs/2409.07085v1 |
706 | What is the Role of Small Models in the LLM Era: A Survey | Lihu Chen, Gaël Varoquaux | 2024-09-10 | arXiv | https://github.com/tigerchen52/role_of_small_models | http://arxiv.org/abs/2409.06857v4 |
707 | Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models | Yao Shu, Wenyang Hu, See-Kiong Ng, Bryan Kian Hsiang Low, Fei Richard Yu | 2024-09-10 | arXiv | https://github.com/allen4747/Ferret | https://doi.org/10.48550/arXiv.2409.06277 |
708 | LLaMA-Omni: Seamless Speech Interaction with Large Language Models | Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng | 2024-09-10 | arXiv | https://github.com/ictnlp/LLaMA-Omni | https://doi.org/10.48550/arXiv.2409.06666 |
709 | Benchmarking Chinese Knowledge Rectification in Large Language Models | Tianhe Lu, Jizhan Fang, Yunzhi Yao, Xin Xu, Ningyu Zhang, Huajun Chen | 2024-09-09 | arXiv | https://github.com/zjunlp/EasyEdit | https://doi.org/10.48550/arXiv.2409.05806 |
710 | FLoRA: Federated Fine-Tuning Large Language Models with Heterogeneous Low-Rank Adaptations | Ziyao Wang, Zheyu Shen, Yexiao He, Guoheng Sun, Hongyi Wang, Lingjuan Lyu, Ang Li | 2024-09-09 | arXiv | https://github.com/ATP-1010/FederatedLLM | https://doi.org/10.48550/arXiv.2409.05976 |
711 | Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach | Meng Zhou, Surajsinh Parmar, Anubhav Bhatti | 2024-09-09 | arXiv | https://github.com/SpassMed/Med-Llama3 | https://doi.org/10.48550/arXiv.2409.05732 |
712 | OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs | Jintian Zhang, Cheng Peng, Mengshu Sun, Xiang Chen, Lei Liang, Zhiqiang Zhang, Jun Zhou, Huajun Chen, Ningyu Zhang | 2024-09-08 | arXiv | https://github.com/zjunlp/OneGen | http://arxiv.org/abs/2409.05152v2 |
713 | Multi-Programming Language Ensemble for Code Generation in Large Language Model | Tengfei Xue, Xuefeng Li, Tahir Azim, Roman Smirnov, Jianhui Yu, Arash Sadrieh, Babak Pahlavan | 2024-09-06 | arXiv | https://github.com/NinjaTech-AI/MPLE | https://doi.org/10.48550/arXiv.2409.04114 |
714 | Sirius: Contextual Sparsity with Correction for Efficient LLMs | Yang Zhou, Zhuoming Chen, Zhaozhuo Xu, Victoria Lin, Beidi Chen | 2024-09-05 | arXiv | https://github.com/Infini-AI-Lab/Sirius | http://arxiv.org/abs/2409.03856v1 |
715 | Sketch: A Toolkit for Streamlining LLM Operations | Xin Jiang, Xiang Li, Wenjia Ma, Xuezhi Fang, Yiqun Yao, Naitong Yu, Xuying Meng, Peng Han, Jing Li, Aixin Sun, Yequan Wang | 2024-09-05 | arXiv | https://github.com/cofe-ai/Sketch | http://arxiv.org/abs/2409.03346v1 |
716 | Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models | Jie Ma, Zhitao Gao, Qi Chai, Wangchun Sun, Pinghui Wang, Hongbin Pei, Jing Tao, Lingyun Song, Jun Liu, Chen Zhang, Lizhen Cui | 2024-09-05 | arXiv | https://github.com/reml-group/DoG | https://doi.org/10.48550/arXiv.2409.03155 |
717 | Planning In Natural Language Improves LLM Search For Code Generation | Evan Wang, Federico Cassano, Catherine Wu, Yunfeng Bai, Will Song, Vaskar Nath, Ziwen Han, Sean Hendryx, Summer Yue, Hugh Zhang | 2024-09-05 | arXiv | https://github.com/scaleapi/plansearch | http://arxiv.org/abs/2409.03733v2 |
718 | LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts | Henrique Da Silva Gameiro, Andrei Kucharavy, Ljiljana Dolamic | 2024-09-05 | arXiv | https://github.com/Reliable-Information-Lab-HEVS/benchmark_llm_texts_detection | http://arxiv.org/abs/2409.03291v2 |
719 | Alignment-Aware Model Extraction Attacks on Large Language Models | Zi Liang, Qingqing Ye, Yanyun Wang, Sen Zhang, Yaxin Xiao, Ronghua Li, Jianliang Xu, Haibo Hu | 2024-09-04 | arXiv | https://github.com/liangzid/alignmentExtraction | https://doi.org/10.48550/arXiv.2409.02718 |
720 | Large Language Model-Based Agents for Software Engineering: A Survey | Junwei Liu, Kaixin Wang, Yixuan Chen, Xin Peng, Zhenpeng Chen, Lingming Zhang, Yiling Lou | 2024-09-04 | arXiv | https://github.com/FudanSELab/Agent4SE-Paper-List | https://doi.org/10.48550/arXiv.2409.02977 |
721 | Hypothesizing Missing Causal Variables with LLMs | Ivaxi Sheth, Sahar Abdelnabi, Mario Fritz | 2024-09-04 | arXiv | https://github.com/ivaxi0s/hypothesizing-causal-variable-llm | http://arxiv.org/abs/2409.02604v1 |
722 | Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models? | Yixuan Tang, Yi Yang | 2024-09-04 | arXiv | https://github.com/yixuantt/PoolingAndAttn | http://arxiv.org/abs/2409.02727v2 |
723 | Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation | Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu | 2024-09-03 | arXiv | https://github.com/git-disl/Booster | https://doi.org/10.48550/arXiv.2409.01586 |
724 | Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor | Abdullah Arafat Miah, Yu Bi | 2024-09-03 | arXiv | https://github.com/SiSL-URI/Arch_Backdoor_LLM | https://doi.org/10.48550/arXiv.2409.01952 |
725 | MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs | Saeid Asgari Taghanaki, Aliasgahr Khani, Amir Khasahmadi | 2024-09-03 | arXiv | https://github.com/asgsaeid/mmlu-pro-plus | http://arxiv.org/abs/2409.02257v3 |
726 | Agentic Society: Merging skeleton from real world and texture from Large Language Model | Yuqi Bai, Kun Sun, Huishi Yin | 2024-09-02 | arXiv | https://github.com/baiyuqi/agentic-society | https://doi.org/10.48550/arXiv.2409.10550 |
727 | FlashFlex: Accommodating Large Language Model Training over Heterogeneous Environment | Ran Yan, Youhe Jiang, Wangcheng Tao, Xiaonan Nie, Bin Cui, Binhang Yuan | 2024-09-02 | arXiv | https://github.com/Relaxed-System-Lab/FlashFlex | https://doi.org/10.48550/arXiv.2409.01143 |
728 | Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data | Mohammadreza Ghaffarzadeh-Esfahani, Mahdi Ghaffarzadeh-Esfahani, Arian Salahi-Niri, Hossein Toreyhi, Zahra Atf, Amirali Mohsenzadeh-Kermani, Mahshad Sarikhani, Zohreh Tajabadi, Fatemeh Shojaeian, Mohammad Hassan Bagheri, Aydin Feyzi, Mohammadamin Tarighatpayma, Narges Gazmeh, Fateme Heydari, Hossein Afshar, Amirreza Allahgholipour, Farid Alimardani, Ameneh Salehi, Naghmeh Asadimanesh, Mohammad Amin Khalafi, Hadis Shabanipour, Ali Moradi, Sajjad Hossein Zadeh, Omid Yazdani, Romina Esbati, Moozhan Maleki, Danial Samiei Nasr, Amirali Soheili, Hossein Majlesi, Saba Shahsavan, Alireza Soheilipour, Nooshin Goudarzi, Erfan Taherifard, Hamidreza Hatamabadi, Jamil S. Samaan, Thomas Savage, Ankit Sakhuja, Ali Soroush, Girish N. Nadkarni, Ilad Alavi Darazam, Mohamad Amin Pourhoseingholi, Seyed Amir Ahmad Safavi-Naini | 2024-09-02 | arXiv | https://github.com/mohammad-gh009/Large-Language-Models-vs-Classical-Machine-learning | https://doi.org/10.48550/arXiv.2409.02136 |
729 | Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference | Barys Liskavets, Maxim Ushakov, Shuvendu Roy, Mark Klibanov, Ali Etemad, Shane Luke | 2024-09-02 | arXiv | https://github.com/Workday/cpc | http://arxiv.org/abs/2409.01227v3 |
730 | Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models | Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang | 2024-09-01 | arXiv | https://github.com/umd-huang-lab/FalseRefusal | https://doi.org/10.48550/arXiv.2409.00598 |
731 | Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering | Derian Boer, Fabian Koch, Stefan Kramer | 2024-09-01 | arXiv | https://github.com/kramerlab/4StepFocus | http://arxiv.org/abs/2409.00861v1 |
732 | AskIt: Unified Programming Interface for Programming with Large Language Models | Katsumi Okuda, Saman P. Amarasinghe | 2024-09 | 2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) | https://github.com/katsumiok/ts-askit | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10444830 |
733 | LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models | Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, Bryan Hooi | 2024-08-31 | arXiv | https://github.com/zhiyuanhubj/LongRecipe | https://doi.org/10.48550/arXiv.2409.00509 |
734 | MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Shuai Peng, Di Fu, Liangcai Gao, Xiuqin Zhong, Hongguang Fu, Zhi Tang | 2024-08-30 | arXiv | https://github.com/pengshuai-rin/MultiMath | https://doi.org/10.48550/arXiv.2409.00147 |
735 | A Survey on Evaluation of Large Language Models | Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Kaijie Zhu, Hao Chen, Linyi Yang, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, Xing Xie | 2024-08-28 | ACM Transactions on Intelligent Systems and Technology (TIST), Volume 15, Issue 3 | https://llm-eval.github.io/ | https://dl.acm.org/doi/10.1145/3641289 |
736 | Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | Nicholas R. Waytowich, Devin White, MD Sunbeam, Vinicius G. Goecks | 2024-08-28 | arXiv | https://dev1nw.github.io/atari-gpt/ | http://arxiv.org/abs/2408.15950v2 |
737 | CBF-LLM: Safe Control for LLM Alignment | Yuya Miyaoka, Masaki Inoue | 2024-08-28 | arXiv | https://github.com/Mya-Mya/CBF-LLM | http://arxiv.org/abs/2408.15625v2 |
738 | Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Min Shi, Fuxiao Liu, Shihao Wang, Shijia Liao, Subhashree Radhakrishnan, De-An Huang, Hongxu Yin, Karan Sapra, Yaser Yacoob, Humphrey Shi, Bryan Catanzaro, Andrew Tao, Jan Kautz, Zhiding Yu, Guilin Liu | 2024-08-28 | arXiv | https://github.com/NVlabs/Eagle | http://arxiv.org/abs/2408.15998v1 |
739 | Efficient LLM Scheduling by Learning to Rank | Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang | 2024-08-28 | arXiv | https://github.com/hao-ai-lab/vllm-ltr | http://arxiv.org/abs/2408.15792v1 |
740 | Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | Yuncheng Yang, Yulei Qin, Tong Wu, Zihan Xu, Gang Li, Pengcheng Guo, Hang Shao, Yucheng Shi, Ke Li, Xing Sun, Jie Yang, Yun Gu | 2024-08-28 | arXiv | https://github.com/Yaphabates/Rocket | https://doi.org/10.48550/arXiv.2408.15915 |
741 | LyCon: Lyrics Reconstruction from the Bag-of-Words Using Large Language Models | Haven Kim, Kahyun Choi | 2024-08-27 | arXiv | https://github.com/havenpersona/lycon | https://doi.org/10.48550/arXiv.2408.14750 |
742 | PAT: Pruning-Aware Tuning for Large Language Models | Yijiang Liu, Huanrui Yang, Youxin Chen, Rongyu Zhang, Miao Wang, Yuan Du, Li Du | 2024-08-27 | arXiv | https://github.com/kriskrisliu/PAT_Pruning-Aware-Tuning | https://doi.org/10.48550/arXiv.2408.14721 |
743 | RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models | Junyao Ge, Yang Zheng, Kaitai Guo, Jimin Liang | 2024-08-27 | arXiv | https://github.com/SlytherinGe/RSTeller | https://doi.org/10.48550/arXiv.2408.14744 |
744 | CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation | Muhammad Fawi | 2024-08-26 | arXiv | https://github.com/MNoorFawi/curlora | http://arxiv.org/abs/2408.14572v1 |
745 | AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework | Jie Feng, Yuwei Du, Jie Zhao, Yong Li | 2024-08-26 | arXiv | https://github.com/tsinghua-fib-lab/AgentMove | https://doi.org/10.48550/arXiv.2408.13986 |
746 | ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models | Yeji Park, Deokyeong Lee, Junsuk Choe, Buru Chang | 2024-08-25 | arXiv | https://github.com/yejipark-m/ConVis | https://doi.org/10.48550/arXiv.2408.13906 |
747 | Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models | Seyed Amir Ahmad Safavi-Naini, Shuhaib Ali, Omer Shahab, Zahra Shahhoseini, Thomas Savage, Sara Rafiee, Jamil S. Samaan, Reem Al Shabeeb, Farah Ladak, Jamie O. Yang, Juan Echavarria, Sumbal Babar, Aasma Shaukat, Samuel Margolis, Nicholas P. Tatonetti, Girish N. Nadkarni, Bara El Kurdi, Ali Soroush | 2024-08-25 | arXiv | https://github.com/Sdamirsa/LLM-VLM-in-Gastroenterology | https://doi.org/10.48550/arXiv.2409.00084 |
748 | vitaLITy 2: Reviewing Academic Literature Using Large Language Models | Hongye An, Arpit Narechania, Emily Wall, Kai Xu | 2024-08-24 | arXiv | https://vitality-vis.github.io | https://doi.org/10.48550/arXiv.2408.13450 |
749 | HRGraph: Leveraging LLMs for HR Data Knowledge Graphs with Information Propagation-based Job Recommendation | Azmine Toushik Wasi | 2024-08-24 | Proceedings of the 1st Workshop on Knowledge Graphs and Large Language Models (KaLLM 2024), Association for Computational Linguistics 2024 | https://github.com/azminewasi/HRGraph | http://arxiv.org/abs/2408.13521v1 |
750 | LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs | Chansung Park, Juyong Jiang, Fan Wang, Sayak Paul, Jing Tang | 2024-08-24 | arXiv | https://github.com/deep-diver/llamaduo | http://arxiv.org/abs/2408.13467v2 |
751 | IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities | Bin Wang, Chunyu Xie, Dawei Leng, Yuhui Yin | 2024-08-23 | arXiv | https://github.com/360CVGroup/Inner-Adaptor-Architecture | https://doi.org/10.48550/arXiv.2408.12902 |
752 | MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | Yi-Fan Zhang, Huanyu Zhang, Haochen Tian, Chaoyou Fu, Shuangqing Zhang, Junfei Wu, Feng Li, Kun Wang, Qingsong Wen, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan | 2024-08-23 | arXiv | https://mme-realworld.github.io/ | http://arxiv.org/abs/2408.13257v2 |
753 | LIMP: Large Language Model Enhanced Intent-aware Mobility Prediction | Songwei Li, Jie Feng, Jiawei Chi, Xinyuan Hu, Xiaomeng Zhao, Fengli Xu | 2024-08-23 | arXiv | https://github.com/tsinghua-fib-lab/LIMP | https://doi.org/10.48550/arXiv.2408.12832 |
754 | Generating Analytic Specifications for Data Visualization from Natural Language Queries using Large Language Models | Subham Sah, Rishab Mitra, Arpit Narechania, Alex Endert, John T. Stasko, Wenwen Dou | 2024-08-23 | arXiv | https://nl4dv.github.io | https://doi.org/10.48550/arXiv.2408.13391 |
755 | BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models | Yige Li, Hanxun Huang, Yunhan Zhao, Xingjun Ma, Jun Sun | 2024-08-23 | arXiv | https://github.com/bboylyg/BackdoorLLM | https://doi.org/10.48550/arXiv.2408.12798 |
756 | LLM-PBE: Assessing Data Privacy in Large Language Models | Qinbin Li, Junyuan Hong, Chulin Xie, Jeffrey Tan, Rachel Xin, Junyi Hou, Xavier Yin, Zhun Wang, Dan Hendrycks, Zhangyang Wang, Bo Li, Bingsheng He, Dawn Song | 2024-08-23 | Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 11 | https://llm-pbe.github.io/ | https://dl.acm.org/doi/10.14778/3681954.3681994 |
757 | Controllable Text Generation for Large Language Models: A Survey | Xun Liang, Hanyu Wang, Yezhaohui Wang, Shichao Song, Jiawei Yang, Simin Niu, Jie Hu, Dan Liu, Shunyu Yao, Feiyu Xiong, Zhiyu Li | 2024-08-22 | arXiv | https://github.com/IAAR-Shanghai/CTGSurvey | https://doi.org/10.48550/arXiv.2408.12599 |
758 | Enhanced Fine-Tuning of Lightweight Domain-Specific Q&A Model Based on Large Language Models | Shenglin Zhang, Pengtian Zhu, Minghua Ma, Jiagang Wang, Yongqian Sun, Dongwen Li, Jingyu Wang, Qianying Guo, Xiaolei Hua, Lin Zhu, Dan Pei | 2024-08-22 | ISSRE | https://github.com/Zero-Pointer/Self-Evolution | https://doi.org/10.1109/ISSREW63542.2024.00048 |
759 | Geolocation Representation from Large Language Models are Generic Enhancers for Spatio-Temporal Learning | Junlin He, Tong Nie, Wei Ma | 2024-08-22 | arXiv | https://github.com/Umaruchain/LLMGeovec | https://doi.org/10.48550/arXiv.2408.12116 |
760 | Reasoning Factual Knowledge in Structured Data with Large Language Models | Sirui Huang, Yanggan Gu, Xuming Hu, Zhonghao Li, Qing Li, Guandong Xu | 2024-08-22 | arXiv | https://github.com/EganGu/StructFact | https://doi.org/10.48550/arXiv.2408.12188 |
761 | Aligning (Medical) LLMs for (Counterfactual) Fairness | Raphael Poulain, Hamed Fayyaz, Rahmatollah Beheshti | 2024-08-22 | arXiv | https://github.com/healthylaife/FairAlignmentLLM | http://arxiv.org/abs/2408.12055v1 |
762 | Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs | Ronit Singhal, Pransh Patwa, Parth Patwa, Aman Chadha, Amitava Das | 2024-08-22 | arXiv | https://github.com/ronit-singhal/evidence-backed-fact-checking-using-rag-and-few-shot-in-context-learning-with-llms | http://arxiv.org/abs/2408.12060v2 |
763 | MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing | Hao Zhou, Zhijun Wang, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Weihua Luo, Jiajun Chen | 2024-08-21 | arXiv | https://github.com/zjwang21/MoE-LPR | https://doi.org/10.48550/arXiv.2408.11396 |
764 | Personality Alignment of Large Language Models | Minjun Zhu, Linyi Yang, Yue Zhang | 2024-08-21 | arXiv | https://github.com/zhu-minjun/PAlign | https://doi.org/10.48550/arXiv.2408.11779 |
765 | Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models | Yuzhou Huang, Yiran Qin, Shunlin Lu, Xintao Wang, Rui Huang, Ying Shan, Ruimao Zhang | 2024-08-21 | arXiv | https://yuzhou914.github.io/Story3D-Agent/ | https://doi.org/10.48550/arXiv.2408.11801 |
766 | SimBench: A Rule-Based Multi-Turn Interaction Benchmark for Evaluating an LLM's Ability to Generate Digital Twins | Jingquan Wang, Harry Zhang, Huzaifa Mustafa Unjhawala, Peter Negrut, Shu Wang, Khailanii Slaton, Radu Serban, Jin-Long Wu, Dan Negrut | 2024-08-21 | arXiv | https://github.com/uwsbel/SimBench | http://arxiv.org/abs/2408.11987v1 |
767 | SysBench: Can Large Language Models Follow System Messages? | Yanzhao Qin, Tao Zhang, Tao Zhang, Yanjun Shen, Wenjing Luo, Haoze Sun, Yan Zhang, Yujing Qiao, Weipeng Chen, Zenan Zhou, Wentao Zhang, Bin Cui | 2024-08-20 | arXiv | https://github.com/PKU-Baichuan-MLSystemLab/SysBench | https://doi.org/10.48550/arXiv.2408.10943 |
768 | Putting People in LLMs' Shoes: Generating Better Answers via Question Rewriter | Junhao Chen, Bowen Wang, Zhouqiang Jiang, Yuta Nakashima | 2024-08-20 | arXiv | https://github.com/3244we/Question-Rewriter | http://arxiv.org/abs/2408.10573v1 |
769 | FLAME: Learning to Navigate with Multimodal LLM in Urban Environments | Yunzhe Xu, Yiyuan Pan, Zhe Liu, Hesheng Wang | 2024-08-20 | arXiv | https://flame-sjtu.github.io | http://arxiv.org/abs/2408.11051v1 |
770 | Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval | Guangyuan Ma, Yongliang Ma, Xing Wu, Zhenpeng Su, Ming Zhou, Songlin Hu | 2024-08-20 | arXiv | https://github.com/tdro-llm/tdro | https://doi.org/10.48550/arXiv.2408.10613 |
771 | Large Language Models for Multimodal Deformable Image Registration | Mingrui Ma, Weijie Wang, Jie Ning, Jianfeng He, Nicu Sebe, Bruno Lepri | 2024-08-20 | arXiv | https://github.com/ninjannn/LLM-Morph | https://doi.org/10.48550/arXiv.2408.10703 |
772 | Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model | Chenhan Yuan, Fei Huang, Ru Peng, Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou | 2024-08-20 | EMNLP | https://github.com/chenhan97/Otter | https://aclanthology.org/2024.emnlp-main.316 |
773 | Beyond Labels: Aligning Large Language Models with Human-Like Reasoning | Muhammad Rafsan Kabir, Rafeed Mohammad Sultan, Ihsanul Haque Asif, Jawad Ibn Ahad, Fuad Rahman, Mohammad Ruhul Amin, Nabeel Mohammed, Shafin Rahman | 2024-08-20 | ICPR | https://github.com/apurba-nsu-rnd-lab/DFAR | https://doi.org/10.1007/978-3-031-78172-8_16 |
774 | LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models | Yupeng Su, Ziyi Guan, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu, Graziano Chesi, Ngai Wong, Hao Yu | 2024-08-20 | arXiv | https://github.com/YupengSu/LLM-Barber | https://doi.org/10.48550/arXiv.2408.10631 |
775 | CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models | Linhao Yu, Yongqi Leng, Yufei Huang, Shang Wu, Haixin Liu, Xinmeng Ji, Jiahui Zhao, Jinwang Song, Tingting Cui, Xiaoqing Cheng, Liutao Liutao, Deyi Xiong | 2024-08-19 | ACL | https://github.com/tjunlp-lab/CMoralEval | https://doi.org/10.18653/v1/2024.findings-acl.703 |
776 | Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework | Jiandong Jin, Xiao Wang, Qian Zhu, Haiyang Wang, Chenglong Li | 2024-08-19 | arXiv | https://github.com/Event-AHU/OpenPAR | https://doi.org/10.48550/arXiv.2408.09720 |
777 | R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation | Xiao Wang, Yuehang Li, Fuling Wang, Shiao Wang, Chuanfu Li, Bo Jiang | 2024-08-19 | arXiv | https://github.com/Event-AHU/Medical_Image_Analysis | https://doi.org/10.48550/arXiv.2408.09743 |
778 | AutoML-guided Fusion of Entity and LLM-based Representations for Document Classification | Boshko Koloski, Senja Pollak, Roberto Navigli, Blaž Škrlj | 2024-08-19 | arXiv | https://github.com/bkolosk1/bablfusion | http://arxiv.org/abs/2408.09794v2 |
779 | FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant | Zhengchao Huang, Bin Xia, Zicheng Lin, Zhun Mou, Wenming Yang, Jiaya Jia | 2024-08-19 | arXiv | https://ffaa-vl.github.io | https://doi.org/10.48550/arXiv.2408.10072 |
780 | Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning | Tiansheng Huang, Gautam Bhattacharya, Pratik Joshi, Josh Kimball, Ling Liu | 2024-08-18 | arXiv | https://huangtiansheng.github.io/Antidote_gh_page/ | https://doi.org/10.48550/arXiv.2408.09600 |
781 | HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model | Mengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu, Wenqi Shao, Ping Luo | 2024-08-18 | arXiv | https://github.com/HiAgent2024/HiAgent | https://doi.org/10.48550/arXiv.2408.09559 |
782 | PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding | Dawei Dai, Yuanhui Zhang, Long Xu, Qianlan Yang, Xiaojing Shen, Shuyin Xia, Guoyin Wang | 2024-08-18 | arXiv | https://github.com/ddw2AIGROUP2CQUPT/PA-LLaVA | https://doi.org/10.48550/arXiv.2408.09530 |
783 | TC-RAG: Turing-Complete RAG's Case study on Medical LLM Systems | Xinke Jiang, Yue Fang, Rihong Qiu, Haoyu Zhang, Yongxin Xu, Hao Chen, Wentao Zhang, Ruizhe Zhang, Yuchen Fang, Xu Chu, Junfeng Zhao, Yasha Wang | 2024-08-17 | arXiv | https://github.com/Artessay/SAMA | http://arxiv.org/abs/2408.09199v1 |
784 | Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks? | Zhongjian Zhang, Xiao Wang, Huichi Zhou, Yue Yu, Mengmei Zhang, Cheng Yang, Chuan Shi | 2024-08-16 | arXiv | https://github.com/zhongjian-zhang/LLM4RGNN | https://doi.org/10.48550/arXiv.2408.08685 |
785 | MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector | Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang | 2024-08-16 | arXiv | https://github.com/wjfu99/MIA-Tuner | https://doi.org/10.48550/arXiv.2408.08661 |
786 | Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges | Baixiang Huang, Canyu Chen, Kai Shu | 2024-08-16 | arXiv | https://llm-authorship.github.io | http://arxiv.org/abs/2408.08946v1 |
787 | Fine-tuning LLMs for Autonomous Spacecraft Control: A Case Study Using Kerbal Space Program | Alejandro Carrasco, Victor Rodriguez-Fernandez, Richard Linares | 2024-08-16 | arXiv | https://github.com/ARCLab-MIT/kspdg | http://arxiv.org/abs/2408.08676v1 |
788 | Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images | Zhiyuan Li, Heng Wang, Dongnan Liu, Chaoyi Zhang, Ao Ma, Jieting Long, Tom Weidong Cai | 2024-08-15 | arXiv | https://github.com/Zhiyuan-Li-John/MuCR | https://doi.org/10.48550/arXiv.2408.08105 |
789 | Prefix Guidance: A Steering Wheel for Large Language Models to Defend Against Jailbreak Attacks | Jiawei Zhao, Kejiang Chen, Xiaojian Yuan, Weiming Zhang | 2024-08-15 | arXiv | https://github.com/weiyezhimeng/Prefix-Guidance | https://doi.org/10.48550/arXiv.2408.08924 |
790 | Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models | Tianyu Wang, Haitao Lin, Junqiu Yu, Yanwei Fu | 2024-08-15 | IROS | https://star-uu-wang.github.io/Polaris/ | https://doi.org/10.1109/IROS58592.2024.10801446 |
791 | Can Large Language Models Understand Symbolic Graphics Programs? | Zeju Qiu, Weiyang Liu, Haiwen Feng, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf | 2024-08-15 | arXiv | https://sgp-bench.github.io/ | https://doi.org/10.48550/arXiv.2408.08313 |
792 | FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models | Zhongyu Zhao, Menghang Dong, Rongyu Zhang, Wenzhao Zheng, Yunpeng Zhang, Huanrui Yang, Dalong Du, Kurt Keutzer, Shanghang Zhang | 2024-08-15 | arXiv | https://github.com/zhenwuweihe/FactorLLM | https://doi.org/10.48550/arXiv.2408.11855 |
793 | ArabLegalEval: A Multitask Benchmark for Assessing Arabic Legal Knowledge in Large Language Models | Faris Hijazi, Somayah AlHarbi, Abdulaziz AlHussein, Harethah Abu Shairah, Reem Alzahrani, Hebah AlShamlan, George Turkiyyah, Omar Knio | 2024-08-15 | ArabicNLP | https://github.com/Thiqah/ArabLegalEval | https://aclanthology.org/2024.arabicnlp-1.20 |
794 | Evaluating Large Language Model based Personal Information Extraction and Countermeasures | Yupei Liu, Yuqi Jia, Jinyuan Jia, Neil Zhenqiang Gong | 2024-08-14 | arXiv | https://github.com/liu00222/LLM-Based-Personal-Profile-Extraction | https://doi.org/10.48550/arXiv.2408.07291 |
795 | Knowledge in Superposition: Unveiling the Failures of Lifelong Knowledge Editing for Large Language Models | Chenhui Hu, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao | 2024-08-14 | arXiv | https://github.com/ChenhuiHu/knowledge_in_superposition | https://doi.org/10.48550/arXiv.2408.07413 |
796 | Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | Enneng Yang, Li Shen, Guibing Guo, Xingwei Wang, Xiaochun Cao, Jie Zhang, Dacheng Tao | 2024-08-14 | arXiv | https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications | http://arxiv.org/abs/2408.07666v4 |
797 | LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | Yushi Bai, Jiajie Zhang, Xin Lv, Linzhi Zheng, Siqi Zhu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li | 2024-08-13 | arXiv | https://github.com/THUDM/LongWriter | http://arxiv.org/abs/2408.07055v1 |
798 | Kov: Transferable and Naturalistic Black-Box LLM Attacks using Markov Decision Processes and Tree Search | Robert J. Moss | 2024-08-11 | arXiv | https://github.com/sisl/Kov.jl | http://arxiv.org/abs/2408.08899v1 |
799 | Revisiting Multi-Modal LLM Evaluation | Jian Lu, Shikhar Srivastava, Junyu Chen, Robik Shrestha, Manoj Acharya, Kushal Kafle, Christopher Kanan | 2024-08-09 | arXiv | https://kevinlujian.github.io/MLLM_Evaluations/ | http://arxiv.org/abs/2408.05334v1 |
800 | SHIELD: LLM-Driven Schema Induction for Predictive Analytics in EV Battery Supply Chain Disruptions | Zhi-Qi Cheng, Yifei Dong, Aike Shi, Wei Liu, Yuzhi Hu, Jason O'Connor, Alexander G. Hauptmann, Kate S. Whitefoot | 2024-08-09 | arXiv | https://fly1113.github.io/MFI/ | http://arxiv.org/abs/2408.05357v2 |
801 | Tabular Transfer Learning via Prompting LLMs | Jaehyun Nam, Woomin Song, Seong Hyeon Park, Jihoon Tack, Sukmin Yun, Jaehyung Kim, Kyu Hwan Oh, Jinwoo Shin | 2024-08-09 | arXiv | https://github.com/jaehyun513/P2T | http://arxiv.org/abs/2408.11063v1 |
802 | VITA: Towards Open-Source Interactive Omni Multimodal LLM | Chaoyou Fu, Haojia Lin, Zuwei Long, Yunhang Shen, Meng Zhao, Yifan Zhang, Shaoqi Dong, Xiong Wang, Di Yin, Long Ma, Xiawu Zheng, Ran He, Rongrong Ji, Yunsheng Wu, Caifeng Shan, Xing Sun | 2024-08-09 | arXiv | https://vita-home.github.io | http://arxiv.org/abs/2408.05211v2 |
803 | BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models | Yupeng Chang, Yi Chang, Yuan Wu | 2024-08-08 | arXiv | https://github.com/cyp-jlu-ai/BA-LoRA | http://arxiv.org/abs/2408.04556v3 |
804 | ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities | Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Felix Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, Ruoming Pang | 2024-08-08 | arXiv | https://github.com/apple/ToolSandbox | http://arxiv.org/abs/2408.04682v1 |
805 | Open-domain Implicit Format Control for Large Language Model Generation | Yiqun Yao, Wenjia Ma, Xuezhi Fang, Xin Jiang, Xiang Li, Xuying Meng, Peng Han, Jing Li, Aixin Sun, Yequan Wang | 2024-08-08 | arXiv | https://github.com/cofe-ai/OIFC | https://doi.org/10.48550/arXiv.2408.04392 |
806 | Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation | Junde Wu, Jiayuan Zhu, Yunli Qi, Jingkun Chen, Min Xu, Filippo Menolascina, Vicente Grau | 2024-08-08 | arXiv | https://github.com/MedicineToken/Medical-Graph-RAG | https://doi.org/10.48550/arXiv.2408.04187 |
807 | CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Xiangyan Liu, Bo Lan, Zhiyuan Hu, Yang Liu, Zhicheng Zhang, Fei Wang, Michael Shieh, Wenmeng Zhou | 2024-08-07 | arXiv | https://github.com/modelscope/modelscope-agent/tree/master/apps/codexgraph_agent | https://doi.org/10.48550/arXiv.2408.03910 |
808 | NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time | Yilong Chen, Guoxia Wang, Junyuan Shang, Shiyao Cui, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun, Dianhai Yu, Hua Wu | 2024-08-07 | arXiv | https://github.com/PaddlePaddle/Research/tree/master/NLP/ACL2024-NACL | http://arxiv.org/abs/2408.03675v2 |
809 | WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | Prannaya Gupta, Le Qi Yau, Hao Han Low, I-Shiang Lee, Hugo Maximus Lim, Yu Xin Teoh, Jia Hng Koh, Dar Win Liew, Rishabh Bhardwaj, Rajat Bhardwaj, Soujanya Poria | 2024-08-07 | arXiv | https://github.com/walledai/walledeval | https://doi.org/10.48550/arXiv.2408.03837 |
810 | ULLME: A Unified Framework for Large Language Model Embeddings with Generation-Augmented Learning | Hieu Man, Nghia Trung Ngo, Franck Dernoncourt, Thien Huu Nguyen | 2024-08-06 | arXiv | https://github.com/nlp-uoregon/ullme | https://doi.org/10.48550/arXiv.2408.03402 |
811 | OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs | Hasan Iqbal, Yuxia Wang, Minghan Wang, Georgi Georgiev, Jiahui Geng, Iryna Gurevych, Preslav Nakov | 2024-08-06 | arXiv | https://github.com/mbzuai-nlp/openfactcheck | http://arxiv.org/abs/2408.11832v2 |
812 | Topic Modeling with Fine-tuning LLMs and Bag of Sentences | Johannes Schneider | 2024-08-06 | arXiv | https://github.com/JohnTailor/FT-Topic | http://arxiv.org/abs/2408.03099v1 |
813 | StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation | Boxi Cao, Mengjie Ren, Hongyu Lin, Xianpei Han, Feng Zhang, Junfeng Zhan, Le Sun | 2024-08-06 | ACL | https://github.com/c-box/StructEval | https://doi.org/10.18653/v1/2024.findings-acl.314 |
814 | Citekit: A Modular Toolkit for Large Language Model Citation Generation | Jiajun Shen, Tong Zhou, Suifeng Zhao, Yubo Chen, Kang Liu | 2024-08-06 | arXiv | https://github.com/SjJ1017/Citekit | https://doi.org/10.48550/arXiv.2408.04662 |
815 | UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model | Zhaowei Li, Wei Wang, Yiqing Cai, Qi Xu, Pengyu Wang, Dong Zhang, Hang Song, Botian Jiang, Zhida Huang, Tao Wang | 2024-08-05 | arXiv | https://github.com/lzw-lzw/UnifiedMLLM | https://doi.org/10.48550/arXiv.2408.02503 |
816 | Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models | Zi Liang, Haibo Hu, Qingqing Ye, Yaxin Xiao, Haoyang Li | 2024-08-05 | arXiv | https://github.com/liangzid/PromptExtractionEval | https://doi.org/10.48550/arXiv.2408.02416 |
817 | RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation | Daniel Fleischer, Moshe Berchansky, Moshe Wasserblat, Peter Izsak | 2024-08-05 | arXiv | https://github.com/IntelLabs/RAGFoundry | http://arxiv.org/abs/2408.02545v1 |
818 | ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems | Andrew Zhu, Liam Dugan, Chris Callison-Burch | 2024-08-05 | arXiv | https://github.com/zhudotexe/redel | http://arxiv.org/abs/2408.02248v2 |
819 | SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models | Muxi Diao, Rumei Li, Shiyang Liu, Guogang Liao, Jingang Wang, Xunliang Cai, Weiran Xu | 2024-08-05 | arXiv | https://SEAS-LLM.github.io/ | https://doi.org/10.48550/arXiv.2408.02632 |
820 | PLUGH: A Benchmark for Spatial Understanding and Reasoning in Large Language Models | Alexey Tikhonov | 2024-08-03 | arXiv | https://github.com/altsoph/PLUGH | https://doi.org/10.48550/arXiv.2408.04648 |
821 | MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance | Jihye Choi, Nils Palumbo, Prasad Chalasani, Matthew M. Engelhard, Somesh Jha, Anivarya Kumar, David Page | 2024-08-03 | arXiv | https://github.com/jihyechoi77/malade | http://arxiv.org/abs/2408.01869v1 |
822 | CFBench: A Comprehensive Constraints-Following Benchmark for LLMs | Tao Zhang, Yanjun Shen, Wenjing Luo, Yan Zhang, Hao Liang, Tao Zhang, Fan Yang, Mingan Lin, Yujing Qiao, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou | 2024-08-02 | arXiv | https://github.com/PKU-Baichuan-MLSystemLab/CFBench | http://arxiv.org/abs/2408.01122v1 |
823 | Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs | Yilun Hua, Yoav Artzi | 2024-08-02 | arXiv | https://github.com/lil-lab/ICCA | http://arxiv.org/abs/2408.01417v1 |
824 | Agentic LLM Workflows for Generating Patient-Friendly Medical Reports | Malavikha Sudarshan, Sophie Shih, Estella Yee, Alina Yang, John Zou, Cathy Chen, Quan Zhou, Leon Chen, Chinmay Singhal, George Shih | 2024-08-02 | arXiv | https://github.com/malavikhasudarshan/Multi-Agent-Patient-Letter-Generation | http://arxiv.org/abs/2408.01112v2 |
825 | Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs | Peng Ding, Jingyu Wu, Jun Kuang, Dan Ma, Xuezhi Cao, Xunliang Cai, Shi Chen, Jiajun Chen, Shujian Huang | 2024-08-02 | ACM Multimedia | https://github.com/NJUNLP/Hallu-PI | https://doi.org/10.1145/3664647.3681251 |
826 | Non Verbis, Sed Rebus: Large Language Models Are Weak Solvers of Italian Rebuses | Gabriele Sarti, Tommaso Caselli, Malvina Nissim, Arianna Bisazza | 2024-08-01 | CLiC-it | https://github.com/gsarti/verbalized-rebus | https://ceur-ws.org/Vol-3878/96_main_long.pdf |
827 | Large Language Model-driven Meta-structure Discovery in Heterogeneous Information Network | Lin Chen, Fengli Xu, Nian Li, Zhenyu Han, Meng Wang, Yong Li, Pan Hui | 2024-08 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/LinChen-65/ReStruct | https://dl.acm.org/doi/10.1145/3637528.3671965 |
828 | Neural Retrievers are Biased Towards LLM-Generated Content | Sunhao Dai, Yuqi Zhou, Liang Pang, Weihao Liu, Xiaolin Hu, Yong Liu, Xiao Zhang, Gang Wang, Jun Xu | 2024-08 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/KID-22/Source-Bias | https://dl.acm.org/doi/10.1145/3637528.3671882 |
829 | Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy | Yao Zhao, Zhitian Xie, Chen Liang, Chenyi Zhuang, Jinjie Gu | 2024-08 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/alipay/PainlessInferenceAcceleration | https://dl.acm.org/doi/10.1145/3637528.3671614 |
830 | RecExplainer: Aligning Large Language Models for Explaining Recommendation Models | Yuxuan Lei, Jianxun Lian, Jing Yao, Xu Huang, Defu Lian, Xing Xie | 2024-08 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/microsoft/RecAI | https://dl.acm.org/doi/10.1145/3637528.3671802 |
831 | AutoWebGLM: A Large Language Model-based Web Navigating Agent | Hanyu Lai, Xiao Liu, Iat Long Iong, Shuntian Yao, Yuxuan Chen, Pengbo Shen, Hao Yu, Hanchen Zhang, Xiaohan Zhang, Yuxiao Dong, Jie Tang | 2024-08 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/THUDM/AutoWebGLM | https://dl.acm.org/doi/10.1145/3637528.3671620 |
832 | A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models | Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li | 2024-08 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/ | https://dl.acm.org/doi/10.1145/3637528.3671470 |
833 | A Survey of Large Language Models for Graphs | Xubin Ren, Jiabin Tang, Dawei Yin, Nitesh V. Chawla, Chao Huang | 2024-08 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/HKUDS/Awesome-LLM4Graph-Papers | https://dl.acm.org/doi/10.1145/3637528.3671460 |
834 | ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models | Mingrui Wu, Xinyue Cai, Jiayi Ji, Jiale Li, Oucheng Huang, Gen Luo, Hao Fei, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji | 2024-07-31 | arXiv | https://github.com/mrwu-mac/ControlMLLM | https://doi.org/10.48550/arXiv.2407.21534 |
835 | Automated Review Generation Method Based on Large Language Models | Shican Wu, Xiao Ma, Dehui Luo, Lulu Li, Xiangcheng Shi, Xin Chang, Xiaoyun Lin, Ran Luo, Chunlei Pei, Changyin Du, Zhi-Jian Zhao, Jinlong Gong | 2024-07-30 | arXiv | https://github.com/TJU-ECAT-AI/AutomaticReviewGeneration | https://doi.org/10.48550/arXiv.2407.20906 |
836 | CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare | Jingwei Zhu, Minghuan Tan, Min Yang, Ruixue Li, Hamid Alinejad-Rokny | 2024-07-29 | arXiv | https://github.com/CAS-SIAT-XinHai/CollectiveSFT | https://doi.org/10.48550/arXiv.2407.19705 |
837 | Can Editing LLMs Inject Harm? | Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao, Xifeng Yan, William Yang Wang, Philip Torr, Dawn Song, Kai Shu | 2024-07-29 | arXiv | https://llm-editing.github.io | http://arxiv.org/abs/2407.20224v3 |
838 | rLLM: Relational Table Learning with LLMs | Weichen Li, Xiaotong Huang, Jianwu Zheng, Zheng Wang, Chaokun Wang, Li Pan, Jianhua Li | 2024-07-29 | arXiv | https://github.com/rllm-project/rllm | http://arxiv.org/abs/2407.20157v1 |
839 | A Role-specific Guided Large Language Model for Ophthalmic Consultation Based on Stylistic Differentiation | Laiyi Fu, Binbin Fan, Hongkai Du, Yanxiang Feng, Chunhua Li, Huping Song | 2024-07-26 | arXiv | https://github.com/sperfu/EyeDoc | https://doi.org/10.48550/arXiv.2407.18483 |
840 | Exploring Bengali Religious Dialect Biases in Large Language Models with Evaluation Perspectives | Azmine Toushik Wasi, Raima Islam, Mst Rafia Islam, Taki Hasan Rafi, Dong-Kyu Chae | 2024-07-25 | arXiv | https://heal-workshop.github.io/ | https://doi.org/10.48550/arXiv.2407.18376 |
841 | Scalify: scale propagation for efficient low-precision LLM training | Paul Balança, Sam Hosegood, Carlo Luschi, Andrew Fitzgibbon | 2024-07-24 | arXiv | https://github.com/graphcore-research/jax-scalify | http://arxiv.org/abs/2407.17353v1 |
842 | Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching | Yuyang Ding, Hanglei Hu, Jie Zhou, Qin Chen, Bo Jiang, Liang He | 2024-07-24 | CIKM | https://github.com/ECNU-ICALK/SocraticMath | https://doi.org/10.1145/3627673.3679881 |
843 | Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance | Ao Shen, Qiang Wang, Zhiquan Lai, Xionglve Li, Dong-sheng Li | 2024-07-24 | arXiv | https://github.com/xiaocaigou/qbaraqahira | https://doi.org/10.48550/arXiv.2407.17029 |
844 | Figure it Out: Analyzing-based Jailbreak Attack on Large Language Models | Shi Lin, Rongchang Li, Xun Wang, Changting Lin, Wenpeng Xing, Meng Han | 2024-07-23 | arXiv | https://github.com/theshi-1128/ABJ-Attack | https://doi.org/10.48550/arXiv.2407.16205 |
845 | INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model | Yiwei Ma, Zhibin Wang, Xiaoshuai Sun, Weihuang Lin, Qiang Zhou, Jiayi Ji, Rongrong Ji | 2024-07-23 | arXiv | https://github.com/WeihuangLin/INF-LLaVA | https://doi.org/10.48550/arXiv.2407.16198 |
846 | UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models | Qi Liu, Yongyi He, Defu Lian, Zhi Zheng, Tong Xu, Liu Che, Enhong Chen | 2024-07-23 | arXiv | https://github.com/Javkonline/UniMEL | https://doi.org/10.48550/arXiv.2407.16160 |
847 | Enhancing LLM's Cognition via Structurization | Kai Liu, Zhihang Fu, Chao Chen, Wei Zhang, Rongxin Jiang, Fan Zhou, Yaowu Chen, Yue Wu, Jieping Ye | 2024-07-23 | arXiv | https://github.com/alibaba/struxgpt | http://arxiv.org/abs/2407.16434v2 |
848 | Rome was Not Built in a Single Step: Hierarchical Prompting for LLM-based Chip Design | Andre Nakkab, Sai Qian Zhang, Ramesh Karri, Siddharth Garg | 2024-07-23 | arXiv | https://github.com/ajn313/ROME-LLM | http://arxiv.org/abs/2407.18276v3 |
849 | Structure-aware Domain Knowledge Injection for Large Language Models | Kai Liu, Ze Chen, Zhihang Fu, Rongxin Jiang, Fan Zhou, Yaowu Chen, Yue Wu, Jieping Ye | 2024-07-23 | arXiv | https://github.com/alibaba/struxgpt | http://arxiv.org/abs/2407.16724v2 |
850 | Counter Turing Test (CT^2): Investigating AI-Generated Text Detection for Hindi - Ranking LLMs based on Hindi AI Detectability Index (ADI_hi) | Ishan Kavathekar, Anku Rani, Ashmit Chamoli, Ponnurangam Kumaraguru, Amit Sheth, Amitava Das | 2024-07-22 | OpenReview | https://github.com/ishank31/Counter_Turing_Test | http://arxiv.org/abs/2407.15694v2 |
851 | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Mingze Xu, Mingfei Gao, Zhe Gan, Hong-You Chen, Zhengfeng Lai, Haiming Gang, Kai Kang, Afshin Dehghan | 2024-07-22 | arXiv | https://github.com/apple/ml-slowfast-llava | https://doi.org/10.48550/arXiv.2407.15841 |
852 | LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models | Xi Chen, Songyang Zhang, Qibing Bai, Kai Chen, Satoshi Nakamura | 2024-07-22 | ACL | https://github.com/openaudiolab/LLaST | https://doi.org/10.18653/v1/2024.findings-acl.416 |
853 | Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models | Wenbin An, Feng Tian, Jiahao Nie, Wenkai Shi, Haonan Lin, Yan Chen, Qianying Wang, Yaqiang Wu, Guang Dai, Ping Chen | 2024-07-22 | arXiv | https://github.com/Lackel/DKA | https://doi.org/10.48550/arXiv.2407.15346 |
854 | Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability | Zhuoyan Xu, Zhenmei Shi, Yingyu Liang | 2024-07-22 | arXiv | https://github.com/OliverXUZY/LLM_Compose | https://doi.org/10.48550/arXiv.2407.15720 |
855 | Large Language Model for Verilog Generation with Golden Code Feedback | Ning Wang, Bingkun Yao, Jie Zhou, Xi Wang, Zhe Jiang, Nan Guan | 2024-07-21 | arXiv | https://github.com/CatIIIIIIII/veriseek | https://doi.org/10.48550/arXiv.2407.18271 |
856 | Navigation Instruction Generation with BEV Perception and Large Language Models | Sheng Fan, Rui Liu, Wenguan Wang, Yi Yang | 2024-07-21 | ECCV | https://github.com/FanScy/BEVInstructor | https://doi.org/10.1007/978-3-031-72670-5_21 |
857 | BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM | Hanjun Luo, Haoyu Huang, Ziye Deng, Xuecheng Liu, Ruizhe Chen, Zuozhu Liu | 2024-07-21 | arXiv | https://github.com/BIGbench2024/BIGbench2024/ | http://arxiv.org/abs/2407.15240v3 |
858 | No Size Fits All: The Perils and Pitfalls of Leveraging LLMs Vary with Company Size | Ashok Urlana, Charaka Vinayak Kumar, Bala Mallikarjunarao Garlapati, Ajeet Kumar Singh, Rahul Mishra | 2024-07-21 | arXiv | https://github.com/vinayakcse/IndustrialLLMsPapers | http://arxiv.org/abs/2408.01444v2 |
859 | Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | Yiyang Jiang, Wengyu Zhang, Xulu Zhang, Xiaoyong Wei, Chang Wen Chen, Qing Li | 2024-07-21 | arXiv | https://github.com/fletcherjiang/LLMEPET | http://arxiv.org/abs/2407.15051v3 |
860 | SynCPKL: Harnessing LLMs to Generate Synthetic Data for Commonsense Persona Knowledge Linking | Kuan-Yen Lin | 2024-07-21 | arXiv | https://github.com/irislin1006/CPKL | http://arxiv.org/abs/2407.15281v1 |
861 | On the Design and Analysis of LLM-Based Algorithms | Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou | 2024-07-20 | arXiv | https://github.com/modelscope/agentscope/tree/main/examples/paper_llm_based_algorithm | http://arxiv.org/abs/2407.14788v2 |
862 | Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models | Xuenan Xu, Pingyue Zhang, Ming Yan, Ji Zhang, Mengyue Wu | 2024-07-19 | arXiv | https://github.com/wsntxxn/AttrEnhZsAc | https://doi.org/10.48550/arXiv.2407.14355 |
863 | Beyond Code Generation: Assessing Code LLM Maturity with Postconditions | Fusen He, Juan Zhai, Minxue Pan | 2024-07-19 | arXiv | https://github.com/MatureModel/PostcondGen | http://arxiv.org/abs/2407.14118v1 |
864 | Internal Consistency and Self-Feedback in Large Language Models: A Survey | Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Yi Wang, Zhonghao Wang, Feiyu Xiong, Zhiyu Li | 2024-07-19 | arXiv | https://github.com/IAAR-Shanghai/ICSFSurvey | https://doi.org/10.48550/arXiv.2407.14507 |
865 | SegPoint: Segment Any Point Cloud via Large Language Model | Shuting He, Henghui Ding, Xudong Jiang, Bihan Wen | 2024-07-18 | arXiv | https://heshuting555.github.io/SegPoint | https://doi.org/10.48550/arXiv.2407.13761 |
866 | ViLLa: Video Reasoning Segmentation with Large Language Model | Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao | 2024-07-18 | arXiv | https://github.com/rkzheng99/ViLLa | https://doi.org/10.48550/arXiv.2407.14500 |
867 | E5-V: Universal Embeddings with Multimodal Large Language Models | Ting Jiang, Minghui Song, Zihan Zhang, Haizhen Huang, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang | 2024-07-17 | arXiv | https://github.com/kongds/E5-V | https://doi.org/10.48550/arXiv.2407.12580 |
868 | Leveraging Environment Interaction for Automated PDDL Generation and Planning with Large Language Models | Sadegh Mahdavi, Raquel Aoki, Keyi Tang, Yanshuai Cao | 2024-07-17 | arXiv | https://github.com/BorealisAI/llm-pddl-planning | https://doi.org/10.48550/arXiv.2407.12979 |
869 | MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models | Leyang Shen, Gongwei Chen, Rui Shao, Weili Guan, Liqiang Nie | 2024-07-17 | arXiv | https://github.com/JiuTian-VL/MoME | https://doi.org/10.48550/arXiv.2407.12709 |
870 | Patch-Level Training for Large Language Models | Chenze Shao, Fandong Meng, Jie Zhou | 2024-07-17 | arXiv | https://github.com/shaochenze/PatchTrain | https://doi.org/10.48550/arXiv.2407.12665 |
871 | VISA: Reasoning Video Object Segmentation via Large Language Models | Cilin Yan, Haochen Wang, Shilin Yan, Xiaolong Jiang, Yao Hu, Guoliang Kang, Weidi Xie, Efstratios Gavves | 2024-07-16 | ECCV | https://github.com/cilinyan/VISA | https://doi.org/10.1007/978-3-031-72633-0_6 |
872 | NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? | Mo Li, Songyang Zhang, Yunxin Liu, Kai Chen | 2024-07-16 | arXiv | https://github.com/open-compass/opencompass | http://arxiv.org/abs/2407.11963v1 |
873 | Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Jiasheng Zheng, Boxi Cao, Zhengzhao Ma, Ruotong Pan, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun | 2024-07-16 | arXiv | https://github.com/jszheng21/RACE | https://doi.org/10.48550/arXiv.2407.11470 |
874 | Robust Utility-Preserving Text Anonymization Based on Large Language Models | Tianyu Yang, Xiaodan Zhu, Iryna Gurevych | 2024-07-16 | arXiv | https://github.com/UKPLab/arxiv2024-rupta | https://doi.org/10.48550/arXiv.2407.11770 |
875 | LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices | Jung Hyun Lee, Jeonghoon Kim, June Yong Yang, Se Jung Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee | 2024-07-16 | arXiv | https://github.com/onliwad101/FlexRound_LRQ | https://doi.org/10.48550/arXiv.2407.11534 |
876 | Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval | Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Cehao Yang, Jiaxin Mao, Jian Guo | 2024-07-15 | arXiv | https://github.com/IDEA-FinAI/ToG-2 | https://doi.org/10.48550/arXiv.2407.10805 |
877 | When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments | Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang, Lingyao Li, Zhenting Wang, Wenyue Hua, Dong Shu, Suiyuan Zhu, Xiaobo Jin, Sujian Li, Mengnan Du, Yongfeng Zhang | 2024-07-15 | arXiv | https://github.com/MingyuJ666/Stockagent | https://doi.org/10.48550/arXiv.2407.18957 |
878 | VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation | Bocheng Zou, Mu Cai, Jianrui Zhang, Yong Jae Lee | 2024-07-15 | EMNLP | https://vgbench.github.io | https://aclanthology.org/2024.emnlp-main.213 |
879 | Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models | Qingcheng Zeng, Mingyu Jin, Qinkai Yu, Zhenting Wang, Wenyue Hua, Zihao Zhou, Guangyan Sun, Yanda Meng, Shiqing Ma, Qifan Wang, Felix Juefei-Xu, Kaize Ding, Fan Yang, Ruixiang Tang, Yongfeng Zhang | 2024-07-15 | arXiv | https://github.com/qcznlp/uncertainty_attack | https://doi.org/10.48550/arXiv.2407.11282 |
880 | By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting | Hyungjun Yoon, Biniyam Aschalew Tolera, Taesik Gong, Kimin Lee, Sung-Ju Lee | 2024-07-15 | EMNLP | https://github.com/diamond264/ByMyEyes | https://aclanthology.org/2024.emnlp-main.133 |
881 | Prompt Selection Matters: Enhancing Text Annotations for Social Sciences with Large Language Models | Louis Abraham, Charles Arnal, Antoine Marie | 2024-07-15 | arXiv | https://prompt-ultra.github.io/ | https://doi.org/10.48550/arXiv.2407.10645 |
882 | Evaluating Large Language Models with fmeval | Pola Schwöbel, Luca Franceschi, Muhammad Bilal Zafar, Keerthan Vasist, Aman Malhotra, Tomer Shenhar, Pinal Tailor, Pinar Yilmaz, Michael Diamond, Michele Donini | 2024-07-15 | arXiv | https://github.com/aws/fmeval | https://doi.org/10.48550/arXiv.2407.12872 |
883 | MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models | Chengguang Gan, Sunbowen Lee, Qingyu Yin, Xinyang He, Hanjun Wei, Yunhao Liang, Younghun Lim, Shijian Wang, Hexiang Huang, Qinghao Zhang, Shiwen Ni, Tatsunori Mori | 2024-07-15 | arXiv | https://ganchengguang.github.io/MRE/ | https://doi.org/10.48550/arXiv.2407.10953 |
884 | IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization | Jie Cao, Dian Jiao, Qiang Yan, Wenqiao Zhang, Siliang Tang, Yueting Zhuang | 2024-07-15 | arXiv | https://github.com/DCDmllm/IDEAL_Summary | https://doi.org/10.48550/arXiv.2407.10486 |
885 | ChatLogic: Integrating Logic Programming with Large Language Models for Multi-Step Reasoning | Zhongsheng Wang, Jiamou Liu, Qiming Bao, Hongfei Rong, Jingfeng Zhang | 2024-07-14 | IJCNN | https://github.com/Strong-AI-Lab/ChatLogic | https://doi.org/10.1109/IJCNN60899.2024.10650138 |
886 | Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models | Yuchen Yang, Kwonjoon Lee, Behzad Dariush, Yinzhi Cao, Shao-Yuan Lo | 2024-07-14 | ECCV | https://github.com/Yuchen413/AnomalyRuler | https://doi.org/10.1007/978-3-031-73004-7_18 |
887 | Refusing Safe Prompts for Multi-modal Large Language Models | Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong | 2024-07-12 | arXiv | https://github.com/Sadcardation/MLLM-Refusal | https://doi.org/10.48550/arXiv.2407.09050 |
888 | Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors | Nico Daheim, Jakub Macina, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan | 2024-07-12 | EMNLP | https://github.com/eth-lre/verify-then-generate | https://aclanthology.org/2024.emnlp-main.478 |
889 | Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection | Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu | 2024-07-12 | arXiv | https://github.com/GradiusTwinbee/GLIS | http://arxiv.org/abs/2407.08931v1 |
890 | Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training | Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu | 2024-07-12 | arXiv | https://github.com/RobustNLP/DeRTa | http://arxiv.org/abs/2407.09121v1 |
891 | Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility | Yuchen Xia, Jize Zhang, Nasser Jazdi, Michael Weyrich | 2024-07-11 | arXiv | https://github.com/YuchenXia/GPT4IndustrialAutomation | https://doi.org/10.48550/arXiv.2407.08550 |
892 | SEED-Story: Multimodal Long Story Generation with Large Language Model | Shuai Yang, Yuying Ge, Yang Li, Yukang Chen, Yixiao Ge, Ying Shan, Yingcong Chen | 2024-07-11 | arXiv | https://github.com/TencentARC/SEED-Story | https://doi.org/10.48550/arXiv.2407.08683 |
893 | The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective | Zhen Qin, Daoyuan Chen, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li, Shuiguang Deng | 2024-07-11 | arXiv | https://github.com/modelscope/data-juicer/blob/main/docs/awesome_llm_data.md | https://doi.org/10.48550/arXiv.2407.08583 |
894 | Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing | Huanqian Wang, Yang Yue, Rui Lu, Jingxin Shi, Andrew Zhao, Shenzhi Wang, Shiji Song, Gao Huang | 2024-07-11 | arXiv | https://github.com/lucywang720/model-surgery | http://arxiv.org/abs/2407.08770v1 |
895 | EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo | 2024-07-10 | arXiv | https://github.com/OpenGVLab/EfficientQAT | https://doi.org/10.48550/arXiv.2407.11062 |
896 | GLBench: A Comprehensive Benchmark for Graph with Large Language Models | Yuhan Li, Peisong Wang, Xiao Zhu, Aochuan Chen, Haiyun Jiang, Deng Cai, Victor Wai Kin Chan, Jia Li | 2024-07-10 | arXiv | https://github.com/NineAbyss/GLBench | https://doi.org/10.48550/arXiv.2407.07457 |
897 | Inference Performance Optimization for Large Language Models on CPUs | Pujiang He, Shan Zhou, Wenhuan Huang, Changqing Li, Duyi Wang, Bin Guo, Chen Meng, Sheng Gui, Weifei Yu, Yi Xie | 2024-07-10 | arXiv | https://github.com/intel/xFasterTransformer | https://doi.org/10.48550/arXiv.2407.07304 |
898 | RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization | Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng | 2024-07-10 | arXiv | https://github.com/HuangOwen/RoLoRA | http://arxiv.org/abs/2407.08044v2 |
899 | FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation | Liqun Ma, Mingjie Sun, Zhiqiang Shen | 2024-07-09 | arXiv | https://github.com/LiqunMa/FBI-LLM | http://arxiv.org/abs/2407.07093v1 |
900 | Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems | Amey Agrawal, Anmol Agarwal, Nitin Kedia, Jayashree Mohan, Souvik Kundu, Nipun Kwatra, Ramachandran Ramjee, Alexey Tumanov | 2024-07-09 | arXiv | https://github.com/project-etalon/etalon | http://arxiv.org/abs/2407.07000v2 |
901 | Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps | Yung-Sung Chuang, Linlu Qiu, Cheng-Yu Hsieh, Ranjay Krishna, Yoon Kim, James R. Glass | 2024-07-09 | EMNLP | https://github.com/voidism/Lookback-Lens | https://aclanthology.org/2024.emnlp-main.84 |
902 | DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations | Luke Yoffe, Alfonso Amayuelas, William Yang Wang | 2024-07-08 | arXiv | https://github.com/lukeyoffe/debunc | https://doi.org/10.48550/arXiv.2407.06426 |
903 | LLMBox: A Comprehensive Library for Large Language Models | Tianyi Tang, Yiwen Hu, Bingqian Li, Wenyang Luo, Zijing Qin, Haoxiang Sun, Jiapeng Wang, Shiyi Xu, Xiaoxue Cheng, Geyang Guo, Han Peng, Bowen Zheng, Yiru Tang, Yingqian Min, Yushuo Chen, Jie Chen, Yuanqian Zhao, Luran Ding, Yuhao Wang, Zican Dong, Chunxuan Xia, Junyi Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen | 2024-07-08 | arXiv | https://github.com/RUCAIBox/LLMBox | https://doi.org/10.48550/arXiv.2407.05563 |
904 | iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement | Aoyu Pang, Maonan Wang, Man-On Pun, Chung Shue Chen, Xi Xiong | 2024-07-08 | arXiv | https://github.com/Traffic-Alpha/iLLM-TSC | https://doi.org/10.48550/arXiv.2407.06025 |
905 | GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing | Zhenyu Wang, Aoxue Li, Zhenguo Li, Xihui Liu | 2024-07-08 | arXiv | https://zhenyuw16.github.io/GenArtist_page | http://arxiv.org/abs/2407.05600v2 |
906 | KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions | Yanxu Zhu, Jinlin Xiao, Yuhang Wang, Jitao Sang | 2024-07-08 | arXiv | https://github.com/yanxuzhu/KG-FPQ | http://arxiv.org/abs/2407.05868v2 |
907 | LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages | Yinquan Lu, Wenhao Zhu, Lei Li, Yu Qiao, Fei Yuan | 2024-07-08 | arXiv | https://github.com/CONE-MT/LLaMAX/ | http://arxiv.org/abs/2407.05975v2 |
908 | PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation | Jinpeng Hu, Tengteng Dong, Luo Gang, Hui Ma, Peng Zou, Xiao Sun, Dan Guo, Xun Yang, Meng Wang | 2024-07-08 | arXiv | https://github.com/MACLAB-HFUT/PsycoLLM | http://arxiv.org/abs/2407.05721v3 |
909 | LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Yijia Xiao, Edward Sun, Tianyu Liu, Wei Wang | 2024-07-06 | arXiv | https://github.com/Yijia-Xiao/LogicVista | http://arxiv.org/abs/2407.04973v1 |
910 | Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression | Zhichao Xu, Ashim Gupta, Tao Li, Oliver Bentham, Vivek Srikumar | 2024-07-06 | arXiv | https://github.com/zhichaoxu-shufe/Beyond-Perplexity-Compression-Safety-Eval | http://arxiv.org/abs/2407.04965v3 |
911 | ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models | Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen | 2024-07-05 | arXiv | https://github.com/open-compass/ANAH | https://doi.org/10.48550/arXiv.2407.04693 |
912 | AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents | Petr Anokhin, Nikita Semenov, Artyom Sorokin, Dmitry Evseev, Mikhail Burtsev, Evgeny Burnaev | 2024-07-05 | arXiv | https://github.com/AIRI-Institute/AriGraph | http://arxiv.org/abs/2407.04363v2 |
913 | Automating Venture Capital: Founder assessment using LLM-powered segmentation, feature engineering and automated labeling techniques | Ekin Ozince, Yiğit Ihlamur | 2024-07-05 | arXiv | https://github.com/velapartners/moneyball-LLM-based-founder-features | http://arxiv.org/abs/2407.04885v1 |
914 | BiosERC: Integrating Biography Speakers Supported by LLMs for ERC Tasks | Jieying Xue, Minh Phuong Nguyen, Blake Matheny, Le Minh Nguyen | 2024-07-05 | arXiv | https://github.com/yingjie7/BiosERC | http://arxiv.org/abs/2407.04279v1 |
915 | Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs | Mihir Parmar, Hanieh Deilamsalehy, Franck Dernoncourt, Seunghyun Yoon, Ryan A. Rossi, Trung Bui | 2024-07-05 | arXiv | https://github.com/Mihir3009/Extract-AI | http://arxiv.org/abs/2407.04855v1 |
916 | Waterfall: Framework for Robust and Scalable Text Watermarking and Provenance for LLMs | Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan-Sheng Foo, Bryan Kian Hsiang Low | 2024-07-05 | arXiv | https://github.com/aoi3142/Waterfall | http://arxiv.org/abs/2407.04411v2 |
917 | When LLMs Play the Telephone Game: Cumulative Changes and Attractors in Iterated Cultural Transmissions | Jérémy Perez, Grgur Kovač, Corentin Léger, Cédric Colas, Gaia Molinaro, Maxime Derex, Pierre-Yves Oudeyer, Clément Moulin-Frier | 2024-07-05 | arXiv | https://github.com/jeremyperez2/TelephoneGameLLM | http://arxiv.org/abs/2407.04503v2 |
918 | AutoBench: Automatic Testbench Generation and Evaluation Using LLMs for HDL Design | Ruidi Qiu, Grace Li Zhang, Rolf Drechsler, Ulf Schlichtmann, Bing Li | 2024-07-04 | arXiv | https://github.com/AutoBench/AutoBench | http://arxiv.org/abs/2407.03891v2 |
919 | Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation | Yi-Chen Li, Fuxiang Zhang, Wenjie Qiu, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu, Bo An | 2024-07-04 | arXiv | https://github.com/mansicer/Q-Adapter | http://arxiv.org/abs/2407.03856v3 |
920 | TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models | Jiahuan Cao, Dezhi Peng, Peirong Zhang, Yongxin Shi, Yang Liu, Kai Ding, Lianwen Jin | 2024-07-04 | EMNLP | https://github.com/SCUT-DLVCLab/TongGu-LLM | https://aclanthology.org/2024.findings-emnlp.243 |
921 | The Price of Prompting: Profiling Energy Use in Large Language Models Inference | Erik Johannes Husom, Arda Goknil, Lwin Khin Shar, Sagar Sen | 2024-07-04 | arXiv | https://github.com/ejhusom/MELODI | https://doi.org/10.48550/arXiv.2407.16893 |
922 | NutriBench: A Dataset for Evaluating Large Language Models in Carbohydrate Estimation from Meal Descriptions | Andong Hua, Mehak Preet Dhaliwal, Ryan Burke, Laya Pullela, Yao Qin | 2024-07-04 | arXiv | https://mehak126.github.io/nutribench.html | https://doi.org/10.48550/arXiv.2407.12843 |
923 | CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models | Ying Nie, Binwei Yan, Tianyu Guo, Hao Liu, Haoyu Wang, Wei He, Binfan Zheng, Weihao Wang, Qiang Li, Weijian Sun, Yunhe Wang, Dacheng Tao | 2024-07-02 | arXiv | https://cfinbench.github.io/ | https://doi.org/10.48550/arXiv.2407.02301 |
924 | Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation | Pablo Messina, René Vidal, Denis Parra, Alvaro Soto, Vladimir Araujo | 2024-07-02 | ACL | https://github.com/PabloMessina/CXR-Fact-Encoder | https://doi.org/10.18653/v1/2024.findings-acl.236 |
925 | Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | Zihan Wang, Deli Chen, Damai Dai, Runxin Xu, Zhuoshu Li, Y. Wu | 2024-07-02 | arXiv | https://github.com/deepseek-ai/ESFT | https://doi.org/10.48550/arXiv.2407.01906 |
926 | To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models | Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang | 2024-07-02 | EMNLP | https://github.com/zjunlp/KnowUnDo | https://aclanthology.org/2024.findings-emnlp.82 |
927 | Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis | Chahat Raj, Anjishnu Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu | 2024-07-02 | arXiv | https://github.com/chahatraj/breakingbias | http://arxiv.org/abs/2407.02030v1 |
928 | Enabling Discriminative Reasoning in LLMs for Legal Judgment Prediction | Chenlong Deng, Kelong Mao, Yuyao Zhang, Zhicheng Dou | 2024-07-02 | arXiv | https://github.com/ChenlongDeng/ADAPT | http://arxiv.org/abs/2407.01964v4 |
929 | TokenPacker: Efficient Visual Projector for Multimodal LLM | Wentong Li, Yuqian Yuan, Jian Liu, Dongqi Tang, Song Wang, Jie Qin, Jianke Zhu, Lei Zhang | 2024-07-02 | arXiv | https://github.com/CircleRadon/TokenPacker | http://arxiv.org/abs/2407.02392v4 |
930 | MalAlgoQA: Pedagogical Evaluation of Counterfactual Reasoning in Large Language Models and Implications for AI in Education | Shashank Sonkar, Naiming Liu, Myco Le, Richard G. Baraniuk | 2024-07-01 | EMNLP | https://github.com/luffycodes/MalAlgoQA-Dataset | https://aclanthology.org/2024.findings-emnlp.913 |
931 | MIRAI: Evaluating LLM Agents for Event Forecasting | Chenchen Ye, Ziniu Hu, Yihe Deng, Zijie Huang, Mingyu Derek Ma, Yanqiao Zhu, Wei Wang | 2024-07-01 | arXiv | https://mirai-llm.github.io/ | http://arxiv.org/abs/2407.01231v1 |
932 | FineSurE: Fine-grained Summarization Evaluation using LLMs | Hwanjun Song, Hang Su, Igor Shalyminov, Jason Cai, Saab Mansour | 2024-07-01 | arXiv | https://github.com/DISL-Lab/FineSurE-ACL24 | http://arxiv.org/abs/2407.00908v3 |
933 | SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models | Zheng Lin, Xuanjie Hu, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Ang Li, Praneeth Vepakomma, Yue Gao | 2024-07-01 | arXiv | https://fduinc.github.io/splitlora/ | https://doi.org/10.48550/arXiv.2407.00952 |
934 | DiscoveryBench: Towards Data-Driven Discovery with Large Language Models | Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark | 2024-07-01 | arXiv | https://github.com/allenai/discoverybench | https://doi.org/10.48550/arXiv.2407.01725 |
935 | Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement | Zisu Huang, Xiaohua Wang, Feiran Zhang, Zhibo Xu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang | 2024-07-01 | arXiv | https://github.com/Huangzisu/query-refinement | https://doi.org/10.48550/arXiv.2407.01461 |
936 | EconNLI: Evaluating Large Language Models on Economics Reasoning | Yue Guo, Yi Yang | 2024-07-01 | ACL | https://github.com/Irenehere/EconNLI | https://doi.org/10.18653/v1/2024.findings-acl.58 |
937 | AutoFlow: Automated Workflow Generation for Large Language Model Agents | Zelong Li, Shuyuan Xu, Kai Mei, Wenyue Hua, Balaji Rama, Om Raheja, Hao Wang, He Zhu, Yongfeng Zhang | 2024-07-01 | arXiv | https://github.com/agiresearch/AutoFlow | https://doi.org/10.48550/arXiv.2407.12821 |
938 | LLaRA: Large Language-Recommendation Assistant | Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, Xiang Wang, Xiangnan He | 2024-07 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/ljy0ustc/LLaRA | https://dl.acm.org/doi/10.1145/3626772.3657690 |
939 | USimAgent: Large Language Models for Simulating Search Users | Erhan Zhang, Xingzhu Wang, Peiyuan Gong, Yankai Lin, Jiaxin Mao | 2024-07 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/Meow-E/USimAgent | https://dl.acm.org/doi/10.1145/3626772.3657963 |
940 | IDGenRec: LLM-RecSys Alignment with Textual ID Learning | Juntao Tan, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Zelong Li, Yongfeng Zhang | 2024-07 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/agiresearch/IDGenRec | https://dl.acm.org/doi/10.1145/3626772.3657821 |
941 | Are Large Language Models Good at Utility Judgments? | Hengran Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng | 2024-07 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/ict-bigdatalab/utility_judgments | https://dl.acm.org/doi/10.1145/3626772.3657784 |
942 | LLMatic: Neural Architecture Search Via Large Language Models And Quality Diversity Optimization | Muhammad Umair Nasir, Sam Earle, Julian Togelius, Steven James, Christopher W. Cleghorn | 2024-07 | GECCO '24: Proceedings of the Genetic and Evolutionary Computation Conference | https://github.com/umair-nasir14/LLMatic | https://dl.acm.org/doi/10.1145/3638529.3654017 |
943 | ChatUniTest: A Framework for LLM-Based Test Generation | Yinghao Chen, Zehao Hu, Chen Zhi, Junxiao Han, Shuiguang Deng, Jianwei Yin | 2024-07 | FSE 2024: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering | https://github.com/ZJU-ACES-ISE/ChatUniTest | https://dl.acm.org/doi/10.1145/3663529.3663801 |
944 | GraphArena: Benchmarking Large Language Models on Graph Computational Problems | Jianheng Tang, Qifan Zhang, Yuhan Li, Jia Li | 2024-06-29 | arXiv | https://github.com/squareRoot3/GraphArena | https://doi.org/10.48550/arXiv.2407.00379 |
945 | LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement | Jiahao Ying, Mingbao Lin, Yixin Cao, Wei Tang, Bo Wang, Qianru Sun, Xuanjing Huang, Shuicheng Yan | 2024-06-29 | arXiv | https://yingjiahao14.github.io/LLMs-as-Instructors-pages/ | http://arxiv.org/abs/2407.00497v1 |
946 | YuLan: An Open-source Large Language Model | Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, Jiaxin Mao, Yankai Lin, Ruihua Song, Jun Xu, Xu Chen, Rui Yan, Zhewei Wei, Di Hu, Wenbing Huang, Ze-Feng Gao, Yueguo Chen, Weizheng Lu, Ji-Rong Wen | 2024-06-28 | arXiv | https://github.com/RUC-GSAI/YuLan-Chat | https://doi.org/10.48550/arXiv.2406.19853 |
947 | Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring | Jiazheng Li, Hainiu Xu, Zhaoyue Sun, Yuxiang Zhou, David West, Cesare Aloisi, Yulan He | 2024-06-28 | arXiv | https://github.com/lijiazheng99/thought_tree_assessment | http://arxiv.org/abs/2406.19949v2 |
948 | MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics? | Jinming Li, Yichen Zhu, Zhiyuan Xu, Jindong Gu, Minjie Zhu, Xin Liu, Ning Liu, Yaxin Peng, Feifei Feng, Jian Tang | 2024-06-28 | arXiv | https://mm-robobench.github.io/ | http://arxiv.org/abs/2406.19693v1 |
949 | Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs | Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen | 2024-06-28 | arXiv | https://mbzuai-llm.github.io/webpage2code/ | http://arxiv.org/abs/2406.20098v2 |
950 | DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model | Shezheng Song, Shasha Li, Jie Yu, Shan Zhao, Xiaopeng Li, Jun Ma, Xiaodong Liu, Zhuo Li, Xiaoguang Mao | 2024-06-27 | PRCV | https://github.com/season1blue/DIM | https://doi.org/10.1007/978-981-97-8620-6_13 |
951 | STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis | Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang Jing, Haining Tan, Jingping Bi | 2024-06-27 | arXiv | https://github.com/LwbXc/STBench | https://doi.org/10.48550/arXiv.2406.19065 |
952 | Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization | Miyoung Ko, Sue Hyun Park, Joonsuk Park, Minjoon Seo | 2024-06-27 | arXiv | https://github.com/kaistAI/knowledge-reasoning | http://arxiv.org/abs/2406.19502v2 |
953 | Selective Prompting Tuning for Personalized Conversations with LLMs | Qiushi Huang, Xubo Liu, Tom Ko, Bo Wu, Wenwu Wang, Yu Zhang, Lilian Tang | 2024-06-26 | OpenReview | https://github.com/hqsiswiliam/SPT | http://arxiv.org/abs/2406.18187v1 |
954 | IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons | Dan Shi, Renren Jin, Tianhao Shen, Weilong Dong, Xinwei Wu, Deyi Xiong | 2024-06-26 | arXiv | https://github.com/danshi777/IRCAN | http://arxiv.org/abs/2406.18406v2 |
955 | CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs | Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen | 2024-06-26 | arXiv | https://charxiv.github.io/ | http://arxiv.org/abs/2406.18521v1 |
956 | Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Xin Lai, Zhuotao Tian, Yukang Chen, Senqiao Yang, Xiangru Peng, Jiaya Jia | 2024-06-26 | arXiv | https://github.com/dvlab-research/Step-DPO | http://arxiv.org/abs/2406.18629v1 |
957 | Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation | Guanting Dong, Yutao Zhu, Chenghao Zhang, Zechen Wang, Zhicheng Dou, Ji-Rong Wen | 2024-06-26 | arXiv | https://github.com/dongguanting/DPA-RAG | http://arxiv.org/abs/2406.18676v2 |
958 | Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs | Lei Zhang, Yunshui Li, Jiaming Li, Xiaobo Xia, Jiaxi Yang, Run Luo, Minzheng Wang, Longze Chen, Junhao Liu, Min Yang | 2024-06-26 | arXiv | https://github.com/Hambaobao/HCP-Coder | http://arxiv.org/abs/2406.18294v2 |
959 | The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Meinardus Boris, Batra Anil, Rohrbach Anna, Rohrbach Marcus | 2024-06-26 | arXiv | https://github.com/sudo-Boris/mr-Blip | https://doi.org/10.48550/arXiv.2406.18113 |
960 | BADGE: BADminton report Generation and Evaluation with LLM | Shang-Hsuan Chiang, Lin-Wei Chao, Kuang-Da Wang, Chih-Chuan Wang, Wen-Chih Peng | 2024-06-26 | arXiv | https://github.com/AndyChiangSH/BADGE | http://arxiv.org/abs/2406.18116v1 |
961 | Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges | Mohammed Elhenawy, Ahmad Abutahoun, Taqwa I. Alhadidi, Ahmed Jaber, Huthaifa I. Ashqar, Shadi Jaradat, Ahmed Abdelhay, Sebastien Glaser, Andry Rakotonirainy | 2024-06-26 | Mach. Learn. Knowl. Extr. | https://github.com/ahmed-abdulhuy/Solving-TSP-and-mTSP-Combinatorial-Challenges-using-Visual-Reasoning-and-Multi-Agent-Approach-MLLMs- | https://doi.org/10.3390/make6030093 |
962 | A Closer Look into Mixture-of-Experts in Large Language Models | Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu | 2024-06-26 | arXiv | https://github.com/kamanphoebe/Look-into-MoEs | https://doi.org/10.48550/arXiv.2406.18219 |
963 | A Review of Large Language Models and Autonomous Agents in Chemistry | Mayk Caldas Ramos, Christopher J. Collison, Andrew D. White | 2024-06-26 | arXiv | https://github.com/ur-whitelab/LLMs-in-science | https://doi.org/10.48550/arXiv.2407.01603 |
964 | ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Ahmed Heakl, Youssef Zaghloul, Mennatullah Ali, Rania Hossam, Walid Gomaa | 2024-06-26 | arXiv | https://github.com/ahmedheakl/arazn-llm | http://arxiv.org/abs/2406.18120v2 |
965 | Crafting Customisable Characters with LLMs: Introducing SimsChat, a Persona-Driven Role-Playing Agent Framework | Bohao Yang, Dong Liu, Chen Tang, Chenghao Xiao, Kun Zhao, Chao Li, Lin Yuan, Guang Yang, Lanxiao Huang, Chenghua Lin | 2024-06-25 | arXiv | https://github.com/Bernard-Yang/SimsChat | http://arxiv.org/abs/2406.17962v3 |
966 | TALEC: Teach Your LLM to Evaluate in Specific Domain with In-house Criteria by Criteria Division and Zero-shot Plus Few-shot | Kaiqi Zhang, Shuai Yuan, Honghan Zhao | 2024-06-25 | arXiv | https://github.com/zlkqz/auto_eval | http://arxiv.org/abs/2407.10999v1 |
967 | T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang | 2024-06-25 | arXiv | https://github.com/microsoft/T-MAC | http://arxiv.org/abs/2407.00088v1 |
968 | Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA | Minzheng Wang, Longze Chen, Cheng Fu, Shengyi Liao, Xinghua Zhang, Bingli Wu, Haiyang Yu, Nan Xu, Lei Zhang, Run Luo, Yunshui Li, Min Yang, Fei Huang, Yongbin Li | 2024-06-25 | arXiv | https://github.com/MozerWang/Loong | http://arxiv.org/abs/2406.17419v2 |
969 | Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels | Razvan-Gabriel Dumitru, Vikas Yadav, Rishabh Maheshwary, Paul-Ioan Clotan, Sathwik Tejaswi Madhusudhan, Mihai Surdeanu | 2024-06-25 | arXiv | https://github.com/RazvanDu/LayerwiseQuant/ | http://arxiv.org/abs/2406.17415v3 |
970 | Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients | Aashiq Muhamed, Oscar Li, David Woodruff, Mona Diab, Virginia Smith | 2024-06-25 | arXiv | https://github.com/aashiqmuhamed/GRASS | http://arxiv.org/abs/2406.17660v1 |
971 | Predicting the Big Five Personality Traits in Chinese Counselling Dialogues Using Large Language Models | Yang Yan, Lizhi Ma, Anqi Li, Jingsong Ma, Zhenzhong Lan | 2024-06-25 | arXiv | https://github.com/kuri-leo/BigFive-LLM-Predictor | https://doi.org/10.48550/arXiv.2406.17287 |
972 | Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Wenhao Shi, Zhiqiang Hu, Yi Bin, Junhua Liu, Yang Yang, See-Kiong Ng, Lidong Bing, Roy Ka-Wei Lee | 2024-06-25 | EMNLP | https://github.com/HZQ950419/Math-LLaVA | https://aclanthology.org/2024.findings-emnlp.268 |
973 | Large Language Models are Interpretable Learners | Ruochen Wang, Si Si, Felix Yu, Dorothea Wiesmann, Cho-Jui Hsieh, Inderjit S. Dhillon | 2024-06-25 | arXiv | https://github.com/ruocwang/llm-symbolic-program | https://doi.org/10.48550/arXiv.2406.17224 |
974 | Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback | Zhongtao Miao, Kaiyan Zhao, Yoshimasa Tsuruoka | 2024-06-25 | arXiv | https://github.com/gpgg/art | https://doi.org/10.48550/arXiv.2406.17873 |
975 | From Distributional to Overton Pluralism: Investigating Large Language Model Alignment | Thom Lake, Eunsol Choi, Greg Durrett | 2024-06-25 | arXiv | https://github.com/thomlake/investigating-alignment | https://doi.org/10.48550/arXiv.2406.17692 |
976 | Dual-Space Knowledge Distillation for Large Language Models | Songming Zhang, Xue Zhang, Zengkui Sun, Yufeng Chen, Jinan Xu | 2024-06-25 | EMNLP | https://github.com/songmzhang/DSKD | https://aclanthology.org/2024.emnlp-main.1010 |
977 | DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph | Zhehao Zhang, Jiaao Chen, Diyi Yang | 2024-06-25 | arXiv | https://github.com/SALT-NLP/DARG | https://doi.org/10.48550/arXiv.2406.17271 |
978 | Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers | Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre | 2024-06-24 | arXiv | https://github.com/CLAIRE-Labo/StructuredFFN/tree/main | http://arxiv.org/abs/2406.16450v2 |
979 | Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models | Yichen Sun, Zhixuan Chu, Zhan Qin, Kui Ren | 2024-06-24 | arXiv | https://github.com/TruthAI-Lab/PCIG | http://arxiv.org/abs/2406.16333v1 |
980 | Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Ashwinee Panda, Berivan Isik, Xiangyu Qi, Sanmi Koyejo, Tsachy Weissman, Prateek Mittal | 2024-06-24 | arXiv | https://github.com/kiddyboots216/lottery-ticket-adaptation | http://arxiv.org/abs/2406.16797v2 |
981 | EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models | Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, Mengping Yang, Cheng Zhang, Hao Li | 2024-06-24 | arXiv | https://sais-fuxi.github.io/projects/evalalign/ | http://arxiv.org/abs/2406.16562v3 |
982 | AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models | Jiale Cheng, Yida Lu, Xiaotao Gu, Pei Ke, Xiao Liu, Yuxiao Dong, Hongning Wang, Jie Tang, Minlie Huang | 2024-06-24 | EMNLP | https://github.com/thu-coai/AutoDetect | https://aclanthology.org/2024.findings-emnlp.397 |
983 | ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models | Yash Akhauri, Ahmed F. AbouElhamayed, Jordan Dotzel, Zhiru Zhang, Alexander M. Rush, Safeen Huda, Mohamed S. Abdelfattah | 2024-06-24 | EMNLP | https://github.com/abdelfattah-lab/shadow_llm/ | https://aclanthology.org/2024.emnlp-main.1068 |
984 | Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models | Nisarg Patel, Mohith Kulkarni, Mihir Parmar, Aashna Budhiraja, Mutsumi Nakamura, Neeraj Varshney, Chitta Baral | 2024-06-24 | EMNLP | https://github.com/Mihir3009/Multi-LogiEval | https://aclanthology.org/2024.emnlp-main.1160 |
985 | M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models | Rishabh Maheshwary, Vikas Yadav, Hoang Nguyen, Khyati Mahajan, Sathwik Tejaswi Madhusudhan | 2024-06-24 | arXiv | https://github.com/ServiceNow/M2Lingual | https://doi.org/10.48550/arXiv.2406.16783 |
986 | Large Language Models Are Cross-Lingual Knowledge-Free Reasoners | Peng Hu, Sizhe Liu, Changjiang Gao, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang | 2024-06-24 | arXiv | https://github.com/NJUNLP/Knowledge-Free-Reasoning | https://doi.org/10.48550/arXiv.2406.16655 |
987 | AudioBench: A Universal Benchmark for Audio Large Language Models | Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen | 2024-06-23 | arXiv | https://github.com/AudioLLMs/AudioBench | https://doi.org/10.48550/arXiv.2406.16020 |
988 | Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models | Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chulin Xie, Chiyuan Zhang | 2024-06-23 | arXiv | https://github.com/google-research/crosslingual-knowledge-barriers | https://doi.org/10.48550/arXiv.2406.16135 |
989 | Efficient Evolutionary Search Over Chemical Space with Large Language Models | Haorui Wang, Marta Skreta, Cher-Tian Ser, Wenhao Gao, Lingkai Kong, Felix Streith-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang | 2024-06-23 | arXiv | https://github.com/zoom-wang112358/MOLLEO | https://doi.org/10.48550/arXiv.2406.16976 |
990 | FS-RAG: A Frame Semantics Based Approach for Improved Factual Accuracy in Large Language Models | Harish Tayyar Madabushi | 2024-06-23 | arXiv | https://github.com/H-TayyarMadabushi/A-Frame-Semantics-based-approach-for-Improved-Factual-Accuracy-in-Large-Language-Models | https://doi.org/10.48550/arXiv.2406.16167 |
991 | FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models | Junyi Zhu, Shuochen Liu, Yu Yu, Bo Tang, Yibo Yan, Zhiyu Li, Feiyu Xiong, Tong Xu, Matthew B. Blaschko | 2024-06-23 | EMNLP | https://github.com/IAAR-Shanghai/FastMem | https://aclanthology.org/2024.findings-emnlp.687 |
992 | Can LLM Graph Reasoning Generalize beyond Pattern Memorization? | Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xiaochuang Han, Tianxing He, Yulia Tsvetkov | 2024-06-23 | arXiv | https://github.com/MatthewYZhang/NLGift | http://arxiv.org/abs/2406.15992v2 |
993 | SS-GEN: A Social Story Generation Framework with Large Language Models | Yi Feng, Mingyang Song, Jiaqi Wang, Zhuang Chen, Guanqun Bi, Minlie Huang, Liping Jing, Jian Yu | 2024-06-22 | arXiv | https://github.com/MIMIFY/SS-GEN | http://arxiv.org/abs/2406.15695v2 |
994 | Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level | Zhaopeng Feng, Ruizhe Chen, Yan Zhang, Zijie Meng, Zuozhu Liu | 2024-06-22 | arXiv | https://github.com/fzp0424/MT-Ladder | http://arxiv.org/abs/2406.15741v3 |
995 | RuleR: Improving LLM Controllability by Rule-based Data Recycling | Ming Li, Han Chen, Chenguang Wang, Dang Nguyen, Dianqi Li, Tianyi Zhou | 2024-06-22 | arXiv | https://github.com/tianyi-lab/RuleR | http://arxiv.org/abs/2406.15938v3 |
996 | Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration | Zhongzhi Yu, Zheng Wang, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan Celine Lin | 2024-06-22 | ICML | https://github.com/GATECH-EIC/ACT | https://openreview.net/forum?id=DLTjFFiuUJ |
997 | video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models | Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang | 2024-06-22 | ICML | https://github.com/bytedance/SALMONN/ | https://openreview.net/forum?id=nYsh5GFIqX |
998 | The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models | Jiajia Li, Lu Yang, Mingni Tang, Chenchong Chenchong, Zuchao Li, Ping Wang, Hai Zhao | 2024-06-22 | ACL | https://github.com/zcli-charlie/ZIQI-Eval | https://doi.org/10.18653/v1/2024.findings-acl.194 |
999 | Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph | Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev, Lyudmila Rvanova, Akim Tsvigun, Daniil Vasilev, Rui Xing, Abdelrahman Boda Sadallah, Kirill Grishchenkov, Sergey Petrakov, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov, Artem Shelmanov | 2024-06-21 | arXiv | https://github.com/IINemo/lm-polygraph | https://doi.org/10.48550/arXiv.2406.15627 |
1000 | ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models | Haiquan Zhao, Lingyu Li, Shisong Chen, Shuqi Kong, Jiaan Wang, Kexin Huang, Tianle Gu, Yixu Wang, Jian Wang, Dandan Liang, Zhixu Li, Yan Teng, Yanghua Xiao, Yingchun Wang | 2024-06-21 | arXiv | https://github.com/AIFlames/Esc-Eval | https://doi.org/10.48550/arXiv.2406.14952 |
1001 | GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models | Leyan Wang, Yonggang Jin, Tianhao Shen, Tianyu Zheng, Xinrun Du, Chenchen Zhang, Wenhao Huang, Jiaheng Liu, Shi Wang, Ge Zhang, Liuyu Xiang, Zhaofeng He | 2024-06-21 | arXiv | https://github.com/GIEBench/GIEBench | https://doi.org/10.48550/arXiv.2406.14903 |
1002 | Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models | Qi Liu, Bo Wang, Nan Wang, Jiaxin Mao | 2024-06-21 | arXiv | https://github.com/liuqi6777/pe_rank | https://doi.org/10.48550/arXiv.2406.14848 |