Jul 22, 2024
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
Posted by Cecile G. Tamura in category: futurism
Qichen Fu, Minsik Cho, Thomas Merth, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi — Apple & Meta, 2024