Meta interview question

How would you design a multimodal LLM? Coding: DFS on a tree.
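
The post does not say which DFS variant or tree type was asked for, so the following is only a minimal sketch under assumptions: an n-ary tree, a preorder traversal, and a hypothetical `Node` class with `val` and `children` fields. It shows both a recursive and an explicit-stack version, since interviewers often ask for one after the other.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node type (not from the original post):
    # a value plus an ordered list of children.
    val: int
    children: List["Node"] = field(default_factory=list)


def dfs_recursive(root: Optional[Node]) -> List[int]:
    """Return values in preorder: the node first, then each subtree left to right."""
    if root is None:
        return []
    order = [root.val]
    for child in root.children:
        order.extend(dfs_recursive(child))
    return order


def dfs_iterative(root: Optional[Node]) -> List[int]:
    """Same preorder traversal using an explicit stack instead of recursion."""
    if root is None:
        return []
    order: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node.val)
        # Push children in reverse so the leftmost child is popped first.
        stack.extend(reversed(node.children))
    return order


def build_example() -> Node:
    # Assumed sample tree for illustration:
    #       1
    #     / | \
    #    2  3  4
    #   / \     \
    #  5   6     7
    return Node(1, [Node(2, [Node(5), Node(6)]), Node(3), Node(4, [Node(7)])])


if __name__ == "__main__":
    tree = build_example()
    print(dfs_recursive(tree))
    print(dfs_iterative(tree))
```

Both functions visit every node once, so they run in O(n) time; the extra space is O(h) for the recursive version (call stack, h being the tree height) and can be larger for the iterative one on wide trees, since all siblings are pushed at once. The iterative version sidesteps Python's recursion limit on very deep trees.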