WU, P. Y.; MEBANE, W. R. MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks. Computational Communication Research (old website), [S. l.], v. 4, n. 1, p. 275–322, 2022. Available at: http://bubble.labs.vu.nl/ccr/article/view/102. Accessed: 10 Jun. 2025.