Wu, P. Y., & Mebane, W. R. (2022). MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks. Computational Communication Research, 4(1), 275–322. Retrieved from http://bubble.labs.vu.nl/ccr/article/view/102