Shenzhen University Forum

Show and Tell: A Neural Image Caption Generator

Posted 2017-8-16 14:29:28
Source: tensorflow (http://www.tensorflownews.com/)



The Show and Tell model is an example of an encoder-decoder neural network. It works by first "encoding" an image into a fixed-length vector representation, and then "decoding" the representation into a natural language description.
The image encoder is a deep convolutional neural network. This type of network is widely used for image tasks and is currently state-of-the-art for object recognition and detection. Our particular choice of network is the Inception v3 image recognition model pretrained on the ILSVRC-2012-CLS image classification dataset.
The decoder is a long short-term memory (LSTM) network. This type of network is commonly used for sequence modeling tasks such as language modeling and machine translation. In the Show and Tell model, the LSTM network is trained as a language model conditioned on the image encoding.
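The key idea of conditioning the LSTM on the image can be sketched in a few lines: the CNN's fixed-length image vector is fed to the LSTM as the very first input, before any word embeddings, so the caption language model starts from a state that "has seen" the image. The following numpy sketch is illustrative only and is not the im2txt implementation; all dimensions, weight shapes, and names are assumptions.

```python
# Minimal numpy sketch of an LSTM step conditioned on an image encoding.
# Not the im2txt code: shapes, initialization, and names are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step. x: input vector; h, c: previous hidden/cell state.
    W maps the concatenated [x; h] to the four gates, stacked row-wise."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)          # input, forget, output gates + candidate
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c_new = f * c + i * np.tanh(g)       # update the cell state
    h_new = o * np.tanh(c_new)           # expose the gated hidden state
    return h_new, c_new

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 8, 16
W = rng.normal(0, 0.1, size=(4 * hidden_dim, embed_dim + hidden_dim))
b = np.zeros(4 * hidden_dim)

image_encoding = rng.normal(size=embed_dim)  # stand-in for the CNN output
word_embedding = rng.normal(size=embed_dim)  # stand-in for the first caption word

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
# The image is shown to the LSTM once, as the very first input...
h, c = lstm_step(image_encoding, h, c, W, b)
# ...then caption word embeddings are fed one per step.
h, c = lstm_step(word_embedding, h, c, W, b)
print(h.shape)  # (16,)
```

After the initial image step, decoding proceeds like an ordinary language model: each step's hidden state is used to predict the next word, whose embedding becomes the next input.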
Words in the captions are represented with an embedding model. Each word in the vocabulary is associated with a fixed-length vector representation that is learned during training.
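An embedding model of this kind is just a lookup table: one learned fixed-length vector per vocabulary word. The sketch below shows the lookup mechanics with a made-up vocabulary and random (untrained) vectors; in the real model these vectors are learned jointly with the rest of the network.

```python
# Illustrative word-embedding lookup; vocabulary and dimensions are made up.
import numpy as np

vocab = {"<S>": 0, "a": 1, "dog": 2, "on": 3, "grass": 4, "</S>": 5}
embed_dim = 8
rng = np.random.default_rng(1)
# One fixed-length vector per vocabulary word (one row per word).
embeddings = rng.normal(size=(len(vocab), embed_dim))

def embed(word):
    """Look up the learned vector for a vocabulary word."""
    return embeddings[vocab[word]]

caption = ["<S>", "a", "dog", "on", "grass", "</S>"]
sequence = np.stack([embed(w) for w in caption])
print(sequence.shape)  # (6, 8)
```

Each caption thus becomes a sequence of fixed-length vectors that the LSTM consumes one step at a time.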
The following diagram illustrates the model architecture.
https://github.com/tensorflow/models/tree/master/im2txt#introduction
