Skip to content

Latest commit

 

History

History
57 lines (39 loc) · 2.83 KB

README.md

File metadata and controls

57 lines (39 loc) · 2.83 KB

Sequential Stories

The Show and Tell model is a image-to-text model for Tensorflow, developed by Google DeepMind and based on this paper, that takes an input and learns how to describe the content of images. This experimental iOS app uses this feature to generate a series of captions and create a story.

Example using stills from the 'The Gran Budapest Hotel' by Wes Anderson:

demo

Setup

  1. Install im2txt and its dependencies. Follow Edouard Fouché setup and used the same pre trained model described in his instructions. The only change was that in line 49 in im2txt/im2txt/inference_utils/vocabulary.py I didn't change this:
reverse_vocab = [line.split()[0] for line in reverse_vocab] # to:
reverse_vocab = [eval(line.split()[0]).decode() for line in reverse_vocab]
  1. Download or clone this repo.
  2. Install the app located in platforms/ios in Xcode. You can also run cordova plaform add ios from the root and then cordova prepare ios and then upload. (Install Cordova first)
  3. Connect your phone to a Wifi network. Your computer should be connected to the same network.
  4. Open the file server_im2txt.py and change line 15: ip = '172.16.220.255' to match the ip assigned by the network. (To know your ip type ifconfig | grep "inet " | grep -Fv 127.0.0.1 | awk '{print $2}' in OSX)
  5. Run python server_im2txt.py
  6. Open the app, and click the top left icon. Enter the same IP address from before. A green light should turn on the right top corner.

Running a MacBook Pro from 2014 it takes around 7 seconds to caption an image.

Dependencies:

  • Bazel
  • TensorFlow 1.0 or greater
  • NumPy
  • Natural Language Toolkit (NLTK)
  • Checkpoint

Versions

The file server_im2txt runs the im2txt model on every request from to the /upload route and returns a string with a sentence for the story. The app loads an image to the /upload folder.

The file server_lstm runs a classification model in keras and then a LSTM network trained on the 25 most download books from the Gutenberg Project. This was the first approach to the app and it's still a WIP.

Outputs

demo

Interaction

demo

Links

TODO

  • Configure IP from app.
  • Create Reacte Native version?
  • Add more nlp to the output or maybe add the lstm version to it?