Decoding the Text Encoding

Team Members

Hamid Izadinia
Fereshteh Sadeghi

Overview

Commentary on research/development process

In this project we extract the raw data underlying a given word cloud image. Word clouds are among the most popular and widely used text visualizations. Although they are attractive and simple to produce, they do not give a thorough view of the distribution of the underlying data. It is therefore important to be able to redesign word clouds, both to improve their design choices and to enable further statistical analysis of the data. We propose a fully automatic redesign algorithm for word cloud visualizations. Our method decodes an input word cloud and returns the raw data as a list of (word, value) pairs. To the best of our knowledge, this is the first attempt to extract raw data from word cloud visualizations. We have tested the method both qualitatively and quantitatively, and the experiments show that the algorithm extracts the words and their weights effectively, with a considerably low error rate.

Work Breakdown

The work was divided step by step as follows:

Discussing the algorithm, the ideas behind each step, and possible evaluation methods. (Fereshteh Sadeghi, Hamid Izadinia)
Applying computer vision methods to extract the connected regions in the image. (Hamid Izadinia)
Implementing the search for connected components in the graph using a bipartite graph matching algorithm. (Fereshteh Sadeghi)
Recognizing the letter in each image patch using an OCR method based on cross-correlation between the image patch and letter templates. (Fereshteh Sadeghi)
Qualitative evaluation on images downloaded from Google. (Hamid Izadinia)
Quantitative evaluation: generating word clouds with the d3 word cloud implementation, extracting the ground-truth values from the SVG files, then processing the PNG files with our method and comparing the output with the ground truth. (Hamid Izadinia)
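The OCR step above matches each image patch against letter templates by cross-correlation. The project code is in Matlab; the following is only a minimal Python sketch of that idea, not the project's implementation. It assumes the patch has already been resized to the template size, and the names `ncc`, `recognize`, and the toy 3x3 `TEMPLATES` are hypothetical.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation of two equal-sized 2D grids (lists of rows)."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    if len(a) != len(b):
        raise ValueError("patch and template must have the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a constant image carries no shape information
    return sum(x * y for x, y in zip(da, db)) / denom

def recognize(patch, templates):
    """Return (letter, score) for the template that correlates best with the patch."""
    scored = ((letter, ncc(patch, t)) for letter, t in templates.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical toy templates for illustration only; real templates would be
# rendered letter images at a fixed size.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

noisy_L = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]  # an "L" with one pixel missing
letter, score = recognize(noisy_L, TEMPLATES)
print(letter, round(score, 3))
```

Subtracting the means and dividing by the norms makes the score insensitive to overall brightness and contrast, so a slightly degraded patch can still score highest against its own letter.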

Poster, Final Paper

Running Instructions

This project is implemented in Matlab and C++. To run the code, run "run_script". The script calls the functions listed below and shows the results in a figure at every iteration of the algorithm. The error of the estimated values relative to the ground truth is printed as output.

The functions are:

Extracting the connected components in the image: connected_comp_patch.m
Computing the edge weights for all connections in the graph: get_rel_letters_func.m, get_rel_letters_vert_func.m
Iterative word extraction and weight estimation: convert_image_to_chart.m
Reading the ground-truth histograms from the SVG file: read_gt.m
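As an illustration of the last step, here is a rough Python sketch of reading ground-truth values from a word cloud SVG and comparing them with estimated values. It is not a port of read_gt.m. It assumes each word is a `<text>` element whose `style` attribute carries a `font-size` in pixels, which is how d3 word cloud examples typically render words. The error measure (mean absolute difference after scaling the largest value to 1) is also an assumption, since the exact metric is not stated here, and all function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

FONT_SIZE = re.compile(r"font-size:\s*([\d.]+)px")

def read_ground_truth(svg_text):
    """Return {word: font size in px} for every <text> element in an SVG string."""
    sizes = {}
    for elem in ET.fromstring(svg_text).iter():
        # Strip any XML namespace, e.g. "{http://www.w3.org/2000/svg}text"
        if elem.tag.rsplit("}", 1)[-1] != "text":
            continue
        match = FONT_SIZE.search(elem.get("style", ""))
        if match and elem.text:
            sizes[elem.text.strip()] = float(match.group(1))
    return sizes

def normalize(values):
    """Scale positive values so the largest is 1.0, making two scales comparable."""
    top = max(values.values())
    return {word: v / top for word, v in values.items()}

def mean_abs_error(estimated, truth):
    """Mean absolute difference over ground-truth words; a missed word counts as 0."""
    est, gt = normalize(estimated), normalize(truth)
    return sum(abs(est.get(w, 0.0) - gt[w]) for w in gt) / len(gt)

# Hypothetical SVG in the assumed layout
SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="300" height="200">
  <g transform="translate(150,100)">
    <text style="font-size: 40px;" transform="translate(0,0)rotate(0)">data</text>
    <text style="font-size: 20px;" transform="translate(30,40)rotate(90)">cloud</text>
    <text style="font-size: 10px;" transform="translate(-50,20)rotate(0)">word</text>
  </g>
</svg>"""

truth = read_ground_truth(SVG)
estimated = {"data": 80.0, "cloud": 40.0, "word": 24.0}  # hypothetical decoder output
print(truth)
print(mean_abs_error(estimated, truth))
```

Normalizing both sides before comparing reflects that a decoder working from a PNG can only recover relative word weights, not the absolute font sizes used when the cloud was generated.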

