Projects

Word Representation

Inside Out: Two Jointly Predictive Models for Word Representations and Phrase Representations: We propose a novel approach to building word representations that incorporates morphemes as context information in addition to the usual word-based context. Moreover, following the theory of distributed morphology, our models can be readily applied to learning phrase representations by treating the constituent words of a phrase as its morphemes.

Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations: We propose two novel distributional models for word representation that use both syntagmatic and paradigmatic relations via a joint training objective. The proposed models outperform all the state-of-the-art baseline methods on both word analogy and word similarity tasks.

Content Extraction Via Text Density: We propose a DOM-based content extraction approach via text density. This method improves the quality of structural content extraction from web pages and retains the original structural information during the web page cleaning process.
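
As a rough illustration of the idea, and not the project's actual implementation, the sketch below assumes that the text density of a DOM node is the number of text characters in its subtree divided by the number of tags in that subtree. The names Node, DensityParser and text_densities are hypothetical, and the parser is Python's standard html.parser rather than a full browser DOM.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the parser must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Elements whose text is code or styling rather than readable content.
NON_TEXT_TAGS = {"script", "style"}


class Node:
    """Minimal DOM node (hypothetical helper for this sketch)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.own_chars = 0  # characters of text directly inside this node

    def char_count(self):
        return self.own_chars + sum(c.char_count() for c in self.children)

    def tag_count(self):
        # Number of descendant tags (the node itself is not counted).
        return sum(1 + c.tag_count() for c in self.children)

    def text_density(self):
        # Assumed definition: characters per tag, with the tag count
        # floored at 1 so that leaf nodes do not divide by zero.
        return self.char_count() / max(self.tag_count(), 1)


class DensityParser(HTMLParser):
    """Builds a simple tree of Node objects from an HTML string."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open tag; ignore stray closing tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in NON_TEXT_TAGS:
            self.current.own_chars += len(data.strip())


def text_densities(html):
    """Return (tag, density) pairs for every element, in document order."""
    parser = DensityParser()
    parser.feed(html)
    parser.close()
    result = []

    def walk(node):
        for child in node.children:
            result.append((child.tag, child.text_density()))
            walk(child)

    walk(parser.root)
    return result
```

Under this assumed definition, a navigation block made of many short links gets a low density, while a block holding a long paragraph gets a high one, so a threshold on density could separate main content from page chrome while leaving the DOM structure intact.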