The authors develop a new text corpus, VT10g: a sample collection that preserves many useful characteristics of the Web.