DBWeb software and datasets
The DBWeb Team has produced the following software and datasets:
Web information extraction
RED (RSS-based Experimental Dataset) is a dataset constructed from RSS feeds for Web article content extraction. It covers 90 Web sites comprising 1,010 individual Web pages, each of which has been annotated by hand to identify the main content of interest.