Collecting statistics from XHTML pages