URL: 940 ms
File: 17 ms
If you could keep the files on local disk, you could cut the total time to about 2 minutes (10 minutes with SAXParser). That is still pretty slow, but not so bad. Reading over the network at 940 ms per file, 8000 files take roughly 8000 × 940 ms ≈ 7,520 s, or just over two hours. So I think the bottleneck is downloading the 8000 files, not parsing them. Try timing how long it takes just to download 8000 files to your PC. Thread-level parallelism may help here (test on a few thousand files first).
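If the downloads really are the bottleneck, one way to try thread-level parallelism is a fixed-size thread pool that fetches several files at once. Below is a minimal sketch of that idea using `ExecutorService`. The class name `ParallelFetch`, the `download` helper, and the choice to name each local file after the last segment of its URL are all hypothetical and are not taken from the original code. The right number of threads is also an assumption you would need to tune against the server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelFetch {

    // Runs `task` on every input using a fixed-size thread pool and returns
    // the results in the same order as the inputs.
    public static <T, R> List<R> runAll(List<T> inputs, Function<T, R> task, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T input : inputs) {
                futures.add(pool.submit(() -> task.apply(input)));
            }
            List<R> results = new ArrayList<>(inputs.size());
            for (Future<R> f : futures) {
                results.add(f.get()); // blocks until this particular task finishes
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdownNow(); // stop any tasks still queued if one failed
        }
    }

    // Hypothetical helper: copies one URL into `dir` and returns the local path.
    // Assumes the last path segment of each URL is a unique file name.
    public static Path download(String url, Path dir) {
        String name = url.substring(url.lastIndexOf('/') + 1);
        Path target = dir.resolve(name.isEmpty() ? "index.xml" : name);
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

To time a batch, wrap something like `ParallelFetch.runAll(urls, u -> ParallelFetch.download(u, dir), 16)` in a pair of `System.nanoTime()` calls. Then compare a few thread counts on a subset of a few thousand URLs before running all 8000. Once the files are local, they can be parsed at the much lower per-file cost.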
Then I searched the web to see whether SAXParser is just really slow, even though I've already established that the parser probably isn't at fault here. Apparently, parsing with validation enabled is much slower, which is no surprise. Some people suggested using Sun's Multi-Schema Validator. Another suggested the RXP validator, which is supposedly the fastest validator available. You'd have to go through JNI, though, since RXP is a C library. See also: Fastest Java XML parsers.
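If you don't actually need validation, the cheapest fix is to make sure it is off. The following is a minimal sketch using plain JAXP, not Multi-Schema Validator or RXP. It turns off validation and namespace processing and also stops the parser from fetching an external DTD. The DTD fetch would otherwise be yet another network round trip per file. The feature URI is Xerces-specific. The JDK's built-in parser recognizes it, but a different parser on the classpath might reject it. The class name `FastSax` and the element-counting handler are hypothetical placeholders for whatever your real handler does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FastSax {

    // Builds a SAX parser with validation and namespace handling switched off.
    private static SAXParser newNonValidatingParser()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        factory.setNamespaceAware(false);
        // Xerces feature: don't download an external DTD when not validating.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newSAXParser();
    }

    // Placeholder handler: counts start tags in the document.
    public static int countElements(InputStream in) {
        int[] count = {0};
        try {
            newNonValidatingParser().parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    count[0]++;
                }
            });
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count[0];
    }

    // Convenience overload for parsing an in-memory XML string.
    public static int countElements(String xml) {
        return countElements(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

For a local file, you would pass `Files.newInputStream(path)` to `countElements`. If your documents do require validation, this sketch doesn't help, and the validator alternatives mentioned above are the ones to benchmark.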
This post has been edited by blackcompe: 04 October 2012 - 05:10 PM