[Grub-dev] bugreport: grubng.exe c# version 01.2950.37591 - various bugs (crawling / .arc file bloating)

ab spam at abittner.de
Wed Jan 30 19:58:24 UTC 2008


hello there,

i have been using grubng.exe c# version 01.2950.37591 on an english 
windows xp sp2 system for a little while now and have run into several 
bugs that i wanted to report:


1. summary: the .arc result file grows enormously when the gui (crawl 
process) is aborted and restarted.

if grubng.exe is interrupted with the quit/exit button while it is 
crawling its workunit (url list) and is then restarted, it completely 
ignores the urls it has already crawled.

the file holding the webservers' responses, named 
username.044d253173a6af2b2b8cb0cae40c040ac0e6f989.arc, keeps growing 
every time the user restarts the grubng.exe gui, because grubng.exe 
crawls the same urls again and appends the same results over and over.

for example, grubng.exe gets 250 urls to crawl.
it crawls urls 1 to 10, then it is quit.
after a restart it crawls urls 1 to 10 again and then 
continues with urls 11 to 250.

the results of every previous run all end up in the .arc file, inflating 
it enormously depending on the urls/responses.
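to illustrate the resume behaviour one would expect, here is a minimal python sketch. it is not grubng's code; crawl_workunit, the progress file and the fetch callback are hypothetical names. each url is recorded in a progress file only after its result has been appended to the .arc file, so a restart skips urls that are already done instead of appending them again.

```python
import os
import tempfile


def crawl_workunit(urls, progress_path, arc_path, fetch):
    """Crawl urls, skipping any already listed in progress_path.

    Hypothetical sketch: progress_path, arc_path and fetch are
    illustrative names, not part of grubng.
    """
    done = set()
    if os.path.exists(progress_path):
        with open(progress_path, encoding="utf-8") as f:
            done = {line.strip() for line in f if line.strip()}
    crawled = []
    for url in urls:
        if url in done:
            continue  # result is already in the .arc file from an earlier run
        body = fetch(url)
        with open(arc_path, "a", encoding="utf-8") as arc:
            arc.write(f"{url}\n{body}\n")
        # mark the url as done only after its result is written, so an
        # interruption between the two writes re-crawls at most one url
        with open(progress_path, "a", encoding="utf-8") as prog:
            prog.write(url + "\n")
        crawled.append(url)
    return crawled


class Quit(Exception):
    """Stands in for the user pressing quit/exit mid-crawl."""


def demo():
    tmp = tempfile.mkdtemp()
    prog = os.path.join(tmp, "progress.txt")
    arc = os.path.join(tmp, "result.arc")
    urls = [f"http://example.com/{i}" for i in range(1, 251)]

    calls = 0

    def fetch_then_quit(url):
        nonlocal calls
        calls += 1
        if calls > 10:
            raise Quit()  # the client is quit after url 10
        return "body"

    try:
        crawl_workunit(urls, prog, arc, fetch_then_quit)
    except Quit:
        pass
    second_run = crawl_workunit(urls, prog, arc, lambda url: "body")
    with open(arc, encoding="utf-8") as f:
        entries = sum(1 for line in f if line.startswith("http://"))
    return len(second_run), entries


if __name__ == "__main__":
    print(demo())  # (240, 250)
```

with this scheme the restarted run crawls only urls 11 to 250, and the .arc file ends up with 250 entries instead of 260.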


2. summary: crawling stalls from time to time and never resumes.

grubng stalls from time to time: it stops making http connections to 
any server at all. that is how i came across the bug above, for example. 
my grubng.exe process stopped at url number 10, and the results for urls 
1 to 9 were already in the .arc result file. grubng.exe never retries 
url number 10 and never moves past that failing/erroneous url, so it 
never skips to url 11 and the crawl never continues.

i waited on the grubng.exe client for about 10 to 20 minutes, but 
nothing happened after url number 10. then i quit grubng.exe normally 
from within the gui. after the restart it began all over again with 
urls 1 to 250.
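as a rough sketch of the stall handling one would expect (again hypothetical python, not grubng's code; crawl_with_skip, default_fetch and the retry count are assumed names and values): give each url a network timeout and a bounded number of attempts, and if all of them fail, record the url as failed and move on, so url 11 still gets crawled. note that urllib's timeout applies to each blocking socket operation rather than to the whole download, so a server that trickles data very slowly could still hold things up.

```python
import urllib.request


def default_fetch(url, timeout):
    # urlopen raises an OSError subclass (URLError, TimeoutError) when the
    # connection fails or a socket operation exceeds `timeout` seconds
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def crawl_with_skip(urls, fetch=default_fetch, timeout=30.0, retries=2):
    """Hypothetical sketch: retry each url, then skip it instead of stalling."""
    results, failed = {}, []
    for url in urls:
        for _attempt in range(retries + 1):
            try:
                results[url] = fetch(url, timeout)
                break  # success, go on to the next url
            except OSError:
                continue  # try this url again
        else:
            failed.append(url)  # all attempts failed: give up and move on
    return results, failed
```

the failed list could then be reported back with the workunit instead of blocking the rest of it.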

regards.

