[Grub-dev] back working on some grub stuff :) - New workunit format

Balinny balinny at gmail.com
Mon May 12 15:32:42 UTC 2008


Bartek Jasicki wrote:
> And now a little explanation:
> Plain text still has the same problem as the old workunit. You must either
> hard-code the number of links in one workunit (and change the code every
> time that number changes) or read the whole file to count the links. In
> the new version this can be a little harder than in the old one, because
> every block with a link to crawl can have a different number of lines.
> Thus, to count the links to crawl you must check all the text.
> This is the only disadvantage I find in this proposal.
>   
Just count the number of lines beginning with GET.
$ grep "^GET " workunit.1 | wc -l
Should give you the count, should you need it.
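
If you want the count without the extra pipe, grep can do it itself with
-c. A minimal sketch, assuming (as above) that every link block starts
with a line beginning with "GET "; the function name count_links is
made up for illustration:

```shell
# count_links FILE
# Prints the number of lines in FILE that begin with "GET ".
# Assumption: one such line per link block in the workunit.
count_links() {
    grep -c '^GET ' "$1"
}

# Usage: count_links workunit.1
```

The pattern only looks at the start of each line, so it does not care
whether the lines end in LF or CRLF.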

I'd be amazed to see how you are going to count them with XML without
reading the entire file ;)

I don't see why a different number of lines is such a problem. The
current format has that problem too, even for my code, which will need
to be changed when Jer adds more headers. Once the headers with magic
meanings like host: are removed, it's much simpler to treat them all
the same, in a loop that runs until the line is empty.
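
To make that loop concrete, here is a rough sketch in plain sh. It is
not the real client code; it assumes a block is a "GET <url>" line
followed by any number of header lines and closed by an empty line, and
that lines may end in CRLF. The function name parse_workunit and the
output format are invented for the example.

```shell
# parse_workunit FILE
# Prints "<url> <number of headers>" for each block in FILE.
# Assumed layout: "GET <url>", then header lines, then an empty line.
parse_workunit() {
    cr=$(printf '\r')
    url=""
    nheaders=0
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%"$cr"}              # tolerate CRLF line endings
        if [ -z "$line" ]; then
            # An empty line closes the current block.
            if [ -n "$url" ]; then
                printf '%s %s\n' "$url" "$nheaders"
            fi
            url=""
            nheaders=0
            continue
        fi
        case $line in
            "GET "*)
                url=${line#GET }        # everything after "GET "
                ;;
            *)
                # Every other line is just a header; no special cases.
                if [ -n "$url" ]; then
                    nheaders=$((nheaders + 1))
                fi
                ;;
        esac
    done < "$1"
    # Flush the last block if the file does not end with an empty line.
    if [ -n "$url" ]; then
        printf '%s %s\n' "$url" "$nheaders"
    fi
}
```

Adding a new header to the format does not touch this loop at all,
which is the whole point.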

> Making the workunit XML has advantages:
> - simpler to parse (most parsers can count elements in an XML file, so
> counting the links is simpler)
>   
Only if you have a ready-to-use XML library. And parsing is slower,
because the format is more complex.
I look forward to your shell-script client for XML workunits.

> - human readable: with well-named elements, a workunit can be easily
> understood by everyone
>   
I find it quite readable, though perhaps a bit harder to fully
understand and write. But anyone wishing to mess with workunits should
be able to understand that
http://homepage3.nifty.com/naonaorin/
is a URL and User-Agent: a header.
Plus, there's documentation ;)

> - looks the same on all operating systems (every system uses a different
> newline sequence, so CRLF may look right only on Windows; on other systems
> the output in ordinary text editors can be very interesting)
>   
Not so much. Most Linux editors accept CRLF perfectly. Perhaps too
happily. If I want to see the CRs, I usually fall back to old vi.
> - simpler to create: in plain text you still must put the options for
> the work in some order. In XML this is not necessary.
>
> Of course, the XML version has disadvantages too:
> - More data must be sent to the client: not only whitespace but
> element names too
> - Slower to parse than plain text (this depends on the library used to
> parse the XML file, from a little slower to unusable)
> - add your own ;)
>
> Bartek


