[Grub-dev] back working on some grub stuff :) - New workunit format

Bartek Jasicki thindil2 at gmail.com
Mon May 12 16:04:09 UTC 2008


On 2008-05-12 at 17:32:42
Balinny <balinny at gmail.com> wrote:

> Bartek Jasicki wrote:
> > And now a little explanation:
> > Plain text still has the same problem as the old workunit. You must
> > either hard-code the number of links in one workunit (and change the
> > code every time that number changes) or read the whole file to count
> > the links. In the new version this can be a little harder than in
> > the old one, because every block with a link to crawl can have a
> > different number of lines. Thus, to count the links to crawl you
> > must scan the whole text. This is the only disadvantage I find in
> > this proposal.
> Just count the number of lines beginning with GET.
> $ grep "^GET " workunit.1 | wc -l
> Should give you the count, should you need it.
> 
> I'd be amazed to see how you are going to count it with XML without
> reading the entire file ;)
> 
> I don't see why having a different number of lines is such a
> problem. It is a problem in the current format, even for my code,
> which will need to be changed when Jer adds more headers. Once the
> headers with magic meanings like host: are removed, it's much
> simpler to just treat them all the same in a loop until the line is
> empty.
> 
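
A minimal sketch, in Python, of the two plain-text operations discussed above: counting blocks the way the grep one-liner does, and reading every header in one uniform loop until an empty line. The layout (a "GET <url>" line, then "Name: value" header lines, then an empty line) is an assumption inferred from this thread, not a published spec, and the function names are hypothetical.

```python
# Sketch of a parser for the plain-text workunit discussed above.
# ASSUMPTION (hypothetical layout): each block starts with "GET <url>",
# is followed by "Name: value" header lines, and ends at an empty line.


def count_links(text: str) -> int:
    """Count blocks the same way as: grep "^GET " workunit.1 | wc -l"""
    return sum(1 for line in text.splitlines() if line.startswith("GET "))


def parse_blocks(text: str) -> list:
    """Split a workunit into {"url": ..., "headers": {...}} dicts.

    Every header is handled the same way in one loop; no header name
    has a special meaning, so new headers need no code changes.
    """
    blocks = []
    current = None
    # splitlines() accepts both LF and CRLF line endings.
    for line in text.splitlines():
        if line.startswith("GET "):
            current = {"url": line[4:].strip(), "headers": {}}
            blocks.append(current)
        elif line.strip() == "":
            current = None  # an empty line ends the current block
        elif current is not None:
            # Split at the first colon only, so values may contain ':'.
            name, _, value = line.partition(":")
            current["headers"][name.strip()] = value.strip()
    return blocks


if __name__ == "__main__":
    sample = (
        "GET http://homepage3.nifty.com/naonaorin/\r\n"
        "User-Agent: example-crawler\r\n"
        "\r\n"
    )
    print(count_links(sample))  # 1
    print(parse_blocks(sample)[0]["url"])
```

The count still has to scan the whole text, which is exactly the trade-off described in the quoted message; the loop itself does not care how many headers a block has.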

Yes, but it is simpler to do when you use an XML parser (especially in
higher-level languages). There isn't much difference between plain
text and XML in C, but in C#, Java or Python it makes a big difference
when you can read values from the file with 2 functions instead of 50 ;)

> > Making the workunit XML has advantages:
> > - simpler to parse (most parsers can count the elements in an XML
> > file, so counting the links becomes simpler)
> >   
> Only if you have a ready-to-use XML library. It's slower because it's
> more complex.
> I look forward to your shell-script client for XML workunits.
> 
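
On the speed and memory side, a streaming parser is one middle ground. The sketch below uses ElementTree.iterparse() to count url elements as their end tags arrive instead of building the whole tree first. It still reads the entire input, as noted above; the element name "url" and the function name are hypothetical.

```python
# Sketch: count <url> elements in a hypothetical XML workunit while
# streaming it, instead of building the whole tree first.
# ASSUMPTION: the element name "url" is invented for illustration.
import io
import xml.etree.ElementTree as ET


def count_links_streaming(stream) -> int:
    count = 0
    # iterparse() still reads the entire input, but it reports each
    # element as soon as its end tag has been parsed.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "url":
            count += 1
            elem.clear()  # drop this element's children and text
    return count


if __name__ == "__main__":
    data = b"<workunit><url href='http://a/'/><url href='http://b/'/></workunit>"
    print(count_links_streaming(io.BytesIO(data)))  # 2
```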

Writing a shell script can be a "little" harder with XML, but as I
wrote earlier, creating new clients in high-level languages can be
easier.


> > - human readable: with well-named elements, a workunit can be
> > easily understood by everyone
> >   
> I find it quite readable, though perhaps a bit harder to fully
> understand and write. But anyone wishing to mess with workunits
> should be able to understand that
> http://homepage3.nifty.com/naonaorin/
> is a URL and User-Agent: a header.
> Plus, there's documentation ;)
> 

Documentation for XML files can be shorter and simpler ;)

> > - looks the same on all operating systems (every system uses a
> > different newline sequence, so CRLF may look good only on Windows;
> > on other systems, ordinary text editors can show very interesting
> > output)
> >   
> Not so much. Most Linux editors accept CRLF perfectly, perhaps too
> happily. If I want to see the CR, I usually turn to old vi.

And I'm still nagging ;) Of course I'm not forcing anyone to use XML,
but I think it could be a better and more flexible way than plain text
for more complex workunits.

Bartek

