[Grub-dev] Yet another invalid workunit entry

Balinny balinny at gmail.com
Thu Jul 10 11:40:08 UTC 2008


Rainer Blome wrote:
>
> These look like symptoms of memory corruption.
> What software generated these strings?
>   
Agree.  But server side an interpreted language is used, so there 
shouldn't be such kind of problems.
Locally downloaded with wget, a solid rock there. My copy could be 
broken though, depite having
downloaded lots of big files without problem (as shown by the accompanng 
hash).
The Grub C Client opens the file read-only, so nothing there either.

> The crawling system as a whole receives external input, both in the form 
> of URLs being read from web pages and result files from users, if I'm 
> not mistaken.  Since the crawlers are intended to be run by end-users, 
> they must be secured against both malicious and unintentionally 
> detrimental input, from both work units and the web itself.  What has 
> been done in this direction?
>   
So i intended :)
There're several assumptions on my code about the workunit and http 
format, no embedded nulls,
only one PUT, the maximum size of an URL, the MIME length... I tried to 
set reasonable values
(why would we crawl an url which some browsers wouldn't visit?) and/or 
give appropiate erros.
I think the more defensive approach would be isPrivateIP() to avoid 
crawling the NAT.


> Searching for "grub security" shows that it would be better if the 
> crawler had a different name :-).  Yes, I know that this has been 
> already discussed, but still.
>
> Has Grub been run with a memory debugger?
>   
Running it under valgrind, there're many errors reported... on ld-2.3.6, 
by subcalls of gethostbyname :S
No problems on the Client detected.

> Has its code been reviewed?
> Looking at the Grub C Client code...
>   
As far as i know, you may be the first one which bothered to read it.

> o There is no systematic documentation in the code.
>
> o Requires C99 compiler - OK, I actually like C99, but the code
>    should #if __STDC_VERSION__ < 199901L
>    #error "Requires C99 compiler"
>    #endif
>   
That breaks default compiling with gcc. Added the check for C99 *or* GCC

> o String function calls without explicit limits.
>   
Can you elaborate? AFAIK the only unlimited piece was the strtok (i 
think i was a
bit lazy when i added it), which i just replaced.

> o Conditionals without braces.
>   
Are they so bad? Ok, i'll bracify the code a bit... :)
> o The function FetchPage...
>    - is four pages long (makes it hard to understand,
>      any function longer than one page should be broken up),
>   
Yeah, it's a kind of beast. But things are quite enchained. Perhaps the 
first 25 lines
could be moved to a validating subfunction, splitting the rest would 
take a little more.

>    - defines function-scope global ("static")
>      variables in the middle of the function
>      (do I just need to get used to this, or do others find this
>      nasty too?).
>   
It's on my TODO-list to make it reentrant.

>    I'm not saying that there is is an error here, but these
>    taken together make the code hard to validate.
>
>    - contains this loop:
> (...)
>   
> That "continue" is actually a NOP, since it just falls through to the 
> "while".  Assuming that all further "send" calls fail (return -1), and 
> that send doesn't try to access *Q unless the socket is connected, the 
> loop will iterate roughly 4.5 billion times, until QueryLen overflows 
> (wraps around reaching zero), all the while acquiring new sockets and 
> releasing them.  Although the loop does finish, this may look like an 
> infinite loop to the user.  (Test program below)
Good catch, that continue was intended to apply to the outer for, not 
the while. Fixed.


> o If the source does not fit in the buffer,
>    copytoEOL leaves the target unterminated (missing '\0').
>   
It's expected to return null-terminated strings. Fixed by making it a 
hard error.
There's probably no point on continuing with a wrong hostname (we could 
crawl
the others, but the server would end reject it). Probably copytoSP 
deserves the same.
Although truncating could be acceptable for the user-agent. :-/


> I would not use the Grub C Client :-(.
>
> Rainer
>   
You still have to evaluate the user support ;)



More information about the Grub-dev mailing list