[Grub-dev] Grub + RelEx | a Google Summer of Code project
Bartek Jasicki
thindil2 at gmail.com
Fri May 9 11:15:56 UTC 2008
On 2008-05-09 at 14:14:35
"David Hart" <hart at singinst.org> wrote:
> Hi All,
>
> I'm writing to kick off some conversation about the most useful ways to
> integrate RelEx <http://opencog.org/wiki/RelEx> into Grub (for more
> info, see Rich's project proposal
> <http://opencog.org/wiki/RelEx_Web_Crawler>).
>
> For example, what's the best way to send sentence grammar parses back
> to the grub server? Will a protocol extension be required? How could
> grammar parses potentially be used in the broader
> Atlas <http://search.wikia.com/wiki/Atlas> context? Are some forms more
> useful than others? Can/will the Java GrubNG
> client be suitable as a base? (RelEx is written in Java; GSoC coding
> begins May 26
> <http://code.google.com/opensource/gsoc/2008/faqs.html#0.1_timeline>)
>
> Cheers,
>
> -dave
Hello Dave,
I'll try to answer your questions ;)
1. "what's the best way to send sentence grammar parses back to the grub
server? Will a protocol extension be required?"
For now there is no such option. The server can accept only raw web
pages (headers and content). At the moment the Grub client only fetches
pages and does not process them in any way (the server does that). Of
course this can be changed in the future, but first we must weigh the
advantages and disadvantages.
Plus:
- Less work for the server, so the whole job can be done faster.
Minus:
- A bigger chance of getting fake results. If parsing is done by the
clients, it is only a matter of time before SEO specialists start
modifying the client source code and sending bogus workunits to the
server.
It is possible to prevent this, for example by adding to the search
database only those workunits that have identical results from several
(3 to 5 or maybe more) clients, but in that case you lose all the
advantages of parsing on the client.
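A minimal sketch of the cross-checking idea, in Python, assuming each
client returns its parse of a workunit as a plain string. The function
name `accept_workunit` and the default quorum of 3 are hypothetical
illustrations, not part of the Grub protocol or server code.

```python
from collections import Counter
from typing import List, Optional


def accept_workunit(results: List[str], quorum: int = 3) -> Optional[str]:
    """Return the result that at least `quorum` clients agree on, or None.

    Hypothetical helper: the winning result must also be a strict majority
    of all submissions, so two competing answers can never both be accepted.
    """
    if not results:
        return None
    # Count identical submissions for this workunit and take the most common.
    winner, votes = Counter(results).most_common(1)[0]
    if votes >= quorum and votes * 2 > len(results):
        return winner
    return None


if __name__ == "__main__":
    # Three of four clients agree, so the parse is accepted.
    print(accept_workunit(["parse-A", "parse-A", "parse-B", "parse-A"]))  # parse-A
    # No agreement at all, so nothing goes into the search database.
    print(accept_workunit(["parse-A", "parse-B", "parse-C"]))  # None
```

With a quorum of 3 the server has to hand out every workunit at least
three times, which is exactly the extra cost that cancels out the
benefit of parsing on the client.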
2. "How could grammar parses potentially be used in the broader Atlas
context?"
IMHO it can be added to all three components of Atlas once the whole
process is divided up (for example, initial parsing could be done by the
Grub server, the main operations by the Collector, and the Broker could
turn the result back into something human-readable).
3. "Can/will the Java GrubNG client be suitable as a base?"
If I know correctly, the Java client is a little outdated (has anyone
used it in the last 2 months?). But if you want, you can use libraries
from the C# client by combining Java and C# code on the Mono platform.
You can find more information about Java and Mono here:
http://www.mono-project.com/Java
Btw, information for anyone who has access to the Grub servers: the
workunit server is again returning a happy HTTP 500 error ;) It worked
for the last 2 days, but it needs to be fixed again ;)
Regards
Bartek