June 20 2019 04:22:04
More SearchEngine goodness...
Software
I spent today tweaking and fixing issues (such as mishandled hash collisions causing loss of data in the indexes, and word offsets not being added correctly). Perhaps more importantly, I added monitoring to the backend using my monitoring framework! This is the first time I've really been able to test the SearchEngine properly, and here is how it looks! I've blurred out the contents because it's currently indexing the RuneScape knowledge base (hey, since I wrote Jagex's SearchTool libraries, I figured I'd use the same dataset for SearchEngine!). The data was all legally obtained by scraping their website with this little BASH script:
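i=0  # starting ID is an assumption; the original snippet did not show the initialisation
while [ "$i" -lt 9999 ]; do
   # quote the URL so the shell does not try to glob-expand the '?'
   wget "http://www.runescape.com/kbase/viewarticle.ws?article_id=$i" -O "$i.html"
   i=$((i + 1))
done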
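The collision and word-offset fixes mentioned above aren't shown in this post, so here is only a minimal sketch of the general idea, assuming the index maps a word's hash to its postings. Every name in it (`HashFieldIndex`, `Posting`, `add`, `offsetsOf`) is hypothetical and not taken from the SearchEngine code. The point it illustrates: keep a bucket per hash and compare the actual word, so two different words with the same hash never overwrite each other, and append offsets rather than replacing them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical positional index keyed by word hash; not the SearchEngine's real classes. */
class HashFieldIndex {

    /** One distinct word plus every offset at which it was seen. */
    private static final class Posting {
        final String word;
        final List<Integer> offsets = new ArrayList<>();

        Posting(String word) {
            this.word = word;
        }
    }

    // hash -> all words sharing that hash (a bucket, so a collision never overwrites data)
    private final Map<Integer, List<Posting>> buckets = new HashMap<>();

    /** Records that the given word occurs at the given offset. */
    void add(String word, int offset) {
        List<Posting> bucket = buckets.computeIfAbsent(word.hashCode(), h -> new ArrayList<>());
        for (Posting p : bucket) {
            if (p.word.equals(word)) {      // same word seen before: append the new offset
                p.offsets.add(offset);
                return;
            }
        }
        Posting fresh = new Posting(word);  // new word (possibly colliding): gets its own posting
        fresh.offsets.add(offset);
        bucket.add(fresh);
    }

    /** Returns the recorded offsets for a word, or an empty list if it was never indexed. */
    List<Integer> offsetsOf(String word) {
        List<Posting> bucket = buckets.getOrDefault(word.hashCode(), Collections.emptyList());
        for (Posting p : bucket) {
            if (p.word.equals(word)) {
                return Collections.unmodifiableList(p.offsets);
            }
        }
        return Collections.emptyList();
    }
}
```

A handy way to exercise the collision path: in Java, `"Aa"` and `"BB"` have the same `hashCode()`, so adding both to the index and reading them back should return each word's own offsets.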
So, how does it perform? Pretty well! It's not as fast as the implementation I wrote for Jagex (we used custom libraries, which were a lot faster), but it's pretty close:
# Hash Fields: 2
    * [0] - Word count: 983
    * [1] - Word count: 10605
# Trie Fields: 2
    * [0] - Word count: 1249
    * [1] - Word count: 25428
# Average search time: 4ms
# Longest search time: 17ms
# Shortest search time: 0ms
# Search Count: 9
After only 9 searches, the average is down to 4ms (the longest search is always the first one, since the JVM does its initialisation while processing it). I believe the average search times at Jagex were on the order of 1.5ms, but I can't guarantee this figure. Not bad, though!
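For anyone curious how figures like the ones above can be gathered, here is a minimal sketch of a timing accumulator. It is not the monitoring framework used here; `SearchStats` and its methods are made-up names for illustration, and it assumes timings are kept in whole milliseconds.

```java
/** Hypothetical helper that accumulates per-search timings (count, average, longest, shortest). */
class SearchStats {
    private long count;
    private long totalMillis;
    private long longestMillis = Long.MIN_VALUE;
    private long shortestMillis = Long.MAX_VALUE;

    /** Records the duration of one completed search. */
    synchronized void record(long elapsedMillis) {
        count++;
        totalMillis += elapsedMillis;
        longestMillis = Math.max(longestMillis, elapsedMillis);
        shortestMillis = Math.min(shortestMillis, elapsedMillis);
    }

    synchronized long count() {
        return count;
    }

    /** Integer average in milliseconds; 0 if nothing has been recorded yet. */
    synchronized long averageMillis() {
        return count == 0 ? 0 : totalMillis / count;
    }

    synchronized long longestMillis() {
        return count == 0 ? 0 : longestMillis;
    }

    synchronized long shortestMillis() {
        return count == 0 ? 0 : shortestMillis;
    }

    /** Runs a search, times it with System.nanoTime(), and records the result in whole milliseconds. */
    <T> T timed(java.util.function.Supplier<T> search) {
        long start = System.nanoTime();
        try {
            return search.get();
        } finally {
            record((System.nanoTime() - start) / 1_000_000);
        }
    }
}
```

With whole-millisecond truncation like this, any search that finishes in under a millisecond is recorded as 0ms. Because the first search on a fresh JVM tends to be the slowest, one option is to run a throwaway warm-up query before recording anything, so it doesn't skew the average.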

Today I realised that I have forgotten a pretty crucial part of the implementation - search logging! So that's what I'll be working on next.