Finding a Better WordPress Search


by Matthew Boynes / @senyob / m@boyn.es
Alley Interactive / @alleydev / alleyinteractive.com

You

Built-in Search: History

  • Searched the title and content for all words
  • Each term must be present
  • Ordered by date (newest first)
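The pre-3.7 behavior maps onto a simple SQL pattern. The Python sketch below builds roughly the shape of that query. It is simplified: the real WP_Query also adds post type and status conditions, and the exact SQL varies by version. The `legacy_search_sql` and `escape_like` names are hypothetical. Every pattern starts with `%`, so MySQL can't use an ordinary index for these LIKE comparisons and has to scan every row. That is the root of the speed problem covered later.

```python
def escape_like(term: str) -> str:
    # Escape LIKE wildcards so a literal "%" or "_" in the search isn't treated as a pattern.
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def legacy_search_sql(terms: list[str]) -> tuple[str, list[str]]:
    """Hypothetical sketch: every term must appear in the title OR the content."""
    if not terms:
        raise ValueError("need at least one search term")
    clauses, params = [], []
    for term in terms:
        clauses.append("(post_title LIKE %s OR post_content LIKE %s)")
        pattern = f"%{escape_like(term)}%"  # leading % rules out a normal index
        params += [pattern, pattern]
    # All terms are required (AND), and results are ordered newest first.
    sql = (
        "SELECT ID FROM wp_posts WHERE "
        + " AND ".join(clauses)
        + " ORDER BY post_date DESC"
    )
    return sql, params


sql, params = legacy_search_sql(["health", "care"])
print(sql)
print(params)
```

The `%s` placeholders are the style accepted by Python MySQL drivers such as PyMySQL, so the search terms are passed as parameters rather than pasted into the SQL string.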

Built-in Search: 3.7 Updates

  • Basic relevance sorting

    1. Exact phrase is in title
    2. All keywords are in title
    3. Some keywords are in title
    4. Exact phrase is in content
    5. Keywords are in content
  • Stopwords are removed

    about, an, are, as, at, be, by, com, for, from, how, in, is, it, of, on, or, that, the, this, to, was, what, when, where, who, will, with, www

    Faster!
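Here is a minimal Python sketch of the 3.7 ordering. It assumes simple substring matching, like SQL's `LIKE '%term%'`. The names `Post`, `tier`, and `search` are hypothetical, and core actually does this with a SQL CASE expression, so treat this as an illustration of the tiers rather than a port. Within a tier, this sketch assumes ties fall back to newest first.

```python
from dataclasses import dataclass
from datetime import date

# WordPress 3.7's default stopword list, as listed on the slide.
STOPWORDS = set(
    "about an are as at be by com for from how in is it of on or that the this "
    "to was what when where who will with www".split()
)


@dataclass
class Post:
    title: str
    content: str
    post_date: date


def search_terms(phrase: str) -> list[str]:
    # Stopwords are dropped from the individual search terms.
    return [t for t in phrase.lower().split() if t not in STOPWORDS]


def tier(post: Post, phrase: str, terms: list[str]) -> int:
    title, content, phrase = post.title.lower(), post.content.lower(), phrase.lower()
    if phrase in title:
        return 1  # exact phrase in title
    if all(t in title for t in terms):
        return 2  # all keywords in title
    if any(t in title for t in terms):
        return 3  # some keywords in title
    if phrase in content:
        return 4  # exact phrase in content
    return 5  # keywords only in content


def search(posts: list[Post], phrase: str) -> list[Post]:
    terms = search_terms(phrase)
    # Matching is unchanged: every term must appear in the title or the content.
    hits = [
        p for p in posts
        if all(t in p.title.lower() or t in p.content.lower() for t in terms)
    ]
    hits.sort(key=lambda p: p.post_date, reverse=True)  # newest first...
    hits.sort(key=lambda p: tier(p, phrase, terms))  # ...then stable sort by tier
    return hits


posts = [
    Post("Health care", "about policy", date(2013, 1, 1)),
    Post("Policy news", "health care costs", date(2013, 6, 1)),
    Post("Care", "health", date(2012, 1, 1)),
    Post("Sports", "nothing", date(2014, 1, 1)),
]
print([p.title for p in search(posts, "health care")])
```

The two-pass sort relies on Python's stable sort. Sorting by date first and by tier second leaves posts in the same tier in newest-first order.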

Built-in Search: Problems

1. Relevant Results

  • 3.7 helped, but results are still not great
  • Only title and content are searched
  • Searches are very literal (e.g., no word stemming)

Built-in Search: Problems

2. Speed / Performance

Anecdotal data using a five-term search...

  • 700 rows in wp_posts: 350ms
  • 11,000 rows in wp_posts: 1.2s
  • 468,000 rows in wp_posts: 100s (!)

Bottom line: doesn't scale
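A quick check of the slide's own numbers shows the trend. Between the last two measurements, the row count grows about 42x while search time grows about 83x, so time outpaces data. The `growth` helper is just an illustration and uses only the figures above.

```python
# Timings from the slide: (rows in wp_posts, seconds for a five-term search).
measurements = [(700, 0.35), (11_000, 1.2), (468_000, 100.0)]


def growth(points):
    """Return (row multiplier, time multiplier) for each consecutive pair."""
    return [(r1 / r0, t1 / t0) for (r0, t0), (r1, t1) in zip(points, points[1:])]


for (rows_x, time_x), (r0, _), (r1, _) in zip(growth(measurements), measurements, measurements[1:]):
    print(f"{r0:,} -> {r1:,} rows: rows x{rows_x:.1f}, time x{time_x:.1f}")
```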

Built-in Search: Problems

3. Features

</complaining>

Finding a Better Search

Three critical components

  • Relevance
  • Speed & Performance
  • Features

Solutions

  • MySQL-Based
  • Google CSE
  • Dedicated Search Engine

MySQL-Based

Pros:

  • Simple
  • No additional software

Cons:

  • Complex queries
  • Slow
  • Affects site performance

Google CSE

Pros:

  • Power of Google
  • Good zero-configuration relevance
  • No impact on site performance

Cons:

  • Limited configuration & customization
  • Shows ads unless you pay ($$$)

Dedicated Search Engine

Pros:

  • Customizable
  • Fast
  • No impact on site performance

Cons:

  • Need to implement it
  • Need to manage it

Dedicated Search Engine

  • Solr
  • Amazon CloudSearch
  • Sphinx
  • Xapian
  • Elasticsearch

Elasticsearch Overview

"flexible and powerful open source, distributed real-time search and analytics engine for the cloud"

  • Distributed and cloud-ready out of the box
  • JSON API
  • Schema-free

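Elasticsearch speaks JSON over HTTP, so a search is just a JSON body POSTed to an index's `_search` endpoint. This minimal Python sketch builds such a body with the standard `multi_match` query. The field names `post_title` and `post_content`, and the index you would send it to, are assumptions about how posts were indexed.

```python
import json


def build_search_body(text: str, page: int = 1, per_page: int = 10) -> dict:
    """Hypothetical helper: search title and content, weighting title matches higher."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "^3" multiplies the title field's score by 3 relative to content.
                "fields": ["post_title^3", "post_content"],
            }
        },
        "from": (page - 1) * per_page,  # offset for pagination
        "size": per_page,
    }


print(json.dumps(build_search_body("health care", page=2), indent=2))
```

Because the engine is schema-free, fields like these can be indexed without declaring a mapping first. In practice you would still define one to control analysis and types.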
Elasticsearch Relevance

  • Built-in relevance scoring

    Elasticsearch scoring formula

  • Text analysis, e.g. word stemming
  • Customizable
  • Search any or all fields
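To give a feel for what "text analysis" and "relevance scoring" mean, here is a toy Python sketch. It analyzes text (lowercase, tokenize, drop stopwords, stem) and then scores with a simplified version of Lucene's classic TF/IDF formula, which Elasticsearch used by default before 5.0 (the default is now BM25). The stopword list and the two-rule stemmer are stand-ins, not Elasticsearch's analyzers. The formula also omits query normalization and boosts.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "of", "the", "to", "in", "is"}


def stem(token: str) -> str:
    # Toy suffix stripping; real stemmers (e.g. Porter, Snowball) handle far more cases.
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, drop stopwords, stem."""
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def score(query: str, doc: str, corpus: list[str]) -> float:
    """Simplified classic TF/IDF: coord * sum(sqrt(tf) * idf^2 * norm)."""
    q_terms = set(analyze(query))
    d_terms = analyze(doc)
    if not q_terms or not d_terms:
        return 0.0
    tf_counts = Counter(d_terms)
    doc_sets = [set(analyze(d)) for d in corpus]
    norm = 1 / math.sqrt(len(d_terms))  # shorter fields score higher
    total, matched = 0.0, 0
    for term in q_terms:
        freq = tf_counts[term]
        if freq == 0:
            continue
        matched += 1
        doc_freq = sum(term in s for s in doc_sets)
        idf = 1 + math.log(len(corpus) / (doc_freq + 1))  # rarer terms weigh more
        total += math.sqrt(freq) * idf ** 2 * norm
    return (matched / len(q_terms)) * total  # coord: reward matching more query terms


corpus = ["Health insurance costs", "Sports scores", "Insuring health care", "Weather report"]
ranked = sorted(corpus, key=lambda d: score("health insurance", d, corpus), reverse=True)
print(ranked)
```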

Elasticsearch Performance

  • Super fast, no impact on site performance
  • Searching 400,000 posts: under 0.1s
  • Built for scaling

Elasticsearch Features

  • Faceting
  • Spelling correction ("did you mean?")
  • Geo searches
  • Wildcard searches
  • Many more...
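Each feature on this slide maps to part of the query DSL. The Python sketch below uses current Elasticsearch syntax. Versions from around 2013 exposed faceting through a `facets` API that was later removed in favor of aggregations. The field names (`category_slugs`, `location`, etc.) are hypothetical and would need suitable mappings, such as `keyword` for the aggregation field and `geo_point` for `location`.

```python
import json


def faceted_search_body(text: str) -> dict:
    """Hypothetical body combining a search, a facet, and spelling suggestions."""
    return {
        "query": {"multi_match": {"query": text, "fields": ["post_title", "post_content"]}},
        # Faceting: count matching posts per category with a terms aggregation.
        "aggs": {"categories": {"terms": {"field": "category_slugs", "size": 10}}},
        # "Did you mean?": the term suggester proposes close spellings per word.
        "suggest": {"did_you_mean": {"text": text, "term": {"field": "post_content"}}},
    }


def wildcard_body(pattern: str) -> dict:
    # Wildcard search, e.g. "heal*" matches "health" and "healthcare".
    return {"query": {"wildcard": {"post_title": {"value": pattern}}}}


def geo_body(lat: float, lon: float, distance: str = "10km") -> dict:
    # Geo search: only posts whose location lies within `distance` of the point.
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {"distance": distance, "location": {"lat": lat, "lon": lon}}
                }
            }
        }
    }


print(json.dumps(faceted_search_body("helth care"), indent=2))
```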

Case Study: kff.org

  • Major source of information on US health care and the US role in global health
  • WordPress.com VIP
  • Data galore!

SearchPress

  1. Dead simple on the outside
  2. Developer-friendly on the inside
  3. Limitless & Fast

SearchPress: Simplicity

  • Only setting: the Elasticsearch server URL
  • Indexing happens behind the scenes
  • Search is replaced automatically
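The slides don't show how SearchPress indexes content. As a generic sketch of what behind-the-scenes indexing involves, this Python example turns posts into a payload for Elasticsearch's `_bulk` API. That payload is newline-delimited JSON: an action line followed by a document line, with a trailing newline. The field selection, the index name `wordpress`, and the choice to index only published posts are assumptions, not SearchPress's actual behavior.

```python
import json


def post_to_doc(post: dict) -> dict:
    # Hypothetical field selection; a real plugin's mapping may differ.
    return {
        "post_title": post["post_title"],
        "post_content": post["post_content"],
        "post_date": post["post_date"],
    }


def bulk_payload(posts: list[dict], index: str = "wordpress") -> str:
    """Build an NDJSON body for POST /_bulk (Content-Type: application/x-ndjson)."""
    lines = []
    for post in posts:
        if post["post_status"] != "publish":
            continue  # assumption: only published posts are searchable
        lines.append(json.dumps({"index": {"_index": index, "_id": str(post["ID"])}}))
        lines.append(json.dumps(post_to_doc(post)))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


posts = [
    {"ID": 1, "post_title": "Hello", "post_content": "World",
     "post_date": "2013-11-01", "post_status": "publish"},
    {"ID": 2, "post_title": "Draft", "post_content": "WIP",
     "post_date": "2013-11-02", "post_status": "draft"},
]
print(bulk_payload(posts), end="")
```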

SearchPress: Developer Friendly

  • Numerous actions & filters
  • WordPress-friendly access to the search API
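WordPress hooks are PHP, but the pattern itself is simple to show. This Python sketch mimics `add_filter`/`apply_filters` so you can see how a developer could adjust a search query before it is sent. The hook name `example_search_query` is hypothetical and is not one of SearchPress's real hooks. Unlike WordPress, this version also passes only a single value through each callback.

```python
from collections import defaultdict
from typing import Any, Callable

# hook name -> list of (priority, callback)
_filters: dict[str, list[tuple[int, Callable[[Any], Any]]]] = defaultdict(list)


def add_filter(hook: str, callback: Callable[[Any], Any], priority: int = 10) -> None:
    _filters[hook].append((priority, callback))


def apply_filters(hook: str, value: Any) -> Any:
    # Lower priority numbers run first; sorted() is stable, so ties keep registration order.
    for _, callback in sorted(_filters[hook], key=lambda item: item[0]):
        value = callback(value)
    return value


def boost_titles(query: dict) -> dict:
    # A theme or plugin could weight titles more heavily without touching core code.
    query["query"]["multi_match"]["fields"] = ["post_title^5", "post_content"]
    return query


add_filter("example_search_query", boost_titles)  # hypothetical hook name
base = {"query": {"multi_match": {"query": "health", "fields": ["post_title", "post_content"]}}}
final = apply_filters("example_search_query", base)
print(final)
```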

SearchPress: Limitless

  • Built with massive sites in mind

Case Study: JTA

  • Need to search tags, authors, post meta
  • Tried Google CSE; it wasn't customizable enough
  • Date issues
  • 400k published posts
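The slide lists the fields JTA needed, but it doesn't say what the date issues were. As one common approach, assumed here rather than taken from the talk, this Python sketch searches the extra fields and blends relevance with recency using Elasticsearch's `function_score` query and a `gauss` decay on the post date. Field names such as `tag_names`, `author_name`, and `meta.*` are hypothetical.

```python
import json


def recency_weighted_query(text: str) -> dict:
    """Hypothetical query: search tags, authors, and meta too, and favor recent posts."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": text,
                        # Field names are assumptions; "meta.*" uses a wildcard field pattern.
                        "fields": ["post_title^3", "post_content", "tag_names^2",
                                   "author_name^2", "meta.*"],
                    }
                },
                # A post one year old (the "scale") gets half the multiplier of a new one.
                "functions": [
                    {"gauss": {"post_date": {"origin": "now", "scale": "365d", "decay": 0.5}}}
                ],
                "boost_mode": "multiply",  # final score = relevance * recency factor
            }
        }
    }


print(json.dumps(recency_weighted_query("election"), indent=2))
```

Multiplying the scores keeps relevance in charge while letting newer posts edge out older ones of similar relevance. That matters with 400k posts spanning many years.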

Questions?

SearchPress:
https://github.com/alleyinteractive/SearchPress
Twitter:
@senyob
Email:
m@boyn.es