What is ElasticSearch?

Introducing ElasticSearch

First, we need to understand that ElasticSearch is an enterprise-level search engine. Its goal is to provide a tool, or technical platform, for searching and analyzing data in real time (meaning fast and accurate), and one that can easily be applied to or deployed on top of different data sources. These data sources include popular databases such as MS SQL, PostgreSQL, and MySQL, but they can also be plain text, email, PDF files, and, generally speaking, anything related to textual data. I will explain this further in the section below.

ElasticSearch was developed by Shay Banon and is based on Apache Lucene. It is an open source, distributed solution for searching data on servers. It is scalable and supports real-time search without special configuration. It has been adopted by several companies, including StumbleUpon and Mozilla. ElasticSearch is released under the Apache License 2.0 (this applies to versions up to 7.10; later releases changed license). Some facts about ElasticSearch:

  • Elasticsearch is a search engine.
  • Elasticsearch is built to run as a server with a RESTful interface, suited to cloud deployment.
  • It inherits from and builds on Apache Lucene.
  • It is developed in Java.
  • It is open-source software released under the permissive Apache License.
  • It is similar to Apache Solr.

Basic Understanding

The main concepts of ElasticSearch are listed below:

  • Cluster: A set of Nodes (servers) that together hold all the data.
  • Node: A single server that holds some of the data and participates in the cluster's indexing and querying.
  • Index: Forget SQL indexes. Each ES Index is a set of Documents.
  • Shard: A subset of the Documents of an Index. An Index can be divided into many shards.
  • Type: A definition of the schema of a Document inside an Index (an Index can have more than one type assigned). Note that mapping types were deprecated in ElasticSearch 6 and removed in later major versions.
  • Document: A JSON object with some data. It's the basic information unit in ElasticSearch.
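
To make the relationship between Documents, Indexes, and Shards more concrete, here is a minimal sketch of how documents could be spread across the shards of an index by hashing the document ID. This is only an illustration: the method names and document IDs are hypothetical, and CRC32 is used as a stand-in for whatever hash function ElasticSearch uses internally.

```ruby
require 'zlib'

# Pick the shard for a document ID.
# Assumption: CRC32 stands in for the hash function ElasticSearch really uses.
def shard_for(doc_id, number_of_shards)
  Zlib.crc32(doc_id.to_s) % number_of_shards
end

# Group document IDs by the shard they would land on.
def distribute(doc_ids, number_of_shards)
  doc_ids.group_by { |id| shard_for(id, number_of_shards) }
end

# Hypothetical document IDs spread over an index with 3 shards.
layout = distribute(%w[user-1 user-2 user-3 user-4 user-5], 3)
layout.sort.each { |shard, ids| puts "shard #{shard}: #{ids.join(', ')}" }
```

Because the shard is derived only from the ID, the same document always maps to the same shard, which is what lets any node find it again later.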

ElasticSearch can be integrated into applications written in the following languages:

  • Java
  • JavaScript
  • Groovy
  • .NET
  • PHP
  • Perl
  • Python
  • Ruby

Organizations that have used Elasticsearch:

  • Mozilla
  • Quora
  • SoundCloud
  • GitHub
  • Stack Exchange
  • Center for Open Science
  • Reverb
  • Netflix

ElasticSearch Mechanism of Action

Note that what follows is the basic processing mechanism of ElasticSearch as it has been published and as I understand it. In practice, its mechanism may be much more complex! After all, developers rarely share all of their best techniques.

You have surely noticed the phrase "real time" that I have used often above. It is also one of Elasticsearch's design criteria: Elasticsearch is described as "search & analyze in real time" because it can return search results quickly and accurately from a large data source (big data source).

So how does it work? To explain, let us return to the data sources I mentioned above. They include not only well-known databases such as MySQL, MS SQL, and PostgreSQL, but also text, PDF, DOC files, and so on. As an example, suppose the data source is your email. If you want to find an email and you remember information such as the title, sender, or date sent, finding it is easy. But what if you only vaguely remember the content, or you want to find emails with similar content? What do you do when your data is not stored in a structured way, in a well-known database with all the necessary support?

More specifically, take a website that lets users store DOC, PDF, and TXT files. Searching on such a site does not query a database; it has to look directly inside the files. Searching through just 10 files is already tiring, so what about 1,000 files, or 100 thousand? This is the problem from which the idea of ElasticSearch arose.
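
One way to avoid scanning every file on every search is an inverted index, the data structure that Lucene-based engines are built around: instead of going from each document to its words, we store, for each word, the documents that contain it. The sketch below is a deliberately simplified illustration in plain Ruby, with hypothetical documents and method names; a real engine stores much more (positions, frequencies, compressed postings).

```ruby
require 'set'

# Build a map of term => Set of ids of the documents containing it.
def build_inverted_index(docs)
  index = Hash.new { |hash, term| hash[term] = Set.new }
  docs.each do |id, text|
    text.downcase.scan(/[a-z0-9]+/).each { |term| index[term] << id }
  end
  index
end

# Return the sorted ids of documents that contain every term of the query.
def lookup(index, query)
  terms = query.downcase.scan(/[a-z0-9]+/)
  return [] if terms.empty?
  terms.map { |term| index.fetch(term, Set.new) }.reduce(:&).to_a.sort
end

# Hypothetical documents: id => text content.
docs = {
  1 => 'Meeting notes about the search project',
  2 => 'Invoice for the hosting service',
  3 => 'Search results are slow on the old server'
}
index = build_inverted_index(docs)

p lookup(index, 'search')      # => [1, 3]
p lookup(index, 'the server')  # => [3]
```

The index is built once, when documents are stored. After that, a search is a handful of hash lookups and set intersections, no matter how many files there are.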

Analysis

Analysis is a text transformation: it takes text as input and outputs a stream of terms (tokens). One of the best features of ES is that it comes bundled with many built-in analyzers. Imagine a function that takes every word in a block of text and returns the stemmed form of each word, or a function that removes all stop words from a text. Depending on what you need, you can use one or several analyzers to transform the original text. In ES, analyzers are used when building the index, and they help speed up searches through our documents.
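
To give a feel for what an analyzer does, here is a toy analysis chain in plain Ruby: tokenize, drop stop words, then stem. It is only a sketch. The stop-word list and the suffix-stripping stemmer are my own simplifications, not the built-in ES analyzers, which are far more careful.

```ruby
# Split text into lowercase word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Drop very common words that carry little meaning for search.
# Assumption: a tiny hypothetical stop-word list; real lists are much longer.
def remove_stop_words(tokens, stop_words = %w[a an and the of to in is are])
  tokens.reject { |token| stop_words.include?(token) }
end

# Deliberately naive stemmer: strips a few common English suffixes.
def stem(token)
  token.sub(/(ing|ed|es|s)\z/, '')
end

# Full chain: text in, list of terms out.
def analyze(text)
  remove_stop_words(tokenize(text)).map { |token| stem(token) }
end

p analyze('The servers are indexing the documents')
# => ["server", "index", "document"]
```

Running the same chain on both the stored text and the search query is what lets a search for "indexed document" match a text that says "indexing the documents".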

Querying

Another powerful feature of ElasticSearch is the wide range of query types it provides. There are nearly 40 query types, and one of them will probably meet your needs.

We have phrase queries for textual search, queries based on geo coordinates, and numeric range queries. They can also be very useful for summarizing data and more.
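
As an illustration, the three kinds of query mentioned above can be written as plain Ruby hashes that mirror the JSON query DSL. The helper method names and the field names (title, location, price) are hypothetical, and the exact syntax can differ between ElasticSearch versions, so treat this as a sketch rather than a reference.

```ruby
require 'json'

# Phrase query: the words must appear together and in order.
def phrase_query(field, phrase)
  { query: { match_phrase: { field => phrase } } }
end

# Geo query: documents whose geo point lies within a distance of a location.
def geo_query(field, lat, lon, distance)
  {
    query: {
      bool: {
        filter: {
          geo_distance: { distance: distance, field => { lat: lat, lon: lon } }
        }
      }
    }
  }
end

# Numeric range query: values between min and max, inclusive.
def range_query(field, min, max)
  { query: { range: { field => { gte: min, lte: max } } } }
end

puts JSON.pretty_generate(phrase_query(:title, 'search engine'))
puts JSON.pretty_generate(geo_query(:location, 21.03, 105.85, '10km'))
puts JSON.pretty_generate(range_query(:price, 10, 50))
```

Hashes of this shape are what you pass to a model's search method when a simple query string is not enough.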

Should you use ElasticSearch or not?

Speed is perhaps the main thing ElasticSearch brings. ElasticSearch can make search in your system faster than before, and it performs very well. Another benefit is that query results can be ordered by relevance. By default, ElasticSearch uses the TF/IDF similarity algorithm to calculate relevance (versions 5.0 and later default to BM25 instead). It has many good points, but it has some drawbacks too. There is no silver bullet! ElasticSearch is very powerful at searching and aggregating data, but if your environment is extremely write-heavy, ES may not be your best option. It also has no support for transactions. But as long as you don't rely on it as your primary data store, you should be fine.
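
To show what ordering by relevance means, here is a small sketch of the textbook TF/IDF formula in plain Ruby. It is an assumption-laden simplification: Lucene's actual scoring adds normalization and other factors, and the method names and sample documents are hypothetical.

```ruby
# Split text into lowercase word tokens.
def terms_of(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Term frequency: the share of a document's tokens that equal the term.
def term_frequency(term, tokens)
  return 0.0 if tokens.empty?
  tokens.count(term).to_f / tokens.size
end

# Inverse document frequency: terms found in fewer documents weigh more.
def inverse_document_frequency(term, corpus)
  containing = corpus.count { |tokens| tokens.include?(term) }
  return 0.0 if containing.zero?
  Math.log(corpus.size.to_f / containing)
end

def tf_idf(term, tokens, corpus)
  term_frequency(term, tokens) * inverse_document_frequency(term, corpus)
end

# Score every document against the query and sort by descending score.
def rank(query, documents)
  corpus = documents.map { |doc| terms_of(doc) }
  scored = documents.each_with_index.map do |doc, i|
    [doc, terms_of(query).sum { |term| tf_idf(term, corpus[i], corpus) }]
  end
  scored.sort_by { |_doc, score| -score }
end

documents = ['ruby search engine', 'ruby web framework', 'search search search']
rank('search', documents).each { |doc, score| puts format('%.3f  %s', score, doc) }
```

A document scores higher when the query term makes up more of its text and when that term is rare across the whole collection.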

Using ElasticSearch in RoR

First, add these two gems to the Gemfile:

gem 'elasticsearch-model'
gem 'elasticsearch-rails'

Create the file concerns/searchable.rb:

require 'elasticsearch/model'

module Searchable
  extend ActiveSupport::Concern

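  included do
    # Adds the search and indexing API (e.g. Model.search) to the model
    include Elasticsearch::Model
    # Keeps the index in sync when records are created, updated, or destroyed
    include Elasticsearch::Model::Callbacks

    # Searches a single field with query-string syntax, e.g. search_by(:name, 'Kane')
    def self.search_by(type, query)
      search("#{type}:#{query}")
    end
  end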

end

Create the User model and include the concern in it:

class User < ActiveRecord::Base
  include Searchable
...
end

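Run the following rake command in a terminal to import data from the models into ElasticSearch (the import tasks must first be loaded, for example with require 'elasticsearch/rails/tasks/import' in lib/tasks/elasticsearch.rake):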

rake environment elasticsearch:import:all

You can also index only some fields of a model to save space in ElasticSearch:

class User < ActiveRecord::Base
...
  def as_indexed_json(options = {})
    self.as_json({
      only: [:name, :email],
      include: {
        books: { only: :name }
      }
    })
  end
end
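
To see what the trimmed-down document looks like, the plain-Ruby sketch below mimics the shape that the as_indexed_json method above should produce, without needing Rails. The indexed_document helper and the sample values are hypothetical; in the real model the serialization is done by as_json.

```ruby
require 'json'

# Mimics the as_indexed_json result for a user given as a plain Hash:
# keeps only :name and :email, plus the :name of each associated book.
def indexed_document(user)
  {
    'name'  => user[:name],
    'email' => user[:email],
    'books' => user.fetch(:books, []).map { |book| { 'name' => book[:name] } }
  }
end

# Hypothetical record with fields that should not reach the index.
user = {
  name: 'Kane', email: 'kane@example.com', password_digest: 'secret',
  books: [{ name: 'Ruby Basics', price: 20 }]
}
puts JSON.pretty_generate(indexed_document(user))
```

Fields such as password_digest and the book price never reach the index, which keeps the stored documents small.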

Now it's time to test it using the Rails console:

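# Names of all indexed users
User.search('*').map { |u| u.name }

# Matching users loaded from the database as ActiveRecord objects
User.search('Kane').records.to_a

# Total number of hits
User.search('Viblo').results.total

# Paginated records (assumes a pagination gem such as Kaminari is installed)
@users = User.search(params[:q]).page(params[:page]).records

# Search with the query DSL instead of a query string
response = User.search(query: { match: { name: 'Tech Master' } })

# Wildcard search on the title, with size and offset, returning each book's price
Book.search(query: { query_string: { query: "title: *Ruby*" } }, size: 15, from: 2).records.map { |b| b.price }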
