Exploring the Mechanize gem

I. Introduction

  1. The mechanize library is used to automate interaction with websites. Mechanize automatically stores and sends cookies, follows redirects, and can follow links and submit forms. It also keeps track of the sites you have visited, much like a browsing history.
  2. Installation
  • Mechanize requires Ruby 1.9.2 or later
  • Option 1:
    • Add gem "mechanize" to your Gemfile
    • Run the command:
    bundle install
    
  • Option 2:
    • Run the command:
    sudo gem install mechanize
    
  3. Usage
    • Create a new object
        [1] pry(main)> agent = Mechanize.new
        => #<Mechanize
         #<Mechanize::CookieJar:0x00000004edd650
          @store=
           #<HTTP::CookieJar::HashStore:0x00000004fb29e0
            @gc_index=0,
            @gc_threshold=150,
            @jar={},
            @logger=nil,
            @mon_count=0,
            @mon_mutex=#<Thread::Mutex:0x00000004fb2918>,
            @mon_owner=nil>>
         nil>
    
    • Fetch data from a web page
        [2] pry(main)> agent.get "http://dantri.com.vn/"
        => #<Mechanize::Page
         {url #<URI::HTTP http://dantri.com.vn/>}
         {meta_refresh #<Mechanize::Page::MetaRefresh "" nil>}
         {title "Báo Dân trí | Tin tức Việt Nam và quốc tế nóng, nhanh, cập nhật 24h"}
         {iframes}
         {frames}
         {links
          #<Mechanize::Page::Link "" "/">
          #<Mechanize::Page::Link "Giao diện PDA" "/?removecookie=true">
          #<Mechanize::Page::Link "Khoa học" "/khoa-hoc-cong-nghe.htm">
          #<Mechanize::Page::Link "Blog" "/blog.htm">
          #<Mechanize::Page::Link "Du học" "http://duhoc.dantri.com.vn">
          #<Mechanize::Page::Link "Tuyển sinh" "http://tuyensinh.dantri.com.vn">
          #<Mechanize::Page::Link "So sánh" "http://websosanh.vn/dantri/">
          #<Mechanize::Page::Link "Mua bán" "http://enbac.com">
          #<Mechanize::Page::Link "Nhân ái" "/tam-long-nhan-ai.htm">
          #<Mechanize::Page::Link "Đời sống" "/doi-song.htm">
          #<Mechanize::Page::Link "Diễn đàn" "/dien-dan.htm">
          #<Mechanize::Page::Link "English" "http://www.dtinews.vn">
          #<Mechanize::Page::Link "" "/">
          #<Mechanize::Page::Link "Video" "/video-page.htm">
          #<Mechanize::Page::Link "Sự kiện" "/su-kien.htm">
          #<Mechanize::Page::Link "Xã hội" "/xa-hoi.htm">
          #<Mechanize::Page::Link "Thế giới" "/the-gioi.htm">
          #<Mechanize::Page::Link "Thể thao" "/the-thao.htm">
          #<Mechanize::Page::Link "Giáo dục" "/giao-duc-khuyen-hoc.htm">
          #<Mechanize::Page::Link "Nhân ái" "/tam-long-nhan-ai.htm">
          #<Mechanize::Page::Link "Kinh doanh" "/kinh-doanh.htm">
          #<Mechanize::Page::Link "Văn hóa" "/van-hoa.htm">
          #<Mechanize::Page::Link "Giải trí" "/giai-tri.htm">
          #<Mechanize::Page::Link "Du lịch" "http://dulich.dantri.com.vn">
          #<Mechanize::Page::Link "Pháp luật" "/phap-luat.htm">
          #<Mechanize::Page::Link "Nhịp sống trẻ" "/nhip-song-tre.htm">
          #<Mechanize::Page::Link "Sức khỏe" "/suc-khoe.htm">
          #<Mechanize::Page::Link "Sức mạnh số" "/suc-manh-so.htm">
          #<Mechanize::Page::Link "Xe++" "/o-to-xe-may.htm">
          #<Mechanize::Page::Link "Tình yêu" "/tinh-yeu-gioi-tinh.htm">
          #<Mechanize::Page::Link "Chuyện lạ" "/chuyen-la.htm">
          #<Mechanize::Page::Link "" "/event.htm">
          .................
    
    • Submit a form
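    # Create an agent that identifies itself as Safari on macOS
    a = Mechanize.new { |agent|
      agent.user_agent_alias = 'Mac Safari'
    }
    # Fetch the home page and locate the search form by its id
    search_form = a.get("https://youtube.com").form_with(id: "masthead-search")
    # Fill in the query field, then submit the form
    search_form.search_query = "Framgia VietNam"
    result = a.submit search_form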
    
    => #<Mechanize::Page
     {url #<URI::HTTPS https://www.youtube.com/results?search_query=Framgia+VietNam>}
     {meta_refresh}
     {title "Framgia VietNam - YouTube"}
     {iframes}
     {frames}
     {links
          #<Mechanize::Page::Link "Last hour" "/results?sp=EgIIAQ%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Today" "/results?sp=EgIIAg%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "This week" "/results?sp=EgIIAw%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "This month" "/results?sp=EgIIBA%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "This year" "/results?sp=EgIIBQ%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Video" "/results?sp=EgIQAQ%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Channel" "/results?sp=EgIQAg%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Playlist" "/results?sp=EgIQAw%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Movie" "/results?sp=EgIQBA%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Show" "/results?sp=EgIQBQ%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Short (< 4 minutes)" "/results?sp=EgIYAQ%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Long (> 20 minutes)" "/results?sp=EgIYAg%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "4K" "/results?sp=EgJwAQ%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "HD" "/results?sp=EgIgAQ%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Subtitles/CC" "/results?sp=EgIoAQ%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Creative Commons" "/results?sp=EgIwAQ%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "3D" "/results?sp=EgI4AQ%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Live" "/results?sp=EgJAAQ%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Purchased" "/results?sp=EgJIAQ%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "360°" "/results?sp=EgJ4AQ%253D%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Upload date" "/results?sp=CAI%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "View count" "/results?sp=CAM%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "Rating" "/results?sp=CAE%253D&q=Framgia+VietNam">
          #<Mechanize::Page::Link "\n  \n" "/channel/UCzFu3tWPz55kR9dH9i17MnA">
          #<Mechanize::Page::Link "Framgia Vietnam Family" "/channel/UCzFu3tWPz55kR9dH9i17MnA">
          #<Mechanize::Page::Link "\n  \n3:55" "/watch?v=eTx-_mhAb8M">
          #<Mechanize::Page::Link "[Official] Framgia Vietnam Exercise Dance" "/watch?v=eTx-_mhAb8M">
          #<Mechanize::Page::Link "Framgia Vietnam Family" "/channel/UCzFu3tWPz55kR9dH9i17MnA">
          #<Mechanize::Page::Link "\n  \n2:04" "/watch?v=5jOGtIqBtdw">
          #<Mechanize::Page::Link "FRAMGIA VIETNAM INTRODUCTION" "/watch?v=5jOGtIqBtdw">
          #<Mechanize::Page::Link "Son Nguyen Xuan" "/user/nr0003xx">
          #<Mechanize::Page::Link "\n  \n3:50" "/watch?v=FcCgNCCs7Ws">
          #<Mechanize::Page::Link "Sexy dance in Framgia Vietnam" "/watch?v=FcCgNCCs7Ws">
          #<Mechanize::Page::Link "Vũ Ngọc" "/channel/UCkR4iH5qTajsKWj28nB4qLw">
          #<Mechanize::Page::Link "\n  \n2:02" "/watch?v=LD34r19mUrs">
          #<Mechanize::Page::Link "[Framgia Autumn Party 2016]  Hòa âm ánh sáng The Remix" "/watch?v=LD34r19mUrs">
          #<Mechanize::Page::Link "Framgia Vietnam Family" "/channel/UCzFu3tWPz55kR9dH9i17MnA">
          #<Mechanize::Page::Link "\n  \n5:13" "/watch?v=WZoihB8sey4">
          #<Mechanize::Page::Link "Official Video 2nd Framgia VN" "/watch?v=WZoihB8sey4">
          #<Mechanize::Page::Link "Lien Dinh" "/channel/UCb7MSbPiJxnXxhOtkYBs7eg">
          #<Mechanize::Page::Link "\n  \n5:29" "/watch?v=b-QbIuKA8e0">
          #<Mechanize::Page::Link "[Framgia Vietnam] Official Staff Interview - December" "/watch?v=b-QbIuKA8e0">
          #<Mechanize::Page::Link "Framgia Vietnam Family" "/channel/UCzFu3tWPz55kR9dH9i17MnA">
          #<Mechanize::Page::Link "\n  \n1:17" "/watch?v=IkFSHE1IP-E">
          #<Mechanize::Page::Link "[Framgia Vietnam] Afternoon Exercise 02/08/2016" "/watch?v=IkFSHE1IP-E">
    
    • Log in
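    # Create an agent that identifies itself as Safari on macOS
    a = Mechanize.new { |agent|
      agent.user_agent_alias = 'Mac Safari'
    }
    # Locate the login form by its action attribute
    login_form = a.get("https://github.com/login").form_with(action: "/session")
    # Set the "login" field; login_form.name= would only rename the form itself
    login_form.login = "user_name"
    login_form.password = "password"
    result = a.submit login_form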
    
    => #<Mechanize::Page
     {url #<URI::HTTPS https://github.com/>}
     {meta_refresh}
     {title "GitHub"}
     {iframes}
     {frames}
     {links
      #<Mechanize::Page::Link "Skip to content" "#start-of-content">
      #<Mechanize::Page::Link "\n  \n" "https://github.com/">
      #<Mechanize::Page::Link "\n            Pull requests\n" "/pulls">
      #<Mechanize::Page::Link "\n            Issues\n" "/issues">
      #<Mechanize::Page::Link "Gist" "https://gist.github.com/">
      #<Mechanize::Page::Link "\n        \n        \n" "/notifications">
      #<Mechanize::Page::Link "\n      \n      \n    " "/new">
      #<Mechanize::Page::Link "\n  New repository\n" "/new">
      #<Mechanize::Page::Link "\n    Import repository\n  " "/new/import">
      #<Mechanize::Page::Link "\n  New gist\n" "https://gist.github.com/">
      #<Mechanize::Page::Link "\n    New organization\n  " "/organizations/new">
      #<Mechanize::Page::Link "\n      \n      \n    " "/user_name">
      #<Mechanize::Page::Link "\n          Your profile\n        " "/user_name">
      #<Mechanize::Page::Link "\n          Your stars\n        " "/user_name?tab=stars">
      #<Mechanize::Page::Link "\n          Explore\n        " "/explore">
      #<Mechanize::Page::Link "\n            Integrations\n          " "/integrations">
      #<Mechanize::Page::Link "\n          Help\n        " "https://help.github.com">
      #<Mechanize::Page::Link "\n          Settings\n        " "/settings/profile">
    

=> To collect data from websites into your own database, write tasks that import it.