Image stitching

👋 Introduction

Image stitching is a computer vision technique that combines several photos of the same scene, taken from different viewpoints, into a single panorama. In this post I'll walk through the steps of the technique and the code for each one.

🔬 Steps

Feature detection and description (SIFT)

The first step is to extract useful features, or keypoints. So what are keypoints? From what I've read in the OpenCV docs, image regions fall into three main types: flat areas, edges, and corners, and corners are the most useful of the three. To make this concrete, take the example in the figure below:

The question is: "How do we find the exact location of points A, B, C, D, E, and F in the image?" Points A and B lie on flat surfaces that look the same at many positions in the image, so it is very hard to pin down exactly where they are. Points C and D are clearly on edges of the building, but their exact location is still hard to determine, because a patch looks the same anywhere along the edge. Finally, E and F are corner points: at a glance we can tell they sit at two corners of the building. This is why corners are usually treated as the most important features: they are easy to recognize and to tell apart across different images.
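To check the flat / edge / corner intuition numerically, here is a minimal NumPy-only sketch of the Harris corner response on a synthetic white square. The helper names (`box_sum`, `harris_response`), the window radius, and `k = 0.05` are my own choices for illustration, not something taken from OpenCV.

```python
import numpy as np

def box_sum(a, r):
    """Sum of `a` over a (2r+1) x (2r+1) window around each pixel (zero padded)."""
    p = np.pad(a, r)
    out = np.zeros(a.shape, dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(img, r=2, k=0.05):
    """Harris response R = det(M) - k * trace(M)^2 of the structure tensor M."""
    gy, gx = np.gradient(img.astype(float))  # derivatives along rows, columns
    sxx = box_sum(gx * gx, r)
    syy = box_sum(gy * gy, r)
    sxy = box_sum(gx * gy, r)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2

# Synthetic "building": a white square on a black background
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0

R = harris_response(img)
flat = R[20, 20]    # inside the square, like points A and B
edge = R[20, 10]    # middle of the left side, like points C and D
corner = R[10, 10]  # top-left corner, like points E and F
print(flat, edge, corner)
```

The response is zero on the flat region, negative on the edge, and positive at the corner, which is how a detector can single out corners automatically.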

This gives us two concepts, feature detection and feature description:
- Feature detection: the process of finding the features.
- Feature description: the process of describing the features we found, so that the same features can be located in other images.

Many algorithms address this problem, but in this post I'll focus only on SIFT.

Motivation: SIFT was designed to address a limitation of the Harris corner detector, which is not scale-invariant: it cannot detect corners consistently when the size of an object changes in the image. This hurts applications that need to recognize objects at many different sizes.
SIFT addresses this with scale-space filtering, using the Difference of Gaussians to extract features at multiple image scales.

import cv2

def find_keypoints(img):
  # SIFT detects keypoints and computes a 128-dimensional descriptor for each one
  keypoints_descriptor = cv2.SIFT_create()
  kp, des = keypoints_descriptor.detectAndCompute(img, None)

  return kp, des
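`cv2.SIFT_create()` hides the scale-space step, so here is a small NumPy-only sketch of the Difference of Gaussians idea. It is not OpenCV's SIFT implementation: the helpers (`gaussian_blur`, `dog_at_center`, `disk`), the factor `k = 1.6`, and the list of sigmas are assumptions I made for illustration. It measures the DoG response at the center of a small and a large disk for several blur scales.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur, kernel truncated at about 3 sigma (zero padding)."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()

    def blur_1d(v):
        return np.convolve(v, kernel, mode="same")

    out = np.apply_along_axis(blur_1d, 0, img.astype(float))
    return np.apply_along_axis(blur_1d, 1, out)

def dog_at_center(img, sigmas, k=1.6):
    """|G(k * sigma) - G(sigma)| at the image center, one value per sigma."""
    cy, cx = img.shape[0] // 2, img.shape[1] // 2
    return [
        abs(gaussian_blur(img, k * s)[cy, cx] - gaussian_blur(img, s)[cy, cx])
        for s in sigmas
    ]

def disk(size, radius):
    """Binary image with a filled disk of the given radius at the center."""
    yy, xx = np.mgrid[:size, :size]
    dist2 = (yy - size // 2) ** 2 + (xx - size // 2) ** 2
    return (dist2 <= radius ** 2).astype(float)

sigmas = [1, 2, 4, 8]
small = dog_at_center(disk(128, 3), sigmas)
large = dog_at_center(disk(128, 12), sigmas)

# The scale with the strongest response depends on the size of the blob
best_small = sigmas[int(np.argmax(small))]
best_large = sigmas[int(np.argmax(large))]
print(best_small, best_large)
```

The small disk responds most strongly at a smaller sigma than the large disk. Searching for such extrema across scales is what lets SIFT attach a scale to every keypoint instead of working at a single fixed window size.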

Keypoint matching

Once we have the keypoints of both images, we need to keep only the features that appear in both. Here I use FLANN's KD-tree matcher, which performs an approximate nearest-neighbor search: it is faster than brute-force matching, at the cost of somewhat lower accuracy.

import cv2

def match_keypoints(ds1, ds2):
  RATIO_TEST = 0.7
  TREES = 5
  FLANN_INDEX_KDTREE = 1

  keypoints_matcher = cv2.FlannBasedMatcher(
        dict(algorithm = FLANN_INDEX_KDTREE, trees = TREES),
        dict(checks=50)
    )
  # For each descriptor in ds1, find its 2 nearest neighbors in ds2
  matches = keypoints_matcher.knnMatch(ds1,ds2,k=2)
  good = []
  for m,n in matches:
    # Lowe's ratio test: keep a match only if it is clearly better than the runner-up
    if m.distance < RATIO_TEST*n.distance:
        good.append(m)

  return good

This function has two main parameters you can tune: trees and checks. Higher values generally improve accuracy, but the matching takes longer to run.
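To see what the ratio test in `match_keypoints` does without FLANN, here is a minimal sketch that uses exhaustive (brute-force) search in NumPy on toy 2-D "descriptors". The function name `ratio_test_matches` and the toy data are hypothetical; real SIFT descriptors have 128 dimensions.

```python
import numpy as np

def ratio_test_matches(des1, des2, ratio=0.7):
    """Return (i, j) pairs where des2[j] is the nearest neighbor of des1[i]
    and is clearly closer than the second nearest neighbor."""
    # Pairwise Euclidean distances, shape (len(des1), len(des2))
    dist = np.linalg.norm(des1[:, None, :] - des2[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)[:, :2]  # indices of the 2 nearest neighbors
    rows = np.arange(len(des1))
    best = dist[rows, order[:, 0]]
    second = dist[rows, order[:, 1]]
    keep = best < ratio * second
    return [(int(i), int(order[i, 0])) for i in np.flatnonzero(keep)]

des2 = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
des1 = np.array([
    [0.5, 0.0],  # clearly closest to des2[0]
    [5.0, 0.0],  # equally far from des2[0] and des2[1]: ambiguous
    [0.2, 9.5],  # clearly closest to des2[2]
])

matches = ratio_test_matches(des1, des2)
print(matches)  # [(0, 0), (2, 2)]
```

The ambiguous descriptor is dropped because its best match is no better than the runner-up. Lowering the ratio makes the filter stricter, and raising it keeps more (and noisier) matches.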

Find the best homography

The goal of this step is to pick the best transformation from the matched points, using RANSAC so that the estimate is not thrown off by wrong matches.

import cv2
import numpy as np

def compute_homography(kp_query, kp_fit, matches):
    MIN_MATCH_COUNT = 10
    if len(matches) >= MIN_MATCH_COUNT:
      pts_query = np.float32(
          [ kp_query[m.queryIdx].pt for m in matches ]
      ).reshape(-1,1,2)
      pts_fit = np.float32(
          [ kp_fit[m.trainIdx].pt for m in matches ]
      ).reshape(-1,1,2)

      # H maps points of the fit image into the query image's frame;
      # 5.0 is the RANSAC reprojection threshold in pixels
      H, status = cv2.findHomography(pts_fit, pts_query, cv2.RANSAC,5.0)

      return (matches, H, status)
    else:
      print("Minimum match count not satisfied")
      return None
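For intuition about what `cv2.findHomography(..., cv2.RANSAC, 5.0)` is doing, here is a simplified NumPy sketch of RANSAC around a Direct Linear Transform (DLT) fit. It is not OpenCV's implementation: the function names, the iteration count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def project(H, pts):
    """Apply homography H to an (N, 2) array of points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def _normalize(pts):
    """Hartley normalization: zero mean, average distance sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / (np.linalg.norm(pts - mean, axis=1).mean() + 1e-12)
    T = np.array([[scale, 0, -scale * mean[0]],
                  [0, scale, -scale * mean[1]],
                  [0, 0, 1]])
    return (pts - mean) * scale, T

def fit_homography(src, dst):
    """DLT: find H (up to scale) such that dst ~ H @ src, from >= 4 pairs."""
    s, Ts = _normalize(src)
    d, Td = _normalize(dst)
    rows = []
    for (x, y), (u, v) in zip(s, d):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    # The right singular vector of the smallest singular value solves A h = 0
    return np.linalg.inv(Td) @ vt[-1].reshape(3, 3) @ Ts

def ransac_homography(src, dst, thresh=5.0, n_iters=500, seed=0):
    """Keep the 4-point model with the most inliers, then refit on those inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        with np.errstate(all="ignore"):  # degenerate samples may divide by zero
            err = np.linalg.norm(project(H, src) - dst, axis=1)
            inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 4:
        raise ValueError("RANSAC did not find a usable model")
    H = fit_homography(src[best_inliers], dst[best_inliers])
    return H / H[2, 2], best_inliers

# Synthetic check: 30 correct correspondences plus 10 gross outliers
rng = np.random.default_rng(1)
H_true = np.array([[1.0, 0.05, 40.0], [-0.03, 1.0, 10.0], [1e-4, 2e-5, 1.0]])
src = rng.uniform(0, 500, size=(40, 2))
dst = project(H_true, src)
dst[:10] += rng.uniform(50, 100, size=(10, 2))  # corrupt the first 10 matches

H_est, inliers = ransac_homography(src, dst)
print(int(inliers.sum()), "inliers out of", len(src))
```

The corrupted matches end up outside the inlier set, so they have no influence on the final fit. That is the reason for using RANSAC here rather than a plain least-squares fit over all matches.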

Blending

Finally, to get a smoother transition where the two images meet, I'll use weighted blending.

def create_mask(query_img, fit_img, version):
  SMOOTHING_WINDOW_PERCENT = 0.1
  height_query_photo = query_img.shape[0]
  width_query_photo = query_img.shape[1]
  width_fit_photo = fit_img.shape[1]
  height_panorama = height_query_photo
  width_panorama = width_query_photo + width_fit_photo

  lowest_width = min(width_query_photo, width_fit_photo)
  smoothing_window_size = max(100, min(SMOOTHING_WINDOW_PERCENT * lowest_width, 1000))


  offset = int(smoothing_window_size / 2)
  barrier = query_img.shape[1] - int(smoothing_window_size / 2)
  mask = np.zeros((height_panorama, width_panorama))

  if version == "left_image":
      mask[:, barrier - offset : barrier + offset] = np.tile(
          np.linspace(1, 0, 2 * offset).T, (height_panorama, 1)
      )
      mask[:, : barrier - offset] = 1
  else:
      mask[:, barrier - offset : barrier + offset] = np.tile(
          np.linspace(0, 1, 2 * offset).T, (height_panorama, 1)
      )
      mask[:, barrier + offset :] = 1

  return cv2.merge([mask, mask, mask])

def blending(query_img, fit_img, homography_matrix):
  height_img1 = query_img.shape[0]
  width_img1 = query_img.shape[1]
  width_img2 = fit_img.shape[1]
  height_panorama = height_img1
  width_panorama = width_img1 + width_img2

  panorama1 = np.zeros((height_panorama, width_panorama, 3))
  mask1 = create_mask(query_img, fit_img, version="left_image")
  panorama1[0 : query_img.shape[0], 0 : query_img.shape[1], :] = query_img
  panorama1 *= mask1
  mask2 = create_mask(query_img, fit_img, version="right_image")
  panorama2 = (
      cv2.warpPerspective(
          fit_img, homography_matrix, (width_panorama, height_panorama)
      )
      * mask2
  )
  result = panorama1 + panorama2

  # remove extra blackspace
  rows, cols = np.where(result[:, :, 0] != 0)
  min_row, max_row = min(rows), max(rows) + 1
  min_col, max_col = min(cols), max(cols) + 1

  final_result = result[min_row:max_row, min_col:max_col, :]

  return final_result
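The weighting idea behind `create_mask` is easier to see in one dimension. The sketch below builds the same kind of left and right ramps for a single image row and blends two constant "images". The function name `ramp_masks` and the toy sizes are hypothetical, chosen only to keep the example small.

```python
import numpy as np

def ramp_masks(width, barrier, offset):
    """1-D version of the two masks: left fades 1 -> 0, right fades 0 -> 1."""
    left = np.zeros(width)
    right = np.zeros(width)
    left[:barrier - offset] = 1
    left[barrier - offset:barrier + offset] = np.linspace(1, 0, 2 * offset)
    right[barrier - offset:barrier + offset] = np.linspace(0, 1, 2 * offset)
    right[barrier + offset:] = 1
    return left, right

left, right = ramp_masks(width=20, barrier=10, offset=4)

row_a = np.full(20, 100.0)  # one row of the left image (constant brightness)
row_b = np.full(20, 200.0)  # one row of the warped right image

blended = left * row_a + right * row_b
print(np.round(blended))
```

The two weights sum to 1 at every column, so brightness is preserved and the output moves gradually from 100 to 200 across the ramp instead of jumping at a hard seam.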

Let's try stitching these two images:

Here is the result:

Putting it all together

import cv2
import numpy as np

class ImageStitching:
  def __init__(self,
               flann_index,
               trees,
               ratio_test,
               min_match_count,
               smoothing_window_percent
               ):
    self.ratio_test = ratio_test
    self.min_match_count = min_match_count
    self.keypoints_matcher = cv2.FlannBasedMatcher(
        dict(algorithm = flann_index, trees = trees),
        dict(checks=50)
    )
    self.keypoints_descriptor = cv2.SIFT_create()
    self.smoothing_window_percent = smoothing_window_percent

  def find_keypoints(self, img):
    kp, des = self.keypoints_descriptor.detectAndCompute(img,None)

    return kp, des

  def match_keypoints(self, ds1, ds2):
    matches = self.keypoints_matcher.knnMatch(ds1,ds2,k=2)
    good = []
    for m,n in matches:
      if m.distance < self.ratio_test*n.distance:
          good.append(m)

    return good

  def compute_homography(self, kp_query, kp_fit, matches):
    if len(matches) >= self.min_match_count:
      pts_query = np.float32(
          [ kp_query[m.queryIdx].pt for m in matches ]
      ).reshape(-1,1,2)
      pts_fit = np.float32(
          [ kp_fit[m.trainIdx].pt for m in matches ]
      ).reshape(-1,1,2)

      H, status = cv2.findHomography(pts_fit, pts_query, cv2.RANSAC,5.0)

      return (matches, H, status)
    else:
      print("Minimum match count not satisfied, cannot compute homography")
      return None

  def create_mask(self, query_img, fit_img, version):
    height_query_photo = query_img.shape[0]
    width_query_photo = query_img.shape[1]
    width_fit_photo = fit_img.shape[1]
    height_panorama = height_query_photo
    width_panorama = width_query_photo + width_fit_photo

    lowest_width = min(width_query_photo, width_fit_photo)
    smoothing_window_size = max(100, min(self.smoothing_window_percent * lowest_width, 1000))

    offset = int(smoothing_window_size / 2)
    barrier = query_img.shape[1] - int(smoothing_window_size / 2)
    mask = np.zeros((height_panorama, width_panorama))

    if version == "left_image":
        mask[:, barrier - offset : barrier + offset] = np.tile(
            np.linspace(1, 0, 2 * offset).T, (height_panorama, 1)
        )
        mask[:, : barrier - offset] = 1
    else:
        mask[:, barrier - offset : barrier + offset] = np.tile(
            np.linspace(0, 1, 2 * offset).T, (height_panorama, 1)
        )
        mask[:, barrier + offset :] = 1

    return cv2.merge([mask, mask, mask])

  def blending(self, query_img, fit_img, homography_matrix):
    height_img1 = query_img.shape[0]
    width_img1 = query_img.shape[1]
    width_img2 = fit_img.shape[1]
    height_panorama = height_img1
    width_panorama = width_img1 + width_img2

    panorama1 = np.zeros((height_panorama, width_panorama, 3))
    mask1 = self.create_mask(query_img, fit_img, version="left_image")
    panorama1[0 : query_img.shape[0], 0 : query_img.shape[1], :] = query_img
    panorama1 *= mask1
    mask2 = self.create_mask(query_img, fit_img, version="right_image")
    panorama2 = (
        cv2.warpPerspective(
            fit_img, homography_matrix, (width_panorama, height_panorama)
        )
        * mask2
    )
    result = panorama1 + panorama2

    # remove extra blackspace
    rows, cols = np.where(result[:, :, 0] != 0)
    min_row, max_row = min(rows), max(rows) + 1
    min_col, max_col = min(cols), max(cols) + 1

    final_result = result[min_row:max_row, min_col:max_col, :]

    return final_result

  def __call__(self, query_img, fit_img):
    kp1, ds1 = self.find_keypoints(query_img)
    kp2, ds2 = self.find_keypoints(fit_img)
    good_kp = self.match_keypoints(ds1, ds2)
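    result = self.compute_homography(kp1, kp2, good_kp)
    # compute_homography returns None when there are too few matches,
    # and findHomography can return H = None when estimation fails
    if result is None or result[1] is None:
      raise ValueError("Could not estimate a homography between the two images")
    matches, H, status = result
    final_res = self.blending(query_img, fit_img, H)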

    return final_res

Usage with two images:

class ImageStitchingConfig:
  TREES = 5
  FLANN_INDEX_KDTREE = 1
  RATIO_TEST = 0.7
  MIN_MATCH_COUNT = 10
  SMOOTHING_WINDOW_PERCENT = 0.1 # fraction of the narrower image's width, in [0.00, 1.00]; tune as needed
  
stitch = ImageStitching(
    ImageStitchingConfig.FLANN_INDEX_KDTREE,
    ImageStitchingConfig.TREES,
    ImageStitchingConfig.RATIO_TEST,
    ImageStitchingConfig.MIN_MATCH_COUNT,
    ImageStitchingConfig.SMOOTHING_WINDOW_PERCENT
)

final_res = stitch(img1, img2)

Usage when stitching more than two images:

def stitch_images(list_img):
  if len(list_img) < 2:
    print("At least 2 images are required.")
    return None

  # Stitch left to right: the running panorama is the query image for the next one
  final_img = cv2.cvtColor(cv2.imread(list_img[0]), cv2.COLOR_BGR2RGB)
  for path in list_img[1:]:
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    # blending works in float, so convert back to uint8 after each step
    final_img = stitch(final_img, img).astype('uint8')
  return final_img

list_img = ["/content/medium00.jpg",
            "/content/medium01.jpg",
            "/content/medium02.jpg",
            "/content/medium03.jpg",]
final_img = stitch_images(list_img)