Đã đăng vào thg 11 2, 2022 9:16 SA 29 phút đọc

Giải ngố 🤪 Buffer trong Node.js là gì? 🤪 (Series: Bí kíp Javascript - PHẦN 22) - (Song Ngữ: VN - EN - JP)

Bạn có luôn bối rối, giống như mình, bất cứ khi nào bắt gặp các từ như Buffer, Stream và binary data trong Node.js không? Cảm giác đó có khiến mình không hiểu chúng, nghĩ rằng chúng không dành cho mình mà chỉ dành cho các chuyên gia Node.js và các package developers mới hiểu được?

Thật vậy, những từ đó có thể rất đáng sợ, đặc biệt là khi bạn bắt đầu với Node.js mà không qua bất kỳ trường lớp chính quy nào. Nhưng kể cả khi có cơ hội tiếp cận chúng trên giảng đường đại học, mình cũng không nắm được nhiều, vì lỡ đánh rơi bút trong giờ học binary data, ngẩng đầu lên thì thầy đã viết đầy 3 bảng.

Đáng buồn thay, nhiều hướng dẫn và sách thường sẽ chuyển thẳng sang hướng dẫn cách phát triển ứng dụng với các gói Node.js mà không cho bạn hiểu các tính năng cốt lõi của Node.js và tại sao chúng tồn tại. Và một số sẽ nói với bạn một cách trơ trẽn rằng bạn không cần phải hiểu chúng vì bạn có thể không bao giờ làm việc trực tiếp với chúng.

Chà, đúng là, bạn có thể không bao giờ làm việc trực tiếp với chúng nếu bạn vẫn là Node.js developer bình thường.

Tuy nhiên, nếu những điều bí ẩn ấy khiến bạn thực sự tò mò và bạn muốn nâng hiểu biết về Node.js của mình lên một tầm cao mới, thì bạn sẽ muốn tìm hiểu sâu hơn các tính năng cốt lõi của Node.js, như Buffer chẳng hạn. Đó chính xác là lý do mình viết bài này: giúp làm sáng tỏ một số tính năng và đưa việc học Node.js của chúng ta lên một cấp độ mới.

Khi giới thiệu về Buffer , tài liệu chính thức của Node.js có giới thiệu như sau…

…Cơ chế để đọc hoặc thao tác với các binary data stream. Lớp Buffer như một phần của API Node.js để giúp nó có thể tương tác với các stream octet trong contexts của những thứ như stream TCP và hoạt động của file system....

Hmmm, trừ khi bạn có kiến thức trước về tất cả các từ trong các câu trên, chúng có thể chỉ là một loạt các từ ngữ chuyên ngành. Hãy cố gắng đơn giản hóa điều đó một chút bằng cách diễn đạt lại nó, để có thể tập trung rõ ràng và không bị phân tâm bởi những từ ngữ chuyên ngành đó. Có thể giải thích đại khái như sau:

Lớp Buffer được giới thiệu như một phần của API Node.js để giúp nó có thể thao tác hoặc tương tác với các binary data stream.

Bây giờ điều đó đơn giản hơn phải không? Nhưng… Buffer, stream, binary data… những từ này cũng không dễ mà. Ok, chúng ta hãy lần lượt giải quyết những từ này, từ cuối cùng đến đầu tiên.

Binary data (Dữ liệu nhị phân) là gì?

Bạn có thể đã biết rằng máy tính lưu trữ và biểu diễn dữ liệu dưới dạng nhị phân (binary). Nhị phân chỉ đơn giản là một dãy các số 1 và 0. Ví dụ: sau đây là năm mã nhị phân khác nhau (năm dãy số 1 và 0 khác nhau):

10, 01, 001, 1110, 00101011

Mỗi chữ số 1 hoặc 0 trong một số nhị phân được gọi là Bit, viết tắt của Binary digIT.

Để lưu trữ hoặc biểu diễn một phần dữ liệu, máy tính cần chuyển đổi dữ liệu đó sang dạng biểu diễn nhị phân của nó. Ví dụ, để lưu trữ số 12, máy tính cần chuyển 12 thành biểu diễn nhị phân của nó 1100.

Làm thế nào để một máy tính biết cách thực hiện chuyển đổi này? Thực ra nó chỉ là một phép toán thuần túy và giống như số nhị phân đơn giản mà chúng ta đã học trong toán phổ thông - biểu thị một số với cơ số 2. Sử dụng máy tính Casio cũng thực hiện được phép tính này.

Nhưng số không phải là kiểu dữ liệu duy nhất mà chúng ta có thể chuyển thành số nhị phân. String, hình ảnh và thậm chí cả video cũng có thể chuyển thành mã nhị phân. Máy tính biết cách biểu diễn tất cả các loại dữ liệu đó dưới dạng mã nhị phân. Ví dụ, làm thế nào máy tính biểu diễn string “L” bằng mã nhị phân? Để lưu trữ bất kỳ ký tự nào thành mã nhị phân, đầu tiên máy tính sẽ chuyển đổi ký tự đó thành một số, sau đó chuyển đổi số đó thành biểu diễn nhị phân của nó. Vì vậy, đối với string “L”, trước tiên máy tính sẽ chuyển đổi L thành một số đại diện cho L.

Mở Console của trình duyệt, dán đoạn code "L".charCodeAt(0) rồi nhấn Enter. Bạn đã thấy gì? Là số 76 phải không? Đó là đại diện số, hay còn gọi là Character Code hoặc Code Point, của ký tự L. Nhưng làm thế nào một máy tính biết chính xác số nào sẽ đại diện cho mỗi ký tự? Làm thế nào nó biết sử dụng số 76 để biểu diễn L?

"L".charCodeAt(0)
76

Character Sets

Character Sets dùng để xác định các quy tắc về số nào đại diện cho ký tự nào. Chúng ta có các định nghĩa khác nhau về các quy tắc này, những quy tắc rất phổ biến bao gồm Unicode và ASCII. JavaScript thực sự hoạt động tốt với Character Sets Unicode. Trên thực tế, chính Unicode trong trình duyệt của bạn đã trả lời 76 là đại diện cho L.

Vì vậy, chúng ta đã biết cách máy tính biểu diễn các ký tự dưới dạng số. Bây giờ, đến lượt máy tính chuyển số 76 thành biểu diễn nhị phân. Bạn có thể nghĩ rằng chỉ cần chuyển đổi 76 sang hệ cơ số 2. Không, thế thì quá đơn giản. Tiếp tục phần tiếp theo nhé!

Character Encoding

Cũng giống như các quy tắc xác định số nào sẽ đại diện cho một ký tự, cũng có các quy tắc xác định cách số đó nên được biểu diễn bằng mã nhị phân. Cụ thể là dùng bao nhiêu bit để biểu diễn số. Đây được gọi là Character Encoding .

Một trong những Character Encoding phổ biến là UTF-8. UTF-8 quy định rằng các ký tự phải được mã hóa theo byte. Một byte là một tập hợp 8 bit, tức tám số 1 và 0. UTF-8 dùng từ một đến bốn byte cho mỗi ký tự: các ký tự trong bảng ASCII (như L) chỉ cần một byte, còn các ký tự khác (như chữ có dấu tiếng Việt hay emoji) cần nhiều byte hơn. (Tham khảo thêm một số các Character Encoding khác tại đây)

Để hiểu điều này, như chúng ta đã đề cập trước đó, biểu diễn nhị phân của số 12 là 1100. Để lưu 12 trong một byte, máy tính cần thêm các bit 0 vào bên trái của biểu diễn cơ số 2 đó cho đủ tám bit. Vì vậy, 12 được lưu trữ dưới dạng 00001100.

Do đó, khi dùng Character Encoding UTF-8, 76 được lưu là 01001100, thay vì 1001100 ở dạng cơ số 2 của nó.

Yeah, chúng ta đã tìm hiểu cách máy tính lưu trữ các string bằng mã nhị phân. Tương tự như vậy, máy tính cũng có các quy tắc cụ thể để chuyển đổi và lưu trữ các loại dữ liệu khác, như hình ảnh và video, dưới dạng nhị phân. Tóm lại, máy tính lưu trữ tất cả các kiểu dữ liệu dưới dạng nhị phân, và dữ liệu đó được gọi là binary data.

Nếu bạn cực kỳ quan tâm đến tính thực tế của Character Encoding, bạn có thể thích phần giới thiệu đơn giản và chi tiết này .

Bây giờ chúng ta đã hiểu binary data là gì, nhưng binary data stream từ phần giới thiệu của chúng ta về Buffer là gì?

Stream

Stream trong Node.js chỉ đơn giản là một chuỗi dữ liệu được di chuyển từ điểm này sang điểm khác trong một khoảng thời gian. Hiểu đơn giản hơn là, bạn có một lượng lớn dữ liệu cần xử lý, nhưng bạn không cần phải đợi tất cả dữ liệu có sẵn rồi mới bắt đầu xử lý. (Dữ liệu đến chừng nào thì chúng ta xử lý dần chừng đó.)

Về cơ bản, dữ liệu lớn này được chia nhỏ và gửi thành nhiều phần (Chunk). Vì vậy, từ định nghĩa ban đầu của Buffer (“binary data stream… trong contexts của… file system”), điều này đơn giản có nghĩa là binary data được di chuyển trong file system. Ví dụ: di chuyển văn bản được lưu trữ trong file 1.txt sang file 2.txt.

Nhưng chính xác thì Buffer giúp chúng ta tương tác hoặc thao tác với binary data như thế nào trong khi truyền dữ liệu? Chính xác thì Buffer này là gì?

Buffer

Chúng ta đã thấy rằng stream là sự di chuyển của dữ liệu từ điểm này sang điểm khác, nhưng chúng được di chuyển chính xác như thế nào?

Thông thường, dữ liệu được di chuyển nhằm mục đích xử lý, hoặc đọc nó rồi làm gì đó dựa trên nó. Nhưng có một lượng dữ liệu tối thiểu và tối đa mà một quá trình có thể tiếp nhận trong một khoảng thời gian. Vì vậy, nếu dữ liệu đến nhanh hơn tốc độ xử lý, thì dữ liệu thừa cần phải đợi ở đâu đó cho đến khi đến lượt nó được xử lý.

Mặt khác, nếu quá trình xử lý nhanh hơn tốc độ dữ liệu đến, thì số ít dữ liệu đến sớm (chưa đủ cho một lần xử lý, ví dụ khi chúng ta xử lý 10 ký tự một lần) cần phải đợi cho đến khi có đủ một lượng dữ liệu nhất định rồi mới được gửi đi để xử lý.

“Khu vực chờ đợi” này là Buffer! Đó là một vị trí vật lý nhỏ trong máy tính của bạn, thường là trong RAM, nơi dữ liệu tạm thời được thu thập, chờ đợi và cuối cùng được gửi đi để xử lý trong quá trình streaming.

Ví dụ trực quan hơn: Chúng ta có thể coi toàn bộ stream và quá trình Buffer như một trạm xe buýt. Ở một số bến xe, xe buýt không được phép khởi hành cho đến khi có một lượng khách nhất định hoặc đến một giờ khởi hành cụ thể. Ngoài ra, hành khách có thể đến vào các thời điểm khác nhau với tốc độ khác nhau. Cả hành khách và bến xe đều không kiểm soát được việc hành khách sẽ đến bến vào lúc nào và bao nhiều người. Buffer chính là trạm chờ xe buýt đó. (Cơm mưa rơi bên hiên hè văng chúng mình chung đường, ta bên nhau tình cơ trú mưa bên thềm (khả năng cao là nhà chờ xe buýt ) ... Trú Mưa HKT)

Trong mọi trường hợp, hành khách đến sớm hơn sẽ phải đợi cho đến khi xe xuất phát. Trong khi những hành khách đến khi xe buýt đã đến hoặc khi xe buýt đã khởi hành cần phải đợi chuyến xe tiếp theo.

Trong bất kỳ trường hợp nào có thể xảy ra, luôn có một nơi để chờ đợi. Đó là Buffer! Node.js không thể kiểm soát tốc độ hoặc thời gian dữ liệu đến, tốc độ của stream. Nó chỉ có thể quyết định thời điểm gửi dữ liệu. Nếu chưa đến lúc, Node.js sẽ đặt chúng vào Buffer - “vùng chờ” - một vị trí nhỏ trong RAM, cho đến khi gửi chúng ra ngoài để xử lý.

Một ví dụ điển hình khác mà bạn có thể thấy Buffer đang hoạt động là khi bạn xem các video trực tuyến. Nếu kết nối internet của bạn đủ nhanh, tốc độ của stream sẽ đủ nhanh để lấp đầy Buffer ngay lập tức và gửi nó ra ngoài để xử lý, sau đó điền vào một cái khác và gửi nó đi, rồi cái khác, và cái khác… cho đến khi stream kết thúc.

Nhưng nếu kết nối của bạn chậm, sau khi xử lý dữ liệu đầu tiên đến, trình phát video sẽ hiển thị biểu tượng đang tải hoặc hiển thị văn bản “Buffer”, có nghĩa là thu thập thêm dữ liệu hoặc chờ thêm dữ liệu đến. Và khi Buffer được lấp đầy và xử lý, trình phát sẽ hiển thị dữ liệu video. Trong khi phát, dữ liệu mới sẽ tiếp tục đến và chờ trong Buffer.

Nếu trình phát đã xử lý xong hoặc phát dữ liệu trước đó và Buffer vẫn chưa được lấp đầy, văn bản “Buffer” sẽ được hiển thị lại, thông báo rằng bạn cần chờ thu thập thêm dữ liệu để xử lý.

Đó là Buffer!

Từ định nghĩa ban đầu về Buffer, nó cho thấy rằng khi ở trong Buffer, chúng ta có thể thao tác hoặc tương tác với binary data đang được truyền trực tiếp (stream). Ngoài ra, chúng ta cũng có thể tương tác với raw binary data - dạng dữ liệu thô này. Buffer trong Node.js cũng cung cấp một danh sách về những gì có thể làm được. Hãy xem một số trong số chúng.

Tương tác với Buffer

Thậm chí bạn còn có thể tạo Buffer của riêng bạn! Thật thú vị phải không? Thay vì phải ngồi chờ các stream tạo cho chúng ta, hãy tự tạo một cái! (Và bạn cũng có thể tưởng tượng đây chính là Buffer mà ta sẽ nhận được trong quá trình stream như đã nói ở trên)

Tùy thuộc vào những gì bạn muốn đạt được, có những cách khác nhau để tạo một vùng Buffer. Ví dụ

// Tạo một buffer có kích thước 10 byte, được điền sẵn các số 0.
// Buffer này chỉ có thể chứa 10 byte.
const buf1 = Buffer.alloc(10);

// Tạo buffer với nội dung tùy chọn
const buf2 = Buffer.from("hello buffer");

Khi Buffer của bạn đã được tạo, bạn có thể bắt đầu tương tác với nó

// Kiểm tra cấu trúc của một Buffer
buf1.toJSON(); // {type: 'Buffer', data: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]} (toàn số 0)
buf2.toJSON(); // {type: 'Buffer', data: [104, 101, 108, 108, 111, 32, 98, 117, 102, 102, 101, 114]}
// Hàm toJSON() hiển thị dữ liệu dưới dạng giá trị từng byte (với ký tự ASCII, giá trị này trùng với Unicode Code Point)

// Kiểm tra kích thước của một Buffer
buf1.length; // 10
buf2.length; // 12. Tự động gán dựa trên nội dung ban đầu khi được tạo.

// Ghi vào Buffer
buf1.write("Buffer really rocks!"); // trả về 10, là số byte đã ghi được

// Decode một buffer
buf1.toString(); // 'Buffer rea'
// Nó không chứa được toàn bộ string ở trên, bởi vì buf1 được tạo chỉ chứa 10 byte, nó không thể chứa phần còn lại của string

Có rất nhiều tương tác khác mà chúng ta có thể thực hiện với một Buffer. Hãy truy cập tài liệu chính thức để tìm hiểu thêm các hàm này.

Cuối cùng, mình sẽ để lại cho bạn một thử thách nhỏ: hãy đọc qua source code zlib.js, một trong những thư viện cốt lõi của Node.js, để xem cách nó tận dụng sức mạnh của Buffer để thao tác các binary data stream, mà ở đây là các tệp đã được gzip. Và bạn sẽ hiểu tại sao nó lại là một trong những thứ quan trọng nhất khi chúng ta nhắc tới Node.js:

Single-Threaded but Scalable
Quick Code Execution
No Buffering
MIT License
Event-driven
Asynchronous APIs
....

English version

Are you ever confused, like me, whenever you come across words like Buffer, Stream, and binary data in Node.js? That feeling might make you think that they are not meant for you and are only for Node.js experts and package developers to understand.

Indeed, those words can be intimidating, especially when you start with Node.js without any formal training. Even if you did encounter them in college, you might not have grasped much: I dropped my pen during the lecture on binary data, and by the time I looked up, the teacher had already filled three boards.

Sadly, many tutorials and books often skip explaining the core features of Node.js and why they exist, jumping straight into developing applications with Node.js packages. And some will bluntly tell you that you don't need to understand them because you may never work directly with them.

Well, it's true that you may never work directly with them if you're just a regular Node.js developer.

However, if these mysteries truly pique your curiosity and you want to take your understanding of Node.js to a new level, you will want to dig deeper into its core features, such as Buffer. That is precisely why I wrote this article: to help demystify some of those features and take our Node.js learning up a level.

When introducing the Buffer, the official Node.js documentation describes it as follows:

...mechanism for reading or manipulating streams of binary data. The Buffer class was introduced as part of the Node.js API to make it possible to interact with octet streams in the context of things like TCP streams and file system operations...

Hmm, unless you have prior knowledge of all the terms in those sentences, they might just be a bunch of technical jargon. Let's try to simplify that a bit by rephrasing it, so we can have a clear focus without getting distracted by those technical terms. We can explain it roughly like this:

The Buffer class is introduced as part of the Node.js API to help it read or interact with binary data streams.

Now that sounds simpler, doesn't it? But... Buffer, streams, binary data... those words can still be confusing. Okay, let's tackle them one at a time, from the last to the first.

What is Binary Data?

You might already know that computers store and represent data in binary. Binary is simply a sequence of 1s and 0s. For example, here are five different binary codes (five different sequences of 1s and 0s):

10, 01, 001, 1110, 00101011

Each digit in a binary number, a 1 or a 0, is called a bit, which is short for Binary digIT.

To store or represent a piece of data, a computer needs to convert that data into its binary representation. For example, to store the number 12, a computer needs to convert 12 into its binary representation 1100.
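
As a quick check of that conversion, here is a minimal sketch using only built-in JavaScript number methods. The variable names are just for illustration.

```javascript
// Convert a decimal number to its base-2 digits and back again.
const decimal = 12;
const bits = decimal.toString(2);    // "1100"
const roundTrip = parseInt(bits, 2); // 12

// The same value written out by place value: 12 = 1*8 + 1*4 + 0*2 + 0*1
const manual = 1 * 2 ** 3 + 1 * 2 ** 2 + 0 * 2 ** 1 + 0 * 2 ** 0;

console.log(bits, roundTrip, manual); // 1100 12 12
```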

How does a computer know how to perform this conversion? It's actually a pure mathematical operation and similar to the simple binary numbers we learned in elementary mathematics - representing a number in base 2. Even a Casio calculator can perform this operation.

But numbers are not the only data type that can be converted to binary. Strings, images, and even videos can also be converted to binary code. Computers know how to represent all those types of data in binary code. For example, how does a computer represent the string "L" in binary code? To store any character as binary code, the computer first converts that character into a number, then converts that number into its binary representation. So, for the string "L", the computer first converts L into a number representing L.

Open your browser's developer console and paste the following code snippet, then press enter: "L".charCodeAt(0). What do you see? It's the number 76, right? That's the numeric representation or Character Code or Code Point of the character L. But how does a computer know exactly which number represents each character? How does it know to use the number 76 to represent L?

"L".charCodeAt(0)
76

Character Sets

Character sets are used to define the rules of which number represents which character. We have different definitions of these rules, and some commonly used rules include Unicode and ASCII. JavaScript actually works well with the Unicode character set. In fact, it's the Unicode in your browser that answered 76 as the representation for L.
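
To see the character-to-number mapping in both directions, here is a small sketch using the standard `codePointAt` and `String.fromCodePoint` functions. The euro sign is only an illustrative choice of a character that exists in Unicode but not in ASCII.

```javascript
// Character -> number, and number -> character.
const codePoint = "L".codePointAt(0);        // 76, the same value in ASCII and Unicode
const backToChar = String.fromCodePoint(76); // "L"

// A character outside ASCII's 0..127 range still has a Unicode code point.
const euro = "€".codePointAt(0);             // 8364 (U+20AC)

console.log(codePoint, backToChar, euro); // 76 L 8364
```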

So, we've learned how computers represent characters as numbers. Now, it's the computer's turn to turn the number 76 into its binary representation. You might think that it's simply a matter of converting 76 to the base-2 number system. Well, it's not that simple. Let's continue to the next part!

Character Encoding

Just as there are rules to determine which number represents a character, there are also rules to determine how that number should be represented in binary code. Specifically, how many bits should be used to represent the number. This is called Character Encoding.

One of the most widely used character encodings is UTF-8. UTF-8 says that characters should be encoded in bytes. A byte is a set of 8 bits, that is, eight 1s and 0s. UTF-8 uses between one and four bytes per character: characters in the ASCII range (such as L) take a single byte, while others (such as accented letters or emoji) take more. (Refer here for more character encodings)

To understand this, as we mentioned earlier, the binary representation of the number 12 is 1100. To store 12 in one byte, the computer adds 0 bits to the left of that base-2 representation until there are eight bits. So, 12 is stored as 00001100.

Therefore, 76 is stored as 01001100 when using the UTF-8 character encoding, instead of its bare base-2 representation 1001100.
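
The byte layout can be inspected directly in Node.js. This is a minimal sketch: `toBits` is a hypothetical helper written for this example, and the euro sign is included only to show that UTF-8 is a variable-width encoding, so characters outside ASCII occupy more than one byte.

```javascript
// Hypothetical helper: render each byte of a Buffer as 8 binary digits.
function toBits(buffer) {
  return [...buffer].map((byte) => byte.toString(2).padStart(8, "0"));
}

const asciiBytes = toBits(Buffer.from("L", "utf8")); // one byte
const euroBytes = toBits(Buffer.from("€", "utf8"));  // three bytes

console.log(asciiBytes); // [ '01001100' ]
console.log(euroBytes);  // [ '11100010', '10000010', '10101100' ]
```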

Yeah, we've learned how computers store strings as binary code. Similarly, computers have specific rules for converting other types of data, such as images and videos, and storing them in binary form. In summary, computers store all types of data in binary, and that is what we call binary data.

If you're really interested in the practicality of Character Encoding, you might like this simple and detailed introduction.

Now that we understand what binary data is, what is the "binary data stream" mentioned in our introduction to Buffer?

Stream

A stream in Node.js is simply a sequence of data that is moved from one point to another over a period of time. To put it more simply, you have a large amount of data to process, but you don't have to wait for all of it to be available before you start processing. (We process the data gradually, as much as has arrived so far.)

Essentially, this large data is divided and sent in multiple parts called chunks. So, from the initial definition of Buffer ("streams of binary data... in the context of... file system operations"), this simply means binary data being moved within the file system. For example, moving the text stored in file 1.txt to file 2.txt.
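
To make the idea of chunks concrete, here is a simplified, synchronous imitation. Real Node.js streams (for example those created with `fs.createReadStream`) deliver their chunks asynchronously, so this is only a sketch of the principle. `toChunks`, the sample text, and the chunk size of 5 are all assumptions made for the example.

```javascript
// Hypothetical helper: yield successive slices of at most `size` bytes.
function* toChunks(buffer, size) {
  for (let offset = 0; offset < buffer.length; offset += size) {
    yield buffer.subarray(offset, offset + size);
  }
}

const source = Buffer.from("contents of 1.txt"); // stands in for a large file
const received = [];
for (const chunk of toChunks(source, 5)) {
  // Each chunk can be handled as soon as it is available.
  received.push(chunk.toString());
}

console.log(received); // [ 'conte', 'nts o', 'f 1.t', 'xt' ]
```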

But precisely how does the data move in a stream? And what does this Buffer mean?

Buffer

We have seen that a stream is the movement of data from one point to another, but how exactly is it moved?

Typically, the movement of data is with the purpose of processing or reading it and doing something based on it. But there is a minimum and maximum amount of data that a process can take over time. So, if the rate of incoming data is faster than the rate of data processing, then the excess data needs to wait somewhere until it's its turn to be processed.

On the other hand, if the data is processed faster than it arrives, the little data that arrives early (not yet enough for one processing cycle, say when we process 10 characters at a time) needs to wait until a certain amount of data has accumulated before being sent out for processing.

This "waiting area" is the Buffer! It is a small physical location in your computer, usually in the RAM, where the data is temporarily collected, waiting, and eventually sent out for processing in the streaming process.
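
The waiting area can be sketched in a few lines. This is an illustration of the concept under simple assumptions, not how Node.js streams are implemented internally. `WaitingArea` is a hypothetical class, and the batch size of 10 mirrors the "10 characters at a time" example above.

```javascript
// Hypothetical class: collects incoming bytes and releases them in fixed-size batches.
class WaitingArea {
  constructor(batchSize, onBatch) {
    this.batchSize = batchSize;
    this.onBatch = onBatch;
    this.pending = Buffer.alloc(0);
  }

  // Called whenever a chunk arrives, whatever its size.
  push(chunk) {
    this.pending = Buffer.concat([this.pending, chunk]);
    // Release as many full batches as are available; the rest keeps waiting.
    while (this.pending.length >= this.batchSize) {
      this.onBatch(this.pending.subarray(0, this.batchSize));
      this.pending = this.pending.subarray(this.batchSize);
    }
  }
}

const processed = [];
const area = new WaitingArea(10, (batch) => processed.push(batch.toString()));

area.push(Buffer.from("Buffer"));  // 6 bytes: not enough yet, so they wait
area.push(Buffer.from(" really")); // 13 bytes waiting: the first 10 are released
area.push(Buffer.from(" rocks!")); // 3 + 7 = 10 bytes: released

console.log(processed); // [ 'Buffer rea', 'lly rocks!' ]
```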

A more visual example: we can think of the whole stream and Buffer process as a bus station. At some bus stations, the bus is not allowed to depart until there is a certain number of passengers or until a specific departure time. In addition, passengers can arrive at different times and at different rates. Neither the passengers nor the bus station can control when passengers will arrive or how many there will be. The Buffer is the waiting area of that bus station.

In any case, passengers who arrive early have to wait until the bus departs, while those who arrive when the bus is already full or has already left have to wait for the next one.

In any possible case, there is always a place to wait. That's the Buffer! Node.js cannot control the speed or timing of the incoming data, the speed of the stream. It can only decide when to send the data. If it's not time yet, Node.js will place them in the Buffer - the "waiting area" - a small location in the RAM, until they are sent out for processing.

Another typical example where you can see Buffer in action is when you watch online videos. If your internet connection is fast enough, the speed of the stream will be fast enough to fill the Buffer immediately and send it out for processing, then fill another and send it, and another, and another... until the stream ends.

But if your connection is slow, after processing the first incoming data, the video player will display a loading icon or show the text "Buffering," meaning it is gathering more data or waiting for more data to arrive. And when the Buffer is filled and processed, the video data will be displayed. While playing, new data will continue to arrive and wait in the Buffer.

If the player has finished processing or playing the previous data and the Buffer is still not filled, the "Buffering" text will be displayed again, indicating that you need to wait for more data to be collected.

That's the Buffer!

From the initial definition of Buffer, we can see that while data sits in the Buffer, we can manipulate or interact with the binary data being streamed. We can also work with this kind of raw binary data directly. Buffer in Node.js provides a whole list of operations for doing so. Let's look at some of them.

Interacting with Buffer

You can even create your own Buffer! Isn't that interesting? Instead of waiting for a stream to create one for us, let's create one ourselves! (You can think of it as the Buffer we would receive during streaming, as described above.)

Depending on what you want to achieve, there are different ways to create a Buffer area. For example:

// Create a zero-filled buffer of size 10.
// This buffer can only hold 10 bytes.
const buf1 = Buffer.alloc(10);

// Create a buffer with custom content
const buf2 = Buffer.from("hello buffer");

Once you have created your Buffer, you can start interacting with it.

// Check the structure of a Buffer
buf1.toJSON(); // {type: 'Buffer', data: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]} (all zeros)
buf2.toJSON(); // {type: 'Buffer', data: [104, 101, 108, 108, 111, 32, 98, 117, 102, 102, 101, 114]}
// toJSON() shows the data as raw byte values (for ASCII characters these match the Unicode code points)

// Check the size of a Buffer
buf1.length; // 10
buf2.length; // 12. Automatically assigned based on the initial content when created.

// Write to the Buffer
buf1.write("Buffer really rocks!"); // returns 10, the number of bytes written

// Decode a buffer
buf1.toString(); // 'Buffer rea'
// It doesn't contain the entire string above: buf1 was created with only 10 bytes, so the rest of the string doesn't fit

There are many more interactions that we can have with a Buffer. Please refer to the official documentation to explore them.
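
A few more of those interactions, as a short sketch. All of the calls used here (`Buffer.concat`, `subarray`, `toString("hex")`, `equals`, and the return value of `write`) are part of the standard Buffer API; the variable names are just for illustration.

```javascript
const hello = Buffer.from("hello ");
const world = Buffer.from("buffer");

// Join several Buffers into a new one.
const joined = Buffer.concat([hello, world]);

// View part of a Buffer and decode it with a different encoding.
const hex = joined.subarray(0, 5).toString("hex");

// Compare contents byte by byte.
const same = joined.equals(Buffer.from("hello buffer"));

// write() reports how many bytes actually fit.
const written = Buffer.alloc(10).write("Buffer really rocks!");

console.log(joined.toString(), hex, same, written);
// hello buffer 68656c6c6f true 10
```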

Finally, I leave you with a little challenge: read through the source code of zlib.js, one of the core libraries of Node.js, to see how it leverages the power of Buffer to manipulate binary data streams, which in this case are gzipped files. And you will understand why it is one of the most important things when we talk about Node.js:

Single-Threaded but Scalable
Quick Code Execution
No Buffering
MIT License
Event-driven
Asynchronous APIs
....
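
As a possible starting point for the zlib challenge, here is a minimal sketch showing that the built-in `zlib` module takes Buffers in and hands Buffers back. It uses the synchronous `gzipSync` and `gunzipSync` functions for brevity; the sample text is an arbitrary choice for this example.

```javascript
const zlib = require("zlib");

// Repetitive input compresses well, which makes the size difference visible.
const original = Buffer.from("Buffer really rocks! ".repeat(20));

const compressed = zlib.gzipSync(original);   // returns a Buffer of gzipped bytes
const restored = zlib.gunzipSync(compressed); // returns a Buffer with the original bytes

console.log(original.length, compressed.length);
console.log(restored.equals(original)); // true

// gzip data always begins with the two magic bytes 0x1f 0x8b.
console.log(compressed[0].toString(16), compressed[1].toString(16)); // 1f 8b
```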

日本語版

あなたは、Node.jsの「Buffer」「Stream」「binary data」といった単語について、いつも困惑していませんか？これらの言葉について理解できなくて、それはNode.jsの専門家やパッケージ開発者にしか理解できないと思ってしまいますか？

実際、これらの単語は非常に恐ろしいかもしれません、特に正規の学校に通わずにNode.jsを始めた場合です。大学の講義で「binary data」の説明をしている最中にペンを落としてしまい、頭を上げたら先生が3つの黒板を埋め尽くしているなんてことがありました（笑）。

残念ながら、多くのチュートリアルや書籍はNode.jsのコア機能やその存在理由を理解させることなく、ただNode.jsパッケージを使用したアプリケーションの開発方法に直接移ってしまいます。そして、一部のチュートリアルでは、これらの概念を理解する必要はないと平然と言われることがあります。「直接これらを扱う必要がないからね」と。

まあ、一般的なNode.js開発者である限り、あなたがこれらと直接取り組むことはないかもしれません。

ただし、それがあなたを本当に興味津々にさせ、好奇心を満たすために止まらないならば、自分のNode.jsの知識を新たなレベルに引き上げたいと思うならば、Node.jsのコア機能についてより深く理解することを望んでいるのなら、たとえば「Buffer」のようないくつかのコア機能を理解するためのより明確な説明が必要です。それがこの記事を書いた理由です。私たちのNode.js学習を新たなレベルに引き上げるために、いくつかのコア機能を理解しやすくして説明するために。

Bufferの紹介では、Node.jsの公式ドキュメントに次のように記載されています。

…バイナリデータのストリームを読み取ったり操作したりするためのメカニズム。Bufferクラスは、Node.js APIの一部として、TCPストリームやファイルシステムの操作などのコンテキストで、オクテットストリームと相互作用できるようにするために導入されました…

んー、これらの文の中のすべての単語について事前に知識がある場合を除いて、それらは専門用語の羅列に過ぎません。それを少し簡単にするために、それを再表現してみましょう。明確に集中し、それらの専門用語に邪魔されずに。

Bufferクラスは、Node.js APIの一部として、バイナリデータストリームを操作したり、それとやり取りしたりするために導入されました。

これで少しシンプルになりましたか？でも、Buffer、stream、binary data、これらの単語もまだわかりづらいでしょう（笑）。では、これらの単語について逆の順番で取り組んでみましょう。

バイナリデータ（binary data）とは？

コンピュータは、データをバイナリ（2進数）で保存し表現します。バイナリは単純に1と0の並びです。例えば、以下は異なる5つのバイナリコード（異なる1と0の並び）です。

10, 01, 001, 1110, 00101011

2進数の各桁である1または0をビット（bit）と呼びます。これはBinary digITの省略形です。

データの一部を保存または表現するには、コンピュータはそのデータをバイナリ表現に変換する必要があります。たとえば、数値12を保存するには、コンピュータは12をそのバイナリ表現である1100に変換する必要があります。

では、コンピュータはどのようにしてこの変換を行うのでしょうか？実際、これは単純な算術演算であり、私たちが普通の教育で学んだ2進数の数と同じです。電卓を使えば簡単に行うことができます。

しかし、数字だけでなく、文字列、画像、さらにはビデオなどもバイナリコードに変換することができます。コンピュータは、これらすべてのデータをバイナリコードとして表現する方法を知っています。たとえば、コンピュータは文字列「L」をどのようにバイナリコードで表すのでしょうか？任意の文字をバイナリコードとして保存するには、まずその文字を数字に変換し、その数字をバイナリ表現に変換します。そのため、文字「L」の場合、コンピュータはまず「L」をそれを表す数字に変換します。

ブラウザのコンソールを開いて、次のコードを貼り付け、Enterキーを押してください。76と表示されましたか？これが「L」の文字コード、またはCode Pointです。しかし、コンピュータはどのようにして正確にどの数字がどの文字を表すのかを知るのでしょうか？どのようにして76という数値が「L」を表すために使用されるのでしょうか？

"L".charCodeAt(0)
76

文字セット（Character Sets）

Character Setsは、どの数値がどの文字を表すかに関する規則を定義します。異なる規則がいくつかありますが、最も一般的なものはUnicodeとASCIIです。JavaScriptは実際にはUnicodeを使用して文字セットを処理します。実際、ブラウザのUnicodeは、76が「L」を表すと答えました。

したがって、コンピュータが文字を数字で表す方法を知りました。次に、コンピュータは数字76をバイナリ表現に変換します。76を2進数の形式に変換するだけだと思うかもしれませんが、それはあまりにも単純です（笑）次のセクションに進みましょう！

文字エンコーディング（Character Encoding）

文字がどのようにバイナリコードとして表現されるべきかを定義する規則もあります。具体的には、数値をどれだけのビットで表現するかという規則です。これが文字エンコーディングです。

文字エンコーディングの一つに、UTF-8があります。UTF-8では、文字はバイト（byte）単位で符号化されます。バイトは8ビット、つまり8個の1と0の集まりです。UTF-8は1文字あたり1〜4バイトを使用し、「L」のようなASCII文字は1バイト、日本語の文字や絵文字などはそれ以上のバイト数になります。

これを理解するために、先ほどの12の例に戻りましょう。12の2進数表現は1100です。これを1バイトに収めるには、実際の2進数表現の左側に0のビットを追加して8ビットにする必要があります。したがって、12は00001100として保存されます。

同様に、76はUTF-8で保存されるときに01001100となります。2進数表現の1001100のままではなく、左側に0が追加されて1バイトになるのです。

はい、これでコンピュータが文字列をバイナリコードで保存する方法について理解しました。同様に、コンピュータは画像やビデオなどのデータをバイナリ形式に変換して保存するための特定の規則を持っています。要するに、コンピュータはすべてのデータをバイナリ形式で保存しており、それをバイナリデータと呼びます。

もし「文字エンコーディング」に興味があるなら、こちらの簡単で詳細な紹介が役に立つかもしれません。

これで「binary data」が何であるかわかりましたが、「Buffer」の紹介で言及されていた「binary data stream」とは何でしょうか？

ストリーム（Stream）

ストリーム（Stream）は、Node.jsにおいて、ある一定の時間をかけてある場所から別の場所に移動する一連のデータのことを指します。もっと簡単に言えば、処理する必要がある大量のデータがある場合でも、すべてのデータが揃うのを待ってから処理を開始する必要はありません（データが届いた分から順に処理していきます）。

基本的には、大きなデータは小さなチャンク（塊）に分割され、それぞれが送信されます。そのため、最初の「バイナリデータストリーム」という定義（"binary data stream" in the "contexts" of the "file system"）からわかるように、バイナリデータはファイルシステム内で移動されるという単純な意味です。たとえば、テキストが "1.txt" というファイルから "2.txt" というファイルに移動することを考えてみましょう。

しかし、ストリーミング中に Buffer はどのようにしてバイナリデータの操作ややり取りを助けてくれるのでしょうか？具体的には、この「Buffer」とはどのようなものでしょうか？

バッファ（Buffer）

ストリームはデータがある場所から別の場所に移動することを示しますが、それらはどのように正確に移動するのでしょうか？

通常、データは処理したり読み取ったりして何らかの操作を行うために移動されます。ただし、プロセスが一定時間内に受け取れるデータ量には最小値と最大値があります。したがって、データの到着速度がデータの処理速度よりも速い場合、余分なデータは自分の番が来て処理されるまでどこかで待機する必要があります。

逆に、データの処理速度がデータの到着速度よりも速い場合、いくつかのデータが早期に到着しますが（1回の処理に十分なデータではない場合でも）、それらは処理されるまで待機する必要があります。

この「待機領域」がバッファ（Buffer）です！これはコンピュータの中の小さな物理的な場所で、通常はRAM内にあり、データが一時的に集められて待機し、最終的にストリーミングの過程で処理のために送り出されます。

より視覚的な例を見てみましょう：ストリームとバッファの仕組み全体をバス停として考えることができます。いくつかのバス停では、バスは一定の乗客数に達するか特定の出発時間になるまで発車できません。また、乗客は異なるタイミングで、異なるペースで到着します。乗客もバス停も、乗客がいつ、何人到着するかを制御できません。バッファとは、まさにそのバス停の待合所です。

いかなる場合でも、早く到着する乗客は出発まで待たなければなりません。バスが到着するか、バスが出発した後に到着する乗客は次のバスを待たなければなりません。

発生する可能性のあるどのような場合でも、待つ場所があります。それがバッファです！Node.jsはデータの到着速度やストリームの速度を制御できません。それはただデータを送信するタイミングを決定するだけです。まだその時ではない場合、Node.jsはそれらを「バッファ」としてRAM内の小さな場所に配置し、処理するまで待機します。

もう1つの具体的な例は、オンラインでビデオを視聴する場合です。インターネット接続が十分に高速であれば、ストリームの速度はバッファを即座に埋め、処理するために送信されます。そして、別のものに置き換えられ、再び送信され、再び...ストリームが終了するまでこのプロセスが続きます。

しかし、接続が遅い場合、最初のデータが処理された後、ビデオプレーヤーはローディングアイコンを表示するか、"Buffer"と表示するテキストを表示します。これは、追加のデータを収集するか、データの到着を待つ必要があることを意味します。そして、バッファが埋まり、処理されると、ビデオデータが表示されます。再生中に新しいデータが続けて到着し、バッファで待機します。

プレーヤーが前のデータを処理または再生し終えており、バッファがまだ埋まっていない場合、再び「Buffer」というテキストが表示され、処理するための追加のデータを待つ必要があることを示します。

それがバッファです！

最初の「Buffer」の定義からわかるように、バッファの中にいるときには、直接ストリームでやり取りされている「バイナリデータ」または「生のバイナリデータ」を操作または対話することができます。Node.jsのバッファは、何ができるかのリストも提供しています。いくつかの例を見てみましょう。

バッファとの対話

実際に自分自身のバッファを作成することさえできます！面白いですね？待つ必要はありません、ストリームが私たちのために作ってくれるのを待つ必要はありません。自分で作ってみましょう！（そして、私たちが上記で話したようなストリームプロセスで受け取るであろうバッファであると想像することもできます）

達成したいことによって、異なる方法でバッファ領域を作成することができます。例えば、

// サイズが10の、ゼロで埋められたバッファを作成します。
// このバッファは10バイトしか格納できません。
const buf1 = Buffer.alloc(10);

// オプションの内容でバッファを作成します
const buf2 = Buffer.from("hello buffer");

バッファが作成されたら、それと対話を開始できます。

// バッファの構造をチェックする
buf1.toJSON(); // {type: 'Buffer', data: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]} (すべてゼロ)
buf2.toJSON(); // {type: 'Buffer', data: [104, 101, 108, 108, 111, 32, 98, 117, 102, 102, 101, 114]}
// toJSON() 関数は、データを各バイトの値で表示します（ASCII文字の場合はUnicodeコードポイントと一致します）。

// バッファのサイズを確認する
buf1.length; // 10
buf2.length; // 12。作成時の元の内容に基づいて自動的に割り当てられます。

// バッファに書き込む
buf1.write("Buffer really rocks!"); // 書き込めたバイト数である10を返します

// バッファをデコードする
buf1.toString(); // 'Buffer rea'
// 上記の文字列全体を含むことはできません。buf1は10バイトしか保持できず、それ以上の部分を格納できないからです

バッファとの対話にはさまざまなオプションがあります。公式ドキュメントにアクセスして、それらの関数について学習してください。

最後に、小さなチャレンジを残しておきます：Node.jsの中核ライブラリの1つであるzlib.jsのソースコードを読んで、バッファのパワーを活用して「バイナリデータストリーム」（ここではgzip圧縮されたファイル）を操作する方法を確認してみてください。そして、Node.jsを語る上でなぜ重要な要素なのかを理解するでしょう：

シングルスレッドですがスケーラブル
コードの実行が速い
バッファリングがない
MITライセンス
イベント駆動
非同期API
....

Mình hy vọng bạn thích bài viết này và học thêm được điều gì đó mới.

Donate mình một ly cafe hoặc 1 cây bút bi để mình có thêm động lực cho ra nhiều bài viết hay và chất lượng hơn trong tương lai nhé. À mà nếu bạn có bất kỳ câu hỏi nào thì đừng ngại comment hoặc liên hệ mình qua: Zalo - 0374226770 hoặc Facebook. Mình xin cảm ơn.
