TutorialKart @tutorialkart

Posted on Nov 4, 2017, 5:25 AM

Apache Spark Introduction

Spark is a fast, general-purpose cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.

Spark Core is the foundation of the overall project. It provides distributed task dispatching, scheduling, and basic I/O functionality, exposed through an application programming interface centered on the resilient distributed dataset (RDD) abstraction: a fault-tolerant collection of elements partitioned across the nodes of the cluster that can be operated on in parallel.
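On RDDs, transformations such as map and filter are recorded lazily and only run when an action such as collect asks for a result. The plain-Python sketch below illustrates that idea on a single machine. ToyDataset is a hypothetical class, not Spark's API, and it only models a linear chain of steps; real Spark distributes partitions across a cluster and its scheduler plans stages from a general graph.

```python
from typing import Callable, Iterator, List, Optional

Step = Callable[[Iterator], Iterator]


class ToyDataset:
    """Hypothetical single-process stand-in for an RDD (not Spark's API):
    a list of partitions plus the chain of pending per-partition steps."""

    def __init__(self, partitions: List[list], steps: Optional[List[Step]] = None):
        self.partitions = partitions
        self.steps: List[Step] = steps or []

    def map(self, f: Callable) -> "ToyDataset":
        # Transformation: only records the step; nothing is computed yet.
        return ToyDataset(self.partitions, self.steps + [lambda it: (f(x) for x in it)])

    def filter(self, pred: Callable) -> "ToyDataset":
        # Transformation: also lazy, returns a new dataset with one more step.
        return ToyDataset(self.partitions, self.steps + [lambda it: (x for x in it if pred(x))])

    def collect(self) -> list:
        # Action: run the recorded chain over each partition, then gather results.
        result: list = []
        for partition in self.partitions:
            it: Iterator = iter(partition)
            for step in self.steps:
                it = step(it)
            result.extend(it)
        return result


data = ToyDataset([[1, 2, 3], [4, 5, 6]])  # two partitions
squares_of_evens = data.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(squares_of_evens.collect())  # [4, 16, 36]
```

In PySpark, the same pipeline would be written as sc.parallelize([1, 2, 3, 4, 5, 6], 2).filter(lambda x: x % 2 == 0).map(lambda x: x * x).collect(), where sc is a SparkContext.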

Spark provides interactive shells for the Python (pyspark) and Scala (spark-shell) programming languages.

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported storage systems. Because the protocols have changed in different versions of Hadoop, you must build Spark against the same version that your cluster runs.

Spark SQL

Spark SQL provides SQL language support, with command-line interfaces and an ODBC/JDBC server.

Spark Streaming

Spark Streaming leverages Spark Core’s fast scheduling capability to perform streaming analytics. It ingests data in mini-batches and performs RDD transformations on those mini-batches of data.
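The mini-batch model can be illustrated without Spark: incoming records are cut into fixed time intervals, and the same transformation is applied to each batch. The sketch below assumes events arrive already sorted by timestamp; micro_batches and word_count are hypothetical helpers, not Spark Streaming API. In Spark Streaming itself, the batch interval is fixed when the StreamingContext is created, and each batch of the resulting DStream is an RDD.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def micro_batches(events: Iterable[Tuple[float, str]], interval: float) -> List[List[str]]:
    """Split (timestamp, record) events into consecutive batches of `interval`
    seconds. Assumes events arrive sorted by timestamp; intervals with no data
    produce empty batches."""
    batches: List[List[str]] = []
    current: List[str] = []
    window_end: Optional[float] = None
    for ts, record in events:
        if window_end is None:
            # Align the first window to a multiple of the interval.
            window_end = (ts // interval + 1) * interval
        while ts >= window_end:
            batches.append(current)  # close the window, even if it is empty
            current = []
            window_end += interval
        current.append(record)
    if current:
        batches.append(current)
    return batches


def word_count(batch: List[str]) -> Counter:
    # The same transformation runs on every micro-batch.
    return Counter(word for line in batch for word in line.split())


events = [(0.2, "a b"), (0.7, "b"), (1.1, "a"), (2.5, "c a")]
for i, batch in enumerate(micro_batches(events, interval=1.0)):
    print(i, dict(word_count(batch)))
# 0 {'a': 1, 'b': 2}
# 1 {'a': 1}
# 2 {'c': 1, 'a': 1}
```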

MLlib Machine Learning Library

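Spark MLlib is a distributed machine learning framework on top of Spark Core. Its performance comes in large part from Spark's distributed, memory-based architecture: iterative algorithms can keep their working data in memory across passes instead of rereading it from disk on every iteration.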
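Why in-memory data helps iterative learning shows up clearly in data-parallel gradient descent, where every iteration scans the whole dataset again. The plain-Python sketch below fits y ≈ w·x by computing a partial gradient per partition ("map") and summing them ("reduce"). The names fit and partial_gradient, the learning rate, and the iteration count are illustrative assumptions; this is not MLlib's API or its actual implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)


def partial_gradient(partition: List[Point], w: float) -> Tuple[float, int]:
    """Gradient of sum((w*x - y)^2) over one partition, plus the partition
    size so the driver can average across partitions."""
    g = sum(2 * (w * x - y) * x for x, y in partition)
    return g, len(partition)


def fit(partitions: List[List[Point]], lr: float = 0.05, iterations: int = 100) -> float:
    """Full-batch gradient descent for y ~ w*x. The partitions stay in memory
    and are re-scanned on every iteration, the access pattern that benefits
    from Spark keeping a dataset cached."""
    w = 0.0
    for _ in range(iterations):
        # "map": each partition computes its local contribution...
        parts = [partial_gradient(p, w) for p in partitions]
        # "reduce": ...and the driver sums them into one global gradient.
        g = sum(pg for pg, _ in parts)
        n = sum(count for _, count in parts)
        w -= lr * g / n
    return w


partitions = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]  # y = 3x
print(round(fit(partitions), 4))  # 3.0
```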

GraphX

GraphX is a distributed graph processing framework on top of Apache Spark. GraphX provides two separate APIs for implementing massively parallel algorithms (such as PageRank): a Pregel abstraction and a more general MapReduce-style API.
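The Pregel abstraction is vertex-centric: computation proceeds in supersteps, and in each one vertices send messages along their edges, the messages arriving at a vertex are merged, and a vertex program updates that vertex's value. The plain-Python sketch below runs PageRank in that style on a single machine. pregel_pagerank is a hypothetical function, not GraphX's API; it assumes every destination vertex is a key of the edge map and that no vertex is dangling (each has at least one outgoing edge).

```python
from typing import Dict, List


def pregel_pagerank(edges: Dict[str, List[str]], supersteps: int = 50,
                    damping: float = 0.85) -> Dict[str, float]:
    """Vertex-centric PageRank in the Pregel style: in each superstep every
    vertex sends rank / out_degree to its out-neighbours, then every vertex
    combines the messages it received into a new rank. Assumes every vertex is
    a key of `edges` and has at least one outgoing edge (no dangling vertices)."""
    vertices = list(edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(supersteps):
        # Send phase: each vertex splits its rank evenly over its out-edges.
        inbox: Dict[str, List[float]] = {v: [] for v in vertices}
        for v in vertices:
            share = rank[v] / len(edges[v])
            for dst in edges[v]:
                inbox[dst].append(share)
        # Merge + vertex program: sum the messages and compute the new rank.
        rank = {v: (1 - damping) / n + damping * sum(inbox[v]) for v in vertices}
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
for vertex, r in sorted(pregel_pagerank(graph).items()):
    print(vertex, round(r, 3))
```

Here b, which receives only half of a's rank, ends up with the lowest score. GraphX's own Pregel operator takes the same three pieces (a vertex program, a send-message function, and a merge-message function), and GraphX also ships a built-in PageRank (for example graph.pageRank(0.0001) in Scala), whose normalization may differ from this sketch.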

