Notes on the design and implementation of Apache Spark
Learning to write Spark examples
Self-written notes that may be useful
A Spark Reliability Testing Suite
Profiling Spark Applications for Performance Comparison and Diagnosis
Self-written slides that may be useful
Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, ...
The core library from Spark
Extracting framework & user objects from a task's heap dump
Store miscellaneous things
Examples of GraphX
Benchmark scripts in master
A Fine-grained Memory Estimator for MapReduce Jobs
My papers and technical reports
MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
My technical reports
Basic examples for learning Spark
Profiling the GC activities in Android ART JVM
An extension of Yahoo's Benchmarks
A Memory Profiler for Diagnosing Memory Problems in MapReduce Applications
Mirror of Apache Spark
Enhanced hadoop-1.2.0 by LJX
MapReduce jobs that will cause OOME
My music project
Code & data for Fast data processing with Spark V2
SailingLab's Petuum project.
A distributed machine learning framework.
Real-world OOM cases in MapReduce jobs
A part of Hadoop-0.20.2 source code (Some MapReduce Framework related code has been modified by Lijie Xu).
Hadoop Benchmark - Input splits are first sampled
Dataflow and Memory Estimator for MR Jobs
Fetch the configuration/timeline/counters/log information from JobTracker
Visualize the metrics obtained from tasks' logs, pidstat, and the JVM
Representative MapReduce Job for Hadoop Benchmark