Notes on the design and implementation of Apache Spark
Learning to write Spark examples
Self-written notes that may be useful
My blogs
A Spark Reliability Testing Suite
Profiling Spark Applications for Performance Comparison and Diagnosis
Self-written slides that may be useful
My Homepage
Extracting framework and user objects from a task's heap dump
My papers and technical reports
Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, ...
The core library from Spark
Spark benchmark
Storage for miscellaneous things
Basic examples for learning Spark
Fetch configuration, timeline, counter, and log information from the JobTracker
Examples of GraphX
Benchmark scripts in master
A Fine-grained Memory Estimator for MapReduce Jobs
Testing Spark Streaming
Mirror of Apache MADlib
Measure memory and disk bandwidth using the random access size as a parameter.
Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch
My technical reports
MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
Profiling the GC activities in Android ART JVM
My blog
An extension of Yahoo's Benchmarks
A Memory Profiler for Diagnosing Memory Problems in MapReduce Applications
Mirror of Apache Spark
Enhanced hadoop-1.2.0 by LJX
MapReduce jobs that will cause OOME
My music project
Code & data for Fast data processing with Spark V2
SailingLab's Petuum project.
A distributed machine learning framework.
Real-world OOM cases in MapReduce jobs
A part of Hadoop-0.20.2 source code (Some MapReduce Framework related code has been modified by Lijie Xu).
Hadoop Benchmark - Input splits are first sampled
Dataflow and Memory Estimator for MR Jobs
Visualize metrics collected from tasks' logs, pidstat, and the JVM
Representative MapReduce Job for Hadoop Benchmark