Potential Projects

Publish in



Please download to get full document.

View again

of 10
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
CS294 Fall, 2011 Communication-Avoiding Algorithms www.cs.berkeley.edu/~odedsc/CS294. Potential Projects. Jim Demmel. Types of Projects. Read and report on a paper Design, implement and/or test an algorithm Extend the result of the paper you read
CS294 Fall, 2011Communication-Avoiding Algorithmswww.cs.berkeley.edu/~odedsc/CS294Potential ProjectsJim DemmelTypes of ProjectsRead and report on a paperDesign, implement and/or test an algorithmExtend the result of the paper you readUse, develop or extend a software generation toolApply CA algorithms to an interesting applicationDemonstrate the effect of hardware trendsSuggestions are welcomed1. Read and report on a paperSee http://www.cs.berkeley.edu/~odedsc/CS294/for suggested list of papers.Examples:Graph analysis for proving lower bounds for FFT:Hong and Kung. I/O complexity: The red-blue pebble game.Savage. Extending the Hong-Kung model to memory hierarchies. Many other papers on CA sorting, CA searching, CA discrete algorithms, CA dynamic programming, CA data structures, software generation tools2. Design, implement and/or test an algorithmDesign a new CA algorithm and analyze itImplement an algorithm and test and analyze the performance (eg HW counters), or implement two algorithms and compare their performance Your report should include tables/graphs comparing performance (time, communication etc) on various parameters (input size, input type if relevant, platform's parameters). Examples:Many linear algebra algorithms (next slides)Direct and IterativeDense and SparseAll the other dwarfs/motifsSave communication by saving bits of precissionOpen Problems / Work in Progress (1/2)Just for Direct Linear Algebra!
  • Rank Revealing CAQR (RRQR)
  • AP = QR, P a perm. chosen to put “most independent” columns first
  • Does LAPACK geqp3 move O(n3) words, or fewer?
  • Better: Tournament Pivoting (TP) on blocks of columns
  • CALU with more reliable pivoting (Gu, Grigori, Khabou et al)
  • Replace tournament pivoting of panel (TSLU) with Rank-Revealing QR of Panel
  • LU with Complete Pivoting (GECP)
  • A = PrLUPcT , Pr and Pc are perms. putting “most independent” rows & columns first
  • Idea: TP on column blocks to chose columns, then TSLU on columns to choose rows
  • LU of A=AT, with symmetric pivoting
  • A = PLLTPT, P perm., L lower triangular
  • Bunch-Kaufman, Rook pivoting: O(n3) words moved?
  • CA approach: Choose block with one step of CA-GECP, permute rows & columns to top left, pick symmetric subset with local Bunch-Kaufman
  • A = PLBLTPT, with symmetric pivoting (Toledo et al)
  • L triangular, P perm., B banded
  • B tridiagonal: due to Aasen
  • CA approach: B block tridiagonal, stability still in question
  • Open Problems / Work in Progress (2/2)Just for Direct Linear Algebra!
  • Heterogeneity (Ballard, Gearhart)
  • How to assign work to attain lower bounds when flop rates / bandwidths / latencies vary?
  • Using extra memory in distributed memory case (Solomonik)
  • Using all available memory to attain improved lower bounds
  • Just matmul and CALU so far
  • Using extra memory in shared memory case
  • Sparse matrices
  • CA Sparse Cholesky for “well-partitioned matrices” understood in theory
  • Sparse matrix multiplication
  • Implement CA parallel Strassen (already taken)
  • 3. Extend the result of the paper you readExtend the result of the paper you read, e.g., apply the techniques to a new problem, or adapt the algorithm to new settings. Examples: Can you combine the approach of [Hong Kung] (dominator-sets analysis) and of [Ballard Demmel Holtz Schwartz] (expansion analysis) to obtain lower bounds for Strassen-like algorithms that allow recomputations?Prove new lower bounds for Flops and deduce new communication lower bounds.4. Use, develop or extend a software generation toolUse, develop or extend a software generation tool.This may include implementing know algorithms/technique that are suitable for software generation tools.Examples: Sparse Polyhedral Framework [Strout et al]Automatic sparse code generation [Bodik, Gilad, et al]Adding features to pOSKI [Byun, Demmel, Yelick et al]SEJITS [Fox et al] – adding CA code generators[Ballard, Demmel, Gearhart 11] showed how to optimize the communication of dense matrix multiplication on heterogeneous platforms. Can you incorporate this algorithm in a compile-time auto-tuner?5. Apply CA algorithms to an interesting applicationIncorporate and/or specialize a CA algorithm in an important applicationsExamples: Lots of ParLab applications Ex: Health App: Simulating blood flow of stroke victimsBottom Solver in AMR (Adaptive Mesh Refinement)Large scale data analysis, e.g., on clouds.6. Demonstrate the effect of hardware trendsDemonstrate the effect of hardware trends (past and predicted future) on the efficiency of algorithms. Example:A rough estimation of the running time of CA O(n3) dense linear algebra on a sequential machine is γn3 + βn3/M1/2, where n is the matrix dimension, M the fast memory size, γ is time per flop, and β the reciprocal bandwidth. Thus, the arithmetic cost dominates the running time as long as γM1/2 ≥ βCollect historical data from various available sources, regarding hardware parameters of interests (such as the above β,γ, and M) as well as predicted parameters of future machines (also available from various sources), and present their impact on the scalability of selected algorithms. Collect data on energy and power performance.Effects of lower precision on communication and energy performance.
    Related Search
    We Need Your Support
    Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

    Thanks to everyone for your continued support.

    No, Thanks