GPU-based Cloud Computing for Comparing the Structure of Protein Binding Sites
Matthias Leinweber1, Lars Baumgärtner1, Marco Mernberger1, Thomas Fober1, Eyke Hüllermeier1, Gerhard Klebe2, Bernd Freisleben1
1 Department of Mathematics & Computer Science and Center for Synthetic Microbiology, University of Marburg, Hans-Meerwein-Str. 3, D-35032 Marburg, Germany
2 Department of Pharmacy and Center for Synthetic Microbiology, University of Marburg, Marbacher Weg 6, D-35037 Marburg, Germany
1 {leinweberm, lbaumgaertner, mernberger, thomas, eyke, freisleb}@informatik.uni-marburg.de
2 klebe@staff.uni-marburg.de

Abstract— In this paper, we present a novel approach for using a GPU-based Cloud computing infrastructure to efficiently perform a structural comparison of protein binding sites. The original CPU-based Java version of a recent graph-based algorithm called SEGA has been rewritten in OpenCL to run on NVIDIA GPUs in parallel on a set of Amazon EC2 Cluster GPU Instances. This new implementation of SEGA has been tested on a subset of the protein structure data contained in the CavBase, providing a structural comparison of protein binding sites on a much larger scale than in previous research efforts reported in the literature.

Index Terms— GPU, Cloud computing, protein binding sites, structure comparison, graph alignment, OpenCL.

I. INTRODUCTION

A major goal in synthetic biology is the manipulation of the genetic setup of living cells to introduce novel biochemical pathways and alter existing ones. A prerequisite for the constitution of new biochemical pathways in microorganisms is a working knowledge of the biochemical function of the proteins of interest. Since assessing protein function experimentally is time-consuming and in some cases even infeasible, the prediction of protein function is a central task in bioinformatics. Typically, the function of a protein is inferred from similar proteins with known functions, most prominently by a sequence comparison, owing to the observation that proteins with an amino acid sequence similarity larger than 40% tend to have similar functions [19]. Accordingly, a plethora of algorithms exists for comparing protein sequences, including the well-known NCBI BLAST algorithm [1]. Yet, below this threshold of 40%, results of sequence comparisons become more and more uncertain [11].

In cases where a sequence-based inference of protein function remains inconclusive, a structural comparison can provide further insights and uncover more remote similarities [18], especially when focusing on functionally important regions of proteins, such as protein binding sites. Several algorithms are known to compare possible protein binding sites based on structural data [9], [17], [3]. However, such algorithms have much longer runtimes than their sequence-based counterparts, severely limiting their use for large scale comparisons.

In this paper, we present a novel approach to significantly speed up the computation times of a recent graph-based algorithm for performing a structural comparison of protein binding sites, called SEGA [15], by using the digital ecosystem of a GPU-based Cloud computing infrastructure. The original CPU-based Java version of SEGA has been rewritten in OpenCL to run on NVIDIA GPUs in parallel on a set of Amazon EC2 Cluster GPU Instances. This new implementation of SEGA has been tested on protein structure data of the CavBase [16], providing a structural comparison of protein binding sites on a much larger scale than in previous research efforts reported in the literature.

This paper is organized as follows. Section II discusses related work. The SEGA algorithm is described in Section III, and its GPU implementation is presented in Section IV. Experimental results are discussed in Section V. Section VI concludes the paper and outlines areas for future work.

II. RELATED WORK

Several graph-based algorithms for protein structure analysis have been proposed in the literature. For example, a subgraph isomorphism algorithm [20] has been used by Artymiuk et al. [2] to identify amino acid side chain patterns. Furthermore, Xie and Bourne [22] have proposed an approach utilizing weighted subgraph isomorphism, while Jambon et al. [6] employ heuristics to find correspondences. A more recent approach based on fuzzy histograms to find similarities in structural protein data has been presented by Fober and Hüllermeier [5]. Fober et al. [4] have shown that pair-wise or multiple alignments on structural protein information can be achieved using labeled point clouds, i.e., sets of vertices in a three-dimensional coordinate system.

Apparently, algorithms for performing a structural comparison of protein binding sites have not been designed to run on modern GPUs. However, there are several sequence-based protein analysis approaches that were ported to GPUs. For example, NCBI BLAST runs on GPUs to achieve significant speedups [21]. Other projects, such as CUDASW++ and CUDA-BLASTP [13], [14], [12], [8], have shown that GPUs can be used as cheap and powerful accelerators for well-known algorithms for performing local sequence alignment, such as the Smith-Waterman algorithm.
III. THE SEGA ALGORITHM

The SEGA algorithm constructs a global graph alignment of complete node-labeled and edge-weighted graphs, i.e., a 1-to-1 correspondence of nodes. In principle, SEGA realizes a divide and conquer strategy by first solving a correspondence problem on a local scale to derive a distance measure on nodes. This local distance measure is used in a second step to solve another correspondence problem on a global scale, by deriving a mutual assignment of nodes to construct a global graph alignment.

Fig. 1. Decomposition of the neighborhood of node vc with n_neigh = 4. The subgraph defined by the n_neigh nearest nodes is decomposed into triangles containing the center node vc.

To derive a local distance measure, nodes are compared in terms of their immediate surroundings, i.e., the node neighborhood. This node neighborhood is defined by the subgraph formed by the n nearest neighbor nodes. Since SEGA has been developed for graphs representing protein binding sites based on CavBase data [16], nodes represent pseudocenters, i.e., spatial descriptors of physicochemical properties present within a binding site. Edges are weighted with the Euclidean distance between pseudocenters.

The basic assumption is that the more similar the immediate surroundings of two pseudocenters are, the higher the likelihood that they belong to corresponding protein regions. Comparing the node neighborhood thus corresponds to comparing the spatial constellation of physicochemical properties in close proximity of these pseudocenters. If these are highly similar, a mutual assignment of these nodes should be favored.

Given two input graphs G1 = (V1, E1) and G2 = (V2, E2) with |V1| = m1 and |V2| = m2, a local m1 × m2 distance matrix D = (d_ij), 1 ≤ i ≤ m1, 1 ≤ j ≤ m2, is obtained by extracting the induced neighborhood subgraph for each center node vi ∈ V1 and vj ∈ V2, as given by the set of nodes including the center node itself and its closest n neighbor nodes.

To obtain a distance measure between two nodes vi and vj, the corresponding subgraphs are decomposed into the set of all triangles containing the center node vc (see Figure 1). Then, an assignment problem is solved to obtain the number of matching triangles. Triangles are considered to match if a mutual assignment of nodes exists for which the node labels of corresponding neighbor nodes are identical and all corresponding edge weights are within an ε-range of each other. In other words, a superposition preserving node labels (exempting the center node) and edge lengths is obtained. The node labels of the center nodes are not required to match, to introduce a certain level of tolerance, which is necessary when dealing with molecular structure data. Likewise, the parameter ε ≥ 0 is a tolerance threshold determining the allowed deviation of edge lengths.
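To make this decomposition and the ε-matching test concrete, the following Java sketch mirrors the description above. It is an illustrative reconstruction rather than the actual SEGA source: the Triangle class, the flat-array inputs, and all identifiers are our own, and node labels are assumed to be encoded as integers.

    // Decomposition of a node neighborhood into the l = n*(n-1)/2 triangles that
    // contain the center node, and the epsilon-tolerant matching test for triangles.
    final class Triangle {
        final int labelA, labelB;  // labels of the two neighbor nodes
        final double ca, cb, ab;   // edge lengths: center-a, center-b, and a-b
        Triangle(int labelA, int labelB, double ca, double cb, double ab) {
            this.labelA = labelA; this.labelB = labelB;
            this.ca = ca; this.cb = cb; this.ab = ab;
        }
    }

    final class NeighborhoodDecomposition {
        // neighborLabels: labels of the n nearest neighbors of the center node;
        // centerDist[a]: Euclidean distance from the center to neighbor a;
        // pairDist[a][b]: Euclidean distance between neighbors a and b.
        static Triangle[] decompose(int[] neighborLabels, double[] centerDist,
                                    double[][] pairDist) {
            int n = neighborLabels.length;
            Triangle[] t = new Triangle[n * (n - 1) / 2];
            int k = 0;
            for (int a = 0; a < n; a++)
                for (int b = a + 1; b < n; b++)
                    t[k++] = new Triangle(neighborLabels[a], neighborLabels[b],
                                          centerDist[a], centerDist[b], pairDist[a][b]);
            return t;
        }

        // Two triangles match if the neighbor labels can be identified with each
        // other (in either order) and all corresponding edge lengths differ by at
        // most eps; the labels of the center nodes are deliberately not compared.
        static boolean matches(Triangle s, Triangle t, double eps) {
            if (Math.abs(s.ab - t.ab) > eps) return false;
            boolean direct = s.labelA == t.labelA && s.labelB == t.labelB
                    && Math.abs(s.ca - t.ca) <= eps && Math.abs(s.cb - t.cb) <= eps;
            boolean swapped = s.labelA == t.labelB && s.labelB == t.labelA
                    && Math.abs(s.ca - t.cb) <= eps && Math.abs(s.cb - t.ca) <= eps;
            return direct || swapped;
        }
    }

The number of triangle pairs accepted by matches then enters the assignment problem that yields the local distance d_ij between the two center nodes.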
The obtained distance matrix D can be considered as a cost matrix, indicating the cost for each potential assignment of nodes vi ∈ V1 and vj ∈ V2. In the second step of the algorithm, an optimal assignment of nodes from V1 and V2 is derived incrementally, by first realizing the assignment of nodes that have the smallest distance to each other before assigning the next pair of nodes. If ambiguities arise, SEGA resorts to global information by selecting assignments for which both nodes preferably show a small deviation with respect to an already obtained partial solution. More precisely, the relative position of candidate nodes to each node in the partial solution is determined and used to calculate another cost matrix, containing a measure of the geometric deviation for each candidate pair. The actual assignments are then obtained by solving another optimal assignment problem, using the Hungarian algorithm [10]. A more detailed description of the approach can be found in Mernberger et al. [15].
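The control flow of this incremental step can be sketched in Java as follows. This is a deliberately simplified greedy variant that only realizes the cheapest remaining pair in each iteration; the ambiguity handling via the geometric-deviation cost matrix and the Hungarian algorithm [10] described above is omitted.

    // Simplified sketch of the incremental global assignment on the cost matrix D.
    // d is the m1 x m2 local distance matrix; the result maps each node of V1 to
    // a node of V2 (or -1 if unmatched, which happens when m1 > m2).
    static int[] alignGreedy(double[][] d) {
        int m1 = d.length, m2 = d[0].length;
        int[] match = new int[m1];
        java.util.Arrays.fill(match, -1);
        boolean[] used = new boolean[m2];
        for (int step = 0; step < Math.min(m1, m2); step++) {
            int bi = -1, bj = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < m1; i++) {
                if (match[i] >= 0) continue;          // node already assigned
                for (int j = 0; j < m2; j++)
                    if (!used[j] && d[i][j] < best) { best = d[i][j]; bi = i; bj = j; }
            }
            match[bi] = bj;                           // realize the cheapest remaining pair
            used[bj] = true;
        }
        return match;
    }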
IV. SEGA IN A GPU CLOUD

In this section, a version of the SEGA algorithm running on GPU hardware and a pipelined computation framework for performing large scale GPU-based structural comparisons of protein binding sites in a Cloud environment are presented.

A. GPU Implementation of SEGA

A common problem when developing applications to run on GPU hardware is that it is not easy to utilize all resources of a computational node efficiently. If the complete algorithm is implemented to run on a GPU, the host CPU's work only consists of controlling the device, which usually is not sufficient to operate the processor at full load.

The SEGA algorithm is well suited for a division into a GPU and a CPU part. The part of the algorithm that solves the correspondence problem has been rewritten to run on GPU hardware using OpenCL. The iterative part constructing a global alignment is computed on the host CPU, supported by intermediate results generated by the GPU part of the implementation.

The creation of the cost matrix D (see Section III) is divided into four OpenCL kernels. The first OpenCL kernel builds the input graphs G = (V, E) from the point cloud information provided by the protein cavity database. The data is stored in an m × m matrix, where m = |V| is the number of points describing the cavity. Based on the data parallelism in this task, this kernel can run with m² threads at once, where each thread computes a pair-wise distance.

The second OpenCL kernel constructs an intermediate matrix for a protein cavity. This matrix contains, for each node v ∈ V, the indices of its n nearest neighbors. Each line in this matrix is data-independent and contains the indices of the n smallest values from the corresponding line of the m × m distance matrix computed by the first kernel. This is calculated by m · (m/2) threads, where m/2 threads calculate the n smallest values with parallel reduction and the use of block-shared memory.

A neighborhood size of n results in l = n · (n − 1)/2 triangles for each node. These triangles are stored in an m × l matrix Z that is created by the third OpenCL kernel. This kernel is executed with m × l threads in parallel, using a vector containing the indices indicating which of the n nearest neighbors is combined with which other neighbor.

The last OpenCL kernel combines two triangle matrices Z1, Z2 into a distance matrix D with m1 × m2 elements. It is executed with m1 · m2 threads, where each thread loops over l · l triangle pairs, computing the cost for a match.

The final alignment based on the distance matrix D is computed as described in Section III, supported by the intermediate results generated by the OpenCL part.
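To illustrate the data-parallel formulation, the per-thread work of the first kernel can be written down as follows. The sketch is plain Java rather than OpenCL C, with globalId standing in for the OpenCL work-item index; all names are illustrative and not taken from the actual kernels.

    // Per-thread logic of the first kernel: the kernel is launched with m * m work
    // items, and each work item computes one entry of the m x m Euclidean distance
    // matrix from the point cloud coordinates (x, y, z) of a cavity.
    static void distanceKernel(int globalId, int m,
                               float[] x, float[] y, float[] z, float[] dist) {
        int i = globalId / m;    // row handled by this thread
        int j = globalId % m;    // column handled by this thread
        float dx = x[i] - x[j], dy = y[i] - y[j], dz = z[i] - z[j];
        dist[i * m + j] = (float) Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

The remaining kernels follow the same pattern with their respective work-item counts.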
Fig. 2. SEGA GPU architecture overview.

B. Management Framework

We have developed a software framework for managing the GPU and CPU computations involved in our implementation. The framework consists of six major components. Three components control the GPU hardware, the fourth component is responsible for selecting objects for comparison, the fifth component offers a service to manage thread pools for workloads on CPUs, and the sixth component provides progress monitoring functionality.

The six components communicate via queues that offer multithreading inside each component and additionally a viable way for utilizing multiple GPU devices on a single compute node. Furthermore, this design offers the possibility of repeated execution of a computation on GPU and CPU hardware. This can easily be realized by states inside a calculation object that contains a set of tasks to handle a group of comparisons. A calculation object contains two important pieces of information: (a) a description of the entities to be compared, and (b) a set of instructions that are to be issued when a comparison is performed.

Figure 2 shows the orchestration of the six components involved in the comparison of the protein binding sites using our GPU-enhanced implementation of the SEGA algorithm. Furthermore, it also illustrates the data flow through the framework.

The Selector component is the entrance point of the framework. It provides both an interconnection to a data store with caching capabilities and the program logic that controls which entities should be compared next. To perform the SEGA comparisons, the Selector combines a set of protein cavity identifiers and loads the point cloud data. This information is passed via a queue to the DataProcessor. Additionally, the Selector stores meta-information in the Monitor component, such as the tasks in progress. In our case, no further work of the algorithm depends on the CPU at this point, so the next component belongs to the GPU.

The decision to split the GPU part into three components is mainly due to the design of modern GPU hardware. The latest generation of GPU hardware offers independent control flows for memory reads, memory writes and kernel execution induced by the host system. Therefore, the DataProcessor component, containing an arbitrary number of threads, is responsible for converting (if needed) and transferring data from the host system to the GPU device memory. Moreover, each GPU device is controlled by its own set of GPU components to ensure maximum utilization of the given resources. For SEGA, the point cloud data is copied into OpenCL buffers and transferred to the GPU. At this point, we encountered a possible bottleneck in the management of OpenCL memory objects: handling several thousands of objects dramatically reduced the allocation performance. Thus, we had to introduce an additional component responsible for ensuring a simple and efficient reuse of memory objects. Additionally, this allows a safer use of GPU device memory, because such a pre-allocation guarantees that the GPU device memory is not exceeded during execution, and it also limits the number of computations currently in progress.
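A minimal sketch of such a reuse component is a bounded blocking pool of pre-allocated buffers. The generic type below merely stands in for an OpenCL memory object; the class is our illustration, not part of the framework's actual API.

    // Bounded pool of pre-allocated buffers: allocation happens once up front, so
    // the GPU device memory cannot be exceeded, and acquire() blocks when all
    // buffers are in use, which implicitly limits the comparisons in flight.
    final class BufferPool<B> {
        private final java.util.concurrent.BlockingQueue<B> free;

        BufferPool(java.util.List<B> preallocated) {
            free = new java.util.concurrent.ArrayBlockingQueue<>(
                    preallocated.size(), false, preallocated);
        }

        B acquire() throws InterruptedException {
            return free.take();    // blocks until a buffer is returned
        }

        void release(B buffer) {
            free.offer(buffer);    // hand the buffer back for reuse
        }
    }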
After a successful write operation to the GPU, the calculation object containing the meta-information is passed via an additional queue to the Launcher component.

The Launcher executes the corresponding GPU kernels, which in the case of SEGA are responsible for creating the polygon data and combining two distance matrices. After completion, the calculation object is pushed into the next queue. The last GPU-related component is the Dispatcher. It is responsible for reading back the results of the kernel execution to the host memory and, if necessary, processing the data further. Afterwards, the results are pushed to the ThreadService. Here, the alignments of the polygons are calculated, and the results are stored. After successfully finishing a computation, the Monitor component is informed.

The Monitor fulfills two major tasks. First, it creates an interconnection between the Selector and the ThreadService for storing the results. This is necessary to know whether all combinations have been successfully calculated. Additionally, it records the progress of the computation on persistent storage. If a computation is interrupted for unpredictable reasons, such as system failures or disk I/O errors, the computation can be resumed at the correct position.

The described framework has been implemented in Java using the JogAmp JOCL library [7] for controlling the OpenCL platform.

C. Cloud Deployment

A common approach for parallelizing a computational problem is its division into three steps: work partitioning and distribution, task computation, and result collection. In the case of a commutative comparison where a self-comparison is not necessary, an input set of n elements results in a total number of n · (n − 1)/2 computations.

A straightforward approach is to divide the total number of computations by the available number of Cloud nodes. If every comparison is indexed by a single unique identifier, a node simply needs the identifier to perform a comparison. However, a better approach is to divide the total number of comparisons by an arbitrary number that is larger than the available number of nodes. This allows one to start the result collection phase before the end of the task computation phase and, moreover, enables on-demand scheduling of tasks to other nodes in case a node fails.

The work partitioning and distribution phase also includes the distribution of the input data. For this purpose, several approaches are possible, such as data replication, network exports, and cluster file systems. Fortunately, in our case the required data of the cavity database could be reduced to about 140 MB. Consequently, the data has been transferred to and loaded into the main memory of each node. Due to the overall runtimes, this has a negligible impact on the total computation time. After data and task distribution, the nodes can calculate their part(s). When a task has finished, its results can be collected from the Cloud and stored locally.
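The identifier scheme and the package partitioning can be sketched as follows. The formulas enumerate all pairs (i, j) with i < j row by row; the helper names are ours.

    // Unique identifier of the comparison (i, j), i < j < n, counted row by row.
    static long pairId(long i, long j, long n) {
        return i * n - i * (i + 1) / 2 + (j - i - 1);
    }

    // Inverse mapping: recover the pair from its identifier, as a node would when
    // it receives only identifiers (linear scan for clarity; a closed form exists).
    static long[] pairFromId(long k, long n) {
        long i = 0, rowLen = n - 1;
        while (k >= rowLen) { k -= rowLen; i++; rowLen--; }
        return new long[] { i, i + 1 + k };
    }

    // Half-open identifier range [lo, hi) of package p when `total` comparisons
    // are split into `packages` chunks of nearly equal size.
    static long[] packageBounds(long total, int packages, int p) {
        long q = total / packages, r = total % packages;
        long lo = p * q + Math.min(p, r);
        return new long[] { lo, lo + q + (p < r ? 1 : 0) };
    }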
V. EVALUATION

To assess the performance of our approach, several experiments have been conducted. The evaluation is split into two parts. First, the performance gains of SEGA GPU compared to the original SEGA algorithm are investigated. Second, the results of a large scale comparison of protein binding sites on Amazon's EC2 Cluster GPU Instances are presented. The structural data has been taken from the CavBase [16] maintained by the Cambridge Crystallographic Data Centre.

Fig. 3. SEGA benchmarks. Runtime in ms as a function of the numbers of pseudocenters of the two compared cavities: (a) GPU part of SEGA GPU; (b) CPU part of SEGA GPU; (c) maximum of the GPU and CPU parts of SEGA GPU; (d) original SEGA.

A. SEGA vs. SEGA GPU

The performance of the original SEGA implementation has been measured on a single core of an Intel Core i7-2600 @ 3.40 GHz with 8 GB RAM, whereas the performance of SEGA GPU has been measured on a single NVIDIA GeForce GTX 580 with 3 GB RAM.

The runtimes depend on the number of pseudocenters present in the protein cavities, and thus both SEGA versions have been benchmarked using a subset of the CavBase with a large spectrum of numbers of pseudocenters. In particular, the subset consists of cavities where the numbers of pseudocenters range from 15 to 250. For each comparison, cavities matching certain size requirements were selected and compared several times (100 times for SEGA GPU; 10 times for the original SEGA) to calculate the average runtimes for a particular size combination.
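The measurement procedure itself is straightforward; the following sketch shows the averaging harness we assume, where compare stands for a single comparison of a fixed cavity pair and runs is 100 for SEGA GPU or 10 for the original SEGA.

    // Average runtime of one comparison over `runs` repetitions, in milliseconds.
    static double averageRuntimeMs(Runnable compare, int runs) {
        long start = System.nanoTime();
        for (int r = 0; r < runs; r++) compare.run();
        return (System.nanoTime() - start) / 1e6 / runs;
    }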
The plots in Figure 3 show the runtimes depending on the number of pseudocenters of each cavity. Figure 3(a) shows the average runtime of the GPU part of a SEGA GPU run, and Figure 3(b) shows the runtime of the CPU part of a SEGA GPU run. It is evident that the needed CPU runtime is often higher, but it is never twice as much as the GPU runtime. One could argue that in typical cluster nodes that offer GPU hardware, at least two physical CPU cores are available for each GPU. Instead, we decided to look at the worst case and compared the results with Figure 3(c). This plot shows the maximum of the two preceding graphs. Finally, Figure 3(d) shows the runtimes of the original SEGA implementation.

Figure 4 shows the SEGA GPU and original SEGA runtimes in a single plot. It is important to note that the z-axis has a logarithmic scale. It is evident that the SEGA GPU implementation is 10 to 200 (with an average of 110) times faster than the original SEGA implementation, depending on the number of pseudocenters in each cavity.

Fig. 4. Comparison of original SEGA and SEGA GPU benchmarks.

B. SEGA GPU @ Amazon EC2

The main target platform for SEGA GPU is Amazon's EC2 Cluster GPU Instances. Each node (instance type: cg1.4xlarge) has two Intel Xeon X5570 CPUs, 22 GB RAM and two NVIDIA Tesla M2050 GPUs with 2 GB RAM each. Benchmarks between the Tesla M2050 and the GeForce GTX 580 have shown that the GTX 580 is about two times faster than the Tesla. This matches the theoretical GFLOPS specifications from NVIDIA (single precision floating point). Thus, the GPU runtime measured in the previous section corresponds to a single EC2 node.

The subset of the CavBase used in our experiments has been selected based on the following (pharmaceutically meaningful) criteria: the resolution of a cavity must be larger than 2.5 Å; the volume must be between 350 Å³ and 3500 Å³; a protein must have at least 11 pseudocenters. This resulted in n = 144,849 protein binding sites, leading to n · (n − 1)/2 = 10,490,543,976 comparisons in total.

Fig. 5. Pseudocenter distribution among the selected subset of the CavBase.

Using Amazon's EC2 resources with associated costs makes it important to predict the expected total runtime of a computation, especially if a hard limit for the financial budget must be respected. According to Figure 5, the number of pseudocenters of the proteins in the selected subset of the CavBase is not uniformly distributed. Thus, to predict the total runtime, the runtimes of randomly sampled pairs from the CavBase were visualized with boxplots.
The blue box enclosed by the lower and upper quartile contains the middle 50% of the data. The distance between the upper and lower quartile defines the interquartile range (IQR), a measure for the variance of the data. The (lower and upper) whiskers visualize the remaining data that is not contained in the box defined by the lower and upper quartile; their length is bounded by 1.5 · IQR. Data outside the whiskers are outliers and marked by a cross. The 50th percentile (median) is visualized by a red line, the confidence interval (α = 0.05) for the mean by a triangle. Figure 6(a) shows the boxplot for the SEGA GPU implementation. Figure 6(b) shows a comparison between the original SEGA implementation and SEGA GPU to exemplify the performance gain.

Fig. 6. Boxplots showing the randomly sampled runtime distributions: (a) runtime distribution for SEGA GPU; (b) runtime distribution for SEGA GPU (OpenCL) compared to the original SEGA implementation (pure Java).

A runtime per comparison of 1.7 ms was expected based on the boxplot. To efficiently use the infrastructure provided by Amazon EC2, the entire computation was divided to run on 8 Amazon EC2 Cluster GPU Instances in parallel. The comparisons were grouped into 4096 packages and distributed by assigning 512 packages to each node. Due to the runtime of a single comparison and a total number of about 10.5 billion comparisons, a runtime of about 24 days on eight EC2 nodes was expected. In reality, the computation took about 22 days to complete. The cost was about 6,700 US-$ (10,490,543,976 comparisons · 1.7 ms / 3,600,000 ms/h · 1.234 US-$/h ≈ 6,113 US-$ for the computations, the rest for storage and network traffic).

In contrast, performing the 10.5 billion comparisons on a single core of an Intel Core i7-2600 @ 3.40 GHz, with about 300 ms runtime per comparison (see Figure 6(b)), would require about 36,425 days (about 100 years); on a quad-core node with the same specifications, about 9,106 days (about 25 years) are required. If an Amazon High Quad CPU Instance with a cost of 0.40 US-$ per hour were used, the total cost would amount to about 87,421 US-$.
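The budget figures above can be reproduced from the measured runtime with a few lines; the constants are taken from the text, and the method is merely a worked example of the arithmetic.

    // Expected GPU hours and compute cost for the full pairwise comparison,
    // assuming 1.7 ms per comparison and 1.234 US-$ per instance hour.
    static void estimateBudget() {
        long n = 144_849L;
        long comparisons = n * (n - 1) / 2;              // 10,490,543,976
        double hours = comparisons * 1.7 / 3_600_000.0;  // milliseconds to hours
        double cost = hours * 1.234;                     // about 6,113 US-$
        System.out.printf("%d comparisons, %.0f h, %.0f US-$%n",
                          comparisons, hours, cost);
    }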
VI. CONCLUSIONS

In this paper, we have presented a novel approach to significantly speed up the computation times of the SEGA algorithm for a structural comparison of protein binding sites by using the digital ecosystem of a GPU-based Cloud computing infrastructure. The original CPU-based Java version of SEGA has been rewritten in OpenCL to run on NVIDIA GPUs in parallel on a set of Amazon EC2 Cluster GPU Instances. This new implementation of SEGA has been tested on a subset of the protein structure data of the CavBase, requiring an acceptable computation time of about three weeks. Thus, a structural approach to compare protein binding sites becomes a viable alternative to sequence-based alignment algorithms.

There are several directions for future work. For example, a comparative analysis could be done for the entire protein space in the CavBase, which would not only allow a classification of the protein space into structurally and functionally similar, homologous and non-homologous protein groups, but would also support the systematic search for unexpected similarities and functional relationships. Furthermore, other algorithms for a structural comparison of protein binding sites could be rewritten to run on GPU hardware to provide further insights.

ACKNOWLEDGEMENTS

This work is partially supported within the LOEWE program of the State of Hesse, Germany, by the German Research Foundation (DFG), and by a research grant provided by Amazon Web Services (AWS) in Education.

REFERENCES

[1] S. F. Altschul. BLAST Algorithm. John Wiley & Sons, Ltd, 2001.
[2] P. J. Artymiuk, A. R. Poirrette, H. M. Grindley, D. W. Rice, and P. Willett. A Graph-theoretic Approach to the Identification of Three-dimensional Patterns of Amino Acid Side-chains in Protein Structures. Journal of Molecular Biology, 243(2):327–344, 1994.
[3] T. Binkowski and A. Joachimiak. Protein functional surfaces: global shape matching and local spatial alignments of ligand binding sites. BMC Structural Biology, 8(1):45–68, 2008.
[4] T. Fober, G. Glinca, G. Klebe, and E. Hüllermeier. Superposition and Alignment of Labeled Point Clouds. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(6):1653–1666, 2011.
[5] T. Fober and E. Hüllermeier. Similarity Measures for Protein Structures Based on Fuzzy Histogram Comparison. Computational Intelligence, pages 18–23, 2010.
[6] M. Jambon, A. Imberty, G. Deléage, and C. Geourjon. A new bioinformatic approach to detect common 3D sites in protein structures. Proteins, 52(2):137–145, 2003.
[7] JogAmp Community. JogAmp JOCL. http://jogamp.org/jocl/www/, 2012.
[8] M. A. Kentie. Biological Sequence Alignment on Graphics Processing Units. Master's thesis, Delft University of Technology, 2010.
[9] K. Kinoshita and H. Nakamura. Identification of protein biochemical functions by similarity search using the molecular surface database eF-site. Protein Science, 12(8):1589–1595, 2003.
[10] H. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics, 52(1):7–21, 2005.
[11] D. Lee, O. Redfern, and C. Orengo. Predicting protein function from sequence and structure. Nature Reviews Molecular Cell Biology, 8(12):995–1005, 2007.
[12] W. Liu, B. Schmidt, and W. Müller-Wittig. CUDA-BLASTP: Accelerating BLASTP on CUDA-enabled graphics hardware. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(6):1678–1684, 2011.
[13] Y. Liu, W. Huang, J. Johnson, and S. Vaidya. GPU Accelerated Smith-Waterman. In Proceedings of the 6th International Conference on Computational Science (ICCS'06), Part IV, pages 188–195, Berlin, Heidelberg, 2006. Springer-Verlag.
[14] Y. Liu, D. Maskell, and B. Schmidt. CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units. BMC Research Notes, 2(1):73, 2009.
[15] M. Mernberger, G. Klebe, and E. Hüllermeier. SEGA: Semi-global graph alignment for structure-based protein comparison. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(5):1330–1343, 2011.
[16] S. Schmitt, D. Kuhn, and G. Klebe. A New Method to Detect Related Function Among Proteins Independent of Sequence and Fold Homology. Journal of Molecular Biology, 323(2):387–406, 2002.
[17] A. Stark and R. Russell. Annotation in three dimensions. PINTS: Patterns in Non-homologous Tertiary Structures. Nucleic Acids Research, 31(13):3341–3344, 2003.
[18] J. M. Thornton. From genome to function. Science, 292(5524):2095–2097, 2001.
[19] A. Todd, C. Orengo, and J. Thornton. Evolution of function in protein superfamilies, from a structural perspective. Journal of Molecular Biology, 307(4):1113–1143, 2001.
[20] J. R. Ullmann. An Algorithm for Subgraph Isomorphism. Journal of the ACM, 23(1):31–42, 1976.
[21] P. D. Vouzis and N. V. Sahinidis. GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics, 27(2):182–188, 2011.
[22] L. Xie and P. E. Bourne. Detecting evolutionary relationships across existing fold space, using sequence order-independent profile–profile alignments. Proceedings of the National Academy of Sciences of the United States of America, 105(14):5441–5446, 2008.