Review Article Big Data Management for Cloud-Enabled Geological Information Services
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Hindawi Scientific Programming Volume 2018, Article ID 1327214, 13 pages https://doi.org/10.1155/2018/1327214 Review Article Big Data Management for Cloud-Enabled Geological Information Services Yueqin Zhu ,1,2 Yongjie Tan ,1,2 Xiong Luo ,3,4 and Zhijie He 3,4 1 Development and Research Center, China Geological Survey, Beijing 100037, China 2 Key Laboratory of Geological Information Technology, Ministry of Land and Resources, Beijing 100037, China 3 School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing 100083, China 4 Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China Correspondence should be addressed to Yueqin Zhu; yueqin zhu@126.com and Xiong Luo; xluo@ustb.edu.cn Received 20 October 2017; Revised 10 December 2017; Accepted 31 December 2017; Published 29 January 2018 Academic Editor: Anfeng Liu Copyright © 2018 Yueqin Zhu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Cloud computing as a powerful technology of performing massive-scale and complex computing plays an important role in implementing geological information services. In the era of big data, data are being collected at an unprecedented scale. Therefore, to ensure successful data processing and analysis in cloud-enabled geological information services (CEGIS), we must address the challenging and time-demanding task of big data processing. This review starts by elaborating the system architecture and the requirements for big data management. This is followed by the analysis of the application requirements and technical challenges of big data management for CEGIS in China. This review also presents the application development opportunities and technical trends of big data management in CEGIS, including collection and preprocessing, storage and management, analysis and mining, parallel computing based cloud platform, and technology applications. 1. Introduction information to knowledge, and from knowledge to wisdom. In addition, it provides data analysis, mining, organization, In the era of big data, the data-driven modeling method and management services for the scientific management of enables us to exploit the potential of massive amount of land resources, prospecting breakthrough strategic action geological data easily [1–3]. In particular, by mining the and social services, while conducting multilevel, multiangle, data scientifically, one can offer new services that bring and multiobjective demonstration applications on geological higher values to customers. Furthermore, it is now possible data for government decision-making, scientific research, to implement the transition from digital geology to intelli- and public services [5]. gent geology by integrating multiple systems in geological Big data technologies are bringing unprecedented oppor- research through the use of big data and other technologies tunities and challenges to various application areas, especially [4]. to geological information processing [2, 6, 7]. Under these The application of geological cloud makes it possible circumstances, there are some advancements achieved in the to fully utilize structured and unstructured data, including development of this area [8, 9]. Furthermore, from various geology, minerals, geophysics, geochemistry, remote sens- disciplines of science and engineering, there has been a ing, terrain, topography, vegetation, architecture, hydrology, growing interest in this research field related to geological disasters, and other digitally geological data distributed in data generated in the geological information services (GIS). every place on the surface of the earth [4, 5]. Moreover, We analyze the number of those documents indexed in “Web the geological cloud will enable the integration of data of Science” [10]. In Figures 1 and 2, we can easily find that, in collection, resource integration, data transmission, informa- the past ten years, the number of those documents in which tion extraction, and knowledge mining, which will pave “geological data” is in the “Title” and in the “Topic” are both the way for the transition from data to information, from increasing, respectively. Hence, the analysis for geological
2 Scientific Programming 250 different fields, because the nature and types of data vary in 200 different fields and in different problems. Geology is a data intensive science and geological data are characterized with 150 Number multisource heterogeneity, spatiotemporal variation, correla- 100 tion, uncertainty, fuzziness, and nonlinearity. Therefore, the geological cloud has a certain degree of confidentiality and 50 it is highly domain-specific; meanwhile, it is developed on the basis of a large amount of geological data accumulated 0 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 over a long period of time [5, 11]. There are many real-time Year data generated from some issues like geological disasters and geological environment. The geological cloud includes core The number of documents basic data, which can be divided into three parts: existing Figure 1: The trend of the number of documents in which “geolog- structured database, some unstructured data, and public ical data” is in the “Title” from 2007 to 2016. application data. Therefore, it is important to take good advantage of the existing traditional structured data, use the big data technologies to deal with the relevant unstructured 6000 data, and also consider the peripheral public data. 5000 Geological big data are multidimensional, and they 4000 consist of both structured and unstructured data [12]. The Number technical methods of big data analysis differ greatly from 3000 those of professional databases. Long-term geological survey, 2000 geological study, and years of geological information accu- 1000 mulation have formed a rich and professional database, which 0 is an important fundamental assurance for land and resources 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 science management, geological survey, and geological infor- Year mation public service [13]. This “professional cloud” objec- tively requires the technology research and development, The number of documents such as construction of professional local area network, Figure 2: The trend of the number of documents in which data sharing platform, and geological big data visualization “geological data” is in the “Topic” from 2007 to 2016. services. Hence, the construction of geological cloud is closely related to land resource management, deployment decision, and the application demand of public service. The key tech- nologies of research and development include the following: data in GIS is an interesting and important research topic unstructured data extraction and mining analysis, structured currently. and unstructured data mixed storage and management, big Considering the development status of cloud-enabled data sharing platform, data transmission, and visualization geological information services (CEGIS) and the applica- [11]. tion requirements of big data management analysis, this Generally, the construction of geological cloud is a long- article describes the significant impact and revolution on term systematic project. This means that it is required to GIS brought by the advancement of big data technologies. follow the basic principles of “standing on the reality, focusing Furthermore, this article outlines the future application on the future” and “focusing on the long-term and overall development and technology development trend of big data situation, embarking on the current and local situation,” in management analysis in CEGIS. order to achieve the analysis and application of geological The remainder of this article is organized as follows. cloud public data and core data gradually in accordance with In Section 2, we provide a review on CEGIS, with an the technical route of big data analysis; thus the construction emphasis on the descriptions for the system architecture and of geological cloud will be implemented eventually. For the those requirements from big data management. Then, the earth, the land and resources management should cover challenges for big data management in CEGIS are presented many respects, including human behavior, climate change, in Section 3. Furthermore, the key technologies and trends development and utilization of various resources, natural on big data management in CEGIS are analyzed in Section 4. disasters, environmental pollution, and the ecosystem cycle. Finally, conclusion is drawn in Section 5. Then, the introduction of big data technologies can integrate this type of resource information to provide the ability of 2. Review on Cloud-Enabled Geological uniformly dealing with the problems related to the entire Information Services earth information resources, which has a significant effect on the strategic planning of land and resources management [3]. The construction of geological cloud differs from the current The geological cloud is an important part in the science big data analysis based on Internet and Internet of Things system for geological data research. The ultimate goal for (IoT). Having a deep understanding of data characteristics is developing geological cloud is to better describe and under- necessary to collect, process, analyze, and interpret data in stand the complex earth system and geological framework,
Scientific Programming 3 Geological cloud Data Computing Business network resources resources Geological Internet private information public area area service Communication Satellite remote satellite sensing Field data collection Data analysis processing Marine geological survey Data services Figure 3: The system architecture of geological cloud. provide the scientific basis for the description of the land including data exchange and audit, can be carried out surface and the biodiversity characteristics of the earth, and by single-directional light gate. improve the ability to deal with complex social problems. (iii) “One Main Node and Three Domain-Specific Nodes.” One main node is constructed in China Geologi- 2.1. System Architecture. Because the business service func- cal Survey Development Research Center. In addi- tions of each country are different, the system architecture of tion, three domain-specific nodes—namely, marine the geological cloud would vary. In the following, we present node, geological environment node, and aviation geo- a system architecture in Figure 3 [14], using China as an physical exploration and remote sensing node—are example. constructed, respectively. Each node is configured The geological cloud combines the geological survey with the corresponding servers, storage equipment, Intranet and the geological survey Extranet. It enables the network equipment, management platform, large- sharing and management of computing resources, storage scale specialized data processing software system and resources, network resources, software resources, and geolog- various customized applications. Each node would ical data resources [15]. store huge amounts of geological data and conform Geological cloud can be summarized with the following to current data security standards. The master node characteristics [14]. and the domain-specific nodes are linked via optical fibers. The master node will consist of 200 computing (i) “One Platform: The Geological Cloud Management nodes with 3 PB storage capacity and will be equipped Platform.” It uniformly manages computing with some geological data processing software system. resources, storage resources, network resources, The master node will be hosted in a medium-sized software resources, and geological data resources. supercomputing center and it will provide support for (ii) “Two Networks: The Geological Survey Intranet and the three-dimensional seismic exploration data pro- the Geological Survey Extranet.” Here, the Intranet is cessing and other large-scale computing. The three constructed by creating a network that is physically domain-specific nodes are to maintain their scale in isolated from the Internet. The Intranet is developed the near future to facilitate reasonable scheduling and on the basis of the existing geological survey network efficient utilization of information resources and data and each node is linked through a dedicated line or resources. bare fiber. All of the internal business management systems, software systems, and data are deployed On the Extranet, it deploys a system for geological on the Internet, providing services to 28 local units survey business management and auxiliary decision-making. and those users of more than 350 geological survey The system provides a real-time tracking and management projects. Facilitated by the public geological survey function for geological survey projects and various resources. network, the geological survey business management Main users of the geological cloud include institutional system, geological data information service system, users, geological survey project users, and the general public and public geological data can be deployed on the users. The institutional users can store the current geological Extranet accessed by the general public. The com- database and newly collected data in the geological cloud munication between the Intranet and the Extranet, through the geological survey business network and can
4 Scientific Programming Cloud-Enabled Geological Information Service Big Data Management The Large-scale Parallel Processing Framework Using MapReduce Typical Tools PowerView Sqoop Karmasphere HDFS Hibe Hbase Mahout 1 2 3 4 Data Pre- Data Data Result Processing Storage Mining/Analysis Display Figure 4: Schematic diagram of big data analysis. obtain the geological data of other institutions from the cloud and service, through the use of advanced cloud computing as needed. The geological survey project users can access system, IoT, and big data processing flow. It is shown in the cloud geological background data through 4G or satellite Figure 4 [5]. lines and can collect data through data the collection system. 3. Challenges for Big Data 2.2. Requirement from Big Data Management. The construc- Management in Cloud-Enabled tion of geological cloud must meet customer demand. Big Geological Information Services data technologies are then used as the means to implement the geological cloud. Geological big data are generated regarding various layers of The types and quantity of geological data have been the earth, the history of the conformation and evolution of continuously growing over the years. Geological data include the earth, and the material composition of the earth and its all kinds of electronic documents, structured, semistruc- changes. It also involves the exploration and utilization of tured, and unstructured data, such as various databases (map mineral resources. In the current geological work, the collec- database, spatial database, and attribute database), pictures, tion, mining, processing, analysis, and utilization of various tables, video, and audio. Generally speaking, those important complex type data are closely related to those general big data. data may be buried in the massive data without the guidance The “4V” characteristics of big data—namely, Volume, Veloc- for requirements. Hence, the first step is to understand the ity, Variety, and Veracity—also apply to geological big data. user requirements and then gain the capability of large-scale data processing. This is followed by data mining, algorithm, 3.1. Volume. Currently, there is no consensus on the size of and analysis, which will ultimately generate value. Big data geological data. Geological big data are a collection of data, technologies in the field of geography must meet different including geology, minerals, remote sensing, geophysical needs from people at different levels, including the public exploration, geochemical exploration, surveying, and map- demand of the geologic data services and professional data ping, which are interconnected and integrated. In terms of the demand for geological research institutions, as well as related number of mines, there are at least 70000 in China, and some enterprises and government departments [16]. official documents and popular science books indicate that On the basis of big data analysis technologies, a complete there are more than 200000 deposits and minerals that have data link is formed connecting data, information, knowledge, been found. The information is huge and cannot be processed
Scientific Programming 5 using conventional tools. For example, an Excel spreadsheet database (covering the national exploration right, mining cannot contain all the information of 70000 mining areas. right, mining right verification, geological information meta- Then, it is difficult to classify and rank the 200000 mines, so data database, and many others) [17]. it is necessary to rely on the concepts and technologies of big For those databases, they are still expanding and con- data [17]. summating, and their practical values have not yet been Especially in recent years, images, video, and other types fully reflected. However, the vast majority of researchers are of data have emerged on a large scale. With the application virtually impossible to have all of the above data, at most, of 3D scanning and other devices, the data volume has using their own accumulated data. Anyway, even if their been increasing exponentially. The ability to describe the accumulated data, both on the quantity and on type, is incom- data is more and more powerful, and the data are gradually parable by 10 years and 20 years ago, they are, in fact, in the era approximated to the real world. In addition, the large amount of the “relatively big data.” From 1999 to 2004, for example, of data is also reflected in the aspect that the methods and in “the Chinese mineralization system and regional metalor- ideas used by people to deal with data have undergone ganic evaluation” project, although there are 202 national a fundamental change. In the early days, people used the academic experts that participated in it, they only master sampling method to process and analyze data in order to data of 4500 properties (all kinds of minerals). From 2006 to approximate the objective with a small number of subsample 2013, the study of “national important mineral and regional data. With the development of technologies, the number of mineralization laws” was conducted; meanwhile, the mining samples gradually approaches the overall data. Using all the area covered only by the mineral resources research institute data can lead to a higher accuracy, which can explain things was 30600. Therefore, the increase of information and the in more detail [18]. amount of data are unprecedented in the last ten years. Recently, the China geological survey system has built databases including regional geological database (cover- 3.2. Variety. From the formal point of view, the geological big ing the 1 : 2500000, 1 : 1000000, 1 : 500000, 1 : 250000, and data have many characteristics, including multidimensional- 1 : 200000 regional geological map; the national 1 : 200000 ity, multiscale, and multitenses. And they contain structured, natural sand; the isotope geological dating; and the lithos- semistructured, and unstructured data and usually are stored tratigraphic unit database), basic geological database (cov- in forms of text, graphics, images, databases (including image ering the national rock property database and national geo- database, spatial database, and attribute database), tables, logical working degree database), mineral resources database videos, and audios in a fragmented state. For example, a (covering the national mineral resources, the national min- large number of field outcrop description data, borehole eral resources utilization survey mining resources reserves core description data, and all kinds of geological survey, verification results, the national survey of large and medium- exploration report, and a large number of geological maps, sized mines, the prospect of mineral resources, the survey of drawings, and photos were stored and managed in the form the resources potential of major solid mineral resources in of paper for a long time; even the numerous relational China, and the geological and mineral resources database), databases and spatial databases were primarily used to store oil and gas energy database (covering the oil and gas basins and manage structured data that are tabulated and vectorized, in China, the geological survey results of the national oil while the text descriptions, records, and summaries were and gas resources, the national petroleum and geophysical directly stored. Very few standardized processing and struc- exploration, national shale gas, national coal bed methane, tural transformations were performed. Furthermore, there is national natural gas hydrate, and other databases), geophys- no tool available to effectively integrate storage and manage ical database (covering 1 : 1 million, 1 : 500000, 1 : 250000, structured, semistructured, and unstructured data. 1 : 200000, and 1 : 50000 gravity, national regional grav- ity, national aeromagnetism, national ground magnetism, national electrical survey, seismic survey, national aviation 3.3. Velocity. The increase of geological data is very fast, radioactivity, and national logging database), geochemi- especially in remote sensing geology, aviation geophysical cal database (covering the databases of national 1 : 250000 exploration, regional geochemical exploration, and other and 1 : 20 geochemical exploration, national multiobjective fields, due to the introduction of new technologies and geochemical and national land quality evaluation results), new methods. Meanwhile, high speed processing is also a remote sensing survey database (covering national aeronau- characteristic of big data. In addition to the need of analyzing tical remote sensing image, China resources satellite data, data in real time, people also need to describe the results space remote sensing image, national mine environmental of data mining and processing through the use of several remote sensing monitoring, national high score satellite, and data processing techniques, such as image and video, while other databases), drilling database (covering the national requiring effective and efficient handling skills. For example, geological borehole information, the national important the detection of the deep earth information not only needs geological borehole, the Chinese mainland scientific drilling to obtain parameters of the seismic wave reflection and core scanning image library, and so on), hydraulic cycle refraction but also needs to conduct quick processing, so as hazards database, data literature database, special subject to timely predict whether earthquake will occur and forecast database (covering the national mineral resources poten- the time, location, and intensity. In this way, we can avoid tial evaluation database, the important mineral “three-rate” the disaster effectively. When applying a variety of data to investigation and evaluation database), work management a particular mountain, one should learn which ones have
6 Scientific Programming spatial limitations and which are not related to spatiality, 4. Key Technologies and Trends on Big Data so that one deduces the metallogenic law and guides the Management in Cloud-Enabled Geological prospecting better [17]. Information Service (CEGIS) 3.4. Veracity. For the understanding of the value of big data, With the rapid advancement of big data technologies, some most people consider it low value density. It means that the key technologies are accordingly developed for big data real useful information in the vast amount of data is very little. management in CEGIS. Specifically, a schematic diagram of Taking video as an example, the useful data may be only a those key technologies is shown in Figure 5. Then, in this second or two in the continuous monitoring process. While section we present an analysis on those key technologies. big data is high value, it does not need to be invested too Meanwhile, the trends along this direction are also discussed. much; just collecting information from the Internet can bring business value. Therefore, big data has the characteristics 4.1. Geological Big Data Collection and Preprocessing. Geolog- of low value density and high business value. The same is ical big data collection and preprocessing aim to categorize true for geological big data. So far, there has been a lot of those geological big data obtained from geological data, information about geophysical prospecting, but only a few geological information, and geological literature. have been confirmed, and the discovered mines were less. But once a breakthrough was made, its socioeconomic value was enormous, such as the lithium polymetallic deposit in Tibet 4.1.1. Geological Data Collection Access. In addition to the and the newly discovered Jima copper polymetallic deposit in traditional collection ways, it is also required to carry out the outskirts of Sichuan [17]. large-scale network information access and provide real- In addition, the spatial attribute and temporal attribute of time, high concurrency, and fast web content acquisition, geological data also bring a big challenge to data accuracy. combining with the application characteristics in the cloud Any geological data have spatial attribute, and their values environment. Currently, considering that the growth rate of are reflected in the spatial law of distribution of mineral geological information generated from the network is very resources. For this reason, in the process of establishing fast, the big data analysis system should obtain relevant data the metallogenic series, exploring the metallogenic law, and quickly. constructing the mathematical model, the spatial attribute of the metallogenic model should be considered. Obviously, 4.1.2. Quality and Usability Characteristics of Geological Data. every metallogenic series has its spatial attribute. Geological It needs to distinguish and identify valuable information data also has the time attribute, which is very different from through intelligent discovery and management technologies. physical, chemical, and other natural sciences. One of the Because the information value density contained in different fundamental pillars of geology is the geological time scale. data sources differs from each other, filtering out the useless The rocks, strata, and deposits of different geological periods or low-value data source can effectively reduce the data stor- have different distribution characteristics and regularity, so age and processing costs. Then, it can also further improve those data have their own time attribute. the efficiency and accuracy of analysis. It is obvious that those characteristics of geological big data mentioned above impose very challenging obstacles 4.1.3. Geological Data Entity Recognition Model. According to the data management in CEGIS. The challenges related to the subject domain of geology, the distributed data are to geological big data management can be summarized as extracted to form a data warehouse, after conducting the follows: operation of processing and integration. When extracting (i) It is quite difficult to describe and model geological data in the field of geology, it needs to use entity modeling big data, since there are few effective characteris- method to abstract entities from the vast numbers of data, tics description mechanisms and object modeling so as to find out the relationship between those entities. This approaches under the cloud computing environment. approach ensures that the data used in warehouse data can be consistent and relevant in accordance with the data model (ii) There remain many technical issues that must be [19]. These recognized data are directly input into the system, addressed to fully manage, mine, analyze, integrate, stored as metadata, which could be used for data management and share those geological big data, in consideration and analysis. of those complex characteristics, including multi- source heterogeneous data, highly spatiotemporal variation, high-volume and high-correlation data, 4.1.4. Aggregation of Geological Big Data. Generally, dif- and many others. ferent data sources and even the same data source may (iii) Many issues appear in achieving decision support, generate data with different formats. As mentioned above, such as data incompleteness, data uncertainty, and because these structural, semistructured, and unstructured high-dimensionality of data. multimodal geological big data are integrated together, the data heterogeneity is obvious in big data analysis. Then, The broad range of challenges described here make good data aggregation as the key technology in achieving data topics for research within the field of big data management in extraction and transformation [20] enables data sharing and CEGIS. They are analyzed in the next section. data fusion between heterogeneous data sources. Through
Scientific Programming 7 Multi-Source Heterogeneous Data Multimodal Data Highly Spatio-Temporal Variation High-Volume and High-Correlation Data Low-Value-Density Data Complex and Uncertain Data Applications Characteristics: “4V” (Volume, Variety, Velocity, Veracity) Analysis and Mining Big Data Flow Key Technologies Geological Big Data Storage Preprocessing Collections Highly Performable Big Data Cloud Computing Platform Big Data Management Figure 5: Schematic diagram of key technologies for big data management in CEGIS. the use of heterogeneous information aggregation technolo- Here, we provide an example of aggregating and collect- gies, the unified data retrieval and data presentation could ing geological big data in CEGIS. Figure 6 illustrates this be achieved. On the basis of it, after aggregating those process. While developing CEGIS, all kinds of geological distributed heterogeneous data sources, they are extracted data should be processed. Through the use of geological and converted to achieve the functions of automatically cloud, big data are collected, and then they are aggregated to constructing subject domain database and data warehouse achieve some key functions in geological information service [21]. platform, including catalog sharing, intelligent searching, data products release, and collaborative service. 4.1.5. Management of Geological Big Data Evolution Tracking Records. In order to effectively utilize geological big data, it 4.2. Geological Big Data Storage and Management. From needs to track the evolution of big data during the whole life the data collection perspective, geological data can be cycle of GIS, with the purpose of achieving the traceable big divided into field survey data, drilling and engineering explo- data management. ration data, remote detection data, analytical test data, and
8 Scientific Programming Organizing by rules Regional Geological geology Cloud Energy geology Geological information service platform Mineral Intelligent data Catalog Intelligent Data products Collaborative geology collecting sharing searching release service Hydrogeology Sharing and Location based service Exchanging by Permissions Aggregating by levels Background data Aggregating Collecting support big data big data Data uploading to Programme cloud Online analysis Project team Intelligent survey in the terminal Field work group Figure 6: An example of aggregating and collecting geological big data in CEGIS. comprehensive study data. From the angle of comprehensive a hybrid distributed storage model through the use of cloud application fields, they can be also divided into regionally advantages of flexibly scalable deployment, to meet the users’ geological survey data, energy and mineral resources evalu- requirement for geological big data resource management ation and exploration survey results data, geological disaster with satisfactory data durability and high availability [23]. monitoring and early warning data, geological environment Here, the hot research topics include the following: survey and evaluation results data, and marine geological (i) For geological applications, the load optimization survey and evaluation data. From the data formality point storage should be implemented to achieve the cou- of view, they can be divided into picture data, text report pling for data storage and application and the cou- data, tabular data, and image data. These data are collected pling for distributed file system and the new storage by different units. system. Facing these complex geological big data mentioned above, the traditional relational database will be difficult to (ii) Based on the application characteristics of distributed databases, more studies could be conducted on the handle them, while the distributed storage system can be used application of new databases NoSQL and NewSQL in to store such huge amounts of data and manage them. Then, geological survey work. the data system places the massive data in many machines, which avoids such limitation of storage capacity, and also With the development of big data technologies, more and brings many problems that have not occurred before in stand- more mature distributed data storage solutions will emerge alone systems. Hence, some distributed data storage solutions and will be applied to big data analysis [24, 25]. have accordingly emerged, including Hadoop, Spark, and Specifically, in the management of geological big data, other nonrelational database systems (like HBase, MongoDB, the implementation of data query—for example, spatial and many others) [22]. These different solutions satisfy query—has been a long-term focus. Generally, considering the specific requirements from different applications. When those advantages with unified modeling language (UML) and applying to the analysis of big data, different solutions can computer-aided software engineering (CASE) methodology, be employed according to the specific needs of different the spatial database could be accordingly designed and intelligence analysis. Furthermore, different solutions can be implemented to characterize and realize the object-oriented combined to meet specific needs. Actually, there have been spatial vector big data firstly [26]. And then, in the developed some attempts to develop combination strategies for dis- spatial database, the function of self-generating codes would tributed storage model, varying in the big data management be achieved to realize two-way spatial query between graphic- performance requirement, and the complexity of collected objects and property data [26]. Moreover, in consideration of big data that are supported by the distributed storage system the complex characteristics of geological big data, the spatial [23]. Hence, there is still a room for improvement and query is achieved finally through the use of Flex technology in optimization of geological big data storage, while designing ArcGIS Server software platform [27]. Practically speaking,
Scientific Programming 9 in this technology, the spatial query could be implemented deal with terabytes or even petabytes of data, as well as through two functions, including “Query” and “Find” query multidimensional data, all kinds of noise data and dynamic methods [28]. data. Because data mining algorithms will directly influence the outcome of the discovered knowledge, selecting the most 4.3. Geological Big Data Analysis and Mining. In terms of geo- appropriate algorithms and parallel computing strategy is the logical data analysis and mining, it needs to combine geolog- key to data mining. ical data, geological information, and geological literatures, Effective data mining also could reduce manual inter- through the analysis of geological application demand of vention during information processing and make use of real-time mining, to explore geological big data environment methods and tools of big data intelligent analysis [35, 36]. analysis and mining algorithm, in an effort to fully achieve Recently, there has been a growing interest in the geological the goal of intelligent mining for geological big data. big data mining through the use of some novel computational Figure 7 shows a schematic diagram of discovering geo- intelligent methods—for example, rough set [37] and fuzzy logical knowledge through analyzing and mining geological aggregation [38]. Moreover, with the development of those big data. It can be easily found that geological big data analysis neural network based machine learning algorithms in recent and mining play an important role in achieving the final goal. years, some popular methods, including extreme learning More relevant research work related to it mainly involves the machine [39, 40], approximate dynamic programming [41], following aspects. and kernel learning [42], could be used to further improve mining effectiveness for geological big data in the future. 4.3.1. Geological Big Data Analysis. Considering the special applications, geological big data technologies would apply big 4.4. Highly Performable Big Data Cloud Computing Platform. data concepts to analyze the metallogenic rules by making Highly performable big data cloud computing platform is the full use of various data related to ore, to recognize deposit foundation for big data analysis. It enables parallel computing metallogenic series, to summarize the metallogenic regulari- for large-scale incremental real-time data and large-scale ties and express in an appropriate way (like voice, image, and heterogeneous data [43–46]. many others), and to establish the scientifically mathematical With the advent of massive data storage solution, model. The model then uses new exploration data to predict many big data distributed computing frameworks have future data and to guide geological prospecting. been proposed. Among them, Hadoop, MapReduce, Spark, In addition, it is necessary to pay special attention to and Storm are the most important distributed computing the analysis of new geological big data information collected frameworks. These frameworks have different characteristics from social medium and networks [29]. These include the and solve different problems in applications [47–50]. The geological text information flow data from microblog web Hadoop/MapReduce is often used for offline complex big sites, the geological multimedia data from media sharing data processing, the Spark is often employed in offline fast web sites, the geology-related user interaction data on social big data processing, and the Storm is often available for networking web sites, and many others [30]. These mul- real-time online big data processing. Different computing tisource data complement traditional big data. Specifically, frameworks have their different advantages and disadvan- such data should be addressed with the help of multilingual tages. Hadoop/MapReduce is easy to program, and it is with information processing, multilingual machine translation, satisfactory scalability and fault tolerance. In addition, it is and social network cross-language retrieval [31]. Big data suitable for offline processing of massive data with petabyte analysis of such data is a key to deep use of geological data in level, but it does not support real-time computation and flow a broader dimension. With the maturity of big data analysis calculation. Spark is a memory-based iterative computing technologies, it becomes possible to analyze and extract framework. By placing intermediate data in memory, Spark valuable information from these data [32] and to provide can achieve higher iterative calculation performance. The effective solutions for geological big data applications. programming model of Spark is more flexible than that of Hadoop/MapReduce, but Spark is not suitable for those 4.3.2. Geological Big Data Mining. Data mining is to extract applications in which the fine-grained updates are conducted the unknown and useful knowledge and information from asynchronously. Hence, Spark may be unavailable for those the massive multilevel spatiotemporal data and attribute data, application models that require incremental changes. Storm using statistics, pattern recognition, artificial intelligence, is suitable for stream data processing. It can be used to handle set theory, fuzzy mathematics, cloud computing, machine a stream of incoming messages and can write the processed learning, visualization, and relevant techniques and methods. result to a specified storage device. Another major application Data mining could reveal the relationship and evolution of Storm is real-time data processing where data are not trend behind the geological big data, achieve the automatic necessary to be written into storage devices, which usually or semiautomatic acquisition of the new knowledge, and results in low time delay. Hence, Storm is particularly suitable provide the decision basis for resource prediction, prospect- for scenarios where real-time online analysis is required to ing, environmental assessment, and disaster prevention and obtain results for big data analysis. mitigation [33]. Therefore, the knowledge is obtained directly An application example is geological big data aggregation from known geological data to provide relevant decision mining framework based on Hadoop [16]. Geological big support [34]. In consideration of the amount of data, it may data aggregation mining platform research is based on the
10 Scientific Programming Filtering Preprocessing Converting Original data data data geological data Evaluating Analyzing and Geological results mining data knowledge Practicability of machine learning and statistical algorithms Feasibility of intelligent mining Effectiveness of online analysis and mining ··· Figure 7: Schematic diagram of discovering geological knowledge through analyzing and mining geological big data. China geological survey data network, and it uses the Hadoop The 20th exploratory line Element content 100 technology to improve and modify existing platform, to make it suitable for big data applications, and to provide a platform 50 for the pilot applications. The geological survey grid platform can be updated in three layers—that is, the virtual layer, 0 1 4 7 10 13 16 19 22 25 28 31 34 37 40 the computing layer, and the terminal application layer. The virtual layer is the virtualization of computer resources based The element of Mn on Hadoop distributed file system (HDFS) virtualization The element of Co technology, which is the foundation of cloud computing and The element of Be cloud services. The computing layer mainly uses MapReduce Figure 8: Correlation among three element morphologies. method to implement the analysis algorithms for geological big data. Currently, the geological big data technologies mainly use the block calculation strategy to achieve parallel analysis through the utilization of the characteristics of the metallogenic law, it is necessary to fully understand the Hadoop, in an effort to speed up the analysis and processing massive data about spatial distribution, reserves and produc- of geological data. The terminal application layer is designed tion in mineral origin, the geological structure of the mineral to display the results and receive user feedback to improve origin, and related geological survey data. Then, it is to system availability. conduct the regular speculation and objectivity expression of MapReduce has been used to perform morphological cor- these geological big data, so that one can identify the essential relation analysis, which involves the analysis of geochemical reasons for the distribution of mineral origin. Actually, using data processing and the study of the correlation between mul- geological big data technologies could help to translate data tielements. Figure 8 shows the pattern correlation between into new understanding or knowledge and help to guide the elements. It can be seen from Figure 8 that the elements of future geological prospecting work. Mn, Co, and Be are similar in the distribution of morphology. Therefore, from a qualitative point of view, the correlation is relatively high. Moreover, after testing, the proposed 4.5.2. Smart Prospecting. The types of deposits vary, and prototype system is running three times more quickly than the formation of them is related to certain geological back- the existing common computing platform, showing that the grounds and geological effects, respectively. The geological geological big data is applicable to the Hadoop platform. backgrounds include tectonic unit and stratigraphic unit, Furthermore, some applications of using MapReduce could deep upper mantle and lithosphere conditions, and paleo- be found in [51]. geography and palaeoclimate environment on the surface of the earth. Geological effects include tectonism, magmatism, 4.5. Applications of Geological Big Data Technologies sedimentation, metamorphism, and weathering. Theses geo- logic backgrounds and effects in the wide range of space, 4.5.1. Exploration of Metallogenic Law. The metallogenic law and in the long geologic history, are a dynamic change is the human regular knowledge of the temporal and spatial and repeated stack, and large deposits can be formed only distribution of mineral resources, and its cognitive level, in a variety of favourable conditions. Long-term scien- ability, and scope are all related to the size of data, the type tific research and experience accumulation formed mineral of data, and the way of data processing. Therefore, to deduce deposit and mineralization prediction subject. Professionals
Scientific Programming 11 are guided by certain theories and methods to adopt quanti- knowledge of the entity. Due to the continuous release of tative or qualitative methods to predict prospecting with the user-generated content and linking open data on the Internet, existing knowledge and experience. people need to explore knowledge interconnection methods However, in view of the difficulties of geological data shar- which both conform to the development of the network ing and the limitations of calculation tools and calculation information resources and meet users’ requirements from methods, most of the known deposits in the past are indepen- a new perspective according to the knowledge organization dent of each other. In the future, we can use geological data principles in the large data environment, to reveal human to connect several adjacent deposit exploration data, conduct cognition on a deeper level [55]. unified analysis and specialized processing, determine the In this context, knowledge graph (KG) was formally put “digital” characteristics of the distribution of metallogenic forward by Google in May 2012, and its goal is to improve the materials, find out metallogenic potential, delineate the search results and describe the various entities and concepts abnormal area and prospective area, and promote geological that exist in the real world and the relationship between these prospecting. Furthermore, geological data informatization entities and concepts. KG is a great choice to select the essence and standardization could be improved [52]. and discard the dross, as well as the sublimation of the present semantic web technology. In recent years, the applications of KG have been increasing rapidly, and there is now a mature 4.5.3. Service of People’s Livelihood Geology. After entering method used to draw a KG and conduct intelligent searching the 21st century, geological work is more closely related research based on KG [34]. However, the function of KG to economic development, and geological work plays an has not been fully implemented at present, especially for the important role in every aspect of social and economic life. specific object of geological big data; the application aspect Agricultural geology, urban geology, environmental geology, still needs to be further strengthened. Along this direction, tourism geology, disaster geology, and other works have been the visualization service for geological data in the web-based strengthened, and the service area has also been expanded system is attracting more and more attention [56, 57]. [53]. Meanwhile, the public demand for geological informa- tion is increasingly urgent [54]. In order to meet the social demand for geological data, 5. Conclusion China Geological Survey carried out the construction of Big data technologies make it possible to process massive geological cloud, which built cluster geologic data service amount of unstructured and semistructured geological data. system with the National Geological Information Center And the geological cloud enables us to explore the application and the Provincial Geological Information Center as the of demand-driven geological core data and to extract new backbone nodes, conducted the integration of data resources, information from unstructured data, while supporting the and applied the GIS cloud technology, in order to obtain decision-making in land resources management. Thus, the large-scale computing ability and solve those key problems, geological cloud could effectively organize and use geological such as the distributed storage, processing, query, interop- big data, to mine the data scientifically, with the purpose erability, and virtualization of massive spatial data [5, 13]. of producing higher value and achieving the corresponding Recently, in China, Shandong Provincial Bureau of Geology service. and Mineral Resources also carried out the construction of In the architecture of geological cloud, this article “the application system of geological business based on e- describes the application background of CEGIS and the government cloud platform.” It mainly relies on the public demands from big data management. Furthermore, we elab- service cloud platform of the e-government in Shandong orate the application requirements and challenges faced in province and constructs the government external network big data management technologies. Then, more analyses are service system and Internet service system to achieve the provided from four aspects, including data size, data type, unified management and information service of the mineral data processing speed, and data processing accuracy, respec- resource. Using technical methods of spatial analysis, big tively. In addition, this article outlines the research status and data mining, and three-dimensional geological model, it technology development opportunities of big data related in develops a basic system framework for geological mining CEGIS, from the perspectives of big data acquisition and pre- services, featured by “a (cloud) platform, a (data) center, processing, big data storage and management, big data analy- and many application systems,” to improve the ability of the sis and mining, highly performable big data cloud computing people’s livelihood geological service, promote interaction platform, and big data technology applications. With the con- with the public, realize socialization services, and promote tinuous development of big data technologies in addressing the clustering and industrialization of the mineral resources those challenges related to geological big data, such as the information services. difficulties of describing and modeling geological big data with some complex characteristics, CEGIS will move towards 4.5.4. Application of Knowledge Visualization Service. With a more mature and more intelligent direction in the future. the continuous development of web technology, human beings have experienced the “Web 1.0” era, which is charac- Conflicts of Interest terized by document interconnection, and “Web 2.0”, which is characterized by data interconnection, and are moving The authors declare that there are no conflicts of interest towards the new “Web 3.0” era based on the interconnected regarding the publication of this article.
12 Scientific Programming Acknowledgments [17] D. Wang, X. Liu, and L. Liu, “Characteristics of big geodata and its application to study of minerogenetic regularity and This work was supported in part by the Key Laboratory of minerogenetic series,” Mineral Deposits, vol. 34, no. 6, pp. 1143– Geological Information Technology of Ministry of Land and 1154, 2015. Resources under Grant 2017320, the National Key Technolo- [18] B. Pan and R. Yang, “Management and utilization of big data gies R&D Program of China under Grant 2015BAK38B01, for geology,” Surveying and Mapping of Geology and Mineral and the National Key R&D Program of China under Grant Resources, vol. 33, no. 1, pp. 1–3, 2017. 2016YFC0600510. [19] P. Yang and L. J. Lu, “The research on encoding methodology of the character of geological entity based on mass geological data,” Advanced Materials Research, vol. 962-965, pp. 208–212, References 2014. [1] P. Vermeesch and E. Garzanti, “Making geological sense of ‘big [20] X. Luo, D. Zhang, L. T. Yang, J. Liu, X. Chang, and H. Ning, “A data’ in sedimentary provenance analysis,” Chemical Geology, kernel machine-based secure data sensing and fusion scheme in vol. 409, pp. 20–27, 2015. wireless sensor networks for the cyber-physical systems,” Future Generation Computer Systems, vol. 61, pp. 85–96, 2016. [2] J. Chen, J. Xiang, Q. Hu et al., “Quantitative geoscience and geological big data development: a review,” Acta Geologica [21] C.-L. Kuo and J.-H. Hong, “Interoperable cross-domain seman- Sinica, vol. 90, no. 4, pp. 1490–1515, 2016. tic and geospatial framework for automatic change detection,” Computers & Geosciences, vol. 86, pp. 109–119, 2016. [3] Y. Zhu, Y. Tan, R. Li, and X. Luo, “Cyber-Physical-Social- Thinking modeling and computing for geological information [22] Y.-J. Wang, W.-D. Sun, S. Zhou, X.-Q. Pei, and X.-Y. Li, service system,” International Journal of Distributed Sensor “Key technologies of distributed storage for cloud computing,” Networks, vol. 12, no. 11, 2016. Journal of Software , vol. 23, no. 4, pp. 962–986, 2012. [23] M. Armbrust, A. Fox, R. Griffith et al., “Above the clouds: a [4] M. M. Song, Z. Li, B. Zhou, and C. L. Li, “Cloud computing berkeley view of cloud computing,” Tech. Rep. UCB/EECS- model for big geological data processing,” Applied Mechanics 2009-28, University of California at Berkeley, California, Calif, and Materials, vol. 475-476, pp. 306–311, 2014. USA, 2009. [5] J. P. Chen, J. Li, N. Cui, and P. P. Yu, “The construction and [24] J. Xia, Z. Bai, B. Wang, J. Chang, and Y. Wu, “Design and application of geological cloud under the big data background,” implementation of comprehensive management platform for Geological Bulletin of China, vol. 34, no. 7, pp. 1260–1265, 2015. geological data informatization,” Acta Scientiarum Naturalium [6] C. Li, “The technical infrastructure of geological survey infor- Universitatis Pekinensis, vol. 50, no. 2, pp. 295–300, 2014. mation grid,” in Proceedings of the 18th International Conference [25] W. Hua, J. Liu, and X. Liu, “Data management of object on Geoinformatics, pp. 1–6, 2010. type geological features on control dictionary,” Earth Science - [7] L. Wu, L. Xue, C. Li et al., “A geospatial information grid Journal of China University of Geosciences, vol. 40, no. 3, pp. 425– framework for geological survey,” PLoS ONE, vol. 10, no. 12, 430, 2015. Article ID e0145312, 2015. [26] B. Jia, C. Wang, C. Liu, and W. W. Sun, “Design and implemen- [8] K. Evangelidis, K. Ntouros, S. Makridis, and C. Papatheodorou, tation of object-oriented spatial database of coalfield geological “Geospatial services in the Cloud,” Computers and Geosciences, hazards -Based on object-oriented data model,” in Proceedings vol. 63, no. 2, pp. 116–122, 2014. of the 2010 International Conference on Computer Application [9] M. Huang, A. Liu, T. Wang, and C. Huang, “Green data gath- and System Modeling, ICCASM ’10, pp. V1282–V1286, Taiyuan, ering under delay differentiated services constraint for inter- China, IEEE, October 2010. net of things,” Wireless Communications and Mobile Comput- [27] http://server.arcgis.com/. ing, 2018, http://downloads.hindawi.com/journals/wcmc/aip/ [28] X. Zhou, X. Li, A. Chen et al., “Design and implementation of 9715428.pdf. the service system of spatial data for geological data,” Journal of [10] https://www.webofknowledge.com/. Geomatics, vol. 38, no. 4, pp. 57–60, 2013 (Chinese). [11] C. Yang, M. Yu, F. Hu, Y. Jiang, and Y. Li, “Utilizing Cloud Com- [29] H. Huang, Z. Chao, and C. Feng, “Opportunities and challenges puting to address big geospatial data challenges,” Computers, of big data intelligence analysis,” CAAI Transactions on Intelli- Environment and Urban Systems, vol. 61, pp. 120–128, 2017. gent Systems, vol. 11, no. 6, pp. 719–727, 2016. [12] L. Wu, L. Xue, C. Li et al., “A knowledge-driven geospatially [30] S. Jin, W. Lin, H. Yin, S. Yang, A. Li, and B. Deng, “Community enabled framework for geological big data,” ISPRS International structure mining in big data social media networks with Journal of Geo-Information, vol. 6, no. 6, article no. 166, 2017. MapReduce,” Cluster Computing, vol. 18, no. 3, pp. 999–1010, [13] Y. Tan, “Architecture and key issues of geological big data and 2015. information service project,” Geomatics World, vol. 23, no. 1, pp. [31] C. C. Yang, C.-P. Wei, and L.-F. Chien, “Managing and mining 1–9, 2016. multilingual documents: Introduction to the special topic issue [14] Y. Tan, “Architecture investigation of the construction of geo- of information processing management,” Information Process- logical big data system,” Geological Survey of China, vol. 3, no. ing & Management, vol. 47, no. 5, pp. 633-634, 2011. 3, pp. 1–6, 2016. [32] X. Luo, J. Deng, W. Wang, J.-H. Wang, and W. Zhao, “A [15] W. He and Y. Wang, “Prototype system of geological cloud quantized kernel learning algorithm using a minimum kernel computing,” Progress in Geophysics, vol. 29, no. 6, pp. 2886– risk-sensitive loss criterion and bilateral gradient technique,” 2896, 2014. Entropy, vol. 19, no. 7, article 365, 2017. [16] Y. Zhu, Y. Tan, J. Zhang, B. Mao, J. Shen, and C. Ji, “A [33] C. H. Tse, Y. L. Li, and E. Y. Lam, “Geological applications framework of hadoop based geology big data fusion and mining of machine learning in hyperspectral remote sensing data,” in technologies,” Acta Geodaetica et Cartographica Sinica, vol. 44, Proceedings of Conference on Image Processing - Machine Vision no. S0, pp. 152–159, 2015. Applications VIII, 2015.
You can also read