The components in the loading and collection process are:

1. Identification of the various known data formats; by default, big data targets unstructured data.
2. Filtration and selection of the incoming information relevant to the business.
3. Constant validation and analysis of the data.
4. Noise reduction, or removal, which involves cleaning the data.
5. Transformation, which can lead to the division, convergence, normalization, or synthesis of the data.
6. Compression, which reduces the size of the data without losing its relevance.
7. Integration, which consolidates all the data into the big data storage.

Indeed, in this phase the big data system collects massive data of any structure, from heterogeneous sources, using a variety of tools. The data are then stored in the HDFS file format or in a NoSQL database (Prabhu et al., 2019). In what follows, we make a comparative study of the tools that carry out this collection operation with respect to the norms and standards of big data. We then look at the different formats for storing structured and unstructured data.

In the big data context, data integration has been extended to unstructured data (sensor data, web logs, social networks, documents). Hadoop uses scripting via MapReduce; Sqoop and Flume also participate in the integration of unstructured data. Certain integration tools that include a big data adapter thus already exist on the market; this is the case with Talend Enterprise Data Integration–Big Data Edition. To integrate large volumes of data from the building blocks of companies' information systems, ETLs, enterprise application integration systems (EAIs), and enterprise information integration (EII) tools are still used (Acharjya and Ahmed, 2016; Daniel, 2017).

The big data collection phase can be divided into two main categories, depending on the type of load: batch or streaming, with microbatch sitting between the two.

1. Batch processing: the big data framework has three modes of data collection. The first mode concerns the collection of massive data done locally and then integrated successively into our storage system. The second mode is based on ETL (extract, transform, load) techniques; to this end, the system creates a network of nodes for the synchronization of big data, a method that responds effectively to the process of extracting and importing big data. The third mode is the Sqoop mode, which handles the import/export of data between a relational database and big data storage, either in Hadoop's HDFS file format or in NoSQL tables; this transformation is carried out by MapReduce algorithms. Sketches of a Sqoop import and of a minimal MapReduce job are given after this list.
2. Stream processing: stream loading tools are growing every day with the appearance of new APIs (application programming interfaces). This operation can be done in microbatch, and in two modes: either the system is hot (live) or stopped. A sample Flume configuration for this kind of continuous ingestion is given below.
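To make the Sqoop mode concrete, here is a minimal sketch of an import from a relational database into HDFS. The JDBC URL, credentials, table name, and target directory are hypothetical placeholders, not values taken from the text:

```sh
# Minimal Sqoop import: pull one relational table into HDFS.
# The connection string, user, table, and paths are hypothetical.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user \
  -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4
```

Sqoop turns this command into a MapReduce job that reads the table in parallel across the requested number of mappers, which is exactly the MapReduce-driven transformation the batch mode above describes.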
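The batch transformation is attributed above to MapReduce algorithms. As an illustration, here is the classic word-count job written against Hadoop's Java MapReduce API (a generic textbook example, not code from this chapter). The two `FileOutputFormat` calls near the end of the driver enable Gzip output compression, echoing the compression component listed earlier:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the partial counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: wires the job together; input/output paths come from the CLI.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Compress the output (component 6: smaller data, same content).
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```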
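On the streaming side, an Apache Flume agent is wired together through a properties file that declares a source, a channel, and a sink. The sketch below tails a hypothetical application log into HDFS; the agent name, log path, and NameNode address are illustrative assumptions:

```properties
# Hypothetical Flume agent: tail a log file and land events in HDFS.
agent1.sources  = logsrc
agent1.channels = memch
agent1.sinks    = hdfssink

# Source: follow an application log (path is illustrative).
agent1.sources.logsrc.type    = exec
agent1.sources.logsrc.command = tail -F /var/log/app/events.log

# Channel: buffer events in memory between source and sink.
agent1.channels.memch.type     = memory
agent1.channels.memch.capacity = 10000

# Sink: write the events into HDFS as plain text.
agent1.sinks.hdfssink.type          = hdfs
agent1.sinks.hdfssink.hdfs.path     = hdfs://namenode:8020/data/logs
agent1.sinks.hdfssink.hdfs.fileType = DataStream

# Wiring: connect source and sink through the channel.
agent1.sources.logsrc.channels = memch
agent1.sinks.hdfssink.channel  = memch
```

Such an agent is started with `flume-ng agent --conf conf --conf-file agent1.properties --name agent1` (file name again hypothetical) and keeps loading events for as long as the system stays hot, which matches the streaming mode described above.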