Computer Science > Databases

Title: The Data Lakehouse: Data Warehousing and More

Abstract: Relational Database Management Systems designed for Online Analytical Processing (RDBMS-OLAP) have been foundational to democratizing data and enabling analytical use cases, such as business intelligence and reporting, for many years. However, RDBMS-OLAP systems present some well-known challenges: they are optimized primarily for relational workloads; they lead to a proliferation of data copies that can become unmanageable; and, because the data is stored in proprietary formats, they can cause vendor lock-in, restricting access to engines, tools, and capabilities beyond what the vendor offers. As the demand for data-driven decision making surges, the need for a more robust data architecture to address these challenges becomes ever more critical. Cloud data lakes have addressed some of the shortcomings of RDBMS-OLAP systems, but they present their own set of challenges. More recently, organizations have often followed a two-tier architectural approach to take advantage of both platforms, leveraging both cloud data lakes and RDBMS-OLAP systems. However, this approach brings additional challenges, complexities, and overhead. This paper discusses how a data lakehouse, a new architectural approach, achieves the same benefits as an RDBMS-OLAP and a cloud data lake combined, while also providing additional advantages. We take today's data warehousing and break it down into implementation-independent components, capabilities, and practices. We then take these aspects and show how a lakehouse architecture satisfies them. Finally, we go a step further and discuss what additional capabilities and benefits a lakehouse architecture provides over an RDBMS-OLAP.
Subjects: Databases (cs.DB)




TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Upside - Where Data Means Business


The Outlook for Data Warehouses in 2023: Hyperscale Data Analysis at a Cost Advantage

As the push to become more data-driven intensifies, enterprises will be turning to hyperscale analytics.

  • By Chris Gladwin
  • December 14, 2022

The challenge for business leaders as they look to build on digital transformations is not that they need more data for decision-making. Most businesses already have enough data -- and it just keeps growing.

What organizations really need are better ways to manage the terabytes, petabytes, and, in some cases, exabytes of data being generated by their users, customers, applications, and systems. They are looking to turn raw data into actionable data and do so without experiencing the escalating costs associated with consumption-based cloud pricing, where expenses can rise sharply with use.

In 2022, we saw CIOs start to navigate a tough global economy. Businesses of all sizes are looking for deployment options and licensing terms that let them do more with more data but without runaway costs.

Heading into 2023, the way many organizations will become more data-driven is through modernization of their data warehouses, pipelines, and tools. They will adopt new, cloud-native platforms that are not only faster and more scalable but also engineered for increasingly complex data sets that are integral to digital business. Here are the most important trends worth noting.

Trend #1: Hyperscale will become mainstream

Big data keeps getting bigger. For the past 20 years, enterprise databases have been measured in terabytes. These days, a growing number of organizations are dealing with petabytes of data, a thousand times more. A select few are wrangling exabytes -- a million terabytes.

In other words, data-intensive businesses are moving beyond big data into the realm of hyperscale data, which is exponentially greater. That requires a reevaluation of data infrastructure.
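The scale jumps described above are easy to sanity-check. The short Python sketch below (purely illustrative, not from the article) verifies the terabyte-to-petabyte-to-exabyte arithmetic using the decimal convention:

```python
# Bytes per unit, decimal (SI) convention
TB = 10**12
PB = 10**15
EB = 10**18

assert PB // TB == 1_000      # a petabyte is a thousand terabytes
assert EB // TB == 1_000_000  # an exabyte is a million terabytes

print(f"1 EB = {EB // PB} PB = {EB // TB} TB")
```

Note that storage vendors sometimes use the binary convention (1 TiB = 2**40 bytes) instead; the thousand-fold jumps between units hold either way.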

What is driving this kind of data at super scale? More data is being created by more sources -- autonomous vehicles and telematics, sensor-enabled IoT networks, billions of mobile devices, healthcare monitoring, smart homes and factories, 5G networking, and edge computing, to name just a few.

The technology teams responsible for growing data volumes can see the writing on the wall -- even if their databases are not petabyte-scale today, it’s only a matter of time before they will be. For this reason, scalability and elasticity -- the ability to add CPU and storage resources instantaneously -- have become top priorities.

There are many ways to scale up and scale out, from adding server and storage capacity on premises to auto-scaling “serverless” cloud database services to manually provisioning cloud resources. In 2023, data warehouse vendors are sure to develop new ways to build and expand these systems and services.

It’s not just the overall volume of data that technologists must plan for, but also the burgeoning data sets and workloads to be processed. Some leading-edge IT organizations are now working with data sets that comprise billions or trillions of records. In 2023, we could even see data sets of a quadrillion rows in data-intensive industries such as adtech, telecommunications, and geospatial.

Hyperscale data sets will become more common as organizations leverage increasing data volumes in near real time from operations, customers, and on-the-move devices and objects.

Trend #2: Data complexity will increase

The nature of data is changing. There are both more data types and more complex data types, with the lines between structured and semistructured data continuing to blur.

At the same time, the software and platforms used to manage and analyze data are evolving. New purpose-built databases specialize in different data types -- graphs, vectors, spatial, documents, lists, video, and many others.

Next-generation cloud data warehouses must be versatile -- able to support multimodal data natively to ensure performance and flexibility in the workloads they handle.

The need to analyze new and more complex data types, including semistructured data, will gain strength in the years ahead, driven by digital transformation and global business requirements. For example, a telecommunications network operator may look to analyze network metadata for visibility into the health of its switches and routers, or a shipping company may want to run geospatial analysis for logistics and route optimization.

Trend #3: Data analysis will be continuous

Data warehouses are becoming “always on” analytics environments. In the years ahead, the flow of data into and out of data warehouses will be not just faster but continuous.

Technology strategists have long sought to utilize real-time data for business decision-making, but architectural and system limitations have made that challenging, if not impossible. Consumption-based pricing could also make continuous data cost-prohibitive.

Increasingly, however, data warehouses and other infrastructure are offering new ways to stream data for real-time applications and use cases.

Popular examples of real-time data in action include stock-ticker feeds, ATM transactions, and interactive games. Now, emerging use cases such as IoT sensor networks, robotic automation, and self-driving vehicles are generating more real-time data that needs to be monitored, analyzed, and utilized.

The Year Ahead: Both Strategic and Cost Advantages

In 2023, the data warehouse market will continue to evolve, as businesses seek new and better ways to manage expanding data stores that, for a growing number of organizations, will reach hyperscale.

It’s not just more data but the changing nature of data -- increasingly complex and continuous -- that will compel data leaders to reassess their strategies and modernize their platforms.

Even so, there are limits to what businesses will spend for petabyte- and exabyte-size data warehouses. They must provide both strategic advantages and cost advantages. In 2023, the data warehouse platforms that can do both are most likely to win in the market.

About the Author

Chris Gladwin is the CEO and co-founder of Ocient, whose mission is to provide the leading platform the world uses to transform, store, and analyze its largest data sets. In 2004, Chris founded Cleversafe, which became the largest object storage vendor in the world according to IDC. The technology Cleversafe created is used by most people in the U.S. every day and generated over 1,000 patents granted or filed. Chris was the founding CEO of startups MusicNow and Cruise Technologies and led product strategy for Zenith Data Systems. He started his career at Lockheed Martin as a database programmer and holds an engineering degree from MIT. You can reach Chris via email, Twitter, or LinkedIn.



Data Warehousing Process Modeling from Classical Approaches to New Trends: Main Features and Comparisons


1. Introduction

This review groups the surveyed ETL/ELT process modeling approaches into the following categories:

  • ETL process modeling based on UML;
  • ETL process modeling based on ontology;
  • ETL process modeling based on MDA;
  • ETL process modeling based on graphical flow, which includes BPMN, CPN, YAWL, and the data visualization flow;
  • ETL process modeling based on ad hoc formalisms, which include conceptual constructs, CommonCube, and EMD;
  • ELT process modeling approaches for Big Data.

The main contributions of this work are the following:

  • We perform an exhaustive study through a systematic literature review of the data warehousing modeling field;
  • We propose a new classification system for ETL/ELT process modeling approaches;
  • We identify a set of comparison criteria on which we base our literature review;
  • We define and compare the existing categories of approaches;
  • We investigate the new trends in ETL/ELT, specifically in the context of Big Data warehousing;
  • Finally, we provide a set of recommendations and an example comparative study.
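The ETL/ELT distinction that organizes the categories above comes down to where the transformation step runs. A minimal Python sketch (illustrative only, with hypothetical source data) contrasts the two orderings:

```python
# Illustrative sketch of the ETL vs. ELT orderings classified in this survey.

def extract():
    # Hypothetical raw source rows
    return [{"name": " Alice ", "amount": "10"}, {"name": "Bob", "amount": "32"}]

def transform(rows):
    # Cleanse and type-convert rows
    return [{"name": r["name"].strip(), "amount": int(r["amount"])} for r in rows]

def load(rows, target):
    target.extend(rows)

# ETL: data is transformed BEFORE it reaches the warehouse
warehouse = []
load(transform(extract()), warehouse)

# ELT: raw data lands in the target first; transformation runs there later,
# which is the pattern favored for Big Data platforms
lake = []
load(extract(), lake)
lake = transform(lake)

assert warehouse == lake  # same result, different point of transformation
```

The end state is identical; what differs (and what drives the Big Data ELT trend) is that the target platform, rather than a separate staging engine, supplies the compute for the transformation.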

2. Comparison Criteria and Features for Modeling Data Warehousing Processes

3. Summary and Comparison of ETL/ELT Process Modeling Approaches

3.1. Proposed Classification of ETL/ELT Process Modeling Approaches

3.2. ETL Process Modeling Approaches Based on UML

3.2.1. Summary of ETL Process Modeling Approaches Based on UML

3.2.2. Comparison of UML-Based Approaches

3.3. ETL Process Modeling Approaches Based on Ontology

3.3.1. Summary of ETL Process Modeling Approaches Based on Ontology

3.3.2. Comparison of Ontology-Based Approaches

  • Reusability: According to the authors, the proposed model (or a part of it) is reusable.
  • Formal specification: In our context, this is the definition of requirements, tasks, and data schemas in a formal way, by defining a vocabulary and expressions dedicated to these purposes. Formal specification is widely used in ontology-based modeling to formalize the developed ontologies. Moreover, this method simplifies the presented model and facilitates its understanding.
  • Business requirement.
  • The type of ontology: domain ontology or application ontology. The application ontology models the useful knowledge for specific applications and, according to [ 46 ], should provide the ability for modeling various types of information, including the concepts of the domain, the relationships between those concepts, the attributes characterizing each concept and, finally, the different representation formats and (ranges of) values for each attribute. In contrast, the domain ontology is a more general ontology, which may pre-exist and may be developed independently of the data repositories. It enables the reuse, organization, and communication of knowledge and semantics between information users and providers [ 59 ].
  • The type of data heterogeneity treated: structural heterogeneity, semantic heterogeneity, or both. In [ 44 ], it was considered that structural heterogeneity arises from data in information systems being stored in different structures, such that they need homogenization; while semantic heterogeneity considers the intended meaning of the information items. In order to achieve semantic interoperability in a heterogeneous information system, the meaning of the interchanged information must be understood across the systems.
  • The proposed ontological approach, either based on a single-ontology approach, a multiple-ontology approach, or a hybrid approach. According to [ 60 ], single-ontology approaches use one global ontology to provide a shared vocabulary for the specification of the semantics. All information sources are related to one global ontology. In multiple-ontology approaches, each information source is described by its separate ontology. In principle, the source ontology can be a combination of several other ontologies, but the fact that the different source ontologies share the same vocabulary is not guaranteed. In hybrid approaches, the semantics of each source is described by its ontology, but all of the local ontologies use the shared global vocabulary. Each type of approach has advantages and disadvantages. More details are provided in [ 60 ].
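The hybrid approach described above can be made concrete with a small sketch. The Python below is illustrative only (the vocabularies and source names are hypothetical, not taken from the surveyed works): each source keeps its own local terms, but every local term maps onto a shared global vocabulary, which is how hybrid approaches reconcile semantic heterogeneity:

```python
# Hybrid ontology sketch: local per-source ontologies, one shared global vocabulary.
GLOBAL_VOCAB = {"customer_name", "order_total"}

# Local ontologies: source-specific term -> shared global term
crm_ontology = {"client": "customer_name", "value": "order_total"}
erp_ontology = {"cust_nm": "customer_name", "amt": "order_total"}

def to_global(record, local_ontology):
    """Rewrite a source record into the shared global vocabulary."""
    out = {}
    for term, value in record.items():
        global_term = local_ontology[term]
        assert global_term in GLOBAL_VOCAB  # every local term must map
        out[global_term] = value
    return out

a = to_global({"client": "Alice", "value": 10}, crm_ontology)
b = to_global({"cust_nm": "Alice", "amt": 10}, erp_ontology)
assert a == b  # structurally heterogeneous sources, same semantics
```

A single-ontology approach would force both sources to use GLOBAL_VOCAB directly; a multiple-ontology approach would drop GLOBAL_VOCAB and leave the two local ontologies unreconciled, which is exactly the trade-off [ 60 ] describes.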

3.4. ETL Process Modeling Approaches Based on MDA

  • CIM: A computation-independent model is placed at the top of the architecture and is used only to describe the system requirements. This model presents exactly what the system is expected to do. It is also known in the literature as a “domain model” or “business model”.
  • PIM: A platform-independent model is a model of a sub-system that contains no information specific to the platform or the technology used to realize it [ 61 ].
  • PSM: A platform-specific model is a model of a sub-system that includes information about the specific technology for its implementation on a specific platform and, hence, possibly contains elements specific to the platform [ 61 ].
  • QVT: Query, view, transformation is a Meta-Object Facility (MOF) standard for specifying model transformations [ 63 ]. The QVT language can ensure the formal transformations between the different models of MDA layers (CIM, PIM, and PSM).
  • Code: An interpretation of the PSM model already obtained can be used to generate an application code and execute it using an appropriate tool.
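The last two layers above, the PSM and the generated code, are produced by model-to-text (M2T) transformation. A hedged sketch of that step in Python (the PIM, type mapping, and entity names are hypothetical, chosen only to illustrate the mechanism):

```python
# M2T illustration: a tiny platform-independent model (PIM) of a table
# is transformed into platform-specific SQL DDL (a PSM/code artifact).

pim = {
    "entity": "Customer",
    "attributes": [("id", "integer"), ("name", "string")],
}

# Platform-specific mapping: abstract types -> a generic SQL dialect
SQL_TYPES = {"integer": "INT", "string": "VARCHAR(255)"}

def pim_to_sql(model):
    cols = ", ".join(f"{name} {SQL_TYPES[t]}" for name, t in model["attributes"])
    return f"CREATE TABLE {model['entity']} ({cols});"

print(pim_to_sql(pim))
# e.g. CREATE TABLE Customer (id INT, name VARCHAR(255));
```

Swapping SQL_TYPES for another dialect's mapping yields a different PSM from the same PIM, which is the portability argument made for MDA in [ 61 ].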

3.4.1. Summary of ETL Process Modeling Approaches Based on MDA

3.4.2. Comparison of MDA-Based Approaches

3.5. ETL Process Modeling Approaches Based on Graphical Flow Formalism

3.5.1. Summary of ETL Process Modeling Approaches Based on BPMN

3.5.2. Summary of ETL Process Modeling Approaches Based on CPN

3.5.3. Summary of ETL Process Modeling Approaches Based on YAWL

3.5.4. Summary of ETL Process Modeling Approaches Based on Data Flow Visualization

3.5.5. Comparison of Graphical Flow Formalism-Based Approaches

3.6. ETL Process Modeling Approaches Based on Ad Hoc Formalisms

3.6.1. Summary of ETL Process Modeling Approaches Based on CommonCube

3.6.2. Summary of ETL Process Modeling Approaches Based on EMD

3.6.3. Comparison of Ad Hoc Formalism-Based Approaches

3.7. ELT Process Modeling Approaches for Big Data

3.7.1. Summary of ELT Process Modeling for Big Data

3.7.2. Comparison of ELT Process Modeling Approaches for Big Data

4. Discussion and Findings

  • The modeling methods based on standard languages for software development, such as UML or BPEL, or on the standard notation BPMN, were confirmed to be powerful methods, as they favor standardization of the ETL workflow design. These standard-based methods are also easy to implement, as recognized tools support them; moreover, their validation and evaluation are straightforward. First, UML is in high demand, widely used, and counted among the first standard modeling languages, which has produced extensive documentation on its various diagrams and many published use cases, saving new users time and effort when deploying it. Second, it can be exploited by commercial tools, since it is a standard technology. More generally, the documentation provided with a standard language facilitates user comprehension and handling, even for designers without much experience. Third, UML provides a set of packages that decompose the design of an ETL process into simple sub-processes (i.e., different logical units), facilitating the creation of the ETL model and, subsequently, the maintenance of the ETL process, regardless of its degree of complexity. However, despite the effort in [ 22 ] to propose an extension mechanism that allows UML to model ETL transformations at the low “attribute” level, other authors [ 6 , 34 , 39 ] still consider this gap a constraint. They argued that UML-based modeling at the attribute level leads to overly complicated models, unlike using conceptual constructs to conceptually model the elements involved in the ETL process, as mentioned in [ 77 , 89 ].
  • Several researchers favored the use of ontologies for data warehousing modeling, for various reasons. First, ontologies can identify the schema of the data source and the DW, enrich the metadata, and interchange these metadata among repositories [ 35 , 106 ]; they therefore provide good support for data classification, visual representation, and documentation. Second, according to [ 49 , 107 ], an ontology is the best method for capturing the domain model’s semantics and resolving the semantic problems of both heterogeneity and interoperability. Third, with an ontology it is possible to define how two concepts are structurally related, the type of relationship they have, and whether the relationship is symmetric, reflexive, or transitive; this is the way in which [ 45 ] defined the semantic integration of disparate data sources. Fourth, ontologies provide an explicit and formal representation, with well-defined semantics that allow for automated reasoning on metadata, including inference rules to derive new information from the available data [ 18 , 44 , 45 ]. Nevertheless, among the limits of semantic modeling, resolving the heterogeneity of data sources, particularly semantic heterogeneity, and mapping between these sources are very complex tasks. Furthermore, based on the OWL language, the ETL model can be redefined and reused during different stages of DW design; however, this solution applies only to relational databases and does not support the semi-structured and unstructured data that the DW can receive. Indeed, according to [ 108 ], “In separated operational data sources, the syntax and semantics of data sources are extremely heterogeneous. In the ETL process, to establish a relationship between semantically similar data, the mapping between these sources can hardly be fully resolved by fixed metamodels or frameworks”.
  • As for CWM, from the literature, Simitsis [ 109 ] deduced that “There does not exist common model for the metadata of ETL processes and CWM is not sufficient for this purpose, and it is too complicated for real-world applications”. In addition, according to [ 49 ], the CWM is more appropriate for resolving schema conflicts than the underlying semantics of the domain being modeled, which leads us to deduce that this standard should always be coupled with other methods focusing on semantic integration, such as ontologies, as proposed in [ 49 ].
  • According to [ 62 ], MDA models can represent systems at any level of abstraction or from different points of view, ranging from enterprise architectures to technological implementations. Further, from one PIM, one or more PSM can be derived by applying appropriate transformations. Therefore, the advantages of separating business logic and technology in the MDA by providing different layers (e.g., CIM, PIM, PSM, and code) lead toward interoperable, reusable, and portable software components and data models based on standard models [ 61 ]. In this context, from the comparison in Table 4 , we noted that the studied works based on the MDA tended to model the three levels: conceptual, logical, and physical. Moreover, as previously mentioned, all contributions met the “QVT” criteria to ensure the transformations between the different MDA layers. Finally, the primary strength of MDA-based methods is the automated transformation of models to implementations through the use of model-to-text (M2T) transformations, which automatically generate code from models. This automatic code generation seems simple overall, but relying on reliable patterns and referring to rich and constantly updated libraries is necessary. Moreover, according to [ 110 ], this task is comparable to manual development of the ETL procedure.
  • BPMN is advantageous, thanks to the clarity and simplicity of its notation for process representation and its powerful expressiveness, based on the use of a palette of conceptual tools to express business processes in an intuitive language. In addition to its description of the characterizations of ETL activities, it can express data and control objects, which are indispensable for the synchronization of the transformation flows. Moreover, BPMN can be used to create a platform-independent conceptual model of an ETL workflow. We found works coupling BPMN and MDA or MDD for data warehouse modeling, such as the framework proposed in [ 25 ], which was summarized in Section 3.5.1. Furthermore, BPMN is a formalism that relies on business requirements to model the ETL at a conceptual level. Finally, enterprise processes based on BPMN are designed uniformly, making communication between them easy.
  • The use of patterns is also interesting. Indeed, [ 28 ] mentioned, in their work, that the use of ETL patterns in workflow systems contexts provides a way to specify and share communication protocols, increases the data interchange across systems, and allows for the integration of new ETL patterns. Hence, they can be used and reused according to the needs of a practical application scenario [ 28 ], consequently reducing potential design errors and both simplifying and alleviating the task of implementing ETL systems.
  • Big Data characteristics—In our case, we were dealing with data from Twitter and other websites, allowing for tracking of the evolution of the COVID-19 pandemic and vaccination campaigns; therefore, we were dealing with massive volumes of data from different sources (massive volume, variability).
  • The type of data gathered—We collected CSV files and, hence, the data type was structured.
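The pattern-reuse idea attributed to [ 28 ] above can be sketched briefly. The Python below is a hedged illustration (the pattern name, field names, and sample CSV are hypothetical, not from the surveyed works): an ETL step is captured once as a composable pattern, then instantiated on structured CSV input like that of the case study:

```python
import csv
import io

def surrogate_key_pattern(rows, key_field):
    """Reusable ETL pattern: assign warehouse surrogate keys to incoming rows."""
    for sk, row in enumerate(rows, start=1):
        yield {**row, key_field: sk}

# Instantiating the pattern on structured CSV data (illustrative sample)
raw = io.StringIO("name,amount\nAlice,10\nBob,32\n")
rows = list(surrogate_key_pattern(csv.DictReader(raw), "customer_sk"))

assert rows[1] == {"name": "Bob", "amount": "32", "customer_sk": 2}
```

Because the pattern is independent of any one source schema, the same generator can be reused across application scenarios, which is the error-reducing, effort-saving property [ 28 ] highlights.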

5. Conclusions

Author Contributions

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

  • Inmon, W.H. Building the Data Warehouse , 1st ed.; John Wiley & Sons. Inc.: Hoboken, NJ, USA, 1996. [ Google Scholar ]
  • Vassiliadis, P. Data Warehouse Modeling And Quality Issues ; National Technical University of Athens Zographou: Athens, Greece, 2000. [ Google Scholar ]
  • Inmon, W.H. Building the Data Warehouse , 3rd ed.; Wiley: New York, NY, USA, 2002. [ Google Scholar ]
  • Kakish, K.; Kraft, T.A. ETL evolution for real-time data warehousing. In Proceedings of the Conference on Information Systems Applied Research, New Orleans, LA, USA, 1–4 November 2012; Volume 2167, p. 1508. [ Google Scholar ]
  • Kimball, R.; Reeves, L.; Ross, M.; Thornthwaite, W. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses ; Wiley: New York, NY, USA, 1998. [ Google Scholar ]
  • Trujillo, J.; Luján-Mora, S. A UML based approach for modeling ETL processes in data warehouses. In Proceedings of the International Conference on Conceptual Modeling, Chicago, IL, USA, 13–16 October 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 307–320. [ Google Scholar ]
  • Singh, J. ETL methodologies, limitations and framework for the selection and development of an ETL tool. Int. J. Res. Eng. Appl. Sci. 2016 , 6 , 108–112. [ Google Scholar ]
  • Muñoz, L.; Mazón, J.N.; Trujillo, J. Systematic review and comparison of modeling ETL processes in data warehouse. In Proceedings of the 5th Iberian Conference on Information Systems and Technologies, Santiago de Compostela, Spain, 16–19 June 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–6. [ Google Scholar ]
  • Laney, D. 3D Data Management: Controlling Data Volume, Velocity and Variety ; Lakshen, G.A., Ed.; Meta Group: Menlo Park, CA, USA, 2001; pp. 1–4. [ Google Scholar ]
  • Jo, J.; Lee, K.W. MapReduce-based D_ELT framework to address the challenges of geospatial Big Data. ISPRS Int. J. Geo-Inf. 2019 , 8 , 475. [ Google Scholar ] [ CrossRef ]
  • Cottur, K.; Gadad, V. Design and Development of Data Pipelines. Int. Res. J. Eng. Technol. (IRJET) 2020 , 7 , 2715–2718. [ Google Scholar ]
  • Fang, H. Managing data lakes in Big Data era: What’s a data lake and why has it became popular in data management ecosystem. In Proceedings of the 2015 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), Shenyang, China, 8–12 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 820–824. [ Google Scholar ]
  • Demarest, M. The Politics of Data Warehousing. June 1997, Volume 6, p. 1998. Available online: http://www.hevanet.com/demarest/marc/dwpol.html (accessed on 29 April 2022).
  • March, S.T.; Hevner, A.R. Integrated decision support systems: A data warehousing perspective. Decis. Support Syst. 2007 , 43 , 1031–1043. [ Google Scholar ] [ CrossRef ]
  • Solomon, M.D. Ensuring A Successful Data Warehouse Initiative. Inf. Syst. Manag. 2005 , 22 , 26–36. [ Google Scholar ] [ CrossRef ]
  • Muñoz, L.; Mazon, J.N.; Trujillo, J. ETL process Modeling Conceptual for Data Warehouses: A Systematic Mapping Study. IEEE Lat. Am. Trans. 2011 , 9 , 358–363. [ Google Scholar ] [ CrossRef ]
  • Oliveira, B.; Belo, O. Approaching ETL processes Specification Using a Pattern-Based ontology. In Data Management Technologies and Applications ; Francalanci, C., Helfert, M., Eds.; Series Title: Communications in Computer and Information Science; Springer International Publishing: Cham, Switzerland, 2017; Volume 737, pp. 65–78. [ Google Scholar ] [ CrossRef ]
  • Ali, S.M.F.; Wrembel, R. From conceptual design to performance optimization of ETL workflows: Current state of research and open problems. VLDB J. 2017 , 26 , 777–801. [ Google Scholar ] [ CrossRef ]
  • Jindal, R.; Taneja, S. Comparative study of data warehouse design approaches: A survey. Int. J. Database Manag. Syst. 2012 , 4 , 33. [ Google Scholar ] [ CrossRef ]
  • Nabli, A.; Bouaziz, S.; Yangui, R.; Gargouri, F. Two-ETL Phases for Data Warehouse Creation: Design and Implementation. In Advances in Databases and Information Systems ; Tadeusz, M., Valduriez, P., Bellatreche, L., Eds.; Series Title: Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; Volume 9282, pp. 138–150. [ Google Scholar ] [ CrossRef ]
  • Chandra, P.; Gupta, M. Comprehensive survey on data warehousing research. Int. J. Inf. Technol. 2017 , 10 , 217–224. [ Google Scholar ] [ CrossRef ]
  • Luján-Mora, S.; Vassiliadis, P.; Trujillo, J. Data mapping diagrams for data warehouse design with UML. In Proceedings of the International Conference on Conceptual Modeling, Shangai, China, 8–12 November 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 191–204. [ Google Scholar ]
  • Bellatreche, L.; Khouri, S.; Berkani, N. Semantic Data Warehouse Design: From ETL to Deployment à la Carte. In Database Systems for Advanced Applications ; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Series Title: Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7826, pp. 64–83. [ Google Scholar ] [ CrossRef ]
  • Mazón, J.N.; Trujillo, J. An MDA approach for the development of data warehouses. Decis. Support Syst. 2008 , 45 , 41–58. [ Google Scholar ] [ CrossRef ]
  • El Akkaoui, Z.; Zimányi, E.; Mazón, J.N.; Trujillo, J. A BPMN-Based Design and Maintenance Framework for ETL Processes. Int. J. Data Warehous. Min. 2013 , 9 , 46–72. [ Google Scholar ] [ CrossRef ]
  • Oliveira, B.; Belo, O. From ETL Conceptual Design to ETL Physical Sketching Using Patterns. In Proceedings of the 20th International Conference on Enterprise Information Systems, Madeira, Portugal, 21–24 March 2018; pp. 262–269. [ Google Scholar ] [ CrossRef ]
  • Silva, D.; Fernandes, J.M.; Belo, O. Assisting data warehousing populating processes design through modelling using coloured petri nets. In Proceedings of the 3rd Industrial Conference on Simulation and Modeling Methodologies, Technologies and Applications, Reykjavik, Iceland, 29–31 July 2013. [ Google Scholar ]
  • Belo, O.; Cuzzocrea, A.; Oliveira, B. Modeling and supporting ETL processes via a pattern-oriented, task-reusable framework. In Proceedings of the 2014 IEEE 26th International Conference on Tools with Artificial Intelligence, Limassol, Cyprus, 10–12 November 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 960–966. [ Google Scholar ]
  • Dupor, S.; Jovanovic, V. An approach to conceptual modelling of ETL processes. In Proceedings of the 2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 26–30 May 2014; IEEE: Opatija, Croatia, 2014; pp. 1485–1490. [ Google Scholar ] [ CrossRef ]
  • Bala, M.; Boussaid, O.; Alimazighi, Z. A Fine-Grained Distribution Approach for ETL processes in Big Data Environments. Data Knowl. Eng. 2017 , 111 , 114–136. [ Google Scholar ] [ CrossRef ]
  • Li, Z.; Sun, J.; Yu, H.; Zhang, J. CommonCube-based Conceptual Modeling of ETL Processes. In Proceedings of the 2005 International Conference on Control and Automation, Budapest, Hungary, 26–29 June 2005; IEEE: Budapest, Hungary, 2005; Volume 1, pp. 131–136. [ Google Scholar ] [ CrossRef ]
  • El-Sappagh, S.H.A.; Hendawi, A.M.A.; El Bastawissy, A.H. A proposed model for data warehouse ETL processes. J. King Saud Univ. Comput. Inf. Sci. 2011 , 23 , 91–104. [ Google Scholar ] [ CrossRef ]
  • Muñoz, L.; Mazón, J.N.; Pardillo, J.; Trujillo, J. Modelling ETL processes of data warehouses with UML activity diagrams. In Proceedings of the OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”, Monterrey, Mexico, 9–14 November 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 44–53. [ Google Scholar ]
  • Mallek, H.; Walha, A.; Ghozzi, F.; Gargouri, F. ETL-web process modeling. In Proceedings of the ASD Advances on Decisional Systems Conference, Hammamet, Tunisia, 29–31 May 2014. [ Google Scholar ]
  • Biswas, N.; Chattopadhyay, S.; Mahapatra, G.; Chatterjee, S.; Mondal, K.C. SysML Based Conceptual ETL Process Modeling. In Computational Intelligence, Communications, and Business Analytics ; Mandal, J.K., Dutta, P., Mukhopadhyay, S., Eds.; Series Title: Communications in Computer and Information Science; Springer: Singapore, 2017; Volume 776, pp. 242–255. [ Google Scholar ] [ CrossRef ]
  • Ambler, S. A UML Profile for Data Modeling. 2002. Available online: http://www.agiledata.org/essays/umlDataModelingProfile.html (accessed on 29 April 2022).
  • Naiburg, E.; Naiburg, E.J.; Maksimchuck, R.A. UML for Database Design ; Addison-Wesley Professional: Boston, MA, USA, 2001. [ Google Scholar ]
  • Rational Rose 2000e: Rose Extensibility User’s Guide ; Rational Software Corporation: San Jose, CA, USA, 2000.
  • Muñoz, L.; Mazón, J.N.; Trujillo, J. Automatic generation of ETL processes from conceptual models. In Proceedings of the ACM Twelfth International Workshop on Data Warehousing and OLAP—DOLAP ’09, Hong Kong, China, 6 November 2009; p. 33. [ Google Scholar ] [ CrossRef ]
  • Biswas, N.; Chattapadhyay, S.; Mahapatra, G.; Chatterjee, S.; Mondal, K.C. A New Approach for Conceptual Extraction-Transformation-Loading Process Modeling. Int. J. Ambient Comput. Intell. 2019 , 10 , 30–45. [ Google Scholar ] [ CrossRef ]
  • Guarino, N. Formal Ontology in Information Systems. In Proceedings of the First International Conference (FOIS’98), Trento, Italy, 6–8 June 1998; IOS Press: Amsterdam, The Netherlands, 1998. [ Google Scholar ]
  • Skoutas, D.; Simitsis, A.; Sellis, T. Ontology-Driven Conceptual Design of ETL Processes Using Graph Transformations. In Journal on Data Semantics XIII ; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Series Title: Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5530, pp. 120–146. [ Google Scholar ] [ CrossRef ]
  • Jovanovic, P.; Romero, O.; Simitsis, A.; Abelló, A. Requirement-Driven Creation and Deployment of Multidimensional and ETL Designs. In Advances in Conceptual Modeling ; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Series Title: Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7518, pp. 391–395. [ Google Scholar ] [ CrossRef ]
  • Skoutas, D.; Simitsis, A. Designing ETL processes using semantic web technologies. In Proceedings of the 9th ACM International Workshop on Data Warehousing and OLAP—DOLAP ’06, Arlington, VA, USA, 10 November 2006; ACM Press: New York, NY, USA, 2006; p. 67. [ Google Scholar ] [ CrossRef ]
  • Deb Nath, R.P.; Hose, K.; Pedersen, T.B. Towards a programmable semantic extract-transform-load framework for semantic data warehouses. In Proceedings of the ACM Eighteenth International Workshop on Data Warehousing and OLAP—DOLAP ’15, Melbourne, VIC, Australia, 19–23 October 2015; pp. 15–24. [ Google Scholar ]
  • Skoutas, D.; Simitsis, A. Ontology-Based Conceptual Design of ETL Processes for Both Structured and Semi-Structured Data. Int. J. Semant. Web Inf. Syst. 2007 , 3 , 1–24. [ Google Scholar ] [ CrossRef ]
  • Hoang, A.D.T.; Nguyen, B.T. An Integrated Use of CWM and Ontological Modeling Approaches Towards ETL Processes. In Proceedings of the 2008 IEEE International Conference on e-Business Engineering, Xi’an, China, 22–24 October 2008; IEEE: Xi’an, China, 2008; pp. 715–720. [ Google Scholar ] [ CrossRef ]
  • Oliveira, B.; Belo, O. An ontology for Describing ETL Patterns Behavior. In Proceedings of the 5th International Conference on Data Management Technologies and Applications, Lisbon, Portugal, 24–26 July 2016; pp. 102–109. [ Google Scholar ] [ CrossRef ]
  • Thi, A.D.H.; Nguyen, B.T. A Semantic approach towards CWM-based ETL processes. Proc. I-SEMANTICS 2008 , 8 , 58–66. [ Google Scholar ]
  • TPC-H Homepage. Available online: http://www.tpc.org/tpch/ (accessed on 10 April 2022).
  • Chang, D.D.T. Common Warehouse Metamodel (CWM), UML and XML. In Proceedings of the Meta Data Conference, 19–23 March 2000; p. 56. Available online: https://cwmforum.org/cwm.pdf (accessed on 27 July 2022).
  • Ontology Definition Metamodel ; OMG Object Management Group: Needham, MA, USA, 2014; p. 362.
  • Romero, O.; Abelló, A. A framework for multidimensional design of data warehouses from ontologies. Data Knowl. Eng. 2010 , 69 , 1138–1157. [ Google Scholar ] [ CrossRef ]
  • Romero, O.; Simitsis, A.; Abelló, A. GEM: Requirement-driven generation of ETL and multidimensional conceptual designs. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery, Toulouse, France, 29 August–2 September 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 80–95. [ Google Scholar ]
  • TPC-DS Homepage. Available online: https://www.tpc.org/tpcds/ (accessed on 10 April 2022).
  • Khouri, S.; El Saraj, L.; Bellatreche, L.; Espinasse, B.; Berkani, N.; Rodier, S.; Libourel, T. CiDHouse: Contextual SemantIc Data WareHouses. In Database and Expert Systems Applications ; Decker, H., Lhotská, L., Link, S., Basl, J., Tjoa, A.M., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; pp. 458–465. [ Google Scholar ] [ CrossRef ]
  • Lehigh University Benchmark (LUBM). Available online: http://swat.cse.lehigh.edu/projects/lubm/ (accessed on 10 April 2022).
  • Deb Nath, R.P.; Hose, K.; Pedersen, T.B.; Romero, O. SETL: A programmable semantic extract-transform-load framework for semantic data warehouses. Inf. Syst. 2017 , 68 , 17–43. [ Google Scholar ] [ CrossRef ]
  • Mena, E.; Kashyap, V.; Illarramendi, A.; Sheth, A. Domain specific ontologies for semantic information brokering on the global information infrastructure. In Formal Ontology in Information Systems ; IOS Press: Amsterdam, The Netherlands, 1998; Volume 46, pp. 269–283. [ Google Scholar ]
  • Wache, H.; Voegele, T.; Visser, U.; Stuckenschmidt, H.; Schuster, G.; Neumann, H.; Hübner, S. Ontology-based integration of information-a survey of existing approaches. In Proceedings of the IJCAI-01 Workshop: Ontologies and Information Sharing, Seattle, WA, USA, 4–6 August 2001. [ Google Scholar ]
  • Miller, J.; Mukerji, J. MDA Guide Version 1.0.1 ; OMG: Needham, MA, USA, 2003; p. 62. [ Google Scholar ]
  • MDA Specifications|Object Management Group. 2014. Available online: https://www.omg.org/mda/specs.htm (accessed on 10 April 2022).
  • Gardner, T.; Griffin, C.; Koehler, J.; Hauser, R. A Review of OMG MOF 2.0 Query/Views/Transformations Submissions and Recommendations towards the Final Standard. In Proceedings of the MetaModelling for MDA Workshop, York, UK, 24–25 November 2003; Citeseer: Princeton, NJ, USA, 2003; Volume 13, p. 41. [ Google Scholar ]
  • Mazon, J.N.; Trujillo, J.; Serrano, M.; Piattini, M. Applying MDA to the development of data warehouses. In Proceedings of the 8th ACM International Workshop on Data Warehousing and OLAP—DOLAP ’05, Bremen, Germany, 31 October–5 November 2005; p. 57. [ Google Scholar ] [ CrossRef ]
  • Maté, A.; Trujillo, J. A trace metamodel proposal based on the model driven architecture framework for the traceability of user requirements in data warehouses. Inf. Syst. 2012 , 37 , 753–766. [ Google Scholar ] [ CrossRef ]
  • Maté, A.; Trujillo, J. Tracing conceptual models’ evolution in data warehouses by using the model driven architecture. Comput. Stand. Interfaces 2014 , 36 , 831–843. [ Google Scholar ] [ CrossRef ]
  • Didonet, M.; Fabro, D.; Bézivin, J.; Valduriez, P. Weaving Models with the Eclipse AMW plugin. In Proceedings of the Eclipse Modeling Symposium, Eclipse Summit Europe, Esslingen, Germany, 11–12 October 2006. [ Google Scholar ]
  • Mazón, J.N.; Trujillo, J.; Serrano, M.; Piattini, M. Designing data warehouses: From business requirement analysis to multidimensional modeling. In Proceedings of the International Workshop on Requirements Engineering for Business. Need and IT Alignment (REBNITA 2005), Paris, France, 29–30 August 2005; University of New South Wales Press: Kensington, Australia, 2005; Volume 5, pp. 44–53. [ Google Scholar ]
  • Jouault, F.; Kurtev, I. Transforming models with ATL. In Proceedings of the Satellite Events at the MoDELS 2005 Conference, Montego Bay, Jamaica, 2–7 October 2005; Springer: Berlin/Heidelberg, Germany, 2006; Volume 43, p. 45. [ Google Scholar ]
  • El Akkaoui, Z.; Zimanyi, E. Defining ETL workflows using BPMN and BPEL. In Proceedings of the ACM Twelfth International Workshop on Data Warehousing and OLAP—DOLAP ’09, Hong Kong, China, 6 November 2009; p. 41. [ Google Scholar ] [ CrossRef ]
  • Akkaoui, Z.E.; Mazón, J.N.; Vaisman, A.; Zimányi, E. BPMN-based conceptual modeling of ETL processes. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery, Vienna, Austria, 3–6 September 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 1–14. [ Google Scholar ]
  • El Akkaoui, Z.; Vaisman, A.; Zimányi, E. A Quality-based ETL Design Evaluation Framework. In Proceedings of the 21st International Conference on Enterprise Information Systems, Heraklion, Crete, Greece, 3–5 May 2019; pp. 249–257. [ Google Scholar ] [ CrossRef ]
  • Wilkinson, K.; Simitsis, A.; Castellanos, M.; Dayal, U. Leveraging business process models for ETL design. In Proceedings of the International Conference on Conceptual Modeling, Vancouver, BC, Canada, 1–4 November 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 15–30. [ Google Scholar ]
  • Jensen, K.; Kristensen, L.M. Coloured Petri Nets: Modelling and Validation of Concurrent Systems ; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009. [ Google Scholar ]
  • Pan, B.; Zhang, G.; Qin, X. Design and realization of an ETL method in business intelligence project. In Proceedings of the 2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), Chengdu, China, 20–22 April 2018; pp. 275–279. [ Google Scholar ] [ CrossRef ]
  • Vassiliadis, P.; Simitsis, A.; Skiadopoulos, S. Conceptual modeling for ETL processes. In Proceedings of the 5th ACM International Workshop on Data Warehousing and OLAP—DOLAP ’02, McLean, VA, USA, 8 November 2002; pp. 14–21. [ Google Scholar ] [ CrossRef ]
  • Vassiliadis, P.; Simitsis, A.; Skiadopoulos, S. Modeling ETL activities as graphs. In Proceedings of the Design and Management of Data Warehouses, Toronto, ON, Canada, 27 May 2002; Volume 58, pp. 52–61. [ Google Scholar ]
  • Vassiliadis, P.; Simitsis, A.; Georgantas, P.; Terrovitis, M. A Framework for the Design of ETL Scenarios. In Proceedings of the International Conference on Advanced Information Systems Engineering, Klagenfurt/Velden, Austria, 16–20 June 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 520–535. [ Google Scholar ]
  • Vassiliadis, P.; Vagena, Z.; Skiadopoulos, S.; Karayannidis, N.; Sellis, T. Arktos: Towards the modeling, design, control and execution of ETL processes. Inf. Syst. 2001 , 26 , 537–561. [ Google Scholar ] [ CrossRef ]
  • Simitsis, A.; Vassiliadis, P. A Methodology for the Conceptual Modeling of ETL Processes. In Proceedings of the Conference on Advanced Information Systems Engineering (CAiSE), Klagenfurt/Velden, Austria, 16–20 June 2003; p. 12. [ Google Scholar ]
  • Bala, M.; Alimazighi, Z. ETL-XDesign: Outil d’aide à la modélisation de processus ETL. In Proceedings of the 6éme édition des Avancées sur les Systèmes Décisionnels, Blida, Algeria, 1–3 April 2012; pp. 155–166. [ Google Scholar ] [ CrossRef ]
  • Bala, M.; Boussaid, O.; Alimazighi, Z. P-ETL: Parallel-ETL based on the MapReduce paradigm. In Proceedings of the IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA), Doha, Qatar, 10–14 November 2014; pp. 42–49. [ Google Scholar ] [ CrossRef ]
  • Bala, M.; Boussaid, O.; Alimazighi, Z. Extracting-Transforming-Loading Modeling Approach for Big Data Analytics. Int. J. Decis. Support Syst. Technol. 2016 , 8 , 50–69. [ Google Scholar ] [ CrossRef ]
  • Bala, M.; Boussaid, O.; Alimazighi, Z. Big-ETL: Extracting transforming loading approach for Big Data. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, NV, USA, 27–30 July 2015; p. 462. [ Google Scholar ] [ CrossRef ]
  • Kabiri, A.; Chiadmi, D. KANTARA: A Framework to Reduce ETL Cost and Complexity. Int. J. Eng. Technol. (IJET) 2016 , 8 , 1280–1284. [ Google Scholar ]
  • Kabiri, A.; Wadjinny, F.; Chiadmi, D. Towards a Framework for Conceptual Modeling of ETL processes. In Innovative Computing Technology ; Pichappan, P., Ahmadi, H., Ariwa, E., Eds.; Series Title: Communications in Computer and Information Science; Springer: Berlin/Heidelberg, Germany, 2011; Volume 241, pp. 146–160. [ Google Scholar ] [ CrossRef ]
  • Kabiri, A.; Chiadmi, D. A method for modelling and organizing ETL processes. In Proceedings of the Second International Conference on the Innovative Computing Technology (INTECH 2012), Casablanca, Morocco, 18–20 September 2012; pp. 138–143. [ Google Scholar ] [ CrossRef ]
  • Boshra, A.H.E.B.M.; Hendawi, R.A.M. Entity mapping diagram for modeling ETL processes. In Proceedings of the Third International Conference on Informatics and Systems (INFOS), Giza, Egypt, 19–22 March 2005. [ Google Scholar ]
  • Hendawi, A.M.; Sappagh, S.H.A.E. EMD: Entity mapping diagram for automated extraction, transformation, and loading processes in data warehousing. Int. J. Intell. Inf. Database Syst. 2012 , 6 , 255. [ Google Scholar ] [ CrossRef ]
  • Jamra, H.A.; Gillet, A.; Savonnet, M.; Leclercq, E. Analyse des discours sur Twitter dans une situation de crise. In Proceedings of the INFormatique des ORganisations et des Systèmes d’Information et de Décision (INFORSID), Dijon, France, 2–4 June 2020; p. 16. [ Google Scholar ]
  • Basaille, I.; Kirgizov, S.; Leclercq, E.; Savonnet, M.; Cullot, N.; Grison, T.; Gavignet, E. Un observatoire pour la modélisation et l’analyse des réseaux multi-relationnels. Doc. Numérique 2017 , 20 , 101–135. [ Google Scholar ]
  • Moalla, I.; Nabli, A.; Hammami, M. Towards Opinions analysis method from social media for multidimensional analysis. In Proceedings of the 16th International Conference on Advances in Mobile Computing and Multimedia, Yogyakarta, Indonesia, 19–21 November 2018; pp. 8–14. [ Google Scholar ] [ CrossRef ]
  • Walha, A.; Ghozzi, F.; Gargouri, F. Design and Execution of ETL Process to Build Topic Dimension from User-Generated Content. In Proceedings of the International Conference on Research Challenges in Information Science, Online, 11–14 May 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 374–389. [ Google Scholar ]
  • Walha, A.; Ghozzi, F.; Gargouri, F. From user generated content to social data warehouse: Processes, operations and data modelling. Int. J. Web Eng. Technol. 2019 , 14 , 203. [ Google Scholar ] [ CrossRef ]
  • Bruchez, R. Les Bases de Données NoSQL et le BigData: Comprendre et Mettre en Oeuvre ; Editions Eyrolles: Paris, France, 2015. [ Google Scholar ]
  • Gallinucci, E.; Golfarelli, M.; Rizzi, S. Approximate OLAP of document-oriented databases: A variety-aware approach. Inf. Syst. 2019 , 85 , 114–130. [ Google Scholar ] [ CrossRef ]
  • Mallek, H.; Ghozzi, F.; Teste, O.; Gargouri, F. BigDimETL with NoSQL Database. Procedia Comput. Sci. 2018 , 126 , 798–807. [ Google Scholar ] [ CrossRef ]
  • Yangui, R.; Nabli, A.; Gargouri, F. ETL based framework for NoSQL warehousing. In Proceedings of the European, Mediterranean, and Middle Eastern Conference on Information Systems, Coimbra, Portugal, 7–8 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 40–53. [ Google Scholar ]
  • Souibgui, M.; Atigui, F.; Yahia, S.B.; Si-Said Cherfi, S. Business intelligence and analytics: On-demand ETL over document stores. In Proceedings of the International Conference on Research Challenges in Information Science, Limassol, Cyprus, 23–25 September 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 556–561. [ Google Scholar ]
  • Salinas, S.O.; Nieto Lemus, A.C. Data Warehouse and Big Data Integration. Int. J. Comput. Sci. Inf. Technol. 2017 , 9 , 1–17. [ Google Scholar ] [ CrossRef ]
  • Munshi, A.A.; Mohamed, Y.A.R.I. Data lake lambda architecture for smart grids Big Data analytics. IEEE Access 2018 , 6 , 40463–40471. [ Google Scholar ] [ CrossRef ]
  • Pal, G.; Li, G.; Atkinson, K. Multi-Agent Big-Data Lambda Architecture Model for E-Commerce Analytics. Data 2018 , 3 , 58. [ Google Scholar ] [ CrossRef ]
  • Antoniu, G.; Costan, A.; Pérez, M.; Stojanovic, N. The Sigma Data Processing Architecture. In Proceedings of the Leveraging Future Data for Extreme-Scale Data Analytics to Enable High-Precision Decisions, Big Data and Extreme Scale Computing 2nd Series, (BDEC2), Bloomington, IN, USA, 28–30 November 2018. [ Google Scholar ]
  • Gillet, A.; Leclercq, E.; Cullot, N. Evolution et formalisation de la Lambda Architecture pour des analyses a hautes performances-Application aux donnees de Twitter. Rev. Ouvert. De L’Ingenierie Des Syst. D’Information (ROISI) 2021 , 2 , 26. [ Google Scholar ] [ CrossRef ]
  • Warren, J.; Marz, N. Big Data: Principles and Best Practices of Scalable Realtime Data Systems ; Simon and Schuster: New York, NY, USA, 2015. [ Google Scholar ]
  • Pardillo, J.; Mazon, J.N. Using Ontologies for the Design of Data Warehouses. Int. J. Database Manag. Syst. 2011 , 3 , 73–87. [ Google Scholar ] [ CrossRef ]
  • Ta’a, A.; Abdullah, M.S. Ontology Development for ETL Process Design. In Ontology-Based Applications for Enterprise Systems and Knowledge Management ; IGI Global: Pennsylvania, PA, USA, 2013; pp. 261–275. [ Google Scholar ]
  • Hofferer, P. Achieving business process model interoperability using metamodels and ontologies. In Proceedings of the ECIS 2007, St. Gallen, Switzerland, 7–9 June 2007. [ Google Scholar ]
  • Simitsis, A. Modeling and Optimization of Extraction-Transformation-Loading (ETL) Processes in Data Warehouse Environments. Ph.D. Thesis, National Technical University of Athens, Athens, Greece, 2004. [ Google Scholar ]
  • Samoylov, A.; Tselykh, A.; Sergeev, N.; Kucherova, M. Review and analysis of means and methods for automatic data extraction from heterogeneous sources. In Proceedings of the IV International Research Conference “Information Technologies in Science, Management, Social Sphere and Medicine” (ITSMSSM), Tomsk, Russia, 5–8 December 2017. [ Google Scholar ] [ CrossRef ]
  • Dhaouadi, A.; Bousselmi, K.; Monnet, S.; Gammoudi, M.M.; Hammoudi, S. A Multi-layer Modeling for the Generation of New Architectures for Big Data Warehousing. In Proceedings of the International Conference on Advanced Information Networking and Applications, Sydney, Australia, 13–15 April 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 204–218. [ Google Scholar ]


Criteria used for the comparative study, with their possible values, definitions, and relevance:

  • Standard formalism. Definition: a formalism can be a tool or a framework already tested and validated by the domain community. Relevance: avoids the multiplicity of formalisms and proprietary notations, reducing misunderstanding and facilitating interoperability.
  • Graphical notations/symbols. Definition: graphical shapes and notations, often grouped in palettes, used to model ETL activities, objects, and data sources. Relevance: relevant for communication, as they are more readable and understandable by humans.
  • Modeling level (conceptual, logical, physical). Definition: conceptual, logical, and physical data models are the three levels of data modeling. Relevance: taking all three models jointly into account can ensure a consistent DW process.
  • Modeled phase (extract, transform, load). Definition: extract, transform, and load are the three main phases of the DW process. Relevance: each phase represents a central stage in the design of a DW.
  • Transformation level (attribute, entity). Definition: the level at which the ETL transformation activities are applied, either at the entity level or at a lower level (attribute). Relevance: shows how much the approach focuses on the detail of the modeling.
  • Data source storage schema. Definition: details the data source structures involved in the data warehousing process. Relevance: the data source schemas must be well defined to ensure their integration into the DW.
  • DW data storage schema. Definition: defines the physical storage of the DW, depending on the target platform (e.g., relational, MD, OO). Relevance: the target schema must be well defined to facilitate the mapping task with the source schema.
  • Mapping (schema/diagram). Definition: a schema translation used to map the source schema to the DW schema, in diagram or schema form. Relevance: this inter-schema mapping clarifies the transition steps between source and target and visually summarizes the mapping.
  • ETL meta-model. Definition: the process meta-model defines generic entities involved in all DW processes. Relevance: enables managing extensibility at the meta-layer and adaptability at the model layer for a specific DW.
  • Prototype/modeling tool. Definition: a framework or tool is provided to implement the proposed model. Relevance: shows the feasibility of the proposed model.
  • Integrated approach. Definition: the proposed ETL model is integrated into a global approach for the design of a DW. Relevance: provides a consolidated view of the model's integration into an end-to-end data warehousing process.
  • Rules/techniques/algorithms of transformations. Definition: the means used to ensure the transition between the modeling levels, from conceptual to logical and from logical to physical. Relevance: enriches the proposed model with inter-level transformations and details the technique deployed for these transitions.
  • ETL activities described. Definition: the ETL activities described by the model. Relevance: indicates the activities supported by the model and illustrates the application of the proposed approach.
  • Data type (structured, semi-structured, unstructured). Definition: the type of data supported by the model. Relevance: gives an idea of the complexity of the data processing the model must support.
  • Mapping/transformation technique (manual, semiautomatic, automatic). Definition: a mapping technique creates a link between two distinct data models (source and target); it can be manual, semiautomatic, or automatic. Relevance: important in the logical process modeling phase to ensure a consistent DW process.
  • Entity relationship. Definition: the relationships between the entities presented in the data source storage schema and the DW storage schema. Relevance: highlights these relationships, reinforcing understanding.
  • Approach validation. Definition: validation of the proposed approach through a detailed case study and a prototype or framework. Relevance: ensures the feasibility and implementation of the model in a concrete scenario.
  • Approach evaluation (benchmark). Definition: an experimental evaluation conducted after validation on a concrete use case, for example to check performance parameters; in our context, the evaluation can be carried out through a benchmark. Relevance: acknowledges the effort made by the researchers to verify the deployment of the proposed model and its features.
  • Interoperability. Definition: the interaction of the process model with the physical layer. Relevance: provides insight into the deployment of the model.
  • Extensibility. Definition: the capability of the model to support new features, such as adding new ETL tasks or changing data types. Relevance: allows us to assess the scalability of the proposed model.
  • Explicit definition of transformation. Definition: explicitly details the tasks performed in the "transform" phase of an ETL process. Relevance: facilitates understanding and implementation of the model.
  • Layered architecture/workflow. Definition: the model is composed of several layers from which a multi-layer architecture can be instantiated; each layer corresponds to a modeling level or, in the case of a workflow, to a step of the process. Relevance: identifies the layers and steps in terms of the conceptual, logical, and physical modeling levels; workflows allow for the orchestration of tasks and the modularization of the data warehousing process model.
  • Workflow management. Definition: describes how the workflow is managed. Relevance: facilitates understanding of the workflow and of its inputs and outputs.
  • GUI support. Definition: the proposed model is supported by a graphical user interface. Relevance: shows that the model is usable in practice.
  • ETL process requirement. Definition: describes the specifications required by the ETL process for model design and implementation. Relevance: gives an idea of the required environment and the functional requirements of the process.
  • Comprehensive tracking and documentation. Definition: a detailed description of all tasks and steps supported by the ETL process, together with rich documentation. Relevance: facilitates familiarity, comprehension, and deployment.
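As a concrete illustration of the three modeled phases (extract, transform, load) and of attribute-level transformations among the criteria above, the following minimal Python sketch runs an end-to-end pipeline. It is a generic illustration only: the source rows, field names, and the 0.92 conversion rate are hypothetical and not taken from any surveyed approach.

```python
# Minimal ETL sketch: extract rows from a source, apply attribute-level
# transformations, and load them into a target (DW-style) table.

def extract(source):
    """Extract phase: read raw records from the source."""
    return list(source)

def transform(rows):
    """Transform phase: attribute-level cleaning and derivation."""
    out = []
    for row in rows:
        out.append({
            # Attribute-level cleaning: trim and normalize the name.
            "customer": row["name"].strip().title(),
            # Derived attribute (hypothetical 0.92 USD->EUR rate).
            "amount_eur": round(row["amount_usd"] * 0.92, 2),
        })
    return out

def load(rows, target):
    """Load phase: append transformed rows to the target table."""
    target.extend(rows)
    return target

# Usage: one raw source row flows through the three phases.
source = [{"name": "  alice smith ", "amount_usd": 100.0}]
dw_table = load(transform(extract(source)), [])
```

The sketch operates at the attribute level (each transformation touches a single field), which is the finer of the two transformation levels distinguished in the comparison.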
Approach[ ][ ][ ][ ][ ][ ][ ]
Criteria
Standard formalismXXX XXX
Graphical notationsX X
Modeling levelConceptualXXXXXXX
Logical XXXX
Physical XX
Modeled phaseExtract XXX
TransformXXXXX
LoadX X X
Transformation levelAttribute X
EntityX XXX
Data source storage schemaXX
DW data storage schema X XX
Mapping (schema/diagram)XX X
Mapping techniqueManualXX
Semiautomatic
Automatic
ETL meta-model X
Prototype/modeling toolX X
Integrated approachXX
Rules/techniques/algorithms of transformations XX
Automatic transformation XX
ETL activities described103109322
Data typeStructuredXXXX XX
Semi-structured
Unstructured X
Entity relationshipXX
Approach validation XX X
Approach evaluation (benchmark)
InteroperabilityX XX
ExtensibilityX X X
Explicit definition of transformationXX
Layered architecture X X
Workflow management X X
GUI support X
ETL process requirement XX
Dynamic aspect X XX
Class diagramXX
Activity diagram XXXXX
Requirement diagram XX
Object diagram X
Approach[ , , ][ ][ ][ ][ ][ ][ , ]
Criteria
Standard formalism XX
Graphical notations/symbols X
Modeling levelConceptualXXX X
LogicalXXXXX X
PhysicalXXXXXXX
Modeled phaseExtractX XXX
TransformXXX XXX
LoadX XXX
Transformation levelAttributeXXX
EntityXXXXXX
Data source storage schemaX X XX
DW data storage schemaX X XX
Mapping (schema/diagram)XXX
Mapping techniqueManual XX
SemiautomaticX X
Automatic XXX
ETL meta-model XX X
Prototype/modeling toolXXXXXX
Integrated approach XX XX
Rules/techniques/algorithm of transformationsX XX X
Automatic transformationX X X
ETL activities described1311NA10NA10
Data typeStructuredXXXXX X
Semi-structuredX X X
Unstructured X
Entity relationshipX XXXXX
Approach validationX XXXX
Approach evaluation (benchmark)X XXX
InteroperabilityXX XXX
ExtensibilityXXXXXX
Explicit definition of transformationX X X
Layered architecture/workflow XXXXX
Workflow management X X
GUI supportX XX
ETL process requirement X
Comprehensive tracking and documentationX XXXX
ReusabilityXXX X X
Formal specificationXX X
Business requirement XXX
Ontology approachSingleX XX
Multiple
HybridXXXXX
OntologyApplicationX X
DomainX XXX
HeterogeneitySemanticXXXXXXX
StructuralX XXX
Approach[ , , ][ ][ , ]
Criteria
Standard formalismXXX
Graphical notations/symbolsX
Modeling levelConceptualXXX
LogicalXX
PhysicalXXX
Modeled phaseExtract X
TransformXXX
Load
Transformation levelAttribute
EntityXXX
Data source storage schema X
DW data storage schemaX X
Mapping (schema/diagram)XXX
Mapping techniqueManual
Semiautomatic X
AutomaticXX
ETL meta-modelXX
Prototype/modeling tool XX
Integrated approach
Automatic transformationXXX
ETL activities describedNA92
Data typeStructuredXXX
Semi-structured
Unstructured
Entity relationship
Approach validationXXX
Approach evaluation (benchmark)
InteroperabilityXXX
ExtensibilityXXX
Explicit definition of transformationXXX
Layered architecture/workflowX X
Workflow management
GUI support XX
ETL process requirement
Comprehensive tracking and documentationXXX
ReusabilityXXX
Formally specificationXXX
Business requirementX X
ETL constraintsXX
MDA layersCIMX X
PIMXXX
QVTXXX
PSMXX
Code X
ApproachBPMNCPNYAWLD. Flow Visualization
Criteria [ , , ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]
Standard formalismXXXX X
Graphical notations/symbolsXX X
Modeling levelConceptualXXXXXX
LogicalXX XX
PhysicalXXX
Modeled phaseExtractXXXXXXXX
TransformXXXX XX
LoadXXX XX
Transformation levelAttribute X
EntityXXXX XX
Data source storage schemaX X X
DW data storage schemaX X X
Mapping (schema/diagram)
Mapping techniqueManual XXX XX
Semiautomatic
AutomaticX
ETL meta-modelX X X
Prototype/modeling tool X X
Integrated approach
Rules/techniques/algorithm of transformations
Automatic transformation
ETL activities described16810NANA37NA
Data typeStructuredXXXXXXXX
Semi-structured
Unstructured
Entity relationshipX
Approach validation XXXX X
Approach evaluation (benchmark)
InteroperabilityXXXX X
ExtensibilityXXXX XXX
Explicit definition of transformationX X
Layered architecture/workflowXXXX
Workflow managementXXXX X
GUI support X X
ETL process requirementX
Comprehensive tracking and documentationX
ReusabilityXX X X
Formal specification X
Business requirementXX X
ETL constraintsX
ApproachConceptual ConstructsCommonCubeEMD
Criteria [ , ] [ , , ] [ ] [ ] [ ] [ ] [ ]
Standard formalism
Graphical notations/symbolsXXX XXX
Modeling levelConceptualXXXXXXX
LogicalX X
PhysicalXXX X
Modeled phaseExtractXXX XXX
TransformXXXXXXX
LoadXXX XXX
Transformation levelAttributeX XXXXX
Entity X XXX
Data source storage schema XXXX
DW data storage schema XXXXXX
Mapping (schema/diagram) X X
Mapping techniqueManual XXXX
SemiautomaticXXX
Automatic
ETL meta-modelXXX XX
Prototype/modeling toolXXX X
Integrated approachX X
Rules/techniques/algorithm of transformations
Automatic transformation
ETL activities described128127151515
Data typeStructuredXXXXXXX
Semi-structured XX
Unstructured
Entity relationshipX XXXXX
Approach validationXXX X
Approach evaluation (benchmark)X
InteroperabilityX X XXX
ExtensibilityXXX
Explicit definition of transformationXXX
Layered architecture/workflowXXX XX
Workflow managementXXX
GUI supportXXX XX
ETL process requirementX X
Comprehensive tracking and documentationX X
ReusabilityXXX
Formally specificationX X
Business requirement
ETL constraintsX X
Approach[ ][ ][ ][ ][ ][ ]
Criteria
Data typeStructuredX X X
Semi-structured XXXXX
Unstructured X
Mapping (schema/diagram) X
Mapping techniqueManual
SemiautomaticX X
Automatic X
Entity relationship X
Approach validationXXX X
Approach evaluation (benchmark)
InteroperabilityXXXX
ExtensibilityXXX
Explicit definition of transformationXX
Layered architecture/workflowXXXXXX
Workflow managementXXX
GUI supportXXX X
ETL process requirementX
Comprehensive tracking and documentationX X
ReusabilityXXX
Formally specification X
Business requirement X X
ETL constraints
Big DataMassive volumeXXX X
Velocity X
Variability X
Veracity XX X
DBRelationalXX
NoSQL XX X
ProcessingBatchXXX X
Stream X

Dhaouadi, A.; Bousselmi, K.; Gammoudi, M.M.; Monnet, S.; Hammoudi, S. Data Warehousing Process Modeling from Classical Approaches to New Trends: Main Features and Comparisons. Data 2022 , 7 , 113. https://doi.org/10.3390/data7080113



Comprehensive survey on data warehousing research

  • Original Research
  • Published: 15 December 2017
  • Volume 10, pages 217–224 (2018)


  • Pravin Chandra 1
  • Manoj K. Gupta 2


Data, information, and knowledge play important roles in many human activities: information is extracted by processing data, and knowledge is extracted by analyzing data and information. The problem of storing, managing, and analyzing the huge volumes of data generated regularly by various sources has led to the need for large data repositories such as data warehouses. Consequently, data warehousing (DW) has attracted considerable attention from both research and industry, and many studies in recent years have presented issues and challenges in the field. This paper presents a comprehensive survey that takes a holistic view of research trends in data warehousing and systematically organizes the work of researchers in the field. Finally, current research issues and challenges in data warehousing are summarized as directions for future work.




Author information

Authors and Affiliations

University School of Information, Communication & Technology, Guru Gobind Singh Indraprastha University, Delhi, India

Pravin Chandra

Rukmini Devi Institute of Advanced Studies, Delhi, India

Manoj K. Gupta


Corresponding author

Correspondence to Manoj K. Gupta.


About this article

Chandra, P., Gupta, M.K. Comprehensive survey on data warehousing research. Int. j. inf. tecnol. 10, 217–224 (2018). https://doi.org/10.1007/s41870-017-0067-y

Received: 11 August 2017

Accepted: 05 December 2017

Published: 15 December 2017

Issue Date: June 2018

DOI: https://doi.org/10.1007/s41870-017-0067-y


  • Data warehousing
  • Data warehouse design
  • Data warehouse testing
  • Research trends

An empirical study on data warehouse systems effectiveness: the case of Jordanian banks in the business intelligence era

EuroMed Journal of Business

ISSN : 1450-2194

Article publication date: 12 May 2022

Issue publication date: 23 October 2023

Purpose

Despite the increasing role of the data warehouse as a supportive decision-making tool in today's business world, academic research for measuring its effectiveness has been lacking. This paucity of academic interest stimulated us to evaluate data warehousing effectiveness in the organizational context of Jordanian banks.

Design/methodology/approach

This paper develops a theoretical model specific to the data warehouse system domain that builds on the DeLone and McLean model. The model is empirically tested by means of structural equation modelling applying the partial least squares approach and using data collected in a survey questionnaire from 127 respondents at Jordanian banks.

Findings

Empirical data analysis showed that data quality, system quality, user satisfaction, individual benefits, and organizational benefits all make strong contributions to data warehousing effectiveness in our organizational data context.

Practical implications

The results provide a better understanding of the data warehouse effectiveness and its importance in enabling the Jordanian banks to be competitive.

Originality/value

This study is indeed one of the first empirical attempts to measure data warehouse system effectiveness and the first of its kind in an emerging country such as Jordan.

  • Data warehouse system
  • DeLone and McLean model
  • Business intelligence
  • Structural equation modelling

Al-Okaily, A. , Al-Okaily, M. , Teoh, A.P. and Al-Debei, M.M. (2023), "An empirical study on data warehouse systems effectiveness: the case of Jordanian banks in the business intelligence era", EuroMed Journal of Business , Vol. 18 No. 4, pp. 489-510. https://doi.org/10.1108/EMJB-01-2022-0011

Copyright © 2022, Emerald Publishing Limited



J Am Med Inform Assoc, v.29(4); 2022 Apr

Research data warehouse best practices: catalyzing national data sharing through informatics innovation

Shawn N Murphy

1 Research Information Science and Computing, Mass General Brigham, Somerville, Massachusetts, USA

2 Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA

Shyam Visweswaran

3 Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA

4 Clinical and Translational Science Institute, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA

Michael J Becich

Thomas R Campion

5 Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA

6 Clinical and Translational Science Center, Weill Cornell Medicine, New York, New York, USA

Boyd M Knosp

7 Roy J. and Lucille A. Carver College of Medicine and the Institute for Clinical & Translational Science, University of Iowa, Iowa City, Iowa, USA

Genevieve B Melton-Meaux

8 Department of Surgery, University of Minnesota, Minneapolis, Minnesota, USA

9 Institute for Health Informatics (IHI), University of Minnesota, Minneapolis, Minnesota, USA

Leslie A Lenert

10 Biomedical Informatics Center (BMIC), Medical University of South Carolina, Charleston, South Carolina, USA

11 Health Sciences South Carolina, Columbia, South Carolina, USA

Associated Data

No new data were generated or analyzed in support of this research.

Research Patient Data Repositories (RPDRs) have become essential infrastructure for traditional Clinical and Translational Science Award (CTSA) programs and increasingly for a wide range of research consortia and learning health system networks. 1–5 Almost every institution with a CTSA or Clinical Translational Research (CTR) program (found in states with lower amounts of National Institutes of Health funding) hosts an RPDR for the benefit of affiliated researchers. These repositories aim to enable healthcare research based upon the patient populations they serve. Within the institution, RPDRs are valuable for a range of research activities. They are used to identify patients for clinical trial recruitment using privacy-preserving methods to search and extract specific cohorts of trial-eligible patients. 6 They aid in developing and validating computable phenotypes that are increasingly important for accurately identifying patient cohorts in a reproducible fashion. 7 RPDRs provide de-identified patient data for population health research and support a growing body of artificial intelligence to predict patient outcomes. 8 Further, clinical studies can often be simulated using data from an RPDR. 9 Beyond the institution, aggregates of de-identified datasets from multiple institutions linked with privacy-preserving hash codes provide an unprecedented opportunity to conduct population health research, perform comparative effectiveness analyses and apply artificial intelligence methods over large and diverse populations. 10 The data contained within the RPDR vary across institutions, based on institutional strengths and weaknesses; the papers published in this issue reflect that variability (see Table 1 ). Data are commonly acquired from local electronic health records (EHRs) and other clinical information systems that capture information during clinical care. 
Data consist of diagnoses, problem lists, procedures, prescribed medications, laboratory exams, and many types of free-text reports. Overall, the benefits of the RPDR for accelerating translational research can be significant. For example, at Harvard, in 2006, between $94 and $136 million in annual research funding was linked to the use of data from the RPDR. 11
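The privacy-preserving cohort search mentioned above is often implemented (for example, in i2b2-style query tools) by returning only obfuscated aggregate counts rather than patient-level records. The sketch below is an illustrative assumption about how such obfuscation can work, not the method of any specific paper in this issue: small cohorts are masked outright and larger counts are jittered.

```python
import random

def obfuscated_count(true_count: int, low_threshold: int = 10, noise: int = 3) -> str:
    """Return a cohort size safe for display: small cohorts are masked,
    larger ones are perturbed by a small random offset."""
    if true_count < low_threshold:
        return f"<{low_threshold}"          # mask potentially identifying small cohorts
    jittered = true_count + random.randint(-noise, noise)
    return str(max(jittered, low_threshold))

# Example: query a toy patient table for a diagnosis code (hypothetical data).
patients = [{"id": i, "dx": "E11" if i % 4 == 0 else "I10"} for i in range(100)]
cohort = [p for p in patients if p["dx"] == "E11"]
print(obfuscated_count(len(cohort)))  # roughly 25, give or take the noise
```

The threshold and noise width are policy choices each institution would tune; the point is that a researcher learns approximate cohort feasibility without seeing identifiable records.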

Selected features of RPDRs and practices related to RPDRs

Lead author | Patients | Scope | Comments
Hogan | 17.2M | Regional | Early SDoH data (geocoding)
Pfaff | 6.4M | US wide | Focus on COVID-19 data quality
Waitman | 24M | Regional | Cloud RPDR for multi-institutional collaboration
Loomba | Not RPDR | Regional | Regional NIH data commons resource
Meeker | 600K | Regional | Public health system data governance
Visweswaran | 5M | Single site | RPDR case study
Khan | 224K | US wide | Height and weight normalization
Barnes | 400K | US wide | Federated collection
Campion | Not RPDR | Regional | Tools matching investigators' approaches
Castro | 125K | Regional | i2b2-based biobank linking multiple data types
Nelson | Not RPDR | US wide | RIC's EHR cohort assessment process
Walji | 4.4M | US wide | Multi-institutional dental data warehouse
Knosp | Not RPDR | US wide | National survey of best practices
Kahn | 7.3M | Regional | Large-scale migration of RPDR to the cloud
Walters | Not RPDR | Single site | Model for governance of RPDR data requests

In the original table, each site is additionally marked for participation in national EHR data sharing networks (CTSA, PCORI, All of Us), special features (cancer registry, claims, NLP, privacy preserving, SDoH), and common data models supported (OMOP CDM, PCORnet CDM, ACT, TriNetX).

NIH: National Institutes of Health; PCORI: Patient-Centered Outcomes Research Institute; RIC: Recruitment Innovation Center; SDoH: social determinants of health.

This focus issue of JAMIA describes some of the current research, approaches, applications, and best practices for RPDRs comprising 11 research and applications papers 12–22 and 4 case reports 23–26 (see Table 1 ). Ten of the papers describe RPDRs, and 5 describe governance, regulatory and technical issues related to RPDRs. The scope of the papers ranges from a single site to regional to US-wide (2, 7, and 6 articles, respectively). The number of patients in the RPDRs ranges from 125K to 24M, of which 7 include privacy-preserving features, and 1 contains data from natural language processing (NLP). Commonly used data models (CDMs) in the RPDRs include the Observational Medical Outcomes Partnership (OMOP) CDM, 27 the National Patient-Centered Clinical Research Network’s (PCORnet’s) CDM, 5 and the Accrual to Clinical Trials (ACT) 4 and TriNetX 9 CDMs that are based on the Informatics for Integrating Biology & the Bedside (i2b2) platform. 7

A key emerging innovation is the adoption of cloud technology for RPDRs. Knosp et al 21 surveyed 20 CTSA hubs and found that 2 hubs had completely migrated their RPDRs to the cloud and several others were considering moving their RPDRs to the cloud. Three other papers describe approaches, advantages, and challenges of implementing RPDRs in the cloud. 14 , 15 , 17 Barnes et al 17 offer an approach to RPDRs that is focused on sharing and integrating data for large-scale research projects, using the Amazon Web Services (AWS) to create a distributed data commons. Common workspaces can be created where datasets from multiple sources can be accessed through common authentication and analyzed with preconfigured tools, including Jupyter and R notebooks. A limitation of this approach is that the researchers must harmonize data across the different data models, although the datasets contain common data elements, use controlled vocabularies, and adhere to other standards.
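Harmonizing data across data models, as the paragraph notes, typically comes down to mapping each site's local codes onto shared controlled vocabularies. A minimal sketch of that mapping step follows; the local code table, field names, and records are hypothetical, with LOINC-style target codes used only for illustration.

```python
# Hypothetical site-local lab codes mapped to a shared vocabulary (LOINC-style).
LOCAL_TO_COMMON = {
    "GLU_SERUM": "2345-7",   # glucose, serum/plasma
    "HBA1C": "4548-4",       # hemoglobin A1c
}

def harmonize(record: dict) -> dict:
    """Translate a site-local record into the common-element form;
    unmapped codes are flagged for manual curation."""
    common_code = LOCAL_TO_COMMON.get(record["local_code"])
    return {
        "patient_id": record["patient_id"],
        "concept_code": common_code or "UNMAPPED",
        "value": record["value"],
        "unit": record["unit"],
    }

row = harmonize({"patient_id": "p1", "local_code": "HBA1C", "value": 6.9, "unit": "%"})
print(row["concept_code"])  # 4548-4
```

The real work in a multi-site commons is curating the mapping table itself; once it exists, the translation is mechanical, and "UNMAPPED" rows are exactly the harmonization burden the paragraph describes.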

Anticipating what may become a common architecture for RPDRs, Kahn et al 15 describe opportunities and challenges of migrating a large RPDR with administrative, clinical, genomic, and population-level data from on-premises infrastructure to the Google Cloud Platform. While the cloud offers advantages such as inexpensive storage, automatic backups, and secure analytic environments, a variety of issues have to be carefully evaluated to enable smooth migration from on-premises infrastructure to the cloud. The Extract, Transform and Load (ETL) processes may need redesigning due to movement of large data volumes across routers and networks, and realizing cost savings requires organizational changes that may be difficult to implement.
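The ETL redesign issue raised above, moving large data volumes across routers and networks, usually pushes pipelines toward streaming rows in bounded batches rather than bulk copies. This is a generic sketch of that pattern under assumed toy data, not the Kahn et al. pipeline; the final load call is a stand-in for whatever cloud load API a real migration would use.

```python
from typing import Iterable, Iterator, List

def chunked(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield fixed-size batches so each network transfer stays bounded."""
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def transform(row: dict) -> dict:
    # Example transform: normalize column names for the target warehouse.
    return {k.lower(): v for k, v in row.items()}

loaded = []
source = ({"ID": i, "DX": "I10"} for i in range(2500))  # hypothetical extract
for batch in chunked(source, 1000):
    loaded.extend(transform(r) for r in batch)  # stand-in for a cloud load call
print(len(loaded))  # 2500
```

Bounding each transfer lets a pipeline retry a single failed batch instead of restarting a multi-terabyte copy, which is one of the redesign pressures the paragraph alludes to.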

Waitman et al 14 describe how cloud technology facilitates multi-institutional research. The Greater Plains Collaborative (GPC) Reusable Observable Unified Study Environment (GROUSE) is implemented on AWS and integrates EHR, claims, and tumor registry data from 7 healthcare systems. Using GROUSE, the authors demonstrate that clinical data may sometimes allow for more precise inferences than coded data; for example, obesity is more accurately inferred from body mass index measures compared to diagnostic (ICD-10) codes. However, comorbidities associated with obesity such as diabetes and sleep apnea are more accurately inferred from diagnostic codes. This article outlines GROUSE’s governance, architecture, and compliance components and describes interagency agreements that facilitate health system collaboration, and that ensures security and privacy policies align with federal requirements.
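The obesity example can be made concrete: a BMI computed from measured height and weight versus a recorded diagnosis code. The BMI cutoff and ICD-10 category below are standard conventions; the patient record is invented for illustration and is not from GROUSE.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    return weight_kg / height_m ** 2

def obese_by_bmi(weight_kg: float, height_m: float) -> bool:
    return bmi(weight_kg, height_m) >= 30.0      # WHO obesity cutoff

def obese_by_code(dx_codes: set) -> bool:
    return "E66" in dx_codes                     # ICD-10 obesity category

# A patient whose chart lacks the E66 code but whose measurements imply obesity:
patient = {"weight_kg": 104.0, "height_m": 1.75, "dx": {"I10", "E11"}}
print(obese_by_bmi(patient["weight_kg"], patient["height_m"]))  # True (BMI about 34)
print(obese_by_code(patient["dx"]))                             # False
```

Cases like this, where the clinical measurement and the coded record disagree, are exactly why the authors find BMI-based inference more precise for obesity itself while coded data remain better for comorbidities.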

The papers in this issue aptly illustrate that RPDRs are a diverse, vibrant ecosystem that collaboratively and progressively enhances national health research infrastructure. This infrastructure has been invaluable in investigating the COVID-19 pandemic. 8 , 28–30 What are the future directions for RPDRs? Assuming support for the current funding for data curation at individual site RPDRs is continued by the 2 primary funding agencies for these activities, the Patient-Centered Outcomes Research Institute (supports PCORnet) 5 , 31 and the National Center for Advancing Translational Sciences (NCATS) at the National Institutes of Health (supports N3C 2 and ACT 4 ), one would expect expansion in the depth and breadth of data available in these networks. PCORnet 32 and N3C are in the process of expanding the deployment of privacy-preserving record linkage systems that will allow the integration of data from individual RPDRs across networks using encrypted hashed identifiers. Even so, the data in RPDRs could be broader and more representative of the national healthcare system. Advances in application programming interfaces to access data in EHRs brought about by the 21st Century Cures Act 33 and expansion of the United States Core Data for Interoperability (USCDI) Standards 34 to reflect research data needs may make it possible for a broader range of health systems to contribute data to RPDRs. One area that requires further policy development is expanding health information exchange for research. Currently, the governance for the National Health Information Network (NHIN) acknowledges the importance of health information exchange for research, but does not support it within its Trusted Exchange Framework and Common Agreement (TEFCA). 35 Access to data from multiple providers through a TEFCA process for research studies could remove many gaps that limit the completeness of patient-level health information in RPDRs. However, further policy development is needed by the Office of the National Coordinator for Health Information Technology (ONC) and TEFCA's Recognized Coordinating Entity (the Sequoia Project) to achieve this capability.
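Privacy-preserving record linkage of the kind described above is commonly built on keyed hashes of normalized identifiers, so that sites can match records without exchanging PHI. The sketch below assumes a shared secret key distributed out of band; the normalization rules and field choices are illustrative, not those of PCORnet or N3C.

```python
import hashlib
import hmac

SHARED_KEY = b"network-secret"  # illustrative; real networks manage keys carefully

def linkage_token(first: str, last: str, dob: str) -> str:
    """Normalize identifiers and produce a keyed hash; two sites holding the
    same person's record derive the same token without revealing PHI."""
    normalized = f"{first.strip().lower()}|{last.strip().lower()}|{dob}"
    return hmac.new(SHARED_KEY, normalized.encode(), hashlib.sha256).hexdigest()

site_a = linkage_token("Ada", "Lovelace", "1815-12-10")
site_b = linkage_token(" ada ", "LOVELACE", "1815-12-10")
print(site_a == site_b)  # True: tokens match despite formatting differences
```

Using an HMAC rather than a bare hash means an outsider without the key cannot mount a dictionary attack on common names, which is the basic reason the identifiers are described as encrypted hashes rather than plain hashes.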

Paradoxically, national standards that improve access to health system data for research might seem to obviate the case for RPDRs: they may appear less needed when EHR data are universally available in standardized formats and via protocols such as bulk Fast Healthcare Interoperability Resources (FHIR). 33 In this setting, funders might want to centralize data resources to reduce costs, creating a monoculture based on cloud infrastructure. The N3C Data Enclave illustrates this approach, using central resources to normalize data and provide access to data sets and analytics in a cloud environment operated by a government contractor. 2 This “monoculture,” particularly if controlled by a private contractor, might stifle the types of innovative work detailed in this issue. Furthermore, much of the benefit of the RPDR is achieved through local hospital connections. RPDRs greatly assist recruitment of patients for clinical trials through processes local to the hospitals where the trials are being conducted. Engagement of clinical researchers from hospitals and medical centers occurs mostly at the local level, where they can decide on priorities for data ETL and data aggregation. Taking Protected Health Information (PHI) outside of hospital entities is greatly limited by the Health Insurance Portability and Accountability Act but necessary to validate data in the EHR through chart review. A centralized architecture may or may not be more efficient, but it is certainly less diverse and provides fewer opportunities for research in RPDR methods than the alternative federated approaches used in PCORnet and ACT.

Further, many technical challenges remain in the curation and delivery of healthcare system data for research; these might be best addressed initially in a diverse, competitive ecosystem, and solving them would greatly enhance the capabilities and potential health impacts of RPDRs. One such challenge is the integration of NLP technology and of NLP-abstracted data into RPDRs. While many NLP systems are being developed in the context of RPDRs, there are few standards for representing data that are the product of NLP systems. Broad dissemination of NLP technologies may require further algorithm research, standardized tool kits, and standards for target concepts for abstraction. NLP-abstracted data, being derived from algorithms, may also require the representation of the precision of abstraction within RPDRs to fully support their use in research studies. Integrating EHR data with hospital clinical trials and clinical studies is a further area of research that requires new methods and development. Such methods may overcome some of the limitations of data collection from case report forms and provide new ways to conduct the studies.
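One way to carry the precision of abstraction discussed above is to store each NLP-derived fact with its provenance and a model confidence, so downstream studies can filter on it. The schema below is a hypothetical illustration, not an existing standard; the field names and threshold are invented.

```python
from dataclasses import dataclass

@dataclass
class NlpFact:
    patient_id: str
    concept: str        # abstracted concept, e.g. a coded finding
    confidence: float   # model confidence in [0, 1]
    source_doc: str     # provenance: which clinical note produced the fact
    negated: bool = False

def usable_for_research(fact: NlpFact, min_conf: float = 0.9) -> bool:
    """A study might only admit high-confidence, non-negated abstractions."""
    return fact.confidence >= min_conf and not fact.negated

f = NlpFact("p1", "smoker", 0.95, "note-172")
print(usable_for_research(f))  # True
```

Keeping confidence and negation alongside the concept lets each study set its own admission threshold instead of the RPDR deciding once for everyone.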

The representation of genomic data with clinical data in RPDRs is another area where additional development is needed. Papers published in this issue describe the use of i2b2 ontologies for the representation of genomic data variation and association data. 18 , 22 The size and complexity of representation of gene variant data and single nucleotide polymorphism associational data as well as other ‘omics’ data, in association with clinical data on phenotypes, makes standardization of data representations for queries difficult. While there is evolving work on architectures 36 and standards supporting this, 37 the models for representation may need further maturation to support standardized data queries and federation of data across RPDRs in a network.

Overall, the collection of papers in this issue demonstrates the value of a diverse program supporting institution-level RPDR development. Ongoing support for diversity in RPDRs at individual institutions creates opportunities to advance the field that would be difficult to achieve in a more centralized monoculture. As also shown in the paper by Pfaff et al, 20 integration of these data resources, when necessary for specific national-level programs, is feasible and strengthens the ecosystem of RPDRs as a whole.

This work was funded in part by the University of Rochester Center for Leading Innovation and Collaboration (CLIC), under Grant U24TR002260.

AUTHOR CONTRIBUTIONS

All authors contributed to the manuscript, made critical revisions, and approved the final version for submission.

ACKNOWLEDGMENTS

We thank Dr. Suzanne Bakken for the insightful comments and suggestions on the draft manuscript.

CONFLICT OF INTEREST STATEMENT

None declared.

Data Mining: Recently Published Documents


Distance Based Pattern Driven Mining for Outlier Detection in High Dimensional Big Dataset

Detection of outliers or anomalies is one of the vital issues in pattern-driven data mining. Outlier detection identifies the inconsistent behavior of individual objects and is an important area of the data mining field, with applications such as detecting credit card fraud, discovering intrusions, and uncovering criminal activity. Tools are needed to extract the critical information buried in extensive data. This paper investigates a novel method for detecting cluster outliers in a multidimensional dataset, capable of identifying both the clusters and the outliers in datasets containing noise. The proposed method can detect the groups and outliers left by the clustering process, such as instant irregular sets of clusters (C) and outliers (O), to improve the results. Applying the algorithm to the dataset improved several parameters. For comparative analysis, average accuracy and average recall were computed: the existing COID algorithm achieves an average accuracy of 74.05%, whereas the proposed algorithm achieves 77.21%; the average recall values are 81.19% and 89.51% for the existing and proposed algorithms, respectively, showing that the proposed method is more effective than the existing COID algorithm.
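A generic distance-based outlier criterion of the kind this abstract discusses flags points whose average distance to their k nearest neighbors is unusually large. The sketch below is a textbook baseline under made-up data, not the paper's algorithm.

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by the mean distance to its k nearest neighbors;
    higher scores indicate likelier outliers."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]  # one obvious outlier
scores = knn_outlier_scores(data)
print(scores.index(max(scores)))  # 4: the point (10, 10)
```

Cluster-aware methods like the one the paper proposes refine this baseline by scoring points relative to the clusters a clustering pass leaves behind, rather than relative to all points equally.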

Implementation of Data Mining Technology in Bonded Warehouse Inbound and Outbound Goods Trade

For taxed goods, the actual freight is generally determined by multiplying the allocated freight per kilogram by the actual outgoing weight, based on the outgoing order number on the outgoing bill. Since conventional logistics is insufficient to cope with the rapid response that e-commerce orders demand, this work discusses the implementation of data mining technology in bonded warehouse inbound and outbound goods trade. Specifically, a bonded warehouse decision-making system with a data warehouse, a conceptual model, an online analytical processing system, a human-computer interaction module, and a web data sharing platform was developed. The statistical query module can be used to perform statistics and queries on warehousing operations. After optimization of the whole warehousing business process, it takes only 19.1 hours to obtain the actual freight, nearly one third less than before optimization. This study could create a better environment for the development of China's processing trade.
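The freight rule described above is a simple product; a one-line sketch with invented rate and weight values (not figures from the study):

```python
# Actual freight = allocated freight per kg x actual outgoing weight.
# The rate and weight below are hypothetical, for illustration only.
def actual_freight(freight_per_kg, outgoing_weight_kg):
    return freight_per_kg * outgoing_weight_kg

print(actual_freight(0.35, 120.0))  # 42.0
```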

Multi-objective economic load dispatch method based on data mining technology for large coal-fired power plants

User activity classification and domain-wise ranking through social interactions

Twitter has gained significant prevalence among users across numerous domains, in the majority of countries, and among different age groups. It serves as a real-time micro-blogging service for communication and opinion sharing. Twitter shares its data for research and study purposes through open APIs, making it the most suitable source of data for social media analytics. Applying data mining and machine learning techniques to tweets is gaining more and more interest. The most prominent enigma in social media analytics is to automatically identify and rank influencers. This research aims to detect users' topics of interest in social media and rank them based on specific topics, domains, etc. A few hybrid parameters are also distinguished in this research, based on the post's content, the post's metadata, the user's profile, and the user's network features, to capture different aspects of being influential; these are used in the ranking algorithm. Results show that the proposed approach is effective in both the classification and the ranking of individuals in a cluster.

A data mining analysis of COVID-19 cases in states of United States of America

Epidemic diseases can be extremely dangerous, with hazardous influences. They may have negative effects on economies, businesses, the environment, humans, and the workforce. In this paper, some of the factors that are interrelated with the COVID-19 pandemic are examined using data mining methodologies and approaches. As a result of the analysis, rules and insights were discovered and the performance of the data mining algorithms was evaluated. According to the results, the JRip algorithm had the highest correct classification rate and the lowest root mean squared error (RMSE). Considering classification rate and RMSE, JRip can be regarded as an effective method for understanding factors related to coronavirus-caused deaths.
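The two evaluation metrics named above (correct classification rate and RMSE) can be computed as follows; the labels are invented for illustration and are not the study's data:

```python
# Correct classification rate = fraction of predictions matching the labels.
# RMSE = square root of the mean squared difference between labels and predictions.
from math import sqrt

def classification_rate(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical binary labels (1 = death-related case, 0 = not).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(classification_rate(y_true, y_pred))  # 0.8
print(rmse(y_true, y_pred))                 # ~0.447
```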

Exploring distributed energy generation for sustainable development: A data mining approach

A comprehensive guideline for Bengali sentiment annotation

Sentiment Analysis (SA) is a Natural Language Processing (NLP) and Information Extraction (IE) task that primarily aims to determine whether the writer's expressed feelings are positive or negative by analyzing a large number of documents. SA is also widely studied in the fields of data mining, web mining, text mining, and information retrieval. The fundamental task in sentiment analysis is to classify the polarity of a given content as Positive, Negative, or Neutral. Although extensive research has been conducted in this area of computational linguistics, most of it has been carried out in the context of the English language. However, Bengali sentiment expression has varying degrees of sentiment labels, which can be plausibly distinct from English. Therefore, it is undeniably important to develop and execute sentiment assessment for Bengali properly. In sentiment analysis, the predictive potential of an automatic model is completely dependent on the quality of dataset annotation. Bengali sentiment annotation is a challenging task due to the diversified structures (syntax) of the language and its different degrees of innate sentiment (i.e., weakly and strongly positive/negative sentiments). Thus, in this article, we propose a novel and precise guideline for researchers, linguistic experts, and referees to annotate Bengali sentences immaculately, with a view to building effective datasets for automatic sentiment prediction.
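The fundamental polarity-classification task described above can be illustrated with a toy lexicon-based scorer; the lexicon here is an invented English stand-in, not a real Bengali sentiment resource, and real annotation needs guidelines like the one the paper proposes:

```python
# Toy lexicon-based polarity classifier illustrating the basic SA task.
# The word scores are hypothetical, for illustration only.
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}

def polarity(text):
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    return "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"

print(polarity("a great film with a bad ending"))  # Positive (2 - 1 = 1)
```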

Capturing Dynamics of Information Diffusion in SNS: A Survey of Methodology and Techniques

Studying information diffusion in SNS (Social Network Services) has remarkable significance in both academia and industry. Theoretically, it boosts the development of other subjects such as statistics, sociology, and data mining. Practically, diffusion modeling provides fundamental support for many downstream applications (e.g., public opinion monitoring, rumor source identification, and viral marketing). Tremendous efforts have been devoted to this area to understand and quantify information diffusion dynamics. This survey investigates and summarizes the emerging distinguished works in diffusion modeling. We first put forward a unified information diffusion concept in terms of three components: information, user decision, and social vectors, followed by a detailed introduction of the methodologies for diffusion modeling. Then, a new taxonomy adopting a hybrid philosophy (i.e., granularity and techniques) is proposed, and we make a series of comparative studies of elementary diffusion models under our taxonomy from the aspects of assumptions, methods, and pros and cons. We further summarize representative diffusion modeling in special scenarios and significant downstream tasks based on these elementary models. Finally, open issues in this field, following the methodology of diffusion modeling, are discussed.

The Influence of E-book Teaching on the Motivation and Effectiveness of Learning Law by Using Data Mining Analysis

This paper studies the motivation for learning law, compares the effectiveness of two teaching methods, e-book teaching and traditional teaching, and analyzes the influence of e-book teaching on the effectiveness of learning law using big data analysis. From the perspective of law-student psychology, e-book teaching can attract students' attention, stimulate their interest in learning, deepen knowledge impressions while learning, expand knowledge, and ultimately improve performance in practical assessment. With a small sample size, there may be some deficiencies in the representativeness of the results. The study has particular referential significance for stimulating motivation to learn law, as well as other theoretical disciplines, in colleges and universities, and provides ideas for the reform of teaching modes. This paper uses a decision tree algorithm in data mining for the analysis and identifies the factors influencing law students' learning motivation and effectiveness, from the students' perspective.
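The decision-tree idea mentioned above can be sketched at its smallest: a one-level tree (decision stump) that picks the single binary feature best separating the labels. The survey features and labels below are invented for illustration; this is not the study's model or data:

```python
# A one-level decision tree (decision stump) on invented survey data.
def best_stump(rows, labels):
    # rows: list of dicts of binary features. Pick the (feature, value) split
    # whose prediction best matches the labels.
    best = None
    for feat in rows[0]:
        for value in (0, 1):
            pred = [1 if r[feat] == value else 0 for r in rows]
            acc = sum(p == l for p, l in zip(pred, labels)) / len(labels)
            if best is None or acc > best[0]:
                best = (acc, feat, value)
    return best

rows = [
    {"uses_ebook": 1, "attends": 1},
    {"uses_ebook": 1, "attends": 0},
    {"uses_ebook": 0, "attends": 1},
    {"uses_ebook": 0, "attends": 0},
]
labels = [1, 1, 1, 0]  # 1 = motivated (hypothetical)
acc, feat, value = best_stump(rows, labels)
print(feat, value, acc)  # uses_ebook 1 0.75
```

Real decision-tree learners (CART, C4.5) apply this split search recursively, using impurity measures such as Gini or entropy rather than raw accuracy.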

Intelligent Data Mining based Method for Efficient English Teaching and Cultural Analysis

The emergence of online education has helped greatly improve traditional English teaching. However, it only moves the teaching process from offline to online, which does not really change the essence of traditional English teaching. In this work, we mainly study an intelligent English teaching method to further improve the quality of English teaching. Specifically, a random forest is first used to analyze and excavate the grammatical and syntactic features of English text. Then, a decision-tree-based method is proposed to predict the grammar or syntax issues of the English text. The evaluation results indicate that the proposed method can effectively improve the accuracy of English grammar or syntax recognition.

International Journal of Data Warehousing and Mining (IJDWM)

The International Journal of Data Warehousing and Mining (IJDWM), a featured IGI Global Core Journal Title, disseminates the latest international research findings in the areas of data management and analysis. This journal is a forum for state-of-the-art developments, research, and current innovative activities focusing on the integration between the fields of data warehousing and data mining. Featured in prestigious indices including the Web of Science® Citation Index Expanded®, Scopus®, Compendex®, INSPEC®, and more, this scholarly journal is led by a leading IGI Global editor and contains research from a growing list of more than 1,500 industry-leading contributors. This journal is an ideal resource for academic researchers and practicing IT professionals looking for double-blind peer-reviewed articles that provide solutions to ongoing challenges and new developments within this field.

  • Big Data Processing
  • Business Intelligence
  • Data Cleaning, Preparation, and Transformation
  • Data Linkage and Fusion
  • Data Mining Algorithms
  • Data Mining Technologies and Tools
  • Data Warehouse Modeling
  • Efficient Data Search
  • Emerging Applications of Data Mining
  • Emerging Domains of Data Warehousing
  • Mining Big Data
  • Online Analytical Processing
  • Tools and Languages

  • Timely Publication: Quick Turnarounds & Prompt Peer Review (No Embargoes)
  • Continuous Support: In-House, Personalized Service Throughout the Entire Process
  • Cutting-Edge Technology: Proprietary Technologies & Integrations With Major Open Access Platforms
  • Diverse Options: Individual APCs, Platinum Funding, Institutional Open Access Agreements, & More
  • Research Advancement First: IGI Global Prioritizes Research Over Profit by Forfeiting Subscription Revenue
  • Unmatched Transparency: Comprehensive Visibility in Processes, Licensing, & More
  • Rapid Transformation: IGI Global is One of Few Publishers That Have Completed the Open Access Transition
  • Independence and Integrity: IGI Global is Committed to Maintaining its Autonomy as an Independent Publisher
  • Medium-Sized, Yet Powerful: IGI Global Offers Advantages of a Medium-Sized Publisher With the Reach of a Larger Publisher

Payment of the APC fee (directly to the publisher) by the author or a funding body is not required until AFTER the manuscript has gone through the full double-anonymized peer review process and the Editor(s)-in-Chief, at their full discretion, have decided to accept the manuscript based on the results of that review.

In the traditional subscription-based model, the cost to the publisher to produce each article is covered by the revenue generated by journal subscriptions. Under OA, all the articles are published under a Creative Commons (CC BY) license; therefore, the authors or funding body will pay a one-time article processing charge (APC) to offset the costs of all of the activities associated with the publication of the article manuscript, including:

  • Digital tools used to support the manuscript management and review process
  • Typesetting, formatting and layout
  • Online hosting
  • Submission of the journal's content to numerous abstracts, directories, and indexes
  • Third-party software (e.g. plagiarism checks)
  • Editorial support which includes manuscript tracking, communications, submission guideline checks, and communications with authors and reviewers
  • All promotional support and activities which include metadata distribution, press releases, promotional communications, web content, ads, fliers, brochures, postcards, etc. for the journal and its published contents
  • The fact that all published articles will be freely accessible and able to be posted and disseminated widely by the authors
  • Professional line-by-line English language copy editing and proofreading*

*This service is only performed on article manuscripts with fully paid (not discounted or waived) APC fees.

To assist researchers in covering the costs of the APC in OA publishing, there are various sources of OA funding. Additionally, unlike many other publishers, IGI Global offers flexible subsidies, 100% Open Access APC funding, discounts, and more.

The International Journal of Data Warehousing and Mining (IJDWM) is owned and published by IGI Global.

The International Journal of Data Warehousing and Mining (IJDWM) is editorially independent, with full authority over the journal's content falling to the Editor-in-Chief and the journal's Editorial Board.

The In-House Editorial Office manages the publishing operations of the journal.

IGI Global 701 East Chocolate Avenue Hershey, PA 17033 USA

Principal Contact Grace Long Managing Editor of Journal Development IGI Global Phone: (717) 533-8845 ext. 147 E-mail: [email protected]

Support Contact Samantha Miduri Development Editor - International Journal of Data Warehousing and Mining (IJDWM) IGI Global Phone: 717-533-8845 E-mail: [email protected]

COMMENTS

  1. An Overview of Data Warehouse and Data Lake in Modern Enterprise Data

    Data is the lifeblood of any organization. In today's world, organizations recognize the vital role of data in modern business intelligence systems for making meaningful decisions and staying competitive in the field. Efficient and optimal data analytics provides a competitive edge to its performance and services. Major organizations generate, collect and process vast amounts of data ...

  2. PDF Lakehouse: A New Generation of Open Platforms that Unify Data

    a two-tier architecture is highly complex for users. In the first generation platforms, all data was ETLed from operational data systems directly into a warehouse. In today's architectures, data is first ETLed into lakes, and then again ELTed into warehouses, creating complexity, delays, and new failure modes.

  3. The Data Lakehouse: Data Warehousing and More

    This paper discusses how a data lakehouse, a new architectural approach, achieves the same benefits of an RDBMS-OLAP and cloud data lake combined, while also providing additional advantages. We take today's data warehousing and break it down into implementation-independent components, capabilities, and practices.

  4. [2310.08697] The Data Lakehouse: Data Warehousing and More

    The Data Lakehouse: Data Warehousing and More. Dipankar Mazumdar, Jason Hughes, JB Onofre. View a PDF of the paper titled The Data Lakehouse: Data Warehousing and More, by Dipankar Mazumdar and 2 other authors. Relational Database Management Systems designed for Online Analytical Processing (RDBMS-OLAP) have been foundational to democratizing ...

  5. The Outlook for Data Warehouses in 2023: Hyperscale Data ...

    The Year Ahead: Both Strategic and Cost Advantages. In 2023, the data warehouse market will continue to evolve, as businesses seek new and better ways to manage expanding data stores that, for a growing number of organizations, will reach hyperscale. It's not just more data but the changing nature of data -- increasingly complex and ...

  6. PDF An Overview of Data Warehouse and Data Lake in Modern Enterprise Data

    into the database. This is an open research topic of interest. Big data and its related emerging technologies have been changing the way e-commerce and e-services operate and have been opening new frontiers in business analytics and related research [6]. Big data analytics systems play a big role in the modern enterprise management

  7. Unified Data Warehousing & Analytics

    Abstract. This paper argues that the data warehouse architecture as we know it today will wither in the coming years and be replaced by a new architectural pattern, the Lakehouse, which will (i) be based on open direct-access data formats, such as Apache Parquet, (ii) have first-class support for machine learning and data science, and (iii) offer state-of-the-art performance.

  8. Recent Advances and Research Problems in Data Warehousing

    Current research has led to new developments in all aspects of data warehousing; however, there are still a number of problems that need to be solved to make data warehousing effective. In this paper, we discuss recent developments in data warehouse modelling, view maintenance, and parallel query processing.

  9. Data

    The extract, transform, and load (ETL) process is at the core of data warehousing architectures. As such, the success of data warehouse (DW) projects is essentially based on the proper modeling of the ETL process. As there is no standard model for the representation and design of this process, several researchers have made efforts to propose modeling methods based on different formalisms, such ...

  10. 17235 PDFs

    In computing, a data warehouse (DW, DWH), or an enterprise data warehouse (EDW), is a database used for reporting and data analysis.

  11. Exploring the research landscape of data warehousing ...

    Moreover, we could visualize the key person in each specific research area related to the data warehouse and big data mining. 5. Conclusion. This study aimed to identify the knowledge structure and research topics and trends of the DaWaK Conference papers using the Springer data.

  12. A Data Warehouse Approach for Business Intelligence

    Abstract: In a cloud-based data warehouse (DW), business users can access and query data from multiple sources and geographically distributed places. Business analysts and decision makers count on DWs especially for data analysis and reporting. Temporal and spatial data are two factors that seriously affect decision-making and marketing strategies, and many applications require modelling ...

  14. Data Warehouse with Big Data Technology for Higher Education

    It is possible to implement a data warehouse for a typical university information system [8]. An academic data warehouse supports the decisional and analytical activities regarding the three major components in the university context: didactics, research, and management [9]. The data warehouse has an important role in educational data analysis [10].

  15. Comprehensive survey on data warehousing research

    Various issues and challenges in the field of data warehousing are presented in many studies during the recent years. In this paper, a comprehensive survey is presented to take a holistic view of the research trends in the fields of data warehousing. This paper presents a systematic division of work of researchers in the fields of data warehousing.

  16. (PDF) Data Warehouse Concept and Its Usage

    Abstract. A data warehouse is a repository for all data which is collected by an organization in various operational systems; it can be either physical or logical. It is a subject oriented ...

  17. An empirical study on data warehouse systems effectiveness: the case of

    This paper develops a theoretical model specific to the data warehouse system domain that builds on the DeLone and McLean model. The model is empirically tested by means of structural equation modelling applying the partial least squares approach and using data collected in a survey questionnaire from 127 respondents at Jordanian banks.

  18. Research data warehouse best practices: catalyzing national data

    This focus issue of JAMIA describes some of the current research, approaches, applications, and best practices for RPDRs, comprising 11 research and applications papers 12-22 and 4 case reports 23-26 (see Table 1). Ten of the papers describe RPDRs, and 5 describe governance, regulatory and technical issues related to RPDRs. The scope of the papers ranges from a single site to regional to US ...

  19. data mining Latest Research Papers

    Epidemic diseases can be extremely dangerous with its hazarding influences. They may have negative effects on economies, businesses, environment, humans, and workforce. In this paper, some of the factors that are interrelated with COVID-19 pandemic have been examined using data mining methodologies and approaches.

  20. Big Data and New Data Warehousing Approaches

    newly emerged types of data, which are usually characterized by 4Vs, but also lately by 7Vs [4]: volume - the amounts of data are vast; variety - there is a great number of data formats and ...

  21. Research of Data Warehouse for Science and Technology Management System

    A core work of the science and technology management system is to support the integration and utilization of massive data from distributed systems using data warehouse technology. In this paper, we focus on this work. First, we introduce the background of science and technology management by illustrating the scheme of project management business flows. Then, to define the science and ...

  22. (PDF) TRENDS IN DATA WAREHOUSING TECHNIQUES

    The Big Data Warehouse (BDW) is a scalable, high-performance system that uses Big Data techniques and technologies to support mixed and complex analytical workloads (e.g., streaming analysis ...
