Big Data in Medical Applications and Health Care

Big Data can unify all patient-related data to provide a 360-degree view of the patient for analyzing and predicting outcomes. It can improve clinical practices, new drug development and health care financing processes. It offers many benefits, such as early disease detection, fraud detection and better health care quality and efficiency. This paper introduces the concept and characteristics of Big Data, health care data and several major issues of Big Data, including its benefits and its applications and opportunities in medical areas and health care. Progress in Big Data methods and technology is presented, and the challenges of Big Data in medical applications and health care are also discussed.


Big Data Concept and Characteristics
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of conventional database architectures (Dumbill, 2013). Big data characteristics can be described by the "6Vs": Volume, Velocity, Variety, Value, Variability and Veracity (Russom, 2011; Eaton et al., 2012; O'Reilly Radar Team, 2012; Zikopoulos et al., 2012; Bellini et al., 2013; Demchenko et al., 2013; Megahed and Jones-Farmer, 2013; Minelli et al., 2013; Rajpathak and Narsingpurkar, 2013):
• Volume: This refers to data size, such as Terabytes (TB: approximately 10^12 bytes), Petabytes (PB: approximately 10^15 bytes) and Zettabytes (ZB: approximately 10^21 bytes), etc.
• Velocity: Data is generated at high speed.
• Variety: This covers all types of data, such as structured data from relational tables, semi-structured data from key-value web clicks and unstructured data from email messages, articles and streamed video and audio, etc.
• Value: This is defined by the added value that the collected data can bring, that is, the value that the data adds to creating knowledge. There is some valuable information somewhere within the data.
• Variability: This refers to changes in the data during processing and over its lifecycle. Increasing variety and variability also increases the attractiveness of data and its potential to provide unexpected, hidden and valuable information.
• Veracity: This includes two aspects: data consistency (or certainty) and data trustworthiness. Data can be in doubt because of incompleteness, ambiguity, deception and uncertainty due to data inconsistency, etc.
There is often noisy data or false information in big data. The focus of Big Data is on correlations, not causality (Bottles and Begoli, 2014). In addition, the data we consider big today may not be considered big tomorrow because of advances in data processing, storage and other system capabilities (Zaslavsky et al., 2012).

Health Care Data
Big data in healthcare refers to electronic health data sets so large and complex that they are difficult to manage with traditional or common data management methods and traditional software and/or hardware (Priyanka and Kulennavar, 2014). Some health care data are characterized by a need for timeliness; for example, data generated by wearable or implantable biometric sensors, such as blood pressure or heart rate, often need to be collected and analyzed in real time (Helm-Murtagh, 2014).
Data in healthcare can be categorized as follows:

Genomic Data
It refers to genotyping, gene expression and DNA sequence (Chen et al., 2012;Priyanka and Kulennavar, 2014).

Clinical Data and Clinical Notes
About 80% of this type of data consists of unstructured documents, images and clinical or transcribed notes (Yang et al., 2014):
• Structured data (e.g., laboratory data, structured EMR/EHR)
• Unstructured data (e.g., post-op notes, diagnostic testing reports, patient discharge summaries, unstructured EMR/EHR and medical images such as radiological images and X-ray images)
• Semi-structured data (e.g., data copied and pasted from other structured sources)

Behavior Data and Patient Sentiment Data

Health Publication and Clinical Reference Data
Text-based publications (journal articles, clinical research and medical reference material), clinical text-based reference practice guidelines and health product data (e.g., drug information) (Miller, 2012; Priyanka and Kulennavar, 2014).

Administrative, Business and External Data
• Insurance claims and related financial data, billing and scheduling (Terry, 2013)
• Biometric data: fingerprints, handwriting and iris scans, etc.

Other Important Data

Benefits of Big Data in Medical Applications and Health Care
Effective large-scale analysis often requires collecting heterogeneous data from multiple sources. For example, obtaining a 360-degree view of the health of a patient (or a population) benefits from integrating and analyzing the medical health record together with environmental data available on the Internet, and even with readings from multiple types of meters (for example, glucose meters, heart meters and accelerometers) (Jagadish et al., 2014).
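To make the integration step concrete, the following Python sketch merges per-patient records from three hypothetical sources (an EHR extract, meter readings and environmental data) into one view keyed by patient ID. The field names, the `build_patient_view` function and all values are illustrative assumptions, not part of any cited system; real integration would also have to reconcile identifiers, units and timestamps across sources.

```python
from collections import defaultdict

# Hypothetical sample data; field names and values are illustrative only.
ehr_records = [
    {"patient_id": "P001", "age": 54, "diagnoses": ["E11"]},
    {"patient_id": "P002", "age": 61, "diagnoses": ["I10"]},
]
device_readings = [
    {"patient_id": "P001", "device": "glucose_meter", "value": 142.0},
    {"patient_id": "P001", "device": "glucose_meter", "value": 156.0},
    {"patient_id": "P002", "device": "heart_meter", "value": 78.0},
    {"patient_id": "P999", "device": "accelerometer", "value": 0.3},  # no EHR entry
]
environment = {"P001": {"air_quality_index": 42}, "P002": {"air_quality_index": 87}}


def build_patient_view(ehr, readings, env):
    """Merge records from several sources into one dict per patient, keyed by patient_id."""
    view = {r["patient_id"]: {**r, "readings": defaultdict(list), "environment": {}} for r in ehr}
    for reading in readings:
        patient = view.get(reading["patient_id"])
        if patient is not None:  # skip readings that cannot be linked to an EHR entry
            patient["readings"][reading["device"]].append(reading["value"])
    for patient_id, env_data in env.items():
        if patient_id in view:
            view[patient_id]["environment"] = dict(env_data)
    for patient in view.values():
        patient["readings"] = dict(patient["readings"])  # plain dict for output
    return view


view = build_patient_view(ehr_records, device_readings, environment)
print(view["P001"]["readings"])  # {'glucose_meter': [142.0, 156.0]}
```

Readings that cannot be matched to a known patient are dropped here; a real pipeline would more likely queue them for identity resolution.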
Applying advanced analytics to patient profiles and characteristics, and to the cost and outcomes of care, can help identify the most clinically effective and cost-effective treatments and proactively identify individuals who would benefit from preventive care or lifestyle changes. Big Data could help reduce waste and inefficiency in the following three areas (Manyika et al., 2011): … (3) turn large amounts of data into actionable information.
Big Data benefits in medical applications and health care can be summarized as follows (Helm-Murtagh, 2014; Raghupathi and Raghupathi, 2014):
(1) Improved health outcomes through more accurate and precise diagnoses, identification of patients at risk of adverse outcomes, and customization of care at the level of the individual patient (personalized medicine).
(2) Reduced costs through earlier detection of disease, elimination of unnecessary and duplicate care, reduction in variations in care, and elimination of erroneous and improper claims submissions.
(3) Predicting and managing obesity and health risks, and detecting health care fraud more quickly and efficiently (certain developments or outcomes may be predicted and/or estimated from vast amounts of historical data).
(4) Decreasing inappropriate Emergency Department (ED) utilization by using statistical models to identify ED services or care alternatives that are more appropriate, more convenient and lower in cost, based on health conditions, prior use of health care resources (e.g., having a primary care provider) and proximity to sites of care.
(5) Providing advantages to Health Informatics. Big Data allows more test cases or more features for research, leading both to quicker validation of studies and to the ability to accrue enough instances for training. Big Data approaches have been used to analyze Health Informatics data gathered at multiple levels, including the molecular, tissue, patient and population levels. The amount of data produced within Health Informatics has grown to be vast, and Big Data analytics offers great potential for gaining knowledge in this field.

Applications and Opportunities of Big Data in Medical Applications and Health Care
Big Data can support all aspects of health care. Big Data analytics has gained traction in genomics, epidemic spread prediction, clinical outcomes, fraud detection, pharmaceutical development and personalized patient care, and there are further potential applications in these areas. The specific applications of Big Data in these areas are as follows.

Genomics Analytics
Genomic data are becoming critical to the complete patient record. Combining a patient's genomic data with clinical data can help guide cancer treatment (Chen et al., 2012; Priyanka and Kulennavar, 2014).

Flu Outbreak Prediction and Control
In public and population health, continuously aggregating and analyzing public health data helps detect and manage potential disease outbreaks. Big Data analytics can mine web-based and social media data to predict flu outbreaks based on consumer search, social content and query activity (Priyanka and Kulennavar, 2014).
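A minimal illustration of query-based outbreak flagging, assuming weekly counts of flu-related search queries are already available: a week is flagged when its count exceeds the mean plus `z` standard deviations of the preceding baseline window. This simple threshold rule, the function name `flag_outbreak_weeks` and the synthetic counts are assumptions for illustration; the cited work does not specify this method, and production systems use considerably more elaborate models.

```python
from statistics import mean, stdev


def flag_outbreak_weeks(weekly_counts, baseline_weeks=8, z=2.0):
    """Return indices of weeks whose count exceeds mean + z * stdev of the preceding window.

    baseline_weeks must be at least 2, because stdev needs two or more values.
    """
    flagged = []
    for i in range(baseline_weeks, len(weekly_counts)):
        window = weekly_counts[i - baseline_weeks:i]
        threshold = mean(window) + z * stdev(window)
        if weekly_counts[i] > threshold:
            flagged.append(i)
    return flagged


# Synthetic weekly counts of flu-related queries (made-up numbers, not real data).
counts = [100, 104, 98, 101, 99, 103, 97, 102, 100, 180, 240]
print(flag_outbreak_weeks(counts))  # [9, 10]
```

Using a sliding baseline lets the threshold adapt to slow seasonal drift, at the cost of adapting partly to the outbreak itself once it enters the window.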

Clinical Outcome Analytics
Clinical analytics can be performed by unifying clinical, financial and operational data to support efficient clinical decisions. Blue Cross and Blue Shield of North Carolina, USA, has provided several promising examples of how Big Data can be used to reduce the cost of care, predict and manage health risks and improve clinical outcomes (Helm-Murtagh, 2014).

Fraud Detection and Prevention
Fraud can be identified, predicted and minimized by using advanced analytic systems that detect fraud and check the accuracy and consistency of claims. Health care payers can use Big Data predictive modeling for fraud prevention. Fraud, waste and abuse analytics can be applied to claims and benefits data, for example to detect fraud involving veterans' benefits and education benefits (White, 2014; Raghupathi and Raghupathi, 2014).
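Checking the accuracy and consistency of claims can start with simple rules. The sketch below flags possible duplicate claims (same patient, provider, procedure and date) and amounts that are non-positive or above an assumed maximum per procedure. The claim fields, the hypothetical procedure codes `PROC-A`/`PROC-B`, the thresholds and the `check_claims` function are all illustrative; predictive fraud models would typically be layered on top of such rules.

```python
from collections import Counter

# Hypothetical claims; codes, amounts and rules are illustrative assumptions.
claims = [
    {"claim_id": "C1", "patient_id": "P1", "provider": "D1", "code": "PROC-A", "date": "2014-03-01", "amount": 120.0},
    {"claim_id": "C2", "patient_id": "P1", "provider": "D1", "code": "PROC-A", "date": "2014-03-01", "amount": 120.0},
    {"claim_id": "C3", "patient_id": "P2", "provider": "D2", "code": "PROC-A", "date": "2014-03-02", "amount": 950.0},
    {"claim_id": "C4", "patient_id": "P3", "provider": "D2", "code": "PROC-B", "date": "2014-03-02", "amount": -5.0},
    {"claim_id": "C5", "patient_id": "P4", "provider": "D1", "code": "PROC-B", "date": "2014-03-03", "amount": 250.0},
]

# Assumed upper bound on a plausible charge for each hypothetical procedure code.
MAX_EXPECTED_AMOUNT = {"PROC-A": 200.0, "PROC-B": 300.0}


def claim_key(claim):
    """Fields that, if identical, suggest the same service was billed twice."""
    return (claim["patient_id"], claim["provider"], claim["code"], claim["date"])


def check_claims(claims, max_expected):
    """Return {claim_id: [problems]} for claims that fail simple consistency rules."""
    key_counts = Counter(claim_key(c) for c in claims)
    issues = {}
    for c in claims:
        problems = []
        if key_counts[claim_key(c)] > 1:
            problems.append("possible duplicate")
        if c["amount"] <= 0:
            problems.append("non-positive amount")
        elif c["amount"] > max_expected.get(c["code"], float("inf")):
            problems.append("amount above expected maximum")
        if problems:
            issues[c["claim_id"]] = problems
    return issues


for claim_id, problems in check_claims(claims, MAX_EXPECTED_AMOUNT).items():
    print(claim_id, problems)
```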

Medical Device Design and Manufacturing
Big Data tools enable a broader set of anatomical configurations, device materials, delivery methods and tissue interactions to be evaluated.Computational methods and Big Data can play an important role in medical device design and manufacturing (Erdman and Keefe, 2013).

Personalized Patient Care
Healthcare is moving from a disease-centered model toward a patient-centered model. In a disease-centered model, physicians' decision making is centered on clinical expertise and data from medical evidence and various tests. In a patient-centered model, patients actively participate in their own care and receive services focused on their individual needs and preferences. The patient-centered model creates a personalized disease risk profile, as well as a disease management plan and a wellness plan, for each individual. Personalized healthcare is a data-driven approach. With the increasing use of electronic medical records, Big Data will help bring about proactive and personalized patient care (Chawla and Davis, 2013). In the near future, new linkages derived from big data will prompt timely updates of patient triage, diagnostic assistance and clinical guidelines, allowing more precise and personalized treatment that improves clinical outcomes for patients (Yang et al., 2014).

E-Consultation and Tele-Diagnosis
In the future, aggregated ECGs and images from hospitals worldwide will become big data, which should be used to develop an e-consultation program that helps on-site practitioners deliver appropriate treatment. Real-time tele-consultation and tele-diagnosis of ECGs and images can be practiced via an e-platform for clinical, research and educational purposes. Big Data analytics can predict over 50% of deaths, with fewer false positives than traditional ECG analysis based on a smaller segment of ECG signals (Hsieh et al., 2013).

Pharmaceuticals and Medicine
The ability of pharmaceutical companies to continue bringing new life-saving and life-enhancing medicines to patients in a timely yet cost-effective manner will depend on their ability to manage the big data generated during all phases of pharmaceutical development. Integrating clinical, health care, patent, safety and public research data through Big Data analytics will provide key insights for decision making in target selection and lead optimization for drug discovery (Schultz, 2013).

Medical Education
Visual analytics was explored as a tool for finding ways of representing big data from the medical curriculum of an undergraduate medical program. The aims were to: … (2) determine the role of data at the lowest level of a course and in the overall picture of the medical program; (3) perceive and analyze the curriculum in terms of whether knowledge, skills and attitudes are constructed through the alignment of teaching methods and assessment with learning outcomes; and (4) perform gap analysis by comparing the different states in which data can be found to identify possible discrepancies.

Smart Health and Wellbeing
Business Intelligence and Analytics (BI&A) and the related field of Big Data analytics have become increasingly important in the business communities.Table 1 (Chen et al., 2012) summarizes some BI&A features and capabilities in smart health and wellbeing, including applications, data characteristics, analytics and potential impacts.
Big Data has brought great opportunities in medical applications and health care. Big Data applications will expand to more areas (such as telemedicine and digital hospitals), further improve medical services and deliver extensive value-based care. These applications and opportunities require technological support.

Methods and Technology Progress in Big Data
In the healthcare/medical field, a large amount of information about patients' medical histories, symptomatology, diagnoses and responses to treatments and therapies is collected. Data mining techniques can be applied to derive knowledge from these data, either to identify new and interesting patterns in infection control data or to examine reporting practices. Moreover, predictive models can be used as detection tools that exploit the Electronic Patient Record (EPR) accumulated for each person in the area (Bellini et al., 2013).
For Big Data healthcare systems, the Hadoop-MapReduce framework is uniquely capable of storing a wide range of healthcare data types, including electronic medical records, genomic data, and financial and claims data, and it offers higher scalability, reliability and availability than traditional Database Management Systems (DBMS). In addition, intelligent functional modules, such as specialized machine-learning algorithms for image analysis and recognition, diagnosis, surveillance, detection and notification, can be built on top of it (Ngufor and Wojtusiak, 2013).
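The MapReduce programming model can be illustrated without a cluster. The following single-process Python sketch simulates the map, shuffle and reduce phases to count diagnosis codes across hypothetical patient records; it does not use Hadoop's actual (Java) API, and the record format is an assumption. On Hadoop, the framework would distribute the map and reduce calls across nodes and perform the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (diagnosis_code, 1) pair for each code in one patient record."""
    return [(code, 1) for code in record["diagnoses"]]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (done by the framework between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key, here by summing the counts."""
    return key, sum(values)


# Hypothetical records; on Hadoop these would be read from input splits stored in HDFS.
records = [
    {"patient_id": "P1", "diagnoses": ["E11", "I10"]},
    {"patient_id": "P2", "diagnoses": ["I10"]},
    {"patient_id": "P3", "diagnoses": ["E11", "I10", "J45"]},
]

mapped = chain.from_iterable(map_phase(r) for r in records)
code_counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(code_counts)  # {'E11': 2, 'I10': 3, 'J45': 1}
```

Because each map call sees only one record and each reduce call sees only one key, both phases can run in parallel, which is what gives the model its scalability.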
Figure 1 shows a general framework of big data and big data analytics.
To create an automatic lesion diagnostic model, an automatic breast cancer diagnostic model was built using a pattern recognition algorithm and big data mining techniques. Data mining is the process of discovering useful correlations hidden in large quantities of data and extracting information that can be used in decision making. The Support Vector Machine (SVM) algorithm, which is the most frequently used and offers high accuracy, was also presented in detail. A machine learning algorithm takes input data, establishes criteria for categorization and then predicts the category of new data as they are input (Lee and Lee, 2014).
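To show what training a linear SVM involves, here is a from-scratch sketch using the Pegasos stochastic sub-gradient method on made-up two-feature data. It is not the model of Lee and Lee (2014): the features, data, `lam` and `n_iter` values, and the function names are assumptions, and features are mean-centered so that no separate bias term is needed. In practice a library implementation such as scikit-learn's `SVC` or `LinearSVC` would normally be used instead.

```python
import random


def center(X, means=None):
    """Subtract per-feature means (computed from X unless given) so no bias term is needed."""
    if means is None:
        means = [sum(col) / len(X) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X], means


def train_linear_svm(X, y, lam=0.01, n_iter=2000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)); labels must be +1/-1.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        # Gradient step for the regularizer shrinks w ...
        w = [(1.0 - eta * lam) * wj for wj in w]
        # ... plus a hinge-loss step when example i is inside the margin.
        if margin < 1.0:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (e.g. suspicious lesion) or -1 (e.g. benign) from the sign of <w, x>."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Made-up two-feature data (e.g. scaled lesion size and irregularity); not real patient data.
X_raw = [[3.0, 3.2], [3.5, 2.8], [4.0, 3.9], [3.2, 4.1],
         [1.0, 0.8], [1.5, 1.2], [0.7, 1.5], [1.2, 0.5]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

X, means = center(X_raw)
w = train_linear_svm(X, y)
print([predict(w, x) for x in X] == y)  # True
new_points, _ = center([[4.5, 4.5], [0.5, 0.5]], means)
print([predict(w, x) for x in new_points])  # [1, -1]
```

New inputs must be centered with the training means, which is why `center` returns them; this mirrors the "establish criteria, then predict the category of new input" workflow described above.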
Visual analytics is an area of synergistic research with big data that conceptualizes the output of complex processes through intuitive graphical means. Metrics dashboarding, real-time interactive visualization and giga-node graph exploration are examples of appropriate visualization solutions for big data. Unstructured data need to be converted into analysis-ready datasets, which requires comprehensive workflows within big data solutions. Considering the structure of the end-data models is vital for the visualization process (Schultz, 2013).
Visual analytics combines data analysis and manipulation techniques, information and knowledge representation and human cognitive strength to perceive and recognize visual patterns (Vaitsis et al., 2014).
Visual analytics supports Big Data by providing interactive visualizations that allow people to navigate these datasets. Visual analytics has been defined as "the science of analytical reasoning facilitated by interactive visual interfaces" (Thomas and Cook, 2006). Big Data enabled by cloud technologies could provide new insights clinically, operationally and in research (Shrestha, 2014). Storage-as-a-service cloud computing provides hospitals with big data storage capacity based on their specific demands at a low cost. In cardiology, cloud computing technology and mobile teleconsultation should be combined, because mobile teleconsultation requires high-speed data delivery and a big data center where data can be delivered, stored, retrieved and managed securely (Hsieh et al., 2013).
Besides general cloud infrastructure services (storage, compute, infrastructure/VM management), the following services are required to support Big Data (Turk, 2012):
• Cluster services
• Hadoop-related services and tools
• Specialist data analytics tools (logs, events, data mining, etc.)
• Databases/servers (SQL, NoSQL)
• MPP (Massively Parallel Processing) databases
• Registries, indexing/search, semantics, namespaces
• Security infrastructure (access control, policy enforcement, confidentiality, trust, availability, privacy)
Organizations have used various methods of de-identification (anonymization, pseudonymization, encryption, key-coding, data sharing) to distance data from personal identities and preserve individuals' privacy. De-identification has been viewed as an important protective measure under the data security and accountability principles. Yet, over the past few years, computer scientists have repeatedly shown that even anonymized data can typically be re-identified and associated with specific individuals. In other words, de-identified data is a temporary state rather than a stable category (Tene and Polonetsky, 2013).
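Pseudonymization and key-coding can be sketched with a keyed hash: the same identifier always maps to the same pseudonym, so records can still be linked, but the mapping cannot be recomputed without the key. The key, the record fields and the 16-character truncation below are illustrative assumptions. Note that quasi-identifiers such as ZIP code and birth year are left untouched, which is exactly the kind of residual information that re-identification attacks exploit.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be generated randomly and stored apart from the data.
SECRET_KEY = b"example-key-do-not-use"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Key-coding: replace an identifier with a truncated HMAC-SHA256 pseudonym."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


record = {"mrn": "123-45-678", "name": "Jane Doe", "zip": "84101",
          "birth_year": 1950, "diagnosis": "E11"}

# Drop direct identifiers and add a pseudonymous key that still allows record linkage.
deidentified = {k: v for k, v in record.items() if k not in {"mrn", "name"}}
deidentified["pid"] = pseudonymize(record["mrn"])
print(deidentified)
```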

Challenges of Big Data in Medical Applications and Health Care
Large volume, velocity and variety of big data have brought big challenges in data storage, curation, retrieval, and visualization.Variability and veracity of big data indicate data instability and uncertainty, which often makes Big Data analytics difficult.
Major challenges of Big Data in medical applications and healthcare are as follows:
(1) Data in many health care providers, specifically hospitals, are often segmented or siloed. Clinical data such as patient history, vital signs, progress notes and diagnostic test results are stored in the EHR, while quality and outcomes data such as surgical site infections, rates of return to surgery and patient falls are held in the quality or risk management departments. Standards for validating, consolidating and processing data are needed (White, 2014).
(2) It is difficult to aggregate and analyze unstructured data, which include test results, scanned documents, images and progress notes in patients' EHRs, etc. (White, 2014). Efficiently handling large volumes of medical imaging data, extracting potentially useful information and biomarkers, and understanding unstructured clinical notes in the right context are challenges (Priyanka and Kulennavar, 2014).
(3) Analyzing genomic data is a computationally intensive task, and combining it with standard clinical data adds further layers of complexity (Priyanka and Kulennavar, 2014).
(4) An emerging data source is telemetry from patient-owned devices and information entered by patients. The challenge of Big Data becomes even greater when telemetry from automated monitoring devices is included. Such data could include subjective symptom scores (pain, mood and mobility), patient-reported outcomes, and device telemetry such as weight, activity, glucose, blood pressure and pulse oximetry (Halamka, 2014). The capture, indexing and processing of continuously streaming (and possibly annotated) fine-grained temporal data is a challenge (Schultz, 2013).
(5) Big Data's focus on correlations rather than causality is difficult for physicians biased toward the biomedical model, where the focus is on finding the cause of a disease in order to treat it effectively. Big data means more information, but there is often noisy data or false information (Bottles and Begoli, 2014).
(6) Privacy issues under the Health Insurance Portability and Accountability Act (HIPAA) are often cited as barriers to collecting big data (Warner, 2013). In telecardiology and tele-consultation, data confidentiality in the cloud, data interoperability among hospitals, and network latency and accessibility are challenges (Hsieh et al., 2013).
(7) Even if patient privacy can be protected, many health care providers are reluctant to share data because of market competition. It is difficult to determine the proper balance between protecting patients' information and maintaining the integrity and usability of the data. Open access, integration and standardization of readable and usable data are challenges (White, 2014).
(8) Data breaches by hackers have become more damaging in the era of big data, and data leakage can be costly. In March 2012, hackers broke into the database of Utah's Department of Health and downloaded personal data of 780,000 patients (Social Security Numbers were downloaded for 280,000 of them) (Schmitt et al., 2013). Biometrics such as fingerprints help improve information security and protect against data leakage. However, it is almost impossible to guarantee complete data security.
(9) Both providers and payers pointed to resource shortfalls, such as staffing, budget and infrastructure, as major barriers to the adoption of Big Data. A lack of infrastructure, and of policies, standards and practices that make the most of big data in healthcare, was also cited as a concern (Bulletin Board, 2014).
(10) De-identification is the process by which personally identifiable information is removed from health information so that the data cannot be linked back to the individual in any way. HIPAA outlines two methods for de-identifying information: Safe Harbor and Expert Determination. The ability to gather and analyze de-identified data is essential to driving down costs and improving quality, yet concerns exist that data cannot truly be fully de-identified (Warner, 2013).
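The Safe Harbor method in item (10) can be partially illustrated in code. The sketch below handles only a few of the 18 identifier categories: it drops some direct identifiers, keeps only the first three ZIP code digits (replaced by `000` when the prefix area has 20,000 or fewer people), converts the birth date to an age, reduces the admission date to its year, and aggregates ages over 89 into a single "90+" category. The field names, the placeholder restricted-prefix set and the `safe_harbor_sketch` function are hypothetical, and the code is not a compliant de-identification implementation.

```python
from datetime import date

# Only a few of the 18 Safe Harbor identifier categories are handled here.
DIRECT_IDENTIFIERS = {"name", "ssn", "mrn", "phone", "email", "street_address"}

# Placeholder: 3-digit ZIP prefixes whose combined population is 20,000 or fewer.
# The real list must be derived from current Census data; "999" is not taken from it.
RESTRICTED_ZIP3 = {"999"}


def age_on(birth: date, today: date) -> int:
    """Whole years between birth and today."""
    return today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))


def safe_harbor_sketch(record: dict, today: date) -> dict:
    """Return a partially de-identified copy of record (the input is not modified)."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # ZIP code: keep the first three digits unless the prefix area is too small.
    zip3 = out.pop("zip")[:3]
    out["zip3"] = "000" if zip3 in RESTRICTED_ZIP3 else zip3
    # Dates: keep only an age (ages over 89 become "90+") and the admission year.
    age = age_on(out.pop("birth_date"), today)
    out["age"] = "90+" if age > 89 else age
    out["admission_year"] = out.pop("admission_date").year
    return out


record = {"name": "Jane Doe", "mrn": "A-1001", "zip": "84101",
          "birth_date": date(1920, 6, 1), "admission_date": date(2014, 3, 15),
          "diagnosis": "I10"}
print(safe_harbor_sketch(record, today=date(2015, 1, 1)))
# {'diagnosis': 'I10', 'zip3': '841', 'age': '90+', 'admission_year': 2014}
```

Even a fully rule-compliant output can remain re-identifiable when combined with other data sets, which is the concern raised in the last sentence of item (10).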
Technical challenges of Big Data, such as data integration, data visualization and information security, will be overcome with advances in computer science, scientific computation and other disciplines. Other challenges, such as standards, data privacy and ownership, data sharing and cross-disciplinary collaboration, require policy support from agencies and governments. It is important and necessary to consolidate e-Infrastructures as persistent platforms to ensure continuity in Big Data.

Conclusion and Future Research
Big Data is based on data obtained from the whole process of diagnosis and treatment of each case. Big Data analytics can perform predictive modeling to determine which patients are most likely to benefit from a care management plan, and it is moving forward quickly in population health and quality measurement. Big Data offers many benefits, such as disease prevention, fewer medical errors, the right care at the right time and better medical outcomes. In addition, Big Data can improve Research and Development (R&D) and the translation of new therapies. Big data has great potential to improve medicine and to guide clinicians in delivering value-based care.
Big Data also poses challenges in medical applications and healthcare. These include consolidating and processing segmented or siloed data, aggregating and analyzing unstructured data, indexing and processing continuously streaming data, privacy, data leakage, information security, and the lack of infrastructure and unified standards.
Most of the above challenges can become topics of future research, such as aggregating and analyzing unstructured health care data, indexing and processing continuously streaming data, medical data confidentiality and interoperability, health care data security, and e-Infrastructures as persistent platforms for health care big data. The authors of this paper will focus on Big Data in medical sensor data and streaming data processing, privacy-preserving data mining in health care, sentiment analysis of medical big data, and personalization and behavioral modeling.

Table 1. BI&A and Big Data in smart health and wellbeing