Big Data, Smart Data and Imbalanced Classification - Preprocessing, Models and Challenges
Francisco Herrera, University of Granada, Spain
Assimilated Learning - Bridging the Gap between Big Data and Smart Data
Yi-Ke Guo, Independent Researcher, United Kingdom
Challenges on Big data based Clouds Health-Care for Risk Predictions based on Ensemble Classifiers and Subjective Analysis
Hamido Fujita, Iwate Prefectural University, Japan
Big Data, Smart Data and Imbalanced Classification - Preprocessing, Models and Challenges
Francisco Herrera
University of Granada
Spain
http://decsai.ugr.es/~herrera
Brief Bio
Francisco Herrera (SM'15) received his M.Sc. in Mathematics in 1988 and Ph.D. in Mathematics in 1991, both from the University of Granada, Spain. He is currently a Professor in the Department of Computer Science and Artificial Intelligence at the University of Granada. He has been the supervisor of 42 Ph.D. students. He has published more than 400 journal papers that have received more than 62000 citations (Google Scholar, H-index 125). He is coauthor of the books "Genetic Fuzzy Systems" (World Scientific, 2001), "Data Preprocessing in Data Mining" (Springer, 2015), "The 2-tuple Linguistic Model. Computing with Words in Decision Making" (Springer, 2015), "Multilabel Classification. Problem analysis, metrics and techniques" (Springer, 2016), and "Multiple Instance Learning. Foundations and Algorithms" (Springer, 2016). He currently acts as Editor in Chief of the international journals "Information Fusion" (Elsevier) and "Progress in Artificial Intelligence" (Springer). He serves as an editorial board member of a dozen journals. He has received the following honors and awards: ECCAI Fellow 2009, IFSA Fellow 2013, 2010 Spanish National Award on Computer Science ARITMEL to the "Spanish Engineer on Computer Science", International Cajastur "Mamdani" Prize for Soft Computing (Fourth Edition, 2010), IEEE Transactions on Fuzzy Systems Outstanding Paper Award for 2008 and 2012 (bestowed in 2011 and 2015 respectively), 2011 Lotfi A. Zadeh Prize Best Paper Award of the International Fuzzy Systems Association, 2013 AEPIA Award to a scientific career in Artificial Intelligence, 2014 XV Andalucía Research Prize Maimónides (by the regional government of Andalucía), 2017 Security Forum I+D+I Prize, and 2017 Andalucía Medal (by the regional government of Andalucía). He has been selected as a Highly Cited Researcher (http://highlycited.com/) in the fields of Computer Science and Engineering, respectively, from 2014 to present (Clarivate Analytics).
His current research interests include, among others, soft computing (including fuzzy modeling, evolutionary algorithms and deep learning), computing with words, information fusion and decision making, and data science (including data preprocessing, prediction and big data).
Abstract
Big Data applications have been emerging in recent years, and researchers from many disciplines are aware of the significant advantages of extracting knowledge from this type of problem. To cope with the scale of the data involved, the MapReduce framework has arisen as a "de facto" solution. Basically, it carries out a "divide-and-conquer" distributed procedure in a fault-tolerant way that is suited to commodity hardware.
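The divide-and-conquer idea behind MapReduce can be illustrated with a minimal single-process sketch. This is not the framework itself: it runs sequentially, has no fault tolerance, and all function names (`split`, `map_phase`, `reduce_phase`, `class_counts`) and the toy data are hypothetical choices made for illustration only.

```python
from collections import Counter
from functools import reduce


def split(records, n_parts):
    """Divide the input into n_parts chunks (the 'divide' step)."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_phase(chunk):
    """Each (simulated) worker counts class labels in its own chunk only."""
    return Counter(label for _, label in chunk)


def reduce_phase(left, right):
    """Merge two partial counts into one."""
    return left + right


def class_counts(records, n_parts=4):
    """Run the map step on every chunk, then fold the partial results."""
    partials = [map_phase(chunk) for chunk in split(records, n_parts)]
    return reduce(reduce_phase, partials, Counter())


# Toy data set: every tenth record belongs to the minority class.
records = [(i, "minority" if i % 10 == 0 else "majority") for i in range(100)]
totals = class_counts(records)
```

In a real deployment the map calls would run on separate machines and the framework would re-run failed tasks; here the list comprehension merely stands in for that parallel step.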
Learning with imbalanced data refers to the scenario in which the numbers of instances that represent the concepts in a given problem follow different distributions. The main issue when addressing such a learning problem is that the accuracy achieved for each class also differs. This situation occurs because the learning process of most classification algorithms is often biased towards the majority class examples, so that minority ones are not well modeled in the final system. Since this is a very common scenario in real-life applications, the interest of researchers and practitioners in the topic has grown significantly in recent years. As this is still a recent discipline, little research has been conducted on imbalanced classification for Big Data. The main reasons are the difficulties in adapting standard techniques to the MapReduce programming style. Additionally, intrinsic problems of imbalanced data, namely lack of data and small disjuncts, are accentuated when the data is partitioned to fit the MapReduce programming style.
In this talk we will pay attention to the imbalanced big data classification problem. We will analyze the current state of research in this area and the behavior of standard preprocessing techniques in this particular framework, and we will discuss the challenges and future directions for the topic.
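Two of the points above, the gap between overall and per-class accuracy and the thinning of the minority class under partitioning, can be seen in a small sketch. The data, the always-majority classifier, and the helper `per_class_recall` are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
def per_class_recall(y_true, y_pred):
    """Fraction of each class's instances that were predicted correctly."""
    recall = {}
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recall[label] = hits / len(idx)
    return recall


# 5 minority (1) and 95 majority (0) instances.
y_true = [1] * 5 + [0] * 95
# A classifier biased towards the majority class: it always predicts 0.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)

# Splitting the data into 10 partitions leaves at most one minority
# instance per partition, and none at all in half of them.
partitions = [y_true[i::10] for i in range(10)]
minority_per_partition = [sum(part) for part in partitions]
```

Overall accuracy looks high while the minority class is never recognized, and each partition sees even fewer minority examples than the full data set does.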
Assimilated Learning - Bridging the Gap between Big Data and Smart Data
Yi-Ke Guo
Independent Researcher
United Kingdom
Brief Bio
Yike Guo is a Professor of Computing Science in the Department of Computing at Imperial College London. He is the founding Director of the Data Science Institute at Imperial College, as well as leading the Discovery Science Group in the department. Professor Guo also holds the position of CTO of the tranSMART Foundation, a global open source community using and developing data sharing and analytics technology for translational medicine.
Professor Guo received a first-class honours degree in Computing Science from Tsinghua University, China, in 1985 and received his PhD in Computational Logic from Imperial College in 1993 under the supervision of Professor John Darlington. He founded InforSense, a software company for life science and health care data analysis, and served as CEO for several years before the company's merger with IDBS, a global advanced R&D software provider, in 2009.
He has been working on technology and platforms for scientific data analysis since the mid-1990s, where his research focuses on knowledge discovery, data mining and large-scale data management. He has contributed to numerous major research projects including: the UK EPSRC platform project, Discovery Net; the Wellcome Trust-funded Biological Atlas of Insulin Resistance (BAIR); and the European Commission U-BIOPRED project. He is currently the Principal Investigator of the European Innovative Medicines Initiative (IMI) eTRIKS project, a €23M project that is building a cloud-based informatics platform, in which tranSMART is a core component for clinico-genomic medical research, and co-Investigator of Digital City Exchange, a £5.9M research programme exploring ways to digitally link utilities and services within smart cities.
Professor Guo has published over 200 articles, papers and reports. Projects he has contributed to have been internationally recognised, including winning the “Most Innovative Data Intensive Application Award” at the Supercomputing 2002 conference for Discovery Net, and the Bio-IT World "Best Practices Award" for U-BIOPRED in 2014. He is a Senior Member of the IEEE and is a Fellow of the British Computer Society.
Abstract
The importance of combined analysis of big and smart data has been well recognized, and ample research has been conducted with a focus on “data integration” or “data fusion”. However, the aforementioned imbalance in size, context and semantic richness makes integration at the data level hard and unsustainable. Although remarkable progress has been made in studying the interaction of big and smart data and in exploiting the advantages of both to mutually enhance their analysis, we still lack a systematic study and a uniform approach for the joint analysis of both data types. In this talk, we introduce Assimilated Learning, in which smart data and big data are co-collected in a bi-directionally guided way and co-analysed with a bi-directional transfer learning mechanism.
Challenges on Big data based Clouds Health-Care for Risk Predictions based on Ensemble Classifiers and Subjective Analysis
Hamido Fujita
Iwate Prefectural University
Japan
http://www.fujita.soft.iwate-pu.ac.jp
Brief Bio
Dr. Hamido Fujita is a professor at Iwate Prefectural University (IPU), Iwate, Japan.
He is the director of Intelligent Software Laboratory.
He worked at Tohoku University as a visiting Professor in the late eighties, then joined the University of Tokyo, RCAST, as an Associate Professor, and then moved to Canada as a visiting Professor at the University of Montreal.
He then joined Iwate Prefectural University (IPU), Faculty of Software and Information Science, Iwate, Japan, as a professor and head of the Information System Division. He directs two laboratories at IPU, the Intelligent Software Laboratory and the Cognitive Systems Laboratory.
He is also the founder of the SOMET organization.
He has supervised Ph.D. students jointly with the University of Laval and the University of Technology, Sydney (UTS), Australia. He is also a Professor at the University of Laval, Quebec, Canada, supervising graduate students, and was a visiting Professor at the University of Paris 1, Sorbonne, in 2003-2004. He has served as an opponent for Stockholm University, Sweden, and co-supervised students there. He has published books with IOS Press. He has guest edited several special issues of the international journal Knowledge-Based Systems (Elsevier), where he is an editor.
He is currently heading a Virtual Medical Doctor project supported by the Ministry of Interior and Communication of Japan, and a project on Mental Cloning as an intelligent user interface between human users and computers, supported by MEXT (Ministry of Education, Culture, Sports, Science and Technology) of Japan.
Abstract
Discovering patterns from big data attracts a lot of attention because of its importance in finding accurate patterns and features that are used in predictions for decision making.
The challenges in big data analytics are the high dimensionality and complexity of data representation. Granular computing and feature selection are among the challenges in applying big data analytics to decision making. We will discuss these challenges in this talk and provide a new perspective on ensemble learning for health care risk prediction. In decision making, most approaches take objective criteria into account; however, the subjective correlation among different ensembles, provided as a preference utility, also needs to be represented in order to provide additive confidence preferences among them, reducing ambiguity and producing better utility preference measurements for good-quality predictions. Most models in decision support systems assume that criteria are independent. Different types of data (time series, linguistic values, interval data, etc.) pose difficulties for data analytics because the preprocessing and normalization processes are expensive and difficult when data sets are raw and imbalanced. We will highlight these issues through a project applied to health care for the elderly, which merges heterogeneous metrics to provide health care predictions for elderly people at home. We have used ensemble learning as a multi-classification technique on multiple data streams that are collected from multiple sensing devices. Subjectivity (i.e., service personalization) is examined based on correlations between different contextual structures that reflect the framework of personal context, for example in the fashion of nearest-neighbor-based correlation analysis. Incompleteness in some attributes may also affect the approximation accuracy. Attributes whose domains have preference-order relations become one aspect of the ordering properties in rough approximations.
We outline issues in Virtual Doctor Systems and highlight their innovation in interactions with elderly patients; we also discuss these challenges in the granular computing and decision support systems research domains. In this talk I will present the current state of the art, focusing on health care risk analysis, with examples from our experiments.