Quality Data Processing for Deep Learning
Francisco Herrera, University of Granada, Spain
New Predictive Knowledge Discovery from Unstructured Big Data: Images
Bonghee Hong, Pusan National University, Korea, Republic of
Clouds for Real-Time Applications - Scheduling Issues and Research Directions
Eleni Karatza, Aristotle University of Thessaloniki, Greece
Quality Data Processing for Deep Learning
Francisco Herrera
University of Granada
Spain
http://decsai.ugr.es/~herrera
Brief Bio
Francisco Herrera (SM'15) received his M.Sc. in Mathematics in 1988 and his Ph.D. in Mathematics in 1991, both from the University of Granada, Spain. He is currently a Professor in the Department of Computer Science and Artificial Intelligence at the University of Granada. He has supervised 42 Ph.D. students. He has published more than 400 journal papers, which have received more than 62000 citations (Google Scholar, h-index 125). He is a coauthor of the books "Genetic Fuzzy Systems" (World Scientific, 2001), "Data Preprocessing in Data Mining" (Springer, 2015), "The 2-tuple Linguistic Model. Computing with Words in Decision Making" (Springer, 2015), "Multilabel Classification. Problem analysis, metrics and techniques" (Springer, 2016) and "Multiple Instance Learning. Foundations and Algorithms" (Springer, 2016). He currently serves as Editor-in-Chief of the international journals "Information Fusion" (Elsevier) and "Progress in Artificial Intelligence" (Springer), and he is an editorial board member of a dozen journals. He has received the following honors and awards: ECCAI Fellow 2009, IFSA Fellow 2013, the 2010 Spanish National Award on Computer Science ARITMEL to the "Spanish Engineer on Computer Science", the International Cajastur "Mamdani" Prize for Soft Computing (Fourth Edition, 2010), the IEEE Transactions on Fuzzy Systems Outstanding 2008 and 2012 Paper Awards (bestowed in 2011 and 2015, respectively), the 2011 Lotfi A. Zadeh Prize Best Paper Award of the International Fuzzy Systems Association, the 2013 AEPIA Award to a scientific career in Artificial Intelligence, the 2014 XV Andalucía Research Prize Maimónides (by the regional government of Andalucía), the 2017 Security Forum I+D+I Prize, and the 2017 Andalucía Medal (by the regional government of Andalucía). He has been selected as a Highly Cited Researcher (http://highlycited.com/) in the fields of Computer Science and Engineering (2014 to present, Clarivate Analytics).
His current research interests include, among others, soft computing (including fuzzy modeling, evolutionary algorithms and deep learning), computing with words, information fusion and decision making, and data science (including data preprocessing, prediction and big data).
Abstract
In recent years, deep learning methods, and particularly Convolutional Neural Networks (CNNs), have achieved excellent accuracy in many image and pattern classification problems, among others. Obtaining quality data is the foundation of good data analytics in general, and it is also essential for building a good deep learning model.
Obtaining quality data requires a thorough data preprocessing analysis that adapts the data to the input requirements of each learning algorithm. Data preprocessing is an essential part of any data mining process. In some cases, it focuses on correcting deficiencies that may damage the learning process, such as missing values, noise and outliers, among others. In contrast to classical classification models, the high abstraction capacity of CNNs allows them to work directly on the original high-dimensional space, which reduces the need for manually preparing the input. However, suitable preprocessing is still important to improve the quality of the result. One of the most widely used preprocessing techniques with CNNs, particularly for small image datasets, is data augmentation, which increases the volume of the training dataset by applying several transformations to the original input. Other guided preprocessing procedures are tailored to specific problems, for example adjusting brightness or other image features.
In this talk, we present the connection between deep learning and data-guided preprocessing approaches across all the families of methods used to improve deep learning capabilities, together with some applications.
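As a rough illustration of the data augmentation idea described in this abstract, the following Python sketch enlarges a small image set by adding a horizontally mirrored copy, a 90-degree rotated copy and a brightness-scaled copy of every image. It is not the speaker's method; it assumes NumPy, images stored as an array of shape (N, H, W, C) with square frames and values in [0, 1], and the function name `augment` and the brightness range 0.8 to 1.2 are illustrative choices.

```python
import numpy as np


def augment(images, rng=None):
    """Hypothetical helper: return originals plus flipped, rotated and brightened copies.

    images: array of shape (N, H, W, C) with values in [0, 1].
    Square frames (H == W) are assumed so the 90-degree rotation keeps the shape.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Horizontal mirror: reverse the width axis.
    flipped = images[:, :, ::-1, :]
    # 90-degree rotation in the (height, width) plane.
    rotated = np.rot90(images, k=1, axes=(1, 2))
    # Brightness change: scale each image by its own random factor, then clip to [0, 1].
    factors = rng.uniform(0.8, 1.2, size=(images.shape[0], 1, 1, 1))
    brightened = np.clip(images * factors, 0.0, 1.0)
    # Stack everything along the sample axis: the dataset becomes four times larger.
    return np.concatenate([images, flipped, rotated, brightened], axis=0)


small = np.random.default_rng(1).random((10, 32, 32, 3))
print(augment(small).shape)  # (40, 32, 32, 3)
```

Each transformed copy keeps the label of its original image, so the training set in this sketch grows fourfold; in practice such transformations are often applied randomly during training instead of being materialized in advance.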
New Predictive Knowledge Discovery from Unstructured Big Data: Images
Bonghee Hong
Pusan National University
Korea, Republic of
Brief Bio
Professor Bonghee Hong received his B.S. and M.S. degrees in Computer Engineering from Seoul National University in 1982 and 1984, respectively, and his Ph.D. in Databases from Seoul National University, Seoul, Korea, in 1988. He has been a Professor in the Department of Computer Science and Engineering at Pusan National University, Busan, Korea, since 1987. He works in the areas of databases, stream data processing, big data analytics and processing, and indexing of real-time moving object databases.
His current research projects include real-time streaming data processing of tactical moving objects for military services, and predictive analytics that estimate rainfall and visibility distance on roads from CCTV images for weather services.
His recent papers in the big data field include "Clustering learning model of CCTV image pattern for producing road hazard meteorological information" (FGCS, 2018), "Pattern graph tracking-based stock price prediction using big data" (FGCS, 2018) and "Monte Carlo Simulation-based Traffic Speed Forecasting Using Historical Big Data" (FGCS, 2016). The "Clustering learning model of CCTV image pattern" paper published in FGCS received the Best High Quality Forum Award Certificate at the 3rd IoTBDS conference (2017). He also received the Best Paper Award at the 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (2007). In 2011, he received the Best Academy Award from KIISE (Korean Institute of Information Scientists and Engineers), the oldest and most prominent computer society in Korea. Among his international academic activities, he has served as an associate editor of the Journal of RF Technologies, and he is currently vice chair of the steering committee of the DASFAA conference. DASFAA is on the list of excellent international conferences that the Korean and Chinese governments officially recognize as equivalent to SCI journals. He is also a member of the steering committee of IEEE BigComp. In 2017, he was the chairman of KIISE, one of the largest and best computer societies in Korea. He currently leads the Korean side of a joint proposal with the University of Bremen (Prof. Dr. Walter Lang) to the DFG in Germany for an International Research Training Group (IRTG) on the topic "Reducing Food Waste in Transocean Logistics Using Innovative Sensor Nets".
Abstract
One of the hottest topics in future big data research will be knowledge discovery. A sequence of CCTV images, which is unstructured data, can be regarded as stream data. The image of a leaf can easily be transformed into a graph showing how the distance from the center of the leaf to its edge changes along the perimeter. In a similar way, CCTV images that show rainy, cloudy and sunny weather can be converted into graph patterns. This talk addresses how to discover knowledge for estimating rainfall and visibility by studying the patterns of variation in image sequences treated as time series data. The research hypothesis is that, by learning from past images, we can estimate the rainfall and the visibility distance for CCTV images arriving in real time. Vertical predictive analytics, which learns from past data to predict the future, is more appropriate than horizontal predictive analytics for estimating rainfall. Finally, we discuss whether Monte Carlo simulation can address the problem of overfitting in knowledge discovery.
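The leaf example in this abstract corresponds to what is often called a centroid-distance shape signature. The Python sketch below is an illustration rather than the speaker's implementation: it assumes NumPy and a leaf boundary that has already been extracted (for example by edge detection, not shown) as an ordered array of (x, y) points, and the helper name `centroid_distance_signature`, the use of the mean boundary point as the center, and the default of 64 samples are assumptions.

```python
import numpy as np


def centroid_distance_signature(contour, n_samples=64):
    """Hypothetical helper: turn an ordered closed contour into a 1-D distance curve.

    contour: array of shape (M, 2) holding (x, y) boundary points in order.
    Returns n_samples distances taken at evenly spaced contour indices.
    """
    contour = np.asarray(contour, dtype=float)
    # Assumed "center of the leaf": the mean of the boundary points.
    center = contour.mean(axis=0)
    # Euclidean distance from the center to every boundary point.
    distances = np.linalg.norm(contour - center, axis=1)
    # Resample to a fixed length so signatures of different shapes are comparable.
    idx = np.linspace(0, len(contour) - 1, n_samples).round().astype(int)
    return distances[idx]


theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)
circle = np.stack([10 + 5 * np.cos(theta), 10 + 5 * np.sin(theta)], axis=1)
print(np.allclose(centroid_distance_signature(circle), 5.0))  # True
```

For a circle every sample equals the radius, while a lobed leaf produces peaks and valleys; resampling to a fixed number of points turns each shape into an equal-length series that can be compared or stacked over time.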
Clouds for Real-Time Applications - Scheduling Issues and Research Directions
Eleni Karatza
Aristotle University of Thessaloniki
Greece
Brief Bio
Helen Karatza is a Professor Emeritus in the Department of Informatics at the Aristotle University of Thessaloniki, Greece, where she teaches courses at the postgraduate and undergraduate levels and supervises doctoral and postdoctoral research. Dr. Karatza's research interests include Computer Systems Modeling and Simulation, Performance Evaluation, Grid and Cloud Computing, Energy Efficiency in Large Scale Distributed Systems, Resource Allocation and Scheduling, and Real-time Distributed Systems.
Dr. Karatza has authored or co-authored over 215 technical papers and book chapters, including five papers that earned best paper awards at international conferences. She is a senior member of IEEE, ACM and SCS, and she served as an elected member of the Board of Directors at Large of the Society for Modeling and Simulation International. She has served as chair and keynote speaker at international conferences.
Dr. Karatza is the Editor-in-Chief of the Elsevier journal “Simulation Modeling Practice and Theory” and Senior Associate Editor of the “Journal of Systems and Software” of Elsevier. She was Editor-in-Chief of “Simulation Transactions of The Society for Modeling and Simulation International” and Associate Editor of “ACM Transactions on Modeling and Computer Simulation”. She has served as guest editor of special issues in international journals. More information about her activities and publications can be found at http://agent.csd.auth.gr/~karatza/
Abstract
For several years now, there has been significant research in cloud computing. However, many open challenges remain due to the heterogeneity of cloud resources and the characteristics of the applications executed on such infrastructures. Cloud computing platforms offer an efficient means to run real-time applications. One of the most important aspects of cloud computing is the effective scheduling of complex real-time parallel jobs, with guarantees that deadlines will be met. Furthermore, the energy efficiency of cloud systems is very important, and reducing energy consumption while still meeting deadlines requires adaptive scheduling techniques. In this talk, we will present recent research covering a variety of concepts in scheduling complex real-time jobs in the cloud, and we will outline future research directions.