Big Data & Analytics Summit 2018
How do American Administrations manage Big Data?
Terms like unreliable, incomplete, duplicated, and obsolete are often used to describe government data assets, and it’s not uncommon for data-quality issues to be cited as a significant inhibitor to business analytics and systems modernization initiatives. In Australia, this challenge is magnified by the absence of a whole-of-government identifier, which hampers matching of citizen records across datasets. One might assume that Queensland’s Office of State Revenue (OSR) must’ve spent months cleansing its data in preparation for its machine learning prototype. However, its experience was that predictive algorithms can be applied to imperfect data with decent results. Elizabeth Goli, OSR’s Commissioner, explains: “despite the use of only three internal data sources and the current challenges we have with data quality, the machine learning solution was still able to predict with 71% accuracy the taxpayers that would end up defaulting on their tax payment. What this tells us is that you don’t need to wait for your data to be 100% perfect to apply machine learning.”
Although data cleansing will undoubtedly improve the accuracy of predictions, Ms. Goli observes: “the tool itself will actually become a key enabler in improving the quality of data.” This is due to the machine’s ability to interrogate massive data sets to establish probable linkages, and its ability to autonomously improve the accuracy of its predictions over time. So, while 71% is a good start, OSR expects to improve prediction accuracy to over 90% through the combination of increasing data quality and refinement of the predictive model.
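Ms. Goli credits the machine with establishing “probable linkages” across data sets that share no whole-of-government identifier. One common way to do that is fuzzy record linkage, sketched below with Python’s standard `difflib`. This is not OSR’s method: the field names (`id`, `name`, `dob`, `postcode`), the exact-match blocking on date of birth and the 0.85 similarity threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so formatting noise doesn't count."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each record in `left` with its most similar record in `right`.

    Records are dicts with hypothetical 'id', 'name', 'dob' and 'postcode'
    fields. Only records with the same date of birth are compared
    ("blocking"), which keeps the number of comparisons manageable.
    Returns (left_id, right_id, score) tuples whose score meets `threshold`.
    """
    links = []
    for rec in left:
        best = None
        for cand in right:
            if rec["dob"] != cand["dob"]:
                continue  # blocking: skip candidates born on a different day
            score = similarity(f"{rec['name']} {rec['postcode']}",
                               f"{cand['name']} {cand['postcode']}")
            if best is None or score > best[1]:
                best = (cand["id"], score)
        if best is not None and best[1] >= threshold:
            links.append((rec["id"], best[0], round(best[1], 2)))
    return links


# Two hypothetical extracts with no shared identifier
tax = [{"id": "T1", "name": "Jonathan Smith", "dob": "1980-02-01", "postcode": "4000"},
       {"id": "T2", "name": "Mary Jones", "dob": "1975-07-12", "postcode": "4101"}]
land = [{"id": "L9", "name": "Jonathon  Smith", "dob": "1980-02-01", "postcode": "4000"},
        {"id": "L7", "name": "Mary Jones", "dob": "1990-01-01", "postcode": "4101"}]
print(link_records(tax, land))  # [('T1', 'L9', 0.95)]
```

The misspelled, double-spaced "Jonathon  Smith" still links to "Jonathan Smith", while the two "Mary Jones" records stay apart because their birth dates differ. Production linkage systems typically weight several fields and learn thresholds from labelled pairs; exact-match blocking like this would miss a record whose date of birth was mistyped.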
During his tenure as CFO for the State of Indiana, Chris Atkins observed that “very few governments view data as a strategic asset. It’s usually not managed nearly so well as the government’s money. But it’s just as important for complex problem-solving.” Perhaps part of the reason is that government data – unlike public funding – is abundant (for example, just one use case, on infant mortality, required analysis of 9 billion rows of data). Such vast amounts of data can make it difficult to derive information and insights simply due to the impracticality of traditional disk I/O at such a scale. This is where in-memory data platforms come to the fore, enabling massive data sets to be interrogated within a timeframe that is acceptable for business purposes and workable for predictive analytics scenarios.
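To make the disk I/O point concrete, here is a back-of-envelope estimate of how long a single full pass over the 9 billion rows would take. Only the row count comes from the text; the 100-byte row width and both throughput figures are assumptions, and real platforms narrow the gap with parallel disks, compression and columnar storage.

```python
def scan_seconds(rows: int, bytes_per_row: int, throughput_bytes_per_s: float) -> float:
    """Time to read every row once at a given sequential throughput."""
    return rows * bytes_per_row / throughput_bytes_per_s


ROWS = 9_000_000_000       # the infant-mortality use case cited above
BYTES_PER_ROW = 100        # assumption: a narrow, ~100-byte record
DISK_BPS = 200 * 10**6     # assumption: ~200 MB/s from a single spinning disk
MEMORY_BPS = 10 * 10**9    # assumption: ~10 GB/s of usable memory bandwidth

disk = scan_seconds(ROWS, BYTES_PER_ROW, DISK_BPS)
memory = scan_seconds(ROWS, BYTES_PER_ROW, MEMORY_BPS)
print(f"disk: {disk / 60:.0f} min, memory: {memory:.0f} s")  # disk: 75 min, memory: 90 s
```

Predictive modelling often needs many passes over the data rather than one, so a gap of this order compounds quickly.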
Another benefit of real-time computing is the ability to apply analytics directly to operational systems, enabling users to work with the most up-to-date version of data and to refine their data models dynamically. Mr. Atkins articulates the value of this capability to the business: “real-time data access lets you know with a high degree of certainty that your view of the issues is current and that the decisions you’re making with regard to policy and planning will be best calibrated to address the problems. Without real-time data, you’re managing the problems of yesterday – not today or tomorrow.”
For decades, the status quo has been that operational data is extracted, transformed, and loaded into data warehouses, to which analytical tools are applied and business reports are generated. ETL processes are typically run in batch overnight (often not every night), resulting in business decisions being made based on yesterday’s data (in a best-case scenario). The fundamental reasons for this are that transactional databases are not designed for reporting, and system performance can be impacted by analytical processes. Mr. Atkins describes how this issue manifested during initiation of Indiana’s Management Performance Hub (MPH) project: “the agencies’ first concern was that access to data could not interfere with their operations. After all, we didn’t want to shut down citizen services!”
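The batch ETL pattern can be sketched in a few lines with Python’s built-in `sqlite3`. The `payments` and `daily_totals` tables are hypothetical stand-ins for an operational system and a warehouse. The sketch shows that the warehouse only knows what the last batch run loaded.

```python
import sqlite3


def nightly_etl(source, warehouse, run_date):
    """Extract one day's payments, aggregate them, and load the totals.

    Hypothetical schemas: source.payments(taxpayer_id, amount, paid_on)
    and warehouse.daily_totals(paid_on, taxpayer_id, total).
    Returns the number of summary rows loaded.
    """
    # Extract: one query against the operational database per batch window
    rows = source.execute(
        "SELECT taxpayer_id, amount FROM payments WHERE paid_on = ?", (run_date,)
    ).fetchall()

    # Transform: total the day's payments per taxpayer
    totals = {}
    for taxpayer_id, amount in rows:
        totals[taxpayer_id] = totals.get(taxpayer_id, 0.0) + amount

    # Load: append the summary rows to the reporting store
    warehouse.executemany(
        "INSERT INTO daily_totals (paid_on, taxpayer_id, total) VALUES (?, ?, ?)",
        [(run_date, tid, total) for tid, total in sorted(totals.items())],
    )
    warehouse.commit()
    return len(totals)


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE payments (taxpayer_id TEXT, amount REAL, paid_on TEXT)")
src.executemany("INSERT INTO payments VALUES (?, ?, ?)", [
    ("A", 100.0, "2018-03-01"), ("A", 50.0, "2018-03-01"),
    ("B", 20.0, "2018-03-01"),
    ("B", 99.0, "2018-03-02"),  # next day's payment: invisible until the next run
])
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE daily_totals (paid_on TEXT, taxpayer_id TEXT, total REAL)")

nightly_etl(src, wh, "2018-03-01")
print(wh.execute("SELECT * FROM daily_totals ORDER BY taxpayer_id").fetchall())
```

Until the next run, reports built on `daily_totals` cannot see the 2018-03-02 payment. This is the lag that motivates running analytics directly against operational data.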
But real-time computing is challenging the status quo by enabling analytical processes to be applied to transactional databases without impacting the performance of operational systems. Ms. Goli describes the potential of this capability to transform government service delivery: “machine learning provided the ability to crunch large amounts of data and achieve real-time insight on that data. Visualization through the journey map and risk ratings brought these insights to the forefront, allowing front-line staff to easily consume them and embed them in their day-to-day business processes.”
IDC predicts that by 2019, 15% of government transactions (such as tax collection, welfare disbursement, and immigration control) will have embedded analytics. But there is still cultural resistance to new ways of working with machines. This is largely due to the perception, born out of the Industrial Revolution, that machines will replace people’s jobs. However, the McKinsey Global Institute argues that while 36% of healthcare and social assistance jobs will be subject to some degree of automation, fewer than five percent can be fully automated. In most cases, automation will take over specific tasks, rather than replacing entire jobs, with about 60% of all occupations having at least 30% of constituent activities that could be automated.
Ms. Goli explains that in OSR’s experience, automation has the potential to enhance the working experience: “with the introduction of advances in technology, such as machine learning, people are naturally scared that the machines will ultimately replace their jobs. However, what our prototype showed our staff was that this technology enriches, rather than replaces their jobs. Specifically, our staff can see how machine learning will take a lot of the frustration out of their jobs by enabling them to deal with customers holistically and help them to improve the customer experience.”
To be truly insight-driven, organizations should pull all of those captured numbers and isolated facts together into a single source that can generate a full picture of how every imaginable internal and external factor is impacting the business. Then, and only then, can companies make decisions in the moment and take action immediately.
How can your midsize business make the shift from data-driven to insight-driven quickly and with as little disruption as possible? Adjust the way everyone in your company – from the board room to the shop floor – uses data.
In a report, Forrester Consulting suggests that midsize businesses should consider embracing four fundamental capabilities:
1. Manage data with greater agility and flexibility
By adopting data management technology that goes beyond traditional, highly structured methods, you can consolidate data from every source into a single platform that can be accessed and acted on quickly – anytime, anywhere, and on any device. For everyone in your business, this can be a freeing experience when making on-the-fly decisions that contribute to the customer experience and bottom line.
2. Adopt analytics tools that are agile, flexible, and self-service
Thanks to the latest innovations in predictive and prescriptive analytics, midsize businesses can turn every functional leader and employee into a citizen data scientist, without the intervention of IT or third-party analytics consultants. Self-service analytics tools are now intuitive enough for users to manipulate data and dig deep into the finer details of the problem at hand – even on a mobile device at a customer location.
3. Get well-rounded insights by tapping into all areas of the business
Data volume may have grown in recent years, but most companies are only really using 20%–50% of it to derive insights and make decisions. Your business must combine currently leveraged data with the remaining 50%–80%.
4. Deliver insights that are contextual, actionable, and pervasive
Considering the pace of today’s world, analyzing data without the assistance of smart tools is too time-consuming. By the time a decision is reached, it’s too late: the data is outdated, the risk is already impacting the business, and the opportunity is gone. Analytics that are contextual and that learn from past actions can generate more precise insights faster.
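Consolidating data “from every source into a single platform” often starts with a single customer view keyed on a shared customer ID. The sketch below assumes that such an ID already exists in every source. In practice it often has to be created through record matching. The source names and fields are hypothetical.

```python
from collections import defaultdict


def single_customer_view(*sources):
    """Merge per-customer rows from several sources into one dict per customer.

    Each source is a (source_name, rows) pair, and each row is a dict with a
    'customer_id' key. Fields are prefixed with the source name so that
    same-named fields from different systems don't overwrite each other.
    Within one source, later rows for the same customer win.
    """
    view = defaultdict(dict)
    for source_name, rows in sources:
        for row in rows:
            cid = row["customer_id"]
            for field_name, value in row.items():
                if field_name != "customer_id":
                    view[cid][f"{source_name}.{field_name}"] = value
    return dict(view)


crm = [{"customer_id": 1, "segment": "retail"}]
orders = [{"customer_id": 1, "last_order": "2018-01-05"},
          {"customer_id": 2, "last_order": "2018-02-11"}]
support = [{"customer_id": 1, "open_tickets": 2}]

view = single_customer_view(("crm", crm), ("orders", orders), ("support", support))
print(view[1])
```

Prefixing each field with its source keeps provenance visible. This matters once the same attribute, such as an address, exists in several systems with different values.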
“Metadata management and ensuring data privacy for regulations such as GDPR joins earlier trends like AI and IoT, but the unexpected trend of 2018 will be the convergence of data management technologies,” said Emily Washington, senior vice president of product management at Infogix. “Big data has been the next big technology phenomenon for a long time, but businesses are increasingly evaluating ways to streamline their overall technology stack if they want to successfully leverage big data and analytics to create a better customer experience, achieve business objectives, gain a competitive advantage and ultimately, become market leaders.” The top data trends for 2018 were assembled by business leaders at Infogix who have decades of experience in information technology. The major trends include:
2018: The Year of Converging Data Management Technologies
- Use cases have proven that leveraging data requires a multitude of separate tools for tasks like data quality, analytics, governance, data integration, metadata management and more.
- To extract meaningful insights and increase operational efficacy, businesses will increasingly demand flexible, integrated tools to enable users to quickly ingest, prepare, analyze, act on, and govern data—while easily communicating insights derived.
Increased Importance of Data Governance
- The deluge of data is growing, government regulations are increasing and teams have much greater access to data within an organization. Add to this the increasing need to leverage advanced analytics, and data governance has become more critical than ever.
- Data governance capabilities have evolved to provide complete transparency into a business’s data landscape, allowing organizations to combat increasingly complex regulatory and compliance demands and the shifting tides of business policies and business alignment.
The Continued Rise of the Chief Data Officer (CDO)
- In today’s data-intensive environment, a CDO is more important than ever to navigate regulatory demands, successfully leverage data and manage enterprise-wide governance.
- A CDO helps businesses manage unstructured and unpredictable data, while successfully leveraging advanced analytics and maximizing the value of data assets across the business enterprise.
Ensuring Data Privacy for Regulations such as the General Data Protection Regulation (GDPR)
- When GDPR goes into effect in May 2018, it will strengthen and unify data protection rules for all organizations processing the personal data of European Union (EU) residents.
- Through analytics-enabled data governance, a business can not only locate personal data enterprise-wide, but also monitor compliance, usage, approvals, and accountability across the organization.
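Locating personal data across an enterprise typically begins with automated scanning. This minimal sketch flags table columns whose values match simple email or phone-number patterns. The patterns, table names and columns are assumptions. GDPR’s definition of personal data is far broader than anything two regular expressions can catch.

```python
import re

# Illustrative patterns only; they will miss many kinds of personal data
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d \-]{7,}\d"),
}


def find_personal_data(tables):
    """Return sorted (table, column, pii_type) triples for columns holding PII.

    `tables` maps a table name to a list of row dicts. A column is flagged
    if any of its string values matches one of PII_PATTERNS.
    """
    hits = set()
    for table, rows in tables.items():
        for row in rows:
            for column, value in row.items():
                if not isinstance(value, str):
                    continue  # only text values are scanned in this sketch
                for pii_type, pattern in PII_PATTERNS.items():
                    if pattern.search(value):
                        hits.add((table, column, pii_type))
    return sorted(hits)


tables = {  # hypothetical extracts from two systems
    "crm": [{"contact": "anna@example.com", "note": "call back"}],
    "orders": [{"phone": "+44 20 7946 0958", "sku": "A-100"}],
}
print(find_personal_data(tables))
```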
The Proliferation of Metadata Management
- Metadata management is a growing trend for 2018. Metadata, or “data about data,” contains the information necessary to understand and effectively use data, such as business definitions, valid values, lineage, and more.
- By organizing metadata into ontologies, organizations can understand the relationships between data sets and make data easier to discover. Metadata management is critical in enterprise data environments to support data governance, regulatory compliance and data management demands.
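The kinds of metadata named in the trend (business definitions, valid values and lineage) can be captured in a very small catalog. This is a hypothetical sketch, not any vendor’s product. `ColumnMetadata` and `Catalog` are illustrative names, and the lineage walk assumes the dependency graph has no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """'Data about data' for one column: definition, valid values, lineage."""
    name: str
    definition: str
    valid_values: set = field(default_factory=set)    # empty set = unconstrained
    derived_from: list = field(default_factory=list)  # upstream column names


class Catalog:
    """A toy metadata catalog keyed by fully qualified column name."""

    def __init__(self):
        self.columns = {}

    def register(self, meta):
        self.columns[meta.name] = meta

    def invalid_values(self, name, values):
        """Return values outside the column's documented valid values."""
        allowed = self.columns[name].valid_values
        return [v for v in values if allowed and v not in allowed]

    def lineage(self, name):
        """Return every upstream column `name` depends on, depth-first.

        Assumes the lineage graph is acyclic.
        """
        upstream = []
        for parent in self.columns[name].derived_from:
            upstream.append(parent)
            if parent in self.columns:
                upstream.extend(self.lineage(parent))
        return upstream


cat = Catalog()
cat.register(ColumnMetadata("raw.status_code", "Status code from billing", {"A", "C"}))
cat.register(ColumnMetadata("crm.status", "Customer status shown to staff",
                            {"active", "closed"}, ["raw.status_code"]))
cat.register(ColumnMetadata("report.active_customers", "Count of active customers",
                            derived_from=["crm.status"]))
print(cat.invalid_values("crm.status", ["active", "suspended", "closed"]))
print(cat.lineage("report.active_customers"))
```

Lineage answers the governance question "which sources feed this report?", and valid values turn a business definition into a check that can run automatically.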
The Monetization of Data Assets
- Organizations recognize that data is either a liability or an asset. Metadata can be used to enable a deeper understanding of the most valuable information.
- We are seeing more organizations using a combination of logical, physical, and conceptual metadata to classify data sets based on their importance, and businesses can apply a numerical value to each data classification, effectively monetizing it.
The Future of Prediction: Predictive Analytics to Improve Data Quality
- With continued concerns about data quality and growing data volumes, businesses are enhancing data quality anomaly detection with machine-learning algorithms.
- By using historical patterns to predict future data quality outcomes, businesses can dynamically detect anomalies in data that might otherwise have gone unnoticed or been found only much later through manual intervention.
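Using historical patterns to judge new data can be illustrated with a rolling z-score, a simple statistical stand-in for the machine-learning detectors the trend describes. Each day’s row count is compared with the mean and standard deviation of the preceding week. The seven-day window and the three-standard-deviation threshold are assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(daily_counts, window=7, threshold=3.0):
    """Return indices of days that deviate sharply from the preceding `window` days.

    The history defines what 'normal' looks like; a value more than
    `threshold` sample standard deviations from the historical mean is
    flagged. `window` must be at least 2 for stdev to be defined.
    """
    anomalies = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is unusual
            if daily_counts[i] != mu:
                anomalies.append(i)
            continue
        if abs(daily_counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily row counts for a feed; day 8 is a partial load
counts = [1000, 1010, 990, 1005, 995, 1002, 998, 1001, 420, 1003]
print(flag_anomalies(counts))  # [8]
```

The anomalous day then inflates the next window’s standard deviation. Learned models and robust statistics such as the median absolute deviation are common ways to reduce that effect.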
IoT Becoming More Real
- Each passing year brings an increase in the number of connected devices generating data, and a growing focus on extracting insights from that data.
- We are starting to see more and more well-defined IoT use cases that leverage data from newer connected devices, such as sensors and drones, for analytics initiatives. With this, there is a growing demand for streaming data ingestion and analysis.
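Streaming ingestion means processing readings as they arrive instead of after a batch load. The generator below keeps a time-based sliding window over (timestamp, value) pairs and emits a running average for each reading. It is a single-process sketch of what stream-processing engines do at scale. It assumes readings arrive in timestamp order and that `window_s` is positive.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_average(readings: Iterable[Tuple[float, float]],
                    window_s: float) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, mean of readings in the last `window_s` seconds).

    Readings are consumed one at a time and must arrive in timestamp order.
    The window covers the half-open interval (ts - window_s, ts].
    """
    window = deque()
    total = 0.0
    for ts, value in readings:
        window.append((ts, value))
        total += value
        # Evict readings that have fallen out of the time window
        while window[0][0] <= ts - window_s:
            _, old = window.popleft()
            total -= old
        yield ts, total / len(window)


# Hypothetical sensor feed: (seconds, temperature)
feed = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]
for ts, avg in rolling_average(feed, window_s=3):
    print(ts, avg)
```

Keeping a running total means each reading costs constant work on average, however long the stream runs.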
“As more data is generated through technologies like IoT, it becomes increasingly difficult to manage and leverage. Integrated self-service tools deliver an all-inclusive view of a business’s data landscape to draw meaningful, timely conclusions,” said Washington. “Full transparency into a business’s data assets will be crucial for successful analytics initiatives, addressing data governance and privacy needs, monetizing data assets and more as we move into 2018.”
The Wall Street Journal: MIT Team Uses Big Data, IoT to Speed Up ‘Last Mile’ Deliveries
CAMBRIDGE, Mass.–High-tech logistics systems have quickened the delivery of goods from manufacturing hubs to big-city markets in recent years. But speeding up the so-called last mile, from a local distribution center to a retailer or a customer’s home, has remained a challenge, especially in crowded urban centers. That’s a crucial hurdle, since the last mile of delivery routes tends to be the slowest and least cost-effective, according to Matthias Winkenbach, director of the Massachusetts Institute of Technology’s Megacity Logistics Lab, an initiative of the MIT Center for Transportation & Logistics.
It’s also where big-data analytics and the Internet of Things can be a powerful resource, Dr. Winkenbach told chief information officers, supply chain managers and other attendees at an MIT supply chain management R&D conference on Wednesday. “More and more companies are sitting on tons of data, but they don’t know what to do with it, or how to understand it,” he told CIO Journal. The MIT Megacity Logistics Lab team is trying to rectify that. It has worked with Anheuser-Busch InBev NV, the global brewery, and B2W, an e-commerce firm based in São Paulo, Brazil. The team’s former director, Edgar Blanco, was recently hired by Wal-Mart Stores Inc. as senior director of strategy and innovation, in order to apply the lab’s last-mile data analytics.
Until recently, Dr. Winkenbach said, gauging the efficiency of shipping routes has been limited to knowing when a package left a given depot, how far it travelled and the amount of time or fuel consumed in getting it there. But thanks to the consumerization of IT tools through smartphones, GPS-enabled devices, and IoT sensors and scanners, as well as the emergence of a fast, mobile Internet to collect and transmit large amounts of data from anywhere, shippers can now have a near-complete view of a given delivery route at any point in time, he said. That’s driving a hot new market for data-driven software firms that can help companies offer same-day deliveries. United Parcel Service Inc. in February led a $28 million funding round for Deliv Inc., one of a growing crop of last-mile delivery startups vying for accounts with retailers, restaurants and grocery stores. Deliv says it has roughly 4,000 retailers using its service, including Kohl’s Corp. and Macy’s Inc.
Dr. Winkenbach said data-collecting tools can be used to better track the progress of delivery vehicles and inform route planning by identifying patterns in delivery times. But they can also provide “transactional data” in the form of a clearer picture of what happens between a delivery truck and a customer’s doorstep, he said.
Many shippers want to know why some drop-offs take much longer than others, an area Dr. Winkenbach calls the “black box” of delivery data, since for years so little off-vehicle data was available. He said geospatial data shows that longer doorstep stops often occur in the most densely populated parts of a city, where many people live in high-rise apartments. That means delivery workers are struggling to park, walking more or farther after parking, and climbing stairs when they get there.
Beyond that, so-called “crew traces” from smartphones, GPS and other geo-locating sources connected to delivery workers can reveal key customer behavior (customers who chronically aren’t home, for instance) that is rarely factored into route planning, he said. Together, all this data can be used to create better delivery training programs and more efficient routes, and to help companies determine the best type of delivery vehicle. Sometimes, for instance, multiple short-route deliveries on smaller vehicles, including bicycles, make more sense than bulk deliveries in large trucks.
In most cases, Dr. Winkenbach said, his data shows that deliveries in big cities are improved by creating multi-tiered systems with smaller distribution centers spread out across several neighborhoods, or simply pre-designated parking spots in garages or lots from which smaller vehicles can take packages the rest of the way. One variable he has yet to crack is weather. “It’s a challenge to get accurate data on weather,” he said. “And all it takes is a big rainstorm and deliveries slow way down.”
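The doorstep analysis Dr. Winkenbach describes can be sketched with a few aggregations over stop records. The record layout (`zone`, `customer`, `minutes`, `delivered`) and the thresholds for a "chronically absent" customer are hypothetical. The sketch shows how median off-vehicle time per zone and repeated failed attempts can be derived once the data is collected.

```python
from collections import defaultdict
from statistics import median


def stop_insights(stops, absent_share=0.5, min_visits=3):
    """Summarize doorstep stops from hypothetical crew-trace records.

    Each stop is a dict with 'zone', 'customer', 'minutes' (time spent off
    the vehicle) and 'delivered' (False when nobody was home).
    Returns (median off-vehicle minutes per zone, sorted list of customers
    with at least `min_visits` visits who were absent on at least
    `absent_share` of them).
    """
    minutes_by_zone = defaultdict(list)
    visits = defaultdict(lambda: [0, 0])  # customer -> [visits, failed attempts]
    for stop in stops:
        minutes_by_zone[stop["zone"]].append(stop["minutes"])
        visits[stop["customer"]][0] += 1
        if not stop["delivered"]:
            visits[stop["customer"]][1] += 1
    zone_medians = {zone: median(m) for zone, m in minutes_by_zone.items()}
    chronic = sorted(c for c, (n, failed) in visits.items()
                     if n >= min_visits and failed / n >= absent_share)
    return zone_medians, chronic


stops = [
    {"zone": "high-rise", "customer": "c1", "minutes": 9, "delivered": True},
    {"zone": "high-rise", "customer": "c2", "minutes": 12, "delivered": False},
    {"zone": "suburb", "customer": "c3", "minutes": 2, "delivered": True},
    {"zone": "suburb", "customer": "c3", "minutes": 3, "delivered": True},
    {"zone": "high-rise", "customer": "c2", "minutes": 11, "delivered": False},
    {"zone": "high-rise", "customer": "c2", "minutes": 10, "delivered": True},
]
print(stop_insights(stops))
```

Outputs like these could feed route planning, for example by scheduling a chronically absent customer for a locker or evening slot. The sketch itself does no routing.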
Microsoft’s new Azure SQL Data Warehouse
As reported by the website Top Tech News: Most businesses are only able to crunch and analyze a small percentage of the massive amounts of data they have available, but several new offerings from Microsoft are aimed at changing that. Announced Wednesday during Microsoft’s Build 2015 developers conference in San Francisco, the new tools and services offer cloud-based storage and computing power for organizations chasing big data intelligence.
Microsoft’s new Azure SQL Data Warehouse will let businesses add “unlimited compute power” to questions such as, “how will discounts affect inventories and margins?” A new service on top of the company’s existing Azure cloud platform, SQL Data Warehouse turns relational data warehousing into a pay-as-you-go, as-a-service offering. It is expected to roll out as a public preview in June. Also unveiled on Wednesday was Azure Data Lake, a “nearly infinite” data repository service that allows organizations to store, process and analyze exabytes of both structured and unstructured data. Would-be users are being invited to sign up to be notified when the service becomes available as a public preview. “Most businesses make decisions on a fraction of the data available to them, and this often leads to incorrect conclusions that can cost companies billions,” Scott Guthrie, Executive Vice President of Microsoft’s Cloud and Enterprise Group, said in a blog post on Wednesday. “But we believe that businesses should be able to derive insights from all of their data, no matter where it is stored, what format it is in and [no] matter how big that data is.”
The new Azure SQL Data Warehouse “offers developers the industry’s first enterprise-grade data warehouse that supports petabytes of data and scales compute separately from storage,” Guthrie said. He added the service will offer customers a 75 percent savings over cloud data warehouse offerings. Guthrie also pointed to a number of other additions Microsoft is making to its Azure cloud platform. Previewed during the Build conference, a new elastic database pool technology will let users of Azure SQL Database “easily manage hundreds to thousands of separate databases per client as a single scalable service,” Guthrie noted. Microsoft also introduced Visual Studio Code and a new Office Graph API, demonstrated Docker support for Linux and Windows Server, and previewed releases of .NET Core runtimes for Linux, Windows and Mac OS X.
Big data: the future of logistics
As reported by the website Luxembourg DELANO: “I am convinced there are opportunities for developing the logistics sector by using ICT,” said Étienne Schneider, Luxembourg’s deputy prime minister. But “data alone does not generate opportunity. We need to turn them into smart data.” He was speaking at the “Big data: the future of logistics” conference, organised by KPMG, the Luxembourg-Poland Business Club and Poland’s embassy to the Grand Duchy. “Big data refers to the idea that society can do things with a large body of data that weren’t possible when working with smaller amounts,” according to The Economist. Originally it was applied to fields such as astrophysics and automated translation. “Now it refers to the application of data-analysis and statistics in new areas, from retailing to human resources,” the publication explained. Noting that logistics is a big part of the Luxembourg government’s economic growth programme, Schneider said in his speech that big data had the potential to solve problems from supply chain management to road congestion. And with rising concerns over privacy and hacking, Luxembourg offers high security standards for handling sensitive data, while still allowing high efficiency, driven by its experience in the financial sector, argued Schneider.
He spoke in favour of further linking the multimodal transit centres in Luxembourg and Poland, meaning Luxembourg exports could more efficiently access eastern Europe via Poland and Luxembourg could serve as a hub for Polish goods headed towards countries such as France and Spain. Schneider, who is also the economy and defence minister, said that during a recent mission to Poland he was impressed by the level of the IT skills of recent university graduates.
One advantage of big data for a smaller economy is that a country does not have to be big to take advantage of it, said Bartosz Jałowiecki, Poland’s ambassador to the Grand Duchy. Both countries are well positioned to profit from big data’s benefits since both have flexible economies and outlooks, in the ambassador’s view. Pascal Denis of KPMG cited studies showing the amount of data is doubling every year. But what to do with the mounds of information? Piotr Reichert of Comarch, a Polish firm which provides IT services to large organisations, said its Luxembourg unit has used big data findings to develop an entirely new supply chain finance product, for instance. Payment delays “for small suppliers” are “sometimes a killer”. So the system matches sellers who have buyers with banks and has been used to settle more than 300m invoices in 12 countries since 2008, he said. Rafał Markiewicz of InPost, founded as a private competitor to state-run Polish Post, said his firm provides automated parcel distribution machines (and is currently testing automated laundromat drop-off and pick-up machines) in 21 countries from the Baltics to Chile. InPost has based its finance unit in Luxembourg because electronic payments are an important part of e-commerce in many markets. More than 90% of e-commerce customers pay online in the UK, compared with roughly 10% in Russia, according to Markiewicz.
His colleague Maciej Jaroszuk-Rozycki said big data helps the company learn how to better manage delivery to its outlets, “because there is a limited amount of locker spaces”. One location in a trendy part of Warsaw needs to be restocked six times a day, he said. In the UK around half of parcels are picked up after 4pm and roughly one third are collected over the weekend, for example.
Michael Kreutzmeyer explained that his firm, Luxembourg-based Dematic, provides hardware and software for automating warehouses, including 50 Amazon distribution centres. Big data helps plan staffing for seasonal peaks in workload such as the winter holiday season. In addition, he observed a new trend that consumers will now order, say, ten pairs of shoes of different colours and sizes, and then return the nine they do not like. Big data helps a company like Amazon restock its warehouses so those items can be resold quickly. That is important because, citing an old retailing maxim, “What you can’t sell today, you can’t sell tomorrow”. Francis Castelin of Transalliance, a transport firm that moved its headquarters from the French city of Nancy to Luxembourg in 2010, said technology has the potential to fundamentally shift how the sector is viewed by customers. “Big data could help logistics become an investment and not a cost”.
Unlocking Big data
4 lessons for every entrepreneur creating Big Data Solutions
(BILL SCHMARZO, CONTRIBUTOR, Chief Technology Officer, EMC Global Services)
I recently taught an MBA course at the University of San Francisco titled the “Big Data MBA.” In working with the students to apply Big Data concepts and techniques to their use cases, I came away with a few observations that could be applied by any entrepreneur.
1. Understand the customer’s problem.
To ensure that your solution adds value, start by conducting extensive primary and secondary research on both the problem and the value of the solution. To understand your targeted customers, develop personas early in the process that serve as the “face of the customer.” Document the types of questions they ask and decisions they make. Then use the resulting insight to identify and prioritize data sources that you could be capturing about your customers, products and operations based upon business value and ease of implementation.
2. Understand how your product fits in the customer’s environment.
Companies have big investments in their data and technology environments. They will not be easily persuaded to toss out that investment. Instead, figure out how your solution can leverage or extend your targeted customers’ existing data and technology investments. Data, analytics, reports, dashboard tools and even SQL are strategic organizational assets. Explore ways to extend or free up those assets with new big data technologies, products and capabilities. By adding $1 now, they can free up or add $10 of value to their existing investments, such as Business Intelligence and data warehousing. That’s always a winning strategy!
3. Build upon open source and cloud technologies.
There is a compelling suite of open source technologies, many supported by the Apache Software Foundation, that are free, scalable and that allow organizations to quickly develop and get products to market. These technologies include:
- Hadoop, a programming framework that supports the processing of large data sets in a distributed computing environment.
- Spark, an in-memory open-source cluster-computing framework that provides performance up to 100 times faster for in-memory analysis and applications.
- YARN, which enables multiple data processing engines on top of Hadoop such as interactive SQL, real-time streaming, and advanced analytics, along with the traditional MapReduce batch processing.
- Mahout, a suite of scalable machine learning algorithms focused primarily on collaborative filtering, clustering and classification.
- HBase, a column-oriented database management system that runs on top of HDFS; very useful for sparse data sets, which are common in many big data use cases.
- Hive, an open-source data warehouse system for querying and analyzing large datasets stored in Hadoop files.
- R, a free software programming language and software environment for statistical computing and graphics.
You should stand on the shoulders of those who have already built solutions to create your own unique and compelling differentiation. Leverage open source products and the cloud for your development environment. Light your hair on fire to get initial prototypes out to market as quickly as possible. Heavily instrument your product so that you have details about how customers are trying to use it. Learn and evolve quickly. Speed is everything, with customer service a close second.
4. Provide a compelling, short payback ROI.
Help organizations find new ways to monetize their data and analytic assets. Focus on the business stakeholders by providing products and solutions that help them optimize their key business processes. To accomplish this, develop an initial ROI for your product and use it as compelling evidence for your customers to test, try and buy your product. Empowering front-line employees to deliver new services to customers is a great way to monetize your data and analytic assets and drive that ROI. If you don’t know how your product makes your targeted customers money, then don’t expect them to figure it out on their own. It’s very encouraging to see MBA students empowered to be bold and brave in creating new business opportunities. And as I told my MBA class, the days of the MBA and business leaders delegating data and analytic decisions to IT are over. Business leaders (and MBA students) need to start owning these new sources of monetization, and the time is now.
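Several of the tools in lesson 3 (Hadoop MapReduce, and Spark, which generalizes the same idea) split a job into map, shuffle and reduce steps that run in parallel across a cluster. The classic word-count example below imitates those steps in plain, single-process Python. It does not use the Hadoop or Spark APIs, and the input strings are made up. In PySpark the same job is typically written with `flatMap`, `map` and `reduceByKey` on an RDD.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


splits = ["big data needs big tools", "Spark and Hadoop process big data"]
# On a cluster, each split would go to its own mapper running in parallel
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(counts["big"], counts["data"])  # 3 2
```

Because map and reduce only see their own inputs, a framework can spread them over many machines. The shuffle is the step that moves data between those machines.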
Techcrunch: How Big Data Will Transform Our Economy
(Reading TechCrunch) The great Danish physicist Niels Bohr once observed that “prediction is very difficult, especially if it’s about the future.” Particularly in the ever-changing world of technology, today’s bold prediction is liable to prove tomorrow’s historical artifact. But thinking ahead about wide-ranging technology and market trends is a useful exercise for those of us engaged in the business of partnering with entrepreneurs and executives who are building the next great company. Moreover, let’s face it: gazing into the crystal ball is a time-honored, end-of-year parlor game. And it’s fun. So in the spirit of the season, I have identified five big data themes to watch in 2015. As a marketing term or industry description, big data is so omnipresent these days that it doesn’t mean much. But it is pretty clear that we are at a tipping point. The global scale of the Internet, the ubiquity of mobile devices, the ever-declining costs of cloud computing and storage, and an increasingly networked physical world create an explosion of data unlike anything we’ve seen before. The creation of all of this data isn’t as interesting as the possible uses of it. I think 2015 may well be the year we start to see the true potential (and real risks) of how big data can transform our economy and our lives.
Data-driven decision tools are not only the domain of businesses but are now helping Americans make better decisions about the school, doctor or employer that is best for them. Similarly, companies are using data-driven software to find and hire the best employees or choose which customers to focus on.
But what happens when algorithms encroach on people’s privacy, their lifestyle choices and their health, and get used to make decisions based on their race, gender or age, even inadvertently? Our schools, companies and public institutions all have rules about privacy, fairness and anti-discrimination, with government enforcement as the backstop. Will privacy and consumer protection keep up with the fast-moving world of big data’s reach, especially as people become more aware of the potential encroachment on their privacy and civil liberties?
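One widely used way to check whether an algorithm’s decisions disadvantage a group, even inadvertently, is to compare selection rates between groups. The sketch computes the disparate impact ratio behind the “four-fifths rule” from US federal employee-selection guidelines. The decisions and group labels are hypothetical. A low ratio is a screening signal, not proof of discrimination.

```python
def selection_rate(decisions, groups, group):
    """Share of members of `group` that received a positive (1) decision."""
    picked = [d for d, g in zip(decisions, groups) if g == group]
    return sum(picked) / len(picked)


def disparate_impact(decisions, groups, protected, reference):
    """Protected group's selection rate divided by the reference group's.

    Under the four-fifths rule, a ratio below 0.8 is commonly treated as a
    signal of possible adverse impact that warrants closer review.
    """
    return (selection_rate(decisions, groups, protected)
            / selection_rate(decisions, groups, reference))


decisions = [1, 0, 1, 1, 1, 0, 0, 1, 0, 0]  # hypothetical model outputs (1 = approved)
groups = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]
ratio = disparate_impact(decisions, groups, protected="b", reference="a")
print(round(ratio, 2), "below 0.8" if ratio < 0.8 else "at or above 0.8")
```

Such a check needs the protected attribute to be recorded, even when the model itself never uses it. That creates a tension with privacy.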
With over $1.2 trillion spent annually on public K-12 and higher education, and with student performance failing to meet the expectations of policy makers, educators and employers, we are still debating how to fix American education. Some reformers hope to apply market-based models, with an emphasis on testing, accountability and performance; others hope to elevate the teaching profession and trigger a renewed investment in schools and resources. Both sides recognize that digital learning, inside and outside the classroom, is an unavoidable trend. From Massive Open Online Courses (MOOCs) to adaptive learning technologies that personalize the delivery of instructional material to the individual student, educational technology thrives on data. From names that you grew up with (McGraw Hill, Houghton Mifflin, Pearson) to some you didn’t (Cengage, Amplify), companies are making bold investments in digital products that do more than just push content online; they’re touting products that fundamentally change how and when students learn and how instructors evaluate individual student progress and aid their development. Expect more from this sector in 2015. Now that we’ve moved past mere adoption to implementation and utilization, 2015 will undoubtedly be big data’s break-out year.
Big data and the cloud join forces
According to a report on the website www.tendencias21.net, two ICT initiatives have recently dominated the technology headlines, promising to revolutionize computing, business practice, education and almost every field of knowledge one can think of.
On the one hand, as the IMDEA Networks Institute, a research institute of the Community of Madrid, explains on its website, Big Data (also called mass or large-scale data) is an emerging paradigm for managing volumes of information beyond the capabilities of the technology that supports traditional databases. On the other hand, cloud computing has emerged as a paradigm for distributed computing systems whose goal is to offer software as a service over the Internet. Cloud computing offers a flexible and highly scalable delivery model that can withstand the storage and computing demands of Big Data infrastructure. Together, the two technologies offer a huge range of data to explore and analyze meaningfully, as well as a growing range of services and resources with applications in any field affected by ICT innovation and development.
Both cloud computing and Big Data are maturing rapidly and their use is spreading, but, IMDEA explains, a determined effort is needed to create a holistic environment in which both can thrive and reach their full potential. The scientific objective of Cloud4BigData, an ambitious project recently launched by IMDEA, the Polytechnic University of Madrid and Universidad Rey Juan Carlos, is to facilitate the convergence of Big Data technologies with their underlying cloud infrastructure in order to achieve high levels of efficiency, flexibility, scalability, availability, quality of service, ease of use, security and privacy.
Cloud4BigData explicitly addresses the current shortcomings of Big Data and cloud computing while also taking advantage of their strengths. From secure management to efficient processing of data, the project aims to combine and integrate differentiated, specialized technologies into one unified platform. The project will also demonstrate its capabilities in emerging application areas with demanding requirements that call for cloud and Big Data technologies, such as machine-to-machine technology, the Internet of Things (IoT) and smart technologies (smart grids, smart cities, intelligent transportation, etc.), as well as in traditional application areas such as banking, telephony, multimedia communication and distributed simulations, which require functionality beyond the current capabilities of Big Data technologies.
Cloud4BigData is funded by the Community of Madrid through its Programme for R&D Activities between Research Groups (Technologies 2013), co-financed by the Structural Funds of the European Union. It began in October 2014 and will end in September 2018. Another Big Data project, SciServer, supported by the US National Science Foundation (NSF), aims to build a flexible, long-term ecosystem that gives scientists access to huge datasets of observations and simulations.
Alex Szalay of Johns Hopkins University is the principal investigator of the five-year project and the architect of the science archive of the Sloan Digital Sky Survey (SDSS), a project to map the universe. That archive, Szalay explains, is where the idea for SciServer came from. “When the SDSS began in 1998, astronomers had data on fewer than 200,000 galaxies,” says Ani Thakar, an astronomer at Johns Hopkins who is part of the SciServer team. “Five years after starting SDSS, we had about 200 million galaxies in our database. Today, SDSS data exceeds 70 terabytes, covering more than 220 million galaxies and 260 million stars.” The Johns Hopkins team created several online tools to access SDSS data. On the SkyServer website, for example, anyone can navigate through the sky, view details about stars, or search for objects using multiple criteria. The site also includes classroom-ready educational activities that let students learn science from state-of-the-art data. For more advanced analysis, the team created CasJobs, where users can run queries lasting up to eight hours and store the results in a personal database. With each new tool the user community grew, which in turn led to new scientific discoveries.
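The workflow described above (search a catalog using several criteria at once, then keep the results in a personal database for further analysis) can be sketched with Python's built-in sqlite3 module. This is only an illustration of the idea: the `catalog` table, its columns and the `my_bright_galaxies` result table are hypothetical and do not reflect the real SDSS schema, SQL dialect or CasJobs interface.

```python
import sqlite3

# Hypothetical mini-catalog standing in for a survey database (NOT the real SDSS schema)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catalog ("
    "obj_id INTEGER PRIMARY KEY, obj_type TEXT, ra REAL, dec REAL, r_mag REAL)"
)
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?, ?, ?)",
    [
        (1, "GALAXY", 150.1, 2.2, 17.5),
        (2, "STAR", 150.3, 2.1, 15.0),
        (3, "GALAXY", 150.9, 1.9, 19.8),
        (4, "GALAXY", 210.4, -1.0, 16.2),
    ],
)

# Multi-criteria search: galaxies brighter than r = 18 (smaller magnitude = brighter)
# inside a small patch of sky given by right ascension (ra) and declination (dec) ranges
query = """
    SELECT obj_id, ra, dec, r_mag FROM catalog
    WHERE obj_type = 'GALAXY'
      AND ra BETWEEN 150.0 AND 151.0
      AND dec BETWEEN 1.5 AND 2.5
      AND r_mag < 18.0
"""

# Save the result as a separate table, loosely mimicking a user's personal database
conn.execute("CREATE TABLE my_bright_galaxies AS " + query)
rows = conn.execute("SELECT obj_id FROM my_bright_galaxies ORDER BY obj_id").fetchall()
print(rows)  # [(1,)]
```

Keeping the result as a table, rather than just printing it, is what lets a later query build on earlier work, which is the point of storing results in a personal database.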
Big data is going to get a big boost from the European Union, which will back an industry consortium’s €2 billion investment with €500 million of public money over the next five years. Companies including Atos, IBM, Nokia Solutions and Networks, Orange, SAP and Siemens, along with several research bodies, will invest in the public-private partnership (PPP) from January 2015. The partnership will invest in research and innovation in big data fields such as energy, manufacturing and health to deliver services including personalized medicine, food logistics and predictive analytics. Other products could include forecasting crop yields or speeding the diagnosis of brain injuries. This investment will give a boost to the struggling European big data industry, European Commission vice president Neelie Kroes said during a news conference in Brussels. “Europe is trailing behind. Virtually every big data company is from the USA, none are from Europe,” she said. “That has to change and that is why we are putting public money on the table.” The money is needed to help companies process some of the 1.7 million gigabytes of data that, she said, is generated around the world each minute. This data, including climate information, satellite imagery, digital pictures and videos, transaction records and GPS signals, should be put to use by European companies, Kroes said. The 25 companies forming the Big Data Value Association also see an immediate need to get their act together and start competing, said association president Jan Sundelin, who is also CEO of the Dutch e-commerce company Tie Kinetix. Europe is one of the largest retail markets in the world, yet non-European companies “know more about our consumers and what we are doing in Europe than we know ourselves,” he said during the news conference, where he invited other companies and startups to take part in the research.
The investments in the industry will also support “Innovation Spaces” that will offer secure environments for experimenting with both private and open data, the Commission said. Plans to experiment with private data highlight one of the challenges of working with big data. Today’s datasets are so huge and complex to process that they require new ideas, tools and infrastructures, but they also require the right legal framework, systems and technical solutions to ensure privacy and security, the Commission said. It will make data protection and pseudonymization mechanisms technical priorities for the project. The Commission will also fund further research into technical solutions that integrate privacy- and security-enhancing features. Moreover, it will work with member states and others to make sure that businesses receive guidance on how to anonymize and pseudonymize data and how to carry out personal data risk analyses, while also providing guidance on the tools and initiatives available to enhance consumer awareness. The partnership signed in Brussels on Monday is the sixth major EU technology investment program with which the Commission hopes to catapult Europe into global leadership. In July, for instance, the Commission partnered with the private sector to invest €5 billion in Europe’s electronics sector.
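To make the idea of pseudonymization more concrete, here is a minimal sketch of one common technique: replacing a direct identifier such as an email address with a keyed hash (HMAC-SHA256). It is not a mechanism the partnership or the Commission has specified; the `pseudonymize` helper, the key and the records are hypothetical and exist only for illustration.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed-hash pseudonym (HMAC-SHA256, hex-encoded)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key and records; in practice the key would be held separately from the data
KEY = b"example-secret-key-held-by-data-controller"
records = [
    {"customer_id": "alice@example.com", "basket_total": 42.50},
    {"customer_id": "bob@example.com", "basket_total": 17.00},
    {"customer_id": "alice@example.com", "basket_total": 8.25},
]

# Swap the identifier for its pseudonym and keep every other field unchanged
pseudonymized = [
    {**record, "customer_id": pseudonymize(record["customer_id"], KEY)}
    for record in records
]

for record in pseudonymized:
    print(record["customer_id"][:12], record["basket_total"])
```

The same person always maps to the same pseudonym, so analysts can still link records (both of Alice's purchases share an ID) without seeing the email address. A keyed hash is used instead of a plain hash because anyone could hash a list of known email addresses and match them against plain hashes. Whoever holds the key can still re-link the data, so this counts as pseudonymization rather than anonymization.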