Welcome!

You will be redirected in 30 seconds or close now.

ColdFusion Authors: Yakov Fain, Jeremy Geelan, Maureen O'Gara, Nancy Y. Nee, Tad Anderson

Related Topics: @BigDataExpo, @CloudExpo, @ThingsExpo

@BigDataExpo: Blog Feed Post

Is Data Science Really Science? | @BigDataExpo #BigData #Analytics #DataScience

Science works within systems of laws such as the laws of physics, thermodynamics, mathematics, electromagnetism

My son Max is home from college and that always leads to some interesting conversations.  Max is in graduate school at Iowa State University where he is studying kinesiology and strength training.  As part of his research project, he is applying physics to athletic training in order to understand how certain types of exercises can lead to improvements in athletic speed, strength, agility, and recovery.

Figure 1:  The Laws of Kinesiology

Max was showing me one drill designed to increase the speed and thrust associated with jumping (Max added 5 inches to his vertical leap over the past 6 weeks, and can now dunk over the old man).  When I was asking him about the science behind the drill, he went into great details about the interaction between the sciences of physics, biomechanics and human anatomy.

Max could explain to me how the laws of physics (the study of the properties of matter and energy.), kinesiology (the study of human motion that mainly focuses on muscles and their functions) and biomechanics (they study of movement involved in strength exercise or in the execution of a sport skill) interacted to produce the desired outcomes.  He could explain why it worked.

And that is the heart of my challenges with treating data science as a science.  As a data scientist, I can predict what is likely to happen, but I cannot explain why it is going to happen.  I can predict when someone is likely to attrite, or respond to a promotion, or commit fraud, or pick the pink button over the blue button, but I cannot tell you why that’s going to happen.  And I believe that the inability to explain why something is going to happen is why I struggle to call “data science” a science.

Okay, let the hate mail rain down on me, but let me explain why this is an important distinction!

What Is Science?
Science
is the intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment.

Science works within systems of laws such as the laws of physics, thermodynamics, mathematics, electromagnetism, aerodynamics, electricity (like Ohm’s law), Newton’s law of motions, and chemistry.  Scientists can apply these laws to understand why certain actions lead to certain outcomes.  In many disciplines, it is critical (life and death critical in some cases) that the scientists (or engineers) know why something is going to occur:

  • In pharmaceuticals, chemists need to understand how certain chemicals can be combined in certain combinations (recipes) to drive human outcomes or results.
  • In mechanical engineering, building engineers need to know how certain materials and designs can be combined to support the weight of a 40 story building (that looks like it was made out of Lego blocks).
  • In electrical engineering, electrical engineers need to understand how much wiring, what type of wiring and the optimal designs are required to support the electrical needs of buildings or vehicles.

Again, the laws that underpin these disciplines can be used to understand why certain actions or combinations lead to predictable outcomes.

Big Data and the “Death” of Why
An article by Chris Anderson in 2006 titled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete” really called into question the “science” nature of the data science role.  The premise of the article was that the massive amounts of data were yielding insights about the human behaviors without requiring the heavy statistical modeling typically needed when using sampled data sets.  This is the quote that most intrigued me:

“Google conquered the advertising world with nothing more than applied mathematics. It didn’t pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right.”

With the vast amounts of detailed data available and high-powered analytic tools, it is possible to identify what works without having to worry about why it worked.  Maybe when it comes to human behaviors, there are no laws that can be used to understand (or codify) why humans take certain actions under certain conditions.  In fact, we already know that humans are illogical decision-making machines (see “Human Decision-Making in a Big Data World”).

However, there are some new developments that I think will require “data science” to become more like other “sciences.”

Internet of Things and the “Birth” of Why
The Internet of Things (IOT) will require organizations to understand and codify why certain inputs lead to predictable outcomes.  For example, it will be critical for manufacturers to understand and codify why certain components in a product break down most often, by trying to address questions such as:

  • Was the failure caused by the materials used to build the component?
  • Was the failure caused by the design of the component?
  • Was the failure caused by the use of the component?
  • Was the failure caused by the installation of the component?
  • Was the failure caused by the maintenance of the component?

As we move into the world of IoT, we will start to see increased collaboration between analytics and physics.  See what organizations like GE are doing with the concept of “Digital Twins”.

The Digital Twin involves building a digital model, or twin, of every machine – from a jet engine to a locomotive – to grow and create new business and service models through the Industrial Internet.[1]

Digital twins are computerized companions of physical assets that can be used for various purposes. Digital twins use data from sensors installed on physical objects to represent their real-time status, working condition or position.[2]

GE is building digital models that mirror the physical structures of their products and components.  This allows them to not only accelerate the development of new products, but allows them to test the products in a greater number of situations to determine metrics such as mean-time-to-failure, stress capability and structural loads.

As the worlds of physics and IoT collide, data scientist will become more like other “scientists” as their digital world will begin to be governed by the laws that govern disciplines such as physics, aerodynamics, chemistry and electricity.

Data Science and the Cost of Wrong
Another potential driver in the IoT world is the substantial cost of being wrong.  As discussed in my blog “Understanding Type I and Type II Errors”, the cost of being wrong (false positives and false negatives) has minimal impact when trying to predict human behaviors such as which customers might respond to which ads, or which customers are likely to recommend you to their friends.

However in the world of IOT, the costs of being wrong (false positives and false negatives) can have severe or even catastrophic financial, legal and liability costs.  Organizations cannot afford to have planes falling out of the skies or autonomous cars driving into crowds or pharmaceuticals accidently killing patients.

Summary
Traditionally, big data historically was not concerned with understanding or quantifying “why” certain actions occurred because for the most part, organizations were using big data to understand and predict customer behaviors (e.g., acquisition, up-sell, fraud, theft, attrition, advocacy).  The costs associated with false positives and false negatives were relatively small compared to the financial benefit or return.

And while there may never be “laws” that dictate human behaviors, in the world of IOT where organizations are melding analytics (machine learning and artificial intelligence) with physical products, we will see “data science” advancing beyond just “data” science.  In IOT, the data science team must expand to include scientists and engineers from the physical sciences so that the team can understand and quantify the “why things happen” aspect of the analytic models.  If not, the costs could be catastrophic.

[1] https://www.ge.com/digital/blog/dawn-digital-industrial-era

[2] https://en.wikipedia.org/wiki/Digital_Twins

The post Is Data Science Really Science? appeared first on InFocus Blog | Dell EMC Services.

Read the original blog entry...

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He’s written several white papers, avid blogger and is a frequent speaker on the use of Big Data and advanced analytics to power organization’s key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.

@ThingsExpo Stories
Internet of @ThingsExpo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with the 21st International Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. @ThingsExpo Silicon Valley Call for Papers is now open.
SYS-CON Events announced today that Ocean9will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Ocean9 provides cloud services for Backup, Disaster Recovery (DRaaS) and instant Innovation, and redefines enterprise infrastructure with its cloud native subscription offerings for mission critical SAP workloads.
Five years ago development was seen as a dead-end career, now it’s anything but – with an explosion in mobile and IoT initiatives increasing the demand for skilled engineers. But apart from having a ready supply of great coders, what constitutes true ‘DevOps Royalty’? It’ll be the ability to craft resilient architectures, supportability, security everywhere across the software lifecycle. In his keynote at @DevOpsSummit at 20th Cloud Expo, Jeffrey Scheaffer, GM and SVP, Continuous Delivery Busine...
SYS-CON Events announced today that Enzu will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY, and the 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Enzu’s mission is to be the leading provider of enterprise cloud solutions worldwide. Enzu enables online businesses to use its IT infrastructure to their competitive ad...
Everywhere we turn in our industry we can find strong opinions about the direction, type and nature of cloud’s impact on computing and business. Another word that is used in every context in our industry is “hybrid.” In his session at 20th Cloud Expo, Alvaro Gonzalez, Director of Technical, Partner and Field Marketing at Peak 10, will use a combination of a few conceptual props and some research recently commissioned by Peak 10 to offer a real-world consideration of how the various categories of...
SYS-CON Events announced today that Interoute has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Interoute is the owner operator of Europe's largest network and a global cloud services platform, which encompasses over 70,000 km of lit fiber, 15 data centers, 17 virtual data centers and 33 colocation centers, with connections to 195 additional partner data centers. Our full-service Unifie...
SYS-CON Events announced today that CollabNet, a global leader in enterprise software development, release automation and DevOps solutions, will be a Bronze Sponsor of SYS-CON's 20th International Cloud Expo®, taking place from June 6-8, 2017, at the Javits Center in New York City, NY. CollabNet offers a broad range of solutions with the mission of helping modern organizations deliver quality software at speed. The company’s latest innovation, the DevOps Lifecycle Manager (DLM), supports Value S...
SYS-CON Events announced today that SoftLayer, an IBM Company, has been named “Gold Sponsor” of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2016, at the Javits Center in New York, New York. SoftLayer, an IBM Company, provides cloud infrastructure as a service from a growing number of data centers and network points of presence around the world. SoftLayer’s customers range from Web startups to global enterprises.
SYS-CON Events announced today that Peak 10, Inc., a national IT infrastructure and cloud services provider, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Peak 10 provides reliable, tailored data center and network services, cloud and managed services. Its solutions are designed to scale and adapt to customers’ changing business needs, enabling them to lower costs, improve performance and focus intern...
Multiple data types are pouring into IoT deployments. Data is coming in small packages as well as enormous files and data streams of many sizes. Widespread use of mobile devices adds to the total. In this power panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists will look at the tools and environments that are being put to use in IoT deployments, as well as the team skills a modern enterprise IT shop needs to keep things running, get a handle on all this data, and deli...
SYS-CON Events announced today that Linux Academy, the foremost online Linux and cloud training platform and community, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Linux Academy was founded on the belief that providing high-quality, in-depth training should be available at an affordable price. Industry leaders in quality training, provided services, and student certification passes, its goal is to c...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend @CloudExpo | @ThingsExpo, June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA. Learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
SYS-CON Events announced today that Loom Systems will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Founded in 2015, Loom Systems delivers an advanced AI solution to predict and prevent problems in the digital business. Loom stands alone in the industry as an AI analysis platform requiring no prior math knowledge from operators, leveraging the existing staff to succeed in the digital era. With offices in S...
SYS-CON Events announced today that EARP Integration will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. EARP Integration is a passionate software house. Since its inception in 2009 the company successfully delivers smart solutions for cities and factories that start their digital transformation. EARP provides bespoke solutions like, for example, advanced enterprise portals, business intelligence systems an...
We build IoT infrastructure products - when you have to integrate different devices, different systems and cloud you have to build an application to do that but we eliminate the need to build an application. Our products can integrate any device, any system, any cloud regardless of protocol," explained Peter Jung, Chief Product Officer at Pulzze Systems, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA
SYS-CON Events announced today that Hitachi Data Systems, a wholly owned subsidiary of Hitachi LTD., will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City. Hitachi Data Systems (HDS) will be featuring the Hitachi Content Platform (HCP) portfolio. This is the industry’s only offering that allows organizations to bring together object storage, file sync and share, cloud storage gateways, and sophisticated search and...
SYS-CON Events announced today that Progress, a global leader in application development, has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Enterprises today are rapidly adopting the cloud, while continuing to retain business-critical/sensitive data inside the firewall. This is creating two separate data silos – one inside the firewall and the other outside the firewall. Cloud ISVs ofte...
SYS-CON Events announced today that Fusion, a leading provider of cloud services, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Fusion, a leading provider of integrated cloud solutions to small, medium and large businesses, is the industry’s single source for the cloud. Fusion’s advanced, proprietary cloud service platform enables the integration of leading edge solutions in the cloud, including cloud...
SYS-CON Events announced today that delaPlex will exhibit at SYS-CON's @CloudExpo, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. delaPlex pioneered Software Development as a Service (SDaaS), which provides scalable resources to build, test, and deploy software. It’s a fast and more reliable way to develop a new product or expand your in-house team.
SYS-CON Events announced today that Progress, a global leader in application development, has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Enterprises today are rapidly adopting the cloud, while continuing to retain business-critical/sensitive data inside the firewall. This is creating two separate data silos – one inside the firewall and the other outside the firewall. Cloud ISVs oft...