Organizations can make data science a repeatable, predictable tool that business professionals use to get more value from their data.
Enterprise data and AI projects are often scattershot, underbaked, siloed, and not adaptable to predictable business changes. As a result, the vast majority fail. These expensive quagmires can be avoided, and this book explains precisely how.
Data science is emerging as a hands-on tool not just for data scientists but for business professionals as well. Managers, directors, IT leaders, and analysts must expand their use of data science capabilities for the organization to stay competitive. Smarter Data Science helps them achieve their goals for enterprise-grade data and AI projects. It serves as a guide to building a robust and comprehensive information architecture program that enables sustainable and scalable AI deployments.
When an organization manages its data effectively, its data science program becomes a fully scalable function that’s both prescriptive and repeatable. With an understanding of data science principles, practitioners are also empowered to lead their organizations in establishing and deploying viable AI. They employ the tools of machine learning, deep learning, and AI to extract greater value from data for the benefit of the enterprise.
By following a ladder framework that promotes prescriptive capabilities, organizations can make data science accessible to a range of team members, democratizing data science throughout the organization. Companies that collect, organize, and analyze data can move forward to additional data science achievements:
Improving time-to-value with infused AI models for common use cases
Optimizing knowledge work and business processes
Utilizing AI-based business intelligence and data visualization
Establishing a data topology to support general or highly specialized needs
Successfully completing AI projects in a predictable manner
Coordinating the use of AI from any compute node. From inner edges to outer edges: cloud, fog, and mist computing
When they climb the ladder presented in this book, businesspeople and data scientists alike will be able to improve and foster repeatable capabilities. They will have the knowledge to maximize their AI and data assets for the benefit of their organizations.
By: Neal Fishman, Cole Stryker
Foreword by: Grady Booch
Imprint: John Wiley & Sons Inc
Country of Publication: United States
Dimensions: Height 231mm, Width 185mm, Spine 18mm
Weight: 522g
ISBN: 9781119693413
ISBN 10: 1119693411
Pages: 304
Publication Date: 15 May 2020
Audience: Professional and scholarly, Undergraduate
Format: Paperback
Publisher's Status: Active
Table of Contents

Foreword for Smarter Data Science
Epigraph
Preamble

Chapter 1 Climbing the AI Ladder
  Readying Data for AI
  Technology Focus Areas
  Taking the Ladder Rung by Rung
  Constantly Adapt to Retain Organizational Relevance
  Data-Based Reasoning is Part and Parcel in the Modern Business
  Toward the AI-Centric Organization
  Summary

Chapter 2 Framing Part I: Considerations for Organizations Using AI
  Data-Driven Decision-Making
  Using Interrogatives to Gain Insight
  The Trust Matrix
  The Importance of Metrics and Human Insight
  Democratizing Data and Data Science
  Aye, a Prerequisite: Organizing Data Must Be a Forethought
  Preventing Design Pitfalls
  Facilitating the Winds of Change: How Organized Data Facilitates Reaction Time
  Quae Quaestio (Question Everything)
  Summary

Chapter 3 Framing Part II: Considerations for Working with Data and AI
  Personalizing the Data Experience for Every User
  Context Counts: Choosing the Right Way to Display Data
  Ethnography: Improving Understanding Through Specialized Data
  Data Governance and Data Quality
  The Value of Decomposing Data
  Providing Structure Through Data Governance
  Curating Data for Training
  Additional Considerations for Creating Value
  Ontologies: A Means for Encapsulating Knowledge
  Fairness, Trust, and Transparency in AI Outcomes
  Accessible, Accurate, Curated, and Organized
  Summary

Chapter 4 A Look Back on Analytics: More Than One Hammer
  Been Here Before: Reviewing the Enterprise Data Warehouse
  Drawbacks of the Traditional Data Warehouse
  Paradigm Shift
  Modern Analytical Environments: The Data Lake
  By Contrast
  Indigenous Data
  Attributes of Difference
  Elements of the Data Lake
  The New Normal: Big Data is Now Normal Data
  Liberation from the Rigidity of a Single Data Model
  Streaming Data
  Suitable Tools for the Task
  Easier Accessibility
  Reducing Costs
  Scalability
  Data Management and Data Governance for AI
  Schema-on-Read vs. Schema-on-Write
  Summary

Chapter 5 A Look Forward on Analytics: Not Everything Can Be a Nail
  A Need for Organization
  The Staging Zone
  The Raw Zone
  The Discovery and Exploration Zone
  The Aligned Zone
  The Harmonized Zone
  The Curated Zone
  Data Topologies
  Zone Map
  Data Pipelines
  Data Topography
  Expanding, Adding, Moving, and Removing Zones
  Enabling the Zones
  Ingestion
  Data Governance
  Data Storage and Retention
  Data Processing
  Data Access
  Management and Monitoring
  Metadata
  Summary

Chapter 6 Addressing Operational Disciplines on the AI Ladder
  A Passage of Time
  Create
  Stability
  Barriers
  Complexity
  Execute
  Ingestion
  Visibility
  Compliance
  Operate
  Quality
  Reliance
  Reusability
  The xOps Trifecta: DevOps/MLOps, DataOps, and AIOps
  DevOps/MLOps
  DataOps
  AIOps
  Summary

Chapter 7 Maximizing the Use of Your Data: Being Value Driven
  Toward a Value Chain
  Chaining Through Correlation
  Enabling Action
  Expanding the Means to Act
  Curation
  Data Governance
  Integrated Data Management
  Onboarding
  Organizing
  Cataloging
  Metadata
  Preparing
  Provisioning
  Multi-Tenancy
  Summary

Chapter 8 Valuing Data with Statistical Analysis and Enabling Meaningful Access
  Deriving Value: Managing Data as an Asset
  An Inexact Science
  Accessibility to Data: Not All Users are Equal
  Providing Self-Service to Data
  Access: The Importance of Adding Controls
  Ranking Datasets Using a Bottom-Up Approach for Data Governance
  How Various Industries Use Data and AI
  Benefiting from Statistics
  Summary

Chapter 9 Constructing for the Long-Term
  The Need to Change Habits: Avoiding Hard-Coding
  Overloading
  Locked In
  Ownership and Decomposition
  Design to Avoid Change
  Extending the Value of Data Through AI
  Polyglot Persistence
  Benefiting from Data Literacy
  Understanding a Topic
  Skillsets
  It’s All Metadata
  The Right Data, in the Right Context, with the Right Interface
  Summary

Chapter 10 A Journey’s End: An IA for AI
  Development Efforts for AI
  Essential Elements: Cloud-Based Computing, Data, and Analytics
  Intersections: Compute Capacity and Storage Capacity
  Analytic Intensity
  Interoperability Across the Elements
  Data Pipeline Flight Paths: Preflight, Inflight, Postflight
  Data Management for the Data Puddle, Data Pond, and Data Lake
  Driving Action: Context, Content, and Decision-Makers
  Keep It Simple
  The Silo is Dead; Long Live the Silo
  Taxonomy: Organizing Data Zones
  Capabilities for an Open Platform
  Summary

Appendix Glossary of Terms
Index
NEAL FISHMAN is a Distinguished Engineer and CTO of Data-Based Pathology at IBM. He is an IBM-certified Senior IT Architect and Open Group Distinguished Chief Architect. COLE STRYKER is a journalist based in Los Angeles. He is the author of Epic Win for Anonymous and Hacking the Future.