Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favorite technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
How changing the design process plays a huge role in preventing AI bias. Commentary by Holger Kuehnle, Creative Director at Artefact
As artificial intelligence (AI) becomes more embedded in our daily lives, there is an increased need for organizations to ensure their AI isn’t biased. While many organizations are actively looking to address AI bias, oftentimes humans are manually filtering violent, adult, or other explicit content out of training data and text prompts. This means humans are defining, to some extent, what counts as explicit or violent, which is a huge source of bias tied to the culture and norms of those making the calls on what content is filtered. There is a fundamental need to rethink how we design to ensure that AI bias and inequities don’t continue to creep into products, services and systems. Some ways to address this are for organizations to include diverse communities in all steps of the design process, assess whether their products or systems are making assumptions about people’s identities or beliefs, and, most importantly, hold every team member accountable to the inclusion goals and plans that are set, so that responsibility is shared amongst all stakeholders.
FBI Warns of Hackers Using Deepfakes to Apply for Remote Positions. Commentary by Stuart Wells, Jumio CTO
Modern-day cybercriminals have the knowledge, tools and sophistication to create highly realistic deepfakes, while leveraging stolen personally identifiable information (PII), to pose as real people and deceive companies into hiring them. Posing as an employee, hackers can steal a wide range of confidential data, from customer and employee information to company financial reports. This FBI security warning is one of many that have been issued by federal agencies in the past several months. Recently, the U.S. Treasury, State Department and FBI released an official warning indicating that companies must be cautious of North Korean IT workers pretending to be freelance contractors to infiltrate companies and collect revenue for their country. Organizations that unknowingly pay North Korean hackers potentially face legal consequences and violate government sanctions. As workforce operations remain widely remote or hybrid, many organizations have no way of truly knowing whether the employees and contractors they are hiring are legitimate candidates. Tougher security measures are needed to detect deepfakes and thwart these highly advanced cybercriminals. Biometric authentication – which leverages a person’s unique human traits to verify identity – is a safe, secure measure that can be incorporated into the workforce onboarding process and every employee login to ensure the person signing into their systems is who they claim to be and not a hacker in disguise.
The Importance of Humanizing Digital Interactions and Data Measurements. Commentary by Jim Dwyer, Senior Vice President – Innovation and Transformation at Sutherland
As tech continues to drive our everyday activities, consumers expect brands to provide not only a seamless technical experience but also a flawless human experience. However, delivering both will be a challenge for many brands. It’s important that a company utilizes advanced digital tools such as analytics, AI (Artificial Intelligence), cognitive technology and automation, but how does a brand truly engage customers based on human-centered design? By providing its desired customers with a tool or application that overcomes or eases their challenges and creates an emotional connection. A recent study found that customers who have an emotional relationship with a brand have a 306% higher lifetime value and will recommend the company at a rate of 71%, compared with the average of 45%. In addition to the statistics above proving the effectiveness of engagement, we must also remember the critical importance of data when measuring a brand’s customers’ digital and human experiences. How will you know your success as a brand if you cannot measure it? Not only should data be captured in real time and be easily accessible, it also needs to be in standardized formats so advanced tools can easily access and analyze the data for insights that can drive deeper engagements and emotional connections. We are in an exciting time where artificial intelligence not only helps us connect with our audience but also helps us quantify the success of our goals – but it’s up to brands to ensure they’re truly utilizing the tools and resources available to them to deliver the most exceptional digital human experience.
Proactive vs Reactive Data Quality. Commentary by Gleb Mezhanskiy, founding CEO of Datafold
Data quality has come to the forefront of the modern data stack discussion due to how much leverage is being layered on top of data warehouses. One of the first proposed solutions to data quality was monitoring. Monitoring tools use machine learning to observe data pipelines and send alerts, typically via Slack, when anomalies occur or trends change drastically. These tools let you know when something looks irregular, but they do not know whether it was an expected change. This results in data engineers receiving notifications that are complex to triage and may contain false positives. This approach is insufficient and ultimately creates more toil for an organization’s data engineers. Most issues that need to be addressed in data quality are bugs that are accidentally introduced when analytics engineers update models. Quality should and can be proactively dealt with in the pull request so that pipelines don’t break down in the first place. Receiving an alert about a broken production data pipeline is too late. A data quality tool should be proactive and help catch anomalies before they get merged.
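The proactive approach described above – catching breakage in the pull request rather than in production – can be sketched as a CI check that diffs a model’s output built from the PR branch against production. This is a minimal illustration with hypothetical helper names, not Datafold’s actual tooling:

```python
# Sketch of a proactive data-quality check run in CI on a pull request,
# before a model change is merged. All names here are illustrative.

def diff_tables(prod_rows, pr_rows, key="id"):
    """Compare a model's production output with the output built from
    the PR branch, and report differences a reviewer can triage."""
    prod = {r[key]: r for r in prod_rows}
    pr = {r[key]: r for r in pr_rows}
    added = sorted(pr.keys() - prod.keys())
    removed = sorted(prod.keys() - pr.keys())
    changed = {k: (prod[k], pr[k])
               for k in prod.keys() & pr.keys()
               if prod[k] != pr[k]}
    return {"added": added, "removed": removed, "changed": changed}

def gate_merge(diff, baseline_size, max_changed_ratio=0.1):
    """Fail the CI check when too many rows differ, so the engineer
    sees the diff in the PR instead of an alert after the break."""
    changed = len(diff["changed"]) + len(diff["removed"])
    return changed / max(baseline_size, 1) <= max_changed_ratio

prod = [{"id": 1, "revenue": 100}, {"id": 2, "revenue": 200}]
pr = [{"id": 1, "revenue": 100}, {"id": 2, "revenue": 250}, {"id": 3, "revenue": 50}]
d = diff_tables(prod, pr)
print(d["added"], len(d["changed"]))   # [3] 1
```

The key design point is that the diff is surfaced where the change is reviewed (the pull request), turning data quality into a pre-merge gate rather than a post-incident alert.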
RansomHouse Hackers Attack AMD. Commentary by Roshan Piyush, Security Research Engineer at Traceable AI
While ransomware isn’t a new attack method, double extortion is on the rise as hackers seek higher payouts. In this case, the AMD systems were infiltrated and sensitive files were exfiltrated and then used as leverage. The days of keeping bad actors out with prevention-focused solutions like firewalls are long gone. They will one day find a way in, and organizations like AMD can address this by monitoring behavior on their systems. It’s important to utilize adaptive tools that establish a baseline of how users interact with a network and can flag unusual activity that could be indicative of a malicious attack. There’s a place for prevention today, but it needs to be supported by threat detection to minimize the impact of breach attempts. Additionally, the stolen data suggests AMD employees were using passwords as simple as ‘password,’ ‘123456’ and ‘Welcome1.’ The attackers may well have used credential stuffing (where known or breached credentials from other sources are tried against the login page to see which succeed). This is a relatively simple attack that could have been executed by anyone on the internet with access to the login entry points of their systems. APIs play an important role here in providing attackers with the access vector, making API observability, monitoring and rate limiting important for organizations.
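The rate limiting mentioned above is one of the simpler controls against credential stuffing: throttle repeated login attempts from the same source. Here is a minimal sliding-window sketch; the limit and window values are illustrative assumptions, not a recommendation from the commentary:

```python
# Sketch of per-source rate limiting on a login endpoint, a basic
# control for blunting automated credential stuffing.
import time
from collections import defaultdict, deque

class LoginRateLimiter:
    """Sliding-window limiter: allow at most `limit` login attempts
    per source IP within `window` seconds."""
    def __init__(self, limit=5, window=60.0):
        self.limit = limit
        self.window = window
        self.attempts = defaultdict(deque)

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.attempts[ip]
        # Drop attempts that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False          # throttle: likely automated guessing
        q.append(now)
        return True

limiter = LoginRateLimiter(limit=3, window=60.0)
results = [limiter.allow("203.0.113.7", now=float(t)) for t in range(5)]
print(results)   # [True, True, True, False, False]
```

In production this logic typically lives at the API gateway and keys on more than the source IP (device fingerprint, account, ASN), since stuffing campaigns rotate addresses.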
RansomHouse Hackers Attack AMD. Commentary by Gorka Sadowski, chief strategy officer, Exabeam
No matter how robust your security stack is, your organization will still be vulnerable to incidents stemming from compromised credentials. In this case, RansomHouse claims to have compromised AMD due to the use of weak passwords throughout the organization. According to the latest Verizon DBIR, over 80% of breaches involve brute force or the use of lost or stolen credentials. Credentials are interesting assets for bad actors, both to initially access an organization and to establish persistence. Proper training, feedback loops, visibility, and effective technical capabilities are the keys to defending against attacks caused by compromised credentials. A helpful defender capability is the development of a baseline for normal employee behavior, which can assist organizations in identifying the use of compromised credentials and related intrusions. Only once you have established normal behavior can abnormalities be detected – a great asset in uncovering unknowingly compromised accounts.
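The baseline-then-detect idea above can be sketched in a few lines: learn each user’s typical behavior (here, login hours), then flag logins far outside that norm. This is an illustrative toy, not Exabeam’s product, and the three-sigma threshold is an assumption:

```python
# Sketch of behavioral baselining for compromised-credential detection:
# learn a user's typical login hours, then flag outliers.
from statistics import mean, stdev

def build_baseline(login_hours):
    """Baseline = mean and spread of a user's historical login hours."""
    return mean(login_hours), stdev(login_hours)

def is_anomalous(hour, baseline, threshold=3.0):
    """Flag a login whose hour deviates more than `threshold` standard
    deviations from the user's established norm."""
    mu, sigma = baseline
    if sigma == 0:
        return hour != mu
    return abs(hour - mu) / sigma > threshold

history = [9, 9, 10, 8, 9, 10, 9, 8]     # usual office-hours logins
baseline = build_baseline(history)
print(is_anomalous(9, baseline))   # False: within the user's norm
print(is_anomalous(3, baseline))   # True: a 3 a.m. login is far off baseline
```

Real UEBA systems baseline many signals at once (source device, geography, resources accessed), but the principle is the same: anomalies are only meaningful relative to an established norm.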
AI and Blockchain Come Together for Greater Authenticity. Commentary by Cory Hymel, Director of Blockchain at Gigster
We’re still in the early stages of what is possible with AI and machine learning – and we’ve barely scratched the surface of what is possible with blockchain technology. Companies combining the value of blockchain and AI are seeing incredible benefits for FinServ, supply chains, DApps, and more. Explainable AI is one of the big steps towards improving data integrity and trust in AI models. Blockchain improves explainability by offering authenticity, a digital record for an easy audit trail, and better data security to ensure data integrity. Leveraging blockchain as a means to more easily transfer value between multiple AI systems while maintaining the provenance of decisions creates incredible new opportunities in the space. As Web3 increases access to more and more data, blockchain-based business networks can benefit from AI for an enhanced ability to process and correlate data with new levels of speed and intelligence. Neither AI nor blockchain is overhyped, and their combination will be something to watch going forward.
The Next Generation of Deepfakes. Commentary by Rijul Gupta, Co-Founder and CEO of DeepMedia
At this point, most people are likely aware of synthetically generated deepfakes, which often appear as viral videos of celebrities making outlandish statements on social media. However, a new use case for the technology is emerging which could threaten national security – deepfake aerial and satellite imagery. These are the next big threat in global intelligence, especially during times of conflict and rampant misinformation on social media. Though satellite images are currently used by governments and companies for a variety of legitimate purposes such as farming and environmental monitoring, adversaries will soon be able to use the technology to alter the location of roads, trees, buildings, and even the shoreline in satellite and drone videos. The result is a synthetic creation of undetectable, falsified topography. In addition to protecting deepfake datasets, it’s becoming more important than ever to put additional research into deepfake detection methods. To help advance this important research, we recently released a dataset on GitHub which includes over 1M synthetic aerial images.
AI will revolutionize IVF. Commentary by Co-Founder and CEO of Oma Robotics, Gurjeet Singh, PhD
In-Vitro Fertilization (IVF) is the process of creating healthy embryos outside the human body and implanting them in an intended parent. Today, almost 100,000 births in the US happen due to IVF. Yet IVF hasn’t changed in 30 years. It is a manual process, where an embryologist, hunched over a microscope, fertilizes each individual egg with an individual sperm cell. No wonder that 70% of all IVF cycles fail. Looking deeper, the IVF process breaks down into many visual decisions, e.g.: Are the eggs mature enough to be fertilized? Which sperm cell should be used to fertilize each egg? Are the embryos developing at a healthy rate? Which cells should be taken from an embryo for a biopsy? What is the grade of each embryo? Which embryo should be implanted first? The upshot is that IVF results vary significantly between labs and between embryologists. A lot. Some labs achieve a 30% success rate on average, while others achieve 65%. These are decisions based on vision, and all the operations depend on manual dexterity. AI and robotics will revolutionize these by helping embryologists make the correct decisions consistently. Several companies are developing AI for embryo selection, and it’s feasible that AI will also be used to help identify the most promising sperm cell in a semen sample, to help evaluate the viability of an embryo, and/or to help ensure eggs are ready for fertilization.
AI’s Growth in Customer Service Elevating Satisfaction. Commentary by Rob McDougall, CEO of Upstream Works Software
The early hype around AI applications in the contact center indicated that they would replace agents. But the reality is there will always be a need for human engagement, and AI applications are best suited to help and support agents. They can automate rules-based and data-collection tasks that don’t require a human agent, so the agent can focus on providing the real value to the customer experience that only human-led support can provide. As staffing is one of the cornerstone problems of post-pandemic business, AI applications can help agents get up to speed faster and augment their capabilities with tools that enable more efficient and meaningful CX – ultimately improving the agent’s job and reducing the risk of attrition. Artificial intelligence applications are best at complex information processing. The key is to set realistic expectations for AI projects and focus on the needs of the contact center to effectively address each problem without creating new silos. Augmenting agents’ abilities without adding unnecessary application complexity is central to improving the customer and agent experience with AI.
The ABCs of data mesh implementation. Commentary by Juan Sequeda, Principal Scientist at data.world
Data is the engine of modern businesses, but most enterprises are still struggling to find the optimal way to get the most value out of their data. The popularity of data mesh has increased dramatically over the past year, and with it a set of common issues during implementation. The Data Product ABCs — a framework developed by the team at data.world — is an emerging method to prevent these issues from arising in the first place. It includes questions every data leader should ask across five areas where stumbling blocks typically arise during data mesh implementation: Accountability, Boundaries, Contracts and Expectations, Downstream Consumers, and Explicit Knowledge. The questions themselves are essential building blocks to help scale a data mesh approach, such as: Who is responsible for this data? What is the data? What are the sharing agreements and policies? Who are the current consumers? What is the meaning of the data? Getting the answers to these questions at the start of implementation, throughout the process, and continually as an enterprise adapts is the solid foundation for enterprise data management.
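One lightweight way to operationalize those five questions is to record the answers as a structured descriptor alongside each data product. The field names below are illustrative assumptions mapped onto the ABCs, not data.world’s actual schema:

```python
# Sketch of recording Data Product ABC answers as a structured
# descriptor that can live next to the data product itself.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str
    accountability: str                 # Who is responsible for this data?
    boundaries: str                     # What is the data?
    contracts: list = field(default_factory=list)            # sharing agreements/policies
    downstream_consumers: list = field(default_factory=list) # who consumes it today?
    explicit_knowledge: str = ""        # What does the data mean?

orders = DataProduct(
    name="orders",
    accountability="commerce-team@example.com",
    boundaries="One row per confirmed order, EU region only",
    contracts=["PII masked before sharing", "refresh SLA: hourly"],
    downstream_consumers=["finance-dashboard", "churn-model"],
    explicit_knowledge="amount is gross revenue in EUR, VAT included",
)
print(orders.name, len(orders.contracts))  # orders 2
```

Keeping the answers machine-readable means they can be checked continually – e.g. failing a build when a product has no accountable owner – rather than answered once and forgotten.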
Preparing for the Dog Days of Summer with Data Observability. Commentary by Rohit Choudhary, CEO, and Co-Founder of Acceldata
As many of us across the nation are experiencing right now, this summer is expected to be hotter than average across most of the US. With rising temperatures come increased power consumption and potential blackouts as a result of an overwhelmed and antiquated power grid. Texas has already broken the record for power demand, with air conditioners the main culprit stressing the state’s electric grid. During extreme weather events, we see just how unprepared the energy industry is for surges in demand. Technology such as data observability, which offers an end-to-end view of the data pipeline, can ensure that energy companies are equipped with reliable, usable, quality data about energy usage to prepare for and avoid spikes and subsequent outages. When something in the data pipeline breaks, business shuts down. However, by using data observability, activity spikes and potential data irregularities are immediately identifiable. Monitoring and management capabilities can be applied to set thresholds so that a potential issue can be addressed before an outage occurs. While Mother Nature can’t be controlled, the energy industry can leverage modern technology to prepare for and predict potential data disasters during the dog days of summer.
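The threshold-setting described above is conceptually simple: classify each incoming metric reading against warning and critical levels so operators can act before capacity is exhausted. A minimal sketch, with illustrative threshold values and metric names:

```python
# Sketch of threshold-based observability: watch a pipeline metric
# (here, hourly demand readings) and raise alerts before an
# outage-level spike. All numbers are illustrative.

def check_thresholds(readings, warn=0.8, critical=0.95, capacity=100.0):
    """Classify each (timestamp, value) reading against capacity
    thresholds so operators can act before the system is overwhelmed."""
    alerts = []
    for ts, value in readings:
        load = value / capacity
        if load >= critical:
            alerts.append((ts, "CRITICAL"))
        elif load >= warn:
            alerts.append((ts, "WARN"))
    return alerts

readings = [("14:00", 70.0), ("15:00", 85.0), ("16:00", 97.0)]
print(check_thresholds(readings))  # [('15:00', 'WARN'), ('16:00', 'CRITICAL')]
```

Full observability platforms go further – tracking freshness, schema drift, and volume anomalies across the pipeline – but the escalation-before-outage pattern is the same.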
Analytics and data science are joining together. Commentary by Todd Mostak, CTO and co-Founder of HEAVY.AI
Traditionally, analytics and data science have been treated as two distinct disciplines, but the lines are blurring and the two have begun to converge – a very positive development. Ultimately, the purpose of both analysts and data scientists is to uncover risks, opportunities and learnings and deliver new business value. Organizations just want actionable insights and answers from their data. The convergence between analytics and data science makes that easier, allowing different experts to work together toward a common goal. This trend allows organizations to get a more holistic perspective of a growing number of data sources. Analysts and data scientists can combine workflows that have been siloed in the past – for example, business intelligence workflows (typically handled by general business analysts) and ML workflows (handled by data scientists). By joining these workflows together, enterprises can significantly improve operational efficiency.
Secure in delivery. Commentary by Prakash Sethuraman, CISO at CloudBees
According to a recent survey, 95% of executive respondents think their software supply chain is secure, yet 58% say that if they experienced a software supply chain vulnerability, they have no idea what their company would do. While their confidence is high, they are simply unprepared. Far too many companies still think of security and compliance as a point-in-time activity, but in reality these functions need to be continuous. Security and compliance should be built into every stage of the software lifecycle – development, delivery, and production. Being “secure in development” means ensuring that your code is clean from the start by embedding security validation measures early in your development process. While secure in development resembles shifting left, shift left ignores the rest of the software supply chain and places too much burden on the developer while not dedicating enough resources to being secure in delivery and production as well. “Secure in delivery” is focused on controlling for all the things that can go wrong in the delivery process aside from the code itself. To ensure your code is secure in delivery, it’s important to automate everything, create access and privilege controls for the code and the pipeline itself, and create and update a catalog of immutable objects. The last essential stage is “secure in production,” which is the ability to keep track of an application—and the environment it’s running in—after it’s released. Even after code is deployed, you should still keep track of code because its connectedness is what makes a software supply chain whole and secure. In essence, an ideal approach is a continuous approach; a holistic process where application security is embedded throughout the software delivery supply chain.
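The “catalog of immutable objects” mentioned for the secure-in-delivery stage is, at its core, a content-digest registry: record an artifact’s hash at build time and refuse anything whose hash has drifted at deploy time. A minimal sketch with illustrative artifact names:

```python
# Sketch of a catalog of immutable delivery artifacts: record a
# content digest at build time, verify it at deploy time.
import hashlib

def digest(data: bytes) -> str:
    """Content-addressed identity: the SHA-256 hex digest of the bytes."""
    return hashlib.sha256(data).hexdigest()

catalog = {}

def register(name: str, artifact: bytes):
    """Build stage: record the artifact's digest in the catalog."""
    catalog[name] = digest(artifact)

def verify(name: str, artifact: bytes) -> bool:
    """Delivery stage: refuse anything whose digest has drifted
    (tampering, corruption, or an unregistered artifact)."""
    return catalog.get(name) == digest(artifact)

register("app-v1.2.3.tar.gz", b"build output bytes")
print(verify("app-v1.2.3.tar.gz", b"build output bytes"))   # True
print(verify("app-v1.2.3.tar.gz", b"tampered bytes"))       # False
```

Real pipelines add signing and provenance attestations on top of the digest (so the catalog itself can be trusted), but hash-then-verify is the mechanism that makes delivery artifacts effectively immutable.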
How the combination of AI and human intelligence enhance logistics planning capabilities. Commentary by Marc Meyer, Chief Commercial Officer at Transmetrics
Today, most organizations have an extensive pool of data that they can leverage to improve their operations and planning. The logistics industry is a great example: there is so much transactional and IoT data coming from sensors, sorting stations, vehicles, etc. that it becomes impossible for the human brain to plan the most optimal and efficient route for every shipment the organization transports. Given this context, AI can enhance humans’ capabilities and improve the efficiency of logistics planning. By automating tedious, repetitive and time-consuming data-related tasks, AI can enable planners to focus on delivering service excellence. For example, current logistics planning mostly relies on legacy solutions such as Excel, and different planning departments of the same organization might not have access to a single source of truth, leading to errors and inefficiencies in planning. With AI-powered systems, all the planning work can be centralized and calculations can be performed in the background, so planners can choose the most optimal scenarios for transporting an item. On top of that, at the core of the synergy between AI and human intelligence lies the “Human-in-the-Loop” concept. Essentially, it means that planners always have the choice not to accept the system’s suggestions, or to update the data in the system based on their experience or upcoming business objectives that the system might not be aware of. Considering that the whole industry is suffering from a lack of talent, combining human and artificial intelligence can bring financial and environmental benefits to organizations without the need to grow their teams.
What’s The Likelihood That AI Will Ever Become Sentient? Commentary by Justin Harrison, Founder and CEO of YOV, Inc. (You, Only Virtual)
The line between synthetic voices and real ones has been blurring for years, but AI is never going to be ‘sentient’ in the ways we’re being made afraid of now. A major qualifier for something being sentient is being aware of its own existence and impending death – and that awareness is strongly based upon ingrained imperatives that a machine simply doesn’t have. Furthermore, all human and animal motivation is based on emotion. Emotions are based on biological directives, so even if a program became aware of its own existence, it would be devoid of the kind of motivations we attribute to ‘sentient’ beings.
Sign up for the free insideBIGDATA newsletter.
Join us on Twitter: @InsideBigData1 – https://twitter.com/InsideBigData1