Interview with Tony Falco of Hydrolix: The Real AI Bottleneck Is Data Retention
Three years back, as companies rushed to test ChatGPT in their operations, Tony Falco was puzzling over something else: where do AI models get their memory?
Falco is the COO of Hydrolix. Through 2023, he kept seeing the same story. Companies would pick a model, run pilots, love the initial results, then discover that the storage costs for training data made the whole project unworkable. Most vendors responded by adding AI features to their existing tools. Falco took Hydrolix in another direction: make it economically possible to keep years of complete log data online and ready to query.
The approach worked. Hydrolix went from 4 customers to over 650 in 24 months. Fox, ABC, and Paramount now rely on the platform for live event broadcasts. We interviewed Falco because while the industry fixates on which AI model performs best, he’s spent two years tackling a more fundamental question: can companies actually afford the data infrastructure their AI needs?
In this conversation, he explains why data sampling undermines model accuracy, why Hydrolix built its business around customer choice rather than vendor lock-in, and what hundreds of production deployments have shown him about where AI hype meets operational reality.
Foundation: Understanding Hydrolix and the data infrastructure challenge facing AI deployments
1. For readers who aren’t familiar with Hydrolix, can you start by explaining what your company does and what problem you’re solving in the data infrastructure space?
While our story goes back to 2008, it wasn't until about six years ago that founders Marty Kagan and Hasan Alayi had the right team in place to take on the overwhelming issue: how to manage the rising cost of storing data while keeping immense volumes of it retained long enough to be useful. Hydrolix builds solutions for the massive databases needed to run successful and secure companies.
It may sound cliché, but our company values have really shaped every aspect of how we approach business. We have five values we adhere to: Family First, Build What Matters, Transparent By Default, Lead By Example, and Clear The Way. While they're all important, the overarching philosophy is that when one succeeds, we all succeed.
2. Hydrolix positions itself as the ‘infrastructure’ for AI. Can you explain the ‘Cost-Performance Paradox’ you often mention – why traditional platforms are failing – and how Hydrolix’s architecture specifically solves this?
Unlike Hydrolix, traditional platforms struggle to deliver real-time performance and affordability against massive data volumes, and they can't handle time-stamped log data with high-cardinality fields.
This type of data (data whose fields contain huge numbers of distinct values) is the fastest-growing need, especially as AI adoption increases. Hydrolix's architecture helps companies manage optimized data logs with performance and affordability solutions. We don't make companies pick two out of three, because our architecture decouples the components and removes the "indexing tax" that usually comes with high data retention. Put simply: a complete dataset equates to a more informed AI model.
3. Industry research shows 38% of data specialists struggle to extract actionable insights from log data. A major culprit is data ‘sampling’ – where companies keep only 10-20% of their data to save money. Can you explain to readers how this sampling creates bias that ruins AI model accuracy?
Saving money is a short-sighted way of viewing this common situation. "Intelligence sampling," like fat-free ice cream, cuts out a lot of the good stuff. Companies actually need full datasets to understand patterns, aid decision-making, and distinguish real users from bots or DDoS attacks. Data sampling creates biases and mis-trains AI models, which can lead to crude rather than distinctive AI reasoning.
We've done studies on the real cost of data sampling, and we call it "The Cost Paradox." What we've seen is that some companies still budget against 2005-era costs, which ran as high as $1,000 per terabyte. Hydrolix's infrastructure, by contrast, provides a roughly 97% cost reduction (about $20-30 per terabyte per month), and companies keep access to their full dataset.
Strategic Positioning: How Hydrolix fits into the AI landscape and your “Bring Your Own Model” philosophy
4. You’ve grown from 4 to over 650 customers in just 24 months – remarkable growth. As you scaled, how did you think about integrating AI into both your product and your operations? Where did AI fit into your growth strategy?
AI was in its infancy when we started solving data problems for customers. From the start, our business model has fit solutions around our clients: we lead with the client's preferred technology and software. For example, we don't just store data, we unify it. We break down data silos across platforms like Zendesk and Slack by pulling all of that data together in one place, so customers see the whole picture and can solve their user issues and debug faster.
We’ve also grown because from the very start, we lean into our client’s Virtual Private Cloud (VPC) architecture versus moving massive datasets, costing more money, to external AI tools. With a VPC strategy, barriers are removed and clients have full visibility and control for security purposes.
5. Before offering AI to customers, you implemented it internally – giving every employee access to at least one approved AI model. Why did you start there, and what did you learn from internal adoption that shaped your customer offerings?
What became immediately clear to us is how important the human perspective is. No matter which AI tool is used, the result has to be human-approved. Personally, I find Anthropic's Claude and Skills essential for engineers, and OpenAI for deep research, but I'd never just copy and paste from AI. You have to be accountable for what you're sharing, and that includes reviewing what AI delivers back to you. It can be too easy to generate slop, so we made it a rule of thumb to understand the results AI is producing.
6. That brings us to your core philosophy. You’ve chosen a “Bring Your Own Model” (BYOM) approach. Why did you decide against locking customers into a proprietary Hydrolix AI model?
We serve our customers by being a vital part of a larger ecosystem. Providing solutions built around our customers' VPCs and pre-existing tech allows us to deliver a unique value proposition centered on data sovereignty.
Customers save money with BYOM by avoiding egress fees and keeping their own specialized tools, like SageMaker, that integrate seamlessly with their Hydrolix repositories. It also ensures sensitive data isn't passed between third parties, allowing highly regulated industries to maintain top security and compliance.
We’re focused on the three factors that matter most to our clients: peak performance, affordability, and optimized data. In order to deliver all three, it’s our job to work with our customers, not the other way around.
7. Your slogan is ‘You’ve got the models, we’ll handle the logs.’ With your ability to keep petabytes of data hot for up to two years with 20x-50x compression, are you essentially providing the historical context and persistent data foundation that AI models need to make accurate predictions?
Exactly. Hydrolix provides the hot-storage data lake that gives companies years of searchable data at query speed. This is the only way to build an AI model that can pinpoint breaks in patterns, surface security alerts, catch fraud, and optimize user experiences online. For example, because our data remains "hot," companies can more accurately predict user behavior based on longer data patterns (the antithesis of data sampling).
Hydrolix doesn't just handle logs; we're enabling trustworthy pattern recognition and predictive optimization to keep AI models as accurate as possible.
Technical Depth: Responsible AI implementation and when NOT to use AI
8. You have two strong principles that push back against AI hype: security teams need ‘traceable facts, not just a model’s opinion,’ and ‘LLMs make pronouncements when they should talk probabilistically.’ Can you walk readers through a specific security scenario where these issues mattered, and how Hydrolix addresses these AI limitations?
LLMs are designed to convince users that something based on probabilities is fact. However, that reliance on probabilistic patterns can lead to false conclusions. Consider a high-stakes security environment during a breach: if a model labels a traffic spike as a bot attack when it isn't, the response could be expensive and unnecessary. That's why we rely on humans to verify evidence instead of leaving it up to AI models.
Traditional machine learning, like isolation forests and other unsupervised learning techniques, helps identify these events more accurately. Hydrolix strictly follows this rule: AI should suggest, but data must prove. A human must look at the complete data; an AI interpretation can inform the review, but it isn't conclusive until security teams verify the facts. This goes back to not blindly believing AI.
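Here is a minimal sketch of that "suggest, don't conclude" pattern, assuming scikit-learn's IsolationForest as the unsupervised detector; the traffic features and values are assumptions for illustration, not Hydrolix's implementation.

```python
# Minimal sketch: an isolation forest *suggests* suspicious traffic for
# human review rather than triggering an automatic response. The features
# (requests/min, unique IPs, error rate) are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Baseline of normal per-minute traffic: [requests, unique_ips, error_rate]
baseline = rng.normal(loc=[1000, 400, 0.01],
                      scale=[100, 40, 0.005], size=(500, 3))
model = IsolationForest(contamination=0.01, random_state=0).fit(baseline)

# A burst from few IPs with a high error rate: bot attack, or flash sale?
spike = np.array([[9000, 35, 0.30]])
score = model.decision_function(spike)[0]  # negative = likely outlier

if score < 0:
    # Suggest, don't conclude: route the raw evidence to an analyst.
    print(f"Anomaly suggested (score={score:.3f}): {spike[0]} "
          f"-> queued for security team review")
```

The model surfaces the spike; whether it is an attack or a legitimate traffic surge is the human call, made against the complete data.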
9. Rather than relying solely on large language models, you combine traditional unsupervised learning techniques with LLMs. What's the advantage of this hybrid approach, and when does traditional machine learning actually outperform generative AI?
While there are many pros to having a hybrid approach, the main advantage of combining unsupervised learning with LLMs is bringing together system-created data and human-readable insights.
By using traditional machine learning as a high-speed filter, we establish standard baselines that prevent the outliers often found in generative models while significantly reducing computational costs. Once the unsupervised learning identifies a genuine outlier, the LLM acts as the investigative layer, translating complex statistical shifts into actionable, human-readable insights.
This strategy is uniquely powered by the Hydrolix architecture: our ability to store 100% of historical data ensures the underlying models are trained on entire datasets and see the big picture, rather than a potentially misleading sample.
10. In an industry where everyone is adding 'AI-powered' to everything, you've emphasized being 'very careful about when it's appropriate to use AI and when it isn't.' What's an example where you deliberately chose NOT to use AI, and what was your reasoning?
I think our company draws a very clear line between what we do and what our customer does when it comes to outcome expectations. Hydrolix delivers a reliable, efficient data repository that’s compressed but immediately accessible. We provide the tools and organizational solutions to our customers that best serve their needs.
Our customer’s job is to use their preferred AI models to distill the data and draw conclusions from it. They know their industry, and therefore it’s only best for them to audit and label that data.
Data Requirements: What AI models actually need to deliver accurate results
11. Let’s talk about what AI models actually need to be accurate. First, you’ve mentioned they need ‘a minimum of one full seasonality cycle’ of historical data – can you break down what this means? And second, what happens when companies try to cut corners by training on sampled data instead of complete datasets? What real-world consequences have you seen from sampling bias, selection bias, and class imbalance?
First, we don’t believe that customers get to us with mistakes. We look at how sampling can be based on a few biases that impact the outcome of the data. For example, sampling bias, selection bias, and class imbalances all dilute what is 100% truth into scaled down assumptions.
That’s why we recommend 15-months of data curation as a default because decision-makers usually want to compare data year-over-year to identify patterns that influence their business. For example, a travel site may want to see what time of year people go on vacation, where they go, how much money they’re spending, etc. This information replaces the gist of data with firm, fact-based information for informed decision-making.
Real-World Impact: Concrete use cases from bot detection to live event optimization
12. Let’s make this concrete with a real use case. You’ve described using Hydrolix with AWS SageMaker and Bedrock to build custom bot detection systems. Can you walk readers through how this works in practice – what does a company see, what actions can they take, and what would have happened without AI?
In practice, Hydrolix acts as the high-fidelity data foundation for the AWS AI ecosystem, allowing companies to build a picture of normal human behavior: a baseline digital footprint. By feeding 100% of unsampled logs into AWS SageMaker, customers train custom models that learn those baselines, tracking things like browsing behavior, time spent on product detail pages, and checkout speed. Once an anomaly is detected, AWS Bedrock takes over as the investigative layer, translating the complex technical data into a plain-English incident report that explains why a specific user was flagged as a bot.
Without this AI-driven approach, companies are forced to rely on static rules that hackers easily bypass by mimicking basic human behaviors. In a traditional environment, security teams are often overwhelmed by false positives or miss the "slow-burn" bots that operate just below the threshold of old-school detection scripts. By moving from reactive rules to predictive AI models, our customers can identify and block sophisticated fraud in real time, preventing financial loss before a single illegitimate transaction is finalized.
The interesting part is that each company requires its own detailed schematic of "normal" and "abnormal" to supply the right data for identifying fraud. And since Hydrolix's infrastructure integrates with our customers' ecosystems, there's a shorter learning curve and faster onboarding.
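As a hedged sketch of the last hop in that pipeline, here is how a flagged anomaly might be handed to a model on Amazon Bedrock for a plain-English report. The boto3 converse call is standard; the model ID, the anomaly record, and the prompt wording are illustrative assumptions, not Hydrolix's implementation.

```python
# Sketch of the "investigative layer": hand a flagged anomaly to an LLM on
# Amazon Bedrock and get back a plain-English incident report. The model ID,
# anomaly fields, and prompt wording are illustrative assumptions.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

anomaly = {  # hypothetical record produced by the detection model
    "session_id": "abc-123",
    "requests_per_min": 9000,
    "unique_ips": 35,
    "error_rate": 0.30,
    "baseline_requests_per_min": 1000,
}

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model
    messages=[{
        "role": "user",
        "content": [{
            "text": "Explain in plain English why this session was flagged "
                    "as a likely bot, citing the specific numbers:\n"
                    + json.dumps(anomaly, indent=2)
        }],
    }],
)

print(response["output"]["message"]["content"][0]["text"])
```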
13. You claim this bot detection approach reduces ‘days of analysis to minutes.’ Can you share actual numbers from a customer deployment? How many days became how many minutes, and what threat did they catch that they would have missed with their old approach?
I'll give you a real customer example of why immediately available hot data is needed.
Elkjøp, the largest consumer electronics retailer in the Nordic countries, had a DDoS attack during their Black Friday campaign’s peak traffic. When they discussed the incident with us, they said, “The entire event from spotting to stopping the attack was instant. No sites went out of service and none of our customers experienced any impact whatsoever.”
14. Your other major use case is optimizing ad performance for media companies – Fox, ABC, and Paramount are customers. When you’re optimizing which ads to serve in real-time during a live event using AI, what kind of revenue lift are we talking about? What does this mean for a major sporting event or awards show?
Advertising dollars are expensive, and in live broadcasting, every second of an interrupted stream or a misfired ad is a direct hit to the bottom line. While Hydrolix doesn't make the creative decisions, we provide the system that media giants rely on to ensure their monetization strategies actually land. By immediately identifying and fixing digital errors in real-time raw logs, media companies can capture millions in previously lost revenue that would be invisible in a sampled dashboard.
Better data can also protect media buyers by distinguishing real human users from the bots and pirated streams that siphon away legitimate audiences. Our architecture allows broadcasters to monitor CDN traffic at a scale of millions of rows per second, preserving the high CPMs that premium live events command. This ensures that every advertising dollar is spent on intent-based views, turning raw log data into a verified insurance policy for live revenue.
Looking Ahead: Your roadmap and lessons learned from 650+ deployments
15. Your roadmap for the next few quarters includes AI SQL generation, where anyone can ask questions in plain English and get accurate queries. For readers who aren’t data engineers, can you paint a picture of how this changes who can access insights? What questions could a marketing manager ask that they can’t today?
One of the biggest benefits of AI is the democratization of certain kinds of intelligence. An example of this is the stock market: more people are investing with the help of AI to better understand their buys. In our case, AI SQL generation makes it as easy to query your data as it is to search Google.
Here's an example: say a marketing manager wanted to know every instance an article they ran was viewed, how long it was viewed, and any information about that article that could be correlated back to performance indicators. The data could show whether there were 503s, 404s, slow load times, or other issues that made the article ineffectual.
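To make that concrete, here is the kind of query such a plain-English question might compile to. This is a hypothetical sketch in a ClickHouse-style SQL dialect; the table and column names are assumptions, not the actual Hydrolix schema.

```python
# Hypothetical illustration of AI SQL generation: a plain-English question
# and the query it might compile to. Table and column names are assumptions.
question = (
    "For the article /guides/summer-travel, show monthly views, average "
    "time on page, and any 404/503 errors or slow loads over 15 months."
)

generated_sql = """
SELECT
    toStartOfMonth(timestamp)            AS month,
    count()                              AS views,
    avg(time_on_page_sec)                AS avg_time_on_page_sec,
    countIf(status_code = 404)           AS errors_404,
    countIf(status_code = 503)           AS errors_503,
    countIf(response_time_ms > 3000)     AS slow_loads
FROM web_logs
WHERE url = '/guides/summer-travel'
  AND timestamp >= now() - INTERVAL 15 MONTH
GROUP BY month
ORDER BY month
"""

print(generated_sql)  # the manager sees results, not the SQL itself
```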
Removing barriers not only lets anyone get deeper-level data, it generates answers in layman's terms that the user can actually understand.
16. You’ve been in the trenches implementing AI – both inside Hydrolix with your own teams and across more than 650 customer deployments spanning bot detection, ad optimization, security operations, and more. What surprised you most about what actually works versus what you initially thought would work? And looking back, what’s the biggest mistake you made early on that you’d want other companies to avoid?
I think one thing that has surprised me is the irreplaceability of human engagement. That topic is discussed often within AI, but an actual person is needed to pay attention to the results AI delivers. While AI lightens the workload and has many benefits, there's still incredible value in talking to people, interacting with people, business processes owned by people, and so on.
AI is not a cheat code to avoid human contact or replace human accountability, and I see big companies making that mistake. It takes humans to see the big picture, and integrating AI models into that picture also demands humans who understand what other humans need.
Original Creator: Genaro Palma
Original Link: https://justainews.com/industries/it-and-technology/tony-falco-hydrolix-interview-ai-bottleneck-data-retention/
Originally Posted: Wed, 04 Feb 2026 12:06:14 +0000