Infogain at Microsoft Ignite 2024

Empowering an AI Transformation at Microsoft Ignite

We joined over 14,000 attendees at this year’s Ignite in Chicago, November 18-22. The theme focused on how Microsoft is empowering its customers and partners through AI transformation. Microsoft Ignite is an amazing event, giving IT professionals the opportunity to ramp up their skill sets with an array of more than 800 sessions, demos, and expert-led labs from Microsoft and its partners.

This year, there was particular buzz and excitement around Copilot and security. The event also showcased cutting-edge advancements in AI, cloud computing, hybrid work solutions, and accessibility. Generative AI has taken on an expanded role at Microsoft, with tools like Copilot transforming productivity across M365 and Azure.

The real-time speech and video translation innovations within MS Teams were impressive. Individuals can now speak on a Teams call in their preferred language and have their words translated into the recipient’s language in real time. This demo was fascinating to see live!

The latest trends included an emphasis on hybrid collaboration technologies, seamless connectivity, and integrated solutions, all with security underpinning everything. For example, new products like Microsoft Security Exposure Management provide more proactive security monitoring by unifying disparate data silos for visibility into the end-to-end attack surface and automating the generation of possible attack paths.

At the event, we observed key product launches across AI, cloud computing, and hybrid workspaces, including:

Ignite 2024 reinforced Microsoft’s vision of driving innovation through AI-driven tools. In addition, it was a fantastic opportunity to network with Microsoft’s stakeholders and partners. If you want more details, you can watch most of the keynotes and sessions on demand or read the latest in the Microsoft Ignite Book of News.

We’re a Microsoft Gold Partner, and every day, we enable our customers to deploy Microsoft technologies and build long-term value. Contact us with your questions.

Every ML project begins with an initial model development phase that includes exploratory data analysis, feature engineering, model creation, training, and evaluation. During this phase, data science teams prioritize delivering a high-performing model; however, generating value from ML models goes well beyond this phase. While model development requires deep data science expertise, deploying the model demands operational skills.

The key challenges in moving an ML model from development to production include:

We propose a six-step method for ML Operationalization that can help overcome these challenges.

Download the whitepaper to read more.

Will a large context window eliminate the need for a RAG approach?

Although the future of GenAI is still very much a blank canvas, senior leaders need to ensure that they make wise choices regarding how it will and will not be used within their companies. These leaders, especially those in roles focused on growth, commerce, innovation, and operations, need to fully understand the nuances of the various AI technologies being proposed within their organizations, as well as the long-term benefits and issues that these technologies can bring to the table.

Take retrieval-augmented generation (RAG) for example. Companies have begun to use this rapidly evolving technology to leverage vast amounts of unstructured data, but already there is talk of “RAG killers” that can outperform it.

In natural language processing (NLP), the “context window” is the span of text that an algorithm or model considers while understanding or generating language. New supermassive context windows can accommodate over one million tokens, which drives the idea that they could replace RAG. These context windows expand the range of use cases, but that doesn’t make them RAG killers, nor do executives need to choose between one or the other as they chart the futures of their companies.

Context windows include a certain number of words or tokens surrounding a target word to capture relationships and meaning within the text. For example, in models like Word2Vec, the context window helps in understanding semantic relationships by analysing words before and after a target word. In transformers like GPT-3, the context window is the portion of text the model uses to generate or predict the next token, directly impacting the coherence and relevance of its output.


Figure 1. Context Window
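
To make the idea concrete, here is a minimal sketch (not tied to any particular model) of how a fixed context window behaves: only the most recent tokens fit, and anything older simply falls out of view. The whitespace tokenizer and the eight-token budget are illustrative stand-ins for a real tokenizer and a real model’s limit.

```python
# A minimal sketch of how a fixed context window limits what a model "sees".
# The whitespace tokenizer and the 8-token budget are illustrative stand-ins
# for a real tokenizer and a real model's context limit.

def fit_to_context_window(text: str, max_tokens: int) -> str:
    """Keep only the most recent tokens that fit inside the context window."""
    tokens = text.split()            # naive tokenization, for illustration only
    window = tokens[-max_tokens:]    # older tokens fall out of the window
    return " ".join(window)

history = "user: hi assistant: hello user: what did I say first ?"
print(fit_to_context_window(history, max_tokens=8))
# Anything truncated here is simply invisible to the model when it generates.
```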

RAG is a hybrid model that combines the strengths of retrieval-based and generation-based approaches. In essence, it retrieves relevant information from a large corpus or database (like a search engine) and then uses that information to generate a coherent and contextually appropriate response or piece of text. This allows for more accurate and informed responses, particularly in scenarios where the model might not have all the required information within its training data alone.
Figure 2. RAG Architecture
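
As a rough illustration of that retrieve-then-generate flow, the sketch below scores a toy corpus against a question and feeds the best matches into a prompt. The bag-of-words embed() and the placeholder generate() are stand-ins for a real embedding model and LLM call; a production RAG pipeline would use a vector database and an actual model endpoint.

```python
# A minimal sketch of the RAG pattern: retrieve relevant text, then generate.
# embed() and generate() are placeholders for a real embedding model and LLM.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(prompt: str) -> str:
    # Placeholder for the actual LLM call that would produce the final answer.
    return f"[LLM would answer using this prompt]\n{prompt}"

corpus = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The warranty covers manufacturing defects for two years.",
    "Shipping to Europe typically takes five business days.",
]
question = "How long do customers have to return a product?"
context = "\n".join(retrieve(question, corpus))
print(generate(f"Answer using only this context:\n{context}\n\nQuestion: {question}"))
```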

What are the key differences?

As with any technology, the ideal choice for a specific enterprise depends on the structure of that enterprise, its legacy stack, ability to support new technologies, and dozens of other variables.

Here are some of the key differences between RAG and long context windows.

| Feature | Long Context Window | RAG Model |
|---|---|---|
| Capacity | Can handle very long inputs, up to a million or more tokens | Efficient data handling by retrieving smaller text chunks |
| Perplexity | Lower perplexity due to broader context awareness, reducing ambiguity | Efficiently reduces perplexity by narrowing down to relevant information |
| Coherence | High coherence across contents, preserving narrative flow | May struggle with coherence across diverse retrieved sources |
| Dependence | Dependent on the input context | Dependent on external databases or knowledge sources |
| Diversity | Limited by the scope of pre-trained data | Better handling of diverse information from multiple sources |
| Precision | General high-level precision, not detail-oriented | High precision in answering specific queries accurately |
| Consistency | Strong narrative consistency across long texts | Inconsistent context due to varied retrieved sources |
| Context Relevance | Maintains relevant context throughout the document | May vary, depending on the quality of retrieved documents |
| Current Information | Limited to knowledge up to the training cut-off | Better at incorporating the most recent data |
| Memory | Requires significant GPU memory for large context processing | Load distributed between retrieval and generation processes |
| Latency | Higher latency due to handling extensive textual data | Faster inference balanced by retrieval workload |
| Optimization | Requires advanced hardware optimization for effective use | Lower hardware requirements compared to LCW |
| Adaptation | Limited adaptability due to fixed pre-trained data | Highly adaptable, quickly integrates new data |
| Customization | Less customizable without extensive re-training on new data | Easily customized for domain-specific needs via retrieval tuning |
| Environment Compatibility | Challenging integration in diverse environments without extensive tuning | Supports frequent updates with minimal re-training |
| Ideal For | Document summarization, in-depth analysis, thesis writing | Real-time Q&A, chatbots, and dynamic content retrieval |
| Less Suited For | Real-time information retrieval, dynamic environments | Long-form content generation without retrieval refresh |
| Content Generation | Ideal for generating comprehensive, unbroken narratives | Limited by the need for real-time retrieval |
| Training Time | Longer training times due to heavy data processing requirements | Faster training if a pre-established retrieval corpus is available |
| Inference Time | Slower inference with large contexts, impacting responsiveness | Faster inference balanced by retrieval workload |
| Algorithmic Complexity | High complexity, making it difficult to optimize | Relatively simpler algorithms with modular retrieval and generation components |
| Integration | Complex integration due to hardware and optimization needs | Easier integration with existing systems due to modularity |
| Maintenance | High maintenance with frequent updates required | Lower maintenance, mainly focused on retrieval database updates |
| Deployment | Challenging deployment in resource-constrained environments | Simpler deployment with standard hardware support |
| Cost | High; requires extensive training and updating | Minimal; no additional training required |
| Data Timeliness | Data can quickly become outdated | Data retrieved on demand, ensuring currency |
| Transparency | Low; unclear how data influences outcomes | High; shows retrieved documents |
| Scalability | Limited; scaling up involves significant resources | High; easily integrates with various data sources |
| Performance | Performance can degrade with larger context sizes | Selective data retrieval enhances performance |
| Adaptability | Requires retraining for significant adaptations | Can be tailored to specific tasks without retraining |

It’s always a balance.

Both large context window and RAG systems have their own pros and cons, but large context LLMs are not always RAG killers.

The Long Context Window model excels in maintaining extensive context and coherence, making it well suited for comprehensive document processing, but it demands higher computational resources.

In contrast, the RAG model is optimal for real-time information retrieval, offering flexibility and precision with lower resource use, making it ideal for dynamic querying applications like chatbots and Q&A systems.

A hybrid approach (Long Context Window LLM + RAG) is the need of the hour, and can make applications more robust and efficient.

As enterprise leaders consider the future of AI within their companies and organizations, both RAG and large context windows offer significant benefits. Wise leaders will understand the ramifications of each technology, then choose how to combine and balance them based on their vision for the future of their enterprise.

What happens when customers have too many choices?

As people ponder what to buy, many brands offer recommendations. The recommender systems that enable this leverage data on what users have already liked or bought and are particularly useful when brands offer such a huge array of products that the choices can overwhelm the customer, whether that’s music, movies, accessories, or other products.

Recommender systems are integral to modern digital ecosystems. Well-executed recommendations can increase the customer’s affinity for a brand and can drive revenue for brands by increasing cross-sell and upsell. But when recommender systems don’t deliver truly personalized experiences, customers become disillusioned with the underlying brand, with disappointing results for the bottom line.

Traditional Recommender System Techniques

Core recommendation engine functions like handling user interactions and maintaining scalability still rely on traditional techniques, which can include:

GenAI can’t replace recommenders.

But it can help brands fill data gaps and understand nuanced user behavior by generating synthetic data, creating personalized content, and providing context-aware insights that enhance the results that recommenders provide.

Overcoming Data Scarcity and Cold Start Problems

These traditional methods rely heavily on historical user-item interaction data. But when that data is limited or missing, they can struggle, especially with the “cold start” problem: making recommendations to new users or introducing new items to existing users.

The Cold Start Problem

To address this, brands can use GenAI to create synthetic data that mimics real user interactions, then use it to enrich existing datasets, leading to more robust and nuanced recommendations. It most often does this with two techniques:

Hybrid models can leverage both traditional methods and GenAI. For instance, they might use collaborative filtering to identify similar users and recommend items based on their preferences while simultaneously using a VAE to generate synthetic interactions for users with sparse data, enhancing the performance of the collaborative filtering model.
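
As a simplified sketch of that hybrid pattern, the code below runs item-based collaborative filtering over a small user-item matrix and pads a cold-start user with synthetic interactions. The hand-written synthetic row stands in for what a VAE (or similar generative model) would produce, and all of the data is illustrative.

```python
# Item-based collaborative filtering over a toy user-item matrix, with a
# synthetic row (standing in for VAE-generated interactions) padding a
# cold-start user. All data here is illustrative.
import numpy as np

users = ["alice", "bob", "carol_new"]
items = ["laptop", "mouse", "monitor", "keyboard"]

# Rows = users, columns = items; 1 means an observed interaction.
observed = np.array([
    [1, 1, 0, 1],   # alice
    [1, 0, 1, 0],   # bob
    [0, 0, 0, 0],   # carol_new has no history (cold start)
], dtype=float)

# Pretend a generative model produced plausible interactions for carol_new.
synthetic_for_carol = np.array([0.0, 0.6, 0.0, 0.7])
ratings = observed.copy()
ratings[2] = synthetic_for_carol

# Item-item cosine similarity, then score items for a given user.
norms = np.linalg.norm(ratings, axis=0, keepdims=True) + 1e-9
item_sim = (ratings / norms).T @ (ratings / norms)

def recommend(user_idx: int, top_n: int = 2) -> list[str]:
    scores = ratings[user_idx] @ item_sim        # propagate preferences to similar items
    scores[observed[user_idx] > 0] = -np.inf     # don't re-recommend observed items
    return [items[i] for i in np.argsort(scores)[::-1][:top_n]]

print(recommend(2))  # recommendations for the cold-start user
```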

Unified Propensity Models

Recommender systems traditionally use propensity models to drive upsell and cross-sell, predict churn, and perform other functions. These models are accurate but can be complex and resource intensive. GenAI offers an alternative, a “decent proxy” for these models that can analyze trends and user behavior. Although not necessarily as precise, GenAI can achieve good results with significantly less effort, so long as we remember that GenAI models themselves can inherit biases from the data used to train them.

While GenAI can be more efficient, individual propensity models might still provide slightly more accurate results in some scenarios. The key lies in finding the right balance—so when the brand has a high-value customer making a key business purchase and wants to target them with a premium product offering, it might be better to use a dedicated upsell model. For broader recommendations where speed and efficiency are paramount, GenAI’s unified approach can be highly beneficial.

Do you need a recommender system at all?

The null model—simply recommending popular items—can work surprisingly well. It’s easy to implement, requires minimal data, and guarantees users will see well-regarded products. It can be particularly useful for new businesses or those with limited data on user preferences. Focusing on popular items can also create a sense of social proof, encouraging users to choose products others have enjoyed.
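
A null model is easy to express in a few lines. The sketch below, with made-up purchase data, simply counts interactions and returns the most popular items to every user.

```python
# A minimal "null model": recommend whatever is most popular overall.
# The purchase log below is illustrative data.
from collections import Counter

purchases = [
    ("u1", "espresso machine"), ("u2", "espresso machine"), ("u3", "grinder"),
    ("u4", "espresso machine"), ("u2", "kettle"), ("u5", "grinder"),
]

def popular_items(log, top_n: int = 2) -> list[str]:
    counts = Counter(item for _, item in log)
    return [item for item, _ in counts.most_common(top_n)]

print(popular_items(purchases))  # same recommendation for every user
```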

But again, GenAI can play a role. It can be crucial in determining when to choose a null model over a recommender system (assuming the brand already has one). It can also analyze user behavior, interaction patterns, data availability, and other factors to predict the most effective approach for each scenario. We’re not at the point where GenAI can (or should) dictate when to switch between null models and complex recommender systems, but soon it will be able to leverage intelligent routing or agentic frameworks to make this recommendation.

Ever onward

As GenAI matures, recommender systems will transform into even more intricate tools, weaving themselves seamlessly into the fabric of digital experiences. Recommendations will not only anticipate our desires but surprise us with hidden gems. GenAI can unlock this potential, but further exploration and development are needed to bring this vision to life.


Every day, more enterprises leverage GenAI to enable self-service, streamline call-center operations, and otherwise improve the experiences they offer customers and employees. But questions arise about how to optimize the relationships between retrieval-augmented generation (RAG), knowledge graphs (KG), and large language models (LLM). Combining these three components in the smartest ways enables enterprise tools to generate more accurate and useful responses that can improve the experience for all users.

Augmenting RAG with KGs leverages the structured nature of KGs to enhance traditional vector search retrieval methods, which improves the depth and contextuality of the information it retrieves. KGs organize data as nodes and relationships representing entities and their connections, which can be queried to infer factual information. Combining KGs with RAG enables LLMs to generate more accurate and contextually enriched responses.
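
As a toy illustration of that combination, the sketch below stores a handful of facts as subject-predicate-object triples, queries them for an entity, and injects the matching facts into a generation prompt. The triples, the lookup, and the answer_with_llm() placeholder are all hypothetical; a real deployment would use a graph database and an actual LLM call.

```python
# A toy illustration of the KG-RAG idea: query factual triples first, then
# inject the matching facts into the generation prompt. All names and the
# answer_with_llm() placeholder are illustrative.

triples = [
    ("Eiffel Tower", "located_in", "Paris"),
    ("Paris", "capital_of", "France"),
    ("Louvre", "located_in", "Paris"),
]

def query_kg(entity: str) -> list[tuple[str, str, str]]:
    """Return every triple mentioning the entity as subject or object."""
    return [t for t in triples if entity in (t[0], t[2])]

def answer_with_llm(question: str, facts: list[tuple[str, str, str]]) -> str:
    context = "; ".join(f"{s} {p.replace('_', ' ')} {o}" for s, p, o in facts)
    # Placeholder for the actual LLM call that would generate the final answer.
    return f"Prompt -> Facts: {context} | Question: {question}"

print(answer_with_llm("Where is the Eiffel Tower?", query_kg("Eiffel Tower")))
```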

Implementing a KG-RAG approach involves several steps.

Knowledge Graph Curation

RAG Integration

System Optimization

Flow diagram: Implementing a KG-RAG approach

When should you choose KG over RAG?

When you have complex queries. KGs support more diverse and complex queries than vector databases, handling logical operators and intricate relationships. This capability helps LLMs generate more diverse and interesting text. Examples of complex queries where KG outperforms RAG include:

When you need to enhance reasoning and inference. KGs enable reasoning and inference, providing indirect information through entity relationships. This enhances the logical and consistent nature of the text generated by the LLM.

When you need to reduce hallucinations. KGs provide more precise and specific information than vector databases, which reduces hallucinations in LLMs. Vector databases indicate the similarity or relatedness between two entities or concepts, whereas KGs enable a better understanding of the relationship between them. For instance, a KG can tell you that “Eiffel Tower” is a landmark in “Paris”, while a vector database can only indicate how similar the two concepts are.

The KG approach effectively retrieves and synthesizes information dispersed across multiple PDFs. For example, it can handle cases where an objective answer (e.g., yes/no) is in one document and the reasoning is in another. This reduces wrong answers (false positives), which are common with a vector database approach.

Vector databases return matches based purely on text similarity, which increases false negatives and hallucinations. Knowledge graphs capture contextual and causal relationships, so they don’t produce answers that merely appear similar on the surface of the text.

| Metric | RAG | KG |
|---|---|---|
| Correct Response | 153 | 167 |
| Incorrect Response | 46 | 31 |
| Partially Correct | 1 | 2 |

*Comparison run on sample data between RAG and KG on 200 questions

When you need scalability and flexibility. KGs can scale more easily as new data is added while maintaining the structure and integrity of the knowledge base. RAG often requires significant retraining and re-indexing of documents.

The best way to leverage the combined power of RAG, KGs, and LLMs always varies with each enterprise and its specific needs. Finding and maintaining the ideal balance enables the enterprise to provide more useful responses that improve both the customer experience and the employee experience.

Business Benefits

Combining these three powerful technologies can improve the customer experience immensely, even if just by providing more accurate, precise, and relevant answers to questions.

More relevant responses

A KG could model relationships between product features (nodes) and customer reviews (nodes). When a customer asks about a specific product, the system can retrieve relevant data points like reviews, specifications, or compatibility (edges) based on the structured graph. This reduces misinformation and enhances the relevance of responses, improving the customer experience.

KGs are also better at resolving multi-hop queries—those that require information from multiple sources. For example, a question like “Which laptops with Intel i7 processors are recommended for gaming under $1,500?” would require a system to traverse various nodes—products, price ranges, and processor specifications—and combine the results to deliver a complete and accurate response.
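
The sketch below shows what such a multi-hop query might look like over a toy product graph; the attribute names and products are invented purely to illustrate combining the processor, price, and use-case hops into one answer.

```python
# A toy multi-hop traversal for the laptop question above. The product graph
# and attribute names are made up purely for illustration.

graph = {
    "laptop_a": {"processor": "Intel i7", "price": 1399, "recommended_for": {"gaming"}},
    "laptop_b": {"processor": "Intel i5", "price": 999,  "recommended_for": {"office"}},
    "laptop_c": {"processor": "Intel i7", "price": 1799, "recommended_for": {"gaming"}},
}

def gaming_i7_under(budget: int) -> list[str]:
    return [
        name for name, attrs in graph.items()
        if attrs["processor"] == "Intel i7"       # hop 1: processor spec
        and attrs["price"] <= budget              # hop 2: price range
        and "gaming" in attrs["recommended_for"]  # hop 3: use-case edge
    ]

print(gaming_i7_under(1500))  # -> ['laptop_a']
```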

Better personalization

KGs enable businesses to create a detailed, interconnected view of customer preferences, behavior, and history. For instance, nodes representing a customer’s purchase history, browsing behavior, and preferences can be linked to product recommendations and marketing content through specific edges like purchased, viewed, or interested in.

This enables the seamless multichannel experiences that are the goal of many marketing initiatives. With KGs, businesses can maintain a consistent understanding of customers across multiple platforms. For example, a support query initiated on social media can be connected to the customer’s order history and past issues using the graph’s interconnected relationships. This provides a unified experience across channels, such as online chat and in-store support, ensuring consistency and enhancing customer loyalty.

Faster, more consistent responses

KGs can link various pieces of data (nodes) such as FAQs, product documents, and troubleshooting guides. When a customer asks a question, the system can traverse these linked nodes to deliver a response faster than traditional methods, as the relationships between nodes are pre-established and easy to query. For instance, an issue with a smartphone’s battery can be connected to battery FAQs, known issues, and repair services, providing faster resolution times.

KG benefits also extend to the employee experience

Smarter knowledge management

KGs help employees, particularly those in customer support and sales, access the right information quickly. For example, nodes representing common issues, product details, and past customer queries can be linked together. When a customer asks a question, employees can easily navigate through these interconnected nodes, finding precise answers without searching multiple databases.

Shorter Training Times

Because KGs organize information in an intuitive, node-based structure, employees can quickly learn to navigate relationships such as product features, customer preferences, and support histories. For instance, a new hire in tech support can easily find how a specific device model is connected to its known issues and solutions, reducing training complexity and time.

Smarter decisions and better collaboration

KGs can centralize and structure knowledge across departments. For instance, marketing can link nodes such as campaign effectiveness and customer feedback, while the product team may link user feedback with feature improvements. These interconnected graphs allow employees to share insights easily and align strategies across departments, breaking down silos.

Reduced Workload on Repetitive Tasks

KGs can automate repetitive queries by linking commonly asked questions to their answers. For example, support tickets (nodes) could be linked to solutions (nodes), and if a similar query arises, the system can automatically retrieve the relevant information without human intervention.
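
As a rough sketch of that idea, the code below links a couple of past tickets to their solutions and matches a new ticket by word overlap; the data and the overlap heuristic are placeholders for a real knowledge-graph lookup.

```python
# An illustrative sketch: past tickets linked to solutions, with a new ticket
# matched by simple word overlap. The data and heuristic are stand-ins for a
# real knowledge-graph query.

ticket_to_solution = {
    "battery drains quickly overnight": "Disable background app refresh and check battery health.",
    "screen flickers after update": "Roll back the display driver to the previous version.",
}

def auto_resolve(new_ticket: str):
    words = set(new_ticket.lower().split())
    best, best_overlap = None, 0
    for known, solution in ticket_to_solution.items():
        overlap = len(words & set(known.split()))
        if overlap > best_overlap:
            best, best_overlap = solution, overlap
    return best  # None means escalate to a human agent

print(auto_resolve("my battery drains very quickly"))
```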

Use cases for RAG, KGs, and LLMs will continue to expand as more enterprises leverage GenAI. Each is powerful in its own right, but combining them creates a suite of tools that is far greater than the sum of its parts.


The telecom industry is in the midst of a technology-driven transformation. With rising numbers of devices and increasing broadband and mobile connection speeds, telecom’s technology challenges include implementing IoT, embracing AI, taking advantage of 5G, and helping customers gain a competitive advantage. Telcos are focused on:

We start with a roadmap built on strategies that measure market effectiveness to improve conversion rates. Our research into products, segments, and behaviors helps us understand customer profitability and increase margins, followed by an action plan to understand churn and predict CLV.

In this playbook, discover how we’ve helped our clients increase ARPU, drive operational efficiencies, improve NPS scores, conversion rates, revenues and more.

On a global scale, the telecom industry provides crucial services that billions of consumers and businesses rely on every day. Like other industries, telecom has undergone a technology-driven transformation. We’ve helped global telecoms increase profits and drive business growth by enabling them to:

As your partner for success, you can leverage our state-of-the-art data science and analytics methodologies and exclusive partnerships.

This playbook, curated especially for the telecom industry, demonstrates how we’ve helped our clients increase conversion rates and revenues, reduce churn rates, increase response rates, and more.