I was at KGC (Knowledge Graph Conference) 2024, which is happening May 6-10 at Cornell Tech. I was presenting (virtually) at their Health Care and Life Sciences (HCLS) workshop, so my speaker pass was only valid for today, for the HCLS portion of KGC. My trip report covers a few talks that I attended here. Attending virtually was a bit chaotic because sessions sometimes ran over, so you might leave one session to attend another, only to find that it hadn't started yet. This is hard to foresee; we faced this issue ourselves the first time we moved an internal conference from in-person to hybrid.
KGs in RAG (Tom Smoker, WhyHow.AI)
I've been working with Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) for almost a year now, and I went to this talk hoping for insights on how to use graphs as input to RAG systems. Understandably, the speaker spent some time covering the basics, which I personally didn't find very fruitful. However, there were some nuggets of wisdom I got out of the talk. First, RAG pipelines can lower the risk of hallucinations by using LLMs for planning and reasoning, without delegating to them for factual information. Second, an agent architecture can more efficiently use smaller sub-graphs, which can often be generated dynamically in Closed World models.
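To make the second point a little more concrete, here is a minimal sketch of dynamically pulling a small sub-graph around the entities mentioned in a question and serializing it as prompt context, so that the facts come from the KG rather than from the LLM. The toy triples and the `k_hop_subgraph` and `to_context` helpers are hypothetical; the LLM planning step is not shown, and a real system would fetch triples from a graph store.

```python
from collections import deque

Triple = tuple[str, str, str]


def k_hop_subgraph(triples: list[Triple], seeds: set[str], k: int) -> list[Triple]:
    """Return the triples within k hops of the seed entities, treating edges as undirected."""
    # Index every entity to the triples it appears in (as subject or object).
    index: dict[str, list[Triple]] = {}
    for t in triples:
        index.setdefault(t[0], []).append(t)
        index.setdefault(t[2], []).append(t)

    visited = set(seeds)
    frontier = deque((entity, 0) for entity in seeds)
    selected: set[Triple] = set()
    while frontier:
        entity, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past k hops
        for t in index.get(entity, []):
            selected.add(t)
            for neighbor in (t[0], t[2]):
                if neighbor not in visited:
                    visited.add(neighbor)
                    frontier.append((neighbor, depth + 1))
    # Preserve input order so the prompt context is deterministic.
    return [t for t in triples if t in selected]


def to_context(triples: list[Triple]) -> str:
    """Serialize triples as one plain-text fact per line for an LLM prompt."""
    return "\n".join(f"{s} {p} {o}." for s, p, o in triples)


# Hypothetical toy KG; the names are placeholders.
kg = [
    ("drug_A", "targets", "gene_1"),
    ("drug_A", "treats", "disease_P"),
    ("gene_1", "part_of", "pathway_9"),
    ("drug_B", "treats", "disease_Q"),
]
context = to_context(k_hop_subgraph(kg, {"drug_A"}, 1))
print(context)
```

The LLM would then be asked to answer using only `context`, which keeps the prompt small and the facts traceable to the graph.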
A side discussion on chat also yielded a paper reference: Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc (Lenat and Marcus, 2023). The paper looks really interesting on an initial skim and I plan to read it in more detail later.
Knowledge Graphs for Precision Oncology (Krishna Bulusu, AstraZeneca)
A nice overview of applications of Knowledge Graphs (KGs) to Drug Discovery (DD). DD attempts to apply KGs to solve 3 main problems: (1) find the gene causing a disease, (2) match a drug with a disease, and (3) model (drug, gene, disease) as a fundamental relationship in DD. The speaker pointed out that the big advantage of KGs is Explainability. He also mentioned the use of graph clustering for node stratification.
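To illustrate problem (3), and why explainability comes almost for free, here is a toy sketch that joins drug-to-gene edges with gene-to-disease edges so that every candidate (drug, gene, disease) triple carries the gene as its explanation. The edge types, the `drug_gene_disease` helper and all names are hypothetical placeholders; a real DD KG would have many more entity and relation types.

```python
def drug_gene_disease(
    targets: dict[str, set[str]], associations: dict[str, set[str]]
) -> list[tuple[str, str, str]]:
    """Join drug->gene 'targets' edges with gene->disease 'associated_with' edges.

    Each (drug, gene, disease) result keeps the gene as the explanation for
    why the drug might be relevant to the disease.
    """
    results = []
    for drug in sorted(targets):
        for gene in sorted(targets[drug]):
            for disease in sorted(associations.get(gene, set())):
                results.append((drug, gene, disease))
    return results


# Hypothetical edges; the names are placeholders, not real drugs, genes or diseases.
targets = {"drug_A": {"gene_1"}, "drug_B": {"gene_2", "gene_3"}}
associations = {"gene_1": {"disease_P"}, "gene_2": {"disease_P", "disease_Q"}}

for drug, gene, disease in drug_gene_disease(targets, associations):
    print(f"{drug} may be relevant to {disease} because it targets {gene}")
```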
Combining graph and vector representation for efficient information retrieval (Peio Popov, Ontotext)
This was a presentation from Ontotext where they demonstrated new features built into their GraphDB database. This was of interest to me personally since our KG is also built using GraphDB. Specifically, they have integrated LLM and vector search support into their products so they can be invoked from a SPARQL query. This gives GraphDB users the power to combine these techniques in the same call rather than build multi-stage pipelines.
I also learned the distinction between Semantic, Full-text and Vector Search: they are based on KGs, Lucene (or Lucene-like) indexes, and vector search platforms respectively. I had previously conflated the first and third.
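A toy sketch may help keep the three apart: semantic search matches graph structure (triple patterns), full-text search matches tokens through an inverted index, and vector search ranks by embedding similarity. Everything here is a simplified stand-in with hypothetical data and helper names; it is not how GraphDB, Lucene or any vector database implements them (real systems use SPARQL, analyzers and approximate nearest-neighbor indexes).

```python
import math
from collections import defaultdict

# Hypothetical toy data for three different back ends.
KG = [("aspirin", "treats", "headache"), ("paracetamol", "treats", "fever")]
DOCS = {"d1": "Aspirin is commonly taken for a headache", "d2": "Rest and fluids help with fever"}
EMBEDDINGS = {"d1": [0.9, 0.1], "d2": [0.2, 0.8]}  # stand-ins for model-generated vectors


def semantic_search(kg, s=None, p=None, o=None):
    """KG-style lookup: match a triple pattern, with None acting as a wildcard."""
    return [t for t in kg if all(q is None or q == v for q, v in zip((s, p, o), t))]


def build_inverted_index(docs):
    """Lucene-style full-text index: token -> ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def full_text_search(index, term):
    """Exact keyword match against the inverted index."""
    return index.get(term.lower(), set())


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def vector_search(embeddings, query_vec, top_k=1):
    """Rank documents by cosine similarity to the query embedding."""
    return sorted(embeddings, key=lambda d: cosine(query_vec, embeddings[d]), reverse=True)[:top_k]


print(semantic_search(KG, p="treats", o="fever"))
print(full_text_search(build_inverted_index(DOCS), "Headache"))
print(vector_search(EMBEDDINGS, [1.0, 0.0]))
```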
Knowledge Engineering in Clinical Decision Support: When a Graph Representational Model is not enough (Maulik Kamdar, Optum)
This was a presentation from my ex-colleague Maulik Kamdar. He talked about challenges in Clinical Decision Support (CDS) where a KG alone is insufficient, specifically the case where multiple third-party ontologies need to be aligned into one KG. In this situation, similar concepts are combined into ValueSets, which are then composed with bare concepts or with each other to form Clinical Rules. Clinical Rules are further combined to form Clinical Calculators or Questionnaires, which are then combined to form Decision Trees and Flowcharts, which are in turn combined into Clinical Guidelines. I'm probably biased given our common history, but I found this talk to be the most educational for me.
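The layering described in the talk can be mimicked with plain function composition. This is a minimal sketch under loud assumptions: the concept codes (`ontoA:123` and so on), weights, threshold and helper names are all hypothetical, and it only covers ValueSets, Clinical Rules, a score-style Calculator and a one-node Decision Tree, not full Guidelines.

```python
from typing import Callable

# A ValueSet groups equivalent concepts drawn from different (hypothetical) source ontologies.
ValueSet = frozenset[str]

# A Clinical Rule is a predicate over the set of concept codes recorded for a patient.
ClinicalRule = Callable[[set[str]], bool]


def any_of(vs: ValueSet) -> ClinicalRule:
    """Rule that fires if the patient has any concept in the ValueSet."""
    return lambda patient: bool(vs & patient)


def all_of(*rules: ClinicalRule) -> ClinicalRule:
    """Compose rules: fires only if every sub-rule fires."""
    return lambda patient: all(r(patient) for r in rules)


def calculator(weighted_rules: list[tuple[ClinicalRule, int]]) -> Callable[[set[str]], int]:
    """Score-style calculator: sum the weights of the rules that fire."""
    return lambda patient: sum(w for rule, w in weighted_rules if rule(patient))


def decision(score_fn: Callable[[set[str]], int], threshold: int) -> Callable[[set[str]], str]:
    """One-node decision tree branching on the calculator score."""
    return lambda patient: "escalate" if score_fn(patient) >= threshold else "routine"


# The same condition coded in two ontologies, aligned into one ValueSet.
condition_x: ValueSet = frozenset({"ontoA:123", "ontoB:X9"})
finding_y: ValueSet = frozenset({"ontoA:456"})

rule_1 = any_of(condition_x)
rule_2 = all_of(any_of(condition_x), any_of(finding_y))
score = calculator([(rule_1, 1), (rule_2, 2)])
triage = decision(score, threshold=3)

print(triage({"ontoB:X9", "ontoA:456"}))
print(triage({"ontoB:X9"}))
```

The point of the sketch is that each layer is built only from the layer below it, which is also why a plain graph representation struggles: the higher layers are computations, not just edges.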
Knowledge Graphs, Theorem Provers and Language Models (Vijay Saraswat and Nikolaos Vasiloglou)
The speakers discussed the role of self-discovery, In-Context Learning (ICL), symbiotic integration of KGs with search, and Graph RAG in reasoning engines powered by KGs and LLMs. They characterize an Agent as an LLM-based black box that is given pairs of input-output instances in order to learn some unknown function (similar to ML models). They describe ICL as learning from few-shot and many-shot examples. They also talked about using the output of a KG to fact-check or enhance LLMs, and using LLMs to generate assertions that can be used to create a KG. Their demo showed how an LLM is able to learn to generate a Datalog-like graph query language from text prompts using few-shot examples.
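The few-shot setup in the demo amounts to showing the LLM a handful of (question, query) pairs before the real question. Here is a minimal sketch of how such a prompt might be assembled; the example pairs, the Datalog-style syntax and the `few_shot_prompt` helper are hypothetical, not the speakers' actual prompt, and no LLM is called.

```python
# Hypothetical (question, query) pairs in a made-up Datalog-style syntax.
EXAMPLES = [
    ("Which drugs treat disease_P?", "?- treats(X, disease_P)."),
    ("Which genes does drug_A target?", "?- targets(drug_A, X)."),
]


def few_shot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Build a few-shot prompt: instruction, worked examples, then the open question."""
    parts = ["Translate each question into a Datalog-style query."]
    for q, a in examples:
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {question}\nA:")  # the LLM is expected to complete this line
    return "\n\n".join(parts)


print(few_shot_prompt(EXAMPLES, "Which diseases does drug_B treat?"))
```

Moving from few-shot to many-shot is just a longer `EXAMPLES` list, bounded by the model's context window.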
The speaker referred to the following three papers in support of the techniques he was describing, which I have duly added to my reading list.
A Scalable and Robust Named Entity Recognition and Linking System for a Clinical Healthcare Knowledge Graph (Sujit Pal, Elsevier Health)
This was my talk. I had originally intended to attend in person, but it seemed wasteful to fly across the country to deliver a 5-minute presentation. It did take a bit of planning to present remotely, but I learned two useful life lessons.
- You can generate a presentation video from MS PowerPoint. Simply create your slides and record a slideshow in which you narrate your presentation. Once done, export it as an MP4 and upload it to YouTube or another video service.
- You can print posters online and have them delivered to someone else.
Huge thanks to my colleague Tom Woodcock, who attended in person, was kind enough to carry and hang my poster at the conference for me, and also agreed to present my slideshow for me (although I think that in the end he didn't have to). Many thanks also to my ex-colleague Helena Deus (part of the HCLS organizing team), who helped walk me through to a workable solution and was instrumental in my talk being delivered successfully. Thanks also to Leah Walton from the HCLS organizing team for supporting me in my attempt to present remotely.
Here is the YouTube video of my 5-minute presentation in case you are interested. It's a bit high-level since I had only 5 minutes to cover everything, but there is a bit more information in the poster below.
Graphs for good – Hypothesis generation for Rare Disease Treatment (Brian Martin, AbbVie)
This presentation revolves around a graph that connects diseases to drugs via disease variant, gene, pathway, gene and compound entities. It was used to find a treatment for a rare disease using existing medications, and was later extended to find candidate treatments for a group of the 20 most neglected diseases worldwide. The speaker reported that the results for Dengue fever correlate well with previously known information, supporting the validity of the approach. The paper describing this work is Leveraging a Billion-Edge Knowledge Graph for Drug Re-purposing and Target Prioritization using Genomically-Informed Subgraphs (Martin et al, 2022).
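The disease-to-drug chain above reads like a metapath: a fixed sequence of node types that a path must follow. As a rough illustration of the general idea only (not the method from the paper), here is a breadth-first enumeration of paths that follow such a metapath. The node names, edges and the `metapath_paths` helper are hypothetical.

```python
from collections import deque

# Hypothetical typed nodes and directed edges; a real graph would live in a graph store.
NODE_TYPE = {
    "disease_D": "disease", "var_1": "variant", "gene_A": "gene", "pw_1": "pathway",
    "gene_B": "gene", "gene_C": "gene", "cpd_X": "compound", "cpd_Y": "compound",
}
EDGES = {
    "disease_D": ["var_1"], "var_1": ["gene_A"], "gene_A": ["pw_1"],
    "pw_1": ["gene_A", "gene_B", "gene_C"], "gene_B": ["cpd_X"], "gene_C": ["cpd_Y"],
}
METAPATH = ["disease", "variant", "gene", "pathway", "gene", "compound"]


def metapath_paths(start, edges, node_type, metapath):
    """Enumerate simple paths from start whose node types follow the metapath exactly."""
    if node_type.get(start) != metapath[0]:
        return []
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) == len(metapath):
            paths.append(path)  # complete disease -> ... -> compound path
            continue
        wanted = metapath[len(path)]
        for nbr in edges.get(path[-1], []):
            # Only extend with a node of the next required type, and avoid cycles.
            if node_type.get(nbr) == wanted and nbr not in path:
                queue.append(path + [nbr])
    return paths


for p in metapath_paths("disease_D", EDGES, NODE_TYPE, METAPATH):
    print(" -> ".join(p))
```

The last node of each path is a candidate compound, and the path itself is the hypothesis explaining why it might help.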
Generating and Querying Graphs with LLMs (Brian Martin, Subha Madhavan, Berenice Wulbrecht)
A panel discussion covering various techniques for generating and querying graphs using LLMs. There were entertaining (and somewhat predictable) comparisons of Property Graphs and RDF graphs to Ford and Ferrari cars, and of how LLMs transform them into Teslas (with their self-driving technology). The panelists also talked about extracting assertions from a corpus of documents to create a KG customized for that corpus, and then using the KG to fact-check the output of the LLM for RAG queries against the corpus.
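The fact-checking step can be sketched very simply once both the KG and the LLM's claims are expressed as triples. This minimal sketch assumes the assertion extraction has already happened (that is the hard part and is not shown); the data and the `normalize` and `fact_check` helpers are hypothetical.

```python
Triple = tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Canonicalize surface form so trivial casing/whitespace differences still match."""
    s, p, o = (part.strip().lower() for part in triple)
    return (s, p, o)


def fact_check(claims: list[Triple], kg: list[Triple]) -> tuple[list[Triple], list[Triple]]:
    """Split LLM claims into those found in the KG and those the KG does not support."""
    known = {normalize(t) for t in kg}
    supported = [c for c in claims if normalize(c) in known]
    unsupported = [c for c in claims if normalize(c) not in known]
    return supported, unsupported


# Hypothetical KG (as if extracted from the corpus) and claims parsed from an LLM answer.
kg = [("drug_A", "treats", "disease_P"), ("drug_B", "treats", "disease_Q")]
claims = [("Drug_A", "treats", "disease_P "), ("drug_A", "treats", "disease_Q")]
supported, unsupported = fact_check(claims, kg)
print("supported:", supported)
print("unsupported:", unsupported)
```

Note that "unsupported" only means the claim is absent from the corpus-derived KG, not that it is false; how to treat such claims depends on whether the KG is assumed to be complete for the corpus.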
Overall, I think it was a great conference. I learned a lot, and would love to go back and present here in the future, hopefully in person next time.