Post-graduate theses

Identifier 000466161
Title 3D scene generation and editing using foundational models and geometric algebra
Alternative Title Generation and editing of 3D scenes using large vision-language models and geometric algebra
Author Αγγελής, Δημήτριος Α.
Thesis advisor Παπαγιαννάκης, Γεώργιος
Reviewer Τζίτζικας, Ιωάννης
Ρούσσος, Αναστάσιος
Πρατικάκης, Πολύβιος
Abstract In Embodied AI, 3D simulated environments are essential, yet building them demands specialized expertise and substantial manual labor, which limits their diversity and scale. In this thesis, we introduce a framework that addresses this limitation by fully automating the generation of 3D environments from user-supplied prompts. The framework can craft diverse scenes, adapt designs to various styles, and interpret the semantics of intricate queries. Central to our approach is a large language model (LLM), which supplies the common-sense knowledge needed to envision plausible scene configurations. We populate scenes with objects drawn from the large collection of 3D assets in Objaverse. We further extend the framework with a feedback agent, powered by multimodal models such as GPT-4 Vision, that guides the generation process toward the desired outcome. We also use Retrieval Augmented Generation (RAG) to enrich the generation process and support reference images by leveraging the visual understanding of GPT-4 Vision. In user evaluations, 75% of users favored scenes generated with the feedback agent, 55.6% preferred scenes generated with RAG, and 83.3% agreed that the generated scene resembled the reference image. Compared with the state of the art, our implementation is faster and more modular, improving user experience and system efficiency.
Additionally, this thesis introduces a novel algorithm that integrates LLMs with Conformal Geometric Algebra (CGA) for controllable 3D scene editing, particularly object repositioning. Conventional methods typically rely on large training datasets or lack a formalized language for precise edits. Using CGA as a formal language, our framework precisely models the spatial transformations needed for accurate object repositioning. Leveraging the zero-shot capabilities of pre-trained LLMs, it translates natural language instructions into CGA operations, enabling exact spatial transformations within 3D scenes without specialized pre-training. To assess the impact of CGA, we benchmark against robust Euclidean-based baselines, evaluating both latency and accuracy. Comparative evaluations indicate that our framework reduces LLM response times by 16% and raises success rates by 9.6% on average compared to traditional methods. These advancements underscore our framework's potential to democratize 3D scene generation and editing, enhancing accessibility and fostering innovation across sectors such as education, entertainment, and virtual reality.
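To make the idea of a CGA operation for object repositioning concrete, the following is a minimal sketch and not the thesis's implementation. It assumes the standard conformal model with basis e1, e2, e3, e+, e- and the convention e_inf = e- + e+, and it shows only how a translation such as "move the object by (1, -2, 0.5)" can be written as a translator T = 1 - 0.5 t e_inf and applied to a point as T P ~T. All function names (`up`, `down`, `translator`, `sandwich`) are hypothetical, and the step where an LLM produces the operation from natural language is not modeled here.

```python
# Basis vectors as bitmasks: e1, e2, e3, e+ (squares to +1), e- (squares to -1).
E1, E2, E3, EP, EM = 1, 2, 4, 8, 16
E_INF = {EP: 1.0, EM: 1.0}  # assumed convention: e_inf = e- + e+


def blade_sign(a, b):
    """Sign of the product of basis blades a and b (reordering and metric)."""
    swaps, shifted = 0, a >> 1
    while shifted:
        swaps += bin(shifted & b).count("1")
        shifted >>= 1
    sign = -1.0 if swaps & 1 else 1.0
    if a & b & EM:  # e- is the only basis vector with a negative square
        sign = -sign
    return sign


def gp(A, B):
    """Geometric product of two multivectors stored as {blade_bitmask: coeff}."""
    out = {}
    for a, ca in A.items():
        for b, cb in B.items():
            out[a ^ b] = out.get(a ^ b, 0.0) + blade_sign(a, b) * ca * cb
    return out


def reverse(A):
    """Reverse each blade: grade g picks up the sign (-1)^(g(g-1)/2)."""
    rev = {}
    for k, c in A.items():
        g = bin(k).count("1")
        rev[k] = -c if (g * (g - 1) // 2) & 1 else c
    return rev


def up(x):
    """Embed a Euclidean point as P = x + 0.5|x|^2 e_inf + e_0."""
    h = 0.5 * sum(c * c for c in x)
    return {E1: x[0], E2: x[1], E3: x[2], EP: h - 0.5, EM: h + 0.5}


def down(P):
    """Recover the Euclidean point, normalizing by the e_0 weight."""
    w = P.get(EM, 0.0) - P.get(EP, 0.0)
    return tuple(P.get(k, 0.0) / w for k in (E1, E2, E3))


def translator(t):
    """Build T = 1 - 0.5 * t * e_inf for a Euclidean offset t."""
    t_vec = {E1: t[0], E2: t[1], E3: t[2]}
    T = {k: -0.5 * c for k, c in gp(t_vec, E_INF).items()}
    T[0] = T.get(0, 0.0) + 1.0
    return T


def sandwich(V, X):
    """Apply a versor V to X as V X ~V."""
    return gp(gp(V, X), reverse(V))


if __name__ == "__main__":
    moved = down(sandwich(translator((1.0, -2.0, 0.5)), up((3.0, 4.0, 0.0))))
    print(moved)
```

In this representation a rotation would be applied with the same `sandwich` call using a rotor in place of the translator, which is one reason a single formal language for all repositioning edits can be convenient.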
Language English
Subject Generative artificial intelligence
Large language models
Large vision models
3D scene generation
3D scene editing
Issue date 2024-07-26
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
Permanent Link https://elocus.lib.uoc.gr//dlib/2/0/a/metadata-dlib-1721197178-332584-24089.tkl