Controlled diffusion model can change material properties in images

Researchers from the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and Google Research may have just performed digital sorcery — in the form of a diffusion model that can change the material properties of objects in images.

Dubbed Alchemist, the system allows users to alter four attributes of both real and AI-generated pictures: roughness, metallicity, albedo (an object’s initial base color), and transparency. As an image-to-image diffusion model, the system lets users input any photo and then adjust each property on a continuous scale from -1 to 1 to create a new visual. These photo editing capabilities could extend to improving video game models, expanding the capabilities of AI in visual effects, and enriching robotic training data.
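
To make that interface concrete, here is a minimal, hypothetical sketch of those four sliders as a data structure; the class name, field names, and clamping behavior are illustrative assumptions, not Alchemist’s actual API.

```python
from dataclasses import dataclass

@dataclass
class MaterialEdit:
    """The four slider values the article describes, each in [-1, 1].
    A hypothetical interface sketch, not the released system's API."""
    roughness: float = 0.0
    metallic: float = 0.0
    albedo: float = 0.0
    transparency: float = 0.0

    def __post_init__(self):
        # Enforce the continuous -1 to 1 range the article mentions.
        for name in ("roughness", "metallic", "albedo", "transparency"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [-1, 1], got {value}")
```

Under this sketch, a call like MaterialEdit(transparency=1.0) would represent a request for a fully transparent rendition of the selected object.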

The magic behind Alchemist starts with a denoising diffusion model: In practice, researchers used Stable Diffusion 1.5, which is a text-to-image model lauded for its photorealistic results and editing capabilities. Previous work built on the popular model to enable users to make higher-level changes, like swapping objects or altering the depth of images. In contrast, CSAIL and Google Research’s method applies this model to focus on low-level attributes, revising the finer details of an object’s material properties with a unique, slider-based interface that outperforms its counterparts.
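
The article does not spell out the exact conditioning mechanism, but a common pattern for injecting a scalar slider value into a denoising U-Net is to embed it alongside the timestep embedding. The PyTorch sketch below shows that pattern; the class name and dimensions are assumptions for illustration, not necessarily what Alchemist does.

```python
import torch
import torch.nn as nn

class SliderConditioner(nn.Module):
    """Embed a material-slider value in [-1, 1] into the same space as
    the diffusion model's timestep embedding, so the two can be summed.
    A plausible conditioning pattern, not necessarily Alchemist's."""

    def __init__(self, embed_dim: int = 320):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, embed_dim),
            nn.SiLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, strength: torch.Tensor) -> torch.Tensor:
        # strength: shape (batch,), values in [-1, 1]
        return self.mlp(strength.unsqueeze(-1))  # -> (batch, embed_dim)
```

The resulting embedding would be added to the timestep embedding inside the U-Net, steering the denoising trajectory toward the requested material change.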

While prior diffusion systems could pull a proverbial rabbit out of a hat for an image, Alchemist could transform that same animal to look translucent. The system could also make a rubber duck appear metallic, remove the golden hue from a goldfish, and shine an old shoe. Programs like Photoshop have similar capabilities, but this model can change material properties in a more straightforward way. For instance, modifying the metallic look of a photo requires several steps in the widely used application.

“When you look at an image you’ve created, often the result is not exactly what you have in mind,” says Prafull Sharma, MIT PhD student in electrical engineering and computer science, CSAIL affiliate, and lead author on a new paper describing the work. “You want to control the picture while editing it, but the existing controls in image editors are not able to change the materials. With Alchemist, we capitalize on the photorealism of outputs from text-to-image models and tease out a slider control that allows us to modify a specific property after the initial picture is provided.”

Precise control

“Text-to-image generative models have empowered everyday users to generate images as effortlessly as writing a sentence. However, controlling these models can be challenging,” says Carnegie Mellon University Assistant Professor Jun-Yan Zhu, who was not involved in the paper. “While generating a vase is simple, synthesizing a vase with specific material properties such as transparency and roughness requires users to spend hours trying different text prompts and random seeds. This can be frustrating, especially for professional users who require precision in their work. Alchemist presents a practical solution to this challenge by enabling precise control over the materials of an input image while harnessing the data-driven priors of large-scale diffusion models, inspiring future works to seamlessly incorporate generative models into the existing interfaces of commonly used content creation software.”

Alchemist’s design capabilities could help tweak the appearance of different models in video games. Applying such a diffusion model in this domain could help creators speed up their design process, refining textures to fit the gameplay of a level. Moreover, Sharma and his team’s project could assist with altering graphic design elements, videos, and movie effects to enhance photorealism and achieve the desired material appearance with precision.

The method could also refine robotic training data for tasks like manipulation. By introducing the machines to more textures, they can better understand the diverse items they’ll grasp in the real world. Alchemist can even potentially help with image classification, analyzing where a neural network fails to recognize the material changes of an image.

Sharma and his team’s work outperformed similar models at faithfully editing only the requested object of interest. For example, when a user prompted different models to tweak a dolphin to maximum transparency, only Alchemist achieved this feat while leaving the ocean backdrop unedited. When the researchers trained the comparable diffusion model InstructPix2Pix on the same data as their method for comparison, they found that Alchemist achieved superior accuracy scores. Likewise, a user study revealed that the MIT model was preferred and seen as more photorealistic than its counterpart.

Keeping it real with synthetic data

According to the researchers, collecting real data was impractical. Instead, they trained their model on a synthetic dataset, randomly editing the material attributes of 1,200 materials applied to 100 publicly available, unique 3D objects in Blender, a popular computer graphics design tool.
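
As a rough sketch of what such a Blender pipeline might involve, the script below randomizes the four attributes Alchemist targets on each material’s Principled BSDF shader using Blender’s Python API (bpy). This is an illustrative reconstruction under stated assumptions, not the authors’ actual data-generation code; the socket names follow Blender 3.x, where "Transmission" controls transparency (renamed "Transmission Weight" in 4.x).

```python
import random

import bpy  # Blender's bundled Python API; run this inside Blender

def randomize_material(mat):
    """Randomly perturb the attributes Alchemist edits:
    roughness, metallicity, albedo (base color), and transparency."""
    mat.use_nodes = True
    bsdf = mat.node_tree.nodes.get("Principled BSDF")
    if bsdf is None:  # skip materials without a Principled BSDF shader
        return
    bsdf.inputs["Roughness"].default_value = random.random()
    bsdf.inputs["Metallic"].default_value = random.random()
    bsdf.inputs["Transmission"].default_value = random.random()
    bsdf.inputs["Base Color"].default_value = (
        random.random(), random.random(), random.random(), 1.0)

for mat in bpy.data.materials:
    randomize_material(mat)
```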

“The control of generative AI image synthesis has so far been constrained by what text can describe,” says Frédo Durand, the Amar Bose Professor of Computing in the MIT Department of Electrical Engineering and Computer Science (EECS) and CSAIL member, who is a senior author on the paper. “This work opens new and finer-grain control for visual attributes inherited from decades of computer-graphics research.”

"Alchemist is the kind of technique that's needed to make machine learning and diffusion models practical and useful to the CGI community and graphic designers,” adds Google Research senior software engineer and co-author Mark Matthews. “Without it, you're stuck with this kind of uncontrollable stochasticity. It's maybe fun for a while, but at some point, you need to get real work done and have it obey a creative vision."

Sharma’s latest project comes a year after he led research on Materialistic, a machine-learning method that can identify similar materials in an image. This previous work demonstrated how AI models can refine their material understanding skills, and like Alchemist, was fine-tuned on a synthetic dataset of 3D models from Blender.

Still, Alchemist has a few limitations at the moment. The model struggles to correctly infer illumination, so it occasionally fails to follow a user’s input. Sharma notes that this method sometimes generates physically implausible transparencies, too. Picture a hand partially inside a cereal box, for example — at Alchemist’s maximum setting for this attribute, you’d see a clear container without the fingers reaching in.

The researchers would like to expand on how such a model could improve 3D assets for graphics at scene level. Also, Alchemist could help infer material properties from images. According to Sharma, this type of work could unlock links between objects' visual and mechanical traits in the future.

MIT EECS professor and CSAIL member William T. Freeman is also a senior author, joining Varun Jampani, and Google Research scientists Yuanzhen Li PhD ’09, Xuhui Jia, and Dmitry Lagun. The work was supported, in part, by a National Science Foundation grant and gifts from Google and Amazon. The group’s work will be highlighted at CVPR in June.

FAQs

How does the controlled diffusion model change material properties in images?

The “Alchemist” system adjusts the material attributes of specific objects within images, with potential uses in modifying video game models to fit different environments, fine-tuning VFX, and diversifying robotic training data.

What are some challenges of diffusion models?

Despite their advantages, diffusion models have some limitations: long training and generation times make them computationally expensive; they handle text-based data less readily than image data; and they are prone to generating unrealistic images if the denoising process fails.

What are the advantages of diffusion models?

A primary advantage of diffusion models over GANs and VAEs is that they are easier to train, using simple and efficient loss functions, and that they can generate highly realistic images.

What can a diffusion model do?

Diffusion models can generate coherent images from noise. During training, noise is added to images and the model learns to remove it; at generation time, the learned denoising process is applied to random noise to produce realistic images.
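
A minimal PyTorch sketch of that objective, using standard DDPM conventions (a noise-prediction network and a precomputed cumulative noise schedule); this is the textbook formulation, not code from the Alchemist project:

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(model, x0, alphas_cumprod):
    """Add noise to clean images x0 at a random timestep, then train
    the model to predict exactly the noise that was added."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise  # forward (noising) step
    return F.mse_loss(model(x_t, t), noise)                 # learn to undo it
```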

What are diffusion models in image processing?

Abstractly speaking, the idea of a diffusion model is to take an unknown probability distribution (the distribution of natural-looking images), progressively convert it into a known probability distribution (the standard Gaussian), and then learn a neural network that reverses that process.
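
In standard DDPM notation (the textbook formulation, not anything specific to this article), the forward process gradually Gaussianizes the data:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big),
\qquad
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t) I\big),
\quad \bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s),
```

so that $x_T$ is approximately standard Gaussian for large $T$, while a network with parameters $\theta$ learns the reverse step

```latex
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\big)
```

that maps noise back toward natural-looking images.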

What is the goal of diffusion models in image generation?

Diffusion models aim to model the data-generating process by iteratively refining samples, starting from a simple initial distribution and moving toward the target data distribution. By doing so, they can generate realistic and coherent samples that resemble the training data.

What is the difference between generative AI and diffusion models?

Generative artificial intelligence (AI) refers to algorithms that create synthetic but realistic output. Diffusion models currently offer state-of-the-art performance in generative AI for images. They also form a key component of more general tools, including text-to-image generators and large language models.


What can you imagine creating with a diffusion model?

Diffusion models start with a form of digital “noise” and, step by step, transform it into a coherent and detailed piece of content. They can create art, videos, and music, and can even support new scientific research.

What are the applications of diffusion models?

Compared to traditional generative models, diffusion models offer better image quality, an interpretable latent space, and robustness to overfitting. They have diverse applications across several domains, such as text-to-video synthesis, image-to-image translation, image search, and reverse image search.

Are diffusion models difficult to control?

Diffusion models can be sensitive to hyperparameters. They are generally harder to tune than VAEs, though not as difficult as GANs.

What sets diffusion models apart from GANs?

Diffusion models possess advantages that set them apart from GANs. In particular, they offer fine-grained control over the generation process, allowing users to manipulate the quality and diversity of the generated data.


Why are diffusion models slow?

Diffusion models are slower than their GAN counterparts because of the iterative and sequential reverse diffusion process.
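
To see where the cost comes from, here is a minimal DDPM-style sampling loop (standard formulation, with assumed tensor shapes): every one of the T timesteps requires a full network evaluation, and the steps cannot be parallelized because each depends on the previous result, whereas a GAN generates an image in a single forward pass.

```python
import torch

@torch.no_grad()
def ddpm_sample(model, shape, betas):
    """Reverse diffusion: start from pure noise and denoise one
    timestep at a time. `betas` is a 1-D tensor noise schedule."""
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)
    for t in reversed(range(len(betas))):  # e.g., 1,000 sequential steps
        eps = model(x, torch.full((shape[0],), t))  # one network call per step
        a, a_bar = alphas[t], alphas_cumprod[t]
        x = (x - (1.0 - a) / (1.0 - a_bar).sqrt() * eps) / a.sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x
```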

What are the challenges of Stable Diffusion?

Current challenges include a lack of robustness in the generation process and the difficulty non-experts face in understanding the complex internal structure and operation of diffusion-based generative models.

Why do diffusion models not suffer from mode collapse?

Mode collapse occurs when a generative model produces only a narrow subset of the data distribution. Unlike GANs, diffusion models are less prone to this issue because they use an iterative refinement process that gradually improves the generated images, rather than attempting to produce them all at once.

