I would like to use the models for the semantic segmentation of custom data. #158

eorjsld · 2024-04-07T07:47:18Z

For example, EVA (keras_cv_attention_models/keras_cv_attention_models/beit).
Can models like EVA be used for semantic segmentation? Is there a simple way to adapt them, or do I need to build a model from scratch?

For semantic segmentation, my data has these shapes: train = (number of images, size x, size y, 3 (RGB)) and mask = (number of masks, size x, size y, 5 (5 classes, one-hot encoded with to_categorical)).

When I created an EVA model and fit it with training and validation data in that format, the following error occurred:
'ValueError: Shapes (4, 196, 196, 5) and (4, 5) are incompatible'

This is the code part:
eva1 = eva.EvaGiantPatch14(input_shape=(196, 196, 3), num_classes=5, activation="gelu", classifier_activation="softmax", pretrained="imagenet21k-ft1k")

eva1.compile(
    loss=keras.losses.CategoricalCrossentropy(),
    optimizer=Adam(learning_rate=0.001),
    metrics=[CategoricalAccuracy()],
)

eva1_history = eva1.fit(
    X_train.astype('float32'), y_train_cat,
    verbose=1,
    batch_size=adjusted_batch_size,
    validation_data=(X_test.astype('float32'), y_test_cat),
    shuffle=True,
    epochs=10,
)

Is it not possible to utilize it this way?

Thank you so much for your help!

leondgarse · 2024-04-08T05:06:18Z

Maintainer

I'm not very familiar with semantic segmentation models, but the default model created by eva.EvaGiantPatch14 is a classification model. Its output shape is [batch_size, num_classes], so it cannot be trained directly against per-pixel masks. You need to add a segmentation head. For reference, see oxford_pets_image_segmentation and keras_segmentation/models/segnet.py, or the sam image_encoders.py and sam mask_decoders.py.
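To illustrate why the label and prediction shapes have to agree, here is a minimal plain-Python sketch of per-pixel categorical cross-entropy. It is not how Keras implements the loss; the function names `pixel_cross_entropy` and `mask_cross_entropy` are made up for this illustration, and the tiny 2x2 mask stands in for a real (size x, size y, 5) mask.

```python
import math

def pixel_cross_entropy(one_hot, probs, eps=1e-7):
    """Categorical cross-entropy for one pixel (one-hot label vs. class probabilities)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

def mask_cross_entropy(mask, pred):
    """Mean per-pixel cross-entropy.

    Both arguments are nested lists shaped [height][width][num_classes].
    """
    if len(mask) != len(pred) or any(len(m) != len(p) for m, p in zip(mask, pred)):
        raise ValueError("labels and predictions must share height and width")
    losses = [
        pixel_cross_entropy(t, p)
        for mask_row, pred_row in zip(mask, pred)
        for t, p in zip(mask_row, pred_row)
    ]
    return sum(losses) / len(losses)

num_classes = 5
# A 2x2 one-hot mask: every pixel belongs to class 0.
mask = [[[1, 0, 0, 0, 0] for _ in range(2)] for _ in range(2)]
# A per-pixel prediction with uniform class probabilities.
pred = [[[1 / num_classes] * num_classes for _ in range(2)] for _ in range(2)]

loss = mask_cross_entropy(mask, pred)
print(f"loss = {loss:.4f}")
```

The loss needs one probability vector per pixel. A classifier that emits a single vector per image, shape (batch_size, num_classes), has nothing to pair with each pixel of the mask, which is what the "Shapes ... are incompatible" error reports.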

Just make sure the model's output shape matches your y labels. For example:

import kecam
from kecam.backend import layers, models
from kecam.attention_layers import conv2d_no_bias, layer_norm, activation_by_name

""" EVA backbone """
patch_size = 4  # input_shape should ideally be divisible by patch_size
backbone = kecam.models.EvaLargePatch14(input_shape=(196, 196, 3), num_classes=0, patch_size=patch_size)
print(f"{backbone.layers[-3].output_shape = }")  # layer before `reduce_mean`
# backbone.layers[-3].output_shape = (None, 2401, 1024)

inputs = backbone.inputs
nn = backbone.layers[-3].output
nn = layers.Reshape([-1, int(nn.shape[1] ** 0.5), nn.shape[-1]])(nn)
print(f"{nn.shape = }")
# nn.shape = TensorShape([None, 49, 49, 1024])

""" Neck """
embed_dims = 256
nn = conv2d_no_bias(nn, embed_dims, kernel_size=1, use_bias=False, name="neck_1_")
nn = layer_norm(nn, name="neck_1_")
nn = conv2d_no_bias(nn, embed_dims, kernel_size=3, padding="SAME", use_bias=False, name="neck_2_")
nn = layer_norm(nn, name="neck_2_")
print(f"{nn.shape = }")
# nn.shape = TensorShape([None, 49, 49, 256])

""" Upsample 4x """
activation = "gelu"
nn = layers.Conv2DTranspose(embed_dims // 4, kernel_size=2, strides=2, name="up_1_conv_transpose")(nn)
nn = layer_norm(nn, epsilon=1e-6, name="up_1_")  # epsilon is fixed using 1e-6
nn = activation_by_name(nn, activation=activation, name="up_1_")
nn = layers.Conv2DTranspose(embed_dims // 8, kernel_size=2, strides=2, name="up_2_conv_transpose")(nn)
nn = activation_by_name(nn, activation=activation, name="up_2_")
print(f"{nn.shape = }")
# nn.shape = TensorShape([None, 196, 196, 32])

""" Output head """
num_classes = 5
nn = conv2d_no_bias(nn, num_classes, kernel_size=3, padding="SAME", use_bias=True, name="output_")
output = activation_by_name(nn, activation="softmax", name="output_")
model = models.Model(inputs, output, name=backbone.name + "_segment")
print(f"{model.output_shape = }")
# model.output_shape = (None, 196, 196, 5)

This model should then work with your compile -> fit process.
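The shapes printed in the example can be checked by hand. The sketch below reproduces that arithmetic in plain Python, assuming a square input and that each Conv2DTranspose with kernel_size=2 and strides=2 doubles the spatial size (which matches the printed 49 -> 196). The helper names `patch_grid` and `tokens_to_grid` are hypothetical and not part of kecam.

```python
import math

def patch_grid(input_size, patch_size):
    """Number of patches along one side of a square input."""
    if input_size % patch_size != 0:
        raise ValueError("input_size should be divisible by patch_size")
    return input_size // patch_size

def tokens_to_grid(num_tokens):
    """Side length of the square grid that a flat token sequence reshapes into."""
    side = math.isqrt(num_tokens)
    if side * side != num_tokens:
        raise ValueError("token count is not a perfect square")
    return side

grid = patch_grid(196, 4)          # patches per side
num_tokens = grid * grid           # sequence length seen before the Reshape
side = tokens_to_grid(num_tokens)  # spatial size after the Reshape
upsampled = side * 2 * 2           # two stride-2 transposed convolutions

print(grid, num_tokens, side, upsampled)
```

If you change input_shape or patch_size, rerunning this arithmetic first is a cheap way to confirm the head will return to the mask resolution before building the full model.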


leondgarse Apr 8, 2024
Maintainer

You can also use a decoder from stable_diffusion/encoder_decoder.py:

import kecam
from kecam.backend import layers, models

patch_size = 8  # input_shape should ideally be divisible by patch_size
backbone = kecam.models.EvaLargePatch14(input_shape=(192, 192, 3), num_classes=0, patch_size=patch_size)
print(f"{backbone.layers[-3].output_shape = }")
# backbone.layers[-3].output_shape = (None, 576, 1024)

inputs = backbone.inputs
nn = backbone.layers[-3].output
nn = layers.Reshape([-1, int(nn.shape[1] ** 0.5), nn.shape[-1]])(nn)
print(f"{nn.shape = }")
# nn.shape = TensorShape([None, 24, 24, 1024])

num_classes = 5
decoder = kecam.stable_diffusion.encoder_decoder.Decoder(input_shape=nn.shape, num_blocks=[1, 1, 1, 1], output_channels=num_classes)
model = models.Model(inputs, decoder(nn), name=backbone.name + "_segment")
print(f"{model.output_shape = }")
# model.output_shape = (None, 192, 192, 5)
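Note that this variant uses a 192 x 192 input rather than 196 x 196. A rough sketch of the size bookkeeping follows, assuming the decoder upsamples by a factor of 8 overall. That factor is inferred only from the printed shapes (24 -> 192), not from the decoder's documentation, and the helper names are hypothetical.

```python
def largest_divisible_size(size, patch_size):
    """Largest size not exceeding `size` that is a multiple of `patch_size`."""
    return (size // patch_size) * patch_size

def output_size(input_size, patch_size, upsample_factor):
    """Spatial size after patch embedding followed by upsampling."""
    return (input_size // patch_size) * upsample_factor

patch_size = 8
decoder_factor = 8  # assumption: inferred from 24 -> 192 in the printed shapes

input_size = largest_divisible_size(196, patch_size)
tokens = (input_size // patch_size) ** 2
restored = output_size(input_size, patch_size, decoder_factor)

print(input_size, tokens, restored)
```

Under that assumption, the output matches the input resolution only when patch_size equals the decoder's overall upsampling factor, so masks would need to be resized or cropped to 192 x 192 as well.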

eorjsld Apr 9, 2024
Author

Thank you for the quick and detailed reply! With your help I succeeded in creating the model. However, the computer currently available to me runs out of memory during the compilation stage, so I cannot yet test whether it works. I'm in the process of getting access to a high-performance computer; I will test it there and post the results. Thank you so much again.
