Probability-based Framework to Fuse Temporal Consistency and Semantic Information for Background Segmentation
The following code is used to reproduce the experimental results in the article.
Reproducing the results in the paper consists of the following steps. (Note that you may need to install Jupyter Notebook and TensorFlow 1.12 to run the semantic-model-related calculations. To evaluate the top model, we suggest using the CLion IDE under Linux. OpenCV 3.x or 4.x must also be installed in advance.)
The training set consists of three parts: the logits from the temporal model, the logits from the semantic model, and the ground truths. These datasets are already prepared in the folder train_top_model.
If you want to use a model that only produces binary segmentations, or you cannot obtain its source code but have its binary results at hand, use the inverted binary segmentations as the estimate of the conditional background probability. In other words, replace the images in train_top_model/probabilities with the inverses of the corresponding binary results, as sketched below. If you want to use the histogram-based model and reproduce our results, no change is needed.
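The following is a minimal sketch of that inversion, assuming the binary results are stored as 8-bit masks with 255 marking foreground; the input directory name is a placeholder, while train_top_model/probabilities is the folder read by the training notebook.

import glob
import os
import cv2

binary_dir = "path/to/binary_results"         # placeholder: your model's binary segmentations
output_dir = "train_top_model/probabilities"  # folder read by the training notebook

for mask_path in sorted(glob.glob(os.path.join(binary_dir, "*.png"))):
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)  # 0 = background, 255 = foreground
    background_prob = 255 - mask                        # inverted mask approximates P(background)
    cv2.imwrite(os.path.join(output_dir, os.path.basename(mask_path)), background_prob)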
Run the notebook to train the top model. During training you can watch how the parameters change. The trained model and the summaries are saved in train_top_model/Logs; use TensorBoard to inspect these summaries.
So far we have not found a concise way to combine the DeepLab model with the C++-based temporal models, so the logits from the semantic model must be calculated in advance. Since the complete set of logits for CDnet is over 30 GB, we only upload the trained semantic model as a frozen graph. Detailed instructions below show how to use the model and how to create all logits for the CDnet dataset.
The folder deeplab_demo_cdnet contains a demo showing how to use our fine-tuned version of the DeepLab V3+ model on CDnet2014. Refer to the Jupyter notebook DeepLab_Demo_cdnet_model for details.
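For reference, loading a frozen TensorFlow 1.x graph generally looks like the sketch below; the graph file name and the input/output tensor names here are assumptions, so use the names given in DeepLab_Demo_cdnet_model.

import numpy as np
import cv2
import tensorflow as tf  # TensorFlow 1.12

# Load the frozen graph (file name is illustrative).
graph_def = tf.GraphDef()
with tf.gfile.GFile("deeplab_demo_cdnet/frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")

image = cv2.imread("demo_image.jpg")[:, :, ::-1]  # BGR -> RGB
with tf.Session(graph=graph) as sess:
    # Tensor names below are placeholders; check the notebook for the actual names.
    logits = sess.run("logits:0", feed_dict={"ImageTensor:0": image[np.newaxis]})
print(logits.shape)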
If you run the demo successfully, you will get the following result on a demo image.
Logits of the semantic segmentation are saved in this folder.
Since the fine-tuning process involves many small modifications of the original code and could cause bugs on other machines, we do not publish our training code at the current stage.
Run the notebook in calculate_ss_logits_for_cdnet to generate all logits. Remember to change the paths for the CDnet dataset and for the logits.
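The per-pixel scores are stored as ordinary grayscale images so the C++ programs can read them back later. A minimal sketch of that idea follows; the [0, 1] to 0-255 scaling is an assumption, and the notebook remains the reference implementation.

import numpy as np
import cv2

def save_probability_as_gray(prob_map, out_path):
    # prob_map: float array in [0, 1]; stored as an 8-bit grayscale image.
    gray = np.clip(prob_map * 255.0, 0, 255).astype(np.uint8)
    cv2.imwrite(out_path, gray)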
Only a few parameters need to be learned in the top model, and once learned, they can be applied to a wide range of environments. Note that the top model is built upon two other models: 1. the temporal-consistency-based model; 2. the semantic model. Each can be replaced by an arbitrary algorithm. To reproduce our results, we suggest using the fine-tuned DeepLab V3+ as the semantic model. For the temporal model, we provide code for both the modified AFH model and the SWCD method.
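Conceptually, the top model fuses the two conditional probabilities with one learned weight per semantic class, mirroring the two constrained Dense(1) layers in the demo notebook at the end of this document. A numpy sketch (all names are illustrative):

import numpy as np

def fuse(p_temporal, p_semantic, w_fg, w_bg):
    # p_temporal: (H, W, 1) foreground probability from the temporal model
    # p_semantic: (H, W, C) per-class probabilities from the semantic model
    # w_fg, w_bg: (C,) learned weights, each constrained to [0, 1]
    joint_fg = p_temporal * p_semantic           # P(temporal foreground, class c)
    joint_bg = (1.0 - p_temporal) * p_semantic   # P(temporal background, class c)
    return joint_fg @ w_fg + joint_bg @ w_bg     # weighted sum over classes -> (H, W)

Thresholding the fused probability at 0.5 then yields the binary segmentation, as done at the end of the demo.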
As explained in Section 2, since the logits are calculated in advance, we only need to load them as grayscale images when running either program. The learned parameters of the top model are constants in the code.
Before running our code, download the entire CDnet2014 dataset and extract it somewhere.
To evaluate the AFH model only, the semantic model only, or the top model built upon the AFH model and the fine-tuned DeepLab V3+, MAKE SURE the following preparations are done:
1). The entire CDnet2014 dataset is downloaded and extracted;
2). All semantic logits are calculated for the CDnet2014 dataset (see Section 2.2);
3). All trained parameters of the top model are available. (Since we have already recorded these parameters, you do not need to train the top model again.)
4). (optional) Set up folders to record all kinds of intermediate results. To do this, copy, replicate, and rename the empty results folder under the root path of the downloaded CDnet2014 dataset.
5). Open the C++ project and modify the paths in main.cpp according to your own settings. The first two paths must be set correctly!
6). (no need to change) Write the parameters learned in step 1.2 into BACKSUB.h. Since we have already trained the top model and set the parameters in advance, there is no need to change them. The first column is reserved for other uses, so PLEASE DO NOT CHANGE IT! If you want to use other parameters, copy the trained parameters into the SECOND to LAST COLUMNS only!
7). Set the path for OpenCV in CMakeLists.txt.
Run the project and you will see all intermediate results. These results are also recorded in the corresponding folders.
We also provide a program to calculate the scores in a Jupyter notebook; all score records are given there.
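As a rough illustration, the F-measure for one video can be computed as in the sketch below. The handling of the CDnet2014 groundtruth labels (255 = foreground, 85 and 170 ignored) is our reading of the standard convention; the notebook remains the reference for the reported scores.

import numpy as np
import cv2

def f_measure(pred_paths, gt_paths):
    tp = fp = fn = 0
    for pred_path, gt_path in zip(pred_paths, gt_paths):
        pred = cv2.imread(pred_path, cv2.IMREAD_GRAYSCALE) > 127
        gt = cv2.imread(gt_path, cv2.IMREAD_GRAYSCALE)
        valid = (gt != 85) & (gt != 170)   # skip outside-ROI / unknown-motion pixels
        gt_fg = (gt == 255)
        tp += np.sum(pred & gt_fg & valid)
        fp += np.sum(pred & ~gt_fg & valid)
        fn += np.sum(~pred & gt_fg & valid)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0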
This is our best-scoring model, built upon the SWCD method and the fine-tuned DeepLab V3+. Before running our code, MAKE SURE the following preparations are done:
1). The entire CDnet2014 dataset is downloaded and extracted;
2). All semantic logits are calculated for the CDnet2014 dataset (see Section 2.2);
3). Download the results of SWCD from the CDnet website and extract them;
4). All trained parameters of the top model are available. (Since we have already recorded these parameters, you do not need to train the top model again.)
5). (optional) Set up folders to record all kinds of intermediate results. To do this, copy, replicate, and rename the empty results folder under the root path of the downloaded CDnet2014 dataset.
6). Open the C++ project and modify the paths in main.cpp according to your own settings. The first two paths must be set correctly!
7). (no need to change) Write the parameters learned in step 1.2 into BACKSUB.h. Since we have already trained the top model and set the parameters in advance, there is no need to change them. The first column is reserved for other uses, so PLEASE DO NOT CHANGE IT! If you want to use other parameters, copy the trained parameters into the SECOND to LAST COLUMNS only!
8). Set the path for OpenCV in CMakeLists.txt.
Run the project and you will see the binary results. These results are also recorded in the corresponding folders. We also provide a program to calculate the scores in a Jupyter notebook; all score records as well as per-category scores are given there.
This demo illustrates how to build the neural network in Figure 3 and how to use it for inference. We train the three parts separately for efficiency. Nevertheless, one can always train the whole model end-to-end with considerably more resources. A few details will be added after publication of the research.
import matplotlib.pyplot as plt
import numpy as np
import os
import glob
import random
import cv2
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.backend as K
from tensorflow.keras.optimizers import Adam, SGD, RMSprop, Nadam, Adagrad, Adadelta
from tensorflow.keras.callbacks import TensorBoard,ModelCheckpoint,ReduceLROnPlateau,EarlyStopping,LambdaCallback
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.utils import plot_model,Sequence
from utilities import plot_tensor
from deeplabv3pMS import Deeplabv3
import math
from joblib import Parallel, delayed
video_name_str = "./Demo_Video/"
results_path = './Demo_Results/'
seq_len = 10
seq_delta = 20
batch_size = 5
images_dataset_path_list = sorted(glob.glob(video_name_str+'*.jpg'))
Height,Width = cv2.imread(images_dataset_path_list[0],0).shape
Height_Ext = math.ceil(Height/16)*16
Width_Ext = math.ceil(Width/16)*16
print(Height,Width,Height_Ext,Width_Ext)
480 720 480 720
input_dataset = np.zeros([len(images_dataset_path_list),Height_Ext,Width_Ext,3],dtype = np.float32)
def read_input_images(i):
    # Read each frame into the pre-allocated array; frames sit in the top-left
    # corner of the zero-padded extended canvas.
    input_dataset[i, :Height, :Width, :] = cv2.imread(images_dataset_path_list[i])
Parallel(n_jobs=-1, verbose=2, prefer="threads")(delayed(read_input_images)(i) for i in range(len(images_dataset_path_list)));
[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 20 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=-1)]: Done 122 tasks | elapsed: 0.0s
[Parallel(n_jobs=-1)]: Done 325 tasks | elapsed: 0.1s
[Parallel(n_jobs=-1)]: Done 600 out of 600 | elapsed: 0.3s finished
def extract_clip_indices(begin, end, head, clip_len, delta):
    # Collect clip_len frame indices spaced by delta that end at frame `head`.
    # For early frames (head < clip_len*delta) the clip is taken forward in time instead.
    position = head - begin
    sequence = np.arange(begin, end + 1, 1)
    if head < clip_len * delta:
        clip_indices = np.flip([sequence[position + i * delta] for i in range(clip_len)])
    else:
        clip_indices = np.flip([sequence[position - i * delta] for i in range(clip_len)])
    return clip_indices
def extract_clip_indices_list(begin, end, head_list, clip_len, delta):
    clip_indices_list = []
    for head in head_list:
        clip_indices_list.append(extract_clip_indices(begin, end, head, clip_len, delta))
    return clip_indices_list
class unified_test_X_generator(Sequence):
    def __init__(self, input_dataset, seq_len, seq_delta, batch_size):
        # Pre settings
        self.data = input_dataset
        self.seq_len = seq_len
        self.seq_delta = seq_delta
        self.batch_size = batch_size
        self.clip_frames_list = extract_clip_indices_list(0,
                                                          self.data.shape[0],
                                                          np.arange(self.data.shape[0]),
                                                          self.seq_len, self.seq_delta)
        #print(self.clip_frames_list)

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.ceil(self.data.shape[0] / self.batch_size))

    def __getitem__(self, i):
        # Each batch yields [normalized clip for the temporal model, raw last frame for the semantic model].
        batch_X = []
        for j in range(i * self.batch_size, min(self.data.shape[0], (i + 1) * self.batch_size)):
            batch_X.append(self.data[self.clip_frames_list[j], :, :, :])
        batch_X = np.asarray(batch_X)
        return [batch_X / 255, batch_X[:, -1, :, :, :]]

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        pass
X_generator = unified_test_X_generator(input_dataset,seq_len=seq_len,seq_delta=seq_delta,batch_size=batch_size)
print(X_generator.__getitem__(0)[0].shape)
plot_tensor(X_generator.__getitem__(20)[0][0,:,:,:])
(5, 10, 480, 720, 3)
# The first and last classes are irrelevant ones. Only the middle 13 classes are meaningful.
deeplabv3p = Deeplabv3(input_tensor=None,input_shape=(None,None,3),classes=15)
kernel_tensor = tf.zeros((3,3,1))
inputs = keras.layers.Input(shape=(None,None,3))
upscaled_inputs = keras.layers.UpSampling2D(interpolation='bilinear')(inputs)
out_up = deeplabv3p(upscaled_inputs)
out_raw = deeplabv3p(inputs)
out_up_back = keras.layers.MaxPooling2D()(out_up)
out_up_back_epd = Lambda(lambda x: tf.expand_dims(x,axis=1))(out_up_back)
out_epd = Lambda(lambda x: tf.expand_dims(x,axis=1))(out_raw)
out_concatinate = keras.layers.Concatenate(axis=1)([out_up_back_epd,out_epd])
ms_hidden = Conv3D(filters=1, kernel_size=(5, 5, 1),
padding='same',data_format='channels_first',
activation='linear')(out_concatinate)
ms_hidden = keras.layers.LeakyReLU(alpha=0.01)(ms_hidden)
ms_hidden = Lambda(lambda x: tf.squeeze(x,axis=1))(ms_hidden)
ms_out = Activation('softmax')(ms_hidden)
ms_out_person = Lambda(lambda x: tf.nn.erosion2d(x, filters=kernel_tensor, strides=(1,1,1,1),
padding='SAME',data_format='NHWC',dilations=(1,1,1,1)))(ms_out[:,:,:,1:2])
ms_out1 = tf.concat([ms_out[:,:,:,0:1],ms_out_person],axis=-1)
ms_out2 = tf.concat([ms_out1,ms_out[:,:,:,2:15]],axis=-1)
MS_model = Model(inputs,ms_out2,name='MS_model')
MS_model.load_weights('./Semantic_model_weights/Smaller_MS_model_final.h5')
for layer in MS_model.layers:
    layer.trainable = False
temporal_model = Sequential()
temporal_model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),strides=2,
input_shape=(seq_len, None, None, 3),
padding='same', return_sequences=True))
temporal_model.add(BatchNormalization())
temporal_model.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),strides=2,
padding='same', return_sequences=False))
temporal_model.add(BatchNormalization())
temporal_model.add(UpSampling2D())
temporal_model.add(Conv2D(filters=10, kernel_size=(3, 3),
activation='relu',
padding='same', data_format='channels_last'))
temporal_model.add(UpSampling2D())
temporal_model.add(Conv2D(filters=1, kernel_size=(3, 3),
activation='sigmoid',
padding='same', data_format='channels_last'))
temporal_model.load_weights('./Temporal_model_weights/3_temporal_model_fall.h5')
for layer in temporal_model.layers:
    layer.trainable = False
# Repeat the temporal foreground probability once per semantic class, then form the
# per-class joint probabilities P(temporal FG, class c) and P(temporal BG, class c).
probability_stack = keras.layers.Concatenate(axis=-1)([temporal_model.output for i in range(15)])
inverse_probability_stack = Lambda(lambda x: 1.0-x)(probability_stack)
joint_probabilitys_0 = Lambda(lambda x: (x[0]*x[1])[0])([probability_stack,MS_model.outputs])
joint_probabilitys_1 = Lambda(lambda x: (x[0]*x[1])[0])([inverse_probability_stack,MS_model.outputs])
class MinMaxConstraint(tf.keras.constraints.Constraint):
    # Keeps the learned fusion weights inside [min_value, max_value] so they remain valid probabilities.
    def __init__(self, min_value=0.0, max_value=1.0):
        self.min = min_value
        self.max = max_value

    def __call__(self, w):
        return tf.keras.backend.clip(w, self.min, self.max)
part_0 = keras.layers.Dense(1, use_bias=False, kernel_constraint=MinMaxConstraint())(joint_probabilitys_0)
part_1 = keras.layers.Dense(1, use_bias=False, kernel_constraint=MinMaxConstraint())(joint_probabilitys_1)
total_prob = Lambda(lambda x: x[0]+x[1])([part_0,part_1])
unified_model = Model(inputs=[temporal_model.input,MS_model.input], outputs=total_prob)
# Coefficients for the first and last classes are irrelevant; only the middle 13 classes are meaningful.
part_0_weights = [np.array([[0.],[1.],[0.],[1.],[1.],[0.],[1.],[1.],[0.],[0.],[0.],[0.],[0.],[0.],[0.]])]
part_1_weights = [np.array([[0.],[1.],[0.],[0.],[0.],[0.],[0.],[0.],[0.],[0.],[0.],[0.],[0.],[0.],[0.]])]
unified_model.layers[-3].set_weights(part_0_weights)
unified_model.layers[-2].set_weights(part_1_weights)
y_pred_raw = unified_model.predict(X_generator,max_queue_size = 10, workers = 8, verbose=1)
y_pred = np.where(y_pred_raw>0.5, 1.0, 0.0)[:,0:Height,0:Width,:]
120/120 [==============================] - 186s 2s/step
for k in range(len(y_pred)):
    # Replicate the single-channel mask into 3 channels and save as an 8-bit PNG.
    y_pred_temp = np.concatenate((y_pred[k]*255, y_pred[k]*255, y_pred[k]*255), axis=-1)
    cv2.imwrite(results_path + str(k+1).zfill(6) + ".png", y_pred_temp)
check_index = 20
plot_tensor(X_generator.__getitem__(check_index)[0][:,-1,:,:,:])
plot_tensor(y_pred_raw[check_index*batch_size:(check_index+1)*batch_size])