Contributing

ImputeGAP allows users to integrate their own algorithms. We provide a wrapper that needs to be adjusted with the core of the imputation algorithm.

Initialize a Git Repository

command:

$ git init
$ git clone https://github.com/eXascaleInfolab/ImputeGAP
$ cd ./ImputeGAP

A. Integration Steps (in Python)

Basic Features

  1. Navigate to the ./imputegap/algorithms directory.

  2. Create a new file by copying mean_impute.py and rename it with the name of your algorithm, e.g., new_alg.py.

  3. Rename the function def mean_impute(), e.g., def new_alg().

  4. Replace the section under # core of the algorithm with your algorithm’s implementation. The algorithms should take as input the TimeSeries object structure and should return a numpy.ndarray matrix.

  5. Navigate to ./imputegap/recovery/imputation.py:

    1. Copy the class MeanImpute(BaseImputer) into the corresponding family of algorithms.

    2. Rename the class. e.g., class NewAlg(BaseImputer).

    3. Change the value of the algorithm variable from mean_impute to new_alg

    4. In the def impute() method, replace the call of the function to link into your new algorithm, e.g.,

    from imputegap.algorithms.new_alg import new_alg
    self.recov_data = new_alg(self.incomp_data, params)
    


Advanced Features

I. Initialize default values

  1. To set the default values of your algorithm, please update ./imputegap/env/default_values.toml and add your configuration:

command:

[new_alg]
param_integer = 42
param_float = 0.42
param_string = "value_42"
  1. Update the ./imputegap/tools/utils.py file, and specify your configuration in the load_parameters function.


II. Benchmark

To access the benchmarking features, please update ./imputegap/tools/utils.py by adding your algorithm in the def config_impute_algorithm function.

elif algorithm == "new_alg":
    imputer = Imputation.MyFamily.NewAlg(incomp_data)

Replace MyFamily with either: Statistics, MatrixCompletion, PatternSearch, MachineLearning, or DeepLearning


III. Optimizer

To enable the optimization module, please update ./imputegap/tools/algorithm_parameters.py.

  1. Open ./imputegap/tools/algorithm_parameters.py copy paste lines 59 to 63 and update the algorithm name and parameters, e.g.,

    'new_alg': {
            "param_integer": tune.grid_search([i for i in range(2, 20 1)]),
            "param_float": tune.loguniform(1e-6, 1),
            "param_string": ["value_1", "value_n"]
        },
    
  2. Add your parameters in the def save_optimization() function of the file ./imputegap/tools/utils.py to save the optimal parameters, line 820 to 825:

    if algorithm == "new_alg":
        params_to_save = {
            "param_integer": int(optimal_params[0]),
            "param_float": float(optimal_params[1]),
            "param_string": str(optimal_params[2])
    }
    

IV. Update the call

Navigate to ./imputegap/recovery/imputation.py:

Improve the imputation call of the NewAlg class in the def impute() function, and add the call of the optimizer and the default values of the parameters.

if params is not None:
    param_integer, param_float, param_string = self._check_params(user_def, params)  # call the optimizer
else:
    param_integer, param_float, param_string = utils.load_parameters(query="default", algorithm=self.algorithm, verbose=self.verbose)  # load the default values

self.recov_data = new_alg(incomp_data=self.incomp_data, param_integer=param_integer, param_float=param_float, param_string=param_string, logs=self.logs, verbose=self.verbose)


B. Integration Steps (other languages)

We will show how to adjust the integration wrapper in C++

  1. Navigate to the ./imputegap/algorithms directory.

  2. If not already done, convert your CPP/H files into a shared object format (.so) and place them in the imputegap/algorithms/lib folder. a. Go to ./imputegap/wrapper/AlgoCollection and update the Makefile. Copy commands from libSTMVL.so or modify them as needed. b. Optionally, copy your C++ project files into the directory. c. Generate the .so file using the make command:

    make your_lib_name
    
    1. Optional: To include the .so file in the “in-built” directory, open a command line, navigate to the root directory, and execute the library build process:

      rm -rf dist/
      python setup.py sdist bdist_wheel
      
  3. Rename cpp_integration.py to reflect your algorithm’s name.

  4. Modify the native_algo() function: a. Update the shared object parameter to match your shared library. b. Convert input parameters to the appropriate C++ types and pass them to your shared object methods. c. Convert the imputed matrix back to a numpy format.

  5. Adapt the template method your_algo.py with the appropriate parameters, ensuring compatibility with the TimeSeries object and a numpy.ndarray return type.

  6. Adapt the ./imputegap/recovery/imputation.py: a. Add a function to call your new algorithm by copying and modifying class MeanImpute(BaseImputer) as needed. You can copy-paste the class into the corresponding category of algorithms.

  7. Perform imputation as needed.



Example with C++ Algorithm

Once your cpp and h files are ready to be converted (you can look at ./imputegap/wrapper/AlgoCollection/shared/SharedLibCDREC.cpp or ./imputegap/wrapper/AlgoCollection/shared/SharedLibCDREC.h), create a .so file for linux and windows, and a .dylib file for MAC OS.

  1. Modify the Makefile:

    libCDREC.so:
        g++ -O3 -D ARMA_DONT_USE_WRAPPER -fPIC -rdynamic -shared -o lib_cdrec.so -Wall -Werror -Wextra -pedantic \
        -Wconversion -Wsign-conversion -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -fopenmp -std=gnu++14 \
        Stats/Correlation.cpp Algorithms/CDMissingValueRecovery.cpp  Algebra/Auxiliary.cpp \
        Algebra/CentroidDecomposition.cpp  shared/SharedLibCDREC.cpp \
        -lopenblas -larpack
    
  2. Generate the shared library:

    make libCDREC.so
    
  3. Place the generated .so file in imputegap/algorithms/lib

  4. Optional: To include the .so file in the “in-built” directory:

    rm -rf dist/
    python setup.py sdist bdist_wheel
    




Wrapper

  1. In imputegap/algorithms/cpp_integration.py, update the function name and parameter count, and ensure the .so file matches:

    def native_cdrec(__py_matrix, __py_rank, __py_epsilon, __py_iterations):
    
        shared_lib = utils.load_share_lib("lib_cdrec") # in-build files
        # shared_lib = utils.load_share_lib("./your_path/lib_cdrec.so") # external files
    
  2. Convert variables to corresponding C++ types:

    __py_n = len(__py_matrix);
    __py_m = len(__py_matrix[0]);
    
    assert (__py_rank >= 0);
    assert (__py_rank < __py_m);
    assert (__py_epsilon > 0);
    assert (__py_iterations > 0);
    
    __ctype_size_n = __native_c_types_import.c_ulonglong(__py_n);
    __ctype_size_m = __native_c_types_import.c_ulonglong(__py_m);
    
    __ctype_rank = __native_c_types_import.c_ulonglong(__py_rank);
    __ctype_epsilon = __native_c_types_import.c_double(__py_epsilon);
    __ctype_iterations = __native_c_types_import.c_ulonglong(__py_iterations);
    
    __ctype_matrix = __marshal_as_native_column(__py_matrix);
    
  3. Call the C++ algorithm with the required parameters:

    shared_lib.cdrec_imputation_parametrized(__ctype_matrix, __ctype_size_n, __ctype_size_m, __ctype_rank, __ctype_epsilon, __ctype_iterations);
    
  4. Convert the imputed matrix back to numpy:

    __py_imputed_matrix = __marshal_as_numpy_column(__ctype_matrix, __py_n, __py_m);
    
    return __py_imputed_matrix;
    


Method Implementation

  1. In imputegap/algorithms/cpp_integration.py, create or adapt a generic method for your needs:

    def cdrec(contamination, truncation_rank, iterations, epsilon, logs=True, lib_path=None):
    
        start_time = time.time()  # Record start time
    
        # Call the C++ function to perform recovery
        imputed_matrix = native_cdrec(contamination, truncation_rank, epsilon, iterations)
    
        end_time = time.time()
    
        if logs:
            print(f"\n\t\t> logs, imputation cdrec - Execution Time: {(end_time - start_time):.4f} seconds\n")
    
        return imputed_matrix
    


Imputer Class

  1. Add your algorithm to the catalog in ./imputegap/recovery/imputation.py

  2. Copy and modify class MeanImpute(BaseImputer) to fit your requirements:

    class MatrixCompletion:
        class CDRec(BaseImputer):
            algorithm = "cdrec"
    
            def impute(self, user_defined=True, params=None):
    
                self.imputed_matrix = cdrec(contamination=self.infected_matrix, truncation_rank=rank, iterations=iterations, epsilon=epsilon, logs=self.logs)
    
                return self