Contributing¶
ImputeGAP allows users to integrate their own algorithms. We provide a wrapper that needs to be adjusted with the core of the imputation algorithm.
Initialize a Git Repository¶
command:
$ git init
$ git clone https://github.com/eXascaleInfolab/ImputeGAP
$ cd ./ImputeGAP
A. Integration Steps (in Python)¶
Basic Features¶
Navigate to the
./imputegap/algorithms
directory.Create a new file by copying
mean_impute.py
and rename it with the name of your algorithm, e.g.,new_alg.py
.Rename the function
def mean_impute()
, e.g.,def new_alg()
.Replace the section under
# core of the algorithm
with your algorithm’s implementation. The algorithms should take as input theTimeSeries
object structure and should return anumpy.ndarray
matrix.Navigate to
./imputegap/recovery/imputation.py
:Copy the
class MeanImpute(BaseImputer)
into the corresponding family of algorithms.Rename the class. e.g.,
class NewAlg(BaseImputer)
.Change the value of the
algorithm
variable frommean_impute
tonew_alg
In the def
impute()
method, replace the call of the function to link into your new algorithm, e.g.,
from imputegap.algorithms.new_alg import new_alg self.recov_data = new_alg(self.incomp_data, params)
Advanced Features¶
I. Initialize default values¶
To set the default values of your algorithm, please update
./imputegap/env/default_values.toml
and add your configuration:
command:
[new_alg]
param_integer = 42
param_float = 0.42
param_string = "value_42"
Update the
./imputegap/tools/utils.py
file, and specify your configuration in theload_parameters
function.
II. Benchmark¶
To access the benchmarking features, please update ./imputegap/tools/utils.py
by adding your algorithm in the def config_impute_algorithm
function.
elif algorithm == "new_alg": imputer = Imputation.MyFamily.NewAlg(incomp_data)
Replace MyFamily with either: Statistics, MatrixCompletion, PatternSearch, MachineLearning, or DeepLearning
III. Optimizer¶
To enable the optimization module, please update ./imputegap/tools/algorithm_parameters.py
.
Open
./imputegap/tools/algorithm_parameters.py
copy paste lines 59 to 63 and update the algorithm name and parameters, e.g.,'new_alg': { "param_integer": tune.grid_search([i for i in range(2, 20 1)]), "param_float": tune.loguniform(1e-6, 1), "param_string": ["value_1", "value_n"] },
Add your parameters in the
def save_optimization()
function of the file./imputegap/tools/utils.py
to save the optimal parameters, line 820 to 825:if algorithm == "new_alg": params_to_save = { "param_integer": int(optimal_params[0]), "param_float": float(optimal_params[1]), "param_string": str(optimal_params[2]) }
IV. Update the call¶
Navigate to ./imputegap/recovery/imputation.py
:
Improve the imputation call of the NewAlg
class in the def impute()
function, and add the call of the optimizer and the default values of the parameters.
if params is not None:
param_integer, param_float, param_string = self._check_params(user_def, params) # call the optimizer
else:
param_integer, param_float, param_string = utils.load_parameters(query="default", algorithm=self.algorithm, verbose=self.verbose) # load the default values
self.recov_data = new_alg(incomp_data=self.incomp_data, param_integer=param_integer, param_float=param_float, param_string=param_string, logs=self.logs, verbose=self.verbose)
B. Integration Steps (other languages)¶
We will show how to adjust the integration wrapper in C++
Navigate to the
./imputegap/algorithms
directory.If not already done, convert your CPP/H files into a shared object format (
.so
) and place them in theimputegap/algorithms/lib
folder. a. Go to./imputegap/wrapper/AlgoCollection
and update the Makefile. Copy commands fromlibSTMVL.so
or modify them as needed. b. Optionally, copy your C++ project files into the directory. c. Generate the.so
file using themake
command:make your_lib_name
Optional: To include the .so file in the “in-built” directory, open a command line, navigate to the root directory, and execute the library build process:
rm -rf dist/ python setup.py sdist bdist_wheel
Rename
cpp_integration.py
to reflect your algorithm’s name.Modify the
native_algo()
function: a. Update the shared object parameter to match your shared library. b. Convert input parameters to the appropriate C++ types and pass them to your shared object methods. c. Convert the imputed matrix back to a numpy format.Adapt the template method
your_algo.py
with the appropriate parameters, ensuring compatibility with theTimeSeries
object and anumpy.ndarray
return type.Adapt the
./imputegap/recovery/imputation.py
: a. Add a function to call your new algorithm by copying and modifyingclass MeanImpute(BaseImputer)
as needed. You can copy-paste the class into the corresponding category of algorithms.Perform imputation as needed.
Example with C++ Algorithm¶
Once your cpp and h files are ready to be converted (you can look at ./imputegap/wrapper/AlgoCollection/shared/SharedLibCDREC.cpp
or ./imputegap/wrapper/AlgoCollection/shared/SharedLibCDREC.h
), create a .so
file for linux and windows, and a .dylib
file for MAC OS.
Modify the Makefile:
libCDREC.so: g++ -O3 -D ARMA_DONT_USE_WRAPPER -fPIC -rdynamic -shared -o lib_cdrec.so -Wall -Werror -Wextra -pedantic \ -Wconversion -Wsign-conversion -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -fopenmp -std=gnu++14 \ Stats/Correlation.cpp Algorithms/CDMissingValueRecovery.cpp Algebra/Auxiliary.cpp \ Algebra/CentroidDecomposition.cpp shared/SharedLibCDREC.cpp \ -lopenblas -larpack
Generate the shared library:
make libCDREC.so
Place the generated
.so
file inimputegap/algorithms/lib
Optional: To include the .so file in the “in-built” directory:
rm -rf dist/ python setup.py sdist bdist_wheel
Modify the Makefile:
libCDREC.so: g++ -O3 -D ARMA_DONT_USE_WRAPPER -fPIC -rdynamic -shared -o lib_cdrec.so -Wall -Werror -Wextra -pedantic \ -Wconversion -Wsign-conversion -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -fopenmp -std=gnu++14 \ Stats/Correlation.cpp Algorithms/CDMissingValueRecovery.cpp Algebra/Auxiliary.cpp \ Algebra/CentroidDecomposition.cpp shared/SharedLibCDREC.cpp \ -lopenblas -larpack
Generate the shared library:
make libCDREC.so
Place the generated
.so
file inimputegap/algorithms/lib
Optional: To include the .so file in the “in-built” directory:
rm -rf dist/ python setup.py sdist bdist_wheel
Modify the Makefile:
libCDREC.dylib: clang++ -dynamiclib -O3 -fPIC -std=c++17 -o lib_cdrec.dylib \ -I/opt/homebrew/include \ -L/opt/homebrew/lib \ -L/opt/homebrew/opt/openblas/lib \ Stats/Correlation.cpp Algorithms/CDMissingValueRecovery.cpp Algebra/Auxiliary.cpp \ Algebra/CentroidDecomposition.cpp shared/SharedLibCDREC.cpp \ -larmadillo -lopenblas -larpack
Generate the shared library:
make libCDREC.dylib
Place the generated
.dylib
file inimputegap/algorithms/lib
Optional: To include the .dylib file in the “in-built” directory:
rm -rf dist/ python setup.py sdist bdist_wheel
Wrapper
In
imputegap/algorithms/cpp_integration.py
, update the function name and parameter count, and ensure the.so
file matches:def native_cdrec(__py_matrix, __py_rank, __py_epsilon, __py_iterations): shared_lib = utils.load_share_lib("lib_cdrec") # in-build files # shared_lib = utils.load_share_lib("./your_path/lib_cdrec.so") # external files
Convert variables to corresponding C++ types:
__py_n = len(__py_matrix); __py_m = len(__py_matrix[0]); assert (__py_rank >= 0); assert (__py_rank < __py_m); assert (__py_epsilon > 0); assert (__py_iterations > 0); __ctype_size_n = __native_c_types_import.c_ulonglong(__py_n); __ctype_size_m = __native_c_types_import.c_ulonglong(__py_m); __ctype_rank = __native_c_types_import.c_ulonglong(__py_rank); __ctype_epsilon = __native_c_types_import.c_double(__py_epsilon); __ctype_iterations = __native_c_types_import.c_ulonglong(__py_iterations); __ctype_matrix = __marshal_as_native_column(__py_matrix);
Call the C++ algorithm with the required parameters:
shared_lib.cdrec_imputation_parametrized(__ctype_matrix, __ctype_size_n, __ctype_size_m, __ctype_rank, __ctype_epsilon, __ctype_iterations);
Convert the imputed matrix back to
numpy
:__py_imputed_matrix = __marshal_as_numpy_column(__ctype_matrix, __py_n, __py_m); return __py_imputed_matrix;
Method Implementation
In
imputegap/algorithms/cpp_integration.py
, create or adapt a generic method for your needs:def cdrec(contamination, truncation_rank, iterations, epsilon, logs=True, lib_path=None): start_time = time.time() # Record start time # Call the C++ function to perform recovery imputed_matrix = native_cdrec(contamination, truncation_rank, epsilon, iterations) end_time = time.time() if logs: print(f"\n\t\t> logs, imputation cdrec - Execution Time: {(end_time - start_time):.4f} seconds\n") return imputed_matrix
Imputer Class
Add your algorithm to the catalog in
./imputegap/recovery/imputation.py
Copy and modify
class MeanImpute(BaseImputer)
to fit your requirements:class MatrixCompletion: class CDRec(BaseImputer): algorithm = "cdrec" def impute(self, user_defined=True, params=None): self.imputed_matrix = cdrec(contamination=self.infected_matrix, truncation_rank=rank, iterations=iterations, epsilon=epsilon, logs=self.logs) return self