
I am working with Bayesian Optimization. Before using it to guide the selection of my real-world wet-lab chemical experiments (where I'll have much more flexibility), I want to evaluate its effectiveness on an existing training dataset from my field of study. However, this training dataset is not the canonical use case for Bayesian Optimization, for the following reasons:

  1. Some of the variables are categorical. Rather than a simple one-hot encoding, I represent them with t-SNE embeddings to better capture relative differences between categories. The model, however, should still ultimately suggest one of the provided discrete categories.

  2. It is sparse. For context, each row represents a different chemical mixture, where a single functional target value is measured for each mixture (i.e., 'target_value'). I have a total of 7 different variables to learn from, i.e.,

  • component1_structure (3 t-SNE dimensions, but effectively categorical)

  • component1_ratio (1 dimension, continuous)

  • component1_weight (1 dimension, continuous)

  • component2_structure (2 t-SNE dimensions, but effectively categorical)

  • component2_ratio (1 dimension, continuous)

  • component3_ratio (1 dimension, continuous)

  • component4_ratio (1 dimension, continuous)

This combinatorial space is theoretically very large. However, I only have 1801 different mixtures (i.e., combinations of these variables) with functional data; I want the model to explore/suggest strictly among these 1801 mixtures where functional data exist.

  3. Rather than sequential optimization, each round of experiments will test multiple (i.e., 10) different mixtures at once. Hence, as described in this paper, I incorporate the Kriging believer algorithm.
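For concreteness, the Kriging believer loop over a finite candidate pool can be sketched roughly as below. This is a minimal illustration, not the Colab's actual code: the pool here is a random toy stand-in for the 1801 mixtures, the target is synthetic, expected improvement assumes maximization, and scikit-learn's GP is used in place of whatever surrogate the notebook fits.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Toy stand-in for the 1801-mixture table (here 200 rows, 9 feature dims:
# t-SNE embedding columns + ratio/weight columns, all hypothetical).
X_pool = rng.uniform(size=(200, 9))
y_pool = np.sin(X_pool.sum(axis=1))  # synthetic 'target_value'

# Pretend 30 mixtures have already been measured.
measured = rng.choice(len(X_pool), size=30, replace=False)
candidates = np.setdiff1d(np.arange(len(X_pool)), measured)

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for maximization; sigma floored to avoid division by zero."""
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def kriging_believer_batch(X_train, y_train, candidate_idx, q=10):
    """Pick q pool indices: after each pick, 'believe' the GP's mean
    prediction as if it were an observed target, then refit."""
    X_fit, y_fit = X_train.copy(), y_train.copy()
    remaining = list(candidate_idx)
    batch = []
    for _ in range(q):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(X_fit, y_fit)
        mu, sigma = gp.predict(X_pool[remaining], return_std=True)
        ei = expected_improvement(mu, sigma, y_fit.max())
        pick = remaining.pop(int(np.argmax(ei)))
        batch.append(pick)
        # Append the believed (predicted-mean) observation.
        X_fit = np.vstack([X_fit, X_pool[pick]])
        y_fit = np.append(y_fit, gp.predict(X_pool[pick][None, :]))
    return batch

batch = kriging_believer_batch(X_pool[measured], y_pool[measured], candidates, q=10)
```

Because the acquisition is only ever evaluated at rows of `X_pool`, every suggestion is automatically one of the finite mixtures, with no post-hoc matching step.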

My goal is to demonstrate that Bayesian Optimization is effective at searching this training data, ultimately motivating its use on an actual real-world dataset dynamically (where the second constraint described above fortunately won't apply). Nevertheless, I want to conceive of a sound way to apply Bayesian Optimization to this preexisting training data.

Here is an implementation example on Colab: https://colab.research.google.com/drive/19cY54d6M7jYl1fRDpL7oGVPF832JU247?usp=sharing

My current strategy, as you'll see in the Colab, is to use a weighted distance metric to identify the unique point (among the 1801 not already selected) closest to the model's continuous suggestion. However, I have doubts that this is a sound approach.
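One alternative worth considering, instead of optimizing the acquisition in continuous space and then snapping to the nearest unmeasured row, is to evaluate the acquisition function directly on the finite pool and take the argmax. With only ~1801 candidates this is cheap, and it sidesteps the distance-metric weighting entirely. A rough sketch (toy data and feature dimensions are hypothetical stand-ins for the real table; EI assumes the target is being maximized):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(1)

X_pool = rng.uniform(size=(200, 9))   # stand-in for the 1801 mixtures' features
y_pool = np.sin(X_pool.sum(axis=1))   # synthetic 'target_value'
measured = rng.choice(len(X_pool), size=30, replace=False)
unmeasured = np.setdiff1d(np.arange(len(X_pool)), measured)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_pool[measured], y_pool[measured])

# Score every unmeasured candidate directly; no continuous optimization,
# no nearest-neighbor snapping afterwards.
mu, sigma = gp.predict(X_pool[unmeasured], return_std=True)
sigma = np.maximum(sigma, 1e-9)
best = y_pool[measured].max()
z = (mu - best) / sigma
ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

suggestion = unmeasured[np.argmax(ei)]  # a row index into the candidate table
```

The suggestion is guaranteed to be an existing, unmeasured mixture, so the "closest point" heuristic is no longer needed; the same pool-argmax step drops straight into a Kriging-believer loop for batches of 10.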

Suggestions on how to improve this strategy and/or code implementations would be very helpful.

Thanks in advance!
