FrozenLake

This guide helps you quickly set up and run FrozenLake experiments with ReMe integration. The FrozenLake experiment demonstrates how task memory can improve an agent's performance in a navigation task.

Environment Setup

1. Clone the Repository

git clone https://github.com/modelscope/ReMe.git
cd ReMe/cookbook/frozenlake

2. FrozenLake Environment Setup

Install Gymnasium (for the FrozenLake environment) together with the other packages the experiment scripts use:

pip install gymnasium ray openai

The experiments depend on:
- gymnasium - for the FrozenLake environment
- ray - for parallel execution
- openai - for LLM API access
- other dependencies
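To confirm that the packages named above can be imported in the active environment, a quick check like the following can help. This is a minimal sketch; `missing_packages` is a hypothetical helper, not part of the cookbook, and it only tests importability, not versions.

```python
import importlib.util

# Packages the FrozenLake scripts rely on, per the list above (hypothetical check list).
REQUIRED = ["gymnasium", "ray", "openai"]


def missing_packages(names=REQUIRED):
    """Return the package names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", " ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All FrozenLake dependencies are importable.")
```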

3. Start ReMe Service

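If you haven't installed ReMe yet, follow these steps. If you create the virtual environment below, activate it before installing the packages from step 2 so everything ends up in the same environment.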

# Go back to the project root
cd ../..

# Create a virtual environment (optional)
conda create -p ./reme-env python=3.10
conda activate ./reme-env

# Install ReMe
pip install .

Launch the ReMe service, which provides the task memory library used by the agent:

reme \
  backend=http \
  http.port=8002 \
  llm.default.model_name=qwen-max-2025-01-25 \
  embedding_model.default.model_name=text-embedding-v4 \
  vector_store.default.backend=local
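Before running experiments, it can be useful to confirm that something is listening on the port passed as http.port. The sketch below only attempts a TCP connection; it does not call any ReMe endpoint, and `service_is_listening` is a hypothetical helper.

```python
import socket


def service_is_listening(host="localhost", port=8002, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical pre-flight check: it proves a server accepts connections on the
    port, not that the server is a healthy ReMe instance.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False


if __name__ == "__main__":
    # Port 8002 matches the http.port value passed to `reme` above.
    print("ReMe service reachable:", service_is_listening())
```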

In the terminal where you will run the experiments, set the API key and base URL the agent uses for its LLM calls:

export OPENAI_API_KEY="xxx"
export OPENAI_BASE_URL="xxx"
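A small pre-flight check can catch unset variables or leftover placeholders before a long run starts. This sketch assumes the agent's OpenAI client reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment; `check_openai_env` is a hypothetical helper.

```python
import os


def check_openai_env(env=None):
    """Return a list of problems with the OpenAI-related environment variables.

    Hypothetical helper; pass a dict to test it, or nothing to inspect os.environ.
    """
    env = os.environ if env is None else env
    problems = []
    for var in ("OPENAI_API_KEY", "OPENAI_BASE_URL"):
        value = env.get(var, "")
        if not value:
            problems.append(f"{var} is not set")
        elif value == "xxx":  # the placeholder used in this guide
            problems.append(f"{var} still has the placeholder value")
        elif var == "OPENAI_BASE_URL" and not value.startswith(("http://", "https://")):
            problems.append(f"{var} should start with http:// or https://")
    return problems


if __name__ == "__main__":
    for problem in check_openai_env():
        print("Warning:", problem)
```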

Run Experiments

1. Quick Test: Performance Evaluation Only (Default)

Run the main experiment script to test agent performance using existing memory:

# From the project root (skip this cd if you are already in cookbook/frozenlake)
cd cookbook/frozenlake
python run_frozenlake.py

What this does:
- Tests the agent on randomly generated FrozenLake maps
- Uses the default memory library (frozenlake_no_slippery)
- Evaluates performance over many test maps (num_test_maps), with test_runs runs per map, rather than on a single episode
- Saves results automatically to the ./exp_result/ directory

2. Advanced: Training + Testing (Memory Generation)

To generate new memories through a training phase and then test performance, adjust the experiment parameters directly in the main() function of run_frozenlake.py:

def main():
    experiment_name = "frozenlake_no_slippery"  # Name of the experiment
    max_workers = 4                           # Number of parallel workers
    training_runs = 4                         # Runs per training map
    num_training_maps = 50                    # Number of maps for training
    test_runs = 1                             # Runs per test configuration
    num_test_maps = 100                       # Number of test maps
    is_slippery = False                       # Set True for slippery (stochastic) movement
    # ... (rest of main() omitted)

Key parameters to consider:
- experiment_name: used as the workspace ID for task memory
- is_slippery: when True, agent movement becomes stochastic, which makes the task harder
- max_workers: increase for faster execution on multi-core systems
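The parameters above determine how many episodes (and therefore how many LLM-driven rollouts) an experiment costs. The sketch below does that arithmetic for the defaults shown in main(). It assumes each test map is evaluated once per test condition (without and with memory, matching the two *_test_*.jsonl output files); that pairing is an assumption, and `ExperimentConfig`/`episode_budget` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ExperimentConfig:
    # Defaults copied from main() in run_frozenlake.py as shown above.
    training_runs: int = 4
    num_training_maps: int = 50
    test_runs: int = 1
    num_test_maps: int = 100


def episode_budget(cfg, test_conditions=2):
    """Estimate episode counts per phase when training is enabled.

    Assumption: every test map is run `test_runs` times in each of
    `test_conditions` settings (no memory / with memory).
    """
    training = cfg.training_runs * cfg.num_training_maps
    testing = cfg.test_runs * cfg.num_test_maps * test_conditions
    return {"training": training, "testing": testing, "total": training + testing}


print(episode_budget(ExperimentConfig()))
# {'training': 200, 'testing': 200, 'total': 400}
```

Raising training_runs or num_training_maps grows only the training cost, while test cost scales with num_test_maps times test_runs.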

3. View Experiment Results

After running experiments, analyze the statistical results:

python run_exp_statistic.py

What this script does:
- Processes all result files in ./exp_result/
- Calculates success rates and other performance metrics
- Generates a summary table comparing performance across runs
- Analyzes the effect of task memory on performance
- Saves the summary to ./exp_result/frozenlake_summary.csv
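For ad-hoc inspection of the JSONL results, a per-file success rate can be computed in a few lines. This is a sketch only: it assumes each line is a JSON object with a boolean field named "success", which is a hypothetical field name, so check the actual records and prefer run_exp_statistic.py for real analysis. It writes to a separate summary_sketch.csv so it never overwrites the official summary.

```python
import csv
import json
from pathlib import Path


def success_rate(jsonl_path, field="success"):
    """Fraction of records whose `field` is truthy ("success" is a hypothetical name)."""
    total = wins = 0
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            total += 1
            wins += bool(record.get(field, False))
    return wins / total if total else 0.0


def summarize(result_dir="./exp_result", out_name="summary_sketch.csv"):
    """Compute one success rate per *.jsonl file and write them to a CSV."""
    result_dir = Path(result_dir)
    rows = [(p.stem, success_rate(p)) for p in sorted(result_dir.glob("*.jsonl"))]
    with open(result_dir / out_name, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "success_rate"])
        writer.writerows(rows)
    return rows
```

Comparing the *_test_no_memory and *_test_with_memory rows gives a first impression of the memory effect.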

Understanding the Implementation

Key Components

- FrozenLakeReactAgent (frozenlake_react_agent.py)
  - Implements a ReAct agent that interacts with the FrozenLake environment
  - Handles task memory retrieval and storage
  - Uses an LLM (via the OpenAI API) for decision making
- Experiment Runner (run_frozenlake.py)
  - Manages the overall experiment flow
  - Handles training and testing phases
  - Uses Ray for parallel execution
- Map Manager (map_manager.py)
  - Generates and manages test maps
  - Ensures consistent evaluation across experiments
- Statistics Analyzer (run_exp_statistic.py)
  - Processes experiment results
  - Calculates performance metrics
  - Generates comparative analysis
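One common way to get maps that are random yet identical across experiments, as the Map Manager is described as ensuring, is to derive each map from a fixed seed and reject maps with no path from start to goal. The sketch below illustrates that idea; it is not the actual map_manager.py logic, and `generate_map`, `generate_test_maps`, `has_path`, `p_frozen`, and `base_seed` are hypothetical names.

```python
import random
from collections import deque


def has_path(rows):
    """Breadth-first search from S (top-left) to G, never stepping onto holes (H)."""
    n_rows, n_cols = len(rows), len(rows[0])
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        if rows[r][c] == "G":
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols and (nr, nc) not in seen and rows[nr][nc] != "H":
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def generate_map(size=4, p_frozen=0.8, seed=0):
    """Return a solvable size x size map as a list of strings; the same seed gives the same map."""
    rng = random.Random(seed)  # private RNG so other code cannot shift the sequence
    while True:
        grid = [["F" if rng.random() < p_frozen else "H" for _ in range(size)] for _ in range(size)]
        grid[0][0], grid[-1][-1] = "S", "G"
        rows = ["".join(row) for row in grid]
        if has_path(rows):  # reject unsolvable layouts
            return rows


def generate_test_maps(num_maps, size=4, base_seed=1234):
    """One seed per map index, so every experiment sees the identical list of maps."""
    return [generate_map(size, seed=base_seed + i) for i in range(num_maps)]
```

A list of row strings in this form can be passed to Gymnasium as `gym.make("FrozenLake-v1", desc=rows, is_slippery=False)`.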

Output Files

./exp_result/*_training.jsonl: Results from training phase
./exp_result/*_test_no_memory.jsonl: Test results without task memory
./exp_result/*_test_with_memory.jsonl: Test results with task memory
./exp_result/frozenlake_summary.csv: Statistical summary

Task Memory Mechanism

The task memory system works as follows:

Memory Creation: During training, successful trajectories are sent to the ReMe service
Memory Retrieval: During testing, the agent queries relevant memories based on the current map
Memory Application: The agent uses retrieved memories to guide its decision-making
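The three steps above can be illustrated with a toy, in-process memory store. This is only a conceptual sketch and not the ReMe API: the real service is configured with an LLM, an embedding model, and a vector store, and presumably retrieves by embedding similarity, whereas here a crude tile-overlap score stands in for retrieval. `ToyTaskMemory` and `build_prompt` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTaskMemory:
    """In-process stand-in for the ReMe service (hypothetical; NOT the ReMe API)."""
    entries: list = field(default_factory=list)  # (map_rows, lesson) pairs

    def add_trajectory(self, map_rows, actions, success):
        # Memory creation: keep only successful trajectories.
        if success:
            lesson = f"On a similar map, the action sequence {' '.join(actions)} reached the goal."
            self.entries.append((map_rows, lesson))

    @staticmethod
    def similarity(a, b):
        # Fraction of identical tiles; assumes both maps have the same size.
        tiles_a, tiles_b = "".join(a), "".join(b)
        return sum(x == y for x, y in zip(tiles_a, tiles_b)) / len(tiles_a)

    def retrieve(self, map_rows, top_k=2):
        # Memory retrieval: rank stored lessons by similarity to the current map.
        ranked = sorted(self.entries, key=lambda e: self.similarity(e[0], map_rows), reverse=True)
        return [lesson for _, lesson in ranked[:top_k]]


def build_prompt(map_rows, memories):
    # Memory application: prepend retrieved lessons to the agent's task prompt.
    memory_text = "\n".join(f"- {m}" for m in memories) or "- (none)"
    return "Relevant experience:\n" + memory_text + "\n\nCurrent map:\n" + "\n".join(map_rows)
```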

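The experiment is designed to show how task memory can improve performance, with the benefit expected to be most visible in challenging settings such as slippery FrozenLake. Compare the with-memory and no-memory results in the summary to measure the effect in your own runs.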
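Why slippery mode is harder: Gymnasium documents that with is_slippery=True the agent moves in the intended direction with probability 1/3 and in each perpendicular direction with probability 1/3. The sketch below re-implements just that rule (it does not use Gymnasium) and estimates how often a fixed, hole-free plan on Gymnasium's default 4x4 layout reaches the goal. `step`, `plan_success_rate`, and `PLAN` are hypothetical names.

```python
import random

LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3  # Gymnasium FrozenLake action encoding
MOVES = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]  # Gymnasium's default 4x4 layout


def step(pos, action, slippery, rng):
    """Apply one action; if slippery, intended or either perpendicular move w.p. 1/3 each."""
    if slippery:
        action = rng.choice([(action - 1) % 4, action, (action + 1) % 4])
    dr, dc = MOVES[action]
    r = min(max(pos[0] + dr, 0), len(MAP_4X4) - 1)  # bumping into a wall leaves you in place
    c = min(max(pos[1] + dc, 0), len(MAP_4X4[0]) - 1)
    return (r, c)


def plan_success_rate(plan, slippery, episodes=2000, seed=0):
    """Monte Carlo estimate of how often a fixed open-loop plan reaches G."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(episodes):
        pos = (0, 0)
        for action in plan:
            pos = step(pos, action, slippery, rng)
            tile = MAP_4X4[pos[0]][pos[1]]
            if tile in "HG":  # the episode ends on a hole or on the goal
                wins += tile == "G"
                break
    return wins / episodes


PLAN = [DOWN, DOWN, RIGHT, DOWN, RIGHT, RIGHT]  # a hole-free route on MAP_4X4
print("deterministic:", plan_success_rate(PLAN, slippery=False))  # 1.0
print("slippery:     ", plan_success_rate(PLAN, slippery=True))
```

A plan that always works on the deterministic map succeeds only rarely once movement is stochastic, so the agent has to react to where it actually lands; that is the setting where reusable experience from task memory is expected to matter most.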