Sampler

Sampler is a component in Twinkle for generating model outputs, primarily used for sample generation in RLHF training. The sampler supports multiple inference engines, including vLLM and native PyTorch.

Basic Interface

class Sampler(ABC):

    @abstractmethod
    def sample(
        self,
        inputs: Union[InputFeature, List[InputFeature], Trajectory, List[Trajectory]],
        sampling_params: Optional[SamplingParams] = None,
        adapter_name: str = '',
        *,
        num_samples: int = 1,
    ) -> List[SampleResponse]:
        """Sample from given inputs"""
        ...

    def add_adapter_to_model(self, adapter_name: str, config_or_dir, **kwargs):
        """Add LoRA adapter"""
        ...

    def set_template(self, template_cls: Union[Template, Type[Template], str], **kwargs):
        """Set template"""
        ...

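The core method of the sampler is sample. It accepts a single InputFeature or Trajectory, or a list of either, and returns a list of SampleResponse objects. The num_samples argument is keyword-only and sets how many samples are generated per prompt.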
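
The calling convention of sample can be illustrated with a toy subclass. This is a minimal sketch, not Twinkle code: the Sampler base class is re-declared locally with the same signature, plain strings stand in for InputFeature and Trajectory, and the SampleResponse dataclass with its prompt and sequences fields is hypothetical. The real SampleResponse layout may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SampleResponse:
    """Hypothetical stand-in: one response per prompt, holding all its samples."""
    prompt: str
    sequences: List[str] = field(default_factory=list)


class Sampler(ABC):
    """Local re-declaration of the interface, reduced to the sample method."""

    @abstractmethod
    def sample(self, inputs, sampling_params=None, adapter_name='', *,
               num_samples=1) -> List[SampleResponse]:
        ...


class EchoSampler(Sampler):
    """Toy sampler that 'generates' by echoing the prompt with an index."""

    def sample(self, inputs: Union[str, List[str]], sampling_params=None,
               adapter_name='', *, num_samples=1) -> List[SampleResponse]:
        # Accept a single input or a list of inputs, as the interface does.
        prompts = [inputs] if isinstance(inputs, str) else list(inputs)
        return [
            SampleResponse(prompt=p,
                           sequences=[f'{p} #{i}' for i in range(num_samples)])
            for p in prompts
        ]


# num_samples sits after the bare `*`, so it must be passed by keyword.
responses = EchoSampler().sample(['hello', 'world'], num_samples=2)
for r in responses:
    print(r.prompt, r.sequences)
```

Passing num_samples positionally, as in sample(inputs, None, '', 2), raises a TypeError because of the bare * in the signature.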

Available Samplers

Twinkle provides two sampler implementations:

vLLMSampler

vLLMSampler uses the vLLM engine for efficient inference, supporting high-throughput batch sampling.

High Performance: Uses PagedAttention and continuous batching
LoRA Support: Supports dynamic loading and switching of LoRA adapters
Multi-Sample Generation: Can generate multiple samples per prompt
Tensor Parallel: Supports tensor parallelism to accelerate large model inference

TorchSampler

TorchSampler uses native PyTorch and transformers for inference, suitable for small-scale sampling or debugging.

Easy to Use: Based on transformers’ standard interface
High Flexibility: Easy to customize and extend
Low Memory Footprint: Suitable for small-scale sampling

How to Choose

vLLMSampler: Suitable for production environments and large-scale training that require high throughput
TorchSampler: Suitable for debugging, small-scale experiments, or custom requirements

In RLHF training, the sampler typically runs separately from the Actor model, on its own hardware resources, so that inference and training do not interfere with each other.

vLLMSampler

vLLMSampler uses the vLLM engine for efficient inference, supporting high-throughput batch sampling.

Usage Example

from twinkle.sampler import vLLMSampler
from twinkle.data_format import SamplingParams
from twinkle import DeviceMesh

# Create sampler
sampler = vLLMSampler(
    model_id='ms://Qwen/Qwen3.5-4B',
    device_mesh=DeviceMesh.from_sizes(dp_size=2, tp_size=2),
    remote_group='sampler_group',
)

# Add LoRA
sampler.add_adapter_to_model('my_lora', 'path/to/lora')

# Set sampling parameters
params = SamplingParams(
    max_tokens=512,
    temperature=0.7,
    top_p=0.9,
    top_k=50,
)

# Perform sampling
# `trajectories` is assumed to be a list of Trajectory objects prepared by the caller
responses = sampler.sample(
    trajectories,
    sampling_params=params,
    adapter_name='my_lora',
    num_samples=4,  # Generate 4 samples per prompt
)

Features

High Performance: Achieves high throughput using PagedAttention and continuous batching
LoRA Support: Supports dynamic loading and switching of LoRA adapters
Multi-Sample Generation: Can generate multiple samples per prompt
Tensor Parallel: Supports tensor parallelism to accelerate large model inference

Remote Execution

vLLMSampler supports the @remote_class decorator and can run in Ray clusters:

import twinkle
from twinkle import DeviceGroup, DeviceMesh
from twinkle.sampler import vLLMSampler

# Initialize Ray cluster
device_groups = [
    DeviceGroup(name='sampler', ranks=4, device_type='cuda'),
]
twinkle.initialize('ray', groups=device_groups)

# Create remote sampler
sampler = vLLMSampler(
    model_id='ms://Qwen/Qwen3.5-4B',
    device_mesh=DeviceMesh.from_sizes(dp_size=4),
    remote_group='sampler',
)

# The sample method executes in the remote worker
# `trajectories` and `params` are assumed to be defined as in the usage example
responses = sampler.sample(trajectories, sampling_params=params)
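
The remote example pairs a device group of 4 ranks with dp_size=4. Under the common convention that a mesh occupies dp_size × tp_size devices, the two sizes can be sanity-checked before launching. The sketch below assumes that convention holds for Twinkle's DeviceMesh, which the page does not state; the helper names are hypothetical and not part of Twinkle.

```python
def mesh_world_size(dp_size: int = 1, tp_size: int = 1) -> int:
    """Number of devices a data-parallel x tensor-parallel mesh occupies."""
    if dp_size < 1 or tp_size < 1:
        raise ValueError('dp_size and tp_size must be positive integers')
    return dp_size * tp_size


def check_mesh_fits_group(group_ranks: int, dp_size: int = 1, tp_size: int = 1) -> None:
    """Raise if the mesh needs a different device count than the group provides."""
    needed = mesh_world_size(dp_size, tp_size)
    if needed != group_ranks:
        raise ValueError(
            f'mesh needs {needed} devices (dp={dp_size} x tp={tp_size}), '
            f'but the device group has {group_ranks} ranks'
        )


# Matches the remote example: 4 ranks, dp_size=4.
check_mesh_fits_group(group_ranks=4, dp_size=4)
# A dp_size=2, tp_size=2 mesh also occupies 4 devices.
check_mesh_fits_group(group_ranks=4, dp_size=2, tp_size=2)
```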

Environment Variables

TWINKLE_VLLM_IPC_TIMEOUT_S: Controls the timeout (in seconds) for the IPC channel (ZMQ REQ/REP) between vLLMSampler and the vLLM worker extension. Default is 300. This value must be greater than 0.
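
The documented rules (default of 300, value must be greater than 0) can be sketched as a small parser. This is an illustration only: the function name read_ipc_timeout is hypothetical, and whether Twinkle parses the value as an integer or a float, or how it reports invalid values, is an assumption.

```python
import math
import os
from typing import Mapping, Optional

ENV_NAME = 'TWINKLE_VLLM_IPC_TIMEOUT_S'
DEFAULT_TIMEOUT_S = 300.0  # documented default


def read_ipc_timeout(environ: Optional[Mapping[str, str]] = None) -> float:
    """Return the IPC timeout in seconds, falling back to the default when unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_NAME)
    if raw is None or raw.strip() == '':
        return DEFAULT_TIMEOUT_S
    try:
        timeout = float(raw)
    except ValueError:
        raise ValueError(f'{ENV_NAME} must be a number, got {raw!r}') from None
    # Reject zero, negatives, and NaN: the value must be greater than 0.
    if math.isnan(timeout) or timeout <= 0:
        raise ValueError(f'{ENV_NAME} must be greater than 0, got {raw!r}')
    return timeout


print(read_ipc_timeout({}))                 # unset: default applies
print(read_ipc_timeout({ENV_NAME: '600'}))  # explicit override
```

To raise the timeout for a real run, set the variable in the environment of the process that creates the sampler, for example with os.environ['TWINKLE_VLLM_IPC_TIMEOUT_S'] = '600' before constructing vLLMSampler, or by exporting it in the shell.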

In RLHF training, vLLMSampler typically runs separately from the Actor model, on its own hardware resources, so that inference and training do not interfere with each other.

TorchSampler

TorchSampler uses native PyTorch and transformers for inference, suitable for small-scale sampling or debugging.

Usage Example

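from twinkle.sampler import TorchSampler
from twinkle.data_format import SamplingParams
from twinkle import DeviceMesh

# Create a single-device sampler
sampler = TorchSampler(
    model_id='ms://Qwen/Qwen3.5-4B',
    device_mesh=DeviceMesh.from_sizes(dp_size=1),
)

# Set sampling parameters
params = SamplingParams(max_tokens=512, temperature=0.7)

# `trajectories` is assumed to be a list of Trajectory objects prepared by the caller
responses = sampler.sample(trajectories, sampling_params=params)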

Features

Easy to Use: Based on transformers’ standard interface
High Flexibility: Easy to customize and extend
Low Memory Footprint: Suitable for small-scale sampling

Use Cases

TorchSampler is particularly suitable for:

Debugging and Development: Simple and straightforward, easy to debug
Small-Scale Experiments: Scenarios that don’t require high throughput
Custom Requirements: Scenarios that need to modify sampling logic
Resource-Constrained: Environments with limited memory or GPU resources

For production environments or large-scale training, vLLMSampler is recommended for better performance.