# download_file_mapper

Mapper to download files from URLs to local files or load them into memory.

This operator downloads files from URLs and can either save them to a specified directory or load their contents directly into memory. It supports downloading multiple files concurrently and can resume downloads if the `resume_download` flag is set. The operator accepts nested lists of URLs: it flattens them for batch processing and then reconstructs the original structure in the output. If neither `save_dir` nor `save_field` is specified, the content is saved under the key `image_bytes` by default. The operator logs every failed download attempt along with an error message for troubleshooting.

Type: **mapper**

Tags: cpu

## 🔧 Parameter Configuration

| name | type | default | desc |
|--------|------|--------|------|
| `download_field` | | `None` | The field name that holds the URL to download. |
| `save_dir` | | `None` | The directory to save downloaded files. |
| `save_field` | | `None` | The field name under which to save the downloaded file content. |
| `resume_download` | | `False` | Whether to resume download. If True, skip the sample if it already exists. |
| `timeout` | | `30` | Timeout for download. |
| `max_concurrent` | | `10` | Maximum number of concurrent downloads. |
| `args` | | `''` | Extra positional arguments. |
| `kwargs` | | `''` | Extra keyword arguments. |

## 📊 Effect demonstration

Not available.

## 🔗 Related links

- [Source code](../../../data_juicer/ops/mapper/download_file_mapper.py)
- [Unit test](../../../tests/ops/mapper/test_download_file_mapper.py)
- [Return to operator list](../../Operators.md)
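
## 🧩 Illustrative sketch: flattening nested URL lists

The description states that nested lists of URLs are flattened for batch processing and that the original structure is rebuilt in the output. The sketch below shows one way this could work. It is not taken from the operator's source code: the helpers `flatten`, `rebuild`, `fake_download`, and `download_nested` are hypothetical names, the thread pool is an assumed concurrency mechanism, and `fake_download` stands in for a real HTTP request. Only the idea of a `max_concurrent` cap comes from the parameter table.

```python
from concurrent.futures import ThreadPoolExecutor


def flatten(nested):
    """Return (flat_items, skeleton); the skeleton mirrors the nesting
    of the input but holds indices into flat_items instead of values."""
    flat = []

    def walk(node):
        if isinstance(node, list):
            return [walk(child) for child in node]
        flat.append(node)
        return len(flat) - 1  # position of this leaf in the flat list

    skeleton = walk(nested)
    return flat, skeleton


def rebuild(skeleton, results):
    """Put per-item results back into the original nested shape."""
    if isinstance(skeleton, list):
        return [rebuild(child, results) for child in skeleton]
    return results[skeleton]


def fake_download(url):
    # Hypothetical stand-in for a real download; no network access.
    return f"content of {url}".encode("utf-8")


def download_nested(nested, max_concurrent=10):
    """Flatten, process the flat batch with bounded concurrency, rebuild."""
    flat, skeleton = flatten(nested)
    # Assumption: a thread pool caps concurrency; the real operator
    # may use a different mechanism.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(fake_download, flat))  # order is preserved
    return rebuild(skeleton, results)


# Hypothetical URLs: one sample with a single URL, one with two.
nested_urls = [
    "https://example.com/a.jpg",
    ["https://example.com/b.jpg", "https://example.com/c.jpg"],
]
print(download_nested(nested_urls, max_concurrent=2))
```

Storing indices in the skeleton keeps the download step unaware of the nesting: it only sees a flat batch, which makes a concurrency limit straightforward to apply.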