Downloading the datacomp_large dataset: a snapshot_download usage example
2023-12-22 14:14:58
Dataset page: https://huggingface.co/datasets/mlfoundations/datacomp_large
import os

from huggingface_hub import snapshot_download


def download_parquet_files(repo_id, output_dir):
    """
    Download .parquet files from a Hugging Face dataset repository
    using snapshot_download.

    Args:
        repo_id (str): The ID of the Hugging Face dataset repository.
        output_dir (str): Directory where the .parquet files will be saved.
    """
    os.makedirs(output_dir, exist_ok=True)
    cache_dir = os.path.join(output_dir, "cache")
    hf_snapshot_args = dict(
        repo_id=repo_id,
        allow_patterns="*.parquet",    # fetch only the parquet metadata files
        local_dir=output_dir,
        cache_dir=cache_dir,
        local_dir_use_symlinks=False,  # materialize real files instead of symlinks into the cache
        repo_type="dataset",           # required: this is a dataset repo, not a model repo
        resume_download=True,          # deprecated in recent huggingface_hub releases; resuming is now the default
        max_workers=16,                # parallel download threads
    )
    snapshot_download(**hf_snapshot_args)


if __name__ == "__main__":
    REPO_ID = "mlfoundations/datacomp_large"  # Replace with your dataset repo ID
    OUTPUT_DIR = "/data/xiedong/datasets_meizu/datacomp_all/large/metadata"  # Replace with your desired output directory
    download_parquet_files(REPO_ID, OUTPUT_DIR)
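The key parameter here is allow_patterns, which takes shell-style glob patterns and restricts the snapshot to matching repo files. A rough illustration of the glob semantics using Python's standard fnmatch module (the repo file list below is hypothetical, and this is only a sketch of the matching behavior, not the Hub client's actual implementation):

```python
from fnmatch import fnmatch


def matches_patterns(path, patterns):
    """Return True if path matches any of the given glob patterns."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatch(path, pattern) for pattern in patterns)


# Hypothetical file listing of a dataset repo.
repo_files = [
    "README.md",
    ".gitattributes",
    "metadata/0000.parquet",
    "metadata/0001.parquet",
]

# "*.parquet" keeps only the parquet files; note that fnmatch's "*"
# also matches "/", so files in subdirectories are included.
selected = [f for f in repo_files if matches_patterns(f, "*.parquet")]
print(selected)  # ['metadata/0000.parquet', 'metadata/0001.parquet']
```

Because the pattern filters out the large image shards, only the parquet metadata is transferred, which is usually what you want when inspecting DataComp before committing to a full download.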
Source: https://blog.csdn.net/x1131230123/article/details/135150482