November Intelligent Vehicle AI Challenge: Intelligent Driving Vehicle Virtual Simulation Video Data Understanding
  3XDZIv8qh70z  2023-11-19

Understanding the competition problem:

Competition task:

  • Input: virtual front-view camera video clips (roughly 8-10 seconds each) generated by a metaverse simulation platform;
  • Output: a comprehensive understanding of the information in each video, written to a JSON file in the specified format, filling in descriptive text values for the keys defined in the data description (values may be in Chinese or English);

The competition only provides a test set, so we either predict with a pretrained model or train on external data first and then predict.

To solve the task, we first extract frames from each video and then match the resulting images against candidate texts.

Import the required third-party libraries
import paddle
from PIL import Image
from clip import tokenize, load_model
import glob, json, os
import cv2
from tqdm import tqdm_notebook
import numpy as np
from sklearn.preprocessing import normalize
import matplotlib.pyplot as plt
Load the pretrained model, the candidate matching texts, and the submission format
model, transforms = load_model('ViT_B_32', pretrained=True)

# candidate labels for each key in the required submission format
en_match_words = {
"scerario" : ["suburbs","city street","expressway","tunnel","parking-lot","gas or charging stations","unknown"],
"weather" : ["clear","cloudy","raining","foggy","snowy","unknown"],
"period" : ["daytime","dawn or dusk","night","unknown"],
"road_structure" : ["normal","crossroads","T-junction","ramp","lane merging","parking lot entrance","round about","unknown"],
"general_obstacle" : ["nothing","speed bumper","traffic cone","water horse","stone","manhole cover","nothing","unknown"],
"abnormal_condition" : ["uneven","oil or water stain","standing water","cracked","nothing","unknown"],
"ego_car_behavior" : ["slow down","go straight","turn right","turn left","stop","U-turn","speed up","lane change","others"],
"closest_participants_type" : ["passenger car","bus","truck","pedestrain","policeman","nothing","others","unknown"],
"closest_participants_behavior" : ["slow down","go straight","turn right","turn left","stop","U-turn","speed up","lane change","others"],
}

submit_json = {
    "author" : "abc" ,
    "time" : "231011",
    "model" : "model_name",
    "test_results" : []
}
Collect the video paths and loop over them; inside the loop, extract a frame with OpenCV and convert it into the model's expected input (the next few code blocks all run inside this per-video loop)
paths = glob.glob('./初赛测试视频/*')
paths.sort()

for video_path in paths:
    print(video_path)
    clip_id = video_path.split('/')[-1]
    # read the first frame of the clip and convert it to the model's input format
    cap = cv2.VideoCapture(video_path)
    img = cap.read()[1]
    image = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    image = Image.fromarray(image)
    image = transforms(image).unsqueeze(0)

    # default answers for this clip; only some keys are overwritten by the model below
    single_video_result = {
        "clip_id": clip_id,
        "scerario" : "cityroad",
        "weather":"unknown",
        "period":"night",
        "road_structure":"ramp",
        "general_obstacle":"nothing",
        "abnormal_condition":"nothing",
        "ego_car_behavior":"turning right",
        "closest_participants_type":"passenger car",
        "closest_participants_behavior":"braking"
    }
Still inside the loop, classify with the pretrained model and store the result in the submission JSON
    for keyword in en_match_words.keys():
        # the baseline only predicts these two keys; the rest keep their default values
        if keyword not in ["weather", "road_structure"]:
            continue

        texts = np.array(en_match_words[keyword])

        with paddle.no_grad():
            logits_per_image, logits_per_text = model(image, tokenize(en_match_words[keyword]))
            probs = paddle.nn.functional.softmax(logits_per_image, axis=-1)

        probs = probs.numpy()
        # pick the candidate text with the highest probability
        single_video_result[keyword] = texts[probs[0].argsort()[::-1][0]]

    submit_json["test_results"].append(single_video_result)
with open('clip_result.json', 'w', encoding='utf-8') as up:
    json.dump(submit_json, up, ensure_ascii=False)

The model used in the baseline is based on a pretraining method that contrasts text-image pairs. CLIP uses text as the supervision signal to train a transferable vision model; the resulting model's zero-shot performance is comparable to ResNet50 and it generalizes very well, so the pretrained model achieves strong results without any further training or fine-tuning.
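For intuition, the following is a minimal sketch (not part of the baseline code) of what the logits_per_image used above represents: the image and every candidate text are embedded into the same vector space, and the logits are scaled cosine similarities between the image embedding and each text embedding. The logit_scale of 100 and the function name are assumptions made for this sketch only.

import paddle

def zero_shot_scores(image_features, text_features, logit_scale=100.0):
    # L2-normalise both embeddings so their dot product equals cosine similarity
    image_features = image_features / paddle.linalg.norm(image_features, p=2, axis=-1, keepdim=True)
    text_features = text_features / paddle.linalg.norm(text_features, p=2, axis=-1, keepdim=True)
    # one row per image, one column per candidate text
    logits_per_image = logit_scale * (image_features @ paddle.transpose(text_features, perm=[1, 0]))
    # softmax over the candidate texts gives the per-label probabilities used in the baseline
    return paddle.nn.functional.softmax(logits_per_image, axis=-1)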

Running the baseline as-is gives a score of 93.00; I have not yet tried to improve on it.

As for improving the score, after watching the live-streamed lesson the following approaches stand out:

1. The candidate labels can be adjusted: choose labels that are well suited to being recognized from a single image;

2. Different versions of CLIP will have a slight effect on the results;

3. Consider extracting multiple frames, classifying each one, and voting for the final label (see the first sketch after this list);

4. The prompt text given to CLIP affects the results (see the second sketch after this list);

5. Think about how to get CLIP to answer "unknown" (also covered in the second sketch).
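For idea 3, here is a rough sketch of multi-frame voting. It reuses the model, transforms, tokenize and en_match_words objects defined above; sampling 5 evenly spaced frames is an assumption to tune, not a value given by the task.

from collections import Counter

def predict_keyword_by_voting(video_path, keyword, num_frames=5):
    # sample several frames evenly across the clip, classify each, then take a majority vote
    texts = en_match_words[keyword]
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    votes = []
    for idx in np.linspace(0, max(total - 1, 0), num_frames, dtype=int):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ret, frame = cap.read()
        if not ret:
            continue
        frame = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        frame = transforms(frame).unsqueeze(0)
        with paddle.no_grad():
            logits_per_image, _ = model(frame, tokenize(texts))
            probs = paddle.nn.functional.softmax(logits_per_image, axis=-1).numpy()
        votes.append(texts[int(probs[0].argmax())])
    cap.release()
    # majority vote across the sampled frames; fall back to "unknown" if nothing was read
    return Counter(votes).most_common(1)[0][0] if votes else "unknown"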
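For ideas 4 and 5, one simple option is to wrap each candidate label in a prompt sentence before tokenizing, and to fall back to "unknown" when no candidate gets a confident score. Both the prompt wording and the 0.5 threshold below are assumptions that would need tuning on real clips.

def classify_with_prompt(image_tensor, keyword, threshold=0.5):
    labels = en_match_words[keyword]
    # wrap each raw label in a prompt sentence; the wording is an assumption to experiment with
    prompts = ["a photo of a driving scene, " + label for label in labels]
    with paddle.no_grad():
        logits_per_image, _ = model(image_tensor, tokenize(prompts))
        probs = paddle.nn.functional.softmax(logits_per_image, axis=-1).numpy()[0]
    best = int(probs.argmax())
    # if no candidate is clearly preferred, answer "unknown" instead of guessing
    return labels[best] if probs[best] >= threshold else "unknown"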

