My quick lazy manual transcription:
What data was used to train Sora?
We used publicly available data and licensed data.
So, videos on YouTube?
I’m actually not sure about that.
OK, videos from Facebook? Instagram?
You know if they were publicly available, um yeah, publicly available to use there might be the data but I’m not sure. I’m not confident about it.
What about Shutterstock? I know you guys have a deal with them.
I’m just not gonna go into the details of the data that was used but it was publicly available or licensed data.
EDIT: Please help, can’t figure out how to preserve line breaks. Edit: Improved it a bit.
Two spaces at the end of each line.
---
Yada yada verse
Yada yada verse
Yada yada verse
Yada yada chorus
Yada yada chorus
Yada yada chorus
Lemmy’s markup language is based on the CommonMark spec.
A line ending (not in a code span or HTML tag) that is preceded by two or more spaces and does not occur at the end of a block is parsed as a hard line break.
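If you'd rather not type the trailing spaces by hand, here's a rough sketch of a helper that adds them for you. The function name `add_hard_breaks` is made up for this example, and it assumes plain paragraphs only: it doesn't know about code blocks, lists, or HTML, so don't run it over text that contains those.

```python
def add_hard_breaks(text: str) -> str:
    """Append two trailing spaces to each line that is followed by another
    non-blank line, so a CommonMark renderer treats the newline as a hard break.

    Assumption: the input is plain paragraphs separated by blank lines.
    """
    lines = text.split("\n")
    out = []
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if line.strip() and next_line.strip():
            # Not the last line of its block: mark it with two spaces.
            out.append(line.rstrip() + "  ")
        else:
            # Blank line or end of a block: no marker needed, since the
            # rule quoted above does not apply at the end of a block.
            out.append(line.rstrip())
    return "\n".join(out)


lyrics = (
    "Yada yada verse\n"
    "Yada yada verse\n"
    "\n"
    "Yada yada chorus\n"
    "Yada yada chorus"
)
print(add_hard_breaks(lyrics))
```

The first line of each pair comes out with two trailing spaces; the last line of each block is left alone.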
They copied what Reddit uses. As for why Reddit does it that way - I have no idea.