Is anyone actually surprised by this?

You are viewing a single thread.
View all comments View context
11 points

The runner is open source, and that’s what matter in this discussion. If you host the model on your own servers, you can ensure that no corporation (american or Chinese) has access to your data. Access to the training code and data is irrelevant here.

permalink
report
parent
reply

The guys at HF (and many others) appear to have a different understanding of Open Source.

As the Open Source AI definition says, among others:

Data Information: Sufficiently detailed information about the data used to train the system so that a skilled person can build a substantially equivalent system. Data Information shall be made available under OSI-approved terms.

  • In particular, this must include: (1) the complete description of all data used for training, including (if used) of unshareable data, disclosing the provenance of the data, its scope and characteristics, how the data was obtained and selected, the labeling procedures, and data processing and filtering methodologies; (2) a listing of all publicly available training data and where to obtain it; and (3) a listing of all training data obtainable from third parties and where to obtain it, including for fee.

Code: The complete source code used to train and run the system. The Code shall represent the full specification of how the data was processed and filtered, and how the training was done. Code shall be made available under OSI-approved licenses.

  • For example, if used, this must include code used for processing and filtering data, code used for training including arguments and settings used, validation and testing, supporting libraries like tokenizers and hyperparameters search code, inference code, and model architecture.

Parameters: The model parameters, such as weights or other configuration settings. Parameters shall be made available under OSI-approved terms.

  • The licensing or other terms applied to these elements and to any combination thereof may contain conditions that require any modified version to be released under the same terms as the original.

These three components -data, code, parameter- shall be released under the same condition.

permalink
report
parent
reply

Privacy

!privacy@lemmy.ml

Create post

A place to discuss privacy and freedom in the digital world.

Privacy has become a very important issue in modern society, with companies and governments constantly abusing their power, more and more people are waking up to the importance of digital privacy.

In this community everyone is welcome to post links and discuss topics related to privacy.

Some Rules

  • Posting a link to a website containing tracking isn’t great, if contents of the website are behind a paywall maybe copy them into the post
  • Don’t promote proprietary software
  • Try to keep things on topic
  • If you have a question, please try searching for previous discussions, maybe it has already been answered
  • Reposts are fine, but should have at least a couple of weeks in between so that the post can reach a new audience
  • Be nice :)

Related communities

much thanks to @gary_host_laptop for the logo design :)

Community stats

  • 7K

    Monthly active users

  • 3.2K

    Posts

  • 86K

    Comments