Mozilla Firefox new alt-text generator powered by "fully private on-device AI model" (hacks.mozilla.org)

posted 5 months ago

frogman [he/him]@beehaw.org

technology@beehaw.org

70 comments

New accessibility feature coming to Firefox, an “AI powered” alt-text generator.

"Starting in Firefox 130, we will automatically generate an alt text and let the user validate it. So every time an image is added, we get an array of pixels we pass to the ML engine and a few seconds after, we get a string corresponding to a description of this image (see the code).

…

Our alt text generator is far from perfect, but we want to take an iterative approach and improve it in the open.

…

We are currently working on improving the image-to-text datasets and model with what we’ve described in this blog post…"


leanleft@lemmy.ml

2 points

5 months ago

There are way more companies who want to text-mine user content than there are blind people using the internet to read my content.


Zworf@beehaw.org

5 points

5 months ago

One thing I’d love to see in Firefox is a way to offload the translation engine to my local ollama server. This way I can get much better translations but still have everything private.


Kissaki@beehaw.org

4 points

5 months ago

So, planned experimentation and availability:

PDF editor when adding an image in Firefox 130
PDF reading
[hopefully] general web browsing

Sounds like a good plan.

> Once quantized, these models can be under 200MB on disk, and run in a couple of seconds on a laptop – a big reduction compared to the gigabytes and resources an LLM requires.

While that is a reasonable size for laptops and desktops, a couple of seconds per image could still be a bit of a hindrance. Nevertheless, it is a significant unblock for blind/text users.
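
For a rough sense of where a figure like "under 200MB" could come from, here is a back-of-the-envelope sketch. The parameter count is an illustrative assumption, not a number taken from the quoted passage, and real model files carry extra metadata on top of the raw weights.

```python
def model_size_mb(num_params: int, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in megabytes (10^6 bytes)."""
    return num_params * bits_per_weight / 8 / 1_000_000

# ASSUMPTION: a hypothetical model with 180 million parameters.
ASSUMED_PARAMS = 180_000_000

full_precision_mb = model_size_mb(ASSUMED_PARAMS, 32)  # 32-bit float weights
quantized_mb = model_size_mb(ASSUMED_PARAMS, 8)        # 8-bit integer weights

print(f"32-bit: {full_precision_mb:.0f} MB, 8-bit: {quantized_mb:.0f} MB")
```

Under these assumptions, going from 32-bit to 8-bit weights cuts the size to a quarter, which is the kind of reduction that would bring a small model below the 200MB mark.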

I wonder what it would mean for mobile. If it’s an optional accessibility feature, and given the storage space on today’s smartphones, I think it can work well.

> Running inference locally with small models offers many advantages:

They list five positives about using local models. On a blog targeting developers, I would wish, if not expect, them to list the downsides and weigh the two sides too. As it is, it reads as promotional material rather than an honest, open, fully informative description.

While they go into technical details about the architecture and technical implementation, I think the negatives are noteworthy, and the weighing could be insightful for readers.

> So every time an image is added, we get an array of pixels we pass to the ML engine

~~An array of pixels doesn’t make sense to me. Images can have different widths, so linear data with varying sectioning content would be awful for training.~~

~~I have to assume this was a technical simplification or unintended wording mistake for the article.~~


grrgyle@slrpnk.net

2 points

5 months ago

I imagine it’s a 2D array? So width would be captured by uhh like a[N].len.

It could be I’m misunderstanding you, because I’m not sure what you mean by:

> linear data with varying sectioning content


Kissaki@beehaw.org

2 points

5 months ago

Looking at Wikipedia on arrays, I think I’m just not used to array as terminology for multi-dimensional data structures. TIL
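
To make the array question concrete, here is a minimal sketch of how a flat pixel array and a 2D array can describe the same image. It assumes an RGBA layout with four values per pixel stored row by row, as the browser canvas `ImageData` does; the article excerpt does not say which layout Firefox passes to the ML engine, and the helper names are hypothetical.

```python
from typing import List, Tuple

CHANNELS = 4  # ASSUMPTION: RGBA, one value per channel

def pixel_at(flat: List[int], width: int, x: int, y: int) -> Tuple[int, ...]:
    """Return the RGBA values of pixel (x, y) from a flat, row-major array."""
    start = (y * width + x) * CHANNELS
    return tuple(flat[start:start + CHANNELS])

def to_rows(flat: List[int], width: int, height: int) -> List[List[Tuple[int, ...]]]:
    """Reshape the flat array into rows[y][x], a 2D view of the same data."""
    if len(flat) != width * height * CHANNELS:
        raise ValueError("flat array length does not match width * height * channels")
    return [[pixel_at(flat, width, x, y) for x in range(width)] for y in range(height)]

# A hypothetical 2x2 image: red and green on the top row, blue and white below.
flat_pixels = [
    255, 0, 0, 255,    0, 255, 0, 255,
    0, 0, 255, 255,    255, 255, 255, 255,
]
rows = to_rows(flat_pixels, width=2, height=2)
```

Either way the width travels with the data: explicitly alongside the flat array, or implicitly as the length of each row.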


pheet@sopuli.xyz

1 point

5 months ago

Might be a significant issue if more applications adopt these kinds of features and can’t share the resources in a meaningful way.


Kissaki@beehaw.org

6 points

5 months ago

From your OP description:

> EDIT: the AI creates an initial description, which then receives crowdsourced additional context per-image to improve generated output. look for the “Example Output” heading in the article.

That’s wrong. Nothing is crowdsourced. What the article says is that when you add an image in the PDF editor, it can generate an alt text for the image, which you as the user then validate and confirm. That’s still local PDF editing, though.

The caching part is about the model dataset, which is static.


frogman [he/him]@beehaw.org (OP)

1 point

5 months ago

my bad, i misunderstood. thanks.


kandoh@reddthat.com

7 points

5 months ago

This seems like a very useful feature, and a great benefit to blind web users.

