4 points

It says it’s only available in the US, so I used a VPN to sign up. Good to have alternatives!

27 points

I tested it out just now. It seems to be much better than GPT-3.5, but just a little worse than GPT-4.

I tested this: “Explain the plot of Cinderella in a sentence where each word has to begin with the next letter in the alphabet from A to Z, without repeating any letters.”

GPT-4 is able to always get it right first try. It gave me: “A beautiful Cinderella, determined, endures, finds glass heel, invites joy; king’s love magically nurtures opulence, prince quests, restores slipper, transformation unveils, validates wistful xerophyte yearning zealously.”

Claude 2.0 took 4 tries to get it right. It gave me these: “A beautiful young girl called Cinderella did evil for her fair godmother’s horrid instructions just kicking laundry maidens’ nice outfits purportedly quelling ridiculous step-sisters’ terrible underhanded villainy with x-rays yielding zilch.” and then “Awaited ball causes Downtrodden Ella’s fairy godmother’s help. I’ll just knock loud mean old prune’s quest, righteously sabotaging two ugly vicious women. Xanthippe yells, “Zounds!”” and then “Abandoned by cruel destiny, Ella finds godmother’s hidden invitation juxtaposing king’s lavish ball. Magic occurs, placing radiant slipper to unveil victorious wife. Xerxes yearns, “Zounds, she’s found!”” and then finally “Awaiting ball, Cinderella dreams every Friday godmother helps. Inside jousting knights leer maidens’ nice outfits. Prince quietly rescues sweetheart, though ugly vengeful wife x-rays yellow zebra.”
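These sentences are tedious to verify by eye, so here is a minimal Python sketch of a checker. The function names are hypothetical, and it assumes the rule means "the first letter of each whitespace-separated word runs from A to Z in order", ignoring punctuation and case.

```python
import string


def first_letters(sentence):
    """Return the first alphabetic character of each whitespace-separated word, lowercased."""
    letters = []
    for token in sentence.split():
        for ch in token:
            if ch.isalpha():
                letters.append(ch.lower())
                break
    return letters


def is_a_to_z(sentence):
    """True if the words start with a, b, c, ... z, exactly once each and in order."""
    return first_letters(sentence) == list(string.ascii_lowercase)


# Sentences quoted from the comment above.
gpt4 = (
    "A beautiful Cinderella, determined, endures, finds glass heel, invites joy; "
    "king’s love magically nurtures opulence, prince quests, restores slipper, "
    "transformation unveils, validates wistful xerophyte yearning zealously."
)
claude_first = (
    "A beautiful young girl called Cinderella did evil for her fair godmother’s "
    "horrid instructions just kicking laundry maidens’ nice outfits purportedly "
    "quelling ridiculous step-sisters’ terrible underhanded villainy with x-rays "
    "yielding zilch."
)

print(is_a_to_z(gpt4))          # True
print(is_a_to_z(claude_first))  # False
```

Under that reading of the rule, the GPT-4 sentence passes and Claude's first attempt fails at the third word ("young" where a C word is expected).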

After that, I tried this prompt: “Andrew is free from 11 am to 3 pm, Joanne is free from noon to 2 pm and then 3:30 pm to 5 pm. Hannah is available at noon for half an hour, and then 4 pm to 6 pm. What are some options for start times for a 30 minute meeting for Ca Given the availability of Andrew, Joanne, and Hannah, we need to find overlapping times for a 30-minute meeting.

After cross-referencing their schedules, the potential start times that accommodate all three schedules are:

12:00 pm to 12:30 pm: This slot is within Andrew’s free time (11 am - 3 pm), Joanne’s free time (12 pm - 2 pm), and Hannah’s first free half hour at noon. There is no other slot that fits within all three schedules, due to Joanne’s break from 2 pm to 3:30 pm and Hannah’s availability starting again only at 4 pm, when Andrew is no longer available. Therefore, there is just one option for a 30-minute meeting that works for all three of them.”

Claude 2.0 responded with: “Okay, based on the availability you provided, here are some options for 30 minute meeting start times that work for Andrew, Hannah and Joanne: 11:30 am - This works because Andrew is free starting at 11am, Hannah is available at noon for half an hour, and Joanne is free from noon to 2pm. 3:30 pm - This works because Andrew is free until 3pm, Hannah is free from 4pm to 6pm, and Joanne is free from 3:30pm to 5pm. 4:30 pm - This also works for the same reasons as 3:30pm. Andrew is free until 3pm, Hannah is free from 4pm to 6pm, and Joanne from 3:30pm to 5pm. 5 pm - This could work as well since Andrew is free until 3pm, Hannah until 6pm, and Joanne until 5pm. It’s pushing the end of Joanne’s availability though. So in summary, the best options are 11:30 am, 3:30 pm or 4:30 pm to ensure there is overlap with all three people’s availability. Let me know if you need any clarification or have additional constraints!”

I told Claude it is incorrect, and it proceeded to correct itself and tell me there is NO available time overlap. After several more attempts, it eventually gave up and asked me for the correct answer.
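The scheduling question can be checked mechanically by intersecting the three sets of free intervals. The following is a minimal Python sketch, not anything either chatbot ran: the helper names are hypothetical, times are minutes since midnight, and candidate start times are assumed to fall on 15-minute steps.

```python
def intersect(a, b):
    """Intersect two lists of (start, end) intervals given in minutes since midnight."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:
                out.append((s, e))
    return sorted(out)


def meeting_starts(schedules, length=30, step=15):
    """Return start times (in minutes) where a meeting of `length` fits everyone."""
    common = schedules[0]
    for sched in schedules[1:]:
        common = intersect(common, sched)
    starts = []
    for s, e in common:
        t = s
        while t + length <= e:
            starts.append(t)
            t += step
    return starts


def fmt(minutes):
    """Format minutes since midnight as HH:MM (24-hour clock)."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


# Availability as stated in the prompt.
andrew = [(11 * 60, 15 * 60)]                               # 11 am to 3 pm
joanne = [(12 * 60, 14 * 60), (15 * 60 + 30, 17 * 60)]      # noon to 2 pm, 3:30 pm to 5 pm
hannah = [(12 * 60, 12 * 60 + 30), (16 * 60, 18 * 60)]      # noon for 30 min, 4 pm to 6 pm

print([fmt(t) for t in meeting_starts([andrew, joanne, hannah])])  # ['12:00']
```

The only common window is noon to 12:30 pm, so 12:00 is the single valid start time, which matches the first answer quoted above and rules out all of the times Claude 2.0 suggested.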

So although Claude 2.0 is much better than GPT-3.5, it is still worse than GPT-4 for complex reasoning.

EDIT: I just realized that Claude has a 100k token context window. This is SIGNIFICANTLY more than GPT-4’s normal 8k context and 32k API context. This immediately makes it much more useful, as it can parse ~75,000 words. I also have not tested Claude’s creative, programming, mathematical, etc. skills. But regardless of how those compare to GPT’s, it doesn’t matter because of the sheer quantity of tokens that can be parsed by Claude.
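For a rough sense of scale, the token figures above can be converted to word counts. This is a back-of-the-envelope Python sketch: the 0.75 words-per-token ratio is an assumption taken from the "100k tokens, ~75,000 words" figure in this comment, and real ratios vary with the tokenizer and the text.

```python
# Assumption: ratio implied by "100k tokens ≈ 75,000 words"; not an exact property of any model.
WORDS_PER_TOKEN = 0.75


def approx_words(tokens):
    """Estimate how many English words fit in a context window of `tokens` tokens."""
    return int(tokens * WORDS_PER_TOKEN)


# Nominal context sizes as quoted in the comment.
for name, tokens in [("GPT-4 (8k)", 8_000), ("GPT-4 API (32k)", 32_000), ("Claude 2 (100k)", 100_000)]:
    print(f"{name}: ~{approx_words(tokens):,} words")
```

With that ratio, the 8k and 32k windows come out to roughly 6,000 and 24,000 words, against roughly 75,000 for the 100k window.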

1 point

That’s interesting. I haven’t tried its reasoning skills. I did try playing Jeopardy! with it though, and it showed a lot of improvement from previous attempts. Usually chatbots are very bad at Jeopardy, telling you just about any answer is correct, but Claude 2 did really well, explaining why I was wrong several times. I did ask it to provide an explanation about whether my answers were right or wrong in the initial prompt, so that might’ve made a difference though.

4 points

Just tried it out with some questions about ceramic firing in an electric kiln. It seems to have similar accuracy to ChatGPT, maybe closer to GPT-4.

It’s not clear when using it which version it’s on, so this may have been Claude 1. I’m unsure where to check.

2 points

Hard to believe something that feels like it’s lying to you all the time. I asked it about a topic that I’m involved in and have a website about, and it told me the website was hypothetical. It got it wrong twice, even after it agreed it was wrong, and then told me the wrong thing again.

Is this what they consider hallucinations?

2 points

I asked Perplexity that same question. It kind of did better: it made no errors in temperatures like the others do. It just left those details out initially. After I asked follow-up questions it answered correctly, but it also gave some unnecessary and unrelated information.

I didn’t use any of the prompts. I was asking about saggar firing processes and temps; the prompts were just ceramics related.

9 points

Seems to only be available in the US and UK for now tho.

2 points

Luckily it doesn’t need a phone number like OpenAI so you can just VPN it.

1 point

I tried using a VPN and it still didn’t allow me to sign up.

1 point

I used ProtonVPN and managed to sign in easily.

8 points

You shouldn’t have to though. The whole “only available if you happen to live in X” is so much bs when it comes to things like this. Sure if it was a giveaway and needed to be shipped, I could understand. But a website being locked away to only certain regions is ridiculous.

6 points

I suspect it has to do with legal compliance. Only available in US = only needing to comply with US law.

15 points

Woah, this is huge. Claude 1 was already more useful and coherent than ChatGPT (3.5, not 4). The big problem was that it wasn’t available to everyone. This could really steal some market share from OpenAI if things go well.

-1 points

What market though? These AI chatbots seem like money sinks for a potential development into something useful in the distant future.

11 points

The market of people buying APIs for popular chatbots. Right now OpenAI’s GPT is overwhelmingly the most popular option and pretty expensive. You constantly see a lot of “powered by GPT” features on products now, but hopefully Claude can provide some better competition.

0 points

Fair, I don’t see any real use for these right now. Chatbots just seem like a gimmick that can help people cheat in school (not that I give a fuck about that). Probably just the online circles we run in, what sorta things are powered by GPT? Customer support and stuff?
