openai / evals Public

Fork 2.6k
Star 14.7k

Pull requests: openai/evals

45 Open 1,238 Closed

Pull requests list

Add support for new models (gpt-4o, o1-preview and o1-mini)

#1558 opened Sep 15, 2024 by sakher

Bugfixing completion stats break with new reasoning tokens release

#1555 opened Sep 13, 2024 by lucapericlp

anthropic_solver.py

#1554 opened Sep 4, 2024 by iHuydang

13 tasks done

Fix a bug in examples/mmlu.ipynb when using gpt-4o or gpt-4o-mini

#1551 opened Aug 25, 2024 by RobinWitch

13 tasks done

Fix the is_chat_model function to work with gpt-4o

#1550 opened Aug 22, 2024 by LoryPack

3 tasks done

Added Icelandic QA evaluation data from news texts

#1548 opened Aug 20, 2024 by thorunna

12 of 13 tasks

Added Icelandic QA evaluation data from Wikipedia

#1547 opened Aug 20, 2024 by thorunna

12 of 13 tasks

Updating make-me-say to be compatible with Solvers

#1546 opened Aug 18, 2024 by lennart-finke

1 task done

Fix Information exposure alert through an exception #1543

#1545 opened Aug 8, 2024 by arpitjain099

13 tasks done

Fix log injection error

#1544 opened Aug 8, 2024 by arpitjain099

13 tasks done

Remove global OpenAI client initialization

#1539 opened Jul 21, 2024 by michaelAlvarino

Fix Unit Test Failures in OpenAI, Anthropic, and Google Gemini Resolvers

#1537 opened Jun 24, 2024 by sakher

Fix problematic sample in Schelling Point

#1534 opened May 22, 2024 by JunShern

Update README: Add Langtrace as an Eval vendor

#1531 opened May 21, 2024 by karthikscale3

5 of 13 tasks

Add support for gpt-4o

#1530 opened May 16, 2024 by androettop

show evals in wandb weave

#1522 opened Apr 19, 2024 by yogeshg • Draft

13 tasks

Added Quran Eval & Simple Fact Model-Graded Definition

#1511 opened Apr 1, 2024 by sakher

13 tasks done

Add Classification Rule Articulation Eval

#1510 opened Mar 30, 2024 by danesherbs

13 tasks done

eval pattern-concat-logic

#1508 opened Mar 28, 2024 by natanaelwf

13 tasks done

Fix specifying API arguments from the CLI

#1505 opened Mar 27, 2024 by LoryPack

6 tasks done

[Evals] Add eval for Dhivehi diacritical marks

#1495 opened Mar 16, 2024 by aanaseer

11 of 12 tasks

Add **kwargs to OpenAIChatCompletionFn

#1494 opened Mar 15, 2024 by ezraporter

Extending to Azure OpenAI implementation

#1470 opened Feb 23, 2024 by pkt1583

Adding Indian Women Menstrual Health Chatbot Eval

#1430 opened Dec 11, 2023 by cranberrydeveloper

13 tasks done

Choose completion function for evaluation of modelgraded evals

#1418 opened Nov 17, 2023 by LoryPack

6 tasks done
