
AI Data by CreamWorks Inc.

Ethical AI datasets and evaluation suites for safer, more capable models.

We provide domain-specific corpora, custom labeling, and red-team style evaluation data. Small, clean, and compliant by design.

Data Provenance
Documented sources • author signals • collection logs
Safety-First
PII scrubbing • policy filters • eval coverage
Multilingual
Japanese / English (Korean on the roadmap)

Offerings

1) Curated Datasets (Licensable)

Clean domain corpora for pretraining, SFT, RAG, and evaluation.

  • Consumer reviews (beauty/health) — balanced & de-duplicated
  • How-to & troubleshooting knowledge — stepwise, procedural
  • Travel & local guides — structured POI attributes

2) Custom Labeling

Human-in-the-loop annotation with clear rubrics.

  • Safety labels (toxicity, bias, harm categories)
  • Intent, entities, aspects, sentiment
  • Evaluation item authoring (MCQ, rubric-graded freeform)

3) Evaluation Suites

Benchmarks aligned to practical use cases.

  • Factuality & retrieval grounding (RAG)
  • Actionability & stepwise reasoning
  • Safety guardrail adversarials (red-team prompts)

Example Datasets (Snapshot)

Name | Language | Size (approx.) | Schema | Primary Use
Consumer-Reviews-Beauty-JP | JA | ~120K docs | { product, claim, aspect, sentiment, source_meta } | SFT / Aspect-RAG / Sentiment
HowTo-Procedural-EN | EN | ~80K steps | { task, steps[], constraints, risk_flags } | Planning / Toolformer-style supervision
Travel-POI-Guides-JPEN | JA/EN | ~45K entries | { poi, geo, hours, tags[], narrative, tips } | RAG grounding / Multilingual eval

Sizes are indicative; final specifications are shared on request.

Compliance & Ethics

  • GDPR/CCPA-aligned collection and processing principles
  • PII removal, sensitive attribute minimization, and blocklists
  • robots.txt and website Terms of Service respected; opt-out requests honored
  • Provenance & consent documentation available under NDA
  • Accommodations for research and safety use by accredited institutions

Licensing Models

Model | Best for | Notes
Flat License | One-time integration | Per-dataset fee; update packs optional
Subscription | Ongoing updates | Quarterly refresh; SLA for data health
Usage-Based | API/eval calls | Per-token / per-call metering for hosted access

How to Engage

  1. Tell us your use case (pretrain, SFT, eval, RAG, safety).
  2. Choose licensing (flat / subscription / usage-based).
  3. We share specs and a sample; you review quality and fit.
  4. Sign terms → deliver via S3/GCS or hosted endpoint.

Interested in working with us?

For data access, partnerships, or any other inquiries, please reach out through our contact form.


FAQ

What license terms do you support?

Commercial licenses, research-only licenses, and custom terms with field-of-use restrictions.

Can you create safety evaluation sets for our policy?

Yes. We author adversarial prompts and rubric-scored outputs aligned to your safety policy and red-team goals.

Do you host the data?

We can deliver via S3/GCS or provide a hosted read-only API with access logs and metering.

© 2025 CreamWorks Inc. All rights reserved.