What kinds of datasets do you provide?

Domain-focused text datasets (reviews, how-to, travel), multilingual corpora (JP/EN, with KR on the roadmap), and safety/evaluation sets.

How do you license your data?

We offer flat, subscription, and usage-based (per-token or per-call) licensing models. Custom terms are available for research use.

How do you handle privacy and compliance?

We follow GDPR/CCPA principles, remove personal data, honor robots.txt/ToS, and document provenance and author consent processes.

AI Data | CreamWorks Inc.

Ethical AI datasets and evaluation suites for safer, more capable models.

We provide domain-specific corpora, custom labeling, and red-team style evaluation data. Small, clean, and compliant by design.

Data Provenance
Documented sources • author signals • collection logs

Safety-First
PII scrubbing • policy filters • eval coverage

Multilingual
JP / EN (+KR roadmap)

Clean domain corpora for pretraining, SFT, RAG, and evaluation.

Human-in-the-loop annotation with clear rubrics.

Benchmarks aligned to practical use cases.

Name	Language	Size (approx.)	Schema	Primary Use
Consumer-Reviews-Beauty-JP	JA	~120K docs	{ product, claim, aspect, sentiment, source_meta }	SFT / Aspect-RAG / Sentiment
HowTo-Procedural-EN	EN	~80K steps	{ task, steps[], constraints, risk_flags }	Planning / Toolformer-style supervision
Travel-POI-Guides-JPEN	JA/EN	~45K entries	{ poi, geo, hours, tags[], narrative, tips }	RAG grounding / Multilingual eval

Numbers are indicative; final specs shared upon request.
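
As a rough illustration of how the Schema column above might be used when ingesting a delivery, the sketch below checks that a record carries the listed fields. The field names are taken from the table; the function name `check_record`, the reading of a `[]` suffix as "must be a list", and the sample record are assumptions made for this example, not a published spec.

```python
from typing import Any

# Field names come from the Schema column of the catalog table.
# The "[]" suffix is assumed here to mean "list-valued field".
SCHEMAS: dict[str, list[str]] = {
    "Consumer-Reviews-Beauty-JP": ["product", "claim", "aspect", "sentiment", "source_meta"],
    "HowTo-Procedural-EN": ["task", "steps[]", "constraints", "risk_flags"],
    "Travel-POI-Guides-JPEN": ["poi", "geo", "hours", "tags[]", "narrative", "tips"],
}


def check_record(dataset: str, record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    expected = set()
    for field in SCHEMAS[dataset]:
        is_list = field.endswith("[]")
        name = field[:-2] if is_list else field
        expected.add(name)
        if name not in record:
            problems.append(f"missing field: {name}")
        elif is_list and not isinstance(record[name], list):
            problems.append(f"field should be a list: {name}")
    # Report anything the schema does not mention, in a stable order.
    for name in sorted(set(record) - expected):
        problems.append(f"unexpected field: {name}")
    return problems


# Hypothetical record, invented for illustration only.
sample = {
    "task": "replace a bicycle inner tube",
    "steps": ["remove the wheel", "lever off the tire", "swap the tube"],
    "constraints": "no power tools",
    "risk_flags": [],
}
print(check_record("HowTo-Procedural-EN", sample))  # []
```

Since final specs are shared on request, the actual field types and nesting may differ from what this check assumes.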

Model	Best for	Notes
Flat License	One-time integration	Per-dataset fee; update packs optional
Subscription	Ongoing updates	Quarterly refresh; SLA for data health
Usage-Based	API/eval calls	Per-token / per-call metering for hosted access
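
To make the usage-based row concrete, here is a minimal sketch of per-call and per-token metering. The class `UsageMeter`, its method names, and the rates are hypothetical and chosen only so the arithmetic is easy to follow; the table does not say whether calls and tokens are billed together or as alternatives, so combining them here is an assumption.

```python
from dataclasses import dataclass


@dataclass
class UsageMeter:
    """Accumulates calls and tokens for one customer of a hosted API."""
    calls: int = 0
    tokens: int = 0

    def record(self, tokens_in_response: int) -> None:
        # One API call, plus however many tokens it returned.
        self.calls += 1
        self.tokens += tokens_in_response

    def charge(self, per_call: float, per_1k_tokens: float) -> float:
        # Combined charge: a flat amount per call plus a rate per 1,000 tokens.
        return round(self.calls * per_call + self.tokens / 1000 * per_1k_tokens, 2)


meter = UsageMeter()
meter.record(1200)
meter.record(800)
# Placeholder rates, not actual pricing.
print(meter.charge(per_call=0.5, per_1k_tokens=2.0))  # 5.0
```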

For data access, partnerships, or any other inquiries,
please reach out through our contact form.

What license terms do you support?

Commercial licenses, research-only licenses, and custom terms with field-of-use restrictions.

Can you create safety evaluation sets for our policy?

Yes. We author adversarial prompts and rubric-scored outputs aligned to your safety policy and red-team goals.

Do you host the data?

We can deliver via S3/GCS or provide a hosted read-only API with access logs and metering.