AI Document Analysis Tools Detailed Evaluation Results

Study Overview: A detailed comparative evaluation of answer accuracy, in which identical large documents were uploaded to various AI chat tools and each tool was asked the same questions. Accuracy scores, wait times, models used, and error conditions were recorded comprehensively. Errors and upload failures are scored as 0 points. Both Japanese and English versions were tested at each file size, verifying practicality from multiple angles.

1MB File Detailed Evaluation Results

Japanese version: 570K characters, 340K tokens | English version: 1.01M characters, 290K tokens
| Language | App Name | Model Used | Wait Time | Accuracy Score | Notes |
|---|---|---|---|---|---|
| Japanese | chatman | gpt-4.1 (500K tokens) | 18s (Fast) | 90 | Large-scale search excluded |
| Japanese | ChatGPT | gpt-5.1 | 49s (Normal) | 100 | - |
| Japanese | Gemini | 3.0 Pro | 28s (Normal) | 100 | - |
| Japanese | Claude | Opus 4.5 | 47s (Normal) | 100 | - |
| Japanese | DeepSeek | Fixed | 111s (Slow) | 0 | Evaluation reason: Inaccurate answer |
| Japanese | Perplexity | Sonar (fast model) | 44s (Normal) | 40 | - |
| Japanese | docAnalyzer.AI | gpt-4.1-mini | 23s (Fast) | 65 | - |
| Japanese | FileGPT | gpt-4 | Error | Error | Upload error |
| Japanese | notebook-lm | Fixed | 5s (Fast) | 0 | Answered that no information was found |
| Japanese | Unriddle | gpt-4.1 | 13s (Fast) | 5 | Evaluation reason: Inaccurate answer |
| English | chatman | gpt-4.1 (500K tokens) | 23s (Fast) | 85 | Large-scale search excluded |
| English | ChatGPT | gpt-5.1 | 41s (Normal) | 100 | - |
| English | Gemini | 3.0 Pro | 23s (Fast) | 100 | - |
| English | Claude | Opus 4.5 | 51s (Normal) | 100 | - |
| English | DeepSeek | Fixed | Error | Error | Upload error |
| English | Perplexity | Sonar (fast model) | 29s (Normal) | 0 | Evaluation reason: Inaccurate answer |
| English | docAnalyzer.AI | gpt-4.1-mini | 16s (Fast) | 70 | - |
| English | FileGPT | gpt-4 | Error | Error | Upload error |
| English | notebook-lm | Fixed | 12s (Fast) | 98 | - |
| English | Unriddle | gpt-4.1 | 9s (Fast) | 15 | Evaluation reason: Inaccurate content |
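The corpus sizes quoted above (570K characters / 340K tokens for Japanese, 1.01M characters / 290K tokens for English) imply very different character-per-token densities, which is why the two versions differ so much in character count while landing in a similar token range. A minimal sketch of that arithmetic, using only the figures stated in this section:

```python
# Characters-per-token ratios derived from the measured corpus sizes above.
# All figures come from the test documents; the ratios are illustrative only.
corpora = {
    "Japanese": {"chars": 570_000, "tokens": 340_000},
    "English": {"chars": 1_010_000, "tokens": 290_000},
}

for lang, c in corpora.items():
    ratio = c["chars"] / c["tokens"]
    print(f"{lang}: {ratio:.2f} chars/token")
# → Japanese: 1.68 chars/token
# → English: 3.48 chars/token
```

In other words, the Japanese test file consumes roughly twice as many tokens per character as the English one, so a tool's context-window limit is reached at a much smaller character count for Japanese text.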