Analysis for Crops
What a Million Rows of Crop Data Tells Us About Farming
I recently explored a fascinating dataset with 1,000,000 crop records, covering different regions, soil types, and growing conditions. Here’s what I found.
🌾 The Big Six Crops
The dataset is centered around six main crops: Maize, Rice, Barley, Wheat, Cotton, and Soybean. Each crop appears in large, almost equal proportions, giving us a balanced view of agricultural practices.
Maize was the most common crop with 166,824 records, only slightly ahead of Rice (166,792) and Barley (166,777).
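To put "slightly ahead" into numbers, here is a minimal sketch that rebuilds the crop counts from the output shown later in this post and converts them to shares. The variable names are illustrative only. With the full CSV loaded, `df['Crop'].value_counts(normalize=True)` should give the same shares directly.

```python
import pandas as pd

# Crop counts copied from the value_counts() output shown later in this post.
crop_counts = pd.Series(
    {
        "Maize": 166824,
        "Rice": 166792,
        "Barley": 166777,
        "Wheat": 166673,
        "Cotton": 166585,
        "Soybean": 166349,
    },
    name="count",
)

# Share of all records held by each crop, in percent.
crop_share_pct = crop_counts / crop_counts.sum() * 100

# Gap between the most and least common crop, in records and in percentage points.
count_gap = crop_counts.max() - crop_counts.min()
share_gap_pct = crop_share_pct.max() - crop_share_pct.min()

print(crop_share_pct.round(2))
print("Gap in records:", count_gap)
print("Gap in percentage points:", share_gap_pct)
```

Every crop holds between roughly 16.63% and 16.68% of the rows, so the frequency ranking rests on a gap of 475 records out of a million.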
💧 Rainfall and Water Needs
When looking at rainfall requirements, Rice came out on top, averaging about 550.5 mm of rainfall. Soybean and Wheat followed closely, while Maize had the lowest average at about 549.2 mm, so all six crops sit within roughly 1.3 mm of each other.
The ordering lines up with what we know from real-world agriculture, where Rice thrives in wetter conditions and Maize is relatively more drought-tolerant, although a gap this small is weak evidence on its own.
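As a rough check on how far apart the crops really are, the sketch below takes the per-crop rainfall averages from the output shown later in this post and measures the spread between the highest and lowest. The variable names are illustrative, and treating `Rainfall_mm` as a "requirement" is the post's reading of the column, not something the data itself states.

```python
import pandas as pd

# Per-crop average rainfall (mm), copied from the groupby output shown later in this post.
avg_rainfall = pd.Series(
    {
        "Rice": 550.510286,
        "Soybean": 550.474299,
        "Wheat": 550.248678,
        "Cotton": 549.903414,
        "Barley": 549.561147,
        "Maize": 549.195094,
    },
    name="Rainfall_mm",
)

# Absolute spread between the highest and lowest crop averages.
spread_mm = avg_rainfall.max() - avg_rainfall.min()

# The same spread relative to the mean of the six averages, in percent.
spread_pct = spread_mm / avg_rainfall.mean() * 100

print("Highest:", avg_rainfall.idxmax(), "| Lowest:", avg_rainfall.idxmin())
print("Spread (mm):", round(spread_mm, 3))
print("Spread (% of mean):", round(spread_pct, 3))
```

The spread comes out at about 1.3 mm, or roughly a quarter of a percent of the mean, which is worth keeping in mind when reading the ranking.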
🌡️ Temperature Preferences
Interestingly, all six crops had very similar average temperatures, around 27.5°C. This suggests the dataset is modeled around warm, tropical-to-subtropical farming conditions where the average temperature is nearly identical across crops, even though individual records vary.
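The per-crop averages can hide how much individual records vary. As a small illustration, the sketch below compares the spread of the six crop averages with the spread of the five temperatures visible in the `df.head()` output shown later in this post. Five rows are not a representative sample, and the variable names are illustrative.

```python
import pandas as pd

# Per-crop average temperature (°C), copied from the groupby output shown later in this post.
avg_temp = pd.Series(
    {
        "Maize": 27.477555,
        "Rice": 27.498745,
        "Barley": 27.500890,
        "Soybean": 27.509328,
        "Wheat": 27.515932,
        "Cotton": 27.527394,
    },
    name="Temperature_Celsius",
)

# Temperatures of the five rows printed by df.head() in the same output.
head_temps = pd.Series([27.676966, 18.026142, 29.794042, 16.644190, 31.620687])

# Spread between crop averages versus spread between individual rows.
avg_spread = avg_temp.max() - avg_temp.min()
row_spread = head_temps.max() - head_temps.min()

print("Spread of crop averages (°C):", round(avg_spread, 3))
print("Spread of the five sample rows (°C):", round(row_spread, 3))
```

With the full CSV loaded, `df.groupby("Crop")["Temperature_Celsius"].agg(["mean", "std", "min", "max"])` would show the within-crop variation properly.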
🌱 Soil Matters: Top Crops by Soil Type
Ranked by number of records, each soil type had its “favorite” crops:
Clay: Rice led, followed by Cotton and Barley.
Sandy: Wheat was the most common, ahead of Cotton and Rice.
Loam: Maize and Barley were nearly tied at the top.
Peaty: Wheat and Barley led the pack.
Silt: Rice came first, ahead of Wheat and Barley.
Chalky: Soybean took the top spot.
This is consistent with the natural strengths of different soils: for example, clay holds more water, which suits Rice, and in this dataset sandy soils favor Wheat.
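To see how large each soil type's preference is, the sketch below rebuilds the top-3 counts per soil type from the output shown later in this post and computes how far the leading crop is ahead of the third-ranked one. The column names `lead_over_third` and `lead_pct` are made up for this illustration.

```python
import pandas as pd

# Top-3 crop counts per soil type, copied from the output shown later in this post.
top3 = pd.DataFrame(
    [
        ("Chalky", "Soybean", 28040), ("Chalky", "Maize", 27885), ("Chalky", "Cotton", 27817),
        ("Clay", "Rice", 27960), ("Clay", "Cotton", 27734), ("Clay", "Barley", 27726),
        ("Loam", "Maize", 27908), ("Loam", "Barley", 27896), ("Loam", "Cotton", 27804),
        ("Peaty", "Wheat", 27908), ("Peaty", "Barley", 27857), ("Peaty", "Maize", 27819),
        ("Sandy", "Wheat", 28069), ("Sandy", "Cotton", 27955), ("Sandy", "Rice", 27803),
        ("Silt", "Rice", 27954), ("Silt", "Wheat", 27838), ("Silt", "Barley", 27805),
    ],
    columns=["Soil_Type", "Crop", "count"],
)

grouped = top3.groupby("Soil_Type")["count"]

# Leading crop per soil type (the row holding the maximum count in each group).
top_crop = top3.loc[grouped.idxmax()].set_index("Soil_Type")["Crop"]

# Lead of the first-ranked crop over the third-ranked crop, in records and in percent.
lead = grouped.max() - grouped.min()
summary = pd.DataFrame({"top_crop": top_crop, "lead_over_third": lead})
summary["lead_pct"] = (summary["lead_over_third"] / grouped.max() * 100).round(2)

print(summary)
```

In every soil type the leader is less than 1% ahead of the third-ranked crop. With the full CSV loaded, `pd.crosstab(df["Soil_Type"], df["Crop"], normalize="index")` would give the complete share of each crop per soil type rather than only the top three.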
🌟 Key Takeaways
Rice is the thirstiest crop in this dataset.
Temperature wasn’t a major differentiator, with averages hovering around 27.5°C for all crops.
Soil type shifts which crops lead, though the top three crops on each soil are within 1% of each other.
In short, this dataset shows that while average temperature barely differs between crops in these records, water and soil play a crucial role in determining which crops succeed.
Here is the simple charting code I ran on the dataset:
import pandas as pd
import matplotlib.pyplot as plt
# Load dataset (make sure it's attached in Kaggle)
df = pd.read_csv("/kaggle/input/agriculture-crop-yield/crop_yield.csv")
# --- Exploration ---
print("Shape:", df.shape)
print(df.head())
print("\nMissing values:\n", df.isnull().sum())
print("\nUnique crops:", df['Crop'].nunique())
print("Unique regions:", df['Region'].nunique())
print("Unique soil types:", df['Soil_Type'].nunique())
# --- Most common crops ---
crop_counts = df['Crop'].value_counts().head(10)
print("\nMost common crops in dataset:\n", crop_counts)
crop_counts.plot(kind="bar", figsize=(8,6))
plt.title("Most Common Crops in Dataset")
plt.xlabel("Crop")
plt.ylabel("Count")
plt.show()
# --- Average rainfall by crop ---
avg_rainfall = df.groupby("Crop")["Rainfall_mm"].mean().sort_values(ascending=False).head(10)
print("\nCrops with highest avg rainfall requirement:\n", avg_rainfall)
avg_rainfall.plot(kind="barh", figsize=(8,6))
plt.title("Crops by Average Rainfall Requirement")
plt.xlabel("Rainfall (mm)")
plt.show()
# --- Average temperature by crop ---
avg_temp = df.groupby("Crop")["Temperature_Celsius"].mean().sort_values()
print("\nAverage temperature per crop:\n", avg_temp)
avg_temp.plot(kind="bar", figsize=(10,6))
plt.title("Average Growing Temperature by Crop")
plt.ylabel("Temperature (°C)")
plt.xticks(rotation=90)
plt.show()
# --- Crop preference by soil type ---
soil_crop = df.groupby("Soil_Type")["Crop"].value_counts().groupby(level=0).head(3)
print("\nTop 3 crops per soil type:\n", soil_crop)
Output:
Shape: (1000000, 10)
  Region Soil_Type     Crop  Rainfall_mm  Temperature_Celsius  \
0   West     Sandy   Cotton   897.077239            27.676966
1  South      Clay     Rice   992.673282            18.026142
2  North      Loam   Barley   147.998025            29.794042
3  North     Sandy  Soybean   986.866331            16.644190
4  South      Silt    Wheat   730.379174            31.620687

   Fertilizer_Used  Irrigation_Used Weather_Condition  Days_to_Harvest  \
0            False             True            Cloudy              122
1             True             True             Rainy              140
2            False            False             Sunny              106
3            False             True             Rainy              146
4             True             True            Cloudy              110

   Yield_tons_per_hectare
0                6.555816
1                8.527341
2                1.127443
3                6.517573
4                7.248251
Missing values:
Region 0
Soil_Type 0
Crop 0
Rainfall_mm 0
Temperature_Celsius 0
Fertilizer_Used 0
Irrigation_Used 0
Weather_Condition 0
Days_to_Harvest 0
Yield_tons_per_hectare 0
dtype: int64
Unique crops: 6
Unique regions: 4
Unique soil types: 6
Most common crops in dataset:
Crop
Maize 166824
Rice 166792
Barley 166777
Wheat 166673
Cotton 166585
Soybean 166349
Name: count, dtype: int64
Crops with highest avg rainfall requirement:
Crop
Rice 550.510286
Soybean 550.474299
Wheat 550.248678
Cotton 549.903414
Barley 549.561147
Maize 549.195094
Name: Rainfall_mm, dtype: float64
Average temperature per crop:
Crop
Maize 27.477555
Rice 27.498745
Barley 27.500890
Soybean 27.509328
Wheat 27.515932
Cotton 27.527394
Name: Temperature_Celsius, dtype: float64
Top 3 crops per soil type:
Soil_Type  Crop
Chalky     Soybean    28040
           Maize      27885
           Cotton     27817
Clay       Rice       27960
           Cotton     27734
           Barley     27726
Loam       Maize      27908
           Barley     27896
           Cotton     27804
Peaty      Wheat      27908
           Barley     27857
           Maize      27819
Sandy      Wheat      28069
           Cotton     27955
           Rice       27803
Silt       Rice       27954
           Wheat      27838
           Barley     27805
Name: count, dtype: int64