format, you can use the following code:
gen date=date(v1,"YMD")
format date %td
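Thank you for your reply! This date format is driving me crazy. I entered the data in %td format, so why does it change completely as soon as I convert it to monthly data? For example, 01nov1999 is fine under the %td format, but as soon as I switch to %tm it becomes 3172m6. Why does this happen? Please teach me! Thanks!
Also, when converting date data to numeric or string data, how can I make sure the display stays the same as the original date data? I used a method similar to the one you mentioned to convert the monthly value 2012022 to a string,
g begin=string(year(yeartime)*10^2+month(yeartime,"%6.0f"), and the result was 196109. I don't know what went wrong!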
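For your first question, converting daily data to monthly data:
Again, assume the time variable is named v1
gen ym=mofd(v1)
format ym %tm
To convert to yearly data:
gen Year=year(DateAnnounced)
To convert to quarterly data:
gen yq=qofd(DateAnnounced)
For your second question, the best way to keep all date data in a consistent format is to convert every representation (string, numeric) into the Stata date format.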
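Both symptoms in the question can be reproduced in a few lines. This is a minimal sketch, not the original poster's data: the variable names d, ym and begin are made up for illustration. Changing the format of a daily value to %tm makes Stata read the day count 14549 as a count of months, which is where 3172m6 comes from. In the same way, calling year() and month() directly on a monthly value treats it as a day count, which is why a year such as 1961 shows up.

```stata
clear
set obs 1
* daily date: 01nov1999 is stored as 14549 days since 01jan1960
gen d = mdy(11, 1, 1999)
format d %td

* wrong: %tm applied to a daily value reads 14549 as a count of months
* (this is the 3172m6 reported in the question)
display %tm d[1]

* right: convert the value first, then apply the monthly format
gen ym = mofd(d)
format ym %tm

* monthly date -> string: go back to a daily date before year()/month()
gen begin = string(year(dofm(ym))*100 + month(dofm(ym)), "%6.0f")
list d ym begin
```

The key point is that format only changes how a number is displayed, while mofd(), dofm() and their relatives change the number itself.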
http://www.stata.com/statalist/archive/2011-10/msg00329.html
Re: st: RE: dates
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: dates
Date: Sun, 9 Oct 2011 14:37:00 +0100
As you say, and as I implied, 'month' here is in the sense of Stata monthly date, as returned by -mofd()-, not month of year alone.
Nick
On Sun, Oct 9, 2011 at 1:48 PM, Steven Samuels <sjsamuels@gmail.com> wrote:
> The one-step solution is very neat! I wasn't even aware of -dofm-. But substituting the month for -mofd- won't do because information on year is missing. In the original one-step formula, substitute for 'visdate' any date in the visit month, e.g. the first:
>
> day(dofm(1 + mofd(mdy(vmonth,1,vyear)))-1)
>
>
> Steve
>
>
> On Oct 7, 2011, at 8:01 AM, Nick Cox wrote:
>
> In this particular problem you don't have a daily date, but just use the month instead of the result of -mofd()-.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Nick Cox
>
> Suppose -visdate- as here is a daily date variable.
>
> Then the length of the current month is given by the last day of the current month, which is given by the first day of the next month less 1.
>
> day(dofm(1 + mofd(visdate)) - 1)
>
> In steps:
>
> 1. current month is mofd(visdate)
> 2. next month is 1 + mofd(visdate)
> 3. first day of next month is dofm(1 + mofd(visdate))
> 4. last day of this month is dofm(1 + mofd(visdate)) - 1
> 5. day of last day ... you got it long since.
>
> But I never remember most of the function names and always have to look them up.
>
> It's key _never_ to type in rules about 28/31 or leap years, because Stata already knows.
>
> Nick
>
> On Thu, Oct 6, 2011 at 11:55 PM, Steven Samuels <sjsamuels@gmail.com> wrote:
>> Oops! The original algorithm assigned days only from 1 to 15. The correction is below. A better version would assign days according to whether the month has 28, 29, 30, or 31 days, but I'll leave that to others.
>>
>>
>> Steve
>>
>>
>>
>> With enough missing dates it might be better to randomly assign a day of the month, or you risk distorting the distribution of inter-visit intervals.
>>
>>
>>
>>
>> *********************************
>> clear
>> input str10 date
>> 200801
>> 20080113
>> end
>> set seed 21932
>> gen visdate = date(date, "YMD")
>> tempvar day
>> gen str2 `day' = string(ceil(30*runiform())) if length(date)==6
>> replace `day' = "0"+`day' if real(`day')<10
>> gen fakeday = (length(date)==6)
>> replace visdate = date(date + `day', "YMD") if length(date)==6
>> format visdate %td
>> list date visdate fakeday
>> *****************************
>>
>>
>>
>> On Oct 6, 2011, at 5:46 PM, Michael Eisenberg wrote:
>>
>> Thanks so much.
>>
>> On Thu, Oct 6, 2011 at 8:23 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>>> You don't say what 'without success' means precisely.
>>>
>>> '200801' does not match either date pattern. If there is no information on day of month, Stata can only return missing for a daily date.
>>>
>>> -date("200801" + "15", "YMD")- seems to be the most common fudge. I would always tag such guessed dates with an indicator variable.
>>>
>>> Nick
>>> n.j.cox@durham.ac.uk
>>>
>>> Michael Eisenberg
>>>
>>> I have a list of visit dates for patients. Unfortunately, the format is not constant.
>>>
>>> Most are listed with the year, month, day such as 20080105 for Jan 5, 2008 but some are listed only with the year and month 200801 for Jan 2008.
>>>
>>> I attempted to convert them into stata dates with the commands below without success.
>>>
>>> gen ndate = date(dx_date, "YMD")
>>> or
>>> gen ndate = date(dx_date, "CCYYNNDD")
>>>
>>> Can stata handle such inconsistent data?
Code example:
clear
input str10 date
200801
20080113
end
set seed 21932
gen visdate = date(date, "YMD")
tempvar day
gen str2 `day' = string(ceil(30*runiform())) if length(date)==6
replace `day' = "0"+`day' if real(`day')<10
gen fakeday = (length(date)==6)
replace visdate = date(date + `day', "YMD") if length(date)==6
format visdate %td
list date visdate fakeday
gen ym=mofd(visdate)
format ym %tm
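Steve's note in the thread leaves open a version that respects the real length of each month. A minimal sketch that combines his code with Nick's day(dofm(1 + mofd(...)) - 1) formula might look like the following. The helper variables first and mlen, the extra 200802 observation and the seed are illustrative choices; visdate and fakeday follow the thread.

```stata
clear
input str10 date
"200801"
"200802"
"20080113"
end
set seed 21932
gen fakeday = (length(date) == 6)
* complete dates parse directly; year-month strings give missing here
gen visdate = date(date, "YMD")
* first day of the month for the incomplete strings
gen first = date(date + "01", "YMD") if fakeday
* length of that month: first day of the next month less 1
gen mlen = day(dofm(1 + mofd(first)) - 1)
* fill in a random day between 1 and mlen for the incomplete dates
replace visdate = first + ceil(mlen * runiform()) - 1 if fakeday
format visdate first %td
list date visdate fakeday mlen
```

Because the month length comes from Stata's own calendar functions, no rule about 28, 29, 30 or 31 days has to be typed in, which is the point Nick makes in the thread.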
http://www.ssc.wisc.edu/sscc/pubs/stata_dates.htm
Working with Dates in Stata
Stata has many tools for working with dates. This article will
introduce you to some of the most useful and easy to use
features.
A Stata date is simply a number, but with the %td format applied Stata will interpret that number as 'number of days since January 1, 1960.' You can then use that number in a variety of ways. Stata has similar tools that measure time in terms of milliseconds, months, quarters, years and more. This article will focus on days, but if you know how to work with days you can quickly learn the others.
Often the first task is to convert the data you've been given into
official Stata dates.
Converting Strings to Dates
If you've been given a date in string form, such as 'November 3, 2010', '11/3/2010' or '2010-11-03 08:35:12', it can be converted using the date function. The date function takes two arguments: the string to be converted, and a series of letters called a 'mask' that tells Stata how the string is structured. In a date mask, Y means year, M means month, D means day and # means an element should be skipped.
Thus the mask MDY means 'month, day, year' and can be used to convert both 'November 3, 2010' and '11/3/2010'. A date like '2010-11-03 08:35:12' requires the mask YMD### so that the last three numbers are skipped. If you are interested in tracking the time of day you need to switch to the clock function and the %tc format so time is measured in milliseconds rather than days, but they are very similar.
To see this in action, type (or copy and paste) the following into
Stata:
use http://www.ssc.wisc.edu/sscc/pubs/files/dates.dta
This is an example data set containing the above dates as dateString1, dateString2 and dateString3. To convert them to Stata dates do the following:
gen date1=date(dateString1,"MDY")
gen date2=date(dateString2,"MDY")
gen date3=date(dateString3,"YMD###")
Note that the mask goes in quotes.
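For the third string, the time of day can also be kept instead of skipped. The sketch below types the same string value in by hand rather than loading the example data set, and the variable names s, time3 and date3 are illustrative. clock() with the mask "YMD hms" returns milliseconds since 01jan1960 00:00:00, and the result should be stored as a double because such values are too large to be held exactly in the default float type.

```stata
clear
input str19 s
"2010-11-03 08:35:12"
end
* clock() returns milliseconds; store as double to avoid rounding
gen double time3 = clock(s, "YMD hms")
format time3 %tc
* dofc() recovers the daily date from the datetime
gen date3 = dofc(time3)
format date3 %td
list
```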
Converting Numbers to Dates
Another common scenario gives you dates as three separate numeric variables, one for the year, one for the month and one for the day. The year, month and day variables in the example data set contain the same date as the others but in this format. To convert such dates to Stata dates, use the mdy function. It takes three numeric arguments: the month, day and year to be converted.
gen date4=mdy(month,day,year)
Formatting Date Variables
While the four date variables you've created are perfectly functional dates as far as Stata is concerned, they're difficult for humans to interpret. However, the %td format tells Stata to print them out as human readable dates:
format date1 %td
format date2 %td
format date3 %td
format date4 %td
This turns the 18569 now stored in all four variables into 03nov2010 (18,569 days since January 1, 1960) in all output. Try a list to see the result. If you remember your varlist syntax, you can do them all at once with:
format date? %td
You can have Stata output dates in different formats as well. For instructions type help dates and then click on the link Formatting date and time values.
Using Dates
Often your goal in creating a Stata date will be to create a time
variable that can be included in a statistical command. If so, you
can probably use it with no further modification. However, there
are some common data preparation tasks involving dates.
Date Constants
If you need to refer to a particular date in your code, then in
principle you could refer to it by number. However, it's usually
more convenient to use the same functions used to import date
variables. For example, the following are all equivalent ways of
referring to November 3, 2010:
18569
date("November 3, 2010","MDY")
mdy(11,3,2010)
The td pseudofunction was designed for tasks like this and is somewhat more convenient to use. It takes a single argument (which cannot be a variable name) and converts it to a date on the assumption that the argument is a string containing a date in the format day, month, year. This matches the output of the %td format, e.g. 3nov2010. Thus the following is also equivalent:
td(3nov2010)
However, the following is not:
td(11/3/2010)
This will be interpreted as March 11, 2010, not November 3,
2010.
Extracting Date Components
Sometimes you need to pull out the components of a date. You can do so with the year, month and day functions:
gen year1=year(date1)
gen month1=month(date1)
gen day1=day(date1)
Before and After
Since dates are just numbers, before and after are equivalent to
less than and greater than. Thus:
gen before2010=(date1<date("January 1 2010","MDY"))
gen after2010=(date1>date("January 1 2010","MDY"))
Durations and Intervals
Durations in days can be found using simple subtraction. The example data set contains the dates beginning and ending, and you can find out the duration of the interval between them with:
gen duration=ending-beginning
Durations in months are more difficult because months vary in
length. One common approach is to ignore days entirely and
calculate the duration solely from the year and month components of
the dates involved:
gen durationInMonths=(year(ending)-year(beginning))*12+month(ending)-month(beginning)
Just keep in mind that this approach says January 31 and February 1
are one month apart, while January 1 and January 31 are zero months
apart.
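The same year-and-month calculation can be written with monthly dates: mofd() turns each daily date into a count of months since January 1960, so the difference of two such counts is the duration in months. The sketch below uses made-up dates rather than the example data set, and the names s1, s2 and durationMofd are illustrative.

```stata
clear
input str10 s1 str10 s2
"2010-01-31" "2010-02-01"
"2010-01-01" "2010-01-31"
"2009-11-15" "2010-02-15"
end
gen beginning = date(s1, "YMD")
gen ending = date(s2, "YMD")
format beginning ending %td
* year/month arithmetic as in the text
gen durationInMonths = (year(ending)-year(beginning))*12 + month(ending)-month(beginning)
* equivalent form using monthly dates
gen durationMofd = mofd(ending) - mofd(beginning)
list beginning ending durationInMonths durationMofd
```

The first two rows reproduce the caveat in the text: January 31 to February 1 counts as one month, while January 1 to January 31 counts as zero.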
Date Arithmetic
If you need to add (or subtract) a period measured in days to a
date, it is straightforward to do so. Just remember to format all
new date variables as dates with %td:
gen tenDaysLater=date1+10
gen yesterday=date1-1
format %td tenDaysLater yesterday
If the period is measured in weeks, just multiply by 7. Months are
again problematic since different months have different lengths.
Years have the same problem if you need to be precise enough to
care about leap years.
You can avoid this by building a new date based on the components
of the old one, modified as required. The only trick is that you
must handle year changes properly. For example, the following works
properly:
gen oneMonthLater=mdy(month(date1)+1,day(date1),year(date1))
format %td oneMonthLater
oneMonthLater is now December 3, 2010. But
the following does not:
gen twoMonthsLaterBad=mdy(month(date1)+2,day(date1),year(date1))
format %td twoMonthsLaterBad
This tries to set the month component of the new date to 13, which is invalid. It needs to be January of the next year instead. The following code will allow you to add or subtract any number of months (just change the final number in the first line and the name of the new variable):
gen newMonth=month(date1)+2
gen newYear=year(date1)+floor((newMonth-1)/12)
replace newMonth=mod((newMonth-1),12)+1
gen twoMonthsLater=mdy(newMonth,day(date1),newYear)
format %td twoMonthsLater
drop newMonth newYear
If you need to do such things frequently you might want to turn
this bit of code into a program, or even an ado file.
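An alternative sketch, not part of the original article, lets monthly dates handle the year change: mofd() converts to a month count, adding a number of months to that count crosses year boundaries automatically, and dofm() returns the first day of the target month. The variable name target and the two test dates are illustrative. Note that mdy() returns missing when the day does not exist in the target month (for example, the 31st in February), so such cases need a rule of your own.

```stata
clear
set obs 2
gen date1 = mdy(11, 3, 2010) in 1
replace date1 = mdy(12, 31, 2010) in 2
format date1 %td
* first day of the month two months later; year rollover is automatic
gen target = dofm(mofd(date1) + 2)
* rebuild the date with the original day of month
gen twoMonthsLater = mdy(month(target), day(date1), year(target))
format target twoMonthsLater %td
list
```

The second observation is included to show the missing result: two months after 31dec2010 would be the 31st of February 2011, which does not exist.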
Learning More
To read the full documentation on Stata dates, type help dates and then click on the dates and times link at the top (the PDF documentation is much easier to read in this case). There you'll learn to:
- Work with times
- Use intervals other than days, such as months, quarters or years
- Create your own date format for output (e.g. November 3rd, 2010 rather than 3nov2010)
- Track leap seconds, in case you need to be extremely precise--you'll also find an explanation of why such things exist
Last Revised: 11/9/2010
http://dss.princeton.edu/online_help/stats_packages/stata/time_series_data.htm
Time Series Data in Stata
Time series data and tsset
To use Stata's time-series functions and analyses, you must first make sure that your data are, indeed, time-series. First, you must have a date variable that is in Stata date format. Secondly, you must make sure that your data are sorted by this date variable. If you have panel data, then your data must be sorted by the date variable within the variable that identifies the panel. Finally, you must use the tsset command to tell Stata that your data are time-series:
sort datevar
tsset datevar
or
sort panelvar datevar
tsset panelvar datevar
The first example tells Stata that you have simple time-series
data, and the second tells Stata that you have panel data.
Stata Date Format
Stata stores dates as the number of elapsed days since January 1,
1960. There are different ways to create elapsed Stata dates that
depend on how dates are represented in your data. If your original
dataset already contains a single date variable, then use the
date() function or one of the other string-date commands. If you
have separate variables storing different parts of the date (month,
day and year; year and quarter, etc.) then you will need to use the
partial date variable functions.
Date functions for a single string date variable
Sometimes, your data will have the dates in string format. (A
string variable is simply a variable containing anything other than
just numbers.) Stata provides a way to convert these to time-series
dates. The first thing you need to know is that the string must be
easily separated into its components. In other words, strings like
'01feb1990' 'February 1, 1990' '02/01/90' are acceptable, but
'020190' is not.
For example, let's say that you have a string variable 'sdate' with values like '01feb1990' and you need to convert it to a daily time-series date:
gen daily=date(sdate,"DMY")
Note that in this function, as with the other functions to convert strings to time-series dates, the "DMY" portion indicates the order of the day, month and year in the variable. Had the values been coded as 'February 1, 1990' we would have used "MDY" instead. What if the original date only has two digits for the year? Then we would use:
gen daily=date(sdate,"DM19Y")
Whenever you have two-digit years, simply place the century before the 'Y'. If your two-digit years span two centuries, such as 1/2/98 and 1/2/00, use:
gen daily=date(sdate,"DMY",2020)
where 2020 is the largest year you have in your data set. Here are the other functions:
weekly(stringvar,"WY")
monthly(stringvar,"MY")
quarterly(stringvar,"QY")
halfyearly(stringvar,"HY")
yearly(stringvar,"Y")
Note: Stata 10 uses upper-case letters in the mask (DMY), whereas earlier versions of Stata use lower case (dmy).
Date functions for partial date variables
Often you will have separate variables for the various components
of the date; you need to put them together before you can designate
them as proper time-series dates. Stata provides an easy way to do
this with numeric variables. If you have separate variables for
month, day and year then use the mdy() function to create an
elapsed date variable. Once you have created an elapsed date
variable, you will probably want to format it, as described
below.
Use the mdy() function to create an elapsed Stata date variable
when your original data contains separate variables for month, day
and year. The month, day and year variables must be numeric. For
example, suppose you are working with these data:
| month | day | year |
| 7 | 11 | 1948 |
| 1 | 21 | 1952 |
| 11 | 2 | 1994 |
| 8 | 12 | 1993 |
Use the following Stata command to generate a new variable named
mydate:
gen mydate = mdy(month,day,year)
where mydate is an elapsed date variable, mdy() is the Stata function, and month, day, and year are the names of the variables that contain data for month, day and year, respectively.
If you have two variables, 'year' and 'quarter' use the 'yq()'
function:
gen qtr=yq(year,quarter)
gen qtr=yq(1990,3)
The other functions are:
| mdy(month,day,year) | for daily data |
| yw(year,week) | for weekly data |
| ym(year,month) | for monthly data |
| yq(year,quarter) | for quarterly data |
| yh(year,half-year) | for half-yearly data |
Converting a date variable stored as a single number
If you have a date variable where the date is stored as a single
number of the form yyyymmdd (for example, 20041231 for December 31,
2004) the following set of functions will convert it into a Stata
elapsed date.
gen year = int(date/10000)
gen month = int((date-year*10000)/100)
gen day = int((date-year*10000-month*100))
gen mydate = mdy(month,day,year)
format mydate %d
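An alternative sketch, not from the original page, converts the number to a string first and lets date() parse it. The variable name rawdate and the two sample values are illustrative. Two details matter here: the yyyymmdd value must be stored as long or double, because float cannot hold eight digits exactly, and the explicit "%8.0f" format makes string() write all eight digits.

```stata
clear
input long rawdate
20041231
19990101
end
* number -> 8-character string -> daily date
gen mydate = date(string(rawdate, "%8.0f"), "YMD")
format mydate %td
list
```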
Time series date formats
Use the format command to display elapsed Stata dates as calendar
dates. In the example given above, the elapsed date variable,
mydate, has the following values, which represent the number of
days before or after January 1, 1960.
| month | day | year | mydate |
| 7 | 11 | 1948 | -4191 |
| 1 | 21 | 1952 | -2902 |
| 8 | 12 | 1993 | 12277 |
| 11 | 2 | 1994 | 12724 |
You can use the format command to display elapsed dates in a more
customary way. For example:
format mydate %d
where mydate is an elapsed date variable and %d is the format which
will be used to display values for that variable.
| month | day | year | mydate |
| 7 | 11 | 1948 | 11jul48 |
| 1 | 21 | 1952 | 21jan52 |
| 8 | 12 | 1993 | 12aug93 |
| 11 | 2 | 1994 | 02nov94 |
Other formats are available to control the display of elapsed
dates.
Time-series dates in Stata have their own formats similar to
regular date formats. The main difference is that for a regular
date format a 'unit' or single 'time period' is one day. For time
series formats, a unit or single time period can be a day, week,
month, quarter, half-year or year. There is a format for each of
these time periods:
| Format | Description | Beginning | +1 Unit | +2 Units | +3 Units |
| %td | daily | 01jan1960 | 02jan1960 | 03jan1960 | 04jan1960 |
| %tw | weekly | week 1, 1960 | week 2, 1960 | week 3, 1960 | week 4, 1960 |
| %tm | monthly | Jan, 1960 | Feb, 1960 | Mar, 1960 | Apr, 1960 |
| %tq | quarterly | 1st qtr, 1960 | 2nd qtr, 1960 | 3rd qtr, 1960 | 4th qtr, 1960 |
| %th | half-yearly | 1st half, 1960 | 2nd half, 1960 | 1st half, 1961 | 2nd half, 1961 |
| %ty | yearly | 1960 | 1961 | 1962 | 1963 |
You should note that in the weekly format, the year is divided into
52 weeks. The first week is defined as the first seven days,
regardless of what day of the week it may be. Also, the last week,
week 52, may have 8 or 9 days. For the quarterly format, the first
quarter is January through March. For the half-yearly format, the
first half of the year is January through June.
It's even more important to note that you cannot jump from one
format to another by simply re-issuing the format command because
the units are different in each format. Here are the corresponding
results for January 1, 1999, which is an elapsed date of
14245:
| %td | %tw | %tq | %th | %ty |
| 01jan1999 | 2233w50 | 5521q2 | 9082h2 | |
These dates are so different because the elapsed date is actually
the number of weeks, quarters, etc., from the first week, quarter,
etc of 1960. The value for %ty is missing because it would be equal
to the year 14,245 which is beyond what Stata can accept.
Any of these time units can be translated to any of the others.
Stata provides functions to translate any time unit to and from %td
daily units, so all that is needed is to combine these
functions.
These functions translate to %td dates:
| dofw() | weekly to daily |
| dofm() | monthly to daily |
| dofq() | quarterly to daily |
| dofy() | yearly to daily |
These functions translate from %td dates:
| wofd() | daily to weekly |
| mofd() | daily to monthly |
| qofd() | daily to quarterly |
| yofd() | daily to yearly |
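As a sketch of combining these functions, a monthly date can be turned into a quarterly or yearly date by passing through daily units. The variable names m, q and y are illustrative.

```stata
clear
set obs 1
* monthly date for November 1999
gen m = ym(1999, 11)
format m %tm
* monthly -> daily -> quarterly
gen q = qofd(dofm(m))
format q %tq
* monthly -> daily -> yearly
gen y = yofd(dofm(m))
format y %ty
list
```

Here dofm() gives the first day of the month, and qofd() or yofd() then finds the quarter or year containing that day.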
For more information see the Stata User's Guide, chapter 27.
Specifying dates
Often we need to conduct a particular analysis only on observations that fall on a certain date. To do this, we have to use something called a date literal. A date literal is simply a way of entering a date in words and having Stata automatically convert it to an elapsed date. As with the d() literal to specify a regular date, there are the w(), m(), q(), h(), and y() literals for entering weekly, monthly, quarterly, half-yearly, and yearly dates, respectively. Here are some examples, where datevar stands for your date variable:
reg x y if datevar==w(1995w9)
sum income if datevar==q(1988-3)
tab gender if datevar==y(1999)
If you want to specify a range of dates, you can use the tin() and
twithin() functions:
reg y x if tin(01feb1990,01jun1990)
sum income if twithin(1988-3,1998-3)
The difference between tin() and twithin() is that tin() includes
the beginning and end dates, whereas twithin() excludes them.
Always enter the beginning date first, and write them out as you
would for any of the d(), w(), etc. functions.
Time Series Variable Lists
Often in time-series analyses we need to 'lag' or 'lead' the values
of a variable from one observation to the next. If we have many
variables, this can be cumbersome, especially if we need to lag a
variable more than once. In Stata, we can specify which variables
are to be lagged and how many times without having to create new
variables, thus saving a lot of disk space and memory. You should
note that the tsset command must have been issued before any of the
'tricks' in this section will work. Also, if you have defined your
data as panel data, Stata will automatically re-start the
calculations as it comes to the beginning of a panel so you need
not worry about values from one panel being carried over to the
next.
L.varname and F.varname
If you need to lag or lead a variable for an analysis, you can do
so by using the L.varname (to lag) and F.varname (to lead). Both
work the same way, so we'll just show some examples with L.varname.
Let's say you want to regress this year's income on last year's
income:
reg income L.income
would accomplish this. The 'L.' tells Stata to lag income by one
time period. If you wanted to lag income by more than one time
period, you would simply change the L. to something like 'L2.' or
'L3.' to lag it by 2 and 3 time periods, respectively. The
following two commands will produce the same results:
reg income L.income L2.income L3.income
reg income L(1/3).income
D.varname
Another useful shortcut is D.varname, which takes the difference of
income in time 1 and income in time 2. For example, let's say a
person earned $20 yesterday and $30 today.
| Date | income | D.income | D2.income |
| 02feb1999 | 20 | . | . |
| 02mar1999 | 30 | 10 | . |
| 02apr1999 | 45 | 15 | 5 |
So, you can see that D.income = income_t - income_{t-1} and D2.income = (income_t - income_{t-1}) - (income_{t-1} - income_{t-2}).
S.varname
S.varname refers to seasonal differences and works like D.varname, except that Sn.varname always takes the difference between the current observation and the observation n periods earlier:
| Date | income | S.income | S2.income |
| 02feb1999 | 20 | . | . |
| 02mar1999 | 30 | 10 | . |
| 02apr1999 | 45 | 15 | 25 |
In other words: S.income = income_t - income_{t-1} and S2.income = income_t - income_{t-2}.
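The two tables can be reproduced with a short sketch. The monthly time variable t is an assumption made here so that consecutive observations are exactly one period apart; the income values are those from the tables, and the names of the generated variables are illustrative.

```stata
clear
input income
20
30
45
end
* monthly time index starting in February 1999
gen t = ym(1999, 2) + _n - 1
format t %tm
tsset t
gen lag1 = L.income
gen d1 = D.income
gen d2 = D2.income
gen s1 = S.income
gen s2 = S2.income
list t income lag1 d1 d2 s1 s2
```

In the last row d2 is 5 (the change in the change, 15 - 10), while s2 is 25 (45 - 20), which shows the difference between D2. and S2.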
For more on lags, leads, differences and seasonal differences, check the Time series 101 guide.